
Flowise Self-Host Guide: Assistants, Document Processing Pipelines, Evaluations, and Stack Integrations

Learn how to use Flowise Assistants for persistent memory agents, build document processing pipelines that ingest and index documents automatically, evaluate flow quality with systematic testing, and wire Flowise into your broader self-hosted stack with n8n and Ollama.

The first two guides in this series covered basic Flowise deployment and RAG chatbots, then production hardening with PostgreSQL, custom nodes, and API security. This third guide goes deeper into Flowise's more powerful features: Assistants with persistent memory and tool use, document processing pipelines that handle ingestion automatically, systematic flow evaluation so you know when RAG quality degrades, and integrating Flowise as a core component of your broader self-hosted AI stack alongside n8n and Ollama.


Prerequisites

  • A production-hardened Flowise instance with PostgreSQL — see our production hardening guide
  • Flowise version 1.8+ — Assistants and evaluation features require recent releases
  • Ollama installed and running (for local model integration covered in this guide)
  • n8n deployed (for workflow automation integration)
  • At least 4GB RAM — Assistants with persistent threads use more memory than stateless chatflows
  • Qdrant or pgvector for persistent vector storage (required for Assistants memory)

Verify your stack is ready:

# Check Flowise version
docker exec flowise node -e \
  "const pkg = require('./node_modules/flowise/package.json'); console.log(pkg.version);"

# Verify Ollama is reachable from Flowise container
docker exec flowise curl -s http://172.17.0.1:11434/api/tags | jq '[.models[].name]'

# Check PostgreSQL is being used (not SQLite)
docker exec flowise env | grep DATABASE_TYPE
# Should return: DATABASE_TYPE=postgres

# Verify Qdrant is running (if used for Assistants memory):
curl http://localhost:6333/collections | jq '.result.collections[].name'

Flowise Assistants: Persistent Memory and Tool-Using Agents

Flowise Assistants are different from regular chatflows. Where a chatflow is stateless (or uses basic buffer memory within a session), an Assistant maintains persistent threads — conversation history, memory, and context that survive across sessions and can be resumed days or weeks later. This makes Assistants the right choice for personal AI assistants, long-running project support agents, and customer service bots that remember previous interactions.

Creating Your First Assistant

Go to Assistants → Add New. An Assistant combines an LLM, system instructions, a set of tools, and optionally a file storage backend for knowledge retrieval. The key configuration decisions:

  • Model — choose a model with strong instruction-following: GPT-4o or Claude 3.5 Sonnet for cloud, or Llama 3.1:70B via Ollama for on-premise
  • Instructions — the system prompt. More specific than a chatflow prompt — define the assistant's persona, capabilities, limitations, and exactly how it should handle edge cases
  • Tools — the actions the assistant can take: code interpreter, retrieval, web search, custom API tools
  • Thread storage — how conversation threads are persisted (database, in-memory)

Configuring Persistent Thread Storage

# Flowise Assistants store thread history in PostgreSQL when DATABASE_TYPE=postgres
# Each unique sessionId gets its own thread with full history

# The thread storage schema in PostgreSQL:
# Table: chat_message — stores all messages per session
# Table: chat_flow — stores flow configurations

# Query active threads and their message counts:
docker exec flowise_db psql -U flowise flowise -c "
  SELECT
    session_id,
    COUNT(*) as message_count,
    MIN(created_date) as first_message,
    MAX(created_date) as last_message
  FROM chat_message
  WHERE chat_flow_id = 'YOUR_ASSISTANT_CHATFLOW_ID'
  GROUP BY session_id
  ORDER BY last_message DESC
  LIMIT 20;"

# When calling the Assistant API, pass a consistent sessionId to maintain thread:
curl -X POST https://flowise.yourdomain.com/api/v1/prediction/YOUR_ASSISTANT_ID \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "Can you remind me what we decided about the database schema?",
    "overrideConfig": {
      "sessionId": "user-12345-project-alpha"
    }
  }' | jq .text
# Make the sessionId unique per user+context (here: user 12345, project alpha).
# Note: JSON does not allow inline comments, so keep the request body comment-free.

# The assistant retrieves conversation history for this sessionId
# and responds with full context of previous messages
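
The curl pattern above translates directly into a small Python client. This is a minimal sketch against the same prediction endpoint; the sessionId scheme (user ID plus a context label) is an illustration of one convention, not a Flowise requirement:

```python
import os
import requests

# Assumed environment configuration — adjust to your deployment
FLOWISE_URL = os.environ.get("FLOWISE_URL", "https://flowise.yourdomain.com")
FLOWISE_API_KEY = os.environ.get("FLOWISE_API_KEY", "")

def thread_payload(question: str, user_id: str, context: str) -> dict:
    """Build a prediction request body with a deterministic per-user thread ID.

    Deriving sessionId from user + context means the same pair always
    resumes the same thread, without storing a mapping anywhere.
    """
    return {
        "question": question,
        "overrideConfig": {"sessionId": f"{user_id}-{context}"},
    }

def ask_assistant(assistant_id: str, question: str, user_id: str,
                  context: str = "default") -> str:
    """POST to the prediction endpoint and return the assistant's reply text."""
    resp = requests.post(
        f"{FLOWISE_URL}/api/v1/prediction/{assistant_id}",
        headers={"Authorization": f"Bearer {FLOWISE_API_KEY}"},
        json=thread_payload(question, user_id, context),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("text", "")
```

Because the thread ID is derived rather than stored, any service that knows the user and context can resume the same conversation.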

Building a Multi-Tool Assistant

# Assistant configuration with multiple tools (configure in Flowise UI or export JSON):

# Tool 1: Calculator (built-in)
# Tool 2: Web Browser / SearXNG search
# Tool 3: Custom CRM lookup (from custom node)
# Tool 4: Code execution (via Code Interpreter node)

# System instructions template for a multi-tool assistant
# (paste into the Assistant's Instructions field; shown here as a Python string):
SYSTEM_PROMPT = """
You are a senior technical assistant for the engineering team at Acme Corp.

## Capabilities
- Answer technical questions about our codebase and architecture
- Look up customer information when engineers need context
- Search the web for documentation on external libraries
- Write and execute code snippets to verify solutions
- Remember our ongoing projects and decisions from previous conversations

## How you work
1. When asked about customers, ALWAYS use the CRM lookup tool before answering
2. For code questions, provide working examples and test them with the code executor
3. For web searches, cite your sources explicitly
4. For ongoing project discussions, proactively reference relevant previous decisions

## What you don't do
- Access production systems directly
- Share customer PII in code examples
- Make commitments on behalf of the team
- Provide security advice without recommending security review
"""

# Export this flow as JSON for version control:
curl https://flowise.yourdomain.com/api/v1/chatflows/YOUR_ASSISTANT_ID \
  -H 'Authorization: Bearer YOUR_API_KEY' | \
  jq '.flowData' > assistant-v1.json
git add assistant-v1.json
git commit -m "Update engineering assistant system prompt v1"

Automated Document Processing Pipelines

Manual document upload through the Flowise UI doesn't scale. Production knowledge bases need automated pipelines: watch a folder, an S3 bucket, or a webhook for new documents, process and chunk them, index into the vector store, and keep the knowledge base current without human intervention.

Webhook-Triggered Document Ingestion

#!/usr/bin/env python3
# ingest-document.py
# Uploads a document (or scrapes a URL) into a Flowise knowledge base
# Use as a webhook handler or call from CI/CD pipelines

import requests
import json
import sys
import os
from pathlib import Path

FLOWISE_URL = os.environ["FLOWISE_URL"]
FLOWISE_API_KEY = os.environ["FLOWISE_API_KEY"]
DATASET_ID = os.environ["FLOWISE_DATASET_ID"]  # The knowledge base ID

def ingest_document(file_path: str, metadata: dict = None) -> dict:
    """Upload and index a document into a Flowise knowledge base."""
    path = Path(file_path)

    if not path.exists():
        raise FileNotFoundError(f"Document not found: {file_path}")

    # Upload the document
    with open(path, 'rb') as f:
        response = requests.post(
            f"{FLOWISE_URL}/api/v1/document-store/loader/process",
            headers={"Authorization": f"Bearer {FLOWISE_API_KEY}"},
            files={"files": (path.name, f, 'application/octet-stream')},
            data={
                "docStoreId": DATASET_ID,
                "splitterName": "recursiveCharacterTextSplitter",
                "splitterConfig": '{"chunkSize": 512, "chunkOverlap": 50}',
                # json.dumps, not str(): str(dict) emits single quotes,
                # which is not valid JSON
                "metadata": json.dumps(metadata or {})
            }
        )

    response.raise_for_status()
    result = response.json()
    print(f"Uploaded: {path.name} | Chunks: {result.get('chunks', 'unknown')}")
    return result

def ingest_url(url: str, metadata: dict = None) -> dict:
    """Scrape and index a web page into a Flowise knowledge base."""
    response = requests.post(
        f"{FLOWISE_URL}/api/v1/document-store/loader/process",
        headers={
            "Authorization": f"Bearer {FLOWISE_API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "docStoreId": DATASET_ID,
            "loaderName": "cheerioWebScraper",
            "loaderConfig": {"url": url},
            "splitterName": "recursiveCharacterTextSplitter",
            "splitterConfig": {"chunkSize": 512, "chunkOverlap": 50},
            "metadata": metadata or {}
        }
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 ingest-document.py ")
        sys.exit(1)

    target = sys.argv[1]
    if target.startswith("http"):
        result = ingest_url(target, metadata={"source": target, "ingested_by": "pipeline"})
    else:
        result = ingest_document(target, metadata={"filename": target})

    print(f"Ingestion complete: {result}")

Folder Watcher for Automatic Ingestion

#!/usr/bin/env python3
# watch-and-ingest.py
# Watches a directory for new documents and auto-ingests them into Flowise
# Run as a persistent service: systemd or Docker container

import time
import hashlib
import json
from pathlib import Path
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from ingest_document import ingest_document  # The script above, saved as ingest_document.py (module names cannot contain hyphens)

WATCH_DIR = "/opt/knowledge-docs"
STATE_FILE = "/tmp/ingested-docs.json"
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx", ".csv", ".html"}

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()[:16]

def load_state() -> dict:
    if Path(STATE_FILE).exists():
        return json.loads(Path(STATE_FILE).read_text())
    return {}

def save_state(state: dict):
    Path(STATE_FILE).write_text(json.dumps(state, indent=2))

class DocumentIngestionHandler(FileSystemEventHandler):
    def __init__(self):
        self.state = load_state()

    def on_created(self, event):
        if event.is_directory:
            return
        self.process_file(Path(event.src_path))

    def on_modified(self, event):
        if event.is_directory:
            return
        self.process_file(Path(event.src_path))

    def process_file(self, path: Path):
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            return

        current_hash = file_hash(path)
        if self.state.get(str(path)) == current_hash:
            print(f"Skipping unchanged: {path.name}")
            return

        print(f"Ingesting: {path.name}")
        try:
            ingest_document(
                str(path),
                metadata={"filename": path.name, "directory": str(path.parent)}
            )
            self.state[str(path)] = current_hash
            save_state(self.state)
        except Exception as e:
            print(f"Error ingesting {path.name}: {e}")

# Run the watcher:
observer = Observer()
handler = DocumentIngestionHandler()
observer.schedule(handler, WATCH_DIR, recursive=True)
observer.start()
print(f"Watching: {WATCH_DIR}")
try:
    while True:
        time.sleep(10)
except KeyboardInterrupt:
    observer.stop()
observer.join()

# Install as a systemd service:
# [Unit]
# Description=Flowise Document Watcher
# After=network.target
# [Service]
# ExecStart=/usr/bin/python3 /opt/scripts/watch-and-ingest.py
# Restart=always
# Environment="FLOWISE_URL=https://flowise.yourdomain.com"
# Environment="FLOWISE_API_KEY=your-key"
# Environment="FLOWISE_DATASET_ID=your-dataset-id"
# [Install]
# WantedBy=multi-user.target
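
The comment at the top of the script mentions running it as a Docker container instead. A sketch of that option, with the image name, file layout, and mount path as illustrative assumptions:

```shell
# Dockerfile (alternative to the systemd unit above):
# FROM python:3.12-slim
# RUN pip install --no-cache-dir requests watchdog
# COPY ingest_document.py watch-and-ingest.py /app/
# WORKDIR /app
# CMD ["python3", "watch-and-ingest.py"]

docker build -t flowise-doc-watcher .
docker run -d --name doc-watcher --restart unless-stopped \
  -v /opt/knowledge-docs:/opt/knowledge-docs:ro \
  -e FLOWISE_URL=https://flowise.yourdomain.com \
  -e FLOWISE_API_KEY=your-key \
  -e FLOWISE_DATASET_ID=your-dataset-id \
  flowise-doc-watcher
```

The read-only mount is deliberate: the watcher only needs to read documents, and its state file lives inside the container at /tmp.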

Flow Evaluation: Knowing When RAG Quality Degrades

Without systematic evaluation, you only find out your RAG pipeline is returning wrong answers when a user complains. Flowise has a built-in evaluation system, but the real value comes from building an automated test suite that runs on every knowledge base update.

Using Flowise's Built-In Evaluation

Go to Evaluations → Create Evaluation. The evaluation system lets you define question/expected-answer pairs and measure how well your flow performs against them. Key metrics Flowise tracks:

  • Faithfulness — does the answer stay faithful to the retrieved context? Catches hallucinations.
  • Answer Relevance — is the answer relevant to the question? Catches off-topic responses.
  • Context Precision — did the retriever fetch useful chunks? Low precision = poor chunking or embedding.
  • Context Recall — were all relevant chunks retrieved? Low recall = missing content or threshold too high.
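
Context precision and recall reduce to simple set arithmetic once you have the retrieved chunk IDs and a hand-labeled set of relevant chunks per question. Flowise computes these internally; this sketch is for custom pipelines where you inspect the sourceDocuments yourself (chunk IDs are illustrative):

```python
def context_metrics(retrieved_ids: set[str], relevant_ids: set[str]) -> dict:
    """Precision: share of retrieved chunks that were actually relevant.
    Recall: share of relevant chunks the retriever managed to fetch."""
    hits = retrieved_ids & relevant_ids
    return {
        "precision": len(hits) / len(retrieved_ids) if retrieved_ids else 0.0,
        "recall": len(hits) / len(relevant_ids) if relevant_ids else 0.0,
    }

# Retriever fetched 4 chunks, 2 of them relevant; 1 relevant chunk was missed:
m = context_metrics({"c1", "c2", "c3", "c4"}, {"c1", "c2", "c5"})
# precision = 2/4 = 0.5, recall = 2/3
```

Low precision with high recall points at chunking (too much noise retrieved); high precision with low recall points at a similarity threshold set too high or missing content.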

Automated Evaluation Pipeline

#!/usr/bin/env python3
# evaluate-flow.py
# Runs a test suite against a Flowise chatflow and reports quality metrics
# Run after every knowledge base update to catch regressions

import requests
import json
import os
from dataclasses import dataclass
from typing import Optional

FLOWISE_URL = os.environ["FLOWISE_URL"]
FLOWISE_API_KEY = os.environ["FLOWISE_API_KEY"]
CHATFLOW_ID = os.environ["FLOWISE_CHATFLOW_ID"]

@dataclass
class TestCase:
    question: str
    expected_keywords: list[str]   # Keywords that MUST appear in a good answer
    forbidden_keywords: list[str]  # Keywords that indicate hallucination
    context_hint: Optional[str] = None  # Expected source document content

# Define your test suite — these should cover your actual use cases:
TEST_CASES = [
    TestCase(
        question="What is our refund policy for digital products?",
        expected_keywords=["30 days", "refund", "digital"],
        forbidden_keywords=["physical", "shipping"],
    ),
    TestCase(
        question="How do I reset my API key?",
        expected_keywords=["settings", "API", "regenerate"],
        forbidden_keywords=["password", "account deletion"],
    ),
    TestCase(
        question="What are the system requirements?",
        expected_keywords=["RAM", "operating system"],
        forbidden_keywords=["we don't know", "unclear"],
    ),
    # Test that the bot handles out-of-scope questions correctly:
    TestCase(
        question="What is the weather in New York?",
        expected_keywords=["can't", "don't have", "not able"],  # Should decline gracefully
        forbidden_keywords=["sunny", "°F", "°C"],
    ),
]

def ask_flow(question: str, session_id: str) -> dict:
    response = requests.post(
        f"{FLOWISE_URL}/api/v1/prediction/{CHATFLOW_ID}",
        headers={
            "Authorization": f"Bearer {FLOWISE_API_KEY}",
            "Content-Type": "application/json"
        },
        json={"question": question, "overrideConfig": {"sessionId": session_id}},
        timeout=60
    )
    response.raise_for_status()
    return response.json()

def evaluate_answer(answer: str, test_case: TestCase) -> dict:
    answer_lower = answer.lower()

    found_expected = [kw for kw in test_case.expected_keywords if kw.lower() in answer_lower]
    found_forbidden = [kw for kw in test_case.forbidden_keywords if kw.lower() in answer_lower]

    keyword_score = len(found_expected) / max(len(test_case.expected_keywords), 1)
    has_hallucination = len(found_forbidden) > 0

    return {
        "pass": keyword_score >= 0.6 and not has_hallucination,
        "keyword_score": round(keyword_score * 100, 1),
        "found_expected": found_expected,
        "found_forbidden": found_forbidden,
        "answer_length": len(answer),
    }

# Run the evaluation:
import uuid
results = []
pass_count = 0

print(f"Running {len(TEST_CASES)} evaluations against chatflow: {CHATFLOW_ID}")
print("=" * 60)

for i, test in enumerate(TEST_CASES):
    session_id = f"eval-{uuid.uuid4().hex[:8]}"
    response = ask_flow(test.question, session_id)
    answer = response.get("text", "")
    result = evaluate_answer(answer, test)
    results.append({"test": test.question, **result})

    status = "✅ PASS" if result["pass"] else "❌ FAIL"
    print(f"{status} ({result['keyword_score']}%) | {test.question[:60]}")
    if not result["pass"]:
        if result["found_forbidden"]:
            print(f"   Hallucination indicators: {result['found_forbidden']}")
        print(f"   Missing keywords: {set(test.expected_keywords) - set(result['found_expected'])}")

    if result["pass"]:
        pass_count += 1

overall_score = pass_count / len(TEST_CASES) * 100
print("=" * 60)
print(f"Overall: {pass_count}/{len(TEST_CASES)} passed ({overall_score:.1f}%)")

# Exit with non-zero if quality is below threshold (blocks CI/CD if RAG degrades)
import sys
if overall_score < 80.0:
    print("QUALITY BELOW THRESHOLD (80%) — review knowledge base before deploying")
    sys.exit(1)

Integrating Evaluation into CI/CD

# .github/workflows/knowledge-base-update.yml
name: Update Knowledge Base and Validate

on:
  push:
    paths:
      - 'docs/**'  # Trigger when documentation changes

jobs:
  update-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: pip install requests watchdog

      - name: Ingest updated documents
        env:
          FLOWISE_URL: ${{ secrets.FLOWISE_URL }}
          FLOWISE_API_KEY: ${{ secrets.FLOWISE_API_KEY }}
          FLOWISE_DATASET_ID: ${{ secrets.FLOWISE_DATASET_ID }}
        run: |
          # Find changed docs and ingest them
          git diff --name-only HEAD~1 HEAD -- 'docs/*.md' | while read file; do
            echo "Ingesting: $file"
            python3 scripts/ingest-document.py "$file"
          done

      - name: Wait for indexing to complete
        run: sleep 30  # Flowise indexing is async — wait for it

      - name: Run quality evaluation
        env:
          FLOWISE_URL: ${{ secrets.FLOWISE_URL }}
          FLOWISE_API_KEY: ${{ secrets.FLOWISE_API_KEY }}
          FLOWISE_CHATFLOW_ID: ${{ secrets.FLOWISE_CHATFLOW_ID }}
        run: python3 scripts/evaluate-flow.py
        # Script exits 1 if quality < 80% — fails the pipeline

Integrating Flowise with Your Self-Hosted Stack

Flowise doesn't have to be isolated. As part of a broader self-hosted AI stack, it works best when it's connected to your other tools — n8n for workflow automation, Ollama for local models, and Uptime Kuma for monitoring.

Flowise + n8n: Triggering Flows from Automation Workflows

# n8n HTTP Request node configuration for calling Flowise:

# In n8n, add an HTTP Request node with:
# Method: POST
# URL: https://flowise.yourdomain.com/api/v1/prediction/YOUR_CHATFLOW_ID
# Authentication: Generic Credential Type → Header Auth
# Header Name: Authorization
# Header Value: Bearer YOUR_FLOWISE_API_KEY

# Request body (JSON):
# {
#   "question": "{{ $json.customer_message }}",  // From previous n8n node
#   "overrideConfig": {
#     "sessionId": "{{ $json.customer_id }}"
#   }
# }

# Flowise response contains: { "text": "...", "sourceDocuments": [...] }
# Access in subsequent n8n nodes as: {{ $json.text }}

# Practical use case: Customer support triage in n8n
# 1. Email node receives customer email
# 2. HTTP Request → Flowise (classify intent + draft response)
# 3. IF node (based on Flowise's intent classification)
# 4a. Low-complexity: send Flowise draft directly
# 4b. High-complexity: create Helpdesk ticket with Flowise summary
# 5. Slack notification to support team with context

# Example n8n workflow config (import as JSON):
cat > flowise-support-workflow.json << 'EOF'
{
  "name": "AI Customer Support Triage",
  "nodes": [
    {
      "name": "Receive Email",
      "type": "n8n-nodes-base.emailReadImap",
      "position": [100, 300]
    },
    {
      "name": "Classify with Flowise",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "method": "POST",
        "url": "=https://flowise.yourdomain.com/api/v1/prediction/CHATFLOW_ID",
        "jsonBody": {
          "question": "=Classify this customer email and draft a response:\n\n{{ $json.text }}",
          "overrideConfig": { "sessionId": "={{ $json.messageId }}" }
        }
      }
    }
  ]
}
EOF

Deep Ollama Integration with Custom Parameters

# Flowise ChatOllama node configuration for production use:
# (Set in the node properties panel in the flow canvas)

# Model: llama3.1:8b (for speed) or llama3.1:70b (for quality)
# Base URL: http://172.17.0.1:11434  (host IP from Docker network)
# Temperature: 0.1 (lower = more deterministic, better for factual RAG)
# Context Window: 8192 (match to the Ollama model's actual context)
# Num Predict: 2048 (max tokens to generate per response)

# For embedding with Ollama (required for local RAG):
# Use OllamaEmbeddings node
# Model: nomic-embed-text (a strong, widely used local embedding model)
# Pull it first:
curl http://localhost:11434/api/pull -d '{"name": "nomic-embed-text"}'

# Test the full local stack:
docker exec flowise curl -X POST http://172.17.0.1:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "test"}],
    "stream": false
  }' | jq .message.content
# If this works from inside Flowise container, the Ollama integration will work

# Performance tuning for Ollama in Flowise flows:
# Llama3.1 8B on CPU-only: ~5-15 tokens/sec — good enough for RAG
# Llama3.1 8B on NVIDIA GPU: ~60-100 tokens/sec — production-grade
# Use GPU_LAYERS in Ollama for GPU offloading:
# OLLAMA_NUM_GPU=99 (offload all layers to GPU)
# Set in Ollama's systemd service or Docker env
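
You can measure your actual throughput rather than guessing from the ranges above. Ollama's non-streaming /api/generate response includes eval_count (tokens generated) and eval_duration (nanoseconds), which give real tokens/sec; the host IP and prompt here are illustrative:

```python
import requests

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Ollama reports generated tokens and generation time in nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)

def benchmark(model: str = "llama3.1:8b",
              base_url: str = "http://172.17.0.1:11434") -> float:
    """One-shot throughput check against the same endpoint Flowise uses."""
    r = requests.post(f"{base_url}/api/generate", json={
        "model": model,
        "prompt": "Explain RAG in one paragraph.",
        "stream": False,
    }, timeout=300)
    r.raise_for_status()
    d = r.json()
    return tokens_per_second(d["eval_count"], d["eval_duration"])

# e.g. 100 tokens generated in 2 seconds -> 50.0 tokens/sec
```

Run it once on CPU and once after enabling GPU offload to see whether the hardware change actually lands in the 60-100 tokens/sec range.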

Monitoring Flowise with Uptime Kuma

# Configure these monitors in Uptime Kuma for your Flowise stack:

# Monitor 1: Flowise API health
# Type: HTTP(s) - Keyword
# URL: https://flowise.yourdomain.com/api/v1/version
# Keyword: "version"
# Interval: 60s

# Monitor 2: Flowise chatflow response (functional check)
# Type: HTTP(s) - Keyword
# URL: https://flowise.yourdomain.com/api/v1/prediction/YOUR_CHATFLOW_ID
# Method: POST
# Body: {"question": "health check"}
# Keyword: "text"  (Flowise responses contain a 'text' field)
# Interval: 300s (5 minutes — don't waste LLM tokens checking too often)

# Monitor 3: PostgreSQL for Flowise (TCP)
# Type: TCP Port
# Host: flowise-db.internal
# Port: 5432

# Monitor 4: Qdrant vector store
# Type: HTTP(s)
# URL: http://qdrant:6333/collections
# Accepted status: 200-299

# Push monitor for document ingestion pipeline:
# In watch-and-ingest.py, add a heartbeat after each successful ingestion:
PUSH_URL = os.environ.get("UPTIME_KUMA_PUSH_URL", "")
if PUSH_URL:
    try:
        requests.get(f"{PUSH_URL}?status=up&msg=Document+ingestion+OK&ping=", timeout=10)
    except requests.RequestException:
        pass  # A monitoring failure should never break ingestion

# Alert: if no document has been ingested for 24+ hours in a repo that should
# be updating daily, the pipeline may be stuck

Tips, Gotchas, and Troubleshooting

Assistant Threads Growing Too Large

# Long threads eventually hit the model's context window
# Symptoms: responses become slower, then start ignoring early conversation

# Check thread sizes in PostgreSQL:
docker exec flowise_db psql -U flowise flowise -c "
  SELECT
    session_id,
    COUNT(*) as messages,
    SUM(LENGTH(content)) as total_chars,
    ROUND(SUM(LENGTH(content)) / 4.0) as estimated_tokens
  FROM chat_message
  GROUP BY session_id
  HAVING COUNT(*) > 50
  ORDER BY estimated_tokens DESC
  LIMIT 10;"

# Solutions:
# 1. Use Summary Memory instead of Buffer Memory
#    Summary Memory compresses old conversation into a summary
#    when it exceeds a token threshold — preserves context without growing forever

# 2. Set max token limit in the memory node:
#    BufferMemory → Max Token Limit: 4000
#    When exceeded, oldest messages are dropped

# 3. Periodically archive old sessions:
docker exec flowise_db psql -U flowise flowise -c "
  DELETE FROM chat_message
  WHERE session_id IN (
    SELECT DISTINCT session_id
    FROM chat_message
    WHERE created_date < NOW() - INTERVAL '90 days'
  );"
  # Run monthly to clean up old sessions

Document Ingestion Pipeline Failing Silently

# Check Flowise logs for ingestion errors:
docker logs flowise --since 1h | grep -iE '(error|upload|embed|index)'

# Verify the document store ID is correct:
curl https://flowise.yourdomain.com/api/v1/document-store \
  -H 'Authorization: Bearer YOUR_API_KEY' | jq '[.[] | {id: .id, name: .name}]'

# Test ingestion with a simple text file:
echo "Test document for ingestion" > /tmp/test.txt
python3 ingest-document.py /tmp/test.txt

# Common failure modes:
# 1. File type not supported — check SUPPORTED_EXTENSIONS in watcher
# 2. File too large — check FLOWISE_FILE_SIZE_LIMIT env var and Nginx client_max_body_size
# 3. Embedding model unavailable — if using Ollama, check it's running
curl http://172.17.0.1:11434/api/tags | jq '[.models[].name]'

# 4. Vector store quota exceeded (Qdrant):
curl http://localhost:6333/collections/YOUR_COLLECTION | jq '.result.points_count'
# Check disk space: df -h

Evaluation Tests Passing But Users Still Complaining

# Test cases that pass but don't reflect real user queries = misleading coverage
# Fix: gather real questions from your users and add them to the test suite

# Extract actual queries from Flowise logs:
docker exec flowise_db psql -U flowise flowise -c "
  SELECT content, created_date
  FROM chat_message
  WHERE role = 'userMessage'
    AND created_date > NOW() - INTERVAL '7 days'
  ORDER BY created_date DESC
  LIMIT 50;" | head -60

# Look for:
# - Questions that got empty or very short answers (< 100 chars)
# - Repeated questions (users asking the same thing multiple times = not satisfied)
# - Questions with unusual phrasing that your test cases don't cover

# Add the top failure categories to your test suite and re-run
# Your test coverage should reflect your actual user population's questions
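
Turning those logged queries into new test cases can be semi-automated. A sketch that flags questions whose answers came back suspiciously short, which the text above identifies as a failure signal (the helper name and 100-character threshold are illustrative choices):

```python
def flag_weak_answers(qa_pairs: list[tuple[str, str]],
                      min_answer_chars: int = 100) -> list[str]:
    """Return user questions whose answers were suspiciously short.

    These are prime candidates for new TestCase entries in the
    evaluation suite: the flow clearly had nothing useful to say.
    """
    return [q for q, a in qa_pairs if len(a.strip()) < min_answer_chars]

# Pairs as they might come out of the chat_message query above:
logged = [
    ("How do I rotate credentials?", "I'm not sure."),
    ("What is the refund policy?", "Refunds are available within 30 days " * 5),
]
weak = flag_weak_answers(logged)
# -> ["How do I rotate credentials?"]
```

Each flagged question still needs a human to write the expected_keywords, but the triage itself no longer requires reading every transcript.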

Pro Tips

  • Use Flowise as a routing layer between n8n and multiple LLMs — instead of hardcoding model choices in n8n workflows, send all LLM requests through Flowise. Change the underlying model in Flowise without touching n8n. This is the same pattern as LiteLLM but with visual flow management.
  • Build a test chatflow that mirrors production but uses cheaper models — create a staging version of your production RAG flow pointing at GPT-4o-mini instead of GPT-4o. Run evaluations against staging, not production — same knowledge base, much lower cost per evaluation run.
  • Use Flowise's upsert endpoint for incremental document updates — when a document changes, call the upsert endpoint with the same document ID rather than deleting and re-adding. Flowise updates only the changed chunks, preserving vector store efficiency.
  • Export complete flow state for disaster recovery — export both the flow JSON and the vector store dump. For Qdrant: curl http://localhost:6333/collections/YOUR_COLLECTION/points/scroll. For pgvector: pg_dump the table. Without both, a restored Flowise instance will have flows that reference non-existent vectors.
  • Version-pin Flowise and test updates on staging before production — Flowise node APIs change between minor versions. A node configuration that worked in 1.7 may behave differently in 1.8. Maintain a staging Flowise instance, update it first, run your evaluation suite, then promote to production.
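
The upsert tip above, sketched in Python. The endpoint path follows Flowise's vector upsert API pattern, but verify it against your version's API docs before relying on it; the helper names are illustrative:

```python
import os
import requests

FLOWISE_URL = os.environ.get("FLOWISE_URL", "https://flowise.yourdomain.com")
FLOWISE_API_KEY = os.environ.get("FLOWISE_API_KEY", "")

def upsert_url(base_url: str, chatflow_id: str) -> str:
    """Vector upsert endpoint for a given chatflow."""
    return f"{base_url}/api/v1/vector/upsert/{chatflow_id}"

def upsert_document(chatflow_id: str, file_path: str) -> dict:
    """Re-index a changed document in place instead of delete-and-re-add."""
    with open(file_path, "rb") as f:
        resp = requests.post(
            upsert_url(FLOWISE_URL, chatflow_id),
            headers={"Authorization": f"Bearer {FLOWISE_API_KEY}"},
            files={"files": f},
        )
    resp.raise_for_status()
    return resp.json()
```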

Wrapping Up

The three Flowise guides together cover the complete lifecycle: basic deployment and first RAG app, production hardening with PostgreSQL, custom nodes, and security, and this guide's Assistants with persistent memory, automated document pipelines, systematic quality evaluation, and integration with your broader self-hosted stack.

The evaluation pipeline is the piece most teams skip — and it's what separates AI apps that feel reliable from ones that feel unpredictable. Once you have test cases that mirror your actual user queries and a CI/CD step that blocks deployments when quality drops below threshold, you've built something you can actually stake your product reputation on.


Need a Complete Self-Hosted AI Stack Built for Your Team?

Flowise, n8n, Ollama, LiteLLM, and Uptime Kuma working together as a coherent AI platform — with proper evaluation pipelines, document ingestion automation, and monitoring — is a significant architecture project. The sysbrix team designs and builds complete self-hosted AI stacks for engineering teams that need production reliability, not proof-of-concept demos.

Talk to Us →