RAG Module
Retrieval-Augmented Generation for grounding LLM responses in your documents.
Overview
The RAG (Retrieval-Augmented Generation) module allows you to upload documents and automatically inject relevant context into your LLM conversations. This grounds the model's responses in your data, reducing hallucinations and enabling domain-specific knowledge.
- Support for PDF, TXT, and Markdown files
- Smart text chunking with configurable overlap
- TF-IDF or sentence-transformers embeddings
- SQLite + NumPy vector storage
- Semantic search with top-k retrieval
- Source citations in results
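The smart chunking with overlap listed above can be sketched in a few lines. This is an illustrative sketch, not the module's actual implementation; the function name and the `chunk_size`/`overlap` parameters are hypothetical:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of roughly chunk_size characters,
    preferring paragraph boundaries and carrying `overlap` trailing
    characters into the next chunk to preserve cross-chunk context."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) > chunk_size:
            chunks.append(current)
            current = current[-overlap:]  # overlap carried forward
        current = f"{current}\n\n{para}".strip() if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because adjacent chunks share up to `overlap` characters, a retrieval hit near a chunk boundary still carries the surrounding context.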
How It Works
```text
RAG Pipeline:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Document   │ ──▶ │   Chunker   │ ──▶ │  Embedder   │
│   Upload    │     │   (split)   │     │ (vectorize) │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Response   │ ◀── │     LLM     │ ◀── │   Vector    │
│  + Source   │     │ (generate)  │     │    Store    │
└─────────────┘     └─────────────┘     └─────────────┘
```

At query time:

1. Query is embedded
2. Similar chunks retrieved from vector store
3. Context injected into LLM prompt
4. Response includes source citations

The chunker uses smart text splitting that respects paragraph boundaries and includes overlap to maintain context across chunks.
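The query-time retrieval step (embed the query, then pull the most similar chunks) can be sketched with NumPy, in the spirit of the SQLite + NumPy vector storage mentioned above. The function name and the use of cosine similarity are illustrative assumptions, not the module's exact scoring:

```python
import numpy as np

def top_k_chunks(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k chunks most similar to the
    query, using cosine similarity over L2-normalized vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q                        # cosine similarity per chunk
    order = np.argsort(scores)[::-1][:k]  # highest scores first
    return order, scores[order]
```

The returned indices map back to stored chunks, whose metadata supplies the source citations included in responses.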
Document Upload
Upload File
```bash
# Upload a PDF document
curl -X POST http://localhost:8000/api/rag/documents/upload \
  -F "file=@knowledge_base.pdf"

# Upload a text file
curl -X POST http://localhost:8000/api/rag/documents/upload \
  -F "file=@documentation.txt"

# Upload markdown
curl -X POST http://localhost:8000/api/rag/documents/upload \
  -F "file=@readme.md"
```

Upload Raw Content
```bash
# Add document by content
curl -X POST http://localhost:8000/api/rag/documents \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Your document content here...",
    "metadata": {
      "title": "Company Policies",
      "source": "HR Department"
    }
  }'
```

Python API
```python
import requests

# Upload a file
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/rag/documents/upload",
        files={"file": f},
    )
doc_id = response.json()["id"]
print(f"Document uploaded: {doc_id}")

# Add content directly
response = requests.post(
    "http://localhost:8000/api/rag/documents",
    json={
        "content": "ZSE is an ultra memory-efficient LLM inference engine...",
        "metadata": {"title": "ZSE Overview"},
    },
)
```

List Documents
```bash
# List all documents
curl http://localhost:8000/api/rag/documents

# Response:
# {
#   "documents": [
#     {"id": "abc123", "title": "Company Policies", "chunks": 15},
#     {"id": "def456", "title": "Product Manual", "chunks": 42}
#   ]
# }
```

Delete Document
```bash
# Delete a document
curl -X DELETE http://localhost:8000/api/rag/documents/abc123
```

Searching Documents
Semantic Search
```bash
# Search for relevant content
curl -X POST http://localhost:8000/api/rag/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the return policy?",
    "top_k": 5
  }'

# Response:
# {
#   "results": [
#     {
#       "content": "Returns are accepted within 30 days...",
#       "score": 0.89,
#       "document_id": "abc123",
#       "metadata": {"title": "Company Policies", "chunk": 3}
#     },
#     ...
#   ]
# }
```

Get Context for Chat
```bash
# Get context formatted for chat injection
curl -X POST http://localhost:8000/api/rag/context \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I configure the server?",
    "top_k": 3,
    "include_sources": true
  }'

# Response:
# {
#   "context": "Based on the documentation:\n\n1. Edit config.yaml...",
#   "sources": [
#     {"title": "Configuration Guide", "relevance": 0.92},
#     {"title": "Quick Start", "relevance": 0.78}
#   ]
# }
```

Chat Integration
The RAG module integrates seamlessly with the chat API to automatically inject relevant context:
RAG-Enhanced Chat
```python
import requests

# Chat with RAG context
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "qwen-7b",
        "messages": [
            {"role": "user", "content": "What's our refund policy?"}
        ],
        "rag": {
            "enabled": True,
            "top_k": 3
        }
    },
)

result = response.json()
print(result["choices"][0]["message"]["content"])
# The response will be grounded in your uploaded documents
```

Playground Integration
The ZSE playground at /chat includes RAG controls in the sidebar. Upload documents and toggle RAG to see context-aware responses.
For best results, upload documents that are specific to your use case: retrieval can only surface context that exists in your indexed corpus, so the relevance of what you upload directly determines the quality of the injected context.
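If you prefer not to use the built-in `rag` flag, you can achieve the same effect by calling `/api/rag/context` yourself and prepending the result to a chat request. A minimal sketch of the message assembly; the system-prompt wording here is an assumption, not the server's actual template:

```python
def build_rag_messages(question: str, context: str) -> list[dict]:
    """Wrap retrieved context in a system message ahead of the user turn."""
    system = (
        "Answer using only the following context. "
        "Cite sources where possible.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list drops straight into the `messages` field of a `/v1/chat/completions` request.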
API Reference
| Endpoint | Method | Description |
|---|---|---|
| /api/rag/documents | POST | Add document by content |
| /api/rag/documents/upload | POST | Upload document file |
| /api/rag/documents | GET | List all documents |
| /api/rag/documents/{id} | DELETE | Delete document |
| /api/rag/search | POST | Search documents |
| /api/rag/context | POST | Get context for chat |