# Local RAG Setup

Minimal RAG implementation with LangChain, FAISS, and support for either Ollama or OpenAI (API key required).
## Dependencies

- `langchain` – Core framework
- `langchain-community` – Loaders, vector stores
- `langchain-ollama` – Ollama integration
- `langchain-openai` – OpenAI integration
- `langchain-text-splitters` – Text splitting
- `langchain-huggingface` – HuggingFace embeddings
- `faiss-cpu` – Vector search
- `sentence-transformers` – Embeddings
- `pypdf` – PDF loading
- `fastapi` – Web server
- `uvicorn` – ASGI server
## Installation

```bash
conda create -n local_rag python=3.10 -y
conda activate local_rag
pip install -r requirements.txt
```
## Setup

### Ollama (optional)

```bash
ollama serve
ollama pull mistral
```
### OpenAI (optional)

Set the API key when using OpenAI:

```bash
export OPENAI_API_KEY="your-key"
```
## Add Documents

Option 1: Add documents from a folder via the script. Edit `DATA_ROOT` in `add_pdfs.py` to point at your folder, then run:

```bash
python add_pdfs.py
```

The script clears the existing vector store and indexes all supported documents recursively (`.pdf`, `.txt`, `.md`).
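The recursive discovery step can be sketched as follows. This is a minimal illustration, not the actual `add_pdfs.py` code; the helper name `find_documents` is an assumption:

```python
from pathlib import Path

# Extensions the script indexes, per the description above.
SUPPORTED = {".pdf", ".txt", ".md"}

def find_documents(data_root: str) -> list[str]:
    """Recursively collect all indexable files under data_root."""
    return sorted(
        str(p) for p in Path(data_root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

The resulting paths would then be handed to `LocalRAG.add_documents` after the existing store is cleared.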
Option 2: Use `local_rag.py` programmatically:

```python
from local_rag import LocalRAG

rag = LocalRAG()
rag.add_documents(["path/to/doc1.pdf", "path/to/doc2.txt"])
```
## Chat GUI

Start the server:

```bash
uvicorn server:app --reload
```

Open http://localhost:8000. The chat UI provides:
- Provider switch – Toggle between Ollama and OpenAI without a restart (OpenAI requires `OPENAI_API_KEY`)
- Conversation history – Multi-turn chat with context
- Markdown – Assistant replies rendered as markdown (headings, code, lists, links)
Ensure the vector store is populated and at least one provider (Ollama or OpenAI) is configured.
## API

- `POST /api/chat` – `{ "message": "...", "history": [...], "llm_provider": "ollama"|"openai" }`
- `GET /api/providers` – `{ "ollama": true, "openai": true|false }`
- `GET /api/health` – Health and vector store status
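A minimal client-side sketch of the `/api/chat` request. The field names come from the endpoint description above; the role/content shape of the history entries is an assumption, not confirmed by the API docs:

```python
import json
import urllib.request

# Request body for POST /api/chat; the history entry format
# ({"role": ..., "content": ...}) is assumed, not documented.
payload = {
    "message": "What does the indexed PDF say about FAISS?",
    "history": [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello! Ask me about your documents."},
    ],
    "llm_provider": "ollama",  # or "openai"
}

req = urllib.request.Request(
    "http://localhost:8000/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request once the server is up.
```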
## How it works

1. Load documents – PDFs or text via `PyPDFLoader` / `TextLoader`
2. Chunk – `RecursiveCharacterTextSplitter` (2000 chars, 400 overlap)
3. Embed – `sentence-transformers/all-MiniLM-L6-v2`
4. Store – FAISS vector store (similarity search with scores)
5. Query – Retrieve chunks, optionally rephrase the question with conversation history, generate the answer with the selected LLM
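The chunking step above can be illustrated with a simplified fixed-window splitter. The real pipeline uses `RecursiveCharacterTextSplitter`, which additionally prefers paragraph and sentence boundaries; this sketch only shows the 2000/400 size arithmetic:

```python
def split_text(text: str, chunk_size: int = 2000, overlap: int = 400) -> list[str]:
    """Fixed-window approximation of the chunking step: each chunk
    starts (chunk_size - overlap) characters after the previous one,
    so consecutive chunks share `overlap` characters."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 5000-character document yields three chunks, and the last 400 characters of each chunk repeat as the first 400 of the next, which is what lets retrieval survive facts that straddle a chunk boundary.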