Local RAG Setup

A minimal retrieval-augmented generation (RAG) implementation built on LangChain and FAISS, with support for either Ollama (local) or OpenAI (API key required) as the LLM backend.

Dependencies

  • langchain - Core framework
  • langchain-community - Document loaders and vector stores
  • langchain-ollama - Ollama integration
  • langchain-openai - OpenAI integration
  • langchain-text-splitters - Text splitting
  • langchain-huggingface - HuggingFace embeddings
  • faiss-cpu - Vector search
  • sentence-transformers - Embeddings
  • pypdf - PDF loading
  • fastapi - Web server
  • uvicorn - ASGI server

Installation

conda create -n local_rag python=3.10 -y
conda activate local_rag
pip install -r requirements.txt
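
Based on the dependency list above, requirements.txt would contain roughly the following (shown unpinned; add version pins as needed for reproducibility):

```text
langchain
langchain-community
langchain-ollama
langchain-openai
langchain-text-splitters
langchain-huggingface
faiss-cpu
sentence-transformers
pypdf
fastapi
uvicorn
```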

Setup

Ollama (optional)

ollama serve
ollama pull mistral

OpenAI (optional)

Set the API key when using OpenAI:

export OPENAI_API_KEY="your-key"
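
The server can detect OpenAI availability from this variable; a minimal sketch of such a check (the function name is illustrative, not the repo's actual code):

```python
import os

def openai_available() -> bool:
    # OpenAI is usable only when a non-empty key is present in the environment
    return bool(os.environ.get("OPENAI_API_KEY"))
```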

Add Documents

Option 1: Add PDFs from a folder via script. Edit DATA_ROOT in add_pdfs.py to point at your folder, then run:

python add_pdfs.py

The script clears the existing vector store and recursively indexes all supported documents under the folder. Supported extensions: .pdf, .txt, .md.
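
The recursive discovery step can be sketched like this (extensions taken from the list above; the helper name is hypothetical, not the script's actual code):

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md"}

def find_documents(root: str) -> list[Path]:
    """Recursively collect every supported file under root, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```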

Option 2: Use local_rag.py programmatically:

from local_rag import LocalRAG

rag = LocalRAG()
# Loads, chunks, embeds, and indexes the given files into the vector store
rag.add_documents(["path/to/doc1.pdf", "path/to/doc2.txt"])

Chat GUI

Start the server:

uvicorn server:app --reload

Open http://localhost:8000. The chat UI provides:

  • Provider switch - toggle between Ollama and OpenAI without restarting the server (OpenAI requires OPENAI_API_KEY)
  • Conversation history - multi-turn chat with context carried across turns
  • Markdown rendering - assistant replies are rendered as markdown (headings, code, lists, links)

Ensure the vector store is populated and at least one provider (Ollama or OpenAI) is configured.

API

  • POST /api/chat - body: { "message": "...", "history": [...], "llm_provider": "ollama"|"openai" }
  • GET /api/providers - returns { "ollama": true, "openai": true|false }
  • GET /api/health - health and vector store status
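
A chat request can be built and sent with the standard library alone; the payload fields below come from the endpoint description above (the helper function and its defaults are illustrative):

```python
import json
import urllib.request

def chat_request(message: str, history: list, provider: str = "ollama") -> urllib.request.Request:
    """Build a POST /api/chat request for the local server."""
    if provider not in ("ollama", "openai"):
        raise ValueError("llm_provider must be 'ollama' or 'openai'")
    payload = {"message": message, "history": history, "llm_provider": provider}
    return urllib.request.Request(
        "http://localhost:8000/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, send it with:
#   urllib.request.urlopen(chat_request("What does chapter 2 cover?", []))
```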

How it works

  1. Load documents - PDFs or text files via PyPDFLoader / TextLoader
  2. Chunk - RecursiveCharacterTextSplitter (2000-character chunks, 400-character overlap)
  3. Embed - sentence-transformers/all-MiniLM-L6-v2
  4. Store - FAISS vector store (similarity search with scores)
  5. Query - retrieve relevant chunks, optionally rephrase the question with conversation history, and generate an answer with the selected LLM
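
The chunking step uses LangChain's RecursiveCharacterTextSplitter, which prefers splitting on paragraph and sentence boundaries before falling back to raw characters. The size/overlap arithmetic alone can be illustrated with a plain sliding window (a simplification of what the real splitter does, not the project's actual code):

```python
def sliding_chunks(text: str, size: int = 2000, overlap: int = 400) -> list[str]:
    """Character-window chunking: each chunk shares `overlap` chars with the next."""
    step = size - overlap  # 1600-character stride between chunk starts
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, a 5000-character document yields three chunks starting at offsets 0, 1600, and 3200, each overlapping its neighbor by 400 characters so that context is not cut off mid-sentence at chunk boundaries.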