Local RAG Setup

Minimal RAG implementation with LangChain, Ollama, and FAISS.

Dependencies

  • langchain - Core framework
  • langchain-community - Community integrations (loaders, vectorstores)
  • langchain-ollama - Ollama integration
  • langchain-text-splitters - Text splitting utilities
  • langchain-huggingface - HuggingFace embeddings
  • faiss-cpu - Vector search
  • sentence-transformers - Embeddings
  • pypdf - PDF loading
  • fastapi - Web server and API
  • uvicorn - ASGI server
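If your checkout is missing requirements.txt, one matching the list above would contain the following (unpinned here; the actual file may pin versions):

```text
langchain
langchain-community
langchain-ollama
langchain-text-splitters
langchain-huggingface
faiss-cpu
sentence-transformers
pypdf
fastapi
uvicorn
```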

Installation

# Create conda environment
conda create -n local_rag python=3.10 -y
conda activate local_rag

# Install dependencies
pip install -r requirements.txt

Set up Ollama

# Make sure Ollama is running
ollama serve

# Pull a model (in another terminal)
ollama pull llama2
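Before wiring Ollama into the pipeline, it can help to confirm the server is actually reachable. The stdlib-only check below queries Ollama's standard /api/tags endpoint on its default port 11434 (these defaults come from Ollama itself, not from this repo):

```python
import json
import urllib.request
import urllib.error

OLLAMA_URL = "http://localhost:11434/api/tags"  # Ollama's default tags endpoint

def ollama_models(url=OLLAMA_URL):
    """Return the list of locally pulled model names, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return None

models = ollama_models()
if models is None:
    print("Ollama is not reachable -- run `ollama serve` first.")
else:
    print("Available models:", models)
```

If the list comes back empty, `ollama pull llama2` has not completed yet.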

Usage

Edit local_rag.py and uncomment the example code:

# Add documents
rag.add_documents([
    "path/to/document1.pdf",
    "path/to/document2.txt"
])

# Query
question = "What is this document about?"
answer = rag.query(question)
print(f"Answer: {answer}")

Run:

python local_rag.py

Chat GUI (FastAPI)

A simple web chat interface is included. Start the server:

uvicorn server:app --reload

Then open http://localhost:8000 in your browser. The chat view uses the same RAG pipeline: each message is answered by retrieving from the vector store and generating with Ollama. Before chatting, make sure the vector store is populated (e.g. run the document-add steps in local_rag.py once) and that Ollama is running.

How it works

  1. Load documents - PDFs or text files
  2. Split into chunks - 1000 characters with a 200-character overlap
  3. Create embeddings - using sentence-transformers
  4. Store in FAISS - for fast similarity search
  5. Query - retrieve the most relevant chunks and generate an answer with Ollama
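The steps above can be sketched end to end without LangChain. The toy below swaps sentence-transformers for a hash-based bag-of-words embedding and FAISS for a plain in-memory list, so the mechanics run on the standard library alone; names like TinyRAG are illustrative, not from local_rag.py:

```python
import hashlib
import math

def split_text(text, chunk_size=1000, overlap=200):
    """Step 2: split text into overlapping fixed-size chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text, dim=256):
    """Step 3 stand-in: deterministic bag-of-words hashing, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class TinyRAG:
    """Mirrors the five steps with an in-memory list instead of FAISS."""

    def __init__(self):
        self.store = []  # (chunk, vector) pairs

    def add_documents(self, texts):
        # Step 1 is assumed done: documents arrive as plain strings.
        for text in texts:
            for chunk in split_text(text):
                self.store.append((chunk, embed(chunk)))  # step 4

    def query(self, question, k=2):
        # Step 5: rank chunks by cosine similarity (dot product of unit
        # vectors); the real pipeline then feeds the top chunks plus the
        # question to Ollama to generate the answer.
        qv = embed(question)
        ranked = sorted(self.store,
                        key=lambda pair: sum(a * b for a, b in zip(qv, pair[1])),
                        reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

The overlap in step 2 means the tail of each chunk reappears at the head of the next, so a sentence straddling a chunk boundary is still retrievable as one piece.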