Local RAG Setup

Minimal RAG implementation with LangChain, Ollama, and FAISS.

Dependencies

  • langchain - Core framework
  • langchain-community - Community integrations (loaders, vectorstores)
  • langchain-ollama - Ollama integration
  • langchain-text-splitters - Text splitting utilities
  • langchain-huggingface - HuggingFace embeddings
  • faiss-cpu - Vector search
  • sentence-transformers - Embeddings
  • pypdf - PDF loading
  • fastapi - Web server and API
  • uvicorn - ASGI server
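If your checkout is missing requirements.txt, one matching the list above would contain the following (unpinned here; the actual file may pin versions):

```text
langchain
langchain-community
langchain-ollama
langchain-text-splitters
langchain-huggingface
faiss-cpu
sentence-transformers
pypdf
fastapi
uvicorn
```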

Installation

# Create conda environment
conda create -n local_rag python=3.10 -y
conda activate local_rag

# Install dependencies
pip install -r requirements.txt

Set up Ollama

# Make sure Ollama is running
ollama serve

# Pull a model (in another terminal)
ollama pull llama2
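Before wiring Ollama into the pipeline, it can help to confirm the server is actually reachable. The stdlib-only check below queries Ollama's standard /api/tags endpoint on its default port 11434 (these defaults come from Ollama itself, not from this repo):

```python
import json
import urllib.request
import urllib.error

OLLAMA_URL = "http://localhost:11434/api/tags"  # Ollama's default tags endpoint

def ollama_models(url=OLLAMA_URL):
    """Return the list of locally pulled model names, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return None

models = ollama_models()
if models is None:
    print("Ollama is not reachable -- run `ollama serve` first.")
else:
    print("Available models:", models)
```

If the list comes back empty, `ollama pull llama2` has not completed yet.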

Usage

Edit local_rag.py and uncomment the example code:

# Add documents
rag.add_documents([
    "path/to/document1.pdf",
    "path/to/document2.txt"
])

# Query
question = "What is this document about?"
answer = rag.query(question)
print(f"Answer: {answer}")

Run:

python local_rag.py

Chat GUI (FastAPI)

A simple web chat interface is included. Start the server:

uvicorn server:app --reload

Then open http://localhost:8000 in your browser. The chat view uses the same RAG pipeline: each message is answered by retrieving from the vector store and generating with Ollama. Before chatting, make sure the vector store is populated (e.g. run the document-add steps in local_rag.py once) and that Ollama is running.

How it works

  1. Load documents - PDFs or text files
  2. Split into chunks - 1000 characters with a 200-character overlap
  3. Create embeddings - using sentence-transformers
  4. Store in FAISS - for fast similarity search
  5. Query - retrieve the most relevant chunks and generate an answer with Ollama
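The steps above can be sketched end to end without LangChain. The toy below swaps sentence-transformers for a hash-based bag-of-words embedding and FAISS for a plain in-memory list, so the mechanics run on the standard library alone; names like TinyRAG are illustrative, not from local_rag.py:

```python
import hashlib
import math

def split_text(text, chunk_size=1000, overlap=200):
    """Step 2: split text into overlapping fixed-size chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text, dim=256):
    """Step 3 stand-in: deterministic bag-of-words hashing, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class TinyRAG:
    """Mirrors the five steps with an in-memory list instead of FAISS."""

    def __init__(self):
        self.store = []  # (chunk, vector) pairs

    def add_documents(self, texts):
        # Step 1 is assumed done: documents arrive as plain strings.
        for text in texts:
            for chunk in split_text(text):
                self.store.append((chunk, embed(chunk)))  # step 4

    def query(self, question, k=2):
        # Step 5: rank chunks by cosine similarity (dot product of unit
        # vectors); the real pipeline then feeds the top chunks plus the
        # question to Ollama to generate the answer.
        qv = embed(question)
        ranked = sorted(self.store,
                        key=lambda pair: sum(a * b for a, b in zip(qv, pair[1])),
                        reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

The overlap in step 2 means the tail of each chunk reappears at the head of the next, so a sentence straddling a chunk boundary is still retrievable as one piece.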