
# Local RAG Setup

Minimal RAG implementation with LangChain, Ollama, and FAISS.

## Dependencies

- `langchain` - Core framework
- `langchain-community` - Community integrations (loaders, vector stores)
- `langchain-ollama` - Ollama integration
- `langchain-text-splitters` - Text splitting utilities
- `langchain-huggingface` - HuggingFace embeddings
- `faiss-cpu` - Vector search
- `sentence-transformers` - Embeddings
- `pypdf` - PDF loading
- `fastapi` - Web server and API
- `uvicorn` - ASGI server

## Installation

```bash
# Create conda environment
conda create -n local_rag python=3.10 -y
conda activate local_rag

# Install dependencies
pip install -r requirements.txt
```

## Set up Ollama

```bash
# Make sure Ollama is running
ollama serve

# Pull a model (in another terminal)
ollama pull llama2
```

## Usage

Edit `local_rag.py` and uncomment the example code:

```python
# Add documents
rag.add_documents([
    "path/to/document1.pdf",
    "path/to/document2.txt"
])

# Query
question = "What is this document about?"
answer = rag.query(question)
print(f"Answer: {answer}")
```

Run:

```bash
python local_rag.py
```

## Chat GUI (FastAPI)

A simple web chat interface is included. Start the server:

```bash
uvicorn server:app --reload
```

Then open http://localhost:8000 in your browser. The chat view uses the same RAG system: your messages are answered using the vector store and Ollama. Make sure the vector store is populated (e.g., by running the document-add steps in `local_rag.py` once) and that Ollama is running.

## How it works

1. **Load documents** - PDFs or text files
2. **Split into chunks** - 1000 characters with 200-character overlap
3. **Create embeddings** - using sentence-transformers
4. **Store in FAISS** - fast similarity search
5. **Query** - retrieve relevant chunks and generate an answer with Ollama
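
The chunking in step 2 can be illustrated with a small pure-Python sketch (LangChain's `RecursiveCharacterTextSplitter` is smarter in practice, preferring paragraph and sentence boundaries, but the core sliding-window idea is this):

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, so context is not lost at chunk edges."""
    step = chunk_size - overlap  # advance 800 characters per chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("x" * 2500)
# 2500 characters with step 800 -> 4 chunks starting at 0, 800, 1600, 2400
```

Each chunk is then embedded with sentence-transformers and indexed in FAISS; at query time the question is embedded the same way and the nearest chunks are passed to Ollama as context.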

That's it! Simple and minimal.