# Local RAG Setup

Minimal RAG implementation with LangChain, Ollama, and FAISS.

## Dependencies

- `langchain` - Core framework
- `langchain-community` - Community integrations (loaders, vector stores)
- `langchain-ollama` - Ollama integration
- `langchain-text-splitters` - Text splitting utilities
- `langchain-huggingface` - HuggingFace embeddings
- `faiss-cpu` - Vector search
- `sentence-transformers` - Embeddings
- `pypdf` - PDF loading
- `fastapi` - Web server and API
- `uvicorn` - ASGI server

## Installation

```bash
# Create conda environment
conda create -n local_rag python=3.10 -y
conda activate local_rag

# Install dependencies
pip install -r requirements.txt
```

## Set up Ollama

```bash
# Start the Ollama server
ollama serve

# Pull a model (in another terminal)
ollama pull llama2
```

## Usage

Edit `local_rag.py` and uncomment the example code:

```python
# Add documents
rag.add_documents([
    "path/to/document1.pdf",
    "path/to/document2.txt"
])

# Query
question = "What is this document about?"
answer = rag.query(question)
print(f"Answer: {answer}")
```

Run:

```bash
python local_rag.py
```

## Chat GUI (FastAPI)

A simple web chat interface is included. Start the server:

```bash
uvicorn server:app --reload
```

Then open [http://localhost:8000](http://localhost:8000) in your browser.

The chat view uses the same RAG system: your messages are answered using the vector store and Ollama. Ensure the vector store is populated (e.g. by running the document-add steps in `local_rag.py` once) and that Ollama is running.

## How it works

1. **Load documents** - PDFs or text files
2. **Split into chunks** - 1000 characters with 200-character overlap
3. **Create embeddings** - Using sentence-transformers
4. **Store in FAISS** - Fast similarity search
5. **Query** - Retrieve relevant chunks and generate an answer with Ollama
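The chunking in step 2 can be sketched in plain Python. This is a simplified stand-in for LangChain's text splitters, and the `split_into_chunks` helper name is hypothetical, but it shows what "1000 chars with 200 overlap" means in practice:

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size chunks, where each chunk repeats the
    last `overlap` characters of the previous one (so context spanning
    a chunk boundary is not lost)."""
    step = chunk_size - overlap  # advance 800 chars per chunk by default
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```

The overlap means neighbouring chunks share 200 characters, which helps retrieval when a relevant sentence straddles a chunk boundary.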