# Local RAG Setup
A minimal Retrieval-Augmented Generation (RAG) implementation built with LangChain, Ollama, and FAISS.
## Dependencies
- `langchain` - Core framework
- `langchain-community` - Community integrations (loaders, vectorstores)
- `langchain-ollama` - Ollama integration
- `langchain-text-splitters` - Text splitting utilities
- `langchain-huggingface` - HuggingFace embeddings
- `faiss-cpu` - Vector search
- `sentence-transformers` - Embeddings
- `pypdf` - PDF loading
- `fastapi` - Web server and API
- `uvicorn` - ASGI server
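For reference, a `requirements.txt` covering the list above might look like the following (unpinned here; pin exact versions for reproducibility):
```
langchain
langchain-community
langchain-ollama
langchain-text-splitters
langchain-huggingface
faiss-cpu
sentence-transformers
pypdf
fastapi
uvicorn
```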
## Installation
```bash
# Create conda environment
conda create -n local_rag python=3.10 -y
conda activate local_rag
# Install dependencies
pip install -r requirements.txt
```
## Setup Ollama
```bash
# Make sure Ollama is running
ollama serve
# Pull a model (in another terminal)
ollama pull llama2
```
## Usage
Edit `local_rag.py` and uncomment the example code:
```python
# Add documents
rag.add_documents([
    "path/to/document1.pdf",
    "path/to/document2.txt",
])
# Query
question = "What is this document about?"
answer = rag.query(question)
print(f"Answer: {answer}")
```
Run:
```bash
python local_rag.py
```
## Chat GUI (FastAPI)
A simple web chat interface is included. Start the server:
```bash
uvicorn server:app --reload
```
Then open [http://localhost:8000](http://localhost:8000) in your browser. The chat view uses the same RAG system: your messages are answered using the vector store and Ollama. Ensure your vector store is populated (e.g. by running the document-add steps in `local_rag.py` once) and that Ollama is running.
## How it works
1. **Load documents** - PDFs or text files
2. **Split into chunks** - 1000-character chunks with a 200-character overlap
3. **Create embeddings** - Using sentence-transformers
4. **Store in FAISS** - Fast similarity search
5. **Query** - Retrieve relevant chunks and generate answer with Ollama
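The pipeline above can be sketched in plain Python. This toy version uses a bag-of-words "embedding" and brute-force cosine similarity in place of sentence-transformers and FAISS, just to illustrate the chunk-then-retrieve flow:
```python
import math
import re
from collections import Counter

def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks with overlap (the 1000/200 settings above)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; the real setup uses sentence-transformers."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query; FAISS does this at scale."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```
In the real system, the retrieved chunks are then passed to Ollama as context for answer generation.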