# Local RAG Setup
A minimal Retrieval-Augmented Generation (RAG) implementation built with LangChain, Ollama, and FAISS.
## Dependencies
- `langchain` - Core framework
- `langchain-community` - Community integrations (loaders, vectorstores)
- `langchain-ollama` - Ollama integration
- `langchain-text-splitters` - Text splitting utilities
- `langchain-huggingface` - HuggingFace embeddings
- `faiss-cpu` - Vector search
- `sentence-transformers` - Embeddings
- `pypdf` - PDF loading
- `fastapi` - Web server and API
- `uvicorn` - ASGI server
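For reference, a `requirements.txt` covering the list above might look like the following (unpinned here; pin exact versions for reproducibility):
```
langchain
langchain-community
langchain-ollama
langchain-text-splitters
langchain-huggingface
faiss-cpu
sentence-transformers
pypdf
fastapi
uvicorn
```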
## Installation
```bash
# Create conda environment
conda create -n local_rag python=3.10 -y
conda activate local_rag
# Install dependencies
pip install -r requirements.txt
```
## Setup Ollama
```bash
# Make sure Ollama is running
ollama serve
# Pull a model (in another terminal)
ollama pull llama2
```
## Usage
Edit `local_rag.py` and uncomment the example code:
```python
# Add documents
rag.add_documents([
    "path/to/document1.pdf",
    "path/to/document2.txt",
])
# Query
question = "What is this document about?"
answer = rag.query(question)
print(f"Answer: {answer}")
```
Run:
```bash
python local_rag.py
```
## Chat GUI (FastAPI)
A simple web chat interface is included. Start the server:
```bash
uvicorn server:app --reload
```
Then open [http://localhost:8000](http://localhost:8000) in your browser. The chat view uses the same RAG system: your messages are answered using the vector store and Ollama. Ensure your vector store is populated (e.g. by running the document-add steps in `local_rag.py` once) and that Ollama is running.
## How it works
1. **Load documents** - PDFs or text files
2. **Split into chunks** - 1000-character chunks with a 200-character overlap
3. **Create embeddings** - Using sentence-transformers
4. **Store in FAISS** - Fast similarity search
5. **Query** - Retrieve relevant chunks and generate answer with Ollama
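The pipeline above can be sketched in plain Python. This toy version uses a bag-of-words "embedding" and brute-force cosine similarity in place of sentence-transformers and FAISS, just to illustrate the chunk-then-retrieve flow:
```python
import math
import re
from collections import Counter

def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks with overlap (the 1000/200 settings above)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; the real setup uses sentence-transformers."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query; FAISS does this at scale."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```
In the real system, the retrieved chunks are then passed to Ollama as context for answer generation.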