# Local RAG Setup
Minimal RAG implementation with LangChain, FAISS, and either **Ollama** (local) or **OpenAI** (API key). A web chat UI is included.

---
## What you need (before you start)
- **Python 3.10 or newer** ([python.org](https://www.python.org/downloads/))
- **Git** (optional, only if you clone the project)
- Either:
  - **Ollama** installed and running ([ollama.com](https://ollama.com)), with at least one model pulled, **or**
  - an **OpenAI API key** (if you use OpenAI in the chat)
---
## Install dependencies (step by step)
Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or [Anaconda](https://www.anaconda.com/) if you do not have Conda yet.
All commands below assume your terminal is open **in the project folder** (the folder that contains `requirements.txt`).
```bash
conda create -n local_rag python=3.10 -y
conda activate local_rag
pip install --upgrade pip
pip install -r requirements.txt
```
Use `conda activate local_rag` in every new terminal session before running `python` or `uvicorn` for this project.
### OpenAI (only if you use the OpenAI provider in the chat)
In the same terminal **before** starting the server:
```bash
export OPENAI_API_KEY="your-key-here"
```
On Windows (Command Prompt): `set OPENAI_API_KEY=your-key-here`
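A quick sanity check that the key is visible to the current shell (a standalone snippet, not part of the project; the server reads `OPENAI_API_KEY` from the environment at startup):

```python
import os

def openai_available(env=os.environ):
    """True when a server started from this shell would see an OPENAI_API_KEY."""
    return bool(env.get("OPENAI_API_KEY"))

print("OpenAI provider available:", openai_available())
```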

---
## Run Ollama (only if you use Ollama)
In a **separate** terminal:
```bash
ollama serve
```
In another terminal, pull a model once (example):
```bash
ollama pull gpt-oss:20b
```
The model name must match what you configure in `server.py` (see [Configuration reference](#configuration-reference)).
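To confirm Ollama is reachable and see which models you have pulled, you can query its REST API (`/api/tags` lists local models). This standalone snippet is just a convenience check:

```python
import json
import urllib.error
import urllib.request

def list_ollama_models(base_url="http://localhost:11434"):
    """Return the names of pulled models, or None if Ollama is not reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return None

models = list_ollama_models()
print(models if models is not None else "Ollama not reachable on localhost:11434")
```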

---
## Build the vector store from a folder of PDFs
The project includes [add_pdfs.py](add_pdfs.py). It finds every **`.pdf`** file under a folder you choose (including subfolders), then chunks, embeds, and saves to FAISS.

**Two modes** (set in the script):

| Setting | Behavior |
|---------|----------|
| `CLEAR_VECTORSTORE_FIRST = True` | Deletes the existing vector store folder, then builds a **new** index from the PDFs under `DATA_ROOT`. Use this for a full rebuild. |
| `CLEAR_VECTORSTORE_FIRST = False` | Keeps the current index (if it exists) and **merges** chunks from the PDFs under `DATA_ROOT` into it. Use this to add another batch of PDFs without wiping what you already indexed. |

**Steps:**
1. Open [add_pdfs.py](add_pdfs.py) in a text editor.
2. Set **`DATA_ROOT`** to the folder that contains your PDFs (absolute path or path relative to how you run the script).
3. Set **`CLEAR_VECTORSTORE_FIRST`** to `True` (fresh index) or `False` (append to existing store).
4. Optionally set **`VECTORSTORE_PATH`** (default: `./vectorstore`). It must match **`VECTORSTORE_PATH`** in [server.py](server.py) so the chat loads the same index.
5. From the project folder, with `conda activate local_rag` (or your chosen env name):
   ```bash
   python add_pdfs.py
   ```
Indexing can take a long time for many large PDFs. When it finishes, you should see `Vector store saved to ...`.

**Note:** This script only indexes **PDF** files. To add `.txt` or `.md` files, use the Python snippet below or call `add_documents` yourself.
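The discovery step boils down to a recursive scan for `.pdf` files. A minimal sketch of that logic (illustrative only; the function name is not from the script):

```python
from pathlib import Path

def find_pdfs(root):
    """Recursively collect every .pdf under root, matching the extension
    case-insensitively so files like REPORT.PDF are not skipped."""
    return sorted(p for p in Path(root).rglob("*") if p.suffix.lower() == ".pdf")
```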

---
## Add more documents later (alternative to add_pdfs)
You can also merge files by hand with a short script (any mix of supported types):
```python
from local_rag import LocalRAG

rag = LocalRAG(vectorstore_path="./vectorstore")  # same path as server.py
rag.add_documents([
    "path/to/new1.pdf",
    "path/to/notes.txt",
])
```
`add_documents` merges new chunks into the existing FAISS store and saves it again; this is the same behavior as [add_pdfs.py](add_pdfs.py) with `CLEAR_VECTORSTORE_FIRST = False`.

---
## Swap or experiment with different vector stores
The vector index is stored on disk under the folder given by **`VECTORSTORE_PATH`** (default `./vectorstore`). That folder contains files such as `index.faiss` and `index.pkl`.

**To use a different index:**
1. Set **`VECTORSTORE_PATH`** in both [server.py](server.py) and any script you use to build the index (e.g. [add_pdfs.py](add_pdfs.py)) to the **same** path, e.g. `./vectorstore_experiment`.
2. Rebuild the index (run `add_pdfs.py` or `add_documents`) so that folder is created.
3. **Restart** the web server so it loads the new path at startup.

**Tips:**
- Keep multiple copies of the folder (e.g. `vectorstore_backup`, `vectorstore_papers_only`) and swap `VECTORSTORE_PATH` to switch between them.
- If you change **chunk size**, **embedding model**, or **FAISS** usage in code, treat the old index as incompatible: use a new `VECTORSTORE_PATH` or delete the old folder and rebuild.
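For example, to snapshot the current index before an experiment (a sketch using the default folder name; run it from the project folder):

```python
import shutil
from pathlib import Path

# Copy the on-disk index (index.faiss, index.pkl, ...) to a backup folder.
src, dst = Path("vectorstore"), Path("vectorstore_backup")
if src.is_dir():
    shutil.copytree(src, dst, dirs_exist_ok=True)
    print("Backed up to", dst)
else:
    print("No vector store found at", src.resolve())
```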

---
## Run the chat web app
With the Conda environment activated (`conda activate local_rag`) and (if needed) `OPENAI_API_KEY` set:
```bash
uvicorn server:app --reload
```
Open [http://127.0.0.1:8000](http://127.0.0.1:8000) or [http://localhost:8000](http://localhost:8000).
- Use the **LLM provider** dropdown: **Ollama** or **OpenAI** (OpenAI only works if the server was started with a valid `OPENAI_API_KEY`).
- You need a **non-empty vector store** (see above) for answers to work.
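Once the server is up, `GET /api/health` gives a quick machine-readable check (a standalone snippet; the exact response fields are defined in `server.py`):

```python
import json
import urllib.error
import urllib.request

def server_health(base_url="http://127.0.0.1:8000"):
    """Return the /api/health JSON, or None if the server is not running."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=2) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None

print(server_health() or "server not running on 127.0.0.1:8000")
```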

---
## API (short reference)
| Endpoint | Purpose |
|----------|---------|
| `POST /api/chat` | Body: `message`, optional `history`, optional `llm_provider` (`ollama` or `openai`) |
| `GET /api/providers` | Which providers are available (`openai` false if no API key at startup) |
| `GET /api/health` | Server status and whether a vector store is loaded |
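As a sketch, `POST /api/chat` can be called with the body fields from the table above (the reply is returned as parsed JSON, since the response schema is whatever `server.py` defines):

```python
import json
import urllib.error
import urllib.request

def ask(message, history=None, llm_provider="ollama",
        base_url="http://127.0.0.1:8000"):
    """Send one chat turn; returns the server's JSON reply, or None if the
    server is not running."""
    body = json.dumps({
        "message": message,
        "history": history or [],      # shape of history entries: see server.py
        "llm_provider": llm_provider,  # "ollama" or "openai"
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None
```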

---
## How it works (high level)
1. **Load documents:** PDFs via `PyPDFLoader`, text via `TextLoader`.
2. **Chunk:** `RecursiveCharacterTextSplitter` (defaults in [local_rag.py](local_rag.py)).
3. **Embed:** Hugging Face `sentence-transformers/all-MiniLM-L6-v2`.
4. **Store:** FAISS; retrieval uses `similarity_search_with_score`.
5. **Query:** optional rephrase using chat history, then retrieval, then an answer from the LLM.
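What chunk size and overlap mean can be shown with a deliberately simplified splitter (the real `RecursiveCharacterTextSplitter` also prefers paragraph and sentence boundaries; the 500/50 values here are examples, not the project's defaults):

```python
def chunk(text, size=500, overlap=50):
    """Naive fixed-width chunking: each chunk starts size - overlap characters
    after the previous one, so neighbouring chunks share overlap characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

print([len(p) for p in chunk("x" * 1200)])  # → [500, 500, 300]
```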

---
## Configuration reference (what to edit)
These are the main places to change behavior without restructuring the app.
### [server.py](server.py)
| What | Where |
|------|--------|
| Ollama model name | `OLLAMA_MODEL = "..."` |
| OpenAI model name | `OPENAI_MODEL = "..."` |
| Where the FAISS index is loaded from | `VECTORSTORE_PATH = "./vectorstore"` (must match your indexing script) |
### [local_rag.py](local_rag.py) `LocalRAG.__init__`
| What | Where (approx.) |
|------|------------------|
| Default vector store folder | Parameter `vectorstore_path="./vectorstore"` |
| Embedding model | `HuggingFaceEmbeddings(model_name="sentence-transformers/...")` |
| Chunk size and overlap | Module-level `CHUNK_SIZE` and `CHUNK_OVERLAP` (used by `RecursiveCharacterTextSplitter` when adding documents) |
| Default Ollama / OpenAI model strings | Parameters `ollama_model`, `openai_model`, `ollama_base_url` |
Changing the embedding model or chunk settings requires **rebuilding** the vector store (old index is not compatible).
### [local_rag.py](local_rag.py) `query_with_history`
| What | Where |
|------|--------|
| Default number of chunks retrieved (`k`) | Module-level `RETRIEVAL_K` (overrides: pass `k=` to `query` / `query_with_history`) |
| Extra text appended only to the **FAISS query** (biases retrieval, not the final answer phrasing) | `QUERY_ADDITIONAL_INSTRUCTIONS` (concatenated to the search query before embedding) |
| **Rephrase** prompt (standalone question when there is chat history) | String `rephrase_prompt = f"""..."""` inside `query_with_history` |
| **Answer** prompt opening instructions only | Module-level `ANSWER_PROMPT` (edit the role / style lines). The block from chat history through `Answer:` is built in `query_with_history` |
### [add_pdfs.py](add_pdfs.py)
| What | Where |
|------|--------|
| Folder to scan for PDFs | `DATA_ROOT = Path("...")` |
| Output vector store folder | `VECTORSTORE_PATH = "./vectorstore"` (keep in sync with `server.py`) |
| Wipe index vs merge | `CLEAR_VECTORSTORE_FIRST = True` (delete and rebuild) or `False` (append to existing index) |
---
## Dependencies (for developers)
See [requirements.txt](requirements.txt) for the full list (LangChain, FAISS, sentence-transformers, FastAPI, uvicorn, etc.).