# Local RAG Setup

Minimal RAG implementation with LangChain, FAISS, and either **Ollama** (local) or **OpenAI** (API key). A web chat UI is included.

---

## What you need (before you start)

- **Python 3.10 or newer** ([python.org](https://www.python.org/downloads/))
- **Conda** (Miniconda or Anaconda) to create the environment (see the install steps below)
- **Git** (optional, only if you clone the project)
- Either:
  - **Ollama** installed and running ([ollama.com](https://ollama.com)), with at least one model pulled, **or**
  - an **OpenAI API key** (if you use the OpenAI provider in the chat)

---
## Install dependencies (step by step)

Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or [Anaconda](https://www.anaconda.com/) if you do not have Conda yet.

All commands below assume your terminal is open **in the project folder** (the folder that contains `requirements.txt`).

```bash
conda create -n local_rag python=3.10 -y
conda activate local_rag
pip install --upgrade pip
pip install -r requirements.txt
```

Run `conda activate local_rag` in every new terminal session before running `python` or `uvicorn` for this project.

### OpenAI (only if you use the OpenAI provider in the chat)

In the same terminal, **before** starting the server:

```bash
export OPENAI_API_KEY="your-key-here"
```

On Windows (Command Prompt): `set OPENAI_API_KEY=your-key-here`
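Because the server only sees variables exported in the shell that launches it, a quick check before starting can save a confusing "provider unavailable" error later. A minimal sketch:

```python
import os

def openai_key_present() -> bool:
    """True if OPENAI_API_KEY is set to a non-empty value in this process."""
    return bool(os.environ.get("OPENAI_API_KEY"))

if __name__ == "__main__":
    print("OPENAI_API_KEY set:", openai_key_present())
```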
---

## Run Ollama (only if you use Ollama)

In a **separate** terminal:

```bash
ollama serve
```

In another terminal, pull a model once (example):

```bash
ollama pull gpt-oss:20b
```

The model name must match what you configure in `server.py` (see [Configuration reference](#configuration-reference-what-to-edit)).
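To confirm the configured model was actually pulled, you can ask Ollama's HTTP API (`GET /api/tags` lists pulled models) before starting the chat server. A sketch, assuming the default Ollama port:

```python
import json
from urllib.request import urlopen

def model_names(tags_json: dict) -> list[str]:
    """Extract model names from the /api/tags response shape."""
    return [m["name"] for m in tags_json.get("models", [])]

def pulled_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models it has pulled."""
    with urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))

# With a running server: assert "gpt-oss:20b" in pulled_models()
```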
---

## Build the vector store from a folder of PDFs

The project includes [add_pdfs.py](add_pdfs.py). It finds every **`.pdf`** file under a folder you choose (including subfolders), then chunks, embeds, and saves the result to FAISS.

**Two modes** (set in the script):

| Setting | Behavior |
|---------|----------|
| `CLEAR_VECTORSTORE_FIRST = True` | Deletes the existing vector store folder, then builds a **new** index from the PDFs under `DATA_ROOT`. Use this for a full rebuild. |
| `CLEAR_VECTORSTORE_FIRST = False` | Keeps the current index (if it exists) and **merges** chunks from the PDFs under `DATA_ROOT` into it. Use this to add another batch of PDFs without wiping what you already indexed. |

**Steps:**

1. Open [add_pdfs.py](add_pdfs.py) in a text editor.
2. Set **`DATA_ROOT`** to the folder that contains your PDFs (absolute path, or a path relative to where you run the script).
3. Set **`CLEAR_VECTORSTORE_FIRST`** to `True` (fresh index) or `False` (append to the existing store).
4. Optionally set **`VECTORSTORE_PATH`** (default: `./vectorstore`). It must match **`VECTORSTORE_PATH`** in [server.py](server.py) so the chat loads the same index.
5. From the project folder, with `conda activate local_rag` (or your chosen env name):

```bash
python add_pdfs.py
```

Indexing can take a long time for many large PDFs. When it finishes, you should see `Vector store saved to ...`.

**Note:** This script only indexes **PDF** files. To add `.txt` or `.md` files, use the Python snippet below or call `add_documents` yourself.

---
## Add more documents later (alternative to add_pdfs)

You can also merge files by hand with a short script (any mix of supported types):

```python
from local_rag import LocalRAG

rag = LocalRAG(vectorstore_path="./vectorstore")  # same path as server.py
rag.add_documents([
    "path/to/new1.pdf",
    "path/to/notes.txt",
])
```

`add_documents` merges new chunks into the existing FAISS store and saves it again, the same behavior as [add_pdfs.py](add_pdfs.py) with `CLEAR_VECTORSTORE_FIRST = False`.
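If you want to index everything under a folder rather than listing files one by one, a small helper can collect the supported file types first. A sketch; the suffix set is an assumption (PDFs and plain text per the loaders described later), and the commented `add_documents` call uses the `LocalRAG` object from the snippet above:

```python
from pathlib import Path

# Assumption: these are the file types the project's loaders can handle.
SUPPORTED = {".pdf", ".txt", ".md"}

def collect_documents(root: str) -> list[str]:
    """Recursively collect supported files under `root`, sorted for a stable order."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )

# Then: rag.add_documents(collect_documents("./my_docs"))
```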
---

## Swap or experiment with different vector stores

The vector index is stored on disk under the folder given by **`VECTORSTORE_PATH`** (default `./vectorstore`). That folder contains files such as `index.faiss` and `index.pkl`.

**To use a different index:**

1. Set **`VECTORSTORE_PATH`** in both [server.py](server.py) and any script you use to build the index (e.g. [add_pdfs.py](add_pdfs.py)) to the **same** path, e.g. `./vectorstore_experiment`.
2. Rebuild the index (run `add_pdfs.py` or `add_documents`) so that folder is created.
3. **Restart** the web server so it loads the new path at startup.

**Tips:**

- Keep multiple copies of the folder (e.g. `vectorstore_backup`, `vectorstore_papers_only`) and swap `VECTORSTORE_PATH` to switch between them.
- If you change the **chunk size**, **embedding model**, or how **FAISS** is used in code, treat the old index as incompatible: use a new `VECTORSTORE_PATH`, or delete the old folder and rebuild.
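Backing up an index before an experiment is just a directory copy. A minimal sketch (the folder names are examples):

```python
import shutil
from pathlib import Path

def backup_vectorstore(src: str = "./vectorstore",
                       dst: str = "./vectorstore_backup") -> None:
    """Copy the on-disk FAISS folder; refuses to overwrite an existing backup."""
    if not Path(src).is_dir():
        raise FileNotFoundError(f"No vector store at {src}")
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
```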
---

## Run the chat web app

With the Conda environment activated (`conda activate local_rag`) and, if needed, `OPENAI_API_KEY` set:

```bash
uvicorn server:app --reload
```

Open [http://127.0.0.1:8000](http://127.0.0.1:8000) or [http://localhost:8000](http://localhost:8000).

- Use the **LLM provider** dropdown to choose **Ollama** or **OpenAI** (OpenAI only works if the server was started with a valid `OPENAI_API_KEY`).
- You need a **non-empty vector store** (see above) for answers to work.

---
## API (short reference)

| Endpoint | Purpose |
|----------|---------|
| `POST /api/chat` | Body: `message`, optional `history`, optional `llm_provider` (`ollama` or `openai`) |
| `GET /api/providers` | Which providers are available (`openai` is `false` if no API key was set at startup) |
| `GET /api/health` | Server status and whether a vector store is loaded |
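For scripting against a running server, a `POST /api/chat` call looks roughly like this. A sketch based only on the body fields in the table above; the response shape depends on `server.py`:

```python
import json
from urllib.request import Request, urlopen

def build_chat_payload(message: str, history=None, llm_provider="ollama") -> dict:
    """Assemble a request body with the fields /api/chat expects."""
    payload = {"message": message, "llm_provider": llm_provider}
    if history:
        payload["history"] = history
    return payload

def send_chat(message: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """POST to /api/chat on a running server and return the parsed JSON reply."""
    body = json.dumps(build_chat_payload(message)).encode()
    req = Request(f"{base_url}/api/chat", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```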
---

## How it works (high level)

1. **Load documents** – PDFs via `PyPDFLoader`, text via `TextLoader`.
2. **Chunk** – `RecursiveCharacterTextSplitter` (defaults in [local_rag.py](local_rag.py)).
3. **Embed** – Hugging Face `sentence-transformers/all-MiniLM-L6-v2`.
4. **Store** – FAISS; retrieval uses `similarity_search_with_score`.
5. **Query** – Optional rephrase using chat history, then retrieval, then an answer from the LLM.
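Steps 3–4 boil down to "embed everything, then return the chunks nearest the query". A toy illustration with made-up 3-dimensional vectors (the real pipeline uses 384-dimensional MiniLM embeddings and a FAISS index, not this linear scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings: chunk text -> vector.
chunks = {
    "FAISS stores vectors":       [0.9, 0.1, 0.0],
    "Ollama runs local models":   [0.1, 0.9, 0.1],
    "Conda manages environments": [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    """Return the k chunk texts most similar to the query vector."""
    scored = sorted(chunks.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]
```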
---

## Configuration reference (what to edit)

These are the main places to change behavior without restructuring the app.

### [server.py](server.py)

| What | Where |
|------|-------|
| Ollama model name | `OLLAMA_MODEL = "..."` |
| OpenAI model name | `OPENAI_MODEL = "..."` |
| Where the FAISS index is loaded from | `VECTORSTORE_PATH = "./vectorstore"` (must match your indexing script) |

### [local_rag.py](local_rag.py) – `LocalRAG.__init__`

| What | Where (approx.) |
|------|-----------------|
| Default vector store folder | Parameter `vectorstore_path="./vectorstore"` |
| Embedding model | `HuggingFaceEmbeddings(model_name="sentence-transformers/...")` |
| Chunk size and overlap | Module-level `CHUNK_SIZE` and `CHUNK_OVERLAP` (used by `RecursiveCharacterTextSplitter` when adding documents) |
| Default Ollama / OpenAI model strings | Parameters `ollama_model`, `openai_model`, `ollama_base_url` |

Changing the embedding model or the chunk settings requires **rebuilding** the vector store (the old index is not compatible).
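To see what `CHUNK_SIZE` and `CHUNK_OVERLAP` control, here is a deliberately simplified character-window chunker. The real `RecursiveCharacterTextSplitter` also prefers to break on separators such as paragraphs and sentences; this sketch only shows the size/overlap arithmetic:

```python
def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` chars, each starting `size - overlap`
    characters after the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# chunk("abcdefghij", size=4, overlap=2) -> ["abcd", "cdef", "efgh", "ghij", "ij"]
```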
### [local_rag.py](local_rag.py) – `query_with_history`

| What | Where |
|------|-------|
| Default number of chunks retrieved (`k`) | Module-level `RETRIEVAL_K` (to override, pass `k=` to `query` / `query_with_history`) |
| Extra text appended only to the **FAISS query** (biases retrieval, not the final answer phrasing) | `QUERY_ADDITIONAL_INSTRUCTIONS` (concatenated to the search query before embedding) |
| **Rephrase** prompt (standalone question when there is chat history) | String `rephrase_prompt = f"""..."""` inside `query_with_history` |
| **Answer** prompt – opening instructions only | Module-level `ANSWER_PROMPT` (edit the role / style lines). The block from chat history through `Answer:` is built in `query_with_history` |

### [add_pdfs.py](add_pdfs.py)

| What | Where |
|------|-------|
| Folder to scan for PDFs | `DATA_ROOT = Path("...")` |
| Output vector store folder | `VECTORSTORE_PATH = "./vectorstore"` (keep in sync with `server.py`) |
| Wipe index vs. merge | `CLEAR_VECTORSTORE_FIRST = True` (delete and rebuild) or `False` (append to the existing index) |
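The wipe-vs-merge switch amounts to one conditional before indexing starts. A sketch of the likely shape (names taken from the table above; the actual script may differ):

```python
import shutil
from pathlib import Path

def prepare_vectorstore(path: str, clear_first: bool) -> None:
    """Delete the index folder for a full rebuild; leave it alone to merge."""
    if clear_first and Path(path).is_dir():
        shutil.rmtree(path)

# e.g. prepare_vectorstore("./vectorstore", clear_first=True)
```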
---

## Dependencies (for developers)

See [requirements.txt](requirements.txt) for the full list (LangChain, FAISS, sentence-transformers, FastAPI, uvicorn, etc.).