# Local RAG Setup

Minimal RAG implementation with LangChain, FAISS, and either **Ollama** (local) or **OpenAI** (API key). A web chat UI is included.

---

## What you need (before you start)

- **Python 3.10 or newer** ([python.org](https://www.python.org/downloads/))
- **Git** (optional, only if you clone the project)
- Either:
  - **Ollama** installed and running ([ollama.com](https://ollama.com)), with at least one model pulled, **or**
  - An **OpenAI API key** (if you use OpenAI in the chat)

---

## Install dependencies (step by step)

Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html) or [Anaconda](https://www.anaconda.com/) if you do not have Conda yet. All commands below assume your terminal is open **in the project folder** (the folder that contains `requirements.txt`).

```bash
conda create -n local_rag python=3.10 -y
conda activate local_rag
pip install --upgrade pip
pip install -r requirements.txt
```

Run `conda activate local_rag` in every new terminal session before running `python` or `uvicorn` for this project.

### OpenAI (only if you use the OpenAI provider in the chat)

In the same terminal, **before** starting the server:

```bash
export OPENAI_API_KEY="your-key-here"
```

On Windows (Command Prompt): `set OPENAI_API_KEY=your-key-here`

---

## Run Ollama (only if you use Ollama)

In a **separate** terminal:

```bash
ollama serve
```

In another terminal, pull a model once (example):

```bash
ollama pull gpt-oss:20b
```

The model name must match what you configure in `server.py` (see [Configuration reference](#configuration-reference-what-to-edit)).

---

## Build the vector store from a folder of PDFs

The project includes [add_pdfs.py](add_pdfs.py). It finds every **`.pdf`** file under a folder you choose (including subfolders), then chunks, embeds, and saves the result to FAISS.
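The discovery step of that script is essentially a recursive glob. Here is a minimal sketch — the helper name `find_pdfs` is illustrative, not part of the project:

```python
from pathlib import Path

def find_pdfs(root):
    """Recursively collect every .pdf under `root` (case-insensitive suffix),
    sorted so the indexing order is stable between runs."""
    return sorted(str(p) for p in Path(root).rglob("*") if p.suffix.lower() == ".pdf")

# Feeding the result into the store mirrors add_pdfs.py with
# CLEAR_VECTORSTORE_FIRST = False:
#   from local_rag import LocalRAG
#   rag = LocalRAG(vectorstore_path="./vectorstore")
#   rag.add_documents(find_pdfs("./my_pdfs"))
```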
**Two modes** (set in the script):

| Setting | Behavior |
|---------|----------|
| `CLEAR_VECTORSTORE_FIRST = True` | Deletes the existing vector store folder, then builds a **new** index from the PDFs under `DATA_ROOT`. Use this for a full rebuild. |
| `CLEAR_VECTORSTORE_FIRST = False` | Keeps the current index (if it exists) and **merges** chunks from the PDFs under `DATA_ROOT` into it. Use this to add another batch of PDFs without wiping what you already indexed. |

**Steps:**

1. Open [add_pdfs.py](add_pdfs.py) in a text editor.
2. Set **`DATA_ROOT`** to the folder that contains your PDFs (absolute path or path relative to how you run the script).
3. Set **`CLEAR_VECTORSTORE_FIRST`** to `True` (fresh index) or `False` (append to existing store).
4. Optionally set **`VECTORSTORE_PATH`** (default: `./vectorstore`). It must match **`VECTORSTORE_PATH`** in [server.py](server.py) so the chat loads the same index.
5. From the project folder, with `conda activate local_rag` (or your chosen env name):

   ```bash
   python add_pdfs.py
   ```

Indexing can take a long time for many large PDFs. When it finishes, you should see `Vector store saved to ...`.

**Note:** This script only indexes **PDF** files. To add `.txt` or `.md` files, use the Python snippet below or call `add_documents` yourself.

---

## Add more documents later (alternative to add_pdfs)

You can also merge files by hand with a short script (any mix of supported types):

```python
from local_rag import LocalRAG

rag = LocalRAG(vectorstore_path="./vectorstore")  # same path as server.py
rag.add_documents([
    "path/to/new1.pdf",
    "path/to/notes.txt",
])
```

`add_documents` merges new chunks into the existing FAISS store and saves it again—the same behavior as [add_pdfs.py](add_pdfs.py) with `CLEAR_VECTORSTORE_FIRST = False`.

---

## Swap or experiment with different vector stores

The vector index is stored on disk under the folder given by **`VECTORSTORE_PATH`** (default `./vectorstore`).
That folder contains files such as `index.faiss` and `index.pkl`.

**To use a different index:**

1. Set **`VECTORSTORE_PATH`** in both [server.py](server.py) and any script you use to build the index (e.g. [add_pdfs.py](add_pdfs.py)) to the **same** path, e.g. `./vectorstore_experiment`.
2. Rebuild the index (run `add_pdfs.py` or `add_documents`) so that folder is created.
3. **Restart** the web server so it loads the new path at startup.

**Tips:**

- Keep multiple copies of the folder (e.g. `vectorstore_backup`, `vectorstore_papers_only`) and swap `VECTORSTORE_PATH` to switch between them.
- If you change the **chunk size**, **embedding model**, or **FAISS** usage in code, treat the old index as incompatible: use a new `VECTORSTORE_PATH`, or delete the old folder and rebuild.

---

## Run the chat web app

With the Conda environment activated (`conda activate local_rag`) and, if needed, `OPENAI_API_KEY` set:

```bash
uvicorn server:app --reload
```

Open [http://127.0.0.1:8000](http://127.0.0.1:8000) or [http://localhost:8000](http://localhost:8000).

- Use the **LLM provider** dropdown: **Ollama** or **OpenAI** (OpenAI only works if the server was started with a valid `OPENAI_API_KEY`).
- You need a **non-empty vector store** (see above) for answers to work.

---

## API (short reference)

| Endpoint | Purpose |
|----------|---------|
| `POST /api/chat` | Body: `message`, optional `history`, optional `llm_provider` (`ollama` or `openai`) |
| `GET /api/providers` | Which providers are available (`openai` is `false` if no API key was set at startup) |
| `GET /api/health` | Server status and whether a vector store is loaded |

---

## How it works (high level)

1. **Load documents** – PDFs via `PyPDFLoader`, text via `TextLoader`.
2. **Chunk** – `RecursiveCharacterTextSplitter` (defaults in [local_rag.py](local_rag.py)).
3. **Embed** – Hugging Face `sentence-transformers/all-MiniLM-L6-v2`.
4. **Store** – FAISS; retrieval uses `similarity_search_with_score`.
5.
   **Query** – Optional rephrase with chat history, retrieval, then answer from the LLM.

---

## Configuration reference (what to edit)

These are the main places to change behavior without restructuring the app.

### [server.py](server.py)

| What | Where |
|------|-------|
| Ollama model name | `OLLAMA_MODEL = "..."` |
| OpenAI model name | `OPENAI_MODEL = "..."` |
| Where the FAISS index is loaded from | `VECTORSTORE_PATH = "./vectorstore"` (must match your indexing script) |

### [local_rag.py](local_rag.py) – `LocalRAG.__init__`

| What | Where (approx.) |
|------|-----------------|
| Default vector store folder | Parameter `vectorstore_path="./vectorstore"` |
| Embedding model | `HuggingFaceEmbeddings(model_name="sentence-transformers/...")` |
| Chunk size and overlap | Module-level `CHUNK_SIZE` and `CHUNK_OVERLAP` (used by `RecursiveCharacterTextSplitter` when adding documents) |
| Default Ollama / OpenAI model strings | Parameters `ollama_model`, `openai_model`, `ollama_base_url` |

Changing the embedding model or chunk settings requires **rebuilding** the vector store (the old index is not compatible).

### [local_rag.py](local_rag.py) – `query_with_history`

| What | Where |
|------|-------|
| Default number of chunks retrieved (`k`) | Module-level `RETRIEVAL_K` (override by passing `k=` to `query` / `query_with_history`) |
| Extra text appended only to the **FAISS query** (biases retrieval, not the final answer phrasing) | `QUERY_ADDITIONAL_INSTRUCTIONS` (concatenated to the search query before embedding) |
| **Rephrase** prompt (standalone question when there is chat history) | String `rephrase_prompt = f"""..."""` inside `query_with_history` |
| **Answer** prompt – opening instructions only | Module-level `ANSWER_PROMPT` (edit the role / style lines). The block from chat history through `Answer:` is built in `query_with_history` |

### [add_pdfs.py](add_pdfs.py)

| What | Where |
|------|-------|
| Folder to scan for PDFs | `DATA_ROOT = Path("...")` |
| Output vector store folder | `VECTORSTORE_PATH = "./vectorstore"` (keep in sync with `server.py`) |
| Wipe index vs. merge | `CLEAR_VECTORSTORE_FIRST = True` (delete and rebuild) or `False` (append to existing index) |

---

## Dependencies (for developers)

See [requirements.txt](requirements.txt) for the full list (LangChain, FAISS, sentence-transformers, FastAPI, uvicorn, etc.).
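As a quick smoke test of the API endpoints, the server can be exercised from Python using only the standard library. This is a sketch assuming the default `uvicorn` host and port; the request field names come from the API table above, the `history` default of `[]` is an assumption, and the exact response shape depends on `server.py`:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default uvicorn host/port

def build_chat_payload(message, history=None, provider="ollama"):
    """Build the JSON body for POST /api/chat (field names from the API table)."""
    return json.dumps({
        "message": message,
        "history": history or [],   # assumed to be a list; check server.py
        "llm_provider": provider,   # "ollama" or "openai"
    }).encode("utf-8")

def post_chat(message, history=None, provider="ollama"):
    """Send a question to the running server and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=build_chat_payload(message, history, provider),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the server running (see "Run the chat web app"):
#   post_chat("What do my documents say about chunking?")
#   urllib.request.urlopen(f"{BASE_URL}/api/health")  # quick liveness check
```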