# Local RAG Setup
Minimal RAG implementation with LangChain, FAISS, and either Ollama (local) or OpenAI (API key). A web chat UI is included.
## What you need (before you start)

- Python 3.10 or newer (python.org)
- Git (optional, only if you clone the project)
- Either:
  - Ollama installed and running (ollama.com), with at least one model pulled, or
  - an OpenAI API key (if you use OpenAI in the chat)
## Install dependencies (step by step)

Install Miniconda or Anaconda if you do not have Conda yet.
All commands below assume your terminal is open in the project folder (the folder that contains `requirements.txt`).

```bash
conda create -n local_rag python=3.10 -y
conda activate local_rag
pip install --upgrade pip
pip install -r requirements.txt
```

Use `conda activate local_rag` in every new terminal session before running `python` or `uvicorn` for this project.
## OpenAI (only if you use the OpenAI provider in the chat)

In the same terminal, before starting the server:

```bash
export OPENAI_API_KEY="your-key-here"
```

On Windows (Command Prompt): `set OPENAI_API_KEY=your-key-here`
## Run Ollama (only if you use Ollama)

In a separate terminal:

```bash
ollama serve
```

In another terminal, pull a model once (example):

```bash
ollama pull gpt-oss:20b
```

The model name must match what you configure in `server.py` (see Configuration reference).
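To confirm the Ollama server is reachable and see which models are already pulled, you can query its `/api/tags` endpoint. This helper is a small sketch, not part of the project; the function names are illustrative:

```python
import json
import urllib.request


def pulled_model_names(tags_response: dict) -> list[str]:
    """Extract model names from the JSON that Ollama's /api/tags returns."""
    return [m["name"] for m in tags_response.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Fetch the locally pulled models from a running Ollama server."""
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        return pulled_model_names(json.loads(resp.read()))
```

If the model configured in `server.py` (e.g. `gpt-oss:20b`) is missing from the returned list, pull it first with `ollama pull`.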
## Build the vector store from a folder of PDFs

The project includes `add_pdfs.py`. It finds every `.pdf` file under a folder you choose (including subfolders), then chunks, embeds, and saves to FAISS.
Two modes (set in the script):

| Setting | Behavior |
|---|---|
| `CLEAR_VECTORSTORE_FIRST = True` | Deletes the existing vector store folder, then builds a new index from the PDFs under `DATA_ROOT`. Use this for a full rebuild. |
| `CLEAR_VECTORSTORE_FIRST = False` | Keeps the current index (if it exists) and merges chunks from the PDFs under `DATA_ROOT` into it. Use this to add another batch of PDFs without wiping what you already indexed. |
Steps:

1. Open `add_pdfs.py` in a text editor.
2. Set `DATA_ROOT` to the folder that contains your PDFs (an absolute path, or a path relative to where you run the script).
3. Set `CLEAR_VECTORSTORE_FIRST` to `True` (fresh index) or `False` (append to the existing store).
4. Optionally set `VECTORSTORE_PATH` (default: `./vectorstore`). It must match `VECTORSTORE_PATH` in `server.py` so the chat loads the same index.
5. From the project folder, with the environment active (`conda activate local_rag`), run:

   ```bash
   python add_pdfs.py
   ```
Indexing can take a long time for many large PDFs. When it finishes, you should see `Vector store saved to ...`.

Note: This script only indexes PDF files. To add `.txt` or `.md` files, use the Python snippet below or call `add_documents` yourself.
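The discovery step amounts to a recursive glob over the chosen folder. A minimal sketch of that behavior (the function name is illustrative, not the script's actual helper):

```python
from pathlib import Path


def find_pdfs(root: str) -> list[Path]:
    """Recursively collect every .pdf file under root, including subfolders."""
    return sorted(p for p in Path(root).rglob("*.pdf") if p.is_file())
```

Anything that is not a `.pdf` (e.g. `.txt`, `.md`) is skipped, which is why those files need the `add_documents` route instead.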
## Add more documents later (alternative to add_pdfs.py)
You can also merge files by hand with a short script (any mix of supported types):
```python
from local_rag import LocalRAG

rag = LocalRAG(vectorstore_path="./vectorstore")  # same path as server.py
rag.add_documents([
    "path/to/new1.pdf",
    "path/to/notes.txt",
])
```
`add_documents` merges new chunks into the existing FAISS store and saves it again: the same behavior as `add_pdfs.py` with `CLEAR_VECTORSTORE_FIRST = False`.
## Swap or experiment with different vector stores

The vector index is stored on disk under the folder given by `VECTORSTORE_PATH` (default `./vectorstore`). That folder contains files such as `index.faiss` and `index.pkl`.
To use a different index:

- Set `VECTORSTORE_PATH` in both `server.py` and any script you use to build the index (e.g. `add_pdfs.py`) to the same path, e.g. `./vectorstore_experiment`.
- Rebuild the index (run `add_pdfs.py` or `add_documents`) so that folder is created.
- Restart the web server so it loads the new path at startup.
Tips:

- Keep multiple copies of the folder (e.g. `vectorstore_backup`, `vectorstore_papers_only`) and swap `VECTORSTORE_PATH` to switch between them.
- If you change chunk size, embedding model, or FAISS usage in code, treat the old index as incompatible: use a new `VECTORSTORE_PATH` or delete the old folder and rebuild.
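Because the whole index lives in plain files on disk, a backup is just a folder copy. A hedged sketch (the helper name is made up for illustration):

```python
import shutil
from pathlib import Path


def snapshot_vectorstore(src: str = "./vectorstore",
                         dst: str = "./vectorstore_backup") -> None:
    """Copy the on-disk FAISS index folder (index.faiss, index.pkl, ...) to dst."""
    if not Path(src).is_dir():
        raise FileNotFoundError(f"no vector store at {src}")
    shutil.copytree(src, dst, dirs_exist_ok=True)
```

To switch to a snapshot later, point `VECTORSTORE_PATH` at the copied folder and restart the server.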
## Run the chat web app

With the Conda environment activated (`conda activate local_rag`) and, if needed, `OPENAI_API_KEY` set:

```bash
uvicorn server:app --reload
```

Open http://127.0.0.1:8000 or http://localhost:8000.
- Use the LLM provider dropdown to choose Ollama or OpenAI (OpenAI only works if the server was started with a valid `OPENAI_API_KEY`).
- You need a non-empty vector store (see above) for answers to work.
## API (short reference)

| Endpoint | Purpose |
|---|---|
| `POST /api/chat` | Body: `message`, optional `history`, optional `llm_provider` (`ollama` or `openai`) |
| `GET /api/providers` | Lists which providers are available (`openai` is `false` if no API key was set at startup) |
| `GET /api/health` | Reports server status and whether a vector store is loaded |
## How it works (high level)

- Load documents – PDFs via `PyPDFLoader`, text via `TextLoader`.
- Chunk – `RecursiveCharacterTextSplitter` (defaults in `local_rag.py`).
- Embed – Hugging Face `sentence-transformers/all-MiniLM-L6-v2`.
- Store – FAISS; retrieval uses `similarity_search_with_score`.
- Query – Optional rephrase with chat history, retrieval, then answer from the LLM.
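The chunking step can be illustrated with a simplified fixed-window splitter. Note this is only a stand-in: the real code uses LangChain's `RecursiveCharacterTextSplitter`, which prefers to split at separators (paragraphs, then sentences) rather than at fixed character offsets:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-window chunker: each chunk overlaps the previous by `overlap` chars."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)] if text else []
```

Overlap matters because a fact that straddles a chunk boundary would otherwise be split across two chunks and retrieved only partially.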
## Configuration reference (what to edit)

These are the main places to change behavior without restructuring the app.

### `server.py`
| What | Where |
|---|---|
| Ollama model name | `OLLAMA_MODEL = "..."` |
| OpenAI model name | `OPENAI_MODEL = "..."` |
| Where the FAISS index is loaded from | `VECTORSTORE_PATH = "./vectorstore"` (must match your indexing script) |
### `local_rag.py` – `LocalRAG.__init__`

| What | Where (approx.) |
|---|---|
| Default vector store folder | Parameter `vectorstore_path="./vectorstore"` |
| Embedding model | `HuggingFaceEmbeddings(model_name="sentence-transformers/...")` |
| Chunk size and overlap | Module-level `CHUNK_SIZE` and `CHUNK_OVERLAP` (used by `RecursiveCharacterTextSplitter` when adding documents) |
| Default Ollama / OpenAI model strings | Parameters `ollama_model`, `openai_model`, `ollama_base_url` |
Changing the embedding model or chunk settings requires rebuilding the vector store (old index is not compatible).
### `local_rag.py` – `query_with_history`

| What | Where |
|---|---|
| Default number of chunks retrieved (`k`) | Module-level `RETRIEVAL_K` (override by passing `k=` to `query` / `query_with_history`) |
| Extra text appended only to the FAISS query (biases retrieval, not the final answer phrasing) | `QUERY_ADDITIONAL_INSTRUCTIONS` (concatenated to the search query before embedding) |
| Rephrase prompt (standalone question when there is chat history) | String `rephrase_prompt = f"""..."""` inside `query_with_history` |
| Answer prompt (opening instructions only) | Module-level `ANSWER_PROMPT` (edit the role / style lines). The block from chat history through `Answer:` is built in `query_with_history` |
### `add_pdfs.py`

| What | Where |
|---|---|
| Folder to scan for PDFs | `DATA_ROOT = Path("...")` |
| Output vector store folder | `VECTORSTORE_PATH = "./vectorstore"` (keep in sync with `server.py`) |
| Wipe index vs. merge | `CLEAR_VECTORSTORE_FIRST = True` (delete and rebuild) or `False` (append to existing index) |
## Dependencies (for developers)

See `requirements.txt` for the full list (LangChain, FAISS, sentence-transformers, FastAPI, uvicorn, etc.).