# Local RAG Setup
Minimal RAG implementation with LangChain, FAISS, and either Ollama (local) or OpenAI (API key). A web chat UI is included.
## What you need (before you start)

- Python 3.10 or newer (python.org)
- Git (optional, only if you clone the project)
- Either:
  - Ollama installed and running (ollama.com), with at least one model pulled, or
  - an OpenAI API key (if you use OpenAI in the chat)
## Install dependencies (step by step)

Install Miniconda or Anaconda if you do not have Conda yet.
All commands below assume your terminal is open in the project folder (the folder that contains `requirements.txt`).

```bash
conda create -n local_rag python=3.10 -y
conda activate local_rag
pip install --upgrade pip
pip install -r requirements.txt
```

Use `conda activate local_rag` in every new terminal session before running `python` or `uvicorn` for this project.
## OpenAI (only if you use the OpenAI provider in the chat)

In the same terminal, before starting the server:

```bash
export OPENAI_API_KEY="your-key-here"
```

On Windows (Command Prompt): `set OPENAI_API_KEY=your-key-here`
## Run Ollama (only if you use Ollama)

In a separate terminal:

```bash
ollama serve
```

In another terminal, pull a model once (example):

```bash
ollama pull gpt-oss:20b
```

The model name must match what you configure in `server.py` (see Configuration reference).
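To confirm the Ollama server is reachable and see which models are already pulled, you can query its `/api/tags` endpoint. This helper is a small sketch, not part of the project; the function names are illustrative:

```python
import json
import urllib.request


def pulled_model_names(tags_response: dict) -> list[str]:
    """Extract model names from the JSON that Ollama's /api/tags returns."""
    return [m["name"] for m in tags_response.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Fetch the locally pulled models from a running Ollama server."""
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        return pulled_model_names(json.loads(resp.read()))
```

If the model configured in `server.py` (e.g. `gpt-oss:20b`) is missing from the returned list, pull it first with `ollama pull`.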
## Build the vector store from a folder of PDFs

The project includes `add_pdfs.py`. It finds every `.pdf` file under a folder you choose (including subfolders), then chunks, embeds, and saves to FAISS.
Two modes (set in the script):

| Setting | Behavior |
|---|---|
| `CLEAR_VECTORSTORE_FIRST = True` | Deletes the existing vector store folder, then builds a new index from the PDFs under `DATA_ROOT`. Use this for a full rebuild. |
| `CLEAR_VECTORSTORE_FIRST = False` | Keeps the current index (if it exists) and merges chunks from the PDFs under `DATA_ROOT` into it. Use this to add another batch of PDFs without wiping what you already indexed. |
Steps:

1. Open `add_pdfs.py` in a text editor.
2. Set `DATA_ROOT` to the folder that contains your PDFs (an absolute path, or a path relative to where you run the script).
3. Set `CLEAR_VECTORSTORE_FIRST` to `True` (fresh index) or `False` (append to the existing store).
4. Optionally set `VECTORSTORE_PATH` (default: `./vectorstore`). It must match `VECTORSTORE_PATH` in `server.py` so the chat loads the same index.
5. From the project folder, with the environment active (`conda activate local_rag`), run:

   ```bash
   python add_pdfs.py
   ```
Indexing can take a long time for many large PDFs. When it finishes, you should see `Vector store saved to ...`.

Note: This script only indexes PDF files. To add `.txt` or `.md` files, use the Python snippet below or call `add_documents` yourself.
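The discovery step amounts to a recursive glob over the chosen folder. A minimal sketch of that behavior (the function name is illustrative, not the script's actual helper):

```python
from pathlib import Path


def find_pdfs(root: str) -> list[Path]:
    """Recursively collect every .pdf file under root, including subfolders."""
    return sorted(p for p in Path(root).rglob("*.pdf") if p.is_file())
```

Anything that is not a `.pdf` (e.g. `.txt`, `.md`) is skipped, which is why those files need the `add_documents` route instead.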
## Add more documents later (alternative to add_pdfs.py)
You can also merge files by hand with a short script (any mix of supported types):
```python
from local_rag import LocalRAG

rag = LocalRAG(vectorstore_path="./vectorstore")  # same path as server.py
rag.add_documents([
    "path/to/new1.pdf",
    "path/to/notes.txt",
])
```
`add_documents` merges new chunks into the existing FAISS store and saves it again: the same behavior as `add_pdfs.py` with `CLEAR_VECTORSTORE_FIRST = False`.
## Swap or experiment with different vector stores

The vector index is stored on disk under the folder given by `VECTORSTORE_PATH` (default `./vectorstore`). That folder contains files such as `index.faiss` and `index.pkl`.
To use a different index:

- Set `VECTORSTORE_PATH` in both `server.py` and any script you use to build the index (e.g. `add_pdfs.py`) to the same path, e.g. `./vectorstore_experiment`.
- Rebuild the index (run `add_pdfs.py` or `add_documents`) so that folder is created.
- Restart the web server so it loads the new path at startup.
Tips:

- Keep multiple copies of the folder (e.g. `vectorstore_backup`, `vectorstore_papers_only`) and swap `VECTORSTORE_PATH` to switch between them.
- If you change chunk size, embedding model, or FAISS usage in code, treat the old index as incompatible: use a new `VECTORSTORE_PATH` or delete the old folder and rebuild.
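Because the whole index lives in plain files on disk, a backup is just a folder copy. A hedged sketch (the helper name is made up for illustration):

```python
import shutil
from pathlib import Path


def snapshot_vectorstore(src: str = "./vectorstore",
                         dst: str = "./vectorstore_backup") -> None:
    """Copy the on-disk FAISS index folder (index.faiss, index.pkl, ...) to dst."""
    if not Path(src).is_dir():
        raise FileNotFoundError(f"no vector store at {src}")
    shutil.copytree(src, dst, dirs_exist_ok=True)
```

To switch to a snapshot later, point `VECTORSTORE_PATH` at the copied folder and restart the server.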
## Run the chat web app

With the Conda environment activated (`conda activate local_rag`) and, if needed, `OPENAI_API_KEY` set:

```bash
uvicorn server:app --reload
```

Open http://127.0.0.1:8000 or http://localhost:8000.
- Use the LLM provider dropdown to choose Ollama or OpenAI (OpenAI only works if the server was started with a valid `OPENAI_API_KEY`).
- You need a non-empty vector store (see above) for answers to work.
## API (short reference)

| Endpoint | Purpose |
|---|---|
| `POST /api/chat` | Body: `message`, optional `history`, optional `llm_provider` (`ollama` or `openai`) |
| `GET /api/providers` | Lists which providers are available (`openai` is `false` if no API key was set at startup) |
| `GET /api/health` | Reports server status and whether a vector store is loaded |
## How it works (high level)

- Load documents – PDFs via `PyPDFLoader`, text via `TextLoader`.
- Chunk – `RecursiveCharacterTextSplitter` (defaults in `local_rag.py`).
- Embed – Hugging Face `sentence-transformers/all-MiniLM-L6-v2`.
- Store – FAISS; retrieval uses `similarity_search_with_score`.
- Query – Optional rephrase with chat history, retrieval, then answer from the LLM.
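The chunking step can be illustrated with a simplified fixed-window splitter. Note this is only a stand-in: the real code uses LangChain's `RecursiveCharacterTextSplitter`, which prefers to split at separators (paragraphs, then sentences) rather than at fixed character offsets:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-window chunker: each chunk overlaps the previous by `overlap` chars."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)] if text else []
```

Overlap matters because a fact that straddles a chunk boundary would otherwise be split across two chunks and retrieved only partially.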
## Configuration reference (what to edit)

These are the main places to change behavior without restructuring the app.

### `server.py`
| What | Where |
|---|---|
| Ollama model name | `OLLAMA_MODEL = "..."` |
| OpenAI model name | `OPENAI_MODEL = "..."` |
| Where the FAISS index is loaded from | `VECTORSTORE_PATH = "./vectorstore"` (must match your indexing script) |
### `local_rag.py` – `LocalRAG.__init__`

| What | Where (approx.) |
|---|---|
| Default vector store folder | Parameter `vectorstore_path="./vectorstore"` |
| Embedding model | `HuggingFaceEmbeddings(model_name="sentence-transformers/...")` |
| Chunk size and overlap | Module-level `CHUNK_SIZE` and `CHUNK_OVERLAP` (used by `RecursiveCharacterTextSplitter` when adding documents) |
| Default Ollama / OpenAI model strings | Parameters `ollama_model`, `openai_model`, `ollama_base_url` |
Changing the embedding model or chunk settings requires rebuilding the vector store (old index is not compatible).
### `local_rag.py` – `query_with_history`

| What | Where |
|---|---|
| Default number of chunks retrieved (`k`) | Module-level `RETRIEVAL_K` (override by passing `k=` to `query` / `query_with_history`) |
| Extra text appended only to the FAISS query (biases retrieval, not the final answer phrasing) | `QUERY_ADDITIONAL_INSTRUCTIONS` (concatenated to the search query before embedding) |
| Rephrase prompt (standalone question when there is chat history) | String `rephrase_prompt = f"""..."""` inside `query_with_history` |
| Answer prompt (opening instructions only) | Module-level `ANSWER_PROMPT` (edit the role / style lines). The block from chat history through `Answer:` is built in `query_with_history` |
### `add_pdfs.py`

| What | Where |
|---|---|
| Folder to scan for PDFs | `DATA_ROOT = Path("...")` |
| Output vector store folder | `VECTORSTORE_PATH = "./vectorstore"` (keep in sync with `server.py`) |
| Wipe index vs. merge | `CLEAR_VECTORSTORE_FIRST = True` (delete and rebuild) or `False` (append to existing index) |
## Dependencies (for developers)

See `requirements.txt` for the full list (LangChain, FAISS, sentence-transformers, FastAPI, uvicorn, etc.).