Add LLM provider switch, markdown chat UI, and update README

- Dual RAG instances (Ollama + OpenAI) for on-the-fly provider switching
- Provider dropdown in chat UI, /api/providers endpoint
- Markdown rendering for assistant responses
- Server logs include provider and model name for each LLM response
- README: OpenAI setup, add_pdfs, API docs, provider switch

Co-authored-by: Cursor <cursoragent@cursor.com>
Philipp Mock 2026-02-11 16:09:42 +01:00
parent 650f73a74b
commit 9abda1d867
3 changed files with 131 additions and 46 deletions

View File

@@ -1,78 +1,90 @@
# Local RAG Setup

Minimal RAG implementation with LangChain, FAISS, and support for either Ollama or OpenAI (API key needed).

## Dependencies

- `langchain` - Core framework
- `langchain-community` - Loaders, vectorstores
- `langchain-ollama` - Ollama integration
- `langchain-openai` - OpenAI integration
- `langchain-text-splitters` - Text splitting
- `langchain-huggingface` - HuggingFace embeddings
- `faiss-cpu` - Vector search
- `sentence-transformers` - Embeddings
- `pypdf` - PDF loading
- `fastapi` - Web server
- `uvicorn` - ASGI server

## Installation

```bash
conda create -n local_rag python=3.10 -y
conda activate local_rag
pip install -r requirements.txt
```

## Setup

### Ollama (optional)

```bash
ollama serve
ollama pull mistral
```

### OpenAI (optional)

Set the API key when using OpenAI:

```bash
export OPENAI_API_KEY="your-key"
```

## Add Documents

**Option 1:** Add PDFs from a folder via script. Edit `DATA_ROOT` in [add_pdfs.py](add_pdfs.py) to point at your folder, then run:

```bash
python add_pdfs.py
```

The script clears the existing vector store and indexes all PDFs recursively. Supports `.pdf`, `.txt`, `.md`.
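For reference, the recursive walk that Option 1 performs can be sketched as follows. This is a hypothetical approximation, not the actual `add_pdfs.py`; `collect_documents` and the `DATA_ROOT` default are illustrative names only.

```python
from pathlib import Path

# Edit to point at your document folder (mirrors DATA_ROOT in add_pdfs.py).
DATA_ROOT = Path("data")
SUPPORTED = {".pdf", ".txt", ".md"}

def collect_documents(root: Path) -> list[str]:
    """Recursively gather supported files under root, sorted for a stable index order."""
    return sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

The sorted order keeps re-indexing deterministic, so repeated runs produce the same vector store layout for unchanged inputs.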
**Option 2:** Use `local_rag.py` programmatically:

```python
from local_rag import LocalRAG

rag = LocalRAG()
rag.add_documents(["path/to/doc1.pdf", "path/to/doc2.txt"])
```

## Chat GUI

Start the server:

```bash
uvicorn server:app --reload
```

Open [http://localhost:8000](http://localhost:8000). The chat UI provides:

- **Provider switch** - Toggle between Ollama and OpenAI without a restart (OpenAI requires `OPENAI_API_KEY`)
- **Conversation history** - Multi-turn chat with context
- **Markdown** - Assistant replies are rendered as markdown (headings, code, lists, links)

Ensure the vector store is populated and at least one provider (Ollama or OpenAI) is configured.

## API

- `POST /api/chat` - `{ "message": "...", "history": [...], "llm_provider": "ollama"|"openai" }`
- `GET /api/providers` - `{ "ollama": true, "openai": true|false }`
- `GET /api/health` - Health and vector store status
## How it works

1. **Load documents** - PDFs or text via PyPDFLoader / TextLoader
2. **Chunk** - RecursiveCharacterTextSplitter (2000 chars, 400 overlap)
3. **Embed** - sentence-transformers/all-MiniLM-L6-v2
4. **Store** - FAISS vector store (similarity search with scores)
5. **Query** - Retrieve chunks, optionally rephrase with conversation history, generate answer with the selected LLM
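The chunking step can be approximated without LangChain. This naive sliding-window sketch only mirrors the sizes (2000-character chunks, 400-character overlap); the real RecursiveCharacterTextSplitter additionally prefers to break on paragraph and sentence boundaries.

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 400) -> list[str]:
    """Fixed-size splitter: each chunk starts (chunk_size - overlap) chars after the last."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    # max(..., 1) ensures short texts still yield exactly one chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence cut at a chunk boundary is still retrievable in full from at least one chunk.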

View File

@@ -2,6 +2,7 @@
FastAPI server for Local RAG with chat GUI.
Run with: uvicorn server:app --reload
"""
import os
from pathlib import Path

from fastapi import FastAPI, HTTPException
@@ -10,18 +11,28 @@ from pydantic import BaseModel
from local_rag import LocalRAG

OLLAMA_MODEL = "gpt-oss:20b"
OPENAI_MODEL = "gpt-5.2"
VECTORSTORE_PATH = "./vectorstore"

# Dual RAG instances for on-the-fly provider switching
rag_ollama = LocalRAG(
    vectorstore_path=VECTORSTORE_PATH,
    llm_provider="ollama",
    ollama_model=OLLAMA_MODEL,
    openai_model=OPENAI_MODEL,
)

rag_openai = None
if os.environ.get("OPENAI_API_KEY"):
    try:
        rag_openai = LocalRAG(
            vectorstore_path=VECTORSTORE_PATH,
            llm_provider="openai",
            ollama_model=OLLAMA_MODEL,
            openai_model=OPENAI_MODEL,
        )
    except Exception as e:
        print(f"OpenAI RAG not available: {e}")

app = FastAPI(title="Local RAG Chat", version="1.0.0")
@@ -34,6 +45,7 @@ class ChatMessage(BaseModel):
class ChatRequest(BaseModel):
    message: str
    history: list[ChatMessage] = []  # previous turns for conversation context
    llm_provider: str = "ollama"  # "ollama" | "openai"


class RetrievedChunk(BaseModel):
@@ -58,11 +70,21 @@ def chat_view():
    return HTMLResponse(content=html_path.read_text(encoding="utf-8"))


def _get_rag(provider: str):
    """Return the RAG instance for the given provider. Fall back to Ollama if OpenAI is unavailable."""
    if provider == "openai" and rag_openai is not None:
        return rag_openai
    return rag_ollama


@app.post("/api/chat", response_model=ChatResponse)
def chat(request: ChatRequest):
    """Handle a chat message and return the RAG answer."""
    if not request.message or not request.message.strip():
        return ChatResponse(answer="", error="Message cannot be empty")
    if request.llm_provider == "openai" and rag_openai is None:
        return ChatResponse(answer="", error="OpenAI not configured. Set OPENAI_API_KEY.")
    rag = _get_rag(request.llm_provider)
    try:
        chat_history = [{"role": m.role, "content": m.content} for m in request.history]
        result = rag.query_with_history(
@@ -81,7 +103,9 @@ def chat(request: ChatRequest):
                print(f"  [{i + 1}] {chunk.get('source', '')} p.{chunk.get('page', '?')} s={chunk.get('score')} | {preview!r}")
        else:
            print("\n[RAG] Retrieved 0 chunks")
        provider_label = "OpenAI" if request.llm_provider == "openai" else "Ollama"
        model_name = OPENAI_MODEL if request.llm_provider == "openai" else OLLAMA_MODEL
        print(f"[RAG] LLM response ({provider_label} / {model_name}):\n{answer}")

        return ChatResponse(answer=answer, retrieved=retrieved)
    except Exception as e:
@@ -91,10 +115,16 @@ def chat(request: ChatRequest):
@app.get("/api/health")
def health():
    """Health check and vector store status."""
    has_docs = rag_ollama.vectorstore is not None
    return {"status": "ok", "vectorstore_loaded": has_docs}


@app.get("/api/providers")
def providers():
    """Return which LLM providers are available."""
    return {"ollama": True, "openai": rag_openai is not None}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

View File

@@ -112,6 +112,24 @@
    border-top: 1px solid #27272a;
    background: #18181b;
}
#provider-row {
    display: flex;
    align-items: center;
    gap: 0.5rem;
    margin-bottom: 0.5rem;
    font-size: 0.8rem;
    color: #71717a;
}
#provider-row label { flex-shrink: 0; }
#provider {
    padding: 0.25rem 0.5rem;
    font: inherit;
    font-size: 0.85rem;
    color: #e4e4e7;
    background: #27272a;
    border: 1px solid #3f3f46;
    border-radius: 6px;
}
#input-row {
    display: flex;
    gap: 0.5rem;
@@ -168,6 +186,13 @@
<div id="messages"></div>
<div id="input-area">
    <div id="provider-row">
        <label for="provider">LLM provider:</label>
        <select id="provider">
            <option value="ollama">Ollama</option>
            <option value="openai">OpenAI</option>
        </select>
    </div>
    <div id="input-row">
        <textarea id="input" rows="1" placeholder="Ask a question…" autofocus></textarea>
        <button type="button" id="send">Send</button>
@@ -178,8 +203,22 @@
const messagesEl = document.getElementById('messages');
const inputEl = document.getElementById('input');
const sendBtn = document.getElementById('send');
const providerEl = document.getElementById('provider');
const chatHistory = [];

(async function initProviders() {
    try {
        const res = await fetch('/api/providers');
        const data = await res.json();
        if (!data.openai) {
            const opt = providerEl.querySelector('option[value="openai"]');
            opt.disabled = true;
            opt.textContent = 'OpenAI (not configured)';
            if (providerEl.value === 'openai') providerEl.value = 'ollama';
        }
    } catch (_) {}
})();

function appendMessage(role, text, isError = false) {
    text = text ?? '';
    const div = document.createElement('div');
@@ -231,7 +270,11 @@
const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
        message: text,
        history: history,
        llm_provider: providerEl.value
    })
});
const data = await res.json();
setLoading(false);