// 08 / Local AI Stack · 17 Apr 2026
// the architecture
With all three services from Part 1 running, a single question fans out to SearXNG for live web results and MeiliSearch for local document chunks, then both are combined and sent to Ollama to synthesise a grounded answer.
User query
│
├── SearXNG ──► web results (title, url, snippet × 5)
│
├── MeiliSearch ──► local doc chunks (title, filename, content × 3)
│
└── Ollama ◄── combined context + original query
│
└──► synthesised answer with source citations
The model answers from what was actually retrieved rather than from memory alone, which makes guessing far less likely (though not impossible). Web results give it current information; local docs give it project-specific context. That combination is what makes this genuinely useful rather than a toy demo.
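The fan-out step in the diagram can be sketched on its own, without the real services. This is a minimal illustration only: `fetch_web` and `fetch_local` are hypothetical stand-ins that return canned data, and running them concurrently with `asyncio.gather` is one possible choice, not necessarily how the endpoint in this post does it.

```python
import asyncio

# Hypothetical stand-ins for the two retrievers. Real versions would call
# SearXNG and MeiliSearch; these return canned data so the sketch runs anywhere.
async def fetch_web(q: str) -> list[dict]:
    await asyncio.sleep(0)  # placeholder for network I/O
    return [{"title": f"web result for {q}", "url": "https://example.com"}]

async def fetch_local(q: str) -> list[dict]:
    await asyncio.sleep(0)  # placeholder for the index lookup
    return [{"content": f"local chunk about {q}", "path": "notes.md"}]

async def fan_out(q: str) -> dict:
    # Start both lookups together; gather returns results in argument order.
    web, local = await asyncio.gather(fetch_web(q), fetch_local(q))
    return {"query": q, "web": web, "local": local}

if __name__ == "__main__":
    print(asyncio.run(fan_out("brickwall limiter")))
```

With real retrievers, the combined latency of this step would be roughly that of the slower lookup rather than the sum of both.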
// implementation
pip install fastapi uvicorn httpx meilisearch ollama --break-system-packages
from fastapi import FastAPI
import asyncio
import httpx, meilisearch, ollama

app = FastAPI()
meili = meilisearch.Client("http://localhost:7700", "your-key")

@app.get("/search")
async def search(q: str):
    # 1. web results from SearXNG ("json" must be listed under search.formats in its settings.yml)
    async with httpx.AsyncClient() as c:
        web_resp = await c.get("http://localhost:8080/search",
                               params={"q": q, "format": "json"}, timeout=10)
        web_resp.raise_for_status()
    web_hits = web_resp.json().get("results", [])[:5]

    # 2. local doc results from MeiliSearch (sync client, so run it off the event loop)
    local = await asyncio.to_thread(meili.index("documents").search, q, {"limit": 3})
    local_hits = local["hits"]

    # 3. build combined context
    context = "WEB RESULTS:\n"
    context += "".join(f"- {r.get('title','')}: {r.get('content','')}\n" for r in web_hits)
    context += "\nLOCAL DOCS:\n"
    context += "".join(f"- {r['title']} ({r['filename']}): {r['content'][:300]}\n"
                       for r in local_hits)

    # 4. ask Ollama to synthesise an answer (async client, so the server stays responsive)
    prompt = f"Using these sources, answer: {q}\n\n{context}"
    response = await ollama.AsyncClient().chat(
        model="qwen2.5-coder:7b",
        messages=[{"role": "user", "content": prompt}])
    return {"answer": response["message"]["content"],
            "web": web_hits, "local": local_hits}
uvicorn main:app --reload --port 8000
The command above assumes the code is saved as main.py. Once the server is up, query it at http://localhost:8000/search?q=your+question or browse the auto-generated API docs at http://localhost:8000/docs.
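For scripted use, a small client helps with URL encoding. The sketch below uses only the standard library; `search_url` and `ask` are hypothetical helper names, the base URL assumes the uvicorn command above, and the 120-second timeout is a guess at how long a local model might take.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: the port passed to uvicorn

def search_url(question: str) -> str:
    # urlencode escapes spaces and punctuation so the question stays one q parameter
    return f"{BASE_URL}/search?{urlencode({'q': question})}"

def ask(question: str, timeout: float = 120.0) -> dict:
    # Local generation can be slow, hence the generous (assumed) timeout
    with urlopen(search_url(question), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `ask("How do I implement a brickwall limiter in JUCE?")["answer"]` returns the synthesised text, and the `web` and `local` keys hold the raw hits it was built from.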
// in action
Query: "How do I implement a brickwall limiter in JUCE?" — showing how web results and local project docs combine into a grounded, specific answer.
dsp::LookAheadLimiter from the JUCE DSP module. Set a lookahead of 2–5ms to allow gain reduction to anticipate peaks. Your own notes (grecamp_dsp_notes.md) document a peak follower approach with 0.1ms attack that works well for the power amp simulation stage. Combine with a soft-clip function at the output to handle inter-sample peaks. Sources: juce.com DSP docs, KVR Audio forum, your local GRECAmp notes.
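The answer above ends with a source list, but the prompt in the implementation section does not explicitly ask for one, so whether citations appear is left to the model. One possible refinement is to number each source and request citations by number. The sketch below is an assumption rather than part of the original endpoint: `build_prompt` is a hypothetical helper, and the field names (`title`, `url`, `content`, `filename`) follow the ones used in the implementation.

```python
def build_prompt(q: str, web_hits: list[dict], local_hits: list[dict]) -> str:
    """Number every source so the model can cite them as [1], [2], ..."""
    lines = []
    n = 1
    for r in web_hits:
        lines.append(f"[{n}] WEB {r.get('title', '')} ({r.get('url', '')}): "
                     f"{r.get('content') or ''}")
        n += 1
    for r in local_hits:
        # Local chunks are truncated to 300 characters, as in the endpoint
        lines.append(f"[{n}] LOCAL {r.get('title', '')} ({r.get('filename', '')}): "
                     f"{(r.get('content') or '')[:300]}")
        n += 1
    sources = "\n".join(lines) if lines else "(no sources retrieved)"
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources as [n] after each claim, and say so if the sources "
        "do not contain the answer.\n\n"
        f"QUESTION: {q}\n\nSOURCES:\n{sources}"
    )
```

Swapping this in for the f-string prompt in step 4 would not guarantee well-formed citations, but it gives the model an explicit numbering to refer to and makes the answer easier to check against the returned hits.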