
// 08 / Local AI Stack  ·  17 Apr 2026

Perplexity-Style
AI Search — Local

Stack: FastAPI · Ollama · SearXNG · MeiliSearch
Model: qwen2.5-coder:7b
Cloud cost: £0

// the architecture

One query, three sources, one answer

With all three services from Part 1 running, a single question fans out to SearXNG for live web results and MeiliSearch for local document chunks, then both are combined and sent to Ollama to synthesise a grounded answer.

flow
User query
    │
    ├── SearXNG      ──► web results (title, url, snippet × 5)
    │
    ├── MeiliSearch  ──► local doc chunks (title, filename, content × 3)
    │
    └── Ollama       ◄── combined context + original query
             │
             └──► synthesised answer with source citations

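The model is not answering from memory alone: its prompt contains what was actually retrieved. Web results give it current information; local docs give it project-specific context. A 7B model can still misread or embellish its sources, which is why the endpoint returns the raw web and local hits alongside the answer so they can be checked. That combination is what makes this genuinely useful rather than a toy demo.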
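Because the two lookups are independent of each other, they can in principle run concurrently before anything is sent to the model. The sketch below illustrates that fan-out with asyncio.gather. The fetch_web and fetch_local coroutines are hypothetical stand-ins for the SearXNG and MeiliSearch calls, not real client code, and dropping a failed source to an empty list is my own assumption about how to degrade.

```python
import asyncio


async def fetch_web(query: str) -> list[dict]:
    # Hypothetical stand-in for the SearXNG request.
    await asyncio.sleep(0.01)
    return [{"title": "Web hit", "url": "https://example.com",
             "content": f"about {query}"}]


async def fetch_local(query: str) -> list[dict]:
    # Hypothetical stand-in for the MeiliSearch lookup.
    await asyncio.sleep(0.01)
    return [{"title": "Local note", "filename": "notes.md",
             "content": f"notes on {query}"}]


async def retrieve(query: str, web=fetch_web, local=fetch_local) -> dict:
    # Run both lookups at the same time. With return_exceptions=True a
    # failure comes back as a value instead of cancelling the other lookup.
    web_hits, local_hits = await asyncio.gather(
        web(query), local(query), return_exceptions=True
    )
    # Assumption: a source that failed simply contributes no hits.
    return {
        "web": [] if isinstance(web_hits, BaseException) else web_hits,
        "local": [] if isinstance(local_hits, BaseException) else local_hits,
    }
```

Calling asyncio.run(retrieve("brickwall limiter")) returns a dict with "web" and "local" lists. Total latency is roughly that of the slower source, not the sum of the two.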

// implementation

The FastAPI search endpoint

STEP 01

Install dependencies

bash
pip install fastapi uvicorn httpx meilisearch ollama --break-system-packages
STEP 02

The search endpoint

python · ~/ai-stack/main.py
import asyncio

import httpx
import meilisearch
from fastapi import FastAPI
from ollama import AsyncClient

app   = FastAPI()
meili = meilisearch.Client("http://localhost:7700", "your-key")
llm   = AsyncClient()  # talks to the local Ollama server on its default port

@app.get("/search")
async def search(q: str):

    # 1: web results from SearXNG (the json format must be enabled in its settings.yml)
    async with httpx.AsyncClient() as c:
        web_resp = await c.get("http://localhost:8080/search",
            params={"q": q, "format": "json"}, timeout=10)
    web_resp.raise_for_status()
    web_hits = web_resp.json().get("results", [])[:5]

    # 2: local doc results from MeiliSearch (sync client, so run it in a worker thread)
    local = await asyncio.to_thread(
        meili.index("documents").search, q, {"limit": 3})
    local_hits = local["hits"]

    # 3: build combined context; .get() tolerates hits with missing fields
    context  = "WEB RESULTS:\n"
    context += "".join(
        f"- {r.get('title', '')} ({r.get('url', '')}): {r.get('content', '')}\n"
        for r in web_hits)
    context += "\nLOCAL DOCS:\n"
    context += "".join(
        f"- {r.get('title', '')} ({r.get('filename', '')}): {r.get('content', '')[:300]}\n"
        for r in local_hits)

    # 4: ask Ollama to synthesise an answer (async client keeps the event loop free)
    prompt   = (f"Using only these sources, answer: {q}\n"
                f"Cite the sources you rely on.\n\n{context}")
    response = await llm.chat(model="qwen2.5-coder:7b",
                              messages=[{"role": "user", "content": prompt}])

    return {"answer": response["message"]["content"],
            "web": web_hits, "local": local_hits}
STEP 03

Run it

bash
uvicorn main:app --reload --port 8000
Usage

Query via http://localhost:8000/search?q=your+question or browse the auto-generated API docs at http://localhost:8000/docs
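For scripted use, a small client can build the query string and decode the JSON response. This is a minimal sketch using only the standard library. The helper names build_search_url and ask are hypothetical, and the 120 second timeout is my own guess at how long a local 7B model might take, not a measured figure.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint address taken from the uvicorn command in Step 03.
BASE_URL = "http://localhost:8000/search"


def build_search_url(question: str, base_url: str = BASE_URL) -> str:
    # urlencode escapes spaces and reserved characters in the question.
    return f"{base_url}?{urlencode({'q': question})}"


def ask(question: str, timeout: float = 120.0) -> dict:
    # Requires the FastAPI app to be running; returns the decoded JSON body
    # with the "answer", "web" and "local" keys.
    with urlopen(build_search_url(question), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, ask("How do I implement a brickwall limiter in JUCE?")["answer"] gives the synthesised text, and the "web" and "local" keys hold the hits it was built from.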

// in action

Example combined query

Query: "How do I implement a brickwall limiter in JUCE?" — showing how web results and local project docs combine into a grounded, specific answer.

Live query result
Web results — SearXNG
JUCE DSP Module Docs — LookAheadLimiter class reference @ juce.com
JUCE Forum — Brickwall limiter implementation discussion
KVR Audio — Best practices for brick-wall limiting in plugins
Local doc results — MeiliSearch
grecamp_dsp_notes.md — "...limiter stage after the power amp simulation uses a simple peak follower with 0.1ms attack..."
plugin_architecture.md — "...output gain stage followed by a soft-clip function to prevent inter-sample peaks..."
Ollama synthesised answer
To implement a brickwall limiter in JUCE, the recommended approach is to use dsp::LookAheadLimiter from the JUCE DSP module. Set a lookahead of 2–5ms to allow gain reduction to anticipate peaks. Your own notes (grecamp_dsp_notes.md) document a peak follower approach with 0.1ms attack that works well for the power amp simulation stage. Combine with a soft-clip function at the output to handle inter-sample peaks. Sources: juce.com DSP docs, KVR Audio forum, your local GRECAmp notes.
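The answer above closes with a loose source list. One way to make citations easier to trace is to number every hit when the context is built, so the model can refer to [1], [2] and so on. The sketch below is an illustration only: build_context is a hypothetical helper, and the field names (title, url, filename, content) are assumed to match the ones used in main.py.

```python
def build_context(web_hits: list[dict], local_hits: list[dict],
                  max_chars: int = 300) -> str:
    """Format retrieved hits as one numbered source list for the prompt."""
    lines = ["WEB RESULTS:"]
    n = 0
    for hit in web_hits:
        n += 1
        snippet = (hit.get("content") or "")[:max_chars]
        lines.append(
            f"[{n}] {hit.get('title', 'untitled')} "
            f"({hit.get('url', 'no url')}): {snippet}"
        )
    lines.append("")
    lines.append("LOCAL DOCS:")
    for hit in local_hits:
        # Numbering continues across both sections so each id is unique.
        n += 1
        snippet = (hit.get("content") or "")[:max_chars]
        lines.append(
            f"[{n}] {hit.get('title', 'untitled')} "
            f"({hit.get('filename', 'unknown file')}): {snippet}"
        )
    return "\n".join(lines)
```

The prompt can then ask the model to cite sources by number, and the same numbers can be matched back to the "web" and "local" lists in the response.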
local_ai_stack_guide.pdf Full guide — installation, configuration & search endpoint · v1.0 · Apr 2026