Self-hosted AI Stack
Builder / operator
GPU-accelerated local LLMs (Ollama on an RTX 3080), a private chat UI (Open WebUI), and a self-hosted meta-search backend - a ChatGPT-style assistant with live web results and no queries leaving the network.
- Ollama
- Open WebUI
- SearXNG
- Docker
STACK
Ollama runs natively on the GPU for direct hardware access and serves models on an internal port. Open WebUI is the chat front-end; SearXNG provides self-hosted meta-search aggregation; a headless Playwright browser renders result pages into clean text before they reach the model. Open WebUI and the search backend share a dedicated Docker network and communicate by container name - no ports exposed beyond localhost.

MODELS
A mix of open model families chosen per task: a general assistant, a fast-coding variant optimised for latency, a deeper-reasoning model for multi-step problems, and a security-focused variant for threat analysis. Custom Modelfiles raise the context window for agent use - long files and multi-step workflows that would otherwise hit the default limit.
A larger-VRAM workstation handles long-context runs that don't fit on the home GPU. The same local Ollama instance backs the SIEM's AI triage pipeline, so security alert summaries never leave the network.