INFRA

Self-hosted AI Stack

Builder / operator

GPU-accelerated local LLMs (Ollama on an RTX 3080), a private chat UI (Open WebUI), and a self-hosted meta-search backend. Together they form a ChatGPT-style assistant with live web results, in which prompts and model inference never leave the network.

STACK

Ollama runs natively on the host rather than in a container, which gives it direct access to the GPU, and serves models on an internal port. Open WebUI is the chat front-end, SearXNG provides self-hosted meta-search aggregation, and a headless Playwright browser renders result pages into clean text before they reach the model. Open WebUI and the search backend share a dedicated Docker network and address each other by container name, so no ports are exposed beyond localhost.

Open WebUI chat interface

MODELS

A mix of open model families chosen per task: a general assistant, a fast coding variant optimised for latency, a deeper-reasoning model for multi-step problems, and a security-focused variant for threat analysis. Custom Modelfiles raise the context window for agent use, so long files and multi-step workflows do not hit the default limit.

A larger-VRAM workstation handles long-context runs that don't fit on the home GPU. The same local Ollama instance backs the SIEM's AI triage pipeline, so security alert summaries never leave the network.
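
A triage step of this kind might look roughly like the sketch below, which posts an alert to Ollama's `/api/generate` endpoint and returns the summary text. The alert fields, the prompt wording, the model name `security-triage`, and the default port 11434 are assumptions for illustration; the SIEM integration itself is not described here.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # default Ollama port, assumed


def build_triage_payload(alert: dict, model: str = "security-triage") -> dict:
    """Build a non-streaming /api/generate request asking for an alert summary."""
    prompt = (
        "Summarise this security alert in two sentences and rate its severity "
        "as low, medium, or high.\n\n" + json.dumps(alert, indent=2, sort_keys=True)
    )
    return {"model": model, "prompt": prompt, "stream": False}  # hypothetical model name


def summarise_alert(alert: dict, url: str = OLLAMA_GENERATE_URL) -> str:
    """Send the alert to the local Ollama instance and return the generated summary."""
    body = json.dumps(build_triage_payload(alert)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

Because the request goes to a local address, the alert body is only ever sent to the Ollama host, which is the property the pipeline relies on.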