INFRA

Self-hosted AI Stack

Builder / operator

GPU-accelerated local LLMs (Ollama on an RTX 3080), a private chat UI (Open WebUI), and a self-hosted meta-search backend - a ChatGPT-style assistant with live web results, where prompts and inference stay on the local network.

Ollama
Open WebUI
SearXNG
Docker

GitHub ↗

STACK

Ollama runs natively on the host rather than in a container, giving it direct access to the GPU, and serves models on an internal port. Open WebUI is the chat front-end; SearXNG provides self-hosted meta-search aggregation; a headless Playwright browser renders result pages into clean text before they reach the model. Open WebUI and the search backend share a dedicated Docker network and communicate by container name - no ports exposed beyond localhost.
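A minimal Python sketch of the search leg of this flow, assuming the search backend is reachable by a container name of `searxng` on port 8080 and has JSON output enabled (both are illustrative assumptions, not details from this setup). It only builds the request URL and turns result entries into numbered context for the model. It uses the result snippets directly, whereas the stack described above renders full pages through Playwright.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Assumption: container name and port are placeholders for whatever the
# Docker network actually uses; containers on a shared network can reach
# each other by name like this.
SEARXNG_BASE = "http://searxng:8080"


def build_search_url(query: str, base: str = SEARXNG_BASE) -> str:
    """Build a SearXNG search URL that asks for JSON results."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base}/search?{params}"


def results_to_context(results: List[Dict[str, str]], limit: int = 3) -> str:
    """Turn the first `limit` result entries into numbered plain-text context."""
    blocks = []
    for index, result in enumerate(results[:limit], start=1):
        title = result.get("title", "")
        url = result.get("url", "")
        snippet = result.get("content", "")
        blocks.append(f"[{index}] {title} ({url})\n{snippet}")
    return "\n\n".join(blocks)
```

The numbered blocks can be prepended to the user's question so the model can cite sources by index.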

MODELS

A mix of open model families chosen per task: a general assistant, a low-latency coding variant, a deeper-reasoning model for multi-step problems, and a security-focused variant for threat analysis. Custom Modelfiles raise the context window for agent use - long files and multi-step workflows that would otherwise exceed the default context limit.
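As a rough illustration of the Modelfile approach, the sketch below generates a Modelfile that sets `num_ctx`, the Ollama parameter controlling context window size. The model name and the 16384 value are placeholders, not the models or settings actually used here.

```python
from typing import Optional


def render_modelfile(base_model: str, num_ctx: int, system: Optional[str] = None) -> str:
    """Return Modelfile text that derives from `base_model` with a larger context window."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive number of tokens")
    lines = [
        f"FROM {base_model}",
        f"PARAMETER num_ctx {num_ctx}",
    ]
    if system:
        # Optional system prompt, e.g. to specialise the variant for one task.
        lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder model name; substitute a model that is already pulled.
    print(render_modelfile("example-model:latest", 16384), end="")
```

Saving the output as `Modelfile` and running `ollama create <new-name> -f Modelfile` registers the variant. A larger `num_ctx` increases memory use, which is one reason long-context runs may not fit on a smaller GPU.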

A larger-VRAM workstation handles long-context runs that don't fit on the home GPU. The same local Ollama instance backs the SIEM's AI triage pipeline, so security alert summaries never leave the network.
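One way the triage step could call the local instance is sketched below, using Ollama's `/api/generate` endpoint on its default port 11434. The alert fields, the prompt wording, and the `security-model` name are hypothetical; the actual pipeline may differ.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: Ollama listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_triage_request(alert: Dict[str, Any], model: str = "security-model") -> Dict[str, Any]:
    """Build a non-streaming generate request asking for a short alert summary."""
    prompt = (
        "Summarise this security alert in two sentences and suggest a severity.\n\n"
        + json.dumps(alert, indent=2, sort_keys=True)
    )
    return {"model": model, "prompt": prompt, "stream": False}


def send(payload: Dict[str, Any], url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """POST the payload to the local Ollama instance and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Because the request targets localhost, the alert contents are only sent to the local model rather than to an external API.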