OptiMoss.ai
Stack Series · 1 of 9 · Updated May 2026

Why self-host?

You're already using AI tools every day. This is about where that work goes — your prompts, your data, your search history — and why more people are keeping it on hardware they own. Privacy, cost, control, and an honest read on when cloud services are still the right call.

Read →
Stack Series · 2 of 9 · Updated May 2026

Open WebUI: your AI interface

The chat window is the part you actually live in. A Docker Compose builder for Open WebUI, the first-launch gotchas worth knowing, and a three-way fork at the end: stay on cloud through OpenRouter, add Ollama for ease, or go to llama.cpp for control.

Read →
Stack Series · 3 of 9 · Updated May 2026

Ollama: local models, easy setup

A single binary that pulls models, runs them on your hardware, and exposes a stable API. The most forgiving on-ramp to local inference. A Compose service builder with GPU options, a trick for pulling GGUF models straight from Hugging Face, a VRAM tier table for picking a model that fits your card, and an honest read on where Ollama is the right call versus llama.cpp.

Read →
Stack Series · 4 of 9 · Updated May 2026

llama.cpp: higher performance, more control

The engine underneath Ollama, run directly. More flags than you'll want at first; the right home for any model you actually care about tuning. A llama-swap Compose builder, a tour of the YAML that defines your models, the six flags that do most of the work, and a power-user expander for fitting a 35B MoE on 8 GB of VRAM.

Read →
Stack Series · 5 of 9 · Updated May 2026

SearXNG: web-aware conversations

Give your chat interface the web without giving a third party your search history. A self-hosted metasearch engine that aggregates results from elsewhere, returns clean JSON to Open WebUI, and doesn't keep an account on you — with an honest section on the upstream rate limits that show up once agentic chat starts hammering the public engines.

Read →
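For a feel of what "clean JSON" means here, a minimal Python sketch. It assumes a SearXNG instance with the JSON output format enabled in its settings; the base URL and the sample response below are invented for illustration, and the helper names are hypothetical rather than anything from the article.

```python
from urllib.parse import urlencode


def searxng_query_url(base_url: str, query: str) -> str:
    """Build a search URL that asks SearXNG for JSON instead of HTML."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def result_links(response: dict) -> list:
    """Pull (title, url) pairs out of an already-parsed JSON response."""
    return [(r.get("title", ""), r.get("url", "")) for r in response.get("results", [])]


# Invented sample shaped like a SearXNG JSON response (illustrative only).
sample = {
    "query": "local llm",
    "results": [
        {"title": "Example result", "url": "https://example.com", "content": "..."},
    ],
}

print(searxng_query_url("http://localhost:8080/", "local llm"))
print(result_links(sample))
```

A chat front end does roughly this on your behalf: it sends the query, reads the `results` list, and hands the titles, URLs, and snippets to the model.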
Stack Series · 6 of 9 · Updated May 2026

Remote access: Cloudflare Tunnel

Reach your stack from a phone in a coffee shop without opening a port on your router, exposing a public IP, or asking a guest to install a VPN client. A cloudflared service builder, the nginx pattern for routing multiple subdomains, an email-PIN gate at the edge, and an honest comparison to Tailscale Funnel and Pangolin for readers who would rather not have Cloudflare in the data path.

Read →
Stack Series · 7 of 9 · Updated May 2026

Security and privacy

Six articles in, you have a stack. This one is the audit pass: where the data actually lives, who can reach it, and the small set of mistakes that account for most of the trouble in setups of this shape.

Read →
Stack Series · 8 of 9 · Updated May 2026

Image generation

Adding pictures to a stack built for text. ComfyUI as the local backend, Open WebUI as the front door, a Compose service builder, a VRAM tier table that tells you where Flux is a daily driver versus merely tolerable, and the cloud route for hardware on the wrong side of the 8 GB line.

Read →
Stack Series · 9 of 9 · Updated May 2026

Open WebUI in practice

Eight articles in, the stack works. Here is how to get more out of the interface you already have: Open WebUI Automations for scheduled prompts, the per-model settings the docs bury (native function calling, stream chunk size, temperature, title generation), and folders that carry their own system prompts for keeping a working install legible.

Read →
Technique · Updated May 2026

The Unreasonable Effectiveness of HTML

Asking an AI agent for HTML instead of Markdown changes what comes back — and what you can do with it. A short walkthrough with five live demonstrations, in one file.

Read →
Showcase · May 2026

emoji-mosaic

A side-project toy that repaints any image you upload with ~3,944 Noto Color Emoji as the palette. The matcher runs entirely in your browser — perceptual-LAB or RGB, with optional category restriction. Built over a couple of evenings with an AI coding assistant.

Try it →
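The RGB mode of a matcher like this can be sketched in a few lines. This is an illustrative Python version, not the project's browser code: the three-emoji palette and its colour values are made up, and the real tool may weight or pre-process colours differently. The perceptual-LAB mode would first convert both colours to LAB and then measure distance the same way.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: emoji mapped to an average colour (invented values).
PALETTE: Dict[str, RGB] = {
    "🍎": (220, 40, 40),
    "🌲": (30, 120, 60),
    "🌊": (40, 90, 200),
}


def squared_distance(a: RGB, b: RGB) -> int:
    """Squared Euclidean distance between two RGB colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_emoji(pixel: RGB, palette: Dict[str, RGB] = PALETTE) -> str:
    """Return the palette emoji whose average colour is closest to the pixel."""
    return min(palette, key=lambda emoji: squared_distance(pixel, palette[emoji]))


def mosaic(tiles: List[List[RGB]]) -> List[str]:
    """Map a grid of tile colours to rows of emoji, one emoji per tile."""
    return ["".join(nearest_emoji(p) for p in row) for row in tiles]


# Two tiles: a reddish one and a bluish one.
print(mosaic([[(200, 50, 50), (50, 100, 210)]]))
```

With thousands of palette entries, the same nearest-colour search runs once per tile, which is why restricting the palette to a category also changes the look of the result.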