Stack Series · 9 of 9 Closing · May 2026

Open WebUI in practice

Eight articles in, the stack works. Here is how to get more out of the interface you already have: Open WebUI Automations for scheduled prompts, the per-model settings the docs bury (native function calling, stream chunk size, temperature, title generation), and folders that carry their own system prompts for keeping a working install legible.

Curtis Smith · OptiMoss.ai · part of the Stack Series

§ 0

Where I landed

The series opened with the case for self-hosting and then spent seven articles building. The Docker host now runs a chat interface, a local language model (with an alternate inference engine for power users), a self-hosted search backend, a remote-access edge, and an image-generation pipeline — all behind one URL that asks for a PIN at the door. The closer is not another service. It is the Open WebUI moves that turn the interface from a chat box into a workbench.

§ 1

Automations: scheduled chats, no extra service

An Automation is a saved prompt, a model, a cron expression, and an optional set of tools and knowledge attachments. At the scheduled time, Open WebUI starts a chat as if you had typed the prompt yourself, runs it to completion, and saves the conversation in a folder you nominate. The answer is waiting when you open the folder in the morning.

Setup lives under Workspace → Automations. A new automation asks for:

A name and a cron expression. The UI accepts five-field cron (30 6 * * * for 06:30 daily). I keep mine to a small set: morning briefing, weekly digest, end-of-day notes catch-up.
A model. Same dropdown as the chat picker — local model via Ollama for routine runs, an OpenRouter model when the job needs more horsepower than the laptop can spare. Cost stays bounded because the job runs once.
The prompt. Plain text, the same as a chat message. Open WebUI substitutes the usual variables into prompts — {{CURRENT_DATE}}, {{CURRENT_DATETIME}}, {{CURRENT_WEEKDAY}}, {{USER_NAME}}. Use them; "today is the 14th" answers tend to drift otherwise.
Tools and knowledge. The same toggles available in a normal chat — web search, image generation, attached document collections. A morning briefing that searches the web becomes a checkbox.
The folder the output lands in. The chat appears there with a timestamp, ready to read.

A concrete recipe that exercises three earlier articles at once — Ollama (article 3), SearXNG (article 5), OpenRouter as a fallback (article 2):

# Automation: "Morning briefing"
# Cron: 30 6 * * *
# Model: deepseek/deepseek-v4 (OpenRouter) — fast, cheap, good at structure
# Tools: web_search ✓
# Folder: Briefings

It is {{CURRENT_WEEKDAY}}, {{CURRENT_DATE}}. Produce a brief covering, in this order:

1. AI / open-source ML — releases, papers, notable forks (last 24h)
2. Self-hosting and homelab — useful posts, new tools, security notes
3. Local-first software — anything Tauri, Sciter, or Electron-killer adjacent

For each section, give three to five items. Each item: one
sentence of substance plus a link. Skip anything that is
press-release filler. End with a one-paragraph "what changed
this week" synthesis across the three sections.

The first run is the audit. Read the output, see where the model overreaches, tighten the prompt. Most of my Automations went through three or four revisions before they produced something I actually wanted to read. The cron is the easy part; the prompt is the work.

Power user: when Automations is the wrong tool

Automations runs prompts through Open WebUI's normal chat pipeline: one model per run, one cron per automation, output lands as a chat. That covers most scheduled work. It does not cover jobs that fan out across several models and merge the results, jobs whose output needs to live on a public URL (Automation chats stay behind your login), or jobs that need to call services Open WebUI does not expose as tools.

At that point you are writing a small service — a Python script with a cron entry, or a FastAPI process if you want a real surface — that talks to SearXNG and OpenRouter directly and renders the output yourself. The reach is wider; the maintenance is also yours. If a scheduled job fits in one chat with one model, use Automations. If it does not, the custom build pays back. Most weeks I do not cross the line.

§ 2

Settings the docs bury

Open WebUI's defaults were set when models behaved differently than they do now. A handful are worth changing on day one. Three places to set them, in increasing reach:

Per model, in Workspace → Models → (your model) → Advanced Params. Overrides everything else.
Per user, in User Settings → General → Advanced Parameters. The defaults you see in new chats.
Global defaults for all users, in Admin Panel → Settings → Settings (top right) → Model Parameters. Set once when you stand up a multi-user instance; the rest of this list applies the same way.

Function Calling Default→Native

Workspace → Models → Advanced Params

The default is prompt-based: Open WebUI injects an XML tool-use protocol into the system prompt and parses the model's output. It works on toy cases and falls down on the real ones — missed calls, malformed arguments, no support for interleaving (the model cannot think, search, read a result, then search again). Native uses the provider's actual tool API. Any model from the past year — Gemma 4, Qwen 3.6, GPT-5.5, Claude Opus 4.7 — works as advertised once you flip this. Set it for every model you use with tools, which after a while is every model.

Stream Delta Chunk Size 1→8–16

Workspace → Models → Advanced Params

How many tokens Open WebUI batches before pushing an SSE update to the browser. The default of 1 means every token from the model triggers a render — fine on a 30 tok/s local model, brutal on a 500 tok/s cloud model where the UI starts to chase the stream and the browser pegs a CPU core. Raising this to 8 or 16 collapses the render pressure with no perceptible loss of streaming feel. The fast cloud models stop juddering.

Temperature 0.8→model-dependent

Workspace → Models → Advanced Params

The single sampling parameter worth touching. Coding and structured-output models prefer 0.2–0.4; creative work runs better at 0.7–1.0. Most current model cards publish a recommended value — Qwen 3.6 at 0.7, Gemma 4 at 1.0, DeepSeek V4 coder at 0.0 — and Open WebUI's 0.8 is no model's favorite. top_p, top_k, and repetition_penalty are also exposed; leave them alone unless a model card tells you otherwise.

Reasoning Effort medium→match the task

Workspace → Models → Advanced Params (reasoning models only)

Reasoning models — GPT-5.5, the o-series, Gemini 3 Pro Thinking, the new Qwen and DeepSeek thinking variants — expose an effort dial that trades latency for depth. low answers in seconds and is usually wrong on multi-step problems; high can take a minute. The default of medium is the middle of nothing in particular. I keep two model entries for the heavy reasoners — one on low for quick lookups, one on high for actual problems — and pick from the dropdown.

Title Generation same model→small cheap model

Admin Panel → Settings → Interface

Open WebUI auto-titles new chats by sending a follow-up prompt to whatever model you just used. If that model is Claude Opus 4.7 or GPT-5.5, every new chat costs an extra round-trip on a flagship model to generate three words of label. Point the title-generation slot at a cheap fast model — a 3B local model, or gemini-3-flash through OpenRouter — and the bill drops without you noticing the change.

Context Length 2048→what the model supports

Workspace → Models → Advanced Params

The default of 2048 tokens silently truncates long conversations on models that handle 128k or more. The symptom is the model forgetting what you said ten messages ago. Set this to the model's real ceiling — or just below it, to leave room for the system prompt and tool definitions.

System Prompt empty→model-level defaults

Workspace → Models → Advanced Params

Per-model system prompt is the place for behaviors every chat with that model should inherit — style, format preferences, tool-use defaults. The chat-level system prompt overrides it case by case; the folder-level one (see §3) overrides on top of that.

§ 3

Organizing what you already have

The chat history sidebar gets unmanageable around the 200-chat mark, which on a daily-driver install is a couple of months.

Folders are the obvious move, with a less-obvious payload: every folder carries its own system prompt. Drop a chat into Code review and the model picks up "you are reviewing code; flag bugs, suggest cleaner expressions, do not rewrite without asking" without me typing it. Start a chat in Drafting and it inherits a different prompt — same model, no re-priming, no copy-pasting a preamble. The folder is the context; the chats inside it inherit it. I have lost count of mine somewhere past forty — each one a role I would otherwise be priming from scratch at the start of every session.

Notes attach to a chat as a free-form sidebar. They are not seen by the model unless you explicitly reference them. Their job is the human side: what was I trying to do here, what did I conclude, what to come back to. I use them as the answer to "why does this six-week-old chat exist" — the question Open WebUI's auto-generated title rarely answers well.

Knowledge collections are the third move. Upload a folder of documents — project specs, internal docs, a paper you keep coming back to — and attach the collection to a model. Open WebUI runs retrieval against the collection on every prompt and feeds matched chunks into the context. The mechanism is RAG; the user-facing surface is "the model now knows about my project". For a small team this is the simplest way to give everyone the same shared context without paying a vendor to host it.

Knowledge collections leak into the system prompt

Retrieved chunks are injected with a wrapper that takes up real context space and shifts the model's behavior. If a model starts responding with a tone or vocabulary you did not ask for, check whether a knowledge collection is attached — the documents in it are influencing the voice. The fix is per-conversation: detach the collection for chats where you do not want it.

§ 4

Moves worth knowing exist

Power User

Four features that each deserve their own article. Just the names and shapes, enough to find them later.

Tools. A tool in Open WebUI is a Python function with a typed signature; the model sees it as a callable, the user sees it as a toggle. Write a function that calls your home automation, your weather station, your work API; expose it to the model. Combined with native function calling (see §2), this is how the assistant stops being a chatbot and starts being an agent on your terms — without leaving the interface.

Filters and Pipelines. Filters intercept requests and responses — strip PII before a prompt leaves the box, redact a model's output before it lands in the chat, attach metadata. Pipelines are a heavier abstraction that can route between models, fan out and merge, or implement custom RAG. Both are the escape hatch when "saved prompt in Automations" is no longer enough.

HTML artifact rendering. Open WebUI detects HTML in the model's output and offers an inline preview pane. Pair this with asking the model for HTML instead of Markdown — the topic of a separate article on this site — and the assistant can produce charts, diagrams, small interactive demos, and dashboards that render the moment the stream finishes. The fastest way to see this is to ask any current model to "make a small HTML comparison of three options I am considering" and watch what comes back.

OpenAPI tool servers. Any service that publishes an OpenAPI spec can be added as a tool source without writing Python. A handful of community-maintained servers already exist for filesystem access, time, memory, and more. The pattern is the right shape for letting a model touch a service you already run, on its own surface, instead of building a bespoke integration.

§ 5

What you have now

A private chat interface. A local model for the work you would rather not send out, a cloud model for the work where capability matters more than privacy. A search backend that does not meter you. A remote-access edge that asks for a PIN. An image-generation pipeline. The workspace customizations above. None of it locked to a vendor. All of it yours.

For a team or organization

Automations, folder-level system prompts, knowledge collections, and the per-model settings above are the same surfaces that turn a self-hosted Open WebUI install into a small-team workspace. The technical pattern: one instance, multiple user accounts (signup gated by admin), shared knowledge collections for organizational context, individual workspaces and folders for personal work. The policy pattern: which categories of work go to local models, which go to cloud, which Automations run on which schedules, who can add tools and pipelines. The interface scales to a handful of seats without complaint; the governance is the work.

Where I come in

This stack is the foundation. The work on top — wiring it into a proprietary database, building a retrieval pipeline against your own archives, getting the deployment through a compliance audit, designing the policy layer for a team that did not grow up on these tools — is shaped to your problem, not to a tutorial.

That is what OptiMoss is for. If a piece of the picture is starting to take shape and you would like a second pair of eyes on it, that is a good place to start.

Get in touch →

1 Why self-host? Read

2 Open WebUI: your AI interface Read

3 Ollama: local models, easy setup Read

4 llama.cpp: higher performance, more control Read

5 SearXNG: web-aware conversations Read

6 Remote access: Cloudflare Tunnel Read

7 Security and privacy Read

8 Image generation Read

9 Open WebUI in practice Here