Stack Series · 8 of 9 Pixels · May 2026

Image generation

Adding pictures to a stack built for text. ComfyUI as the local backend, Open WebUI as the front door, a Compose service builder, a VRAM tier table that tells you where Flux is daily-driver versus tolerable, and the cloud route for hardware on the wrong side of the 8 GB line.

Curtis Smith · OptiMoss.ai · part of the Stack Series

§ 0

What this is

Seven articles in, the stack does chat, search, and remote access. Image generation is the next surface most people want to add, and it's the heaviest workload yet. A 9B language model at Q4 fits in 8 GB of VRAM with room to spare; a competitive diffusion model in 2026 doesn't, not without compromises.

The plan: a brief look at the cloud path Open WebUI already supports, then the local install — ComfyUI as a Compose service, wired into Open WebUI so that an "Image" button appears under your chat messages.

§ 1

Two routes, one decision

Open WebUI's image generation settings accept four engine types: ComfyUI (local), AUTOMATIC1111 (local, older), OpenAI, and Gemini. The OpenAI slot also takes any OpenAI-compatible endpoint, which means OpenRouter — and through OpenRouter, most of the current image-model field in one place: OpenAI's gpt-image-2 (April 2026), Google's Nano Banana 2 (Gemini 3.1 Flash Image, February 2026), Alibaba's Qwen-Image 2.0, ByteDance's Seedream, and the Flux.2 family. The choice between local and cloud isn't really about capability — both will produce a usable image — it's about which costs you'd rather pay.

Cost	Local (ComfyUI)	Cloud
Setup	One Compose service, one model download, one workflow wired into Open WebUI.	An API key and three fields in Open WebUI.
Hardware floor	Realistically 8 GB VRAM, comfortably 12 GB+. CPU-only works but the wall-clock cost makes it impractical for anything iterative.	None. Runs on whatever you already have.
Per-image cost	Electricity. Pennies per session, even at heavy use.	Roughly $0.005–0.21 per image depending on model and quality tier (May 2026). OpenRouter's image-model collection is the easiest place to compare.
What leaves the house	Nothing. Prompts and outputs stay on the box.	Every prompt, every reference image, the resulting image. The vendor's policy decides retention.
Iteration speed	SD 1.5: 2–8 seconds. SDXL: 10–30 seconds. Flux: a minute or more on tight VRAM.	Network round trip plus the provider's queue. Usually under 30 seconds.
Output ceiling	State of the open-weights world. Capable, with effort, of matching cloud output in the genres people post about. Behind the cloud's best on photorealism and text rendering.	The vendor's flagship model. Currently ahead on prompt adherence and text in images.

The split most people land on, including me: cloud for the times you want a publication-quality image and don't care that the prompt left the house, local for everything else — iteration, drafts, anything you'd prefer not to send to a vendor. Open WebUI supports both at once.

§ 2

The cloud route, briefly

If your hardware is on the wrong side of the 8 GB line, or you'd rather not run a second GPU-hungry service on the box, the cloud setup in Open WebUI is a few minutes' work.

In Admin Panel → Settings → Images, set Engine to OpenAI or Gemini. Paste the API key. Pick a model — gpt-image-2 for OpenAI, gemini-3.1-flash-image-preview (Nano Banana 2) or gemini-3-pro-image (Nano Banana Pro) for Gemini. Save. An "Image" generation button appears under chat messages; tap it and the assistant's most recent text gets sent as the prompt.

If you'd rather route through OpenRouter — already on this stack from the article 2 cloud-model setup — set the OpenAI engine's Base URL to https://openrouter.ai/api/v1 and the key to your OpenRouter key. From there the model dropdown includes everything OpenRouter exposes, which in May 2026 covers openai/gpt-image-2, google/gemini-3.1-flash-image-preview, the Flux.2 family, Qwen-Image 2.0, and Seedream alongside the OpenAI and Google flagships. One key, one bill, the same Open WebUI configuration.

What you trade for the easy setup is the same trade as any cloud LLM. The prompt and the resulting image are seen by the vendor and stored according to their policy — OpenAI's enterprise tier defaults to zero retention; the free and standard tiers do not. Each Gemini, Qwen, and Seedream offering has its own terms; OpenRouter passes traffic through to whichever upstream you pick. If the image gets posted publicly anyway, the privacy question is mostly answered for you; if you're generating drafts of something you'd rather keep to yourself, the cloud route is the wrong floor.

Pricing is the other thing. gpt-image-2 meters by image token — roughly $0.006 at the low-quality tier, $0.05 at standard, $0.21 at high quality for a 1024×1024 image. Nano Banana 2 sits in the low single cents per image; Qwen-Image and Seedream on OpenRouter are similar. That's small money for a handful of images, real money if you're iterating on a series of two hundred.
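To put numbers on that last sentence, here is a small Python sketch that uses only the per-image prices quoted above for gpt-image-2 at 1024×1024. The tier names and the function name are mine, and a real bill will move with image size and token counts, so treat it as rough arithmetic rather than a quote.

```python
# Per-image prices quoted in the paragraph above (gpt-image-2, 1024x1024, May 2026).
PRICE_PER_IMAGE = {"low": 0.006, "standard": 0.05, "high": 0.21}


def series_cost(images, tier):
    """Total cost in dollars for `images` generations at one quality tier."""
    return round(images * PRICE_PER_IMAGE[tier], 2)


for tier in PRICE_PER_IMAGE:
    handful = series_cost(5, tier)
    series = series_cost(200, tier)
    print(f"{tier:>8}: 5 images ${handful:.2f}, 200 images ${series:.2f}")
```

At the high-quality tier the two-hundred-image series comes to $42, which is the point where the local route starts to look cheap.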

§ 3

What ComfyUI actually is

ComfyUI is a graph editor for diffusion models. Where AUTOMATIC1111 (the older standard) is a tabbed form with sliders, ComfyUI is a canvas of connected nodes — load checkpoint, encode prompt, sample, decode, save image. You build a workflow by dragging, you save it as JSON, and the same JSON is what tells the server how to produce an image.

The node graph isn't the point on its own. The point is that every step of a diffusion pipeline is a separate operation — text encoding, latent sampling, VAE decode, optional upscale, optional ControlNet conditioning — and exposing each step gives you composability that a fixed UI doesn't. Want a two-pass workflow that generates a small image and upscales it through a second model? Two more nodes. Want to share that workflow with someone? Send them the JSON; they get the exact pipeline you used. The same JSON is also the API contract — which is the bridge that makes the Open WebUI integration work.

The trade is a learning curve. The default workflow that ships with ComfyUI is a usable starting point; the moment you want anything beyond it, you're reading node documentation and building graphs. For a one-off image now and then, that's friction. For the kind of work where iteration is the goal, it pays back.

A note worth carrying from article 7: there is no first-party Docker image for ComfyUI. The community has produced several, and any of them you pick is a supply-chain choice. The Compose builder below uses yanwk/comfyui-boot — maintained continuously for years, bundles ComfyUI-Manager. The power-user expander in §4 covers building your own image when that's not enough.

§ 4

Installing

One Compose service, two volumes, and a network setting that puts ComfyUI on the same Docker network as Open WebUI. The builder below produces the service block.

ComfyUI · Compose service builder

GPU access

If you also run Ollama (article 3) or llama.cpp (article 4), the LLM and the diffusion model are sharing a card. Pin specific devices if your machine has more than one; otherwise let both services see the GPU and serialize on contention.

All NVIDIA GPUs: Recommended for a single-GPU box. ComfyUI and the LLM contend for memory; if you generate an image while a model is loaded, expect a brief slowdown.
One specific GPU (device 1): For multi-GPU hosts where you want to leave device 0 for the LLM and run image generation off device 1.

Port exposure

Open WebUI reaches ComfyUI over the Docker network, no host port required. Expose 127.0.0.1:8188 only if you want to open ComfyUI's own web UI in a browser — needed for building workflows visually.

Bind to localhost: Recommended. Reachable in a browser at http://localhost:8188 and from other containers as http://comfyui:8188.
Internal only: No host port. You won't be able to use ComfyUI's web UI directly; workflow building has to happen on another machine or via the API.

Model storage

Diffusion models are large (2–24 GB each) and you'll accumulate several. A bind mount lets you drop new safetensors into a directory on the host without entering the container; a named volume is tidier but obscures the path.

Bind mount: Recommended. Maps ./comfyui/models on the host into the container. Drop checkpoints, LoRAs, and VAEs into the matching subdirectories.
Named volume: Docker manages the location. Tidier; awkward when you want to copy a model in or out.

Custom nodes

ComfyUI-Manager (bundled in the yanwk/comfyui-boot image) installs custom nodes from a registry. Useful, and also the place where third-party code with full container privileges arrives. Bind-mount the custom_nodes directory so you can see what's been installed.

Bind mount (visible): Recommended. ./comfyui/custom_nodes on the host; you can audit what's there.
Named volume: Less friction, less visibility.

Append to docker-compose.yml

Append the block to the same docker-compose.yml from article 2. The service joins the default Compose network alongside open-webui, ollama, and the rest, and is reachable by other services as http://comfyui:8188.
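As a rough stand-in for the builder, the Python sketch below turns the four choices above into a service block you can paste under services:. The image tag and the container-side paths are assumptions on my part, not something this article confirms; check both against the yanwk/comfyui-boot README before relying on them. The function and volume names are hypothetical.

```python
IMAGE = "yanwk/comfyui-boot:cu124-slim"  # ASSUMPTION: pick a tag from the image's README
APP_DIR = "/root/ComfyUI"                # ASSUMPTION: ComfyUI path inside the container


def build_service(gpu="all", localhost_port=True, bind_models=True, bind_nodes=True):
    """Return (service_yaml, named_volumes) for the four builder choices."""
    models = "./comfyui/models" if bind_models else "comfyui-models"
    nodes = "./comfyui/custom_nodes" if bind_nodes else "comfyui-custom-nodes"
    # Named volumes must also be declared under the file's top-level `volumes:` key.
    named = [v for v in (models, nodes) if not v.startswith("./")]
    # `count: all` exposes every GPU; `device_ids` pins one card by index.
    gpu_line = "count: all" if gpu == "all" else f'device_ids: ["{gpu}"]'

    lines = [
        "  comfyui:",
        f"    image: {IMAGE}",
        "    restart: unless-stopped",
    ]
    if localhost_port:
        # Bound to loopback only, so the UI is not exposed to the LAN.
        lines += ["    ports:", '      - "127.0.0.1:8188:8188"']
    lines += [
        "    volumes:",
        f"      - {models}:{APP_DIR}/models",
        f"      - {nodes}:{APP_DIR}/custom_nodes",
        "    deploy:",
        "      resources:",
        "        reservations:",
        "          devices:",
        "            - driver: nvidia",
        f"              {gpu_line}",
        "              capabilities: [gpu]",
    ]
    return "\n".join(lines) + "\n", named


yaml_text, named_volumes = build_service()
print(yaml_text)
```

Calling build_service(gpu="1") pins the second card, and localhost_port=False drops the ports: section for the internal-only option. The deploy.resources.reservations.devices stanza is the standard Compose way to request NVIDIA GPUs and needs the NVIDIA Container Toolkit on the host.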

$ cd ~/openwebui
$ docker compose up -d comfyui
$ docker compose logs -f comfyui   # watch the first-run model index build

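The first start is slower than subsequent ones: the image pulls the bundled Python environment and ComfyUI-Manager extensions, and on a cold cache that's a few minutes of disk activity. When the logs settle on "To see the GUI go to: http://0.0.0.0:8188", open http://localhost:8188 in a browser (0.0.0.0 is the address the server listens on inside the container, not one to browse to, and the page is only reachable if you chose the localhost port binding). You should see the graph canvas with a default workflow loaded.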
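If you would rather script the wait than watch logs, a minimal Python sketch follows. It polls ComfyUI's /system_stats endpoint until the server answers, then reports free VRAM per device. It assumes the localhost port binding, and the response field names (devices, name, vram_free) are what I understand the endpoint to return, so verify them against your version. The function names are mine.

```python
import json
import time
import urllib.request


def fetch_stats(base_url="http://localhost:8188", timeout=5):
    """GET /system_stats and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
        return json.loads(resp.read())


def wait_until_ready(base_url="http://localhost:8188", attempts=30, delay=10,
                     fetch=fetch_stats):
    """Poll until the server responds; return its stats, or None if it never does."""
    for _ in range(attempts):
        try:
            return fetch(base_url)
        except OSError:  # connection refused, reset, or timed out while starting up
            time.sleep(delay)
    return None


def free_vram_gib(stats):
    """List (device name, free VRAM in GiB); assumes a `devices` list in the payload."""
    return [(d["name"], round(d["vram_free"] / 1024**3, 1))
            for d in stats.get("devices", [])]
```

Usage would be stats = wait_until_ready() followed by print(free_vram_gib(stats)) once it returns something other than None. The fetch parameter exists only so the polling loop can be exercised without a running server.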

Power user: building your own ComfyUI image

If you'd rather not depend on a community image, the build itself is short. A minimal Dockerfile clones the upstream repo at a pinned commit, installs the Python dependencies, and runs main.py --listen. The trade is that you maintain it — pinning the commit means you also pin yourself out of upstream fixes until you bump.

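# Dockerfile.comfyui
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
# Supply the reviewed commit at build time: --build-arg COMFYUI_COMMIT=<hash>
# (or build.args in Compose). An empty value fails the checkout on purpose.
ARG COMFYUI_COMMIT
# Keep apt from stopping on interactive prompts during the build
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    python3 python3-pip python3-venv git && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Clone upstream, then pin to the commit you've reviewed
RUN git clone https://github.com/comfyanonymous/ComfyUI.git . && \
    git checkout "${COMFYUI_COMMIT}"
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8188
# Listen on all interfaces so other containers can reach the server
CMD ["python3", "main.py", "--listen", "0.0.0.0"]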

Reference it from Compose with build: { context: ., dockerfile: Dockerfile.comfyui } instead of image:. Custom nodes you'd install separately by mounting them into /app/custom_nodes, or by extending the Dockerfile with a pinned set of git clone lines. The upside: you read every line of what runs. The downside: you read every line of what runs.

§ 5

Picking a model for your VRAM

Diffusion models are the largest single thing you'll put on disk in this stack. SDXL is around 6.5 GB; Flux.1-dev is 23 GB; Qwen-Image 2.0 at 7B parameters lands somewhere between them; the GGUF-quantized variants of each are smaller. The active-memory cost during generation is higher than the file size suggests because the model, the VAE, and the working latents all live in VRAM at once.
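A back-of-envelope way to read those sizes is parameters times bytes per weight. The Python sketch below does that arithmetic for a 12B model (Flux.1's published size) and a 7B model. The 4.5 bits per weight for Q4 is an assumed average that allows for quantization scales, and the result covers weights only: text encoders, the VAE, and working latents come on top.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


# Q4 is taken as ~4.5 bits/weight (ASSUMPTION); FP16 and FP8 are exact widths.
PRECISIONS = [("FP16", 16), ("FP8", 8), ("Q4", 4.5)]

for name, params in [("Flux.1 (12B)", 12), ("Qwen-Image 2.0 (7B)", 7)]:
    for label, bits in PRECISIONS:
        print(f"{name:<22}{label:<6}{weights_gb(params, bits):5.1f} GB")
```

The 12B FP16 row lands at 24 GB, in line with the roughly 23 GB file quoted above, and the same model at Q4 drops under 7 GB, which is why the quantized variants are the ones that matter on an 8 GB card.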

Two open-weights releases in early 2026 changed what's reasonable to expect at the low end. Qwen-Image 2.0 (Alibaba, February) is a 7B-parameter MMDiT that sits at or near the top of AI Arena's text-to-image leaderboard while fitting where SDXL fits. Flux.2 (Black Forest Labs, also 2026) is the successor to Flux.1, with FP8 and GGUF variants released alongside the full weights. SD 1.5 and SDXL are still fast and still have active fine-tune ecosystems around them — but for a fresh install in 2026, Qwen-Image 2.0 and Flux.2 are the names worth knowing.

VRAM	What runs comfortably	Reasonable picks (mid-2026)
CPU only	SD 1.5 at 512×512, ~2–3 minutes per image. Not for iteration.	SD 1.5 base or an SD 1.5 fine-tune
4–6 GB	SD 1.5 at native resolution; SDXL with --lowvram and patience.	SD 1.5, Realistic Vision (1.5 fine-tune), small SDXL Turbo variants
8 GB (mine)	SDXL natively, with care on batch size. Qwen-Image 2.0 GGUF Q4 or Flux.1-schnell GGUF Q4 with offload, slow but usable.	SDXL base, Juggernaut XL, Flux.1-schnell GGUF Q4, Qwen-Image 2.0 quants
12 GB	SDXL with batching and multiple LoRAs. Qwen-Image 2.0 at FP8 comfortably. Flux schnell at FP8.	SDXL fine-tunes, Qwen-Image 2.0 FP8, Flux.1-schnell FP8
16 GB	Qwen-Image 2.0 natively. Flux.1-dev or Flux.2 at FP8 / GGUF Q8 with reasonable speed.	Qwen-Image 2.0, Flux.1-dev Q8, Flux.2 quants
24 GB+	Flux.2 or Flux.1-dev at FP16 natively. Multiple models loaded at once.	Flux.2, Flux.1-dev, any open-weights flagship

ComfyUI expects a specific layout under the bind-mounted models directory. Node references look up files by relative path; a checkpoint in the wrong folder silently won't appear in the dropdown.

# comfyui/models/ — drop files into the matching subdirectory
checkpoints/   # full diffusion models (.safetensors, .ckpt)
vae/           # standalone VAEs (when not bundled in the checkpoint)
loras/         # LoRA adapters
controlnet/    # ControlNet weights
clip/          # text encoder weights (Flux needs these separately)
unet/          # GGUF-quantized model files for Flux et al
upscale_models/ # ESRGAN, etc
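A short Python sketch for setting that layout up and catching the silent failure mode: it creates the subdirectories listed above and reports model files left directly in the models root, where no loader node will find them. The function names are mine, and the demo runs in a temporary directory; point root at ./comfyui/models for real use.

```python
import tempfile
from pathlib import Path

SUBDIRS = ["checkpoints", "vae", "loras", "controlnet", "clip", "unet", "upscale_models"]
MODEL_SUFFIXES = {".safetensors", ".ckpt", ".gguf"}


def ensure_model_dirs(root):
    """Create the expected subdirectories; existing ones are left untouched."""
    root = Path(root)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def misplaced_files(root):
    """Model files sitting in the root itself instead of a subdirectory."""
    return sorted(p.name for p in Path(root).iterdir()
                  if p.is_file() and p.suffix in MODEL_SUFFIXES)


with tempfile.TemporaryDirectory() as tmp:
    created = ensure_model_dirs(tmp)
    (Path(tmp) / "stray-model.safetensors").touch()  # simulate a file dropped in the wrong place
    print(created)
    print(misplaced_files(tmp))
```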

Flux on 8 GB: the GGUF Q4 variants run, the output is good, and a 1024×1024 image takes 60–120 seconds after the model loads. Fine for occasional use; a slog for iteration. The threshold where Flux feels native is somewhere around 12 GB.
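What "a slog for iteration" means in wall-clock terms, as a small Python sketch using the 60 to 120 second range above. The batch sizes are arbitrary examples and the function name is mine.

```python
def batch_minutes(images, seconds_per_image, load_seconds=0):
    """Wall-clock minutes for a run of images generated one at a time."""
    return (load_seconds + images * seconds_per_image) / 60


for images in (1, 20, 200):
    low = batch_minutes(images, 60)
    high = batch_minutes(images, 120)
    print(f"{images:>3} images: {low:.0f} to {high:.0f} minutes")
```

Twenty variations on a prompt is 20 to 40 minutes at these speeds, which is fine as a queued job and painful as a feedback loop.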

Power user: a note on quantization and VAE precision

GGUF for diffusion is a younger pattern than GGUF for LLMs but has matured fast; ComfyUI-GGUF is the node pack that loads them. The same Q4 / Q8 axis applies: Q4 fits in less memory, Q8 loses less detail. For Flux specifically, Q4_K_S is the smallest quant that produces consistent text rendering and recognizable celebrity faces (to the extent that any open model does); Q3 drops both noticeably.

A separate VRAM-saving move people miss: the VAE at FP16 instead of FP32 costs a few hundred megabytes less without any visible quality difference. Most workflows have an FP16 VAE option somewhere in the node settings; on tight VRAM it's a free win.

§ 6

Wiring it to Open WebUI

Open WebUI sends a workflow to ComfyUI with the prompt substituted into named placeholder nodes; ComfyUI runs the graph and returns the image; Open WebUI displays it inline.

First, in ComfyUI's own web UI, build the workflow you want to use. The default workflow that loads on first start is a complete pipeline: a checkpoint loader, an empty latent image, two text-encode nodes for the positive and negative prompts, a sampler, a VAE decoder, and a save-image node. Run it once with a test prompt; confirm it produces an image. Then export it as API JSON.

The export option is hidden until you enable it. Click the small gear icon, top-right, toggle Enable Dev mode Options, then use Save (API Format) from the main menu. Recent ComfyUI front ends list the same export as Export (API) under the Workflow menu. Either way, this produces a JSON file that's different from the workflow JSON you might save normally: it's the format the ComfyUI API accepts, keyed by node IDs.
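To make "keyed by node IDs" concrete, here is a minimal Python sketch of what a client does with that file: copy the workflow, overwrite the text input of the positive-prompt node, optionally re-roll the seed, and POST the result to ComfyUI's /prompt endpoint. The two-node SAMPLE is a trimmed illustration, not a queueable workflow, and the node IDs "6" and "3" are simply the ones ComfyUI's default export tends to use; yours may differ. The function names are mine.

```python
import copy
import json
import random
import urllib.request

# Trimmed illustration of the API format: node ID -> class_type + inputs.
SAMPLE = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 8566257, "steps": 20}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "placeholder"}},
}


def build_prompt_request(workflow, positive_node_id, prompt_text, seed_node_id=None):
    """Return the JSON body for POST /prompt without mutating the original workflow."""
    wf = copy.deepcopy(workflow)
    wf[positive_node_id]["inputs"]["text"] = prompt_text
    if seed_node_id is not None:
        # A fresh seed per request gives a different image for the same prompt.
        wf[seed_node_id]["inputs"]["seed"] = random.randrange(2**32)
    return json.dumps({"prompt": wf}).encode("utf-8")


def queue_prompt(body, base_url="http://localhost:8188"):
    """Send the body to a running ComfyUI server; the reply includes a prompt_id."""
    req = urllib.request.Request(
        f"{base_url}/prompt", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


body = build_prompt_request(SAMPLE, "6", "a lighthouse at dusk", seed_node_id="3")
print(json.loads(body)["prompt"]["6"]["inputs"]["text"])
```

With a full exported workflow loaded via json.load and a server running, queue_prompt(body) submits the job. This is conceptually what the Open WebUI integration does on your behalf, though its actual implementation may differ.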

In Open WebUI, go to Admin Panel → Settings → Images. Set:

Engine: ComfyUI
ComfyUI Base URL: http://comfyui:8188 (the service name from Compose; not localhost — Open WebUI lives in its own container)
ComfyUI Workflow: paste the API JSON
ComfyUI Workflow Nodes: tell Open WebUI which node IDs correspond to which field. The minimum mapping is the positive prompt's CLIPTextEncode node ID (so the chat content gets substituted), the model node, the sampler's seed input, and the image dimensions. Open WebUI surfaces these as labeled rows; fill in the node ID number from your exported JSON.
Image Generation: toggle on at the top of the settings page.
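Finding the right node IDs by eye gets tedious once a workflow grows. The Python sketch below groups the nodes of an API-format workflow by class_type and identifies the positive prompt by following the sampler's positive input, since both prompts use the same node class. The embedded workflow mirrors the shape of ComfyUI's default export with a placeholder checkpoint name; in practice you would json.load your own file. The helper names are mine.

```python
WORKFLOW = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 1, "model": ["4", 0], "positive": ["6", 0],
                     "negative": ["7", 0], "latent_image": ["5", 0]}},
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "example.safetensors"}},  # placeholder file name
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}},
}


def nodes_by_class(workflow):
    """Map each class_type to the list of node IDs that use it."""
    found = {}
    for node_id, node in workflow.items():
        found.setdefault(node["class_type"], []).append(node_id)
    return found


def positive_prompt_node(workflow, sampler_class="KSampler"):
    """Follow the sampler's `positive` link, stored as [source_node_id, output_index]."""
    for node in workflow.values():
        if node["class_type"] == sampler_class:
            return node["inputs"]["positive"][0]
    return None


for class_type, ids in nodes_by_class(WORKFLOW).items():
    print(f"{class_type:<24}{', '.join(ids)}")
print("positive prompt node:", positive_prompt_node(WORKFLOW))
```

Workflows built around a different sampler node would need a different sampler_class argument, so treat the default as a convenience rather than a rule.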

Back in a chat, the "Image" button now appears under each assistant message. Tap it; Open WebUI sends the message text as the prompt and posts the result inline. The full integration walkthrough in the Open WebUI docs has the screenshots if the field labels move between versions.

The workflow is fixed per setting

Open WebUI's image settings hold one workflow. Switching models mid-chat means editing the workflow JSON or maintaining a second admin setting and toggling between them. If you want to swap quickly between, say, an SDXL pipeline and a Flux pipeline, build both workflows in ComfyUI, save both as API JSON, and keep them in a notes file you paste from.

The other gotcha: the seed. The default Open WebUI mapping passes a fixed seed unless you tell it to randomize. The result is that every image for the same prompt looks identical, which is a feature when iterating on a workflow and a frustration when you want variety. In the node mapping, set the seed input to Random rather than Fixed.

ComfyUI's own UI at localhost:8188 is a fine front end on its own — if you find yourself fighting the one-workflow-per-setting limit, or you want the full node graph in front of you for serious image work, just open it directly. That depth is its own article. The reason to wire ComfyUI through Open WebUI anyway is the chat side: any model in the picker — including ones that don't natively generate images, like Claude Opus 4.7 — can write the prompt and call the tool. The LLM doing the prompting matters as much as the diffusion model doing the rendering.

One model setting is worth flipping while you're in the wiring mood. In Workspace → Models → (your model) → Advanced Params, set Function Calling to Native. The Image button works either way, but the moment you want the model to call image generation on its own — "draw the chart we just discussed" mid-conversation, or interleave a search with an image — the default prompt-based mode misses or mangles the call on most current models. Native uses the provider's real tool API and gets out of the way. Article 9 picks up that thread along with the rest of the per-model settings Open WebUI hides behind defaults.

§ 7

How this works for me on 8 GB

Power user

SD 1.5 and its fine-tunes — fast and reliable. Image in 3–6 seconds, no VRAM drama, can run in the background while the LLM is loaded. The output is dated against current cloud models, but for thumbnails, sketches, anything you're going to overpaint or use as a draft, it's the right tool. I keep SD 1.5 base and a Realistic Vision fine-tune installed and reach for them more often than I expected to.

SDXL — usable, with care. A vanilla SDXL workflow at 1024×1024 fits in 8 GB with the FP16 VAE and no batching. 10–20 seconds per image. Adding a single LoRA is fine; chaining two or three starts pushing things into --lowvram territory, which slows generation by maybe 2x but doesn't break it.

Flux — works, painfully. Flux.1-schnell GGUF Q4 generates a 1024×1024 image in around 90 seconds on this card after the model loads. The first load itself is 30+ seconds because the unet, the VAE, and the two separate text encoders (CLIP-L and T5) all have to come into VRAM. The output justifies the wait — Flux's prompt adherence and text rendering are visibly ahead of SDXL — but I don't reach for it for iteration. When I want a Flux image, I queue one, do something else for two minutes, and come back.

Qwen-Image 2.0 — the one I'd start with today. Smaller than Flux (7B vs 12B), Apache 2.0 licensed, and at or near the top of AI Arena's text-to-image leaderboard. The GGUF quants are new enough that I don't have settled timing numbers yet, but a 7B MMDiT should fit roughly where SDXL fits. If you're starting fresh on tight VRAM in mid-2026, it's the model I'd point you to before Flux.

Image generation isn't a daily-use part of my stack the way chat and search are. SD 1.5 covers quick visuals; for quality I can show someone, the cloud API costs 7 cents and 20 seconds and beats anything I'd produce locally in an evening. The local install is there for the cases in between — prompts I don't want to send out, late-night iteration where 90 seconds per Flux image is fine because I'm not in a hurry.

Anyone with 16 GB or more reads this section differently. At that floor, Flux is the daily driver and the cloud is the exception.

§ 8

Where this fits

One article remains, and it's the one where these pieces stop being a list and start being a tool — article 9, Open WebUI in practice: scheduled prompts that quietly run your stack while you sleep, plus the handful of Open WebUI customizations that the documentation tends to bury.

For a small team or organization

Image generation is the part of the stack with the messiest legal surface, and pretending otherwise is how organizations get into trouble. Three questions to settle before letting anyone on the team click the button:

What models are allowed. Open-weights checkpoints from Hugging Face come with a license: SD 1.5 has CreativeML Open RAIL-M and SDXL has CreativeML Open RAIL++-M, Flux has the FLUX.1 [dev] Non-Commercial License for the dev variant (commercial use requires a paid license), and Flux schnell and Qwen-Image 2.0 both ship under Apache 2.0. "Non-commercial" is the trip wire; a model that's fine for an evening project is sometimes not fine for a logo on a brochure.
Where the training data came from. The open base models were trained on web-scraped image sets and the resulting outputs are the subject of ongoing litigation. Cloud providers have started carrying indemnification for outputs from their hosted models; you don't get that with a self-hosted Flux install. For internal drafts the risk is academic; for anything that ships to customers, the cloud's indemnity is a real product feature.
What gets prompted. A team member typing "image of our new logo concept on a billboard" into a cloud API has just sent that concept to the vendor. The same prompt into the local ComfyUI install hasn't. Make the routing explicit in policy — which categories of work go where — rather than leaving it to whichever button is more convenient.

A practical layer on top: keep Open WebUI's image settings on the local engine by default, give individual users a separate cloud-key configuration only when they need it, and put the audit trail (which user, which engine, which prompt) into the regular access-log review. The stack is small enough that the policy is one paragraph; the trouble starts when there isn't one.

1 Why self-host?
2 Open WebUI: your AI interface
3 Ollama: local models, easy setup
4 llama.cpp: higher performance, more control
5 SearXNG: web-aware conversations
6 Remote access: Cloudflare Tunnel
7 Security and privacy
8 Image generation (this article)
9 Open WebUI in practice