Set up ComfyUI for your own content pipeline (on a 4 GB consumer GPU)
This is the companion tutorial to Skaldborn’s art content pipeline. That post explained the architecture; this one walks through the local Stable Diffusion half end to end, on the same 4 GB consumer GPU it runs on in production, and then shows how to drive PixelLab’s hosted API to turn a concept image into an 8-direction sprite. Ubuntu (or any modern Linux distro) is assumed.
You don’t have to have read post 04 to use this. The install pattern, the launch-flag set, the FaceDetailer + upscale graph, and the PixelLab integration generalize to any content pipeline that uses ComfyUI as a generation backend with a sprite-renderer downstream. The model choices below are tuned for fantasy/RPG art; substitute your own if your domain is different.
The canonical scripts and a tighter reference live at clunasco/skald-forge. This post is the narrative version with the lessons and the small landmines baked in. Time budget: about 30 minutes for the local stack if your hardware cooperates, plus a handful of minutes once you have a PixelLab API key.
Table of contents
- Why this is hard on a 4 GB card
- 1. Prerequisites
- 2. Clone skald-forge and install ComfyUI
- 3. Install the FaceDetailer custom nodes
- 4. Configure your CivitAI token
- 5. Download the production model stack
- 6. Launch ComfyUI
- 7. Build the production workflow
- 8. From concept to sprite: PixelLab
- 9. Tuning for more VRAM
- 10. Troubleshooting
- How Skaldborn consumes this
- What’s next
Why this is hard on a 4 GB card
Default ComfyUI will OOM on SD 1.5 generation if you have 4 GB of VRAM. That’s annoying. What’s especially annoying is where it OOMs: the VAE decode at the end of generation produces a VRAM spike that fires after the 30+ seconds of sampling. You wait, you see the image almost complete, and the process dies on decode. Every generation. Until you add --cpu-vae to the launch script and the spike moves to system RAM.
Three of the launch flags in this guide are there specifically to make 4 GB work. Drop any of them on a small card and the symptoms range from “OOM during sampling, stack trace early” (you find out fast) to “OOM at decode, watch your generation die at the finish line, repeatedly” (you waste hours). The flag table in section 6 tells you which is which.
If you have more VRAM, you’ll want to drop some of these flags as you go up. Section 9 covers that.
1. Prerequisites
Hardware. An NVIDIA GPU with at least 4 GB of VRAM. Reference rig: RTX 3050 Laptop 4 GB.
Software.
- Ubuntu 22.04+ (or any modern Linux distro)
- A working NVIDIA driver, recent enough to support CUDA 12.4. On Ubuntu, sudo apt install nvidia-driver-550 (or newer) is the usual path; reboot after.
- The handful of base packages this guide assumes:
sudo apt update
sudo apt install -y python3 python3-venv git tmux curl
Verify CUDA is alive before going further:
nvidia-smi
You should see your GPU and driver version. If this fails, fix the driver before continuing.
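If you would rather run this check from a script than read the table by eye, the following is a minimal sketch. It assumes nvidia-smi is on PATH and uses its CSV query mode; the helper names and the sample row in the docstring are illustrative, not part of skald-forge.

```python
import subprocess

# nvidia-smi's machine-readable query mode: one CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,driver_version",
    "--format=csv,noheader,nounits",
]

def parse_gpu_line(line: str) -> dict:
    """Parse one row, e.g. 'NVIDIA GeForce RTX 3050 Laptop GPU, 4096, 550.54.14'."""
    # Split from the right so a comma inside the GPU name cannot shift the fields.
    name, mem_mib, driver = [part.strip() for part in line.rsplit(",", 2)]
    return {"name": name, "vram_mib": int(mem_mib), "driver": driver}

def query_gpus() -> list[dict]:
    """Run nvidia-smi; raises CalledProcessError if the driver is not working."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling query_gpus() on a healthy machine returns one dict per GPU; a vram_mib around 4096 means you are on the small-card path this guide targets.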
A CivitAI account + API token is needed for the production checkpoints/LoRAs (CivitAI gates downloads behind auth). Get one at civitai.com/user/account. Save it; you’ll need it in section 4.
A PixelLab account + API token is needed for the rotation step in section 8. Free trial credit is enough to test the flow end to end. Get one at pixellab.ai/signup; generate the token from your account settings.
2. Clone skald-forge and install ComfyUI
git clone https://github.com/clunasco/skald-forge.git ~/workspace/projects/skald-forge
cd ~/workspace/projects/skald-forge
./install_comfyui.sh
install_comfyui.sh is idempotent — re-run it any time to update. Six things happen:
- Creates comfyui-venv/ (a Python venv, sibling to ComfyUI/).
- Clones comfyanonymous/ComfyUI into ComfyUI/.
- Installs PyTorch with the CUDA 12.4 wheels (torch, torchvision, torchaudio from the PyTorch CUDA 12.4 index).
- Installs ComfyUI’s requirements.txt.
- Installs ComfyUI-Manager into ComfyUI/custom_nodes/.
- Prints a CUDA-visibility check at the end. You should see cuda avail : True and your GPU name.
If step 6 says False, you’re holding a CPU-only PyTorch wheel — usually a cached environment from a prior install. The script pins the CUDA 12.4 index, so this only happens if something else got there first. Force a clean reinstall:
source comfyui-venv/bin/activate
pip install --force-reinstall torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/cu124
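To tell which wheel you are actually holding, one quick heuristic is the local build tag on the version string. This sketch assumes the tag convention used by wheels from the PyTorch index (a suffix such as +cu124 or +cpu); the function names are hypothetical, and an empty tag should be read as inconclusive rather than as a verdict.

```python
def wheel_flavor(version: str) -> str:
    """Return the local build tag of a PyTorch version string ('cu124', 'cpu', or '')."""
    _, _, local = version.partition("+")
    return local

def is_cuda_wheel(version: str) -> bool:
    """True when the tag names a CUDA build; False for '+cpu' or a missing tag."""
    return wheel_flavor(version).startswith("cu")
```

Inside the venv, feed it the real value with python -c "import torch; print(torch.__version__, torch.version.cuda)". On a CPU-only build, torch.version.cuda prints None.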
3. Install the FaceDetailer custom nodes
The production graph uses two custom-node packages from ltdrdata:
- Impact Pack: provides FaceDetailer, the inpainting node that re-runs generation on the face region after the main pass. SD 1.5 at full-body 512×768 produces faces that are functionally mush; FaceDetailer is the fix.
- Impact Subpack: provides the UltralyticsDetectorProvider node that loads face_yolov8n.pt for the face bounding box.
Install both:
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack.git
git clone https://github.com/ltdrdata/ComfyUI-Impact-Subpack.git
cd -
source comfyui-venv/bin/activate
pip install -r ComfyUI/custom_nodes/ComfyUI-Impact-Pack/requirements.txt
pip install -r ComfyUI/custom_nodes/ComfyUI-Impact-Subpack/requirements.txt
deactivate
Once ComfyUI is running you can also install missing custom nodes through the Manager UI (“Manager” button → “Install Missing Custom Nodes”). The git-clone path is what skald-forge documents because it’s reproducible from a fresh checkout without a running server.
4. Configure your CivitAI token
download_models.sh auto-loads .env from the repo root. Create it:
cat > .env <<'EOF'
CIVITAI_TOKEN=your_civitai_api_token_here
EOF
chmod 600 .env
.env is gitignored at the skald-forge level. Don’t commit it.
5. Download the production model stack
Six files. Total disk: ~6.2 GB. All paths are relative to ComfyUI/models/. Sizes and source URLs are also recorded in models_inventory.md.
# 1. Default SD 1.5 VAE — used by checkpoints that don't bake one in
./download_models.sh --vae-default
# 2. aZovyaRPGArtistTools v4VAE — the production checkpoint (~5.7 GB)
./download_models.sh --checkpoint \
https://civitai.com/api/download/models/251729 \
--name aZovyaRPGArtistTools_v4VAE.safetensors
# 3. NorseViking_v10 — the production style LoRA (~37 MB)
./download_models.sh --lora \
https://civitai.com/api/download/models/31804 \
--name NorseViking_v10.safetensors
# 4. negative_hand-neg — fixes hand anatomy without dragging the style (~25 KB)
./download_models.sh --embedding \
https://civitai.com/api/download/models/60938 \
--name negative_hand-neg.pt
# 5. easynegative — companion general-purpose negative embedding (~24 KB)
./download_models.sh --embedding \
https://civitai.com/api/download/models/9208 \
--name easynegative.safetensors
# 6. 4xFoolhardyRemacri — post-generation upscaler (~67 MB)
./download_models.sh --upscaler \
https://civitai.com/api/download/models/164821 \
--name remacri_original.safetensors
The face detector and segmenter aren’t on CivitAI — they live on Hugging Face and Meta’s S3:
mkdir -p ComfyUI/models/ultralytics/bbox ComfyUI/models/sams
# YOLOv8-nano face detector (Bingsu/adetailer canonical release)
curl -L -o ComfyUI/models/ultralytics/bbox/face_yolov8n.pt \
https://huggingface.co/Bingsu/adetailer/resolve/main/face_yolov8n.pt
# Segment Anything ViT-B (Meta's official checkpoint)
curl -L -o ComfyUI/models/sams/sam_vit_b_01ec64.pth \
https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
Verify everything landed at the right size:
ls -lh ComfyUI/models/checkpoints/aZovyaRPGArtistTools_v4VAE.safetensors # ~5.7 GB
ls -lh ComfyUI/models/loras/NorseViking_v10.safetensors # ~37 MB
ls -lh ComfyUI/models/upscale_models/remacri_original.safetensors # ~67 MB
ls -lh ComfyUI/models/embeddings/negative_hand-neg.pt # ~25 KB
ls -lh ComfyUI/models/ultralytics/bbox/face_yolov8n.pt # ~6 MB
ls -lh ComfyUI/models/sams/sam_vit_b_01ec64.pth # ~358 MB
If any are wildly wrong (a few KB instead of hundreds of MB), it’s almost always a CivitAI auth issue: your CIVITAI_TOKEN is missing or wrong. The download finished successfully but you got a redirect-to-login HTML page instead of the model. Re-check .env, re-run.
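The "HTML page instead of a model" failure is easy to detect mechanically. A minimal sketch, assuming you pick your own minimum size per file (the thresholds and function names here are illustrative, not something download_models.sh provides):

```python
from pathlib import Path

def looks_like_html(path: Path, probe_bytes: int = 512) -> bool:
    """Peek at the first bytes only; model files never start with an HTML tag."""
    with path.open("rb") as fh:
        head = fh.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")

def check_download(path: Path, min_bytes: int) -> str:
    """Classify a downloaded file as 'missing', 'html', 'too-small', or 'ok'."""
    if not path.exists():
        return "missing"
    if looks_like_html(path):
        return "html"  # most likely an auth redirect saved under the model's name
    if path.stat().st_size < min_bytes:
        return "too-small"
    return "ok"
```

For example, check_download(Path("ComfyUI/models/loras/NorseViking_v10.safetensors"), 30_000_000) should come back "ok" when the ~37 MB LoRA landed intact.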
If you want to substitute models for your own domain, this is the surface to swap. The graph in section 7 doesn’t care about model identity, only file presence.
6. Launch ComfyUI
./start_comfyui.sh
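This launches ComfyUI inside a detached tmux session named comfyui and serves the web UI on port 8188; from the same machine, open http://127.0.0.1:8188. Because the launcher passes --listen 0.0.0.0, the server accepts connections on every interface, not only loopback (see the flag table below). Useful operations: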
tmux attach -t comfyui # see the live log
# Ctrl-b d # detach again
tmux kill-session -t comfyui # stop
./start_comfyui.sh # start again (kills the old session first)
The launch flags matter. Read them before changing anything:
| Flag | Why |
|---|---|
| --novram | Maximum offload to system RAM. Without it, even SD 1.5 will OOM at 4 GB. |
| --use-split-cross-attention | Splits attention to reduce peak VRAM during sampling. |
| --cpu-vae | Load-bearing on 4 GB. The VAE decode at the end of generation spikes VRAM and OOMs the Python process at the very last step. CPU decode adds a few seconds per image but is the difference between “works” and “watch your image die at 99%.” |
| --preview-method none | Live previews cost VRAM. Off. |
| --listen 0.0.0.0 | Reachable from Docker containers on the host bridge. Drop if you want loopback-only. |
| --port 8188 | Default ComfyUI port. |
Why a tmux daemon? Because if you run ComfyUI directly inside the shell that owns it, the server dies when the shell exits — and there’s no signal that “the shell exited” from inside an editor or an automation that thought it had ComfyUI running. tmux outlives any individual shell. The launcher kills any pre-existing session with the same name first, so re-running the script is safe.
Aside. Don’t try to wrap ComfyUI in an editor-managed background process — Claude Code background bash tasks, VS Code task runners, that shape of thing. Background bash tasks get reaped when their parent exits. Use the tmux daemon path so the server outlives whatever spawned it.
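Since the failure mode here is an automation that only thinks ComfyUI is running, it can help to have the automation check before it submits work. A minimal sketch using only a TCP connect; the function name is hypothetical, and an open port only shows that something is listening, not that it is a healthy ComfyUI.

```python
import socket

def comfyui_listening(host: str = "127.0.0.1", port: int = 8188,
                      timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A worker could call this once at startup and fail fast with a clear message instead of discovering the dead server on its first generation request.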
7. Build the production workflow
Once ComfyUI is up, open http://127.0.0.1:8188. Wire the production graph in the UI. Components, in the order signal flows:
- CheckpointLoaderSimple → loads aZovyaRPGArtistTools_v4VAE.safetensors.
- LoraLoader → loads NorseViking_v10.safetensors with strength_model: 0.7, strength_clip: 0.5. Connect MODEL/CLIP from the checkpoint to the LoRA loader; the LoRA loader’s outputs are what later nodes consume.
- CLIPTextEncode (positive): your prompt. Anything from “bearded warrior, fur cloak, snowy” to a more elaborate scene description.
- CLIPTextEncode (negative): include embedding:negative_hand-neg, embedding:easynegative plus any usual negative tokens.
- EmptyLatentImage at 512×768 (the production canvas size).
- KSampler: connect MODEL from the LoRA loader, conditioning from the two text encoders, latent from the EmptyLatentImage. Recommended params: steps: 30, cfg: 7.5, sampler_name: dpmpp_2m, scheduler: karras, seed: 31337 (or whatever you pin for reproducibility).
- VAEDecode → uses the VAE baked into the v4VAE checkpoint.
- FaceDetailer (Impact Pack):
  - BBOX_DETECTOR ← UltralyticsDetectorProvider loading face_yolov8n.pt.
  - SAM_MODEL ← SAMLoader loading sam_vit_b_01ec64.pth.
  - Defaults are fine. The node detects faces in the decoded image, masks them with SAM, and re-runs generation on the face region, fixing the “face is mush” failure mode SD 1.5 has at full-body 512×768.
- UpscaleModelLoader → remacri_original.safetensors → ImageUpscaleWithModel for the 4× pass to 2048×3072.
- SaveImage.
Save the workflow JSON via Workflow → Export once it’s wired so you can reload it cleanly. ComfyUI also embeds the full graph in saved PNGs — dragging a generated image back into the canvas reconstructs the workflow. That’s a useful way to ship a reference graph without committing JSON.
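Once the graph works in the UI, you can also drive it from code. The sketch below assumes the workflow was exported in ComfyUI's API format (a dict of node id to class_type and inputs, which differs from the regular UI export) and that the server accepts it as a POST to /prompt. The node ids and the helper names are hypothetical; check them against your own export.

```python
import copy
import json
import urllib.request

def patch_workflow(workflow: dict, *, seed: int,
                   positive_node: str, prompt: str) -> dict:
    """Return a copy of an API-format workflow with a new seed and positive prompt."""
    patched = copy.deepcopy(workflow)  # never mutate the reference graph
    for node in patched.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["seed"] = seed
    patched[positive_node]["inputs"]["text"] = prompt
    return patched

def build_prompt_request(workflow: dict,
                         base_url: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build (but do not send) the POST that queues a workflow on ComfyUI."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then urllib.request.urlopen(build_prompt_request(patched)). Keeping the patch step pure makes it easy to test which parameters a caller is allowed to change.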
Generation time on the reference rig (RTX 3050 Laptop 4 GB) is 30–90 seconds per concept. On a 12 GB+ card with the conservative flags dropped, expect 10–20 seconds.
8. From concept to sprite: PixelLab
You have a 2048×3072 concept image. To turn it into something a game engine can render as a moving character, you need eight rotations — one sprite per cardinal and intercardinal direction. PixelLab’s hosted API is what does that step in this pipeline. It takes your concept image as a reference and generates the 8-direction sprite set conditioned on it.
The flow is async: POST a job, poll until it’s done, download the rotations.
Get an API key and set the token
Sign up at pixellab.ai/signup and generate an API token from your account settings. Free trial credit is enough to test the flow; buy a credit pack once you need more. Authentication is a Bearer token:
export PIXELLAB_TOKEN=your_pixellab_api_token_here
Pricing scales with output size and mode. A character rotation through Pro mode runs roughly 20–40 generations per request — meaningful spend at production scale. PixelLab’s pricing page has the per-call estimates.
Preprocess your concept
PixelLab’s /v2/create-character-pro endpoint with method: "create_from_concept" takes two image inputs:
- concept_image (max 1024×1024): the visual you want the rotations to follow.
- reference_image (max 168×168): a low-res anchor that shapes the sprite-art treatment.
The production preprocessing for a 2048×3072 ComfyUI output:
- Center-crop the concept to a square.
- Resize the square to 512×512 with Lanczos for concept_image (well within the 1024 max, which keeps the JSON payload manageable).
- Resize the same square to 168×168 with Lanczos for reference_image (exactly at the max).
from PIL import Image

def preprocess(path: str) -> tuple[Image.Image, Image.Image]:
    src = Image.open(path).convert("RGB")
    side = min(src.size)
    left = (src.width - side) // 2
    top = (src.height - side) // 2
    square = src.crop((left, top, left + side, top + side))
    concept = square.resize((512, 512), Image.Resampling.LANCZOS)
    reference = square.resize((168, 168), Image.Resampling.LANCZOS)
    return concept, reference
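It is worth working the crop arithmetic once for the production canvas, because a center crop of a portrait image discards rows, not columns. The helper below is a standalone restatement of the same box computation with no image library involved; its name is illustrative.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

print(center_square_box(2048, 3072))  # (0, 512, 2048, 2560)
```

For a 2048×3072 concept, the square keeps rows 512 through 2559, so the top and bottom 512 pixel rows are dropped. Whether that trims a head or feet depends on how the character is framed, so it is worth eyeballing the cropped square before paying for a rotation job.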
Submit the rotation job
PixelLab takes images as base64-encoded PNGs inside the JSON body — the same pattern most image-input APIs use. In a shell, encoding any PNG to that string is one command:
base64 -w 0 concept_512.png > concept.b64
base64 -w 0 reference_168.png > reference.b64
The -w 0 disables line wrapping (base64 defaults to wrapping at 76 columns; JSON doesn’t tolerate embedded newlines in strings).
In Python the equivalent is two steps — write the in-memory PIL.Image to a PNG byte buffer, then base64-encode the bytes:
import base64, io, os, requests

def b64_png(img: Image.Image) -> str:
    """Encode an in-memory PIL image to a base64 PNG string for JSON."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")

concept, reference = preprocess("concept.png")
response = requests.post(
    "https://api.pixellab.ai/v2/create-character-pro",
    headers={"Authorization": f"Bearer {os.environ['PIXELLAB_TOKEN']}"},
    json={
        "method": "create_from_concept",
        "description": "norse warrior, fur cloak, axe",
        "image_size": {"width": 64, "height": 64},
        "view": "side",
        "concept_image": b64_png(concept),
        "reference_image": b64_png(reference),
    },
    timeout=30,
)
response.raise_for_status()
job_id = response.json()["background_job_id"]
print(f"job: {job_id}")
The response is 202 Accepted with a background_job_id. Pro mode typically takes 60–120 seconds end to end.
Poll the background job
import time

while True:
    poll = requests.get(
        f"https://api.pixellab.ai/v2/background-jobs/{job_id}",
        headers={"Authorization": f"Bearer {os.environ['PIXELLAB_TOKEN']}"},
        timeout=30,
    )
    poll.raise_for_status()
    body = poll.json()
    status = body["status"]
    if status == "completed":
        result = body["last_response"]
        break
    if status == "failed":
        raise RuntimeError(f"PixelLab job failed: {body}")
    print(f"status: {status}")
    time.sleep(2)
The async-shape gotcha. The completion payload is on body["last_response"], not on the immediate response from your POST. PixelLab’s OpenAPI types last_response as opaque object, so the shape isn’t statically discoverable. Early integrations reasonably assumed the completion response would echo the submission shape and crashed trying to read fields that weren’t there. The submission response is “your job is queued”; the completion response is “your job is done, here are the artifacts.” Different shapes. Read the completion endpoint, dump the raw JSON the first time you call it, then build your deserializer.
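One way to follow that advice is to save the raw completion body to disk on the first run and print a summary of its structure. This is a sketch under the assumption that the polled body is a plain dict, as in the loop above; the file name and helper names are made up for illustration.

```python
import json
from pathlib import Path

def describe_shape(value, depth: int = 2):
    """Summarize a JSON value as nested key -> type name, `depth` levels deep."""
    if isinstance(value, dict) and depth > 0:
        return {key: describe_shape(item, depth - 1) for key, item in value.items()}
    return type(value).__name__

def dump_completion(body: dict, path: str = "pixellab_completion.json") -> dict:
    """Save the raw completion body and return the shape of its last_response."""
    Path(path).write_text(json.dumps(body, indent=2))
    return describe_shape(body.get("last_response", {}))
```

Printing dump_completion(body) once shows which keys actually exist before any deserializer code depends on them, and the saved file doubles as a test fixture.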
Download the rotations
last_response for character endpoints contains a rotation_urls dictionary keyed by direction:
import urllib.request
from pathlib import Path

DIRECTIONS = ["south", "south-east", "east", "north-east",
              "north", "north-west", "west", "south-west"]

Path("rotations").mkdir(exist_ok=True)
for direction in DIRECTIONS:
    url = result["rotation_urls"][direction]
    urllib.request.urlretrieve(url, f"rotations/{direction}.png")
You now have eight 64×64 sprite frames — one per cardinal and intercardinal direction. Wire them into your engine’s animation/rotation system however that works in your stack.
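Before handing the frames to an engine, a cheap sanity check is to confirm that all eight files exist and really are 64×64 PNGs. The sketch below reads only the PNG header (signature plus the IHDR chunk) with the standard library; the function names are hypothetical, and the expected size assumes the image_size requested above.

```python
import struct
from pathlib import Path

DIRECTIONS = ["south", "south-east", "east", "north-east",
              "north", "north-west", "west", "south-west"]
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(header: bytes) -> tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return struct.unpack(">II", header[16:24])  # big-endian width, height

def check_rotations(folder: str = "rotations",
                    expected: tuple[int, int] = (64, 64)) -> list[str]:
    """Return a list of problems; an empty list means all eight frames look right."""
    problems = []
    for direction in DIRECTIONS:
        path = Path(folder) / f"{direction}.png"
        if not path.exists():
            problems.append(f"{direction}: missing")
            continue
        try:
            with path.open("rb") as fh:
                size = png_size(fh.read(24))
        except ValueError:
            problems.append(f"{direction}: not a PNG")
            continue
        if size != expected:
            problems.append(f"{direction}: {size[0]}x{size[1]}")
    return problems
```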
Bound your spend
Pro mode bills per generation, so a runaway loop or a sloppy retry can burn money fast. Two patterns from the production pipeline that protect against that:
- Hard daily ceiling. Track total spend per UTC day; refuse new submissions when the day’s running cost exceeds a configured cap. The Skaldborn pipeline runs a $25/day default; when crossed, the worker stops claiming new jobs until the next UTC day.
- Circuit breaker. Track consecutive vendor failures; stop submitting after N failures in a row. PixelLab does occasionally return “missing image data” errors on transient backend issues; the production pipeline retries each job up to three times before going terminal.
Neither is in the API itself. Both are your responsibility on the calling side.
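Both guards fit in one small in-memory object. This is a sketch of the two patterns, not the Skaldborn implementation: the class name, the failure threshold of 3, and the idea that the caller supplies a per-job cost in dollars are all assumptions made for illustration. Timestamps must be timezone-aware.

```python
from datetime import datetime, timezone

class SpendGuard:
    """Daily spend ceiling plus consecutive-failure circuit breaker (in-memory)."""

    def __init__(self, daily_cap_usd: float = 25.0, max_consecutive_failures: int = 3):
        self.daily_cap_usd = daily_cap_usd
        self.max_consecutive_failures = max_consecutive_failures  # the "N" above
        self._day = ""
        self._spent = 0.0
        self._failures = 0

    def _roll(self, now: datetime) -> None:
        """Reset the running total when the UTC date changes."""
        day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
        if day != self._day:
            self._day, self._spent = day, 0.0

    def may_submit(self, now: datetime) -> bool:
        """False once today's spend exceeds the cap or the breaker is open."""
        self._roll(now)
        under_cap = self._spent <= self.daily_cap_usd
        breaker_closed = self._failures < self.max_consecutive_failures
        return under_cap and breaker_closed

    def record_success(self, cost_usd: float, now: datetime) -> None:
        self._roll(now)
        self._spent += cost_usd
        self._failures = 0  # any success closes the failure streak

    def record_failure(self) -> None:
        self._failures += 1

    def reset_breaker(self) -> None:
        """Manual reset after someone has looked at why the vendor was failing."""
        self._failures = 0
```

A worker would call may_submit(datetime.now(timezone.utc)) before every POST. In a real batch system the counters would live in the database rather than in process memory, so a restart cannot quietly reset the day's total.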
Pure-shell smoke test
If you’d rather verify the API end to end before writing any Python — useful for confirming your token, the request shape, and the polling/download flow all work together — the whole sequence runs in shell with curl, jq, and imagemagick:
#!/usr/bin/env bash
set -euo pipefail
# 1. Preprocess (center-crop to square, then resize)
# -filter is a setting, so it must come before the -resize it applies to.
SIDE=$(identify -format '%[fx:min(w,h)]' concept.png)
convert concept.png -gravity center -crop "${SIDE}x${SIDE}+0+0" +repage \
  -filter Lanczos -resize 512x512 concept_512.png
convert concept.png -gravity center -crop "${SIDE}x${SIDE}+0+0" +repage \
  -filter Lanczos -resize 168x168 reference_168.png
# 2. Encode
base64 -w 0 concept_512.png > concept.b64
base64 -w 0 reference_168.png > reference.b64
# 3. Submit
JOB_ID=$(jq -n \
--rawfile concept concept.b64 \
--rawfile reference reference.b64 \
'{
method: "create_from_concept",
description: "norse warrior, fur cloak, axe",
image_size: { width: 64, height: 64 },
view: "side",
concept_image: $concept,
reference_image: $reference
}' | curl -sX POST https://api.pixellab.ai/v2/create-character-pro \
-H "Authorization: Bearer $PIXELLAB_TOKEN" \
-H "Content-Type: application/json" \
-d @- | jq -r .background_job_id)
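# A rejected submission (bad token, validation error) has no job id; stop before polling.
[[ -n "$JOB_ID" && "$JOB_ID" != "null" ]] || { echo "submit failed: no background_job_id in response" >&2; exit 1; }
echo "job: $JOB_ID"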
# 4. Poll
while true; do
  BODY=$(curl -sX GET "https://api.pixellab.ai/v2/background-jobs/$JOB_ID" \
    -H "Authorization: Bearer $PIXELLAB_TOKEN")
  STATUS=$(echo "$BODY" | jq -r .status)
  echo "status: $STATUS"
  [[ "$STATUS" == "completed" ]] && break
  [[ "$STATUS" == "failed" ]] && { echo "$BODY" >&2; exit 1; }
  sleep 2
done
# 5. Download the rotations
mkdir -p rotations
echo "$BODY" \
| jq -r '.last_response.rotation_urls | to_entries[] | "\(.key) \(.value)"' \
| while read -r dir url; do curl -sL "$url" -o "rotations/$dir.png"; done
echo "done. rotations in ./rotations/"
Same flow as the Python above, with no language runtime to install. Use it to validate your environment, then graduate to the Python integration for anything you’d actually deploy.
9. Tuning for more VRAM
If you’re replicating this on a card with more headroom, relax the launch flags. Edit start_comfyui.sh:
- 8 GB: drop --novram, keep --cpu-vae. You may also drop --use-split-cross-attention if you don’t see OOMs.
- 12 GB+: drop all three of --novram, --use-split-cross-attention, --cpu-vae. GPU VAE decode is much faster.
--cpu-vae is the one to be most careful about removing. It’s there specifically because the VAE decode spike is the one that fires at the very end of generation — drop it on a marginal card and you’ll watch every generation die at 99%. If your ratio of “decode succeeded” to “decode OOMed” ever dips, put it back.
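The tiers above can be written down as a small lookup, which is handy if one launcher has to serve machines with different cards. This is a sketch, not what start_comfyui.sh does: the function name is hypothetical, cards between the named sizes are assigned to the more conservative tier as an assumption, and the 8 GB tier keeps --use-split-cross-attention until you have confirmed you can drop it.

```python
# Flags that do not depend on VRAM, taken from the launch-flag table.
BASE_FLAGS = ["--preview-method", "none", "--listen", "0.0.0.0", "--port", "8188"]

def launch_flags(vram_gb: float) -> list[str]:
    """Return ComfyUI launch flags for a card with `vram_gb` of VRAM."""
    if vram_gb < 8:
        # Small cards: every memory-saving flag stays on.
        memory_flags = ["--novram", "--use-split-cross-attention", "--cpu-vae"]
    elif vram_gb < 12:
        # 8 GB tier: drop --novram, keep the CPU VAE decode.
        memory_flags = ["--use-split-cross-attention", "--cpu-vae"]
    else:
        # 12 GB and up: none of the three memory flags.
        memory_flags = []
    return memory_flags + BASE_FLAGS
```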
10. Troubleshooting
The five ComfyUI-side things that break for everyone, in roughly the order they break:
nvidia-smi doesn’t work or shows the wrong driver. Install or update the NVIDIA driver. On Ubuntu: sudo ubuntu-drivers autoinstall is the easy path; sudo apt install nvidia-driver-550 (or newer) is the explicit one. Reboot after, verify with nvidia-smi.
torch.cuda.is_available() returns False after install. You probably have a CPU-only PyTorch wheel. Re-run install_comfyui.sh; it pins the CUDA 12.4 index. If a previous environment cached the wrong wheel, force-reinstall:
source comfyui-venv/bin/activate
pip install --force-reinstall torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/cu124
CivitAI download fails with 401. CIVITAI_TOKEN is missing or wrong. Tokens come from civitai.com/user/account, not from the session cookie.
ComfyUI shows red “missing node” boxes when loading a workflow. Custom nodes aren’t installed. Either use ComfyUI-Manager → “Install Missing Custom Nodes,” or git-clone Impact Pack + Subpack as in section 3.
Generation OOMs at the very end (after sampling completes). VAE-decode VRAM spike. Make sure --cpu-vae is in the launch command.
And the PixelLab-side ones:
401 Unauthorized on the POST. PIXELLAB_TOKEN is missing or wrong. Tokens come from your PixelLab account settings, not the signin cookie.
402 Payment Required. Out of credits. Buy a pack on pixellab.ai.
422 Unprocessable Entity. Validation error. The most common cause is an oversized concept_image or reference_image — verify the preprocessing produced 512×512 and 168×168 respectively, and that you’re sending base64 PNG bytes (not the raw PIL object or a path).
429 Too Many Requests. Rate or concurrency limit hit. Back off; if you’re running batches, throttle the submission rate.
Job sits at status: "processing" for many minutes. Pro-mode requests with method: "create_from_concept" typically take 60–120 seconds. Past 5 minutes, something is genuinely stuck; start a new job rather than waiting indefinitely.
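The 5-minute rule is easy to enforce in code so a stuck job cannot hang a worker forever. A minimal sketch of a bounded poll loop: fetch stands for any callable that returns the parsed job body (for instance a wrapper around the GET shown earlier), and the clock and sleep parameters exist only so the loop can be tested without waiting. All names here are illustrative.

```python
import time
from typing import Callable

def poll_until_done(fetch: Callable[[], dict], *,
                    deadline_s: float = 300.0,
                    interval_s: float = 2.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job completes; raise if it fails or outlives the deadline."""
    start = clock()
    while True:
        body = fetch()
        status = body["status"]
        if status == "completed":
            return body["last_response"]
        if status == "failed":
            raise RuntimeError(f"job failed: {body}")
        if clock() - start >= deadline_s:
            raise TimeoutError(f"job still '{status}' after {deadline_s:.0f}s")
        sleep(interval_s)
```

On TimeoutError the caller can decide whether to resubmit, which is also the point where a retry counter or spend guard should be consulted.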
How Skaldborn consumes this
Most of Skaldborn’s art content pipeline is what you just built. The Skaldborn-specific code on top of it is:
- A recipe (a JSON file with a constrained set of “levers” and a locked set of generation parameters) that feeds an art_jobs row in a Postgres-backed saga.
- A saga worker (running in a Docker container) that substitutes lever values into the ComfyUI workflow above and posts it to the local ComfyUI daemon on the host. That’s why --listen 0.0.0.0 is one of the launch flags: a containerized worker reaching out to the host’s ComfyUI gets connection-refused if ComfyUI is bound to loopback only.
- A CLI gate where I review concept images before they go to the paid API.
- The PixelLab integration you just wrote, plus a circuit breaker, the daily cost ceiling, and retry logic — the kind of production-side guards a real batch system needs.
- A manifest projector that turns the finished rotations into entries the simulation consumes.
The recipe pattern, the saga, and the manifest projection are Skaldborn-specific code. The ComfyUI install and the PixelLab integration are open and reproducible: exactly what you’ve just built. The remaining 20% or so, the governance shape that wraps them, is what post 04 goes deep on: a review gate before the expensive step, a content-addressed cache, immutable promotion. That’s the part you’d have to write yourself.
What’s next
The next post in the launch arc takes the manifest itself as its subject — how ManifestEntry records flow from recipe through projection into the Age manifest, and how the runtime consumes them. After that, the audio pipeline gets the same governance treatment with an entirely different stack.
If you want to follow along, subscribe via the form at the bottom of any page — one short email when the next post lands. If you want to argue, write to devlog@skaldborn.com.
Source for the install / launch / download scripts and the canonical model inventory: github.com/clunasco/skald-forge. Issues, PRs, and weather complaints welcome.