Set up ComfyUI for your own content pipeline (on a 4 GB consumer GPU)

Series: Launch Arc

This is the companion tutorial to Skaldborn’s art content pipeline. That post explained the architecture; this one walks through the local Stable Diffusion half end to end, on the same 4 GB consumer GPU it runs on in production, and then shows how to drive PixelLab’s hosted API to turn a concept image into an 8-direction sprite. Ubuntu (or any modern Linux distro) is assumed.

You don’t have to have read post 04 to use this. The install pattern, the launch-flag set, the FaceDetailer + upscale graph, and the PixelLab integration generalize to any content pipeline that uses ComfyUI as a generation backend with a sprite-renderer downstream. The model choices below are tuned for fantasy/RPG art; substitute your own if your domain is different.

The canonical scripts and a tighter reference live at clunasco/skald-forge. This post is the narrative version with the lessons and the small landmines baked in. Time budget: about 30 minutes for the local stack if your hardware cooperates, plus a handful of minutes once you have a PixelLab API key.


Why this is hard on a 4 GB card

Default ComfyUI will OOM on SD 1.5 generation if you have 4 GB of VRAM. That’s annoying. What’s especially annoying is where it OOMs: the VAE decode at the end of generation produces a VRAM spike that fires after the 30+ seconds of sampling. You wait, you see the image almost complete, and the process dies on decode. Every generation. Until you add --cpu-vae to the launch script and the spike moves to system RAM.

Three of the launch flags in this guide are there specifically to make 4 GB work. Drop any of them on a small card and the symptoms range from “OOM during sampling, stack trace early” (you find out fast) to “OOM at decode, watch your generation die at the finish line, repeatedly” (you waste hours). The flag table in section 6 tells you which is which.

If you have more VRAM, you’ll want to drop some of these flags as you go up. Section 9 covers that.


1. Prerequisites

Hardware. An NVIDIA GPU with at least 4 GB of VRAM. Reference rig: RTX 3050 Laptop 4 GB.

Software.

  • Ubuntu 22.04+ (or any modern Linux distro)
  • A working NVIDIA driver — recent enough to support CUDA 12.4. On Ubuntu, sudo apt install nvidia-driver-550 (or newer) is the usual path; reboot after.
  • The handful of base packages this guide assumes:
sudo apt update
sudo apt install -y python3 python3-venv git tmux curl

Verify CUDA is alive before going further:

nvidia-smi

You should see your GPU and driver version. If this fails, fix the driver before continuing.
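If you want the same check in a form a script can act on, here is a minimal sketch. It assumes the standard nvidia-smi query options (--query-gpu and --format=csv); the function names are mine, not part of skald-forge.

```python
import shutil
import subprocess

# Query only the fields this guide cares about, as bare CSV rows.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,driver_version",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text: str) -> list[dict]:
    """Parse rows shaped like 'name, memory_MiB, driver' into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps the name intact even if it ever contains a comma.
        name, mem_mib, driver = [f.strip() for f in line.rsplit(",", 2)]
        gpus.append({"name": name, "vram_mib": int(mem_mib), "driver": driver})
    return gpus


def detect_gpus() -> list[dict]:
    """Return detected GPUs, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)
```

An empty list from detect_gpus() means the same thing as a failing nvidia-smi: fix the driver first.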

A CivitAI account + API token is needed for the production checkpoints/LoRAs (CivitAI gates downloads behind auth). Get one at civitai.com/user/account. Save it; you’ll need it in section 4.

A PixelLab account + API token is needed for the rotation step in section 8. Free trial credit is enough to test the flow end to end. Get one at pixellab.ai/signup; generate the token from your account settings.

↑ Back to top

2. Clone skald-forge and install ComfyUI

git clone https://github.com/clunasco/skald-forge.git ~/workspace/projects/skald-forge
cd ~/workspace/projects/skald-forge
./install_comfyui.sh

install_comfyui.sh is idempotent — re-run it any time to update. Six things happen:

  1. Creates comfyui-venv/ (a Python venv, sibling to ComfyUI/).
  2. Clones comfyanonymous/ComfyUI into ComfyUI/.
  3. Installs PyTorch with the CUDA 12.4 wheels (torch, torchvision, torchaudio from the PyTorch CUDA 12.4 index).
  4. Installs ComfyUI’s requirements.txt.
  5. Installs ComfyUI-Manager into ComfyUI/custom_nodes/.
  6. Prints a CUDA-visibility check at the end. You should see cuda avail : True and your GPU name.

If step 6 says False, you’re holding a CPU-only PyTorch wheel — usually a cached environment from a prior install. The script pins the CUDA 12.4 index, so this only happens if something else got there first. Force a clean reinstall:

source comfyui-venv/bin/activate
pip install --force-reinstall torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu124
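To repeat the visibility check by hand after a reinstall, something like the following works from inside the activated venv. It is a sketch, not the check install_comfyui.sh runs, and cuda_report is a name I made up. On a CPU-only wheel, torch.version.cuda is None, which is the quickest way to confirm that diagnosis.

```python
def cuda_report() -> str:
    """Summarize whether this environment's PyTorch can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    if not torch.cuda.is_available():
        # torch.version.cuda is None on CPU-only wheels.
        return (f"torch {torch.__version__}: CUDA not available "
                f"(wheel built for CUDA: {torch.version.cuda})")

    props = torch.cuda.get_device_properties(0)
    vram_gib = props.total_memory / 2**30
    return f"torch {torch.__version__}: {props.name}, {vram_gib:.1f} GiB VRAM"


print(cuda_report())
```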


3. Install the FaceDetailer custom nodes

The production graph uses two custom-node packages from ltdrdata:

  • Impact Pack — provides FaceDetailer, the inpainting node that re-runs generation on the face region after the main pass. SD 1.5 at full-body 512×768 produces faces that are functionally mush; FaceDetailer is the fix.
  • Impact Subpack — provides the UltralyticsDetectorProvider node that loads face_yolov8n.pt for the face bounding box.

Install both:

cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack.git
git clone https://github.com/ltdrdata/ComfyUI-Impact-Subpack.git
cd -

source comfyui-venv/bin/activate
pip install -r ComfyUI/custom_nodes/ComfyUI-Impact-Pack/requirements.txt
pip install -r ComfyUI/custom_nodes/ComfyUI-Impact-Subpack/requirements.txt
deactivate

Once ComfyUI is running you can also install missing custom nodes through the Manager UI (“Manager” button → “Install Missing Custom Nodes”). The git-clone path is what skald-forge documents because it’s reproducible from a fresh checkout without a running server.


4. Configure your CivitAI token

download_models.sh auto-loads .env from the repo root. Create it:

cat > .env <<'EOF'
CIVITAI_TOKEN=your_civitai_api_token_here
EOF
chmod 600 .env

.env is gitignored at the skald-forge level. Don’t commit it.


5. Download the production model stack

Six files. Total disk: ~6.2 GB. All paths are relative to ComfyUI/models/. Sizes and source URLs are also recorded in models_inventory.md.

# 1. Default SD 1.5 VAE — used by checkpoints that don't bake one in
./download_models.sh --vae-default

# 2. aZovyaRPGArtistTools v4VAE — the production checkpoint (~5.7 GB)
./download_models.sh --checkpoint \
  https://civitai.com/api/download/models/251729 \
  --name aZovyaRPGArtistTools_v4VAE.safetensors

# 3. NorseViking_v10 — the production style LoRA (~37 MB)
./download_models.sh --lora \
  https://civitai.com/api/download/models/31804 \
  --name NorseViking_v10.safetensors

# 4. negative_hand-neg — fixes hand anatomy without dragging the style (~25 KB)
./download_models.sh --embedding \
  https://civitai.com/api/download/models/60938 \
  --name negative_hand-neg.pt

# 5. easynegative — companion general-purpose negative embedding (~24 KB)
./download_models.sh --embedding \
  https://civitai.com/api/download/models/9208 \
  --name easynegative.safetensors

# 6. 4xFoolhardyRemacri — post-generation upscaler (~67 MB)
./download_models.sh --upscaler \
  https://civitai.com/api/download/models/164821 \
  --name remacri_original.safetensors

The face detector and segmenter aren’t on CivitAI — they live on Hugging Face and Meta’s S3:

mkdir -p ComfyUI/models/ultralytics/bbox ComfyUI/models/sams

# YOLOv8-nano face detector (Bingsu/adetailer canonical release)
curl -L -o ComfyUI/models/ultralytics/bbox/face_yolov8n.pt \
  https://huggingface.co/Bingsu/adetailer/resolve/main/face_yolov8n.pt

# Segment Anything ViT-B (Meta's official checkpoint)
curl -L -o ComfyUI/models/sams/sam_vit_b_01ec64.pth \
  https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth

Verify everything landed at the right size:

ls -lh ComfyUI/models/checkpoints/aZovyaRPGArtistTools_v4VAE.safetensors  # ~5.7 GB
ls -lh ComfyUI/models/loras/NorseViking_v10.safetensors                   # ~37 MB
ls -lh ComfyUI/models/upscale_models/remacri_original.safetensors         # ~67 MB
ls -lh ComfyUI/models/embeddings/negative_hand-neg.pt                     # ~25 KB
ls -lh ComfyUI/models/ultralytics/bbox/face_yolov8n.pt                    # ~6 MB
ls -lh ComfyUI/models/sams/sam_vit_b_01ec64.pth                           # ~358 MB

If any are wildly wrong (a few KB instead of hundreds of MB), it’s almost always a CivitAI auth issue: your CIVITAI_TOKEN is missing or wrong. The download finished successfully but you got a redirect-to-login HTML page instead of the model. Re-check .env, re-run.
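The "HTML page saved under a model filename" failure is easy to detect mechanically. A minimal sketch, assuming you pick your own minimum sizes; the helper names and the thresholds in the usage note are illustrative, not part of download_models.sh.

```python
from pathlib import Path


def looks_like_html(path: Path, probe_bytes: int = 512) -> bool:
    """True if the file starts like an HTML document rather than model weights."""
    with path.open("rb") as fh:
        head = fh.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<!doctype", b"<html"))


def check_download(path: Path, min_bytes: int) -> str:
    """Classify a downloaded file as 'missing', 'html', 'too small', or 'ok'."""
    if not path.exists():
        return "missing"
    if looks_like_html(path):
        return "html"  # most likely a login page: re-check CIVITAI_TOKEN
    if path.stat().st_size < min_bytes:
        return "too small"
    return "ok"
```

For example, check_download(Path("ComfyUI/models/loras/NorseViking_v10.safetensors"), 30 * 2**20) should come back "ok" for the ~37 MB LoRA and "html" if the token was rejected.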

If you want to substitute models for your own domain, this is the surface to swap. The graph in section 7 doesn’t care about model identity, only file presence.


6. Launch ComfyUI

./start_comfyui.sh

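This launches ComfyUI inside a detached tmux session named comfyui and serves the web UI on port 8188; from the host, open http://127.0.0.1:8188. Because the launcher passes --listen 0.0.0.0, the server accepts connections on every interface, not only loopback (see the flag table below). Useful operations: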

tmux attach -t comfyui          # see the live log
# Ctrl-b d                      # detach again
tmux kill-session -t comfyui    # stop
./start_comfyui.sh              # start again (kills the old session first)

The launch flags matter. Read them before changing anything:

| Flag | Why |
| --- | --- |
| --novram | Maximum offload to system RAM. Without it, even SD 1.5 will OOM at 4 GB. |
| --use-split-cross-attention | Splits attention to reduce peak VRAM during sampling. |
| --cpu-vae | Load-bearing on 4 GB. The VAE decode at the end of generation spikes VRAM and OOMs the Python process at the very last step. CPU decode adds a few seconds per image but is the difference between “works” and “watch your image die at 99%.” |
| --preview-method none | Live previews cost VRAM. Off. |
| --listen 0.0.0.0 | Reachable from Docker containers on the host bridge. Drop if you want loopback-only. |
| --port 8188 | Default ComfyUI port. |

Why a tmux daemon? Because if you run ComfyUI directly inside the shell that owns it, the server dies when the shell exits — and there’s no signal that “the shell exited” from inside an editor or an automation that thought it had ComfyUI running. tmux outlives any individual shell. The launcher kills any pre-existing session with the same name first, so re-running the script is safe.

Aside. Don’t try to wrap ComfyUI in an editor-managed background process — Claude Code background bash tasks, VS Code task runners, that shape of thing. Background bash tasks get reaped when their parent exits. Use the tmux daemon path so the server outlives whatever spawned it.
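Since the failure mode here is "something thought ComfyUI was running and it wasn't," it is worth having an explicit liveness check. The sketch below assumes two things I believe hold but that you should verify against your install: tmux has-session reports whether a named session exists, and ComfyUI answers GET /system_stats when it is up. The function names are hypothetical.

```python
import subprocess
import urllib.error
import urllib.request


def tmux_session_exists(name: str = "comfyui") -> bool:
    """True if a tmux session with this name exists; False if not or tmux is absent."""
    try:
        result = subprocess.run(
            ["tmux", "has-session", "-t", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0


def comfyui_responding(base_url: str = "http://127.0.0.1:8188",
                       timeout: float = 3.0) -> bool:
    """True if the ComfyUI HTTP server answers on /system_stats."""
    try:
        with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A session that exists while the HTTP check fails usually means the server is still loading models or crashed inside tmux; tmux attach -t comfyui shows which.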


7. Build the production workflow

Once ComfyUI is up, open http://127.0.0.1:8188. Wire the production graph in the UI. Components, in the order signal flows:

  1. CheckpointLoaderSimple → loads aZovyaRPGArtistTools_v4VAE.safetensors.
  2. LoraLoader → loads NorseViking_v10.safetensors with strength_model: 0.7, strength_clip: 0.5. Connect MODEL/CLIP from the checkpoint to the LoRA loader; the LoRA loader’s outputs are what later nodes consume.
  3. CLIPTextEncode (positive) — your prompt. Anything from “bearded warrior, fur cloak, snowy” to a more elaborate scene description.
  4. CLIPTextEncode (negative) — include embedding:negative_hand-neg, embedding:easynegative plus any usual negative tokens.
  5. EmptyLatentImage at 512×768 (the production canvas size).
  6. KSampler — connect MODEL from the LoRA loader, conditioning from the two text encoders, latent from the EmptyLatentImage. Recommended params: steps: 30, cfg: 7.5, sampler_name: dpmpp_2m, scheduler: karras, seed: 31337 (or whatever you pin for reproducibility).
  7. VAEDecode → uses the VAE baked into the v4VAE checkpoint.
  8. FaceDetailer (Impact Pack):
    • BBOX_DETECTOR ← UltralyticsDetectorProvider loading face_yolov8n.pt.
    • SAM_MODEL ← SAMLoader loading sam_vit_b_01ec64.pth.
    • Defaults are fine. The node detects faces in the decoded image, masks them with SAM, and re-runs generation on the face region — fixing the “face is mush” failure mode SD 1.5 has at full-body 512×768.
  9. UpscaleModelLoader (remacri_original.safetensors) → ImageUpscaleWithModel for the 4× pass to 2048×3072.
  10. SaveImage.

Save the workflow JSON via Workflow → Export once it’s wired so you can reload it cleanly. ComfyUI also embeds the full graph in saved PNGs — dragging a generated image back into the canvas reconstructs the workflow. That’s a useful way to ship a reference graph without committing JSON.
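Once the graph is saved, you can also queue it without the browser. The sketch below assumes you exported the workflow in ComfyUI's API format (a dict of node id → {"class_type", "inputs"}) and that the server accepts POST /prompt with a {"prompt": ...} body. The node ids are hypothetical: look up the real ones for your positive CLIPTextEncode and KSampler in your own export.

```python
import copy
import json
import urllib.request


def patch_workflow(workflow: dict, positive: str, seed: int,
                   positive_node_id: str, sampler_node_id: str) -> dict:
    """Return a copy of an API-format workflow with a new prompt and seed."""
    patched = copy.deepcopy(workflow)  # never mutate the saved reference graph
    patched[positive_node_id]["inputs"]["text"] = positive
    patched[sampler_node_id]["inputs"]["seed"] = seed
    return patched


def queue_prompt(workflow: dict, base_url: str = "http://127.0.0.1:8188") -> str:
    """Submit an API-format workflow to a running ComfyUI; returns its prompt id."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["prompt_id"]


# Usage (node ids "6" and "3" are placeholders for your own export):
#   with open("workflow_api.json") as fh:
#       base = json.load(fh)
#   job = patch_workflow(base, "bearded warrior, fur cloak, snowy", 31337, "6", "3")
#   print(queue_prompt(job))
```

Keeping the patch step separate from the submit step makes the substitution testable without a running server.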

Generation time on the reference rig (RTX 3050 Laptop 4 GB) is 30–90 seconds per concept. On a 12 GB+ card with the conservative flags dropped, expect 10–20 seconds.


8. From concept to sprite: PixelLab

You have a 2048×3072 concept image. To turn it into something a game engine can render as a moving character, you need eight rotations — one sprite per cardinal and intercardinal direction. PixelLab’s hosted API is what does that step in this pipeline. It takes your concept image as a reference and generates the 8-direction sprite set conditioned on it.

The flow is async: POST a job, poll until it’s done, download the rotations.

Get an API key and set the token

Sign up at pixellab.ai/signup and generate an API token from your account settings. The free trial credit is enough to test the flow; buy a credit pack once it runs out. Authentication is a Bearer token:

export PIXELLAB_TOKEN=your_pixellab_api_token_here

Pricing scales with output size and mode. A character rotation through Pro mode runs roughly 20–40 generations per request — meaningful spend at production scale. PixelLab’s pricing page has the per-call estimates.

Preprocess your concept

PixelLab’s /v2/create-character-pro endpoint with method: "create_from_concept" takes two image inputs:

  • concept_image (max 1024×1024) — the visual you want the rotations to follow.
  • reference_image (max 168×168) — a low-res anchor that shapes the sprite-art treatment.

The production preprocessing for a 2048×3072 ComfyUI output:

  1. Center-crop the concept to a square.
  2. Resize the square to 512×512 with Lanczos for concept_image (well within the 1024 max — keeps the JSON payload manageable).
  3. Resize the same square to 168×168 with Lanczos for reference_image (exactly at the max).

from PIL import Image

def preprocess(path: str) -> tuple[Image.Image, Image.Image]:
    src = Image.open(path).convert("RGB")
    side = min(src.size)
    left = (src.width - side) // 2
    top = (src.height - side) // 2
    square = src.crop((left, top, left + side, top + side))
    concept = square.resize((512, 512), Image.Resampling.LANCZOS)
    reference = square.resize((168, 168), Image.Resampling.LANCZOS)
    return concept, reference
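It helps to see what the center crop actually keeps. The helper below repeats the same arithmetic as preprocess without needing Pillow; center_square_box is an illustrative name, not part of the pipeline.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) box of the centered square crop."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# The 2048x3072 ComfyUI output from section 7:
print(center_square_box(2048, 3072))  # (0, 512, 2048, 2560)
```

On a 2:3 portrait the crop discards the top 512 and bottom 512 rows, so only the middle two-thirds of the height survives. If your character fills the frame from head to boots, check the cropped square by eye before spending credits on it.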

Submit the rotation job

PixelLab takes images as base64-encoded PNGs inside the JSON body — the same pattern most image-input APIs use. In a shell, encoding any PNG to that string is one command:

base64 -w 0 concept_512.png > concept.b64
base64 -w 0 reference_168.png > reference.b64

The -w 0 disables line wrapping (base64 defaults to wrapping at 76 columns; JSON doesn’t tolerate embedded newlines in strings).

In Python the equivalent is two steps — write the in-memory PIL.Image to a PNG byte buffer, then base64-encode the bytes:

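import base64
import io
import os

import requests
from PIL import Image  # used in the type hint below; preprocess() comes from the previous block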

def b64_png(img: Image.Image) -> str:
    """Encode an in-memory PIL image to a base64 PNG string for JSON."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")

concept, reference = preprocess("concept.png")

response = requests.post(
    "https://api.pixellab.ai/v2/create-character-pro",
    headers={"Authorization": f"Bearer {os.environ['PIXELLAB_TOKEN']}"},
    json={
        "method": "create_from_concept",
        "description": "norse warrior, fur cloak, axe",
        "image_size": {"width": 64, "height": 64},
        "view": "side",
        "concept_image": b64_png(concept),
        "reference_image": b64_png(reference),
    },
    timeout=30,
)
response.raise_for_status()
job_id = response.json()["background_job_id"]
print(f"job: {job_id}")

The response is 202 Accepted with a background_job_id. Pro mode typically takes 60–120 seconds end to end.

Poll the background job

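import time

# Typical Pro-mode jobs finish in 60-120 s; past 5 minutes, treat the job as stuck.
POLL_DEADLINE_S = 300

deadline = time.monotonic() + POLL_DEADLINE_S
while True:
    poll = requests.get(
        f"https://api.pixellab.ai/v2/background-jobs/{job_id}",
        headers={"Authorization": f"Bearer {os.environ['PIXELLAB_TOKEN']}"},
        timeout=30,
    )
    poll.raise_for_status()
    body = poll.json()
    status = body["status"]
    if status == "completed":
        result = body["last_response"]
        break
    if status == "failed":
        raise RuntimeError(f"PixelLab job failed: {body}")
    if time.monotonic() > deadline:
        # Give up instead of polling forever; submit a fresh job if needed.
        raise TimeoutError(f"job {job_id} still {status!r} after {POLL_DEADLINE_S} s")
    print(f"status: {status}")
    time.sleep(2)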

The async-shape gotcha. The completion payload is on body["last_response"], not on the immediate response from your POST. PixelLab’s OpenAPI types last_response as opaque object, so the shape isn’t statically discoverable. Early integrations reasonably assumed the completion response would echo the submission shape and crashed trying to read fields that weren’t there. The submission response is “your job is queued”; the completion response is “your job is done, here are the artifacts.” Different shapes. Read the completion endpoint, dump the raw JSON the first time you call it, then build your deserializer.
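One way to act on that advice is to validate the completion payload before anything downstream touches it. The sketch assumes the rotation_urls dictionary and the eight direction keys this post uses in the download step; since last_response is typed as an opaque object, confirm both against a raw dump from your own first job. extract_rotation_urls is a hypothetical helper.

```python
EXPECTED_DIRECTIONS = (
    "south", "south-east", "east", "north-east",
    "north", "north-west", "west", "south-west",
)


def extract_rotation_urls(last_response: dict) -> dict[str, str]:
    """Pull the per-direction URLs out of a completed job, failing loudly on surprises."""
    urls = last_response.get("rotation_urls")
    if not isinstance(urls, dict):
        # Report the keys that are present so the mismatch is easy to diagnose.
        raise ValueError(
            f"no rotation_urls dict in last_response; keys seen: {sorted(last_response)}"
        )
    missing = [d for d in EXPECTED_DIRECTIONS if d not in urls]
    if missing:
        raise ValueError(f"rotation_urls is missing directions: {missing}")
    return {d: urls[d] for d in EXPECTED_DIRECTIONS}
```

Failing here with the observed keys in the message beats a KeyError three functions later.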

Download the rotations

last_response for character endpoints contains a rotation_urls dictionary keyed by direction:

import urllib.request
from pathlib import Path

DIRECTIONS = ["south", "south-east", "east", "north-east",
              "north", "north-west", "west", "south-west"]

Path("rotations").mkdir(exist_ok=True)
for direction in DIRECTIONS:
    url = result["rotation_urls"][direction]
    urllib.request.urlretrieve(url, f"rotations/{direction}.png")

You now have eight 64×64 sprite frames — one per cardinal and intercardinal direction. Wire them into your engine’s animation/rotation system however that works in your stack.

Bound your spend

Pro mode bills per generation, so a runaway loop or a sloppy retry can burn money fast. Two patterns from the production pipeline that protect against that:

  • Hard daily ceiling. Track total spend per UTC day; refuse new submissions when the day’s running cost exceeds a configured cap. The Skaldborn pipeline runs a $25/day default — when crossed, the worker stops claiming new jobs until the next UTC day.
  • Circuit breaker. Track consecutive vendor failures; stop submitting after N failures in a row. PixelLab does occasionally return “missing image data” errors on transient backend issues; the production pipeline retries each job up to three times before going terminal.

Neither is in the API itself. Both are your responsibility on the calling side.
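Both guards fit in one small object on the calling side. This is a sketch of the idea, not the Skaldborn worker: the class, its method names, and the default of five consecutive failures are my assumptions, and it expects the caller to supply a cost estimate per job since the guard has no way to look one up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SpendGuard:
    daily_cap_usd: float = 25.0        # the $25/day default mentioned above
    max_consecutive_failures: int = 5  # assumed value for N
    _day: str = ""
    _spent_usd: float = 0.0
    _failures: int = 0

    def _roll_day(self, now: Optional[datetime]) -> None:
        """Reset the running total when the UTC date changes."""
        now = now or datetime.now(timezone.utc)
        day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
        if day != self._day:
            self._day, self._spent_usd = day, 0.0

    def may_submit(self, now: Optional[datetime] = None) -> bool:
        """True only while under the daily cap and the breaker is closed."""
        self._roll_day(now)
        under_cap = self._spent_usd < self.daily_cap_usd
        breaker_closed = self._failures < self.max_consecutive_failures
        return under_cap and breaker_closed

    def record_success(self, cost_usd: float, now: Optional[datetime] = None) -> None:
        self._roll_day(now)
        self._spent_usd += cost_usd
        self._failures = 0  # any success ends the failure streak

    def record_failure(self, cost_usd: float = 0.0,
                       now: Optional[datetime] = None) -> None:
        self._roll_day(now)
        self._spent_usd += cost_usd  # pass a cost if failed jobs are still billed
        self._failures += 1

    def reset_breaker(self) -> None:
        """Manual reset after a human has looked at the failures."""
        self._failures = 0
```

Note that the breaker does not reset at the day boundary. A vendor outage at 23:59 UTC is still an outage at 00:01, so reopening it is a deliberate act.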

Pure-shell smoke test

If you’d rather verify the API end to end before writing any Python — useful for confirming your token, the request shape, and the polling/download flow all work together — the whole sequence runs in shell with curl, jq, and imagemagick:

#!/usr/bin/env bash
set -euo pipefail

# 1. Preprocess (center-crop to square, then resize)
# -filter is a setting, so it must come before the -resize it should apply to.
SIDE=$(identify -format '%[fx:min(w,h)]' concept.png)
convert concept.png -gravity center -crop "${SIDE}x${SIDE}+0+0" +repage \
  -filter Lanczos -resize 512x512 concept_512.png
convert concept.png -gravity center -crop "${SIDE}x${SIDE}+0+0" +repage \
  -filter Lanczos -resize 168x168 reference_168.png

# 2. Encode
base64 -w 0 concept_512.png   > concept.b64
base64 -w 0 reference_168.png > reference.b64

# 3. Submit
JOB_ID=$(jq -n \
  --rawfile concept   concept.b64 \
  --rawfile reference reference.b64 \
  '{
    method: "create_from_concept",
    description: "norse warrior, fur cloak, axe",
    image_size: { width: 64, height: 64 },
    view: "side",
    concept_image:   $concept,
    reference_image: $reference
  }' | curl -fsSX POST https://api.pixellab.ai/v2/create-character-pro \
       -H "Authorization: Bearer $PIXELLAB_TOKEN" \
       -H "Content-Type: application/json" \
       -d @- | jq -r .background_job_id)

echo "job: $JOB_ID"

# 4. Poll
while true; do
  BODY=$(curl -sX GET "https://api.pixellab.ai/v2/background-jobs/$JOB_ID" \
    -H "Authorization: Bearer $PIXELLAB_TOKEN")
  STATUS=$(echo "$BODY" | jq -r .status)
  echo "status: $STATUS"
  [[ "$STATUS" == "completed" ]] && break
  [[ "$STATUS" == "failed" ]] && { echo "$BODY" >&2; exit 1; }
  sleep 2
done

# 5. Download the rotations
mkdir -p rotations
echo "$BODY" \
  | jq -r '.last_response.rotation_urls | to_entries[] | "\(.key) \(.value)"' \
  | while read -r dir url; do curl -sL "$url" -o "rotations/$dir.png"; done

echo "done. rotations in ./rotations/"

Same flow as the Python above, with no language runtime to install. Use it to validate your environment, then graduate to the Python integration for anything you’d actually deploy.


9. Tuning for more VRAM

If you’re replicating this on a card with more headroom, relax the launch flags. Edit start_comfyui.sh:

  • 8 GB: drop --novram, keep --cpu-vae. You may also drop --use-split-cross-attention if you don’t see OOMs.
  • 12 GB+: drop all three of --novram, --use-split-cross-attention, --cpu-vae. GPU VAE decode is much faster.

--cpu-vae is the one to be most careful about removing. It’s there specifically because the VAE decode spike is the one that fires at the very end of generation — drop it on a marginal card and you’ll watch every generation die at 99%. If your ratio of “decode succeeded” to “decode OOMed” ever dips, put it back.
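The tiers above, written as a lookup. start_comfyui.sh is a shell script, so treat this as an illustration of the decision rather than the repo's code. Treating everything under 8 GB like the 4 GB reference rig is my assumption (the post only names the 4, 8, and 12 GB points), and the 8 GB tier here keeps --use-split-cross-attention as the conservative choice.

```python
# Flags that stay on regardless of VRAM in this guide's launch command.
BASE_FLAGS = ["--preview-method", "none", "--listen", "0.0.0.0", "--port", "8188"]


def launch_flags(vram_gb: float) -> list[str]:
    """Return ComfyUI launch flags for a card with the given VRAM."""
    if vram_gb < 8:
        # Small cards: all three memory-saving flags.
        memory_flags = ["--novram", "--use-split-cross-attention", "--cpu-vae"]
    elif vram_gb < 12:
        # 8 GB tier: drop --novram, keep CPU VAE decode.
        memory_flags = ["--use-split-cross-attention", "--cpu-vae"]
    else:
        # 12 GB and up: none of the three.
        memory_flags = []
    return memory_flags + BASE_FLAGS


print(" ".join(launch_flags(4)))
```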


10. Troubleshooting

The five ComfyUI-side things that break for everyone, in roughly the order they break:

nvidia-smi doesn’t work or shows the wrong driver. Install or update the NVIDIA driver. On Ubuntu: sudo ubuntu-drivers autoinstall is the easy path; sudo apt install nvidia-driver-550 (or newer) is the explicit one. Reboot after, verify with nvidia-smi.

torch.cuda.is_available() returns False after install. You probably have a CPU-only PyTorch wheel. Re-run install_comfyui.sh; it pins the CUDA 12.4 index. If a previous environment cached the wrong wheel, force-reinstall:

source comfyui-venv/bin/activate
pip install --force-reinstall torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu124

CivitAI download fails with 401. CIVITAI_TOKEN is missing or wrong. Tokens come from civitai.com/user/account, not from the session cookie.

ComfyUI shows red “missing node” boxes when loading a workflow. Custom nodes aren’t installed. Either use ComfyUI-Manager → “Install Missing Custom Nodes,” or git-clone Impact Pack + Subpack as in section 3.

Generation OOMs at the very end (after sampling completes). VAE-decode VRAM spike. Make sure --cpu-vae is in the launch command.

And the PixelLab-side ones:

401 Unauthorized on the POST. PIXELLAB_TOKEN is missing or wrong. Tokens come from your PixelLab account settings, not the sign-in cookie.

402 Payment Required. Out of credits. Buy a pack on pixellab.ai.

422 Unprocessable Entity. Validation error. The most common cause is an oversized concept_image or reference_image — verify the preprocessing produced 512×512 and 168×168 respectively, and that you’re sending base64 PNG bytes (not the raw PIL object or a path).

429 Too Many Requests. Rate or concurrency limit hit. Back off; if you’re running batches, throttle the submission rate.

Job sits at status: "processing" for many minutes. Pro-mode requests with method: "create_from_concept" are 60–120 seconds typically. Past 5 minutes, something is genuinely stuck — start a new job rather than waiting indefinitely.
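For the 429 case, "back off" usually means exponential delays between retries. A minimal sketch follows; the base, factor, and cap values are placeholders I picked, not limits PixelLab documents, and backoff_delays is a hypothetical helper.

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base_s: float = 2.0, factor: float = 2.0,
                   cap_s: float = 60.0, jitter: float = 0.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Return the wait (in seconds) before each retry attempt.

    Each delay is base_s * factor**attempt, capped at cap_s. With jitter > 0,
    up to jitter * delay extra seconds are added at random so that parallel
    workers do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap_s, base_s * factor ** attempt)
        if jitter:
            delay += rng.uniform(0.0, jitter * delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))  # [2.0, 4.0, 8.0, 16.0, 32.0]
```

Sleep for each delay in turn between resubmissions, and stop retrying when the list runs out.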


How Skaldborn consumes this

Most of Skaldborn’s art content pipeline is what you just built. The Skaldborn-specific code on top of it is:

  • A recipe (a JSON file with a constrained set of “levers” and a locked set of generation parameters) that feeds an art_jobs row in a Postgres-backed saga.
  • A saga worker (running in a Docker container) that substitutes lever values into the ComfyUI workflow above and posts it to the local ComfyUI daemon on the host. That’s why --listen 0.0.0.0 is one of the launch flags — a containerized worker reaching out to the host’s ComfyUI gets connection-refused if ComfyUI is bound to loopback only.
  • A CLI gate where I review concept images before they go to the paid API.
  • The PixelLab integration you just wrote, plus a circuit breaker, the daily cost ceiling, and retry logic — the kind of production-side guards a real batch system needs.
  • A manifest projector that turns the finished rotations into entries the simulation consumes.

The recipe pattern, the saga, and the manifest projection are Skaldborn-specific code. The ComfyUI install and the PixelLab integration are open and reproducible: exactly what you’ve just built. The remainder, the governance shape that wraps them, is what post 04 goes deep on: a review gate before the expensive step, a content-addressed cache, immutable promotion. That’s the part you’d have to write yourself.


What’s next

The next post in the launch arc takes the manifest itself as its subject — how ManifestEntry records flow from recipe through projection into the Age manifest, and how the runtime consumes them. After that, the audio pipeline gets the same governance treatment with an entirely different stack.

If you want to follow along, subscribe via the form at the bottom of any page — one short email when the next post lands. If you want to argue, write to devlog@skaldborn.com.

Source for the install / launch / download scripts and the canonical model inventory: github.com/clunasco/skald-forge. Issues, PRs, and weather complaints welcome.
