ACTIVE 2026

Visual FaQtory

ComfyUI-first local visuals yard for story-driven text2img → img2vid generation, live visuals playout, and QR crowd control.

Visual FaQtory project artwork

About This Project

Visual FaQtory has moved well past vague “AI video” territory. The attached v0.6.5-beta repo shows a real Python pipeline that reads long-form story text, splits it into overlapping windows, generates chained visuals cycle by cycle, and finishes them with a proper stitch / interpolate / upscale pass. The centre of gravity right now is still the local ComfyUI path, because it gives the best-tested free-unlimited results, with JuggernautXL SDXL for key imagery and SVD XT for motion. Around that core, the repo also ships LTX-Video and Veo backend lanes, OBS/SRT live-visual helpers, and a QR-driven crowd-control system for show use.
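The overlapping-window split can be sketched in a few lines of Python. This is an illustrative helper with hypothetical names, not the repo's actual engine, which also reinjects the previous cycle's imagery between windows:

```python
def sliding_windows(paragraphs, window=3, overlap=1):
    """Split story paragraphs into overlapping windows.

    Each window shares `overlap` paragraphs with its neighbour so that
    consecutive generation cycles stay visually chained.
    """
    step = window - overlap
    if step < 1:
        raise ValueError("overlap must be smaller than window")
    windows = []
    for start in range(0, len(paragraphs), step):
        chunk = paragraphs[start:start + window]
        if chunk:
            windows.append(" ".join(chunk))
        if start + window >= len(paragraphs):
            break
    return windows
```

The overlap is what makes the output feel like one continuous story rather than disconnected clips: each cycle's prompt still contains the tail of the previous one.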

Current Highlights

  • ComfyUI-first local generation flow with text-to-image first and image-to-video second
  • Best-tested free-unlimited stack built around JuggernautXL SDXL + SVD XT
  • Paragraph-driven sliding-story engine with reinject chaining between cycles
  • Finalizer pipeline for stitching, interpolation, upscaling, and optional audio muxing
  • QR crowd control with queue, overlay, rate limiting, and fail-open behaviour
  • Extra backend lanes for LTX-Video, Veo, Diffusers-style experiments, and live show routing
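The finalizer's stitching step maps naturally onto FFmpeg's concat demuxer. A minimal sketch, assuming the clips already share resolution and codec settings; `build_stitch_command` is an illustrative name, not the repo's API:

```python
from pathlib import Path

def build_stitch_command(clips, output, fps=24, audio=None):
    """Build an FFmpeg concat-demuxer argv for stitching cycle clips.

    Writes a concat list file next to the output and returns the command
    to run. Audio muxing is optional, mirroring the pipeline's optional
    audio-mux stage.
    """
    list_file = Path(output).with_suffix(".txt")
    list_file.write_text("".join(f"file '{c}'\n" for c in clips))
    cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_file)]
    if audio:
        cmd += ["-i", str(audio), "-c:a", "aac", "-shortest"]
    cmd += ["-r", str(fps), "-c:v", "libx264", "-pix_fmt", "yuv420p", str(output)]
    return cmd
```

Interpolation and upscaling would slot in as extra passes before this final concat, which is why the repo treats finalization as a chain rather than a single command.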

Actual State

The inspected repo is Visual FaQtory v0.6.5-beta. It already ships the main CLI, the sliding story engine, backend abstractions, prompt synthesis, run-state tracking, quality inspection helpers, and the full post-processing chain. That makes the honest story pretty simple: this is a working visuals pipeline, not a concept slide.

ComfyUI First, On Purpose

  • Local ComfyUI remains the main production lane for this project
  • JuggernautXL SDXL handles the strongest keyframe generation right now
  • SVD XT is the best-tested free-unlimited motion layer in the current setup
  • LTX-Video and Veo are present as serious expansion lanes, not the current centre of gravity
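Driving local ComfyUI from a pipeline comes down to posting an API-format workflow (the JSON exported via “Save (API Format)”) to its `/prompt` endpoint. A standard-library sketch under that assumption; the helper names are illustrative, not the repo's:

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default listen address

def build_prompt_request(workflow, client_id="visual-faqtory"):
    """Wrap an API-format workflow in a POST request for /prompt."""
    payload = json.dumps({"prompt": workflow, "client_id": client_id}).encode()
    return urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def queue_workflow(workflow):
    """Queue the workflow and return the prompt_id ComfyUI assigns."""
    with urllib.request.urlopen(build_prompt_request(workflow)) as resp:
        return json.loads(resp.read())["prompt_id"]
```

Swapping JuggernautXL SDXL keyframes into SVD XT motion is then a matter of which workflow JSON gets queued, which is what makes a backend-lane abstraction over ComfyUI practical.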

Live Visuals Setup

The repo also shows the live-show angle clearly. There is a crowd-control server with QR generation, overlay endpoints, queue APIs, and prompt filtering, plus OBS/SRT watcher scripts for split-box deployments where the GPU machine and the streaming machine are separate. On stage, the system can evolve visuals from a running text story while still letting the crowd steer the next turns without wrecking the whole operator flow.
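The rate-limiting and fail-open behaviour called out above can be sketched as a small per-client sliding-window limiter. Illustrative names and thresholds; the repo's server wires this kind of check into its queue API:

```python
import time
from collections import defaultdict, deque

class CrowdRateLimiter:
    """Sliding-window rate limit per QR client, failing open on errors."""

    def __init__(self, max_hits=3, window_s=60.0):
        self.max_hits = max_hits    # prompts allowed per window
        self.window_s = window_s    # window length in seconds
        self.hits = defaultdict(deque)

    def allow(self, client_id, now=None):
        """Return True if this client may submit another prompt."""
        try:
            now = time.monotonic() if now is None else now
            q = self.hits[client_id]
            # Drop timestamps that have aged out of the window.
            while q and now - q[0] > self.window_s:
                q.popleft()
            if len(q) >= self.max_hits:
                return False
            q.append(now)
            return True
        except Exception:
            # Fail-open: a limiter bug should never blank the show.
            return True
```

Fail-open is the right default for a live show: a broken limiter that silently admits everything is recoverable by the operator, while one that blocks everyone kills the crowd interaction outright.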

Tech Stack

Python · ComfyUI · JuggernautXL SDXL · SVD XT · FFmpeg · FastAPI · OBS · LTX-Video · Google Veo