# Image Generation Pipeline Research

**Status:** Investigating
**Date:** 2026-04-21
**Goal:** Local image generation for HoffDesk blog articles

---

## Executive Summary

Exploring options for automated hero images, thumbnails, and article visuals using the Gaming PC's 3080 Ti (12GB VRAM). Target: a sovereign-first pipeline that doesn't depend on external APIs for standard content.

---

## Hardware Constraints

| Resource | Available |
|----------|-----------|
| GPU | RTX 3080 Ti (12GB VRAM) |
| CPU | Ryzen 9 5900X |
| RAM | 64GB DDR4 |
| Storage | 2TB NVMe (local) |

**Target:** All inference runs locally. Cloud only for overflow/fallback.

---

## Model Evaluation

### Tier 1: Production Candidates

#### FLUX.1 [dev]

| Attribute | Assessment |
|-----------|------------|
| **Quality** | Excellent: photorealistic, coherent typography |
| **Speed** | ~30-60s per 1024×1024 image |
| **VRAM** | ~10GB at 1024×1024 |
| **License** | Open weights under the FLUX.1 [dev] Non-Commercial License (non-commercial OK, commercial use needs review) |
| **Use Case** | Hero images, featured posts, high-visibility content |
| **Pros** | Best-in-class quality, text rendering, composition |
| **Cons** | Slower, memory-intensive, occasional NSFW false positives |

**Installation:**

```bash
# ComfyUI + FLUX workflow
# Requires fp8 or quantized variant for 12GB
# Recommended: flux1-dev-bnb-nf4 (4-bit quantized)
```

#### SDXL Turbo

| Attribute | Assessment |
|-----------|------------|
| **Quality** | Good: fast, decent composition |
| **Speed** | ~2-5s per 512×512 image |
| **VRAM** | ~6GB at 512×512 |
| **License** | Stability AI license (not a standard permissive license; commercial terms need review) |
| **Use Case** | Thumbnails, rapid iteration, batch generation |
| **Pros** | Fast, low VRAM, good for experimentation |
| **Cons** | Less detailed, struggles with complex prompts |

**Installation:**

```bash
# Automatic1111 or ComfyUI
# Model: sdxl-turbo-1.0
# Steps: 1-4 (yes, really)
```

### Tier 2: Alternatives

#### Stable Diffusion 3

- **Status:** Wait for optimized local release
- **Pros:** Better prompt adherence than SDXL
- **Cons:** Heavier than FLUX, licensing unclear

#### Stable Diffusion XL Base + Refiner

- **Status:** Superseded by Turbo and FLUX
- **Note:** Keep as fallback if Turbo fails

### Tier 3: Cloud Fallback

#### DALL-E 3 (OpenAI)

- **Use:** Complex compositions where local models fail
- **Cost:** ~$0.04-0.08/image
- **Constraint:** Only for overflow, not default

#### Midjourney

- **Use:** High-artistry needs
- **Cost:** $10-30/month
- **Constraint:** Discord workflow not automatable

---

## Pipeline Architecture

### Option A: ComfyUI API Server (Recommended)

```
Content Pipeline → ComfyUI API (Gaming PC) → Generated Images
                                                    ↓
                                     Store in /shared/assets/images/
                                                    ↓
                                      Git LFS or R2 for production
```

**Pros:**

- Node-based, highly configurable
- Queue system for batch jobs
- REST API for integration
- FLUX + SDXL in one workflow

**Cons:**

- More complex setup
- Higher memory footprint

### Option B: Ollama-style Local API

```
Content Pipeline → Local diffusion API → Generated Images
```

**Pros:**

- Simpler integration (like existing LLM setup)
- Lower overhead

**Cons:**

- Fewer features than ComfyUI
- May need custom wrapper

---

## Style Guide Considerations

### Visual Identity Questions

1. **Color Palette:** Match HoffDesk dark mode (slate/rose/indigo)?
2. **Typography:** Generate images with readable text, or overlay in CSS?
3. **Consistency:** Same seed/style for series posts?
4. **Art Direction:** Photorealistic, abstract, or illustration?
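
On the consistency question above, one possible approach is to derive the seed deterministically from a series identifier, so every post in a series starts from the same noise. A shared seed alone does not guarantee a matching look when prompts differ, so the sketch below also pairs it with a shared style suffix. This is a minimal sketch under stated assumptions: `seed_for_series`, `build_prompt`, the `STYLE_SUFFIXES` table, and the example slug are all hypothetical names for illustration, not part of any existing pipeline code, and the generator is assumed to accept an integer seed.

```python
import hashlib

# Hypothetical style suffixes; the wording is illustrative, not a decided style guide.
STYLE_SUFFIXES = {
    "technical": "photorealistic hardware, clean composition",
    "atmospheric": "moody lighting, muted slate and indigo tones",
    "minimal": "simple iconic shapes, flat background",
}


def seed_for_series(series_slug: str, bits: int = 32) -> int:
    """Map a series slug to a stable non-negative seed that fits in `bits` bits."""
    digest = hashlib.sha256(series_slug.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** bits)


def build_prompt(subject: str, style: str) -> str:
    """Append the shared style suffix so posts in one style read consistently."""
    return f"{subject}, {STYLE_SUFFIXES[style]}"


if __name__ == "__main__":
    # Assumed example slug; any stable string per series would work.
    seed = seed_for_series("homelab-networking")
    print(seed, build_prompt("a rack-mounted server", "technical"))
```

Hashing the slug, rather than storing a seed per series, keeps the mapping reproducible on any machine without extra state. A stored seed would be the better choice if a series ever needs its seed changed by hand.
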
### Proposed Direction

| Content Type | Model | Style |
|--------------|-------|-------|
| Technical deep-dives | FLUX | Photorealistic hardware, clean compositions |
| Tutorials | SDXL Turbo | Clean, bright, diagrammatic |
| Personal stories | FLUX | Atmospheric, mood-matched to narrative |
| Quick tips | SDXL Turbo | Simple, iconic, fast generation |

---

## Storage & Delivery

### Options

| Storage | Pros | Cons |
|---------|------|------|
| Local (Gaming PC) | Free, fast, sovereign | Single point of failure |
| Cloudflare R2 | Cheap, CDN-integrated, versioned | External dependency |
| Git LFS | Versioned with content | Repo bloat, slow clones |
| S3/MinIO | Industry standard, flexible | Cost, complexity |

**Recommendation:** R2 for production assets, local cache for generation pipeline.

---

## Workflow Integration

### Magic Wand UI Enhancement

```markdown
## Content Generation Prompt

**Title:** [User input]
**Tone:** [dropdown]
**Include Hero Image:** [checkbox] ← NEW
**Image Style:** [dropdown: Auto / Technical / Atmospheric / Minimal]

[Generate] → LLM writes article + FLUX generates hero image
```

### Batch Generation

```python
# Sketch of the batch step. The three helpers are passed in because they are
# not implemented yet; their names come from the original pseudocode.
def attach_hero_images(scheduled_posts, generate_image_prompt, flux_generate, upload_to_r2):
    """Generate, attach, and upload a hero image for each post that needs one."""
    for post in scheduled_posts:
        if not post.needs_hero_image:
            continue
        prompt = generate_image_prompt(post.content_summary)
        # 1200x630 is the target output size; 630 is not a multiple of 8, so the
        # generator may need to render slightly larger and crop.
        image = flux_generate(prompt, size="1200x630")
        post.attach_hero(image)
        upload_to_r2(image)
```

---

## Next Steps

| Priority | Task | Owner |
|----------|------|-------|
| P1 | Install ComfyUI on Gaming PC with FLUX fp8 | Socrates |
| P2 | Test SDXL Turbo for thumbnail speed | Socrates |
| P3 | Define style guide (colors, composition, typography) | Daedalus |
| P4 | Build ComfyUI → Pipeline integration | Socrates |
| P5 | R2 bucket setup + CDN configuration | Socrates |
| P6 | Batch generation test (10 articles) | Wadsworth |

---

## Open Questions

1. **Cost modeling:** Local electricity vs cloud API costs?
2. **Prompt engineering:** Who maintains prompt library for consistency?
3. **Review workflow:** Manual approval or fully automated?
4. **Accessibility:** Alt text generation (local vision model)?
5. **Fallback:** When do we fail over to DALL-E vs retry local?

---

## Resources

- [FLUX.1 Documentation](https://huggingface.co/black-forest-labs)
- [ComfyUI Examples](https://comfyanonymous.github.io/ComfyUI_examples/)
- [SDXL Turbo Paper](https://huggingface.co/papers/2311.17042)
- [HoffDesk Design Tokens](/shared/design-tokens/sprint-1.md)

---

**Status:** `investigating`, awaiting Gaming PC setup confirmation
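
As a starting point for open question 1, the electricity side of the comparison is simple arithmetic: energy per image is power times generation time, converted to kWh and multiplied by the tariff. The sketch below assumes a 350 W draw (the 3080 Ti's rated board power, not a measurement) and a placeholder price of 0.15 per kWh. Both constants and the function name are assumptions to replace with real figures.

```python
def electricity_cost_per_image(watts: float, seconds: float, price_per_kwh: float) -> float:
    """Energy cost of one generation: power (W) x time (s), converted to kWh."""
    kwh = watts * seconds / 3_600_000  # 1 kWh = 3.6 million watt-seconds
    return kwh * price_per_kwh


# Assumed inputs, not measurements: rated board power and a placeholder tariff.
ASSUMED_WATTS = 350.0
ASSUMED_PRICE_PER_KWH = 0.15

if __name__ == "__main__":
    for seconds in (30, 60):  # FLUX speed range from the evaluation table
        local = electricity_cost_per_image(ASSUMED_WATTS, seconds, ASSUMED_PRICE_PER_KWH)
        print(f"{seconds}s per image: ~${local:.5f} local vs $0.04-0.08 cloud")
```

This covers marginal electricity only. Hardware amortization, idle power, and failed or discarded generations would all need to be added for a fair comparison.
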