
Executive Summary: Private Document Processing API

Product Definition

What: A self-hosted document OCR + extraction API that processes images/PDFs through a local vision model (qwen3-vl:8b) and returns structured JSON. Zero cloud vision processing — documents never leave your infrastructure. Designed for privacy-conscious businesses (legal, healthcare-adjacent, financial services, EU GDPR compliance).

Current State: 486 lines of working code (document_sorter.py) that:
- Receives images via Telegram DM → saves to /tmp/dropbox/ → base64 encodes → sends to Gaming PC Ollama qwen3-vl:8b
- Extracts: {vendor, date, category, amount} with strict taxonomy
- Builds filename: YYYY-MM-DD_Vendor_Category_$Amount.ext
- Uploads to Google Drive (currently blocked — Google account suspended)
- Cleans up temp files in finally block
- Uses keep_alive: 0 to immediately unload vision model and free VRAM
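The filename convention above can be sketched as a pure helper. This is a hypothetical function, not the actual document_sorter.py code; the field names follow the extraction schema above:

```python
from datetime import date

def build_filename(vendor: str, doc_date: str, category: str,
                   amount: float, ext: str) -> str:
    """Build YYYY-MM-DD_Vendor_Category_$Amount.ext from extracted fields."""
    # Keep only alphanumerics so the filename stays filesystem-safe.
    def safe(s: str) -> str:
        return "".join(c for c in s.title() if c.isalnum())
    # Validate the date really is ISO YYYY-MM-DD; raises ValueError otherwise.
    date.fromisoformat(doc_date)
    return f"{doc_date}_{safe(vendor)}_{safe(category)}_${amount:.2f}{ext}"
```

For example, `build_filename("ACME Corp", "2026-04-19", "Office Supplies", 42.5, ".jpg")` yields `2026-04-19_AcmeCorp_OfficeSupplies_$42.50.jpg`.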

Gap to Revenue: The code works for personal use. To sell it as an API service requires: always-on GPU, multi-tenant auth, web API (not Telegram), billing, PDF support, and output storage (not Google Drive).


Market Analysis

Cloud OCR Pricing (competitive benchmarks)

| Provider | Basic OCR | Forms/Tables | Custom Extraction | Notes |
| --- | --- | --- | --- | --- |
| AWS Textract | $1.50/1K pages | $65/1K pages | N/A | ~43× the basic rate for structured data |
| Google Document AI | $1.50/1K pages | $10/1K pages | $30/1K pages | $300 free credit |
| Azure Document Intelligence | $1.50/1K pages | $10/1K pages | $30/1K pages | 65% discount at commitment tier |
| Mistral OCR 3 | $2/1K pages | $2/1K pages | $2/1K pages | Batch: $1/1K pages (50% off) |

The privacy moat: None of these providers can say "your documents never touch a cloud server." For legal firms, healthcare-adjacent services, financial advisors, and EU businesses under GDPR — this is compliance, not preference.

Target Market Segments

| Segment | Monthly Volume | Willingness to Pay | Key Pain Point |
| --- | --- | --- | --- |
| Small law firms (1-5 attorneys) | 500-2,000 pages | $49-99/mo | Cloud OCR = malpractice risk |
| Financial advisors (RIA) | 1,000-5,000 pages | $99-299/mo | Client data in AWS = compliance nightmare |
| Healthcare-adjacent (billing, admin) | 2,000-10,000 pages | $199-499/mo | HIPAA adjacent, can't risk breaches |
| EU consultancies | 500-3,000 pages | $79-149/mo | GDPR, data sovereignty |
| Self-hosting enthusiasts | 100-1,000 pages | $29-49/mo | Ideological, not price-sensitive |

Market validation:
- Paperless-ngx has 50K+ GitHub stars — proven demand for self-hosted document management
- CleanRoll.ai raised funding for CRE rent roll extraction — validates OCR-as-a-service niche
- Mistral OCR 3 launched December 2025 with $1/1K pages pricing — proves market pressure on cloud pricing


Technical Architecture

Current Implementation (v0.1 - Personal Use)

Telegram DM (image) → /tmp/dropbox/ → base64 → HTTP POST to Gaming PC Ollama
                                              ↓
                                        qwen3-vl:8b inference (120s timeout)
                                              ↓
                                        JSON extraction → filename builder
                                              ↓
                                        Google Drive upload (BLOCKED)
                                              ↓
                                        👍 Telegram reaction

Hardware dependency: Gaming PC (3080 Ti, 12GB VRAM) must be ON and connected via Tailscale. Windows + Ollama + keep_alive: 0.

Required Architecture (v1.0 - Revenue Service)

Client POST /api/v1/extract (image/PDF + api_key)
              ↓
        FastAPI auth layer (rate limit, tenant isolation)
              ↓
        PDF → image conversion (if needed) via pdf2image
              ↓
        Local Ollama OR always-on GPU endpoint
              ↓
        qwen3-vl:8b inference (15-30s per page)
              ↓
        JSON extraction → structured output
              ↓
        MinIO S3-compatible storage (self-hosted, not AWS)
              ↓
        Webhook callback OR polling endpoint for results
              ↓
        Stripe metered billing (per-page + monthly base)

Critical change: Replace Google Drive with MinIO (self-hosted S3-compatible object storage) or local filesystem with CDN. Never touch AWS/GCP/Azure for file storage.
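A minimal sketch of the tenant-isolation piece of that storage layout, assuming a per-tenant key prefix in MinIO (the `tenants/<id>/...` layout and function names are illustrative, not an existing convention):

```python
import uuid

def result_key(tenant_id: str, job_id: str) -> str:
    """Object key layout that isolates tenants by prefix, so a per-tenant
    bucket policy can restrict access to tenants/<id>/* only."""
    return f"tenants/{tenant_id}/jobs/{job_id}/result.json"

def new_job_id() -> str:
    # Opaque, non-guessable job handle returned to the client at POST time.
    return uuid.uuid4().hex
```

The same key doubles as the polling-endpoint lookup path and the webhook payload's result pointer.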


Gap Analysis: Current → Revenue

| Component | Status | Effort | Notes |
| --- | --- | --- | --- |
| Vision model inference | ✅ Works (Gaming PC) | Done | Must be always-on |
| Multi-page PDF support | ❌ Not built | 3-4 days | pdf2image + page iteration |
| API authentication | ❌ Not built | 2-3 days | JWT or API key auth, tenant isolation |
| Rate limiting / quotas | ❌ Not built | 2-3 days | Redis or in-memory tracking |
| Async job queue | ❌ Not built | 4-5 days | Celery + Redis or FastAPI background tasks |
| Result storage (MinIO) | ❌ Not built | 2-3 days | Self-hosted S3-compatible storage |
| Webhook callbacks | ❌ Not built | 1-2 days | POST to client endpoint with results |
| Billing (Stripe) | ❌ Not built | 3-4 days | Metered billing, usage tracking |
| Dashboard / status page | ❌ Not built | 5-7 days | Web UI for job status, usage, API keys |
| PDF preprocessing | ❌ Not built | 3-4 days | Deskew, denoise, OCR optimization |
| Error handling / retries | ⚠️ Partial | 2-3 days | Dead letter queue, client alerts |

Total engineering effort: 4-6 weeks for MVP (one person, nights/weekends).
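To give a feel for the scope of one row, the "Rate limiting / quotas" item can be sketched as an in-memory sliding-window limiter (hypothetical class; the Redis variant from the table would share the same logic but store timestamps server-side so all workers see one window):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Per-API-key sliding-window rate limiter (single-process sketch)."""

    def __init__(self, max_requests: int, window_s: float):
        self.max_requests = max_requests
        self.window_s = window_s
        self._hits = defaultdict(deque)  # api_key -> timestamps of recent hits

    def allow(self, api_key: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._hits[api_key]
        while q and now - q[0] >= self.window_s:  # evict hits outside the window
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over quota: respond 429 at the API layer
        q.append(now)
        return True
```

Passing `now` explicitly makes the class deterministic to test; production callers just omit it.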


Hardware Investment Required

Current Setup Bottlenecks

| Issue | Current State | Impact |
| --- | --- | --- |
| Gaming PC not always-on | 3080 Ti sleeps, wakes on demand | OCR unavailable 60%+ of day |
| Windows power management | Sleep mode, updates | Unpredictable downtime |
| Tailscale dependency | Windows → Beelink → Internet | Two points of failure |
| No UPS | Power outage = data loss | Unacceptable for paid service |
Option A: Used RTX 3060 + SFF PC

| Component | Cost | Purpose |
| --- | --- | --- |
| Used RTX 3060 12GB (eBay) | $180-220 | Dedicated inference GPU, 24/7 operation |
| Low-power x86 SFF PC (used Dell/HP) | $150-250 | Host for GPU, headless Ubuntu |
| 650W PSU (if not included) | $50-80 | Power for GPU |
| PCIe riser cable (if SFF) | $25-40 | Physical fit |
| 1TB NVMe SSD | $80-100 | Model storage + job queue |
| Total | $485-690 | Always-on inference server |

Power consumption: ~150W under load, ~30W idle = ~$15-25/mo electricity.

Alternative: Used RTX 2060 Super 8GB ($120-150) — enough for qwen3-vl:8b, cheaper entry.

Option B: NVIDIA Jetson Orin Nano

| Component | Cost | Notes |
| --- | --- | --- |
| Jetson Orin Nano 8GB Dev Kit | $499 | ARM, lower power (~25W max) |
| 256GB NVMe | $40 | Storage |
| Total | $539 | Lower power, ARM ecosystem |

Tradeoffs:
- Pros: Lower power (~$5/mo), smaller footprint, purpose-built for edge AI
- Cons: ARM architecture (some Python wheels don't exist), slower inference than desktop GPU, 8GB RAM limits concurrent jobs

Recommendation: Option A (used RTX 3060 + SFF PC). More flexible, faster inference, easier troubleshooting.

Ruled out: the Intel N150 has no PCIe slot for a GPU, and an external GPU over Thunderbolt/USB4 ($300 enclosure + $200 GPU = $500) costs as much as Option A while adding complexity and losing bandwidth. Skip.


Cost Model: Self-Hosted vs Cloud

Monthly Operating Costs (Self-Hosted)

| Cost | Amount | Notes |
| --- | --- | --- |
| Electricity (150W × 24h × 30d ≈ 108 kWh) | $16-30 | ~$16 at $0.15/kWh; more at higher local rates |
| Internet (already paid) | $0 | Home connection sufficient |
| Domain + Cloudflare (already paid) | $0 | Existing setup |
| Hardware depreciation ($600 / 36mo) | $17 | 3-year lifespan |
| Total monthly COGS | $33-47 | Per-tenant marginal cost ≈ $0 |

Pricing Strategy

Target: Undercut cloud providers by 50% while offering privacy premium.

| Tier | Price | Includes | Cloud Equivalent |
| --- | --- | --- | --- |
| Starter | $29/mo | 1,000 pages, 1 user, email support | AWS: $65-1,500 |
| Professional | $79/mo | 5,000 pages, 3 users, webhooks, SLA | AWS: $325-7,500 |
| Business | $199/mo | 20,000 pages, 10 users, API access, priority | AWS: $1,300-30,000 |
| Enterprise | $499+/mo | Unlimited pages, custom models, dedicated infra | AWS: custom quote |

Break-even: At $79/mo × 10 customers = $790/mo revenue. COGS $47/mo. Gross margin 94%.
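The break-even arithmetic above, as a quick sanity check:

```python
def gross_margin(price: float, customers: int, cogs: float) -> tuple:
    """Monthly revenue and gross margin, given flat COGS
    (per-tenant marginal cost is ~$0, so COGS does not scale with customers)."""
    revenue = price * customers
    margin = (revenue - cogs) / revenue
    return revenue, margin

revenue, margin = gross_margin(79, 10, 47)
# revenue = 790.0, margin ≈ 0.94
```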


Dev/Test Cycles

Phase 1: Hardware (Week 1)

  • Acquire used RTX 3060 + SFF PC
  • Install Ubuntu 22.04 LTS, Ollama, qwen3-vl:8b
  • Verify inference speed: target <30s per page
  • Configure Tailscale static IP or Cloudflare Tunnel

Phase 2: Core API (Weeks 2-3)

  • FastAPI scaffolding with auth (API keys)
  • Image upload endpoint (sync) → returns job_id
  • Async job processing with Celery + Redis
  • MinIO setup for result storage
  • Webhook callback system
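For the webhook callback system, clients need a way to verify that a callback really came from the service. One common pattern (a sketch under assumed names; the `X-Signature` header convention is illustrative) is to HMAC-sign the payload with a per-tenant secret:

```python
import hashlib
import hmac
import json

def sign_webhook(payload: dict, secret: bytes):
    """Serialize a callback payload and compute an HMAC-SHA256 signature,
    sent e.g. in an X-Signature header alongside the POST body."""
    body = json.dumps(payload, sort_keys=True).encode()  # deterministic bytes
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, sig

def verify_webhook(body: bytes, sig: str, secret: bytes) -> bool:
    """Client-side check; compare_digest avoids timing side channels."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```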

Phase 3: PDF Support (Week 4)

  • pdf2image integration for multi-page PDFs
  • Page-by-page processing with progress tracking
  • Zip output for multi-page docs
  • Bulk upload endpoint
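The page-by-page flow might look like the sketch below. `convert_from_path` is pdf2image's real entry point (it needs the poppler system package); the naming helper is hypothetical:

```python
def page_output_name(job_id: str, page_no: int, total: int) -> str:
    """Stable per-page result name with zero-padded page numbers,
    so multi-page outputs sort correctly inside the zip."""
    width = len(str(total))
    return f"{job_id}_p{page_no:0{width}d}.json"

def iter_pdf_pages(pdf_path: str, dpi: int = 200):
    """Yield (page_no, PIL image) per PDF page. Sketch only: requires
    pdf2image and poppler; each image feeds the qwen3-vl extraction step."""
    from pdf2image import convert_from_path  # lazy import: heavy dependency
    for page_no, image in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        yield page_no, image
```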

Phase 4: Billing (Week 5)

  • Stripe metered billing integration
  • Usage tracking per API key
  • Automatic overage handling
  • Invoice generation
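Metered billing hinges on accurate per-key usage counts. A minimal sketch of the tracking side (the actual Stripe reporting call is omitted; class and method names are illustrative):

```python
from collections import Counter

class UsageMeter:
    """Per-API-key page counter feeding metered billing and overage handling."""

    def __init__(self, included_pages: int):
        self.included = included_pages  # plan allowance, e.g. 5,000 for Professional
        self.pages = Counter()

    def record(self, api_key: str, pages: int) -> None:
        self.pages[api_key] += pages  # call once per completed job

    def overage(self, api_key: str) -> int:
        """Pages beyond the plan allowance, billable at the metered rate."""
        return max(0, self.pages[api_key] - self.included)
```

In production this state would live in the same Redis instance as the job queue, not in process memory.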

Phase 5: Dashboard (Weeks 6-7)

  • Minimal web UI: job status, usage charts, API key management
  • Status page with uptime metrics
  • Error logs view (sanitized)

Phase 6: Security Hardening (Week 8)

  • Rate limiting (prevent abuse)
  • Input validation (prevent injection)
  • Audit logging (who processed what when)
  • TLS termination via Cloudflare Tunnel
  • Fail2ban for SSH/API brute force
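One concrete piece of the input-validation item: check file signatures ("magic bytes") at the upload endpoint, so a mislabeled or malicious file is rejected before it reaches the model. A sketch covering the three accepted formats:

```python
# Leading byte signatures for the accepted upload types.
MAGIC = {
    b"%PDF-": "application/pdf",
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}

def sniff_type(data: bytes):
    """Return the detected MIME type, or None if the file should be rejected.
    Trusts file contents, not the client-supplied Content-Type header."""
    for magic, mime in MAGIC.items():
        if data.startswith(magic):
            return mime
    return None
```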

Testbench Demo: Go/No-Go Criteria

Test 1: Inference Performance

Setup: Single page receipt, qwen3-vl:8b, RTX 3060 12GB

Target:
- Cold start (model not loaded): <60 seconds
- Warm inference: <20 seconds per page
- Concurrent requests (3): <90 seconds each

Go if: Average <30s per page under load

Test 2: Accuracy Benchmark

Dataset: 100 documents (mix of receipts, invoices, contracts)

Target:
- Vendor name extraction: >90% accuracy
- Date extraction: >95% accuracy (correct format)
- Amount extraction: >95% accuracy (within $0.01)
- Category classification: >85% accuracy

Go if: Overall field extraction >90% without human correction
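Scoring the benchmark reduces to exact-match comparison per field over the labeled set. A sketch of the scorer (hypothetical helper; field names match the extraction schema):

```python
def field_accuracy(predictions: list, gold: list,
                   fields=("vendor", "date", "category", "amount")) -> dict:
    """Per-field exact-match accuracy over a labeled benchmark set.
    predictions and gold are parallel lists of extraction dicts."""
    correct = {f: 0 for f in fields}
    for pred, truth in zip(predictions, gold):
        for f in fields:
            if pred.get(f) == truth.get(f):
                correct[f] += 1
    return {f: correct[f] / len(gold) for f in fields}
```

Exact match is the strict case; amounts would in practice compare within the $0.01 tolerance stated above.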

Test 3: Uptime & Reliability

Duration: 7 days continuous operation

Target:
- Uptime: >99% (excluding planned maintenance)
- Zero memory leaks (Ollama stays responsive)
- Graceful degradation under load (queue management)
- Automatic recovery from GPU OOM

Go if: Zero unplanned outages, <5min recovery time

Test 4: End-to-End Latency

Scenario: Client POST → processing → webhook callback

Target:
- P50 latency: <45 seconds
- P95 latency: <120 seconds
- P99 latency: <300 seconds (large PDFs)

Go if: P95 <120s for single-page documents
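The percentile thresholds can be checked against measured samples with the standard library; a sketch:

```python
import statistics

def latency_percentiles(samples_s: list) -> dict:
    """P50/P95/P99 from end-to-end latency samples in seconds,
    for evaluating the go/no-go thresholds above."""
    # quantiles(n=100) returns the 99 percentile cut points P1..P99.
    qs = statistics.quantiles(samples_s, n=100, method="inclusive")
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}
```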

Test 5: Cost Validation

Measurement: 30-day electricity + bandwidth

Target:
- Electricity: <$40/mo
- Bandwidth: <100GB/mo (no overage)
- Hardware: no thermal throttling, <80°C GPU

Go if: Monthly COGS <$50


Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Hardware failure (GPU/SSD) | Medium | High | Hot spare, automated backups, 2-day replacement |
| Ollama/qwen3-vl model update breaks API | Low | High | Pin model version, staged rollout, rollback plan |
| Customer data breach (local) | Low | Critical | Encryption at rest, no remote access, audit logs |
| Power outage | Medium | Medium | UPS (CyberPower 1500VA, $150), graceful shutdown |
| Internet outage | Low | High | 4G failover (optional), queue-and-retry |
| Legal liability (OCR error) | Medium | Medium | Terms of service disclaimer, $ liability cap |
| Stripe account freeze | Low | High | Multi-processor backup (LemonSqueezy) |
| Beelink SSD death (cascading failure) | Medium | High | Daily backups to Gaming PC + cloud (encrypted) |

Recommendation

Go/No-Go Decision: CONDITIONAL GO

Proceed if:
1. ✅ Can invest $500-700 in dedicated GPU hardware within 30 days
2. ✅ Willing to spend 6-8 weeks part-time on MVP
3. ✅ First 3 paying customers identified (even if just "would you pay for this?" conversations)
4. ✅ Accept 6-month payback period on hardware

Defer if:
- ❌ Cannot guarantee always-on GPU (Gaming PC unreliable)
- ❌ Not willing to build web UI (Telegram-only won't scale to B2B)
- ❌ No LLC/liability protection (healthcare/legal adjacent customers)

Phased Approach

Phase 0 (Now):
- Validate demand: 5 conversations with law firms/financial advisors
- Price test: "Would you pay $79/mo for unlimited private OCR?"
- Build waitlist

Phase 1 (Month 1):
- Buy hardware, set up dedicated inference server
- Build async API + MinIO storage
- Dogfood with personal documents

Phase 2 (Month 2):
- Stripe integration, billing
- Onboard 3 beta customers at $29/mo (discounted)
- Iterate on extraction accuracy

Phase 3 (Month 3):
- Dashboard web UI
- Public launch at $79/mo
- Target: 10 customers ($790/mo) → break even

Conservative projection: 10 customers by Month 6 = $790/mo revenue, $47/mo COGS, $743/mo gross profit. Annual: ~$8,900 gross profit on ~$700 hardware investment.


Appendix: Competitive Moat Analysis

Why customers choose self-hosted over cloud:

| Customer Type | Cloud Fear | Our Pitch |
| --- | --- | --- |
| Law firm | Malpractice if client data leaked | "Documents never leave your server. Zero cloud touch." |
| Financial advisor | SEC audit, client trust | "Audit trail shows local processing only." |
| Healthcare admin | HIPAA violation ($1.5M fine) | "No BAA needed — no third-party processing." |
| EU consultancy | GDPR Article 44 (data transfers) | "Data sovereignty guaranteed. EU server option available." |
| Privacy enthusiast | Surveillance capitalism | "Open source, self-hosted, auditable code." |

Differentiation from Paperless-ngx:
- Paperless: document management (storage, tagging, search)
- Us: document processing API (OCR, extraction, structured output)
- Complementary: customers use both

Differentiation from cloud OCR:
- Cloud: 99.9% uptime, infinite scale, higher cost, privacy risk
- Us: 99% uptime, limited scale, lower cost, zero privacy risk

The moat isn't features. It's an architecture in which documents never leave the customer's infrastructure, which cloud providers cannot replicate by definition.


Document version: 2026-04-19
Status: Draft for review
Next step: Matt's go/no-go decision, then Phase 0 validation