# Executive Summary: Private Document Processing API

## Product Definition

**What:** A self-hosted document OCR + extraction API that processes images/PDFs through a local vision model (qwen3-vl:8b) and returns structured JSON. Zero cloud vision processing — documents never leave your infrastructure. Designed for privacy-conscious businesses (legal, healthcare-adjacent, financial services, EU businesses under GDPR).

**Current State:** 486 lines of working code (`document_sorter.py`) that:

- Receives images via Telegram DM → saves to `/tmp/dropbox/` → base64 encodes → sends to Gaming PC Ollama qwen3-vl:8b
- Extracts `{vendor, date, category, amount}` with a strict taxonomy
- Builds filename: `YYYY-MM-DD_Vendor_Category_$Amount.ext`
- Uploads to Google Drive (currently blocked — Google account suspended)
- Cleans up temp files in a `finally` block
- Uses `keep_alive: 0` to immediately unload the vision model and free VRAM

**Gap to Revenue:** The code works for personal use. Selling it as an API service requires: an always-on GPU, multi-tenant auth, a web API (not Telegram), billing, PDF support, and output storage (not Google Drive).

---

## Market Analysis

### Cloud OCR Pricing (competitive benchmarks)

| Provider | Basic OCR | Forms/Tables | Custom Extraction | Notes |
|---|---|---|---|---|
| **AWS Textract** | $1.50/1K pages | $65/1K pages | N/A | ~43× the basic-OCR price for structured data |
| **Google Document AI** | $1.50/1K pages | $10/1K pages | $30/1K pages | $300 free credit |
| **Azure Document Intelligence** | $1.50/1K pages | $10/1K pages | $30/1K pages | 65% discount at commitment tier |
| **Mistral OCR 3** | $2/1K pages | $2/1K pages | $2/1K pages | Batch: $1/1K pages (50% off) |

**The privacy moat:** None of these providers can say *"your documents never touch a cloud server."* For legal firms, healthcare-adjacent services, financial advisors, and EU businesses under GDPR — this is compliance, not preference.
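As a sanity check on the table above, a few lines of Python reproduce the monthly cloud cost at a given volume. The rates are taken directly from the Forms/Tables column; the helper function and provider keys are ours, for illustration only:

```python
# Forms/tables rates from the pricing table above, USD per 1,000 pages.
FORMS_RATE_PER_1K = {
    "aws_textract": 65.00,
    "google_document_ai": 10.00,
    "azure_document_intelligence": 10.00,
    "mistral_ocr_3": 2.00,
}

def monthly_cloud_cost(provider: str, pages_per_month: int) -> float:
    """Monthly forms/tables extraction cost at the table's per-1K rates."""
    return FORMS_RATE_PER_1K[provider] * pages_per_month / 1000

# A 2,000-page/month law firm pays $130/mo for Textract forms extraction,
# versus $4/mo on Mistral — before any privacy considerations.
```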
### Target Market Segments

| Segment | Monthly Volume | Willingness to Pay | Key Pain Point |
|---|---|---|---|
| Small law firms (1-5 attorneys) | 500-2,000 pages | $49-99/mo | Cloud OCR = malpractice risk |
| Financial advisors (RIA) | 1,000-5,000 pages | $99-299/mo | Client data in AWS = compliance nightmare |
| Healthcare-adjacent (billing, admin) | 2,000-10,000 pages | $199-499/mo | HIPAA-adjacent, can't risk breaches |
| EU consultancies | 500-3,000 pages | $79-149/mo | GDPR, data sovereignty |
| Self-hosting enthusiasts | 100-1,000 pages | $29-49/mo | Ideological, not price-sensitive |

**Market validation:**

- Paperless-ngx has 50K+ GitHub stars — proven demand for self-hosted document management
- CleanRoll.ai raised funding for CRE rent roll extraction — validates the OCR-as-a-service niche
- Mistral OCR 3 launched December 2025 at $2/1K pages ($1/1K batch) — proves ongoing pressure on cloud pricing

---

## Technical Architecture

### Current Implementation (v0.1 — Personal Use)

```
Telegram DM (image) → /tmp/dropbox/ → base64 → HTTP POST to Gaming PC Ollama
        ↓
qwen3-vl:8b inference (120s timeout)
        ↓
JSON extraction → filename builder
        ↓
Google Drive upload (BLOCKED)
        ↓
👍 Telegram reaction
```

**Hardware dependency:** The Gaming PC (3080 Ti, 12GB VRAM) must be on and reachable via Tailscale. Windows + Ollama + `keep_alive: 0`.

### Required Architecture (v1.0 — Revenue Service)

```
Client POST /api/v1/extract (image/PDF + api_key)
        ↓
FastAPI auth layer (rate limit, tenant isolation)
        ↓
PDF → image conversion (if needed) via pdf2image
        ↓
Local Ollama OR always-on GPU endpoint
        ↓
qwen3-vl:8b inference (15-30s per page)
        ↓
JSON extraction → structured output
        ↓
MinIO S3-compatible storage (self-hosted, not AWS)
        ↓
Webhook callback OR polling endpoint for results
        ↓
Stripe metered billing (per-page + monthly base)
```

**Critical change:** Replace Google Drive with MinIO (self-hosted, S3-compatible object storage) or a local filesystem fronted by a CDN. Never touch AWS/GCP/Azure for file storage.
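The "JSON extraction → filename builder" step is concrete enough to sketch. The field names and the `YYYY-MM-DD_Vendor_Category_$Amount.ext` format come from the current implementation; the sanitization rules here are our assumption, not the code in `document_sorter.py`:

```python
import re

def build_filename(fields: dict, ext: str = "jpg") -> str:
    """Build YYYY-MM-DD_Vendor_Category_$Amount.ext from extracted fields.

    Assumes `fields` is the model's JSON output with keys vendor,
    date (already normalized to YYYY-MM-DD), category, and amount.
    """
    def clean(value: str) -> str:
        # Drop characters unsafe in filenames (our convention, not the source's).
        return re.sub(r"[^A-Za-z0-9-]", "", value)

    vendor = clean(fields["vendor"])
    category = clean(fields["category"])
    amount = f"{float(fields['amount']):.2f}"
    return f"{fields['date']}_{vendor}_{category}_${amount}.{ext}"

# e.g. {"vendor": "Home Depot", "date": "2026-03-14",
#       "category": "Hardware", "amount": 42.5}
# → "2026-03-14_HomeDepot_Hardware_$42.50.jpg"
```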
---

## Gap Analysis: Current → Revenue

| Component | Status | Effort | Blocker / Approach |
|---|---|---|---|
| **Vision model inference** | ✅ Works (Gaming PC) | — | Must be always-on |
| **Multi-page PDF support** | ❌ Not built | 3-4 days | pdf2image + page iteration |
| **API authentication** | ❌ Not built | 2-3 days | JWT or API key auth, tenant isolation |
| **Rate limiting / quotas** | ❌ Not built | 2-3 days | Redis or in-memory tracking |
| **Async job queue** | ❌ Not built | 4-5 days | Celery + Redis or FastAPI background tasks |
| **Result storage (MinIO)** | ❌ Not built | 2-3 days | Self-hosted S3-compatible storage |
| **Webhook callbacks** | ❌ Not built | 1-2 days | POST to client endpoint with results |
| **Billing (Stripe)** | ❌ Not built | 3-4 days | Metered billing, usage tracking |
| **Dashboard / status page** | ❌ Not built | 5-7 days | Web UI for job status, usage, API keys |
| **PDF preprocessing** | ❌ Not built | 3-4 days | Deskew, denoise, OCR optimization |
| **Error handling / retries** | ⚠️ Partial | 2-3 days | Dead letter queue, client alerts |

**Total engineering effort:** 4-6 weeks for an MVP (one person, nights/weekends).
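The auth and quota rows above don't need exotic infrastructure to prototype. A minimal in-memory sketch — constant-time API key comparison plus a fixed-window per-tenant counter — is shown below. All names are hypothetical, and in production the dict would be replaced by Redis so counters survive restarts and scale across workers:

```python
import hmac
import time
from collections import defaultdict

# tenant_id -> API key. In production this lives in a database.
API_KEYS = {"tenant_a": "sk_live_example_key"}
QUOTA_PER_MINUTE = 60

# (tenant_id, minute-window) -> request count. Redis in production.
_windows: dict = defaultdict(int)

def authenticate(tenant_id: str, presented_key: str) -> bool:
    """Constant-time API key check (avoids timing side channels)."""
    expected = API_KEYS.get(tenant_id, "")
    return hmac.compare_digest(expected, presented_key)

def allow_request(tenant_id: str, now: float = None) -> bool:
    """Fixed-window rate limit: QUOTA_PER_MINUTE requests per minute."""
    window = int((now if now is not None else time.time()) // 60)
    key = (tenant_id, window)
    if _windows[key] >= QUOTA_PER_MINUTE:
        return False
    _windows[key] += 1
    return True
```

A fixed window is the simplest scheme and can briefly admit 2× the quota at a window boundary; a sliding-window or token-bucket limiter is the usual refinement once Redis is in place.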
---

## Hardware Investment Required

### Current Setup Bottlenecks

| Issue | Current State | Impact |
|---|---|---|
| Gaming PC not always-on | 3080 Ti sleeps, wakes on demand | OCR unavailable 60%+ of the day |
| Windows power management | Sleep mode, updates | Unpredictable downtime |
| Tailscale dependency | Windows → Beelink → Internet | Two points of failure |
| No UPS | Power outage = data loss | Unacceptable for a paid service |

### Recommended Hardware Upgrades

#### Option A: Dedicated GPU Server (Recommended)

| Component | Cost | Purpose |
|---|---|---|
| Used RTX 3060 12GB (eBay) | $180-220 | Dedicated inference GPU, 24/7 operation |
| Low-power x86 SFF PC (used Dell/HP) | $150-250 | Host for GPU, headless Ubuntu |
| 650W PSU (if not included) | $50-80 | Power for GPU |
| PCIe riser cable (if SFF) | $25-40 | Physical fit |
| 1TB NVMe SSD | $80-100 | Model storage + job queue |
| **Total** | **$485-690** | Always-on inference server |

**Power consumption:** ~150W under load, ~30W idle ≈ $15-25/mo electricity.

**Alternative:** Used RTX 2060 Super 8GB ($120-150) — enough for qwen3-vl:8b, cheaper entry.

#### Option B: NVIDIA Jetson Orin Nano

| Component | Cost | Notes |
|---|---|---|
| Jetson Orin Nano 8GB Dev Kit | $499 | ARM, lower power (~25W max) |
| 256GB NVMe | $40 | Storage |
| **Total** | **$539** | Lower power, ARM ecosystem |

**Tradeoffs:**

- Pros: lower power (~$5/mo), smaller footprint, purpose-built for edge AI
- Cons: ARM architecture (some Python wheels don't exist), slower inference than a desktop GPU, 8GB RAM limits concurrent jobs

**Recommendation:** Option A (used RTX 3060 + SFF PC). More flexible, faster inference, easier troubleshooting.

#### Option C: Upgrade Beelink (Not Recommended)

The Intel N150 has no PCIe slot for a GPU. An external GPU via Thunderbolt/USB4 means a $300 enclosure + $200 GPU = $500, with more complexity and lower bandwidth. Skip.
---

## Cost Model: Self-Hosted vs Cloud

### Monthly Operating Costs (Self-Hosted)

| Cost | Amount | Notes |
|---|---|---|
| Electricity (150W × 24h × 30d ≈ 108 kWh) | $20-30 | ≈$16 at $0.15/kWh; budgeted with headroom |
| Internet (already paid) | $0 | Home connection sufficient |
| Domain + Cloudflare (already paid) | $0 | Existing setup |
| Hardware depreciation ($600 / 36 mo) | $17 | 3-year lifespan |
| **Total monthly COGS** | **$37-47** | Per-tenant marginal cost ≈ $0 |

### Pricing Strategy

**Target:** Undercut cloud providers by 50% while charging a privacy premium.

| Tier | Price | Includes | Cloud Equivalent (Textract forms) |
|---|---|---|---|
| Starter | $29/mo | 1,000 pages, 1 user, email support | $65 |
| Professional | $79/mo | 5,000 pages, 3 users, webhooks, SLA | $325 |
| Business | $199/mo | 20,000 pages, 10 users, API access, priority | $1,300 |
| Enterprise | $499+/mo | Unlimited pages, custom models, dedicated infra | Custom quote |

**Break-even:** At $79/mo × 10 customers = $790/mo revenue against $47/mo COGS — a 94% gross margin.
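The break-even claim above is reproducible in a few lines (figures from the cost table; the helper function is illustrative, not part of the codebase):

```python
def gross_margin(customers: int, price: float, cogs: float):
    """Return (monthly gross profit, gross margin) given flat monthly COGS.

    Self-hosted COGS is essentially fixed, so margin climbs with every
    added customer — unlike cloud OCR, where cost scales with page volume.
    """
    revenue = customers * price
    profit = revenue - cogs
    return profit, profit / revenue

profit, margin = gross_margin(customers=10, price=79.0, cogs=47.0)
# 10 customers at $79/mo against $47/mo COGS:
# $743/mo gross profit, ~94% gross margin.
```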
---

## Dev/Test Cycles

### Phase 1: Hardware (Week 1)

- Acquire used RTX 3060 + SFF PC
- Install Ubuntu 22.04 LTS, Ollama, qwen3-vl:8b
- Verify inference speed: target <30s per page
- Configure Tailscale static IP or Cloudflare Tunnel

### Phase 2: Core API (Weeks 2-3)

- FastAPI scaffolding with auth (API keys)
- Image upload endpoint (sync) → returns job_id
- Async job processing with Celery + Redis
- MinIO setup for result storage
- Webhook callback system

### Phase 3: PDF Support (Week 4)

- pdf2image integration for multi-page PDFs
- Page-by-page processing with progress tracking
- Zip output for multi-page docs
- Bulk upload endpoint

### Phase 4: Billing (Week 5)

- Stripe metered billing integration
- Usage tracking per API key
- Automatic overage handling
- Invoice generation

### Phase 5: Dashboard (Weeks 6-7)

- Minimal web UI: job status, usage charts, API key management
- Status page with uptime metrics
- Error logs view (sanitized)

### Phase 6: Security Hardening (Week 8)

- Rate limiting (prevent abuse)
- Input validation (prevent injection)
- Audit logging (who processed what, when)
- TLS termination via Cloudflare Tunnel
- Fail2ban for SSH/API brute force

---

## Testbench Demo: Go/No-Go Criteria

### Test 1: Inference Performance

**Setup:** Single-page receipt, qwen3-vl:8b, RTX 3060 12GB

**Target:**

- Cold start (model not loaded): <60 seconds
- Warm inference: <20 seconds per page
- Concurrent requests (3): <90 seconds each

**Go if:** Average <30s per page under load

### Test 2: Accuracy Benchmark

**Dataset:** 100 documents (mix of receipts, invoices, contracts)

**Target:**

- Vendor name extraction: >90% accuracy
- Date extraction: >95% accuracy (correct format)
- Amount extraction: >95% accuracy (within $0.01)
- Category classification: >85% accuracy

**Go if:** Overall field extraction >90% without human correction

### Test 3: Uptime & Reliability

**Duration:** 7 days continuous operation

**Target:**

- Uptime: >99% (excluding planned maintenance)
- Zero memory leaks (Ollama stays responsive)
- Graceful degradation under load (queue management)
- Automatic recovery from GPU OOM

**Go if:** Zero unplanned outages, <5 min recovery time

### Test 4: End-to-End Latency

**Scenario:** Client POST → processing → webhook callback

**Target:**

- P50 latency: <45 seconds
- P95 latency: <120 seconds
- P99 latency: <300 seconds (large PDFs)

**Go if:** P95 <120s for single-page documents

### Test 5: Cost Validation

**Measurement:** 30-day electricity + bandwidth

**Target:**

- Electricity: <$40/mo
- Bandwidth: <100GB/mo (no overage)
- Hardware: no thermal throttling, GPU <80°C

**Go if:** Monthly COGS <$50

---

## Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Hardware failure (GPU/SSD) | Medium | High | Hot spare, automated backups, 2-day replacement |
| Ollama/qwen3-vl model update breaks API | Low | High | Pin model version, staged rollout, rollback plan |
| Customer data breach (local) | Low | Critical | Encryption at rest, no remote access, audit logs |
| Power outage | Medium | Medium | UPS (CyberPower 1500VA, $150), graceful shutdown |
| Internet outage | Low | High | 4G failover (optional), queue-and-retry |
| Legal liability (OCR error) | Medium | Medium | Terms-of-service disclaimer, dollar liability cap |
| Stripe account freeze | Low | High | Multi-processor backup (LemonSqueezy) |
| Beelink SSD death (cascading failure) | Medium | High | Daily backups to Gaming PC + cloud (encrypted) |

---

## Recommendation

### Go/No-Go Decision: **CONDITIONAL GO**

**Proceed if:**

1. ✅ Can invest $500-700 in dedicated GPU hardware within 30 days
2. ✅ Willing to spend 6-8 weeks part-time on the MVP
3. ✅ First 3 paying customers identified (even if just "would you pay for this?" conversations)
4.
✅ Accept a 6-month payback period on hardware

**Defer if:**

- ❌ Cannot guarantee an always-on GPU (Gaming PC unreliable)
- ❌ Not willing to build a web UI (Telegram-only won't scale to B2B)
- ❌ No LLC/liability protection (healthcare/legal-adjacent customers)

### Phased Approach

**Phase 0 (Now):**

- Validate demand: 5 conversations with law firms/financial advisors
- Price test: "Would you pay $79/mo for unlimited private OCR?"
- Build waitlist

**Phase 1 (Month 1):**

- Buy hardware, set up dedicated inference server
- Build async API + MinIO storage
- Dogfood with personal documents

**Phase 2 (Month 2):**

- Stripe integration, billing
- Onboard 3 beta customers at $29/mo (discounted)
- Iterate on extraction accuracy

**Phase 3 (Month 3):**

- Dashboard web UI
- Public launch at $79/mo
- Target: 10 customers ($790/mo) → break even

**Conservative projection:** 10 customers by Month 6 = $790/mo revenue, $47/mo COGS, $743/mo gross profit. Annualized: ~$8,900 gross profit on ~$700 hardware investment.

---

## Appendix: Competitive Moat Analysis

**Why customers choose self-hosted over cloud:**

| Customer Type | Cloud Fear | Our Pitch |
|---|---|---|
| Law firm | Malpractice if client data leaked | "Documents never leave your server. Zero cloud touch." |
| Financial advisor | SEC audit, client trust | "Audit trail shows local processing only." |
| Healthcare admin | HIPAA violation ($1.5M fine) | "No BAA needed — no third-party processing." |
| EU consultancy | GDPR Article 44 (data transfers) | "Data sovereignty guaranteed. EU server option available." |
| Privacy enthusiast | Surveillance capitalism | "Open source, self-hosted, auditable code." |

**Differentiation from Paperless-ngx:**

- Paperless: document *management* (storage, tagging, search)
- Us: document *processing API* (OCR, extraction, structured output)
- Complementary: customers use both

**Differentiation from cloud OCR:**

- Cloud: 99.9% uptime, infinite scale, higher cost, privacy risk
- Us: 99% uptime, limited scale, lower cost, zero privacy risk

The moat isn't features — it's a **zero-trust architecture** that cloud providers cannot replicate by definition.

---

*Document version: 2026-04-19*
*Status: Draft for review*
*Next step: Matt's go/no-go decision, then Phase 0 validation*