Executive Summary: Private Document Processing API
Product Definition
What: A self-hosted document OCR + extraction API that processes images/PDFs through a local vision model (qwen3-vl:8b) and returns structured JSON. Zero cloud vision processing — documents never leave your infrastructure. Designed for privacy-conscious businesses (legal, healthcare-adjacent, financial services, EU GDPR compliance).
Current State: 486 lines of working code (document_sorter.py) that:
- Receives images via Telegram DM → saves to /tmp/dropbox/ → base64 encodes → sends to Gaming PC Ollama qwen3-vl:8b
- Extracts: {vendor, date, category, amount} with strict taxonomy
- Builds filename: YYYY-MM-DD_Vendor_Category_$Amount.ext
- Uploads to Google Drive (currently blocked — Google account suspended)
- Cleans up temp files in finally block
- Uses keep_alive: 0 to immediately unload vision model and free VRAM
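The shape of the current pipeline can be sketched as follows. This is a minimal illustration, not the actual document_sorter.py code: the endpoint URL, prompt text, and helper names are assumptions; the `keep_alive: 0` and 120-second-timeout behavior match the description above.

```python
import base64
import json
import urllib.request

# Illustrative endpoint; the real setup reaches the Gaming PC over Tailscale.
OLLAMA_URL = "http://gaming-pc:11434/api/generate"

def build_payload(image_b64: str) -> dict:
    """Request body for Ollama's /api/generate with a vision model."""
    return {
        "model": "qwen3-vl:8b",
        "prompt": "Return JSON with keys vendor, date, category, amount.",
        "images": [image_b64],
        "format": "json",   # ask Ollama to constrain output to valid JSON
        "stream": False,
        "keep_alive": 0,    # unload the model immediately after, freeing VRAM
    }

def extract_fields(image_path: str) -> dict:
    with open(image_path, "rb") as f:
        payload = build_payload(base64.b64encode(f.read()).decode())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:   # 120s timeout
        return json.loads(json.loads(resp.read())["response"])

def build_filename(fields: dict, ext: str) -> str:
    """YYYY-MM-DD_Vendor_Category_$Amount.ext"""
    return (f"{fields['date']}_{fields['vendor']}_"
            f"{fields['category']}_${float(fields['amount']):.2f}{ext}")
```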
Gap to Revenue: The code works for personal use. Selling it as an API service requires: an always-on GPU, multi-tenant auth, a web API (not Telegram), billing, PDF support, and output storage (not Google Drive).
Market Analysis
Cloud OCR Pricing (competitive benchmarks)
| Provider | Basic OCR | Forms/Tables | Custom Extraction | Notes |
|---|---|---|---|---|
| AWS Textract | $1.50/1K pages | $65/1K pages | N/A | ~43× the basic OCR rate for structured data |
| Google Document AI | $1.50/1K pages | $10/1K pages | $30/1K pages | $300 free credit |
| Azure Document Intelligence | $1.50/1K pages | $10/1K pages | $30/1K pages | 65% discount at commitment tier |
| Mistral OCR 3 | $2/1K pages | $2/1K pages | $2/1K pages | Batch: $1/1K pages (50% off) |
The privacy moat: None of these providers can say "your documents never touch a cloud server." For legal firms, healthcare-adjacent services, financial advisors, and EU businesses under GDPR — this is compliance, not preference.
Target Market Segments
| Segment | Monthly Volume | Willingness to Pay | Key Pain Point |
|---|---|---|---|
| Small law firms (1-5 attorneys) | 500-2,000 pages | $49-99/mo | Cloud OCR = malpractice risk |
| Financial advisors (RIA) | 1,000-5,000 pages | $99-299/mo | Client data in AWS = compliance nightmare |
| Healthcare-adjacent (billing, admin) | 2,000-10,000 pages | $199-499/mo | HIPAA adjacent, can't risk breaches |
| EU consultancies | 500-3,000 pages | $79-149/mo | GDPR, data sovereignty |
| Self-hosting enthusiasts | 100-1,000 pages | $29-49/mo | Ideological, not price-sensitive |
Market validation:
- Paperless-ngx has 50K+ GitHub stars — proven demand for self-hosted document management
- CleanRoll.ai raised funding for commercial real estate (CRE) rent roll extraction — validates the OCR-as-a-service niche
- Mistral OCR 3 launched December 2025 at $2/1K pages ($1/1K in batch) — proves downward pressure on cloud OCR pricing
Technical Architecture
Current Implementation (v0.1 - Personal Use)
Telegram DM (image) → /tmp/dropbox/ → base64 → HTTP POST to Gaming PC Ollama
↓
qwen3-vl:8b inference (120s timeout)
↓
JSON extraction → filename builder
↓
Google Drive upload (BLOCKED)
↓
👍 Telegram reaction
Hardware dependency: Gaming PC (3080 Ti, 12GB VRAM) must be ON and connected via Tailscale. Windows + Ollama + keep_alive: 0.
Required Architecture (v1.0 - Revenue Service)
Client POST /api/v1/extract (image/PDF + api_key)
↓
FastAPI auth layer (rate limit, tenant isolation)
↓
PDF → image conversion (if needed) via pdf2image
↓
Local Ollama OR always-on GPU endpoint
↓
qwen3-vl:8b inference (15-30s per page)
↓
JSON extraction → structured output
↓
MinIO S3-compatible storage (self-hosted, not AWS)
↓
Webhook callback OR polling endpoint for results
↓
Stripe metered billing (per-page + monthly base)
Critical change: Replace Google Drive with MinIO (self-hosted S3-compatible object storage) or local filesystem with CDN. Never touch AWS/GCP/Azure for file storage.
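The storage swap is mostly a matter of object-key layout and tenant isolation. A sketch under stated assumptions: the `results/{tenant}/{job}.json` key scheme and the local-filesystem stand-in are illustrative; in production the write would go through a MinIO client's put-object call against the self-hosted bucket.

```python
import json
from pathlib import Path

def result_key(tenant_id: str, job_id: str) -> str:
    """S3-style object key: one prefix per tenant keeps listing and
    deletion (e.g. GDPR erasure requests) scoped to a single customer."""
    return f"results/{tenant_id}/{job_id}.json"

def store_result(root: Path, tenant_id: str, job_id: str, fields: dict) -> Path:
    """Local-filesystem stand-in for a MinIO put-object call."""
    path = root / result_key(tenant_id, job_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(fields))
    return path
```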
Gap Analysis: Current → Revenue
| Component | Status | Effort | Blocker |
|---|---|---|---|
| Vision model inference | ✅ Works (Gaming PC) | — | Must be always-on |
| Multi-page PDF support | ❌ Not built | 3-4 days | pdf2image + page iteration |
| API authentication | ❌ Not built | 2-3 days | JWT or API key auth, tenant isolation |
| Rate limiting / quotas | ❌ Not built | 2-3 days | Redis or in-memory tracking |
| Async job queue | ❌ Not built | 4-5 days | Celery + Redis or FastAPI background tasks |
| Result storage (MinIO) | ❌ Not built | 2-3 days | Self-hosted S3-compatible storage |
| Webhook callbacks | ❌ Not built | 1-2 days | POST to client endpoint with results |
| Billing (Stripe) | ❌ Not built | 3-4 days | Metered billing, usage tracking |
| Dashboard / status page | ❌ Not built | 5-7 days | Web UI for job status, usage, API keys |
| PDF preprocessing | ❌ Not built | 3-4 days | Deskew, denoise, OCR optimization |
| Error handling / retries | ⚠️ Partial | 2-3 days | Dead letter queue, client alerts |
Total engineering effort: 4-6 weeks for MVP (one person, nights/weekends).
Hardware Investment Required
Current Setup Bottlenecks
| Issue | Current State | Impact |
|---|---|---|
| Gaming PC not always-on | 3080 Ti sleeps, wakes on demand | OCR unavailable 60%+ of the day |
| Windows power management | Sleep mode, updates | Unpredictable downtime |
| Tailscale dependency | Windows → Beelink → Internet | Two points of failure |
| No UPS | Power outage = data loss | Unacceptable for paid service |
Recommended Hardware Upgrades
Option A: Dedicated GPU Server (Recommended)
| Component | Cost | Purpose |
|---|---|---|
| Used RTX 3060 12GB (eBay) | $180-220 | Dedicated inference GPU, 24/7 operation |
| Low-power x86 SFF PC (used Dell/HP) | $150-250 | Host for GPU, headless Ubuntu |
| 650W PSU (if not included) | $50-80 | Power for GPU |
| PCIe riser cable (if SFF) | $25-40 | Physical fit |
| 1TB NVMe SSD | $80-100 | Model storage + job queue |
| Total | $485-690 | Always-on inference server |
Power consumption: ~150W under load, ~30W idle = ~$15-25/mo electricity.
Alternative: Used RTX 2060 Super 8GB ($120-150) — enough for qwen3-vl:8b, cheaper entry.
Option B: NVIDIA Jetson Orin Nano
| Component | Cost | Notes |
|---|---|---|
| Jetson Orin Nano 8GB Dev Kit | $499 | ARM, lower power (~25W max) |
| 256GB NVMe | $40 | Storage |
| Total | $539 | Lower power, ARM ecosystem |
Tradeoffs:
- Pros: Lower power (~$5/mo), smaller footprint, purpose-built for edge AI
- Cons: ARM architecture (some Python wheels don't exist), slower inference than desktop GPU, 8GB RAM limits concurrent jobs
Recommendation: Option A (used RTX 3060 + SFF PC). More flexible, faster inference, easier troubleshooting.
Option C: Upgrade Beelink (Not Recommended)
Intel N150 has no PCIe slot for GPU. External GPU via Thunderbolt/USB4: $300 enclosure + $200 GPU = $500, more complex, lower bandwidth. Skip.
Cost Model: Self-Hosted vs Cloud
Monthly Operating Costs (Self-Hosted)
| Cost | Amount | Notes |
|---|---|---|
| Electricity (150W × 24h × 30d ≈ 108 kWh) | $20-30 | ~$16 at $0.15/kWh; range covers higher rates and cooling overhead |
| Internet (already paid) | $0 | Home connection sufficient |
| Domain + Cloudflare (already paid) | $0 | Existing setup |
| Hardware depreciation ($600 / 36mo) | $17 | 3-year lifespan |
| Total monthly COGS | $37-47 | Per-tenant marginal cost ≈ $0 |
Pricing Strategy
Target: Undercut cloud providers by 50% while offering privacy premium.
| Tier | Price | Includes | Cloud Equivalent |
|---|---|---|---|
| Starter | $29/mo | 1,000 pages, 1 user, email support | Cloud structured extraction: $10-65 |
| Professional | $79/mo | 5,000 pages, 3 users, webhooks, SLA | Cloud structured extraction: $50-325 |
| Business | $199/mo | 20,000 pages, 10 users, API access, priority | Cloud structured extraction: $200-1,300 |
| Enterprise | $499+/mo | Unlimited pages, custom models, dedicated infra | AWS: custom quote |
Break-even: At $79/mo × 10 customers = $790/mo revenue. COGS $47/mo. Gross margin 94%.
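The break-even arithmetic above, spelled out:

```python
# Break-even check for the Professional tier (numbers from the tables above).
customers, price, cogs = 10, 79, 47    # $79/mo tier, $47/mo worst-case COGS
revenue = customers * price            # $790/mo
gross_profit = revenue - cogs          # $743/mo
margin = gross_profit / revenue        # ~0.94, the quoted 94% gross margin
```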
Dev/Test Cycles
Phase 1: Hardware (Week 1)
- Acquire used RTX 3060 + SFF PC
- Install Ubuntu 22.04 LTS, Ollama, qwen3-vl:8b
- Verify inference speed: target <30s per page
- Configure Tailscale static IP or Cloudflare Tunnel
Phase 2: Core API (Weeks 2-3)
- FastAPI scaffolding with auth (API keys)
- Image upload endpoint (sync) → returns job_id
- Async job processing with Celery + Redis
- MinIO setup for result storage
- Webhook callback system
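The auth-and-job-handoff logic in this phase is framework-independent. A minimal sketch, assuming hashed key storage; all names here are illustrative and the FastAPI/Celery wiring is omitted:

```python
import hashlib
import hmac
import secrets
import uuid
from typing import Dict, Optional

# API keys are stored only as SHA-256 hashes, so a leak of the key table
# reveals nothing usable. (Illustrative scheme, not the shipped code.)
_KEYS: Dict[str, str] = {}   # key_hash -> tenant_id

def issue_key(tenant_id: str) -> str:
    key = secrets.token_urlsafe(32)
    _KEYS[hashlib.sha256(key.encode()).hexdigest()] = tenant_id
    return key                       # shown to the customer exactly once

def authenticate(api_key: str) -> Optional[str]:
    key_hash = hashlib.sha256(api_key.encode()).hexdigest()
    for stored_hash, tenant in _KEYS.items():
        if hmac.compare_digest(stored_hash, key_hash):   # constant-time compare
            return tenant
    return None

def submit_job(api_key: str) -> dict:
    """What a POST /api/v1/extract handler would return: a job_id to poll."""
    tenant = authenticate(api_key)
    if tenant is None:
        return {"error": "invalid api key", "status": 401}
    return {"job_id": uuid.uuid4().hex, "tenant": tenant, "status": "queued"}
```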
Phase 3: PDF Support (Week 4)
- pdf2image integration for multi-page PDFs
- Page-by-page processing with progress tracking
- Zip output for multi-page docs
- Bulk upload endpoint
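The page-by-page loop might look like the following sketch. The converter is injectable so the pipeline can be tested without poppler installed; by default it would use pdf2image's `convert_from_path`, and `extract` stands in for the per-page vision-model call.

```python
from typing import Callable, List, Optional

def process_pdf(pdf_path: str, extract: Callable,
                convert: Optional[Callable] = None) -> List[dict]:
    """Split a PDF into page images and run extraction on each page."""
    if convert is None:
        # Lazy import: pdf2image requires the poppler binaries at runtime.
        from pdf2image import convert_from_path
        convert = convert_from_path
    results = []
    for page_num, image in enumerate(convert(pdf_path), start=1):
        fields = extract(image)          # per-page vision-model call
        fields["page"] = page_num        # ordering / progress metadata
        results.append(fields)
    return results
```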
Phase 4: Billing (Week 5)
- Stripe metered billing integration
- Usage tracking per API key
- Automatic overage handling
- Invoice generation
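Usage tracking underneath the Stripe integration reduces to a per-key page ledger. An in-memory sketch (class and field names are assumptions; in production each call would also report the pages to Stripe's metered-billing API):

```python
from collections import defaultdict

class UsageTracker:
    """Per-API-key page counter feeding metered billing."""
    def __init__(self, included: dict):
        self.included = included         # pages included per key, e.g. {"key1": 1000}
        self.used = defaultdict(int)

    def record_pages(self, api_key: str, pages: int) -> int:
        """Record processed pages; return current overage for this key."""
        self.used[api_key] += pages
        return self.overage(api_key)

    def overage(self, api_key: str) -> int:
        return max(0, self.used[api_key] - self.included.get(api_key, 0))
```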
Phase 5: Dashboard (Weeks 6-7)
- Minimal web UI: job status, usage charts, API key management
- Status page with uptime metrics
- Error logs view (sanitized)
Phase 6: Security Hardening (Week 8)
- Rate limiting (prevent abuse)
- Input validation (prevent injection)
- Audit logging (who processed what when)
- TLS termination via Cloudflare Tunnel
- Fail2ban for SSH/API brute force
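The rate-limiting item above can start as a simple fixed-window counter. An in-memory sketch; the Redis-backed variant from the gap analysis would keep the same counters in Redis with a TTL per window:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Per-key fixed-window rate limiter (in-memory sketch)."""
    def __init__(self, limit: int, window_s: int = 60, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock                    # injectable for testing
        self.counts = defaultdict(int)        # (key, window_index) -> count

    def allow(self, api_key: str) -> bool:
        window = int(self.clock() // self.window_s)
        bucket = (api_key, window)
        if self.counts[bucket] >= self.limit:
            return False                      # over quota for this window
        self.counts[bucket] += 1
        return True
```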
Testbench Demo: Go/No-Go Criteria
Test 1: Inference Performance
Setup: Single page receipt, qwen3-vl:8b, RTX 3060 12GB
Target:
- Cold start (model not loaded): <60 seconds
- Warm inference: <20 seconds per page
- Concurrent requests (3): <90 seconds each
Go if: Average <30s per page under load
Test 2: Accuracy Benchmark
Dataset: 100 documents (mix of receipts, invoices, contracts)
Target:
- Vendor name extraction: >90% accuracy
- Date extraction: >95% accuracy (correct format)
- Amount extraction: >95% accuracy (within $0.01)
- Category classification: >85% accuracy
Go if: Overall field extraction >90% without human correction
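Scoring this benchmark can be a per-field comparison against hand-labeled ground truth. A sketch assuming the extraction schema above, with the $0.01 tolerance applied to amounts:

```python
def _match(field, pred, true):
    if field == "amount":                    # within $0.01, per Test 2
        try:
            return abs(float(pred) - float(true)) <= 0.01
        except (TypeError, ValueError):
            return False
    return pred == true

def field_accuracy(predictions, ground_truth,
                   fields=("vendor", "date", "category", "amount")):
    """Fraction of correct extractions per field, given parallel lists of
    predicted and hand-labeled documents."""
    scores = {}
    for field in fields:
        correct = sum(1 for p, t in zip(predictions, ground_truth)
                      if _match(field, p.get(field), t.get(field)))
        scores[field] = correct / len(ground_truth)
    return scores
```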
Test 3: Uptime & Reliability
Duration: 7 days continuous operation
Target:
- Uptime: >99% (excluding planned maintenance)
- Zero memory leaks (Ollama stays responsive)
- Graceful degradation under load (queue management)
- Automatic recovery from GPU OOM
Go if: Zero unplanned outages, <5min recovery time
Test 4: End-to-End Latency
Scenario: Client POST → processing → webhook callback
Target:
- P50 latency: <45 seconds
- P95 latency: <120 seconds
- P99 latency: <300 seconds (large PDFs)
Go if: P95 <120s for single-page documents
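Computing these percentiles from logged request timings needs nothing beyond the standard library. A sketch, where `latencies_s` is an assumed list of end-to-end latencies in seconds:

```python
import statistics

def latency_percentiles(latencies_s):
    """P50/P95/P99 from per-request latencies (seconds).
    quantiles(..., n=k) returns k-1 cut points; index k-2 is the top one."""
    return {
        "p50": statistics.median(latencies_s),
        "p95": statistics.quantiles(latencies_s, n=20)[18],
        "p99": statistics.quantiles(latencies_s, n=100)[98],
    }
```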
Test 5: Cost Validation
Measurement: 30-day electricity + bandwidth
Target:
- Electricity: <$40/mo
- Bandwidth: <100GB/mo (no overage)
- Hardware: no thermal throttling, <80°C GPU
Go if: Monthly COGS <$50
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Hardware failure (GPU/SSD) | Medium | High | Hot spare, automated backups, 2-day replacement |
| Ollama/qwen3-vl model update breaks API | Low | High | Pin model version, staged rollout, rollback plan |
| Customer data breach (local) | Low | Critical | Encryption at rest, no remote access, audit logs |
| Power outage | Medium | Medium | UPS (CyberPower 1500VA, $150), graceful shutdown |
| Internet outage | Low | High | 4G failover (optional), queue-and-retry |
| Legal liability (OCR error) | Medium | Medium | Terms-of-service disclaimer, contractual liability cap |
| Stripe account freeze | Low | High | Multi-processor backup (LemonSqueezy) |
| Beelink SSD death (cascading failure) | Medium | High | Daily backups to Gaming PC + cloud (encrypted) |
Recommendation
Go/No-Go Decision: CONDITIONAL GO
Proceed if:
1. ✅ Can invest $500-700 in dedicated GPU hardware within 30 days
2. ✅ Willing to spend 6-8 weeks part-time on MVP
3. ✅ First 3 paying customers identified (even if just "would you pay for this?" conversations)
4. ✅ Accept 6-month payback period on hardware
Defer if:
- ❌ Cannot guarantee always-on GPU (Gaming PC unreliable)
- ❌ Not willing to build web UI (Telegram-only won't scale to B2B)
- ❌ No LLC/liability protection (healthcare/legal adjacent customers)
Phased Approach
Phase 0 (Now):
- Validate demand: 5 conversations with law firms/financial advisors
- Price test: "Would you pay $79/mo for unlimited private OCR?"
- Build waitlist
Phase 1 (Month 1):
- Buy hardware, set up dedicated inference server
- Build async API + MinIO storage
- Dogfood with personal documents
Phase 2 (Month 2):
- Stripe integration, billing
- Onboard 3 beta customers at $29/mo (discounted)
- Iterate on extraction accuracy
Phase 3 (Month 3):
- Dashboard web UI
- Public launch at $79/mo
- Target: 10 customers ($790/mo) → break even
Conservative projection: 10 customers by Month 6 = $790/mo revenue, $47/mo COGS, $743/mo gross profit. Annual: ~$8,900 gross profit on ~$700 hardware investment.
Appendix: Competitive Moat Analysis
Why customers choose self-hosted over cloud:
| Customer Type | Cloud Fear | Our Pitch |
|---|---|---|
| Law firm | Malpractice if client data leaked | "Documents never leave your server. Zero cloud touch." |
| Financial advisor | SEC audit, client trust | "Audit trail shows local processing only." |
| Healthcare admin | HIPAA violation ($1.5M fine) | "No BAA needed — no third-party processing." |
| EU consultancy | GDPR Article 44 (data transfers) | "Data sovereignty guaranteed. EU server option available." |
| Privacy enthusiast | Surveillance capitalism | "Open source, self-hosted, auditable code." |
Differentiation from Paperless-ngx:
- Paperless: document management (storage, tagging, search)
- Us: document processing API (OCR, extraction, structured output)
- Complementary: customers use both
Differentiation from cloud OCR:
- Cloud: 99.9% uptime, infinite scale, higher cost, privacy risk
- Us: 99% uptime, limited scale, lower cost, zero privacy risk
The moat isn't features; it's an architecture in which no third party ever touches the documents, which cloud providers cannot replicate by definition.
Document version: 2026-04-19
Status: Draft for review
Next step: Matt's go/no-go decision, then Phase 0 validation