Document API for Developers

Documents in.
JSON out.

Parse any document. Extract any JSON schema.
One REST API. PDFs, images, Word, Excel, handwriting — anything.

Read the Docs

Request

curl -X POST https://app.docupipe.ai/document \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": {
      "file": {"url": "https://example.com/invoice.pdf"},
      "fileExtension": "pdf"
    },
    "workflowId": "YOUR_WORKFLOW_ID"
  }'

200 OK

Webhook payload

{
  "eventType": "standardization.processed.success",
  "documentId": "dcmt5678",
  "data": {
    "invoiceNumber": "08472",
    "vendor": "Martinez & Sons",
    "date": "2025-03-14",
    "total": 907.76,
    "currency": "USD"
  }
}

One POST with a workflow ID. Structured JSON delivered to your webhook when ready.
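On the receiving side, the webhook body can be turned into a typed record before it reaches the rest of your code. The following is a minimal sketch, assuming the payload has exactly the shape of the sample above; the `Invoice` dataclass and the `parse_invoice_event` function are illustrative names, not part of the DocuPipe API.

```python
import json
from dataclasses import dataclass
from typing import Optional

SUCCESS_EVENT = "standardization.processed.success"  # event type from the sample payload


@dataclass
class Invoice:
    invoice_number: str
    vendor: str
    date: str
    total: float
    currency: str


def parse_invoice_event(raw_body: str) -> Optional[Invoice]:
    """Return an Invoice for a success event, or None for any other event type."""
    payload = json.loads(raw_body)
    if payload.get("eventType") != SUCCESS_EVENT:
        return None
    data = payload["data"]
    return Invoice(
        invoice_number=data["invoiceNumber"],
        vendor=data["vendor"],
        date=data["date"],
        total=float(data["total"]),
        currency=data["currency"],
    )


# The sample webhook payload from this page, condensed.
sample = """{
  "eventType": "standardization.processed.success",
  "documentId": "dcmt5678",
  "data": {"invoiceNumber": "08472", "vendor": "Martinez & Sons",
           "date": "2025-03-14", "total": 907.76, "currency": "USD"}
}"""
invoice = parse_invoice_event(sample)
print(invoice)
```

Keeping `invoiceNumber` as a string preserves the leading zero in "08472", which would be lost if the field were coerced to an integer.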

99% extraction accuracy

1000+ teams using DocuPipe

<5s per-page processing

Over 1 billion pages processed, and counting

Trusted by customers big and small across every industry
G2 badges: Best Support, High Performer, Users Love Us, Most Likely to Recommend, Easiest To Do Business With

Rated 4.9/5 on G2 verified reviews

The Problem

Most document APIs stop at OCR

OCR is the easy part of document AI. The hard part is everything that comes after — and most APIs leave it to you. You paid for a document API and got an OCR engine plus a long list of TODOs.

Most APIs return raw text

You get a wall of unstructured text and have to write your own field extraction. Schema-aware extraction is bolted on as a separate service if it exists at all.

Multi-step pipelines

Upload to one service, OCR through another, run forms-analysis through a third, glue it together yourself. Each hop adds latency and a new failure mode.

Handwriting and scans break

Most APIs handle clean digital PDFs and fall over on real-world inputs: scanned forms, handwritten notes, photographed documents, faxed pages.

Compliance is an afterthought

HIPAA, SOC 2, BAAs, and zero data retention are table stakes for production document workloads, yet they are often missing or available only through an enterprise sales process.

Built for every document

You got it, we parse it

One API across every doc type. No specialized models to wire up, no per-vertical pipelines to maintain.

Invoices, bills of lading, medical records, bank statements, contracts, leases, purchase orders, tax forms, IDs and passports, lab reports, insurance claims, and anything else.

Capabilities

Parse and extract, one API

A complete set of primitives. Production-ready from day one — no assembly required.

Parse any document

PDFs, images, Word, Excel, scans, handwriting. Clean structured text with tables, layouts, and checkboxes preserved.

Extract any schema

Define a JSON schema once, run it across thousands of documents. Get exact-fit structured data — no post-processing, no glue code.

Tables and layouts

Tables come back as structured JSON arrays. Multi-column layouts parsed in correct reading order. Checkboxes detected.

60+ languages

English, Spanish, French, German, Chinese, Japanese, Korean, Arabic, Hebrew, and more. Multilingual documents handled natively.

One REST API

Single endpoint, async by default, webhooks for completion, base64 or URL upload. Works with any language that speaks HTTP.
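To make the "base64 or URL upload" choice concrete, here is a hedged sketch of a helper that builds the request body for either case. The URL form mirrors the request sample on this page. The base64 form is an assumption: the key name `contents` is hypothetical and should be checked against the API docs, and `build_document_request` is an illustrative name.

```python
import base64
import json


def build_document_request(workflow_id, *, url=None, file_bytes=None, extension="pdf"):
    """Build the JSON body for POST /document from either a URL or raw bytes."""
    if (url is None) == (file_bytes is None):
        raise ValueError("pass exactly one of url or file_bytes")
    if url is not None:
        file_part = {"url": url}  # shape taken from the request sample on this page
    else:
        # ASSUMPTION: "contents" is a hypothetical key name for base64 data.
        file_part = {"contents": base64.b64encode(file_bytes).decode("ascii")}
    return {
        "document": {"file": file_part, "fileExtension": extension},
        "workflowId": workflow_id,
    }


body = build_document_request("YOUR_WORKFLOW_ID", url="https://example.com/invoice.pdf")
print(json.dumps(body, indent=2))
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the Python sample on this page.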

Fast and compliant

Under 5 seconds per page on most documents. SOC 2 Type II, ISO 27001, HIPAA. BAAs available for healthcare workloads.

Developer Experience

One POST. Webhook delivers.

Build a workflow once — parse, extract, and any post-processing — then POST documents to it. The structured JSON arrives at your webhook when processing finishes. No polling, no glue code.

REST API
Workflows
Webhooks
JSON Schema
URL or Base64

extract.py

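import requests
from fastapi import FastAPI  # assumption: the @app.post decorator below implies a FastAPI-style app

app = FastAPI()


def submit_document(file_url: str) -> None:
    # One call. The workflow handles parsing + schema extraction.
    # Structured JSON is delivered to your webhook when ready.
    response = requests.post(
        "https://app.docupipe.ai/document",
        headers={"X-API-Key": "YOUR_API_KEY"},
        json={
            "document": {
                "file": {"url": file_url},
                "fileExtension": "pdf",
            },
            "workflowId": "YOUR_WORKFLOW_ID",
        },
        timeout=30,
    )
    response.raise_for_status()  # surface 4xx/5xx errors instead of ignoring them


# Your webhook receives the structured JSON when processing finishes.
@app.post("/webhook/docupipe")
def webhook(payload: dict):
    if payload.get("eventType") == "standardization.processed.success":
        print(payload["data"])  # exact-fit JSON matching your schema
    return {"received": True}


if __name__ == "__main__":
    submit_document("https://example.com/invoice.pdf")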

Comparison

DocuPipe vs the cloud doc APIs

| Feature | DocuPipe | AWS Textract | Google Document AI | Azure Doc Intelligence |
|---|---|---|---|---|
| Schema-based extraction | Built-in | No (separate Bedrock setup) | Form parser only | Custom model training |
| Handwriting recognition | Yes, production-grade | Basic | Limited | Limited |
| Table extraction | Full structure as JSON | Basic JSON | Basic JSON | Basic JSON |
| File types | PDF, images, Word, Excel | PDF, images | PDF, images | PDF, images |
| Language support | 60+ languages | ~30 languages | ~50 languages | ~70 languages |
| API design | Single REST endpoint | Multiple services | Multiple services | Multiple services |
| Compliance | SOC 2, ISO 27001, HIPAA | SOC 2, HIPAA | SOC 2, HIPAA | SOC 2, HIPAA |
| Pricing transparency | Per-page, on site | Per-feature, complex | Per-feature, complex | Per-feature, complex |
| Free tier | 300 credits, no card | Limited free tier | Limited free tier | Limited free tier |

Pricing

300 free credits. Plans from $99/mo.

No credit card required to start. Per-page transparent pricing — no per-feature surprises.

See Pricing

FAQ

Frequently asked questions

What is DocuPipe?

DocuPipe is a document API for developers. It has two core capabilities: parsing any document into clean structured text, and extracting any JSON schema you define from any document. It offers one REST endpoint, is async by default with webhooks, and works with PDFs, images, Word, Excel, scans, and handwriting.

What file types are supported?

PDFs (native and scanned), images (PNG, JPG, TIFF, WebP), Word documents (DOC, DOCX), Excel spreadsheets (XLS, XLSX, CSV), plain text, JSON, XML, and HTML. AI-powered OCR handles handwritten content, checkboxes, and low-quality scans.

What languages are supported?

60+ languages including English, Spanish, French, German, Portuguese, Italian, Dutch, Swedish, Chinese, Japanese, Korean, Arabic, Hebrew, Hindi, Thai, and more. Multilingual documents are handled natively.

How does schema extraction work?

You define a JSON schema for the structured data you want to extract: fields, types, descriptions, and nested arrays. DocuPipe runs it against any document and returns exact-fit JSON matching your schema. You can also generate schemas interactively in the dashboard from example documents.
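As an illustration of "fields, types, descriptions, nested arrays", the sketch below defines a small invoice schema and checks a result against it. It assumes standard JSON Schema keywords (`type`, `properties`, `required`, `items`, `description`); the exact dialect DocuPipe accepts is not shown on this page. The `lineItems` field and the `conforms` helper are hypothetical, and the checker covers only the subset of keywords used here.

```python
invoice_schema = {
    "type": "object",
    "required": ["invoiceNumber", "vendor", "total"],
    "properties": {
        "invoiceNumber": {"type": "string", "description": "Invoice ID as printed"},
        "vendor": {"type": "string", "description": "Name of the issuing company"},
        "total": {"type": "number", "description": "Grand total including tax"},
        "lineItems": {
            "type": "array",
            "description": "One entry per billed line",
            "items": {
                "type": "object",
                "required": ["description", "amount"],
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

# Python types accepted for each JSON Schema type used above.
_PY_TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}


def conforms(value, schema):
    """Check a value against the small JSON Schema subset used above."""
    expected = _PY_TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        return False
    if schema["type"] == "object":
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(conforms(v, props[k]) for k, v in value.items() if k in props)
    if schema["type"] == "array":
        item_schema = schema.get("items")
        return item_schema is None or all(conforms(i, item_schema) for i in value)
    return True


result = {
    "invoiceNumber": "08472",
    "vendor": "Martinez & Sons",
    "total": 907.76,
    "lineItems": [{"description": "Consulting", "amount": 907.76}],
}
print(conforms(result, invoice_schema))
```

For production use, a full validator such as the `jsonschema` package is a better fit than this hand-rolled subset.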

How fast is processing?

Most documents process in under 5 seconds per page. Calls are async by default: you get a job ID immediately, then either poll for completion or receive a webhook callback when the result is ready. This suits both interactive and batch workloads.
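If you choose polling over webhooks, a loop with capped exponential backoff avoids hammering the API. This is a sketch under stated assumptions: the status endpoint is not documented on this page, so the HTTP call is left to a caller-supplied `fetch_status` function, and the status strings "completed" and "failed" are hypothetical.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    *,
    timeout_s: float = 60.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until fetch_status() returns a terminal state or the timeout elapses.

    ASSUMPTION: "completed" and "failed" are hypothetical status strings.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if waited + delay > timeout_s:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # double the wait, up to the cap


# Usage with a fake status source, so the example runs without network access.
states = iter(["processing", "processing", "completed"])
result = wait_for_job(lambda: next(states), sleep=lambda seconds: None)
print(result)
```

In real use, `fetch_status` would wrap an authenticated GET for the job ID returned by the initial POST.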

Does it handle handwriting and low-quality scans?

Production-grade OCR handles handwritten content, checkboxes, photographed documents, and low-quality scans. Tables and multi-column layouts are preserved in the correct reading order.

Is DocuPipe secure and compliant?

DocuPipe is SOC 2 Type II, ISO 27001, and HIPAA compliant. BAAs are available for healthcare workloads. All documents are encrypted in transit (TLS) and at rest. Zero data retention is available. Customer data is never used for model training.

How does pricing work?

Pricing is credit-based: most operations cost 1-2 credits per page. Start with 300 free credits, no credit card required. Paid plans start at $99/mo for 2,500 credits, with volume discounts on higher tiers. Full pricing is on the pricing page.
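The figures above translate into a rough per-page cost. The sketch below works through the arithmetic for the entry paid plan only; it ignores volume discounts and assumes every page costs the same number of credits. The function name `pages_and_cost` is illustrative.

```python
PLAN_PRICE_USD = 99.0   # entry paid plan, per month
PLAN_CREDITS = 2_500    # credits included in that plan


def pages_and_cost(credits_per_page: int) -> tuple[int, float]:
    """Return (pages covered by the plan, effective USD per page)."""
    pages = PLAN_CREDITS // credits_per_page
    return pages, PLAN_PRICE_USD / pages


for credits_per_page in (1, 2):
    pages, per_page = pages_and_cost(credits_per_page)
    print(f"{credits_per_page} credit(s)/page: {pages} pages, ${per_page:.4f} per page")
# 1 credit(s)/page: 2500 pages, $0.0396 per page
# 2 credit(s)/page: 1250 pages, $0.0792 per page
```

By the same arithmetic, the 300 free credits cover between 150 and 300 pages.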

What integrations are available?

The REST API works with any language that speaks HTTP, including Python, Node.js, Go, Java, Ruby, PHP, and .NET. No-code integrations are available via Make.com, Zapier, and n8n. Output comes as JSON, CSV, or Excel, ready to feed into your database, vector store, or downstream pipeline.
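As one example of feeding the JSON output into a database, the sketch below stores the `data` object from the sample webhook payload in SQLite. The table name and column layout are assumptions made for illustration; an in-memory database keeps the example self-contained.

```python
import sqlite3

# The "data" object from the sample webhook payload on this page.
data = {
    "invoiceNumber": "08472",
    "vendor": "Martinez & Sons",
    "date": "2025-03-14",
    "total": 907.76,
    "currency": "USD",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoices (
        invoice_number TEXT PRIMARY KEY,
        vendor TEXT,
        date TEXT,
        total REAL,
        currency TEXT
    )
    """
)
# Named placeholders map directly onto the JSON keys.
# INSERT OR REPLACE keeps a single row if the same invoice arrives twice.
conn.execute(
    "INSERT OR REPLACE INTO invoices "
    "VALUES (:invoiceNumber, :vendor, :date, :total, :currency)",
    data,
)
conn.commit()

row = conn.execute(
    "SELECT vendor, total FROM invoices WHERE invoice_number = ?", ("08472",)
).fetchone()
print(row)
```

Keying the table on the invoice number is one way to make webhook handling idempotent; keying on `documentId` would work equally well if invoice numbers can repeat across vendors.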

Can I try it for free?

Yes. Signing up gets you 300 free credits and a dashboard where you can upload your own documents and try extraction interactively. No credit card required.

Start building

300 free credits. One REST API. Your documents, your schema, structured JSON back.
