AI document-extraction benchmark

DocuBench

DocuBench pairs each of 50 real-world documents with a JSON Schema and a hand-verified label, then scores each system on macro-average field accuracy. The set is deliberately hard: it includes arrays and nested tables, right-to-left and CJK scripts, rotated scans, handwriting, and non-PDF formats.
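As a rough illustration of what macro-average field accuracy can mean, the sketch below scores each document as the fraction of labeled fields a system got right, then averages those per-document scores so every document counts equally. The exact scoring contract is not stated here, so this is an assumption rather than DocuBench's actual method: fields are compared by exact match on flattened leaf values, and the names `flatten`, `field_accuracy`, and `macro_average` are hypothetical.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into {path: leaf_value} pairs.

    Assumption: every leaf (string, number, bool, None) counts as one field.
    Empty dicts and lists contribute no fields.
    """
    out: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{index}]"))
    else:
        out[prefix] = value
    return out


def field_accuracy(predicted: Any, label: Any) -> float:
    """Fraction of labeled fields whose predicted value matches exactly."""
    gold = flatten(label)
    pred = flatten(predicted)
    if not gold:
        return 1.0  # nothing to get wrong (illustrative convention)
    correct = sum(
        1 for path, value in gold.items()
        if path in pred and pred[path] == value
    )
    return correct / len(gold)


def macro_average(pairs: list[tuple[Any, Any]]) -> float:
    """Mean of per-document accuracies; each document has equal weight."""
    if not pairs:
        raise ValueError("need at least one (predicted, label) pair")
    scores = [field_accuracy(predicted, label) for predicted, label in pairs]
    return sum(scores) / len(scores)


# Two made-up documents: one with a wrong price, one fully correct.
label_a = {"invoice_no": "A-17", "items": [{"qty": 2, "price": 9.5}]}
pred_a = {"invoice_no": "A-17", "items": [{"qty": 2, "price": 9.0}]}
label_b = {"name": "Dana", "total": 40}
pred_b = {"name": "Dana", "total": 40}

score = macro_average([(pred_a, label_a), (pred_b, label_b)])
```

Because the average is taken over documents rather than over all fields pooled together, a long document with many fields does not outweigh a short one under this reading.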

Macro-average field accuracy

DocuPipe high leads across all 50 documents at 97.24%.

Accuracy by slice

Per-document scores


Learn more

The full DocuBench write-up

Methodology, the scoring contract, worked examples, and a per-document analysis of where each system wins and loses.
