AI document-extraction benchmark

DocuBench

DocuBench pairs each of 50 real-world documents with a JSON Schema and a hand-verified label, then scores each system on macro-average field accuracy. The set is deliberately hard: it includes arrays and nested tables, right-to-left and CJK scripts, rotated scans, handwriting, and non-PDF formats.
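This page does not spell out the scoring contract, so the following is only a minimal sketch of one plausible reading of "macro-average field accuracy". It makes several assumptions. Each label is flattened into leaf fields, including array rows. A field counts as correct only on an exact match. Extra predicted fields are ignored. "Macro" means every document carries equal weight, however many fields it has. The names `flatten`, `field_accuracy`, and `macro_average` are hypothetical and are not DocuBench's API.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects and arrays into {path: leaf} pairs, e.g. 'items[1].qty'."""
    out: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{i}]"))
    else:
        out[prefix] = value  # leaf: string, number, bool, or None
    return out


def field_accuracy(predicted: Any, label: Any) -> float:
    """Share of the label's leaf fields that the prediction reproduces exactly.

    Assumption: extra fields in the prediction are ignored, not penalized.
    """
    gold = flatten(label)
    if not gold:
        return 1.0  # assumption: an empty label counts as fully matched
    pred = flatten(predicted)
    correct = sum(1 for path, v in gold.items() if path in pred and pred[path] == v)
    return correct / len(gold)


def macro_average(pairs: list[tuple[Any, Any]]) -> float:
    """Mean of per-document accuracies, so each document weighs the same."""
    scores = [field_accuracy(pred, label) for pred, label in pairs]
    return sum(scores) / len(scores)


# Hypothetical (prediction, label) pairs, not DocuBench data.
docs = [
    ({"total": 10, "items": [{"qty": 1}, {"qty": 2}]},
     {"total": 10, "items": [{"qty": 1}, {"qty": 3}]}),
    ({"name": "ACME"}, {"name": "ACME"}),
]
print(f"{macro_average(docs):.4f}")  # prints 0.8333
```

In this example, the first document gets 2 of its 3 fields right and the second gets its only field right. The macro average is therefore (2/3 + 1) / 2 ≈ 0.833. Pooling all four fields instead (a micro average) would give 3/4 = 0.75. The choice of averaging changes the score whenever documents differ in how many fields they carry, as documents with long arrays or nested tables do.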

Macro-average field accuracy


DocuPipe high leads across all 50 documents at 97.24%.



The full DocuBench write-up

Methodology, the scoring contract, worked examples, and a per-document analysis of where each system wins and loses.
