DocuBench pairs each of 50 real-world documents with a JSON Schema and a hand-verified label, then scores each system on macro-average field accuracy. The set is deliberately hard: it includes arrays and nested tables, right-to-left and CJK scripts, rotated scans, handwriting, and non-PDF formats.
What follows covers the methodology, the scoring contract, worked examples, and a per-document analysis of where each system wins and loses.
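To make "macro-average field accuracy" concrete, here is a minimal sketch of one plausible reading of the metric. It is not DocuBench's actual scoring contract. It assumes that each document's accuracy is the fraction of leaf fields in the label that the prediction matches exactly, and that the benchmark score is the unweighted mean of those per-document accuracies. The helper names (`leaf_fields`, `document_accuracy`, `macro_average`), the path notation, and the sample data are all hypothetical.

```python
from typing import Any, Iterator, List, Tuple


def leaf_fields(value: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, scalar) pairs for every leaf of a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_fields(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaf_fields(child, f"{path}[{index}]")
    else:
        yield path, value


def document_accuracy(label: Any, prediction: Any) -> float:
    """Fraction of the label's leaf fields reproduced exactly (assumed rule)."""
    expected = dict(leaf_fields(label))
    if not expected:
        return 1.0  # assumption: a label with no leaf fields counts as correct
    predicted = dict(leaf_fields(prediction))
    correct = sum(
        1 for path, value in expected.items()
        if path in predicted and predicted[path] == value
    )
    return correct / len(expected)


def macro_average(pairs: List[Tuple[Any, Any]]) -> float:
    """Unweighted mean of per-document accuracies over (label, prediction) pairs."""
    if not pairs:
        raise ValueError("need at least one (label, prediction) pair")
    return sum(document_accuracy(lab, pred) for lab, pred in pairs) / len(pairs)


# Hypothetical data: an invoice with a nested array, and a two-field form.
invoice_label = {"invoice_no": "A-17",
                 "lines": [{"sku": "X1", "qty": 2}, {"sku": "Y9", "qty": 1}]}
invoice_pred = {"invoice_no": "A-17",
                "lines": [{"sku": "X1", "qty": 2}, {"sku": "Y9", "qty": 3}]}
form_label = {"name": "Ada", "total": 10}
form_pred = {"name": "Ada"}  # "total" is missing, so it counts as wrong

score = macro_average([(invoice_label, invoice_pred), (form_label, form_pred)])
print(round(score, 4))  # 0.65: the mean of 4/5 and 1/2
```

Under these assumptions every document carries equal weight regardless of how many fields it has. A micro-average over the same data would instead pool all seven fields and give 5/7, which lets field-heavy documents dominate the score.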