The Production Gap
A document tool's performance on your evaluation set and its performance on your users' actual documents are two different numbers. The gap between them is structural rather than a bug, and closing it requires a different kind of work than improving the eval.