The Failure That Teaches

When a document tool fails, not every failure carries the same information. A failure on a document type you’ve handled a hundred times before tells you something is wrong with your existing handling: a regression, a broken assumption, a case you already understood but mishandled. That’s useful, but your evaluation set already covers that document type, so you could have caught the failure there. A failure on a document type you have never seen before tells you something different. It tells you the edge of your competence just revealed itself, and the document that triggered it is pointing at exactly what you need to understand next.

The unseen-document failure is the most valuable feedback a document tool produces, and it is also the feedback that most commonly disappears without being used. The dynamic is straightforward: the user feeds in a document, the tool produces garbage or gives up, the user either handles it manually or reaches out to report the problem. In the best case, the failure gets logged somewhere. In the typical case, the user works around it and moves on. The document that failed — the one pointing at the specific gap in coverage — never makes it into the evaluation set, never informs what gets built next, and the gap stays a gap.

What makes this loss systematic rather than accidental is that the failure has to cross several thresholds to actually teach anything. First, it has to be noticed as a failure rather than accepted as the tool’s limitation. Second, the specific document has to be preserved — not just the fact that something went wrong, but the actual document that caused it. Third, someone has to make the connection between this failure and the shape of the thing being built — recognize that this document type represents a class of coverage gap, not just a one-off incident. Each threshold loses some fraction of the feedback. Most unseen-document failures disappear before they cross all three.

The design implication is that capturing failure is a product decision, not a side effect of good logging. It means building paths for failed documents to travel from the user’s experience back to the people building the tool: explicit reporting flows, ways to preserve the documents that triggered failures (with sufficient anonymization and with the user’s consent), and a practice of treating incoming failure reports as the highest-signal input available rather than as a support burden. A single failure report from a user is worth more than ten rows in a benchmark, because the benchmark reflects what you already understood and the report reflects what you didn’t.
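As a rough illustration of what such a path might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than a description of any particular tool: the names `FailureReport`, `capture_failure`, and `eval_candidates`, the `failures.jsonl` log file, and the choice to key stored documents by their SHA-256 hash are all hypothetical. Anonymization is deliberately left out; in a real system it would have to run before anything is written to disk.

```python
import hashlib
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


@dataclass
class FailureReport:
    """One failed document, as reported from the user's side (hypothetical schema)."""
    doc_sha256: str             # stable identifier, recorded even without consent
    doc_type: str               # reporter's best guess at the document type
    error: str                  # what the tool said or did when it failed
    reported_at: str            # ISO 8601 timestamp, UTC
    stored_path: Optional[str]  # None when the user did not consent to storage


def capture_failure(doc_bytes: bytes, doc_type: str, error: str,
                    consent: bool, store_dir: str) -> FailureReport:
    """Log the failure, and preserve the document itself only with consent."""
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(doc_bytes).hexdigest()

    stored_path = None
    if consent:
        # Assumption: anonymization would be applied to doc_bytes before this write.
        path = store / f"{digest}.bin"
        path.write_bytes(doc_bytes)
        stored_path = str(path)

    report = FailureReport(
        doc_sha256=digest,
        doc_type=doc_type,
        error=error,
        reported_at=datetime.now(timezone.utc).isoformat(),
        stored_path=stored_path,
    )
    # Append-only log: the fact of the failure is kept either way.
    with (store / "failures.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(report)) + "\n")
    return report


def eval_candidates(store_dir: str) -> List[FailureReport]:
    """Return reports whose documents were preserved and can join the evaluation set."""
    log_path = Path(store_dir) / "failures.jsonl"
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    reports = [FailureReport(**json.loads(line)) for line in lines if line.strip()]
    return [r for r in reports if r.stored_path is not None]


# Usage: one report with consent, one without.
with tempfile.TemporaryDirectory() as tmp:
    capture_failure(b"scanned customs form", "customs-form", "no text layer", True, tmp)
    capture_failure(b"handwritten ledger", "ledger", "gave up", False, tmp)
    candidates = eval_candidates(tmp)  # only the consented document qualifies
```

The design choice worth noting is that the report and the document are separated: the failure is always recorded, but only a preserved document can become an evaluation case, which is why the consent flag decides what `eval_candidates` returns.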

The uncomfortable corollary: a tool that never receives failure reports is not a tool that never fails. It’s a tool whose failures aren’t making it back to the people who could act on them. The absence of failure feedback is not a signal that everything is fine; it’s a signal that the loop is broken somewhere between the user’s experience and the team’s awareness. A working feedback loop produces a steady trickle of failures, and that trickle, captured and understood, is exactly how the coverage grows.