Put the two halves together — that users skim rather than audit, and that the dangerous errors are the plausible ones that survive a skim — and a single idea falls out: at the review stage, the user has an attention budget. It’s small, it’s fixed, and it gets spent whether or not the tool helps direct it. The user will look at the output for a few seconds, their gaze will land somewhere, and then they’ll move on. The only real question is whether those few seconds of attention were spent on the fields that needed checking or scattered across fields that didn’t. A tool that ignores the budget lets it fall randomly. A tool that respects it aims it.

This is a more useful frame than “the user should verify the output,” because it treats attention as the scarce resource it actually is rather than something the user can be exhorted to supply more of. You can’t make the budget bigger by wishing: telling users to review more carefully doesn’t create attention; it just produces guilt when they don’t. What you can do is decide where the budget gets spent. Every design choice at the review stage is implicitly an allocation. A uniform grid of identical-looking values spreads the budget evenly, so most of it goes to fields that were fine, and it often runs out before the eye reaches the one that wasn’t. That’s the default, and the default wastes the scarcest thing the user has.

Spending the budget well means the tool has to bring its own knowledge to the allocation. The tool knows which fields it was unsure about, which came from a degraded part of the document, which sit at the edge of the plausible range, which required a leap of inference rather than a clean read. That knowledge is exactly the map of where attention should go. Surfacing it — pulling the shaky fields to the top, giving uncertain values visual weight, quieting the ones that are solid — turns the user’s few seconds into a guided pass instead of a random one. The tool isn’t adding work; it’s directing the work the user was always going to do toward the places it pays off.
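To make the allocation concrete, here is a minimal Python sketch. It assumes the extractor already records the four signals named above for each field. The `ExtractedField` record, its attribute names, the `risk_score` weights, and the emphasis thresholds are all hypothetical choices for illustration, not a prescribed scheme. The sketch ranks fields shakiest-first and assigns each a display weight, so uncertain values get emphasis and solid ones are quieted.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    # Hypothetical record; the signal names are illustrative assumptions.
    name: str
    value: str
    confidence: float      # extractor's own confidence, 0.0 to 1.0
    degraded_source: bool  # read from a blurry or damaged part of the document
    near_range_edge: bool  # value sits at the edge of the plausible range
    inferred: bool         # required inference rather than a clean read


def risk_score(f: ExtractedField) -> float:
    """Fold the tool's own uncertainty signals into one number.

    The weights are arbitrary placeholders; only the ordering they produce matters here.
    """
    score = 1.0 - f.confidence
    if f.degraded_source:
        score += 0.3
    if f.near_range_edge:
        score += 0.2
    if f.inferred:
        score += 0.3
    return score


def review_order(fields: list[ExtractedField]) -> list[tuple[str, str]]:
    """Return (field name, emphasis) pairs, highest risk first."""
    scored = sorted(((risk_score(f), f) for f in fields),
                    key=lambda pair: pair[0], reverse=True)
    result = []
    for score, f in scored:
        if score >= 0.5:
            emphasis = "highlight"  # pull to the top and give it visual weight
        elif score >= 0.2:
            emphasis = "normal"
        else:
            emphasis = "quiet"      # solid value: de-emphasize
        result.append((f.name, emphasis))
    return result


# Example usage with made-up fields.
fields = [
    ExtractedField("invoice_date", "2023-04-01", 0.98, False, False, False),
    ExtractedField("total", "1,240.00", 0.70, True, False, False),
    ExtractedField("vendor", "Acme Ltd", 0.85, False, False, True),
    ExtractedField("due_date", "2023-05-01", 0.95, False, True, False),
]
print(review_order(fields))
# → [('total', 'highlight'), ('vendor', 'normal'), ('due_date', 'normal'), ('invoice_date', 'quiet')]
```

The exact weights matter less than the shape. The ordering and emphasis come from what the tool knows about its own uncertainty, not from where a field happens to sit in the source document.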

The failure mode is spending the budget on the wrong things, and it’s easy to do without noticing. A tool that flags every field as needing review has effectively flagged none, because the user can’t act on a wall of warnings and tunes them all out. A tool that draws the eye with visual flourish unrelated to risk, highlighting fields simply because they were easy to extract or decorating the parts it was already confident about, is actively misallocating: it points attention at safety while the risk sits unmarked. Spending the budget badly is often worse than not trying to spend it at all, because it gives the user a false sense that they’ve looked where it mattered when they’ve actually looked where it didn’t.
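One way to guard against the wall-of-warnings failure is to treat the flags themselves as a budget. The Python sketch below assumes each field already has a risk score, however it was computed. It flags at most `budget` fields, and only those whose risk clears `threshold`. The function name and both parameters are hypothetical illustrations, not recommended values.

```python
def flags_within_budget(risks: dict[str, float], budget: int, threshold: float) -> list[str]:
    """Flag at most `budget` fields, and only those whose risk clears `threshold`.

    Flags are chosen by risk alone, never by how easy a field was to extract.
    """
    # Drop low-risk fields so they are never flagged just to fill the quota.
    candidates = [name for name, risk in risks.items() if risk >= threshold]
    # Spend the limited flags on the riskiest fields first.
    candidates.sort(key=lambda name: risks[name], reverse=True)
    return candidates[:budget]


# Example usage with made-up risk scores.
risks = {"total": 0.6, "vendor": 0.45, "invoice_date": 0.02, "due_date": 0.55, "tax_id": 0.4}
print(flags_within_budget(risks, budget=2, threshold=0.3))  # → ['total', 'due_date']
print(flags_within_budget(risks, budget=2, threshold=0.9))  # → []
```

Because the cap is fixed, a flag keeps its meaning. Because of the threshold, an output with nothing risky produces no flags instead of padding the list with safe fields.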

So the closing point is this: at the review stage, the tool’s job isn’t to produce output and hope the user checks it. It’s to take responsibility for the one resource the user can’t expand, their attention, and spend it where it changes the outcome. Build the output so the skim lands on the plausible wrong answer instead of gliding past it. Treat the few seconds of review as a budget you’re allocating on the user’s behalf, because you are, whether you mean to or not. The tools that take that responsibility seriously are the ones whose mistakes get caught; the ones that don’t are the ones whose mistakes get trusted.