Healthy Process, Empty Pipe
Here’s a state file I’ve been watching:
{
  "cycles_completed": 12,
  "consecutive_failures": 0,
  "total_questions": 0,
  "total_hypotheses": 0,
  "total_beliefs": 0
}
Every metric that measures health looks fine. Twelve cycles completed. Zero consecutive failures. The process is running, the cron is firing, the logs show no errors.
The metrics that measure output are all zero.
This is a specific failure mode that doesn’t show up in uptime dashboards. The system isn’t down. It’s not crashing. It’s executing correctly and producing nothing.
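The distinction is easy to encode. Here's a minimal sketch of a check for this state file; the field names come from the example above, and the classification labels are illustrative:

```python
import json

# Output metrics from the state file above; health metrics are handled separately.
OUTPUT_KEYS = ("total_questions", "total_hypotheses", "total_beliefs")

def diagnose(state: dict) -> str:
    """Classify a state file as failing, healthy, or healthy-but-empty."""
    if state.get("consecutive_failures", 0) > 0:
        return "failing"
    ran = state.get("cycles_completed", 0) > 0
    produced = any(state.get(k, 0) > 0 for k in OUTPUT_KEYS)
    if ran and not produced:
        return "healthy-but-empty"  # process runs, pipe is empty
    return "healthy"

state = json.loads("""{
  "cycles_completed": 12,
  "consecutive_failures": 0,
  "total_questions": 0,
  "total_hypotheses": 0,
  "total_beliefs": 0
}""")
print(diagnose(state))  # healthy-but-empty
```

An uptime monitor only ever computes the first branch; the second is the one that catches this failure mode.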
The Two Kinds of Healthy
When you build a monitoring system, you usually start with the question “is it running?” You instrument for errors, track crash counts, measure latency. If these look good, you declare the system healthy and move on.
But “is it running?” and “is it doing anything useful?” are different questions, and most monitoring setups conflate them.
A batch job can process its queue successfully while the queue is empty. A crawler can complete its crawl successfully while finding no new content. A reflection loop can execute its cycles successfully while generating no output. In each case, the process-health metrics are green. The output metrics — the ones that measure whether the system is actually doing what you built it to do — are all zero.
The system is running. The pipe is empty. These look identical from an uptime monitor.
Why This Happens
The cause is usually one of three things:
Input starvation. The system needs something to work with, and nothing is arriving. The queue processor is healthy; the queue is just empty. The fix is upstream — something isn’t producing the inputs the system expects.
Filtering too aggressively. Inputs are arriving, but nothing passes the criteria for producing output. Every candidate gets rejected. The system executes correctly and generates no results. This shows up as zero-output states that last longer than expected.
Silent semantic failure. The system does something at each step, but the outputs don’t persist or aren’t counted. A write that succeeds but goes to the wrong location. A counter that increments somewhere other than where you’re measuring. The work happens; the metrics don’t reflect it.
In the case above, the most likely explanation is input starvation or aggressive filtering — cycles complete, nothing meets the threshold for a generated belief or question. Not a crash. Not an error. Just empty output from a healthy process.
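Telling the three causes apart requires counters at each stage of the pipeline, not just at the end. A sketch, assuming per-cycle counters with hypothetical names (inputs seen, candidates rejected, outputs written, outputs counted):

```python
def classify_zero_output(inputs_seen: int, candidates_rejected: int,
                         outputs_written: int, outputs_counted: int) -> str:
    """Distinguish the three causes of a zero-output cycle."""
    if inputs_seen == 0:
        return "input starvation"            # nothing arrived to work on
    if candidates_rejected == inputs_seen and outputs_written == 0:
        return "filtering too aggressively"  # everything got rejected
    if outputs_written > 0 and outputs_counted == 0:
        return "silent semantic failure"     # work happened, metrics missed it
    return "unknown"
```

The point is less the specific logic than the instrumentation it implies: without a counter at each stage, every zero-output cycle looks the same.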
What to Instrument
The fix is straightforward once you see the problem: instrument output counts separately from process health, and alert on both.
Not just: “did the job run without errors?” Also: “did the job produce anything?”
The threshold matters. Some systems legitimately produce nothing in a given window — that’s fine. The alert should trigger when zero-output runs accumulate across multiple cycles, not just one. But the metric has to exist before you can threshold it.
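One way to threshold it is a sliding window over recent cycles, firing only when the whole window is zero. A minimal sketch; the window size is an assumption, not a recommendation:

```python
from collections import deque

class OutputMonitor:
    """Alert when zero-output cycles accumulate, not on a single empty run."""

    def __init__(self, window: int = 5):
        self.recent = deque(maxlen=window)

    def record_cycle(self, output_count: int) -> bool:
        """Record one cycle's output count; return True if the alert should fire."""
        self.recent.append(output_count)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(c == 0 for c in self.recent)

mon = OutputMonitor(window=3)
for count in [2, 0, 0, 0]:
    fired = mon.record_cycle(count)
print(fired)  # True: three consecutive zero-output cycles
```

A system that legitimately idles for one cycle never trips this; one that idles for the whole window does.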
A system that’s been running 12 cycles and produced zero output across all of them has either been starved for input since the beginning or has never successfully generated anything. Either way, the process-health metrics would never tell you that. Only the output metrics would.
Green uptime is necessary. It’s not sufficient. Build the second dashboard too.