A team is six months into running agents for most of their implementation work. They have done the right thing all along: every agent session gets logged, every correction the human made gets captured, every “the agent did the wrong thing” moment gets noted in the rules file the next agent will read. The rules file started at eighty lines and is now fourteen hundred. It contains entries that contradict each other, entries that cover one-off incidents nobody can remember the context of, entries that were added to fix a problem and quietly became the cause of a different one. Nobody on the team wants to touch the file. The agent that reads it at the start of each task spends a substantial fraction of its context budget on rules that no longer apply, and when it produces wrong output, the team’s response is to add another rule. The rules file is doing the opposite of what it was supposed to do. The captured feedback is not improving the system. It is degrading the input the system reads.
This is the failure shape the popular feedback-everything school produces at scale. Agent feedback loops are necessary. The leverage you get from running agents at all multiplies sharply once the system is learning from its own work. But the standard pitch (capture every session, log every decision, build a robust dataset, feed it back into the rules) has a structural flaw the dataset-as-institutional-memory framing does not surface. The dataset itself becomes noisier as it grows, and when the system is allowed to update its own rules directly from that dataset without a strict filter, the rules degrade with it. The teams that get sustained leverage have figured out that capture and promotion are different disciplines with opposite cost profiles: capture should be cheap and aggressive because you do not know in advance what you will need, and promotion should be expensive and conservative because every rule change compounds across every future session that loads the rule. That asymmetry, made explicit, is the engineering discipline that separates compounding feedback from rule-set pollution.
The capture-everything school, in its strongest form
The school that says capture everything and let the dataset speak is not stupid. It came out of the disciplines that treat data as the durable artifact and the model as the disposable one: machine learning practice, recommender systems, search relevance, the half-century of statistical work that built every modern data-driven product. The school’s strongest claim is that you cannot know in advance which signals will turn out to matter, so the right move is to instrument exhaustively, store everything, and let downstream analysis decide what was load-bearing after the fact. Throwing away data at capture time is throwing away optionality you cannot recover. Better to over-capture and decide later than to under-capture and wish you had it.
The cleanest articulation of the school’s strongest version is Andrew Ng’s framing, from a 2021 talk on data-centric AI: noting that the field already treats data preparation as roughly 80 percent of the work, Ng argued that “ensuring data quality is the important work of a machine learning team.” The 80 percent figure is a heuristic Ng is reframing normatively, not a measured statistic, but the move is the substantive one: stop treating data as the input to the work, start treating it as the work itself. Translated to operational agent systems, every session is a data point, every correction a label, every observed failure a signal. Build the corpus. Feed it back. The system that has been running longest has the largest corpus, and the largest corpus is the moat.
The school is right about capture, and the capture half of the argument is the half I’m not disputing. The asymmetry I’m about to develop makes the same point: capture should be cheap and aggressive. Where the school slides into the failure mode is the next move it tends to make: that since the dataset is the durable artifact, and since the dataset captures everything that happened, the rules that govern future behavior should be derived from the dataset directly. More observation should mean more rules. More feedback should mean a richer rule set. Automate the derivation, and the system gets steadily smarter as it sees more of itself.
It’s worth noting that the most rigorous current work in this space is already pulling away from the purest version of that move. RLHF systems train a reward model on human preference comparisons and then optimize the policy against the reward model under a KL constraint, not against raw observations directly, precisely because direct policy updates from raw observation produce reward hacking and behavioral drift. Continuous-deployment recommendation systems run guardrails and online experiments before any captured signal modifies the production policy. Even within the data-centric school itself, the second-generation work has been moving toward curation rather than accumulation. The “everything-in-the-dataset becomes an update to the rules” version of the school survives mostly in pitch decks, not in the production systems that have to live with the consequences. The drift toward gated promotion is happening across the disciplines that have run feedback loops longest. The version of the school I’m engaging is the one a team adopting agent feedback for the first time is most likely to inherit, because it sounds like the obvious extension of “more data is better.”
The three-stage loop: capture, analyze, promote
The structural answer to the question of what a feedback loop that compounds actually looks like is three stages with explicit boundaries between them.
Capture is the first stage. Every implementation session produces signal about what worked and what did not: decisions made and why, friction encountered and what caused it, deviations from the spec and whether they were justified, moments where the agent needed correction, moments where the spec was wrong. That signal either evaporates when the session ends or it gets written down. The capture stage’s job is to make sure it gets written down. The artifact is a session reflection: short, structured, recorded close enough to the moment that the engineer can still remember what was actually happening, not what the post-hoc reconstruction wants to say happened.
Analyze is the second stage and the one most teams skip or compress. Captured artifacts accumulate. A dedicated pass reads the unprocessed artifacts together, looks for recurring patterns across them, classifies what kind of problem each pattern actually is, and proposes specific changes to the system. The output of the analyze stage is not a rule change. It is a recommendation, with an evidence count, with a classification, and with the developer’s stated reason for any item that was rejected or deferred. The analyze stage is the work the popular school tries to skip by automating from raw capture directly to rule update.
Promote is the third stage, where approved recommendations actually become rule changes. Rule files get edited. Agent specs get updated. Templates get revised. Skills get refined. Whatever the system uses to govern future behavior gets the proposed change applied. Promotion is the only stage that compounds: every promoted rule will load into every future session that touches the relevant work. The cost of a wrong promotion is paid every future run, and the value of a right promotion is also collected every future run. The asymmetry of the cost is what makes the promotion stage need a different posture than the capture stage.
The discipline that separates this loop from the failure-mode loop is what sits between the stages. Capture is cheap and aggressive: write everything down, do not pre-filter. Promote is expensive and conservative: change rules only when the analysis has earned the change. The analyze stage is the boundary between the two postures, and the promotion gate is the mechanism the analyze stage uses to enforce the boundary. The loop self-corrects when the analyze stage proposes changes a human approves, rejects, or defers with stated reasons that future analysis runs can read. The loop becomes self-reinforcing (and degenerative) when the analyze stage is allowed to apply its own recommendations, or when a human nominally approves them but does so by skimming a list rather than engaging each one.
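The boundary between analysis and promotion can be made mechanical. The following is a minimal sketch, assuming a Python codebase: the analyze stage can only produce `Recommendation` objects, and `promote` applies a change only when it arrives wrapped in a human `Review` with a stated reason. Every name here (`Recommendation`, `Review`, `Decision`, `promote`) is a hypothetical illustration, not part of any existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    DEFERRED = "deferred"


@dataclass(frozen=True)
class Recommendation:
    """Output of the analyze stage: a proposal, never a rule change."""
    pattern: str
    evidence_count: int
    classification: str
    proposed_rule: str


@dataclass(frozen=True)
class Review:
    """A human decision on one recommendation, with a stated reason."""
    recommendation: Recommendation
    decision: Decision
    reason: str
    reviewer: str


def promote(rules: list[str], reviews: list[Review]) -> list[str]:
    """Return a new rule list containing only human-approved changes."""
    updated = list(rules)
    for review in reviews:
        # A decision without a reviewer or a reason cannot be read by later passes.
        if not review.reason.strip() or not review.reviewer.strip():
            raise ValueError("each decision needs a reviewer and a stated reason")
        if review.decision is Decision.APPROVED:
            updated.append(review.recommendation.proposed_rule)
    return updated
```

The point of the shape is that there is no code path from a raw `Recommendation` to the rule list. Rejected and deferred reviews change nothing, but they still carry their reasons forward.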
The framing inheritance is worth naming once: this is the same shape that tiered planning (the discipline of breaking planning into wide-to-narrow tiers with validation checkpoints between them) uses at the planning function and that boundary contracts use between agents. Different work, different scale, same structural move. A discipline boundary in the middle of a pipeline that controls what gets to compound and what gets discarded.
Capture is cheap, and that’s the design
The capture side is the half that does the least damage when overdone. The artifacts are small. The cost of writing a session reflection is bounded. The artifacts sit in storage, and nothing changes for any future session because the artifacts exist. The downside of capture being too aggressive is that the analysis step has more material to read, which is a real cost but a manageable one. The downside of capture being too sparse is that the analysis step is reading from a corpus that does not include the signal the team most needs to see, which is unrecoverable. The asymmetry favors over-capture.
The shape that has worked is a structured per-session artifact written by the engineer who ran the session, committed to the same source repository the work itself lives in, with a small fixed schema: what the work was, what worked, what did not work, what surprised you, any explicit observation you want to flag for later analysis. Five to fifteen minutes of writing per session, ten times that on a session that surfaced something genuinely new. Across a team of engineers running a few sessions a day each, a quarter accumulates dozens of artifacts, sometimes more than a hundred. The capture corpus gets fat fast. The fatness is fine. The next stage is built to handle it.
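One way the small fixed schema might look in code is sketched below. The five content fields follow the schema described above. The `engineer` and `work_item` fields, the JSON encoding, and the class name `SessionReflection` are assumptions added so that a later analysis pass could group artifacts by author and work item.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SessionReflection:
    """One per-session capture artifact with a small fixed schema."""
    engineer: str             # assumption: recorded so analysis can check independence
    work_item: str            # assumption: identifier of the work item
    what_the_work_was: str
    what_worked: list[str] = field(default_factory=list)
    what_did_not_work: list[str] = field(default_factory=list)
    surprises: list[str] = field(default_factory=list)
    flagged_for_analysis: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to text that can be committed next to the work."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "SessionReflection":
        """Rebuild a reflection from its committed JSON form."""
        return cls(**json.loads(text))
```

Nothing in the schema asks the engineer to judge whether an observation matters. That omission is deliberate in this sketch, because filtering is not the capture stage's job.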
What the capture stage should not do is filter. The engineer writing the artifact does not know which observation will turn out to matter, and the per-session view is too narrow to see patterns across sessions anyway. Pre-filtering at capture time substitutes one engineer’s judgment at one moment for the cross-session analysis that should be doing the work. The instinct to pre-filter is strong because writing things down that might not matter feels wasteful, but the cost ratio runs the other way: a capture that turns out not to matter costs only the writing time, while a capture omission costs every future session that hits the same problem without the rule that would have prevented it. Capture aggressively, decide later.
Promotion is expensive, and that’s also the design
The promotion side has the inverse cost profile. A promoted rule lives in the system. Every future session that loads the file containing the rule reads it, whether the rule applies to that session or not. The reading is not free: it consumes the agent’s attention budget for the rules that govern its work, it competes with every other rule for salience at the decision moments where the rule was supposed to land, and it compresses the room available for the next rule the team will want to add. A wrong promotion is paid in perpetuity, in attention units, across every future session.
A wrong promotion also degrades the rules around it, in a way that takes some pattern recognition to see. Rules in a rule file do not exist in isolation. The rules that were already there are part of the context the new rule lands in. Adding a rule that is technically correct but covers a one-off case dilutes the salience of the rules that cover the recurring cases. The agent reading the file has to weight the new rule against the old ones, and the weighting tends to favor the most recently-added or the most-specifically-stated rule, neither of which is reliably the rule that should be controlling behavior. The system gets louder. It does not get smarter.
This is rule-set pollution: the pollution-family member that arises when a rules file accumulates raw observations and one-off corrections without a promotion gate separating raw input from rule-worthy patterns. The same shape as context pollution at the conversation scale, the same shape as chatter pollution at the multi-agent scale, the same shape as inheritance failure at the planning scale. The surface accumulates noise faster than it accumulates signal, and the surface’s quality degrades not from running out of capacity but from filling with debris. The fact that the surface is a markdown file checked into source control rather than a transformer’s context window does not change the failure mode. Drew Breunig’s context distraction names the model-side version: the model over-focuses on the context and neglects what it learned during training. Rule-set pollution is the same dynamic at a different layer: the agent over-focuses on the surfaced rules and neglects the load-bearing ones because every rule that should be load-bearing is now sharing the file with rules that should not be.
The cost ratio is the inverse of the capture stage’s. A rule that turns out to be wrong costs every future session it loads in. A rule omission that turns out to matter costs the team the next time they hit the omitted case, after which the case shows up in the next round of captured artifacts and gets analyzed. The cost of a wrong promotion is permanent and compounding. The cost of a missed promotion is bounded and self-correcting through the next loop iteration. The cost ratio favors the conservative side by orders of magnitude. Promote slowly, keep the rules earning their place.
The portable rule that holds the asymmetry together, stated as the discipline’s operating slogan: capture aggressively, promote conservatively. The full form: capture aggressively, because the cost of recording is bounded and the cost of missing the signal is unbounded. Promote conservatively, because the cost of a wrong rule compounds and the cost of a delayed rule does not.
Threshold-based promotion: the analyze stage’s job
The middle stage is where the discipline lives. Threshold-based promotion means making the threshold for “rule-worthy pattern” explicit and revisable rather than implicit in whoever happens to be reviewing the dataset that week. The threshold is what separates “this pattern appeared and we noticed it” from “this pattern appeared often enough across enough independent sources that the system should change in response.”
The simplest version of the threshold rule, stated as a portable line: one incident is not a signal. A pattern that appeared in a single session, even a session that produced a sharp friction point, goes onto a watch list and waits for confirmation. The watch list is cheap. The rule change is not. The next analysis pass checks whether the watched pattern reappeared. If it did, in another session by another engineer, ideally on another work item, the threshold has been met and the recommendation moves toward promotion. If it did not, the pattern stays on the watch list until either it accumulates evidence or it ages out.
The threshold is not a specific number that holds across all patterns. The right threshold for a pattern that would change a lightweight template is lower than the right threshold for a pattern that would change the core rule file every agent loads on every session. The right threshold for a pattern that has clearly visible failure modes is lower than the right threshold for a pattern whose failure modes are subtle and might be the wrong diagnosis. The discipline is to set the threshold for that pattern, in advance, and then hold it. Inferring a threshold retrospectively (deciding the pattern was rule-worthy because the rule got promoted, then justifying the promotion by the threshold the promotion implied) is the failure mode. The threshold has to be the input to the decision, not the post-hoc rationalization for it.
The shape that has worked is something like:
    Item: <pattern description>
    Status: deferred
    Evidence count: 4 occurrences in this batch (single batch)
    Threshold: wait for 2-3 more independent occurrences
               across separate work items before promoting
    Watch reason: high evidence in one batch can be a batch
                  artifact rather than a real pattern
    Next check: next analysis run
That artifact doesn’t look impressive. It’s a small structured note explaining why the system didn’t change in response to four observations of the same friction point in a single batch. The discipline is in the not-changing. The next batch either confirms the pattern is real and widely distributed, or it does not. If the pattern accumulates ten more independent occurrences across two more analysis batches, the threshold has been decisively met and the recommendation moves to ready-for-implementation. If the pattern fails to reappear, the watch entry ages out and the system was correctly conservative not to promote it.
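The watch-list check described above can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed implementation: independence is approximated by counting distinct work items that were not part of the original batch, and the `max_batches` age-out limit is a hypothetical parameter, since the text does not say how long an entry should wait. `WatchEntry` and `check_batch` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class WatchEntry:
    pattern: str
    required_new_work_items: int   # threshold, fixed before the next batch is read
    max_batches: int               # hypothetical age-out limit
    seen_work_items: set[str] = field(default_factory=set)  # evidence from the first batch
    new_work_items: set[str] = field(default_factory=set)   # evidence from later batches
    batches_checked: int = 0


def check_batch(entry: WatchEntry, batch_work_items: list[str]) -> str:
    """Update the entry with one analysis batch and return its status."""
    entry.batches_checked += 1
    # Only work items that were not in the original batch count as new evidence.
    entry.new_work_items |= set(batch_work_items) - entry.seen_work_items
    if len(entry.new_work_items) >= entry.required_new_work_items:
        return "ready-for-implementation"
    if entry.batches_checked >= entry.max_batches:
        return "aged-out"
    return "deferred"
```

The threshold is a field on the entry, written when the entry is created. That keeps it an input to the decision instead of something inferred after the fact.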
The visible cost of threshold-based promotion is delay. A real pattern caught in batch one waits one or two batches longer to produce a rule than it would in a system that promoted on first observation. The delay is the price the system pays for the protection it gets against batch-specific noise. The trade is worth it because the unprotected version produces the rules file that ballooned and got worse, and the protected version produces a rules file that grew slowly, kept its signal, and continued to compound.
The full check the analyze stage runs on each candidate is a five-condition gate. Stated in the literal form a team can copy and adapt:
    Promotion gate: raw capture → rule change
    A captured observation may be promoted to a rule only when:
    1. Frequency: the same failure shape has appeared in
       at least N independent captures across the last M weeks.
       (Defaults: N=3, M=4. Revise quarterly.)
    2. Generality: the rule, written for the failure shape,
       does not require describing one specific incident.
       If the rule reads "do not do what happened on October 3,"
       it is not yet general enough.
    3. Cost test: the rule, applied to the last K runs in the
       captures (K >= 10), would have prevented the captured
       failures without producing wrong output on runs that
       did not exhibit the failure.
    4. Removal cost: the rule, if it turns out to be wrong,
       is removable without breaking other rules. If removing
       this rule requires also removing or rewriting three
       other rules, the rule is too entangled. Refactor first.
    5. Owner: every promoted rule names the person who promoted
       it. Anonymous promotion is the start of pollution.
The five conditions encode the work the implicit gate skips. Frequency prevents one-off promotion. Generality prevents the rules file from accumulating incident-specific entries that future agents cannot apply. Cost test separates a pattern that earns a rule from a pattern that earns a workaround. Removal cost names the structural debt dimension explicitly: a rule entangled with three other rules is not an asset, it is debt the team will pay every time the original rule turns out to be wrong, and addition is asymmetric with removal in the same way that writing tests is asymmetric with deleting them. Adding a rule is one decision in one moment. Removing it requires reading the file as a whole, which is the work nobody has time for. The gate forces the question before the rule lands rather than after. Owner closes the accountability loophole: every promoted rule names the human who promoted it, because anonymous promotion is the linguistic move that hides the decision from the next analysis pass.
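The five conditions translate fairly directly into a checklist function. The sketch below assumes that the judgment calls (whether the rule is general, how the replay over the last K runs went, how many other rules are entangled) have already been made by a person and recorded on a `Candidate` record, so the code only enforces that none was skipped. The names `Candidate` and `gate_failures` are hypothetical, and treating any entanglement at all as a failure is the strict reading of condition 4.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Candidate:
    rule_text: str
    capture_dates: tuple[date, ...]  # one date per independent capture
    is_general: bool                 # reviewer's judgment for condition 2
    replay_runs: int                 # K: number of past runs the rule was replayed on
    replay_prevented_all: bool       # would it have prevented the captured failures?
    replay_false_positives: int      # runs where it would have produced wrong output
    entangled_rules: int             # rules that must change if this one is removed
    owner: str


def gate_failures(c: Candidate, today: date,
                  n: int = 3, m_weeks: int = 4, k_min: int = 10) -> list[str]:
    """Return the names of the gate conditions the candidate fails."""
    failures = []
    window_start = today - timedelta(weeks=m_weeks)
    recent = [d for d in c.capture_dates if window_start <= d <= today]
    if len(recent) < n:
        failures.append("frequency")
    if not c.is_general:
        failures.append("generality")
    if (c.replay_runs < k_min or not c.replay_prevented_all
            or c.replay_false_positives > 0):
        failures.append("cost test")
    if c.entangled_rules > 0:
        failures.append("removal cost")
    if not c.owner.strip():
        failures.append("owner")
    return failures
```

An empty list means the candidate cleared the gate with the defaults from the text (N=3, M=4, K >= 10). Returning the failed condition names, instead of a bare boolean, gives the reviewer a stated reason to record for a deferral.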
The cleanest articulation of the underlying point belongs to Kent Beck. His framing of taste as the senior engineer’s recurring shorthand for the judgment that knows when complexity has crossed from useful to harmful applies cleanly here. The gate’s job is to encode taste in a form that survives any one person’s attention budget. The threshold and the five conditions are the encoding. Beck’s “eating the seed corn” framing applies in the inverse direction: the team that promotes every observation is not eating the seed corn, it is choking on it.
The analyze stage classifies, it doesn’t just count
Counting occurrences is the most visible part of the analyze stage. It’s not the load-bearing part. The harder discipline is classification: what kind of problem each captured observation actually is, before deciding whether to act on it.
The same observation can be any of several different things, and the right response depends on which thing it is.
Missing rule is the case the popular school assumes every observation is: the agent did the wrong thing, the rule that would have prevented the wrong thing was not in the spec, add the rule. This is sometimes the right diagnosis and is the case the promotion pipeline is designed for.
Compliance failure is the case where the rule already exists, has been in the spec for some time, appears there in multiple locations, and the agent did not honor it. Adding more rule text will not change behavior. The fix is upstream of the rule file, in how the agent’s spec gets composed or in how it loads the relevant context, not in adding another restatement of the same rule.
Tooling or data issue is the case where the agent did the right thing given the tools available, but the tools or the data they operated on were the actual problem. The fix is in the tooling, not in a process rule.
Optimization, not correctness is the case where the agent’s behavior was acceptable but suboptimal, and “acceptable” is the right bar to keep. Adding a rule that pushes the behavior toward the optimization is a rule that will fire across every session that does not actually need the optimization, paying compounding attention cost for marginal benefit.
A classification step that catches even one misclassification per analysis batch saves the rule file from a rule that would not have produced the behavior change anyone wanted. Across a quarter, the classification step is the difference between a rule file growing by twenty rules that change behavior and a rule file growing by twenty rules that mostly add noise.
The classification step is also where the analyze stage earns its independence from automation. A regex that counts occurrences and proposes promotion when the count crosses a threshold can produce a recommendation list. It cannot tell whether the underlying problem is missing-rule or compliance-failure. The judgment about which kind of problem this is requires reading the artifacts, holding a few different hypotheses about what is going on, and committing to a classification with a stated reason. That is human work, and the analyze stage’s design has to make space for it. An analyze pass that defaults to “if it appeared three times, propose a rule” without classifying first is an analyze stage that has been compressed into a counter and has lost the load-bearing part of its job.
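A small sketch shows what "classify first, count second" can look like in tooling. It assumes the classification and its stated reason are entered by a person and that the code only routes on them. The four kinds mirror the categories in the text, while the names `ProblemKind`, `Pattern`, `route`, and the route strings are illustrative, and the default threshold of 3 is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProblemKind(Enum):
    MISSING_RULE = "missing rule"
    COMPLIANCE_FAILURE = "compliance failure"
    TOOLING_OR_DATA = "tooling or data issue"
    OPTIMIZATION = "optimization, not correctness"


# Where each non-rule classification sends the fix, paraphrasing the text.
NON_RULE_ROUTES = {
    ProblemKind.COMPLIANCE_FAILURE: "fix spec composition or context loading",
    ProblemKind.TOOLING_OR_DATA: "fix the tooling or the data",
    ProblemKind.OPTIMIZATION: "no rule: acceptable behavior is the bar",
}


@dataclass
class Pattern:
    description: str
    occurrences: int
    kind: Optional[ProblemKind] = None  # set by a human, not by the counter
    reason: str = ""                    # stated reason for the classification


def route(pattern: Pattern, threshold: int = 3) -> str:
    """Decide the next step; the count matters only after classification."""
    if pattern.kind is None or not pattern.reason.strip():
        return "needs human classification"
    if pattern.kind is not ProblemKind.MISSING_RULE:
        return NON_RULE_ROUTES[pattern.kind]
    if pattern.occurrences >= threshold:
        return "send to promotion gate"
    return "watch list"
```

An unclassified pattern with many occurrences still returns "needs human classification". In this sketch the counter alone can never propose a rule.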
The compounding payoff of classification is that the rejected and deferred items, with their stated reasons, become inputs to the next analysis run. A pattern that was rejected because it was diagnosed as a compliance failure, with the stated reason, does not get re-proposed by the next pass as a missing-rule item. The system learns from its own rejections in a way that compounds the discipline rather than re-litigating the same decisions every batch. Will Larson’s principle applies cleanly here: “Anytime you rely on an LLM to enforce something important, you will fail.” The promotion gate is something important. The classification underneath it is something important. Both belong in human judgment, with the analyze stage’s tooling supporting the decision rather than replacing it.
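The carry-forward of earlier decisions can be sketched as a filter applied before a new pass proposes anything. This is a rough illustration under stated assumptions: patterns have stable identifiers, earlier decisions are stored as a mapping from identifier to a `(decision, stated_reason)` pair, and only rejections suppress a re-proposal, since deferred items are still waiting on evidence. The function name `carry_forward` is hypothetical.

```python
def carry_forward(
    proposals: list[str],
    prior_decisions: dict[str, tuple[str, str]],
) -> tuple[list[str], list[tuple[str, str]]]:
    """Split proposed pattern ids into (fresh, suppressed).

    suppressed pairs each previously rejected id with the stated reason,
    so the current pass can show why the item is not being re-proposed.
    """
    fresh: list[str] = []
    suppressed: list[tuple[str, str]] = []
    for pattern_id in proposals:
        decision, reason = prior_decisions.get(pattern_id, ("", ""))
        if decision == "rejected":
            suppressed.append((pattern_id, reason))
        else:
            fresh.append(pattern_id)
    return fresh, suppressed
```

A real process would presumably also let a reviewer reopen a rejection when new evidence contradicts the original reason. That path is left out of the sketch.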
What the compounding looks like, end to end
A trace makes the loop concrete. An undocumented team convention requires retaining infrastructure associated with a feature flag even after the last toggle usage is removed: the dependency package, the dependency injection registration, the health check, the platform integration. The convention exists in institutional memory and in the code review patterns of the most senior reviewers. It is not in any rule file an agent loads.
A developer removes the toggle infrastructure as part of a cleanup task, reasoning correctly from the visible state of the code that the infrastructure is no longer in use. A code reviewer catches the convention being violated and requires the rework. The developer captures the friction in their session reflection: the team convention is to retain toggle infrastructure even when the last usage is removed, this was not in the spec, the agent had no way to know.
That single artifact, by itself, does not move the system. It enters the capture corpus alongside dozens of other reflections from the same period. The next analysis pass reads them as a batch, sees the toggle-retention reflection alongside two other reflections from different developers describing similar undocumented-convention violations on different work, classifies the pattern as a missing-rule case (the convention exists, was never written down, agents have no source for it), and proposes a recommendation. The recommendation goes to a human reviewer who reads the three reflections, agrees with the classification, and approves the promotion. A new entry lands in the team conventions file: retain feature flag infrastructure (package, DI registration, health check) even when removing the last toggle usage. The platform integration costs nothing to keep, and the team prefers the option of re-adding a flag over re-installing the integration.
The next time a similar work item comes through the planning system, the planning agent loads the team conventions file as part of composing the spec. The convention is now in the spec the implementation agent receives. The implementation agent reads the spec, removes the toggle usage, and leaves the infrastructure in place. The code review passes without rework. The convention has compounded: from a piece of institutional memory that lived in three reviewers’ heads and cost every developer a code-review-catch worth of rework, to a written rule that loads automatically into the spec for every relevant work item.
That is the loop running correctly. Capture surfaced the friction. Analysis classified the pattern and counted the occurrences. Promotion changed the rule. The next session inherited the rule. The next session after that inherited the rule. The compounding is the rule earning its place in the file by preventing the failure it was promoted to prevent, then continuing to prevent it, every session, indefinitely.
The same trace under the failure-mode version of the loop looks different. Capture surfaces the friction. The system promotes from raw capture: the toggle-retention reflection becomes a rule on the same day it was written, alongside the dozens of other reflections from the same week, each becoming its own rule. The team conventions file gains thirty new entries in a quarter. Some of them are the toggle-retention rule, doing real work. Some of them are reflections that turned out to be one-offs, or reflections that diagnosed the wrong cause, or reflections that were correct in their original context but whose generalized rule fires across cases the original observation never anticipated. The agent loading the conventions file pays attention budget across all thirty rules, and the toggle-retention rule, which was supposed to do real work, gets outvoted by the noise around it. The same rule that would have compounded under the disciplined version of the loop fails to land under the undisciplined version. The instrumentation was identical. The promotion gate was the difference.
The pollution-family lens, briefly
Rule-set pollution is the pollution-family member that lives at the rule-file scale. The family runs across the engineering-with-agents practice as the unifying frame: a working surface (a context window, a multi-agent transcript, a feedback dataset, a rule set) accumulates noise faster than it accumulates signal, and the surface’s quality degrades not from running out of capacity but from filling with debris. Context pollution is the conversation-scale member: a single agent’s working surface fills with bookkeeping until the decision that mattered is being made from a noisy context. Chatter pollution is the multi-agent-scale member: each participating agent accumulates the cumulative back-and-forth of every other agent’s intermediate work. Rule-set pollution is the feedback-loop-scale member: a rules file accumulates raw observations and one-off corrections until the rules that should be doing the work are buried in the rules that should not have been promoted.
The structural answer is the same at every scale: a discipline boundary in the middle of the pipeline that controls what gets to compound and what gets discarded. At the conversation scale, the boundary is discrete delegation: sub-agents work in their own contexts and return only the result. At the multi-agent scale, the boundary is the boundary contract: shape locked, verbosity calibrated, content open within those constraints. At the feedback-loop scale, the boundary is the promotion gate: capture aggressively, classify carefully, promote conservatively. The patterns are surface variations of the same engineering move.
The pollution-family lens is useful at the loop scale because it names the failure mode the popular school’s pitch does not surface. Capture everything and feed it back sounds like an investment in institutional memory. Through the pollution-family lens, the same operation is an investment in noise, unless the gate exists. The gate is what converts the capture investment into compounding signal rather than compounding pollution.
What the promotion gate doesn’t guarantee
It’s not a claim that automation has no place in the analyze stage. Aggregation, deduplication, occurrence counting, cross-batch correlation, classification suggestions, and recommendation drafts are all work the analyze stage benefits from automating. The line is at the promotion decision and the classification commitment. Both stay in human judgment because both are the load-bearing acts that determine whether the rule file compounds or pollutes.
It’s not a claim that the threshold can be set high enough to never produce a wrong promotion. A pattern that appears across enough sessions to clear any reasonable threshold can still be the wrong rule, because the underlying cause might be the one classification the analysis missed. The promotion gate is not a guarantee against all wrong rules. It is a strong filter against the most common class of wrong rules, which is over-promotion of one-off observations as if they were patterns.
It’s not the case that I have a clean answer for the threshold. I’ve moved it more than once. The signal I have learned to trust is whether the rule file is growing faster than the team’s behavior is improving. If the file is gaining rules and the recurring failures the rules were supposed to prevent are still recurring, the threshold is too low and the rules being promoted are not the rules that would prevent the failures. If the file is barely gaining rules but the recurring failures are dropping anyway, the threshold might be right or the team might be improving for reasons unrelated to the rule file. Either way, the threshold needs revisiting on a cadence. Setting it once and never touching it is its own failure mode.
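The signal described above can be written down as a comparison a team runs on its review cadence. The sketch below is illustrative only: it assumes the team counts rules added and recurring failures per period, and the `low_growth` cutoff for "barely gaining rules" is a hypothetical parameter with no recommended value behind it.

```python
def threshold_signal(rules_added: int, failures_before: int,
                     failures_after: int, low_growth: int = 2) -> str:
    """Compare rule-file growth with the change in recurring failures."""
    failures_dropping = failures_after < failures_before
    if rules_added > low_growth and not failures_dropping:
        # The file is growing but the failures it targets keep recurring.
        return "threshold likely too low"
    if rules_added <= low_growth and failures_dropping:
        # Could be a good threshold, or improvement unrelated to the rule file.
        return "inconclusive: threshold may be right, or improvement is unrelated"
    return "no clear signal"
```

The function cannot say the threshold is correct. The most it returns is a reason to revisit it, which matches the limits of the signal itself.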
It’s not a solved problem. The hardest unsolved part is detecting when a previously-promoted rule has become wrong. Rules earn their place by preventing the failure they were promoted to prevent, but the work the rules govern changes shape over time, and a rule that was load-bearing six months ago can become a rule that is firing on cases it no longer applies to. The reverse promotion gate (the discipline of removing rules that no longer earn their place) is a discipline I haven’t figured out how to run as systematically as the forward promotion gate. I suspect it requires the same shape (capture aggressively which rules fire on each session, analyze which rules are still doing useful work, demote conservatively when the evidence is decisive that the rule is now noise) but I haven’t seen it run cleanly at scale and I don’t have a transferable answer for it.
The related unsolved problem is what to do with the existing fourteen-hundred-line file the team inherited from before the gate existed. The gate is forward-looking. It filters new promotions. It does not retroactively unpromote the rules that were added before the gate existed, and those rules are the ones doing the most damage. The gate without the retrospective work is a clean filter on a polluted reservoir, and the polluted reservoir is still poisoning the agents that read it. I haven’t seen a clean way to do the retrospective work fast. The teams I’ve seen do it well treated it as roughly a quarter of dedicated effort, with two engineers reading rules end to end against the gate, and roughly three-quarters of the existing rules getting demoted back to captures. The teams I’ve seen do it badly tried to do it as a side project and never finished. The scoping signal worth carrying away is the ratio: most of what is in a pre-gate rules file does not survive the gate when the gate is finally applied to it.
Is your feedback loop’s promotion gate actually running?
Two halves of the same check, one for the most recent rule change and one for the next one.
The retrospective half. Pick the most recent rule the team added to its agent’s instructions, conventions file, spec template, or whichever surface holds the rules that load into future sessions. Trace the rule back to the captured observations that produced it. How many independent observations existed before the rule was promoted? Were they from different engineers, different work items, different repositories, or were they from a single batch that happened to surface the same friction multiple times in the same week? Was there a stated threshold that was met, or was the threshold set retroactively to justify the promotion? Was the underlying problem classified before the rule was written, or was the rule the first response and the classification implicit in the rule’s shape? If the rule was promoted from one observation, or from a single batch’s worth of observations, or without the classification step, the promotion gate was not running on this rule. The fact that the rule might still be a good rule does not change the diagnosis. A good rule that bypassed the gate is the gate not running, and a gate that did not run on a rule that turned out fine will not run on the next rule either. The rule got lucky. The next promotion will not. A promotion that did not clear the gate does more than risk a bad rule: it produces a record that hides who made the decision. Will Larson’s “agentic passive voice” names the linguistic move the team uses to describe the result without confronting the cause: “the rules file got bloated,” not “we promoted things we should not have.” The Owner condition in the gate exists to make that sentence impossible to write honestly.
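The retrospective questions can be sketched as a small audit over one promotion record. The `Observation` and `Promotion` types, their field names, and the choice to treat distinct batches as the proxy for independence are all assumptions made for this example. A real capture log will have a different shape.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """A captured observation behind a rule (hypothetical fields)."""
    engineer: str
    work_item: str
    repository: str
    batch: str


@dataclass
class Promotion:
    """The record of one rule promotion (hypothetical fields)."""
    rule_id: str
    owner: Optional[str]
    promoted_on: date
    threshold: Optional[int]           # required independent observations
    threshold_set_on: Optional[date]
    classified_on: Optional[date]
    observations: List[Observation]


def gate_failures(p: Promotion) -> List[str]:
    """List the ways this promotion bypassed the gate; empty means it ran."""
    failures = []
    # Assumption: observations from the same batch are not independent.
    batches = {o.batch for o in p.observations}
    if len(p.observations) <= 1:
        failures.append("promoted from at most one observation")
    elif len(batches) == 1:
        failures.append("all observations came from a single batch")
    if (p.threshold is None or p.threshold_set_on is None
            or p.threshold_set_on > p.promoted_on):
        failures.append("no threshold stated before promotion")
    elif len(batches) < p.threshold:
        failures.append("stated threshold was not met")
    if p.classified_on is None or p.classified_on > p.promoted_on:
        failures.append("problem not classified before the rule was written")
    if not p.owner:
        failures.append("no named owner for the promotion decision")
    return failures
```

The audit never asks whether the rule is a good rule. It only reports whether the gate ran, which keeps the diagnosis separate from the outcome.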
The prospective half. Before the next analysis pass produces its recommendation list, can you state in one line each: what threshold each candidate is being evaluated against, what classification each candidate has been assigned, and what stated reason any deferred candidate has for being deferred? Can the deferred items, with their stated reasons, be read by the next analysis pass without a human re-deriving the reasoning every time? If those answers exist, the gate is running on the next batch. If those answers exist only after the batch lands and the items are about to be promoted, the gate is not a gate. It is a post-hoc rationalization stage that approves whatever the analysis happened to produce. The artifacts of the gate (the watch list, the deferred items, the rejection reasons, the threshold for each pattern) have to exist before the promotion decision, in writing, and survive the pass for the next analysis to read.
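One way to make the prospective half concrete is to write each candidate down as a record, written before the promotion decision, that the next analysis pass can load. The sketch below assumes a hypothetical `Candidate` shape, hypothetical status values, and JSON as the storage format. None of these come from a specific tool.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One pattern under evaluation (hypothetical fields)."""
    pattern: str
    threshold: Optional[int] = None       # what it is being evaluated against
    classification: Optional[str] = None  # what kind of problem it is
    status: str = "pending"               # "pending", "deferred", or "rejected"
    reason: Optional[str] = None          # required when deferred or rejected


def missing_answers(c: Candidate) -> List[str]:
    """Name the one-line answers that do not yet exist for this candidate."""
    missing = []
    if c.threshold is None:
        missing.append("threshold")
    if not c.classification:
        missing.append("classification")
    if c.status in ("deferred", "rejected") and not c.reason:
        missing.append("reason")
    return missing


def dump_watch_list(candidates: List[Candidate]) -> str:
    """Serialize the watch list so it survives the pass."""
    return json.dumps([asdict(c) for c in candidates], indent=2)


def load_watch_list(text: str) -> List[Candidate]:
    """Read a previous pass's watch list without re-deriving the reasoning."""
    return [Candidate(**row) for row in json.loads(text)]
```

If `missing_answers` returns anything for a candidate that is about to be promoted, that batch is in the post-hoc rationalization case, not the gated one.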
The check is not whether the team has a feedback loop. Most teams have a feedback loop. The check is whether the loop has a gate, and whether the gate is producing the explicit, written, usable output that lets the next analysis run know what was decided and why. Without the explicit output, the loop is running, but it is running as a pipeline from raw observation to rule change, and the rules file is gaining mass faster than the system is gaining capability. With the gate, the loop is running as the asymmetric two-discipline system the work actually requires: capture cheap and aggressive on one side, promote expensive and conservative on the other, with the analysis step as the boundary that holds the asymmetry. Capture aggressively, promote conservatively. The slogan is the discipline. The discipline is the difference between feedback that compounds and feedback that pollutes.