Numbers, bias, and self-correction

Most of what makes this exercise unusual isn't the scenario. It's the rules the federation follows to keep itself honest. Every estimate is a range. Every specialist declares their own blind spots. Bias warnings carry forward turn after turn, and the federation is required to actually move its estimate when a bias is identified — not just write the warning down. At turn 6, the federation added a protocol item (the "modal-hold review") that asks once a turn whether the questions being tracked are still the right ones. Across turns 7 through 10 the review produced three new tracked variables, one retired variable, two variable splits, and a full taxonomy reconstruction. Several of those changes revealed structural features that had been hidden by the old framing.

The format: `[low, best guess, high]`

Three numbers, not one. The middle is the best guess. The outer two are the realistic floor and ceiling. Wider gap means more uncertainty. The bar visualizes the range; the marker shows the point estimate.

Narrow range

US–Israel formal alliance, T0 baseline

65% 78% 88%

Lots of historical data; the model fits the situation well.

Wide range

Civilian fatalities during the turn-5 interval

180 980 3,800

Right-tail risk from cluster events the federation can't model precisely. The honest answer is "somewhere in a big range."

Three different kinds of doubt, tracked separately

Confidence isn't one thing. The federation tracks three flavors on every estimate:

Empirical confidence. How much real-world data we have. Low = we're guessing from a thin record.
Model confidence. Whether we're measuring the right thing the right way. Low = we might have the right question and the wrong instrument.
Sponsor bias. The systematic tilt of the analytic tradition the specialist comes from. (Cyber threat models from vendors over-rate the threats vendors sell against. Institutional resilience estimates from people inside those institutions over-rate the institutions.) Tracked separately because confidence intervals don't catch it.

The federation failure mode

The thing the protocol guards against most carefully: when multiple specialists share the same blind spot, the whole federation can do worse than a single skeptical analyst would. If three different specialists are all biased upward on US institutional resilience for the same structural reason, the federation will produce three independent-looking estimates that are all wrong in the same direction. It looks like convergent evidence. It isn't.

The bias the federation took ten turns to move

Upward bias on US institutional resilience got flagged from turn 0. Across the first seven turns the federation widened its uncertainty ranges in the bias direction but didn't actually move the central estimate. CASS pointed this out at turn 6 as exactly the decorative- warning failure mode the protocol is designed to catch. The bias finally moved at turn 7 (the president's survival estimate came down from 70% to 55%) and again at turn 8, and the corrections kept coming through turn 10. The lesson the federation logged: writing down a bias warning is the easy part. Acting on it is the part that takes seven turns of stress events to actually force.

The modal-hold review (new at turn 6)

A protocol item that asks once a turn: which variables are we still tracking just because they were load-bearing two turns ago and nobody re-checked? Which of our framing assumptions are still shaping our estimates even though we wrote down a warning we never acted on? Which of our headline questions have become irrelevant to what's actually happening?

What it produced, turn by turn:

Turn 6. Identified "modal-frame fixation" as a federation failure mode. Flagged "has any state formally seceded?" as a question that might have stopped being load-bearing.
Turn 7. Spun off three new tracked variables that had been hidden inside aggregate questions: outer-island kinetic actions on the Taiwan ladder, the risk of a major bank getting resolved by the FDIC, and the share of the Compact's commerce running on parallel financial arrangements.
Turn 8. Retired the formal-secession variable. Replaced it with three structured variables. Surfaced a new failure mode: when two specialists agree on the name of a variable but disagree by an order of magnitude on its value, they're probably measuring different things. The nuclear command authority variable got split into two: principal-order resistance vs. institutional restraint-on-launch capacity. Both readings preserved.
Turn 9. Executive decision-cycle integrity hit the floor of its measurement scale (0.02). The command specialist recommended re-conceptualizing it for turn 10 as "faction routing mode" — meaning the variable had stopped measuring "is the regime making decisions?" and needed to start measuring "which faction's orders get executed by which instrument?" The cabinet-defection trigger from turn 6 ("exceeds 9 in a single interval") was no longer well-formed: it was designed for baseline rates of 0–2/turn, and the new sustained rate was 4–14/turn.
Turn 10. The banking-crisis classification got reconstructed from a single-axis "contained vs. systemic" framing into a 4-axis schema. The reconstruction itself was the finding: under the new taxonomy, the Compact's parallel settlement share was visible at 54% — a structural break that the old framing had been hiding for at least two turns. "Faction routing mode" formally replaced executive decision-cycle integrity. "Dual-government formation index" was adopted as the primary tracked variable, per CASS's framing-discovery.

"Falsifies-on" hooks (new at turn 7)

Every major estimate from turn 7 onward carries a list of conditions that would force the federation to re-estimate. Example, for "the president stays in office through January 2029":

Cabinet defections in a single turn exceed nine.
A formal 25th Amendment Section 4 cabinet vote is held.
A medical incapacitation event lasting more than 72 hours.
Three or more flag officers issue a coordinated statement in the same week.
A successful 25th Amendment transition.
A Senate conviction vote that actually clears 67.
The president is personally targeted by a violent attack.

Each hook also has a half-life: even if nothing visibly happens, the estimate has to be re-elicited after some interval. The point is to catch the drift that happens when nothing visibly moves but the underlying conditions have shifted anyway.

The federation's own report card, every turn

Every briefing closes with a section called "federation failure mode check." It carries forward the biases flagged in earlier turns, marks which have been addressed and which are still active, and adds any new ones. By turn 10 the list of carried failure modes included:

Upward bias on US institutional resilience (mostly corrected by turn 10).
Cascade-clearing optimism in finance models (fully corrected by turn 10 via the taxonomy reconstruction).
Same-named-variable, different-latent-variable problem (caught at turn 8 on nuclear command, again at turn 9 on executive decision-cycle integrity; "open question: how many other variables have this latently?").
Floor-bound variable problem (caught at turn 9; forces re-conceptualization when a metric runs out of room).
Trigger reinterpretation problem (caught at turn 9; tripwires designed for one distribution stop being well-formed when the distribution shifts).
Refusal-to-implement as dominant uniformed mode (not pre-modeled at turn 0; emerged at turn 9 as institutional adaptation; tracked from turn 10).
Federation over-reliance on contrarian framing-correction (partially mitigated through turns 8–10 as specialists started producing their own framing critiques).