When the Uncertainty Is Bigger Than the Shock: Scenario Modelling for English Local Elections

Across 64 English authorities and six 2026 scenarios, even the strongest scenario shock was only 13% of the median uncertainty band.

In plain English: the model’s assumptions moved the result less than historical forecast error did. The most aggressive challenger surge I could parameterise sits inside the noise the model has produced in past elections. That’s not a defect. It’s the result.

I built this scenario model expecting clear separation between assumptions. I expected S3, the challenger surge, to dominate. I expected rankings I could defend. What I got was an envelope where the strongest shock sits inside calibrated uncertainty, and where rankings dissolve when intervals are plotted on top of them.

This is the second instalment of a project on English local electoral data. Part 1 corrected a categorical-normalisation bug that reversed the original headline. Part 2 picks up where the corrected baseline ends and asks a different question: given the historical churn we now measure correctly, what 2026 scenarios are worth modelling, and how should we read them when uncertainty is wider than the shocks?

What was modelled

The 2026 English local elections are scheduled for Thursday 7 May 2026. This project covers 64 active authorities holding elections that day: 32 London boroughs, 27 metropolitan boroughs, and 5 West Yorkshire authorities. Six scenarios apply different assumptions to the same historical baseline. Four metrics are computed for every scenario × authority combination: volatility_score, delta_fi, swing_concentration, and turnout_delta. The model produces 1,536 output rows (6 scenarios × 64 authorities × 4 metrics), each with a point estimate plus calibrated P10, P50, and P90 values from 2,000 draws of the empirical error distribution.

Scenario	Question	Main assumption
S0	What if no new swing is applied?	Historical uncertainty only
S1	What if 2018-2022 challenger patterns continue?	Continuation of recent challenger churn
S2	What if major parties partially recover?	Establishment recovers half of lost share
S3	What if challengers surge harder?	Stress test: +4pp challenger surge
S4	What if deprivation-linked turnout rises?	+3pp turnout in IMD deciles 1-3
S5	What if London volatility is capped by history?	London P90 upper-tail cap

Each scenario is a controlled perturbation. Labels describe assumptions, not outcomes. The full interactive dashboard is on Tableau Public.

Two definitions to carry through the rest of the article: scenario shock is the movement in the scenario point estimate relative to the baseline. Uncertainty width is the P10-to-P90 interval calibrated from historical forecast error. The 13% headline is the first divided by the second.

Method: backtest errors as the empirical uncertainty distribution

Backtest errors are not just a scorecard. They can become the empirical uncertainty distribution for future scenario analysis.

The standard use of a backtest is pass/fail. Did the predictions match held-out reality? That answers whether the model worked, but it leaves the residuals on the floor.

A second use treats those residuals as a distribution. How wrong has the model been across boroughs and cycles, in what direction, with what spread? The answer becomes the empirical sample from which future uncertainty bands are drawn. Predictive bands stop being parametric assumptions about how errors should behave. They’re bootstrapped from how errors actually have behaved.

This model uses backtests in the second sense. Tier-level mean-centered historical error pools from the 2014→2018 training window and the 2018→2022 backtest form the bootstrap pool from which 2026 uncertainty bands are sampled. In practical terms: the model is asking how much movement would count as genuinely unusual relative to the noise it has produced before.

Two design choices shape the calibration.

Errors are pooled at the tier level, not at the borough level. Each borough has 1-2 prior observations, which is too noisy to characterise a residual distribution. Pooling at the tier level (London, Metropolitan, West Yorkshire) keeps a sample large enough to be informative while preserving the structural distinction between geographies that have historically behaved differently.

Errors are mean-centered before sampling. This separates historical bias from uncertainty dispersion. Without centering, S0’s P50 would drift away from zero because of historical mean error, mixing the model’s track record of being slightly off into the median of the band. After centering, the band represents dispersion around the scenario assumption rather than dispersion around the model’s bias.

One nuance worth flagging: mean-centering removes average historical bias but does not force the bootstrap median to equal the point estimate. When residual pools are skewed or bounded (swing_concentration has a lower bound of 1.0), the P50 can still sit slightly off the assumption. Reporting P10/P50/P90 separately, rather than mean ± standard deviation, keeps that asymmetry visible.
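The calibration described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `calibrated_band` and the residual values in `london_pool` are hypothetical, and only the 2,000 draws and the seed 20260430 come from the article. It shows a tier-level pool being mean-centered and resampled around a point estimate, and it shows why a skewed pool leaves the P50 slightly off the assumption.

```python
import numpy as np

def calibrated_band(point_estimate, residual_pool, n_draws=2000, seed=20260430):
    """Return (P10, P50, P90) for one scenario x authority x metric cell.

    Hypothetical helper: resamples mean-centered historical residuals
    around the scenario point estimate.
    """
    rng = np.random.default_rng(seed)
    pool = np.asarray(residual_pool, dtype=float)
    centered = pool - pool.mean()  # remove average historical bias
    draws = point_estimate + rng.choice(centered, size=n_draws, replace=True)
    p10, p50, p90 = np.percentile(draws, [10, 50, 90])
    return float(p10), float(p50), float(p90)

# Illustrative right-skewed tier pool (made-up numbers).
london_pool = [-3.0, -2.5, -2.0, -1.5, -1.0, 0.5, 2.0, 9.5]
p10, p50, p90 = calibrated_band(10.0, london_pool)
print(p10, p50, p90)  # the P50 lands below 10.0 because the pool is skewed
```

The centered pool has mean zero but a negative median, so the bootstrap median sits below the point estimate even though the average bias has been removed.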

The 2,000 draws produce stable percentile estimates while keeping the full output under 10,000 rows for clean Tableau ingestion.

Data science takeaway: Backtest errors are not just a scorecard. They can become the empirical uncertainty distribution for future scenario analysis, calibrating bands that reflect how the model has actually been wrong.

The result: shocks smaller than uncertainty

Three numbers carry the finding:

S3 challenger surge: 13% of the median volatility interval.
S1 volatility continuation: 6%.
S2 establishment recovery: 5%.

Each number is the scenario shock divided by the median P10-to-P90 band width across the 64 active authorities. The strongest shock, a +4pp challenger surge, moves the central estimate by about one-eighth of the historical noise the model has produced in past cycles.
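As a worked illustration of that ratio, the sketch below divides a shock by the median band width. The function name `shock_share` and the three-authority inputs are hypothetical. Taking the median of per-authority shocks is also an assumption, since the article does not say how the shock is aggregated when it varies by authority.

```python
import numpy as np

def shock_share(baseline_point, scenario_point, p10, p90):
    """Median scenario shock as a share of the median P10-to-P90 width."""
    shock = np.abs(np.asarray(scenario_point, dtype=float)
                   - np.asarray(baseline_point, dtype=float))
    width = np.asarray(p90, dtype=float) - np.asarray(p10, dtype=float)
    return float(np.median(shock) / np.median(width))

# Made-up inputs: a 1-point shock against bands 8 points wide.
share = shock_share(
    baseline_point=[10.0, 12.0, 8.0],
    scenario_point=[11.0, 13.0, 9.0],
    p10=[6.0, 7.0, 4.0],
    p90=[14.0, 15.0, 12.0],
)
print(f"{share:.1%}")  # 12.5%, i.e. one-eighth of the band
```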

The result I least expected is the most important one: the scenarios are less separated than the uncertainty bands. If this were a forecast dashboard, that would be disappointing. For a scenario analysis, it’s the point.

**Figure 1: IntervalBands.** *Filter context: Scenario = S3; Sort = Uncertainty band width; Metric locked to volatility_score.* Each row is one authority. Bar = P10-P90 band. White dot = P50. The inset reports each scenario shock as a percentage of the median band width.

How to read the chart: each horizontal bar is one authority’s calibrated uncertainty interval. The white dot inside it is the calibrated median. The bar’s colour is geographic, not analytical (teal = London, amber = Metropolitan, slate = West Yorkshire). The amber rings showing each scenario’s point estimate are visible on the rankings panel (Figure 2b); in Figure 1 they are summarised in the inset percentages.

Across 64 authorities and the three active scenarios, the point estimate nearly always sits inside the bar. The shock perturbs the model less than the model has historically perturbed itself.

Part 1 reported that the correlation between turnout change and volatility was statistically null (r = -0.12, p = 0.35). Part 2 finds that scenario shocks are similarly smaller than the uncertainty around them. The pattern is the same: when the magnitude of an effect is comparable to or smaller than the noise, ranking the effects creates false precision. Effect-vs-uncertainty determines whether a result should be interpreted as signal or context.

The dashboard doesn’t say “S3 wins.” It says S3 moves the envelope most while still sitting inside broad empirical uncertainty. “Wins” implies the model has chosen between scenarios. It has not. One scenario perturbs the central estimate slightly more than the others; the band around all three remains wide enough to absorb the difference.

Data science takeaway: Always compare effect size to uncertainty width. A scenario shock that looks large in isolation may be small relative to historical error.

Reading the dashboard: geography and rankings

Two views translate the headline into geographic and ranked context.

The map shows the uncertainty footprint for one scenario at a time. Colour encodes P50 under the selected scenario; size encodes interval width. The widest bands are not exclusively in London. Metropolitan boroughs in the North East, North West, and West Yorkshire show interval widths comparable to the densest London cluster.

The rankings view is where the effect-vs-uncertainty comparison becomes hardest to ignore. Each row shows three marks: the bar (P10-P90), the white dot (P50), and the amber ring (scenario point estimate). The amber ring nearly always sits inside the bar, which means the scenario shock is smaller than the historical uncertainty even for the authorities ranked at the top.

**Figure 2b: Rankings.** *Filter context: Scenario = S3; Metric = Volatility score; Sort = Uncertainty band width.* Top-15 authorities. Switching the sort to P50 or scenario shock reorders the ranking, and the rings still sit inside the bars.

Rankings of uncertain estimates need their intervals shown alongside them. A ranked list without uncertainty invites false precision: the reader sees Authority A above Authority B and assumes the model is confident about the order. When the bands overlap, as they do at every level of these rankings, that confidence is unwarranted.
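One way to make that overlap explicit is to flag every adjacent pair in a ranking whose intervals intersect. The sketch below is illustrative only: `overlapping_neighbours` and the three rows are hypothetical, not dashboard values.

```python
def overlapping_neighbours(rows):
    """Rank rows by P50 (descending) and flag adjacent pairs whose
    P10-P90 intervals overlap.

    rows: iterable of (name, p10, p50, p90) tuples.
    Returns a list of (higher_ranked, lower_ranked, overlaps) tuples.
    """
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    flags = []
    for upper, lower in zip(ranked, ranked[1:]):
        # Two closed intervals overlap when each starts before the other ends.
        overlaps = lower[1] <= upper[3] and upper[1] <= lower[3]
        flags.append((upper[0], lower[0], overlaps))
    return flags

# Made-up rows for illustration.
rows = [
    ("Authority A", 5.0, 12.0, 20.0),
    ("Authority B", 4.0, 11.0, 18.0),
    ("Authority C", 0.5, 2.0, 3.0),
]
flags = overlapping_neighbours(rows)
print(flags)
```

A pair flagged as overlapping is one where the rank order alone should not be read as a confident ordering.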

Two asymmetric scenarios, two design lessons

Two of the six scenarios behave differently from the rest. S4 and S5 do not run on the same vote-share-perturbation logic as S1, S2, and S3, and the difference makes them useful design demonstrations beyond the election context.

S4 lesson: isolate one mechanism at a time.

S4 tests a hypothesis from UK turnout literature: that elections in more deprived authorities can show turnout shifts when local salience changes. It applies a +3 percentage point turnout shock to authorities falling in IMD deciles 1-3 under the LAD-level Index of Multiple Deprivation (IMD 2019) overlay. 41 of the 64 active authorities receive the shock; 23 do not. The tier split: 13 of 32 London boroughs, 23 of 27 metropolitan boroughs, all 5 West Yorkshire authorities. Within this scenario scope, the shock concentrates among Metropolitan and West Yorkshire authorities more than among London boroughs.

**Figure 3: Caveats.** *Filter context: No user-selectable parameters. Both panels show pre-locked scenario logic.* Top: S4 tier split. Bottom: S5 cap. Maximum London S5 P90 = 16.7. Cap = 39.45. Binding events = 0.

Vote-share metrics (fragmentation, volatility, swing concentration) are copied from S0 unchanged under S4. The scenario is turnout-only by construction.

That construction is the design lesson. By keeping S4 to a single perturbation channel, the assumption is falsifiable on its own terms. If observed 2026 turnout shifts in IMD-1-to-3 authorities are not in the +3pp range, the assumption fails without dragging the vote-share story with it. A scenario that perturbs three mechanisms simultaneously is harder to learn from when reality disagrees with it. You cannot tell which assumption broke.
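A single-channel scenario of this kind can be sketched as a function that touches exactly one field. The `AuthorityRow` record and `apply_s4` helper below are hypothetical stand-ins, and the row values are invented. Only the +3pp shock, the IMD decile 1-3 rule, and the four metric names come from the article.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AuthorityRow:
    """Hypothetical S0 record for one authority."""
    name: str
    imd_decile: int
    volatility_score: float
    delta_fi: float
    swing_concentration: float
    turnout_delta: float

def apply_s4(s0_row, shock_pp=3.0, max_decile=3):
    """Add the turnout shock to deprived authorities; copy everything else."""
    if s0_row.imd_decile <= max_decile:
        return replace(s0_row, turnout_delta=s0_row.turnout_delta + shock_pp)
    return s0_row

# Made-up rows for illustration.
deprived = AuthorityRow("Authority A", 2, 14.0, 0.3, 1.8, -1.0)
affluent = AuthorityRow("Authority B", 7, 11.0, 0.1, 1.4, 0.5)

s4_deprived = apply_s4(deprived)
s4_affluent = apply_s4(affluent)
print(s4_deprived.turnout_delta, s4_affluent.turnout_delta)  # 2.0 0.5
```

Because the vote-share fields are never written, a mismatch between S4 and observed turnout can only be attributed to the turnout assumption.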

S5 lesson: log guardrails even when they don’t bind.

S5 caps the upper tail of London volatility_score at 39.45. The cap is the empirical 90th percentile of historical London borough volatility across the training and backtest windows: 64 London borough observations (32 from training, 32 from backtest, City of London excluded because it sits outside the 32-borough London electoral scope). The cap is one-sided, applies only to London, and constrains the P90 only.

In the frozen run, the maximum London S5 P90 is 16.70. That is 42% of the cap, with 22.75 units of headroom. The cap binds zero times.

S5 is a guardrail, not an adjustment. It would have constrained the upper tail of London volatility if any borough had exceeded historical ranges. None did. The value lies in being logged. A stress test that does not bind is still useful provenance: it shows the analyst considered the failure mode, parameterised the constraint from data, and reported that the constraint was inactive. Removing the cap from the documentation because it did not fire would erase the analytical decision that was made.
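A guardrail that logs itself can be sketched as follows. The helper `apply_upper_cap` is hypothetical and the first two P90 values are invented; the cap of 39.45 and the maximum of 16.70 are the figures reported above. In a real pipeline the cap would presumably be derived from the historical pool, for example with `np.percentile(history, 90)`, but that derivation is an assumption here.

```python
import numpy as np

def apply_upper_cap(p90_values, cap):
    """Clip P90 values at the cap and return a log of what the cap did."""
    p90 = np.asarray(p90_values, dtype=float)
    binding = p90 > cap                 # cells where the cap actually bites
    capped = np.minimum(p90, cap)
    log = {
        "cap": cap,
        "max_p90": float(p90.max()),
        "share_of_cap": float(p90.max() / cap),
        "headroom": float(cap - p90.max()),
        "binding_events": int(binding.sum()),
    }
    return capped, log

# 9.8 and 12.4 are made up; 16.70 and 39.45 are from the frozen run.
capped, log = apply_upper_cap([9.8, 12.4, 16.70], cap=39.45)
print(log)
```

Here the log records zero binding events and roughly 22.75 units of headroom, which is the provenance the paragraph above argues for keeping.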

Reproducibility and limitations

The model is frozen, seeded, hashed, and reproducible from the repository. Re-running src/civic_lens/scenario_model.py against the locked commit reproduces the output bit-for-bit.

**Figure 4: Provenance.** *Filter context: No user-selectable parameters; all values are model-lock outputs from the frozen run.* Frozen 2026-05-01 00:13:56 UTC. Model SHA b795a07. Output hash sha256:522fd6bdc5f3… 0 validation failures, 0 ordering violations, 0 small-pool events. RNG seed 20260430. 2,000 draws per scenario × authority × metric.
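The seed-and-hash pattern can be illustrated with a toy run. This is not the contents of scenario_model.py: `run_and_hash` is a hypothetical stand-in that draws from a seeded generator, serialises rounded percentiles, and hashes the bytes. Only the seed 20260430 and the 2,000 draws come from the article.

```python
import hashlib
import numpy as np

def run_and_hash(seed, n_draws=2000):
    """Toy frozen run: seeded draws -> percentiles -> sha256 of the output."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(size=n_draws)    # placeholder for the real sampling step
    percentiles = np.percentile(draws, [10, 50, 90])
    # Serialise with fixed precision so the bytes are stable across runs.
    payload = ",".join(f"{v:.6f}" for v in percentiles).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

first = run_and_hash(20260430)
second = run_and_hash(20260430)
print(first == second)  # True: same seed, same code, same hash
```

Publishing the hash before results exist is what lets a later reader confirm that the bands being audited are the bands that were frozen.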

One known limitation is documented on the dashboard alongside the result. The training window predates Reform UK’s 2025-2026 expansion, so right-wing challenger volatility may be understated under a hypothesis where Reform behaves differently from prior insurgent parties at scale.

All underlying data is openly licensed: election results from the DCLEAPIL v1.0 dataset (Leman 2025, CC BY-SA 4.0); turnout and 2022 cross-checks from the Commons Library local elections dataset (Open Parliament Licence v3.0); deprivation and geography from ONS / MHCLG (OGL v3). The pipeline code in the Civic Lens repository is MIT-licensed; derived files are published with source attribution and remain subject to upstream licences.

Data science takeaway: A model is more trustworthy when its outputs are frozen, hashed, and reproducible. Provenance is part of the analysis. Limitations should be visible on the same screen as the headline number.

What scenario analysis teaches us

The transferable skill is not election modelling. It is building scenario systems where assumptions are visible, uncertainty is calibrated against historical error, and effect sizes are reported alongside the noise that surrounds them. The same pattern shows up in demand forecasts under price-change scenarios, public health policy stress tests, and risk models where regulator-imposed shocks are smaller than realised market volatility. Rank scenarios without showing the uncertainty around them and you produce false precision. That’s the trap.

The model does not say what will happen in May 2026. It says what would be surprising relative to calibrated uncertainty. Three things to watch on results night and the days after:

Whether challenger surges exceed the S3 envelope. If realised volatility in challenger-active boroughs exceeds the S3 P90 bands shown on the dashboard, the calibrated band has been breached and the model needs retraining. This is the most likely place for the model to break, because Reform UK’s post-2024 trajectory is unprecedented in the training window.
Whether London volatility breaches the historical upper-tail cap. The S5 cap of 39.45 is the empirical 90th percentile across 64 historical London observations. A single 2026 borough exceeding it would clear the historical upper-tail threshold. Two or more would be a meaningful break with the historical distribution.
Whether deprivation-linked turnout shifts materialise in the direction S4 assumes. This is a clean test of one isolated mechanism, with vote-share metrics held constant. If turnout in IMD-1-to-3 authorities does not move in the +3pp range, the S4 hypothesis fails on its own terms.

What happens after May 7

The model is already frozen. The hashes, RNG seed, and code commit shown on the provenance dashboard cannot change between now and election night. Whatever the calibrated bands say today is what they will say when realised results land.

Part 3 of this series will be a public accuracy audit. Frozen scenario outputs will be tested against actual 2026 borough-level results. Coverage rates (did P10-P90 contain the realised value?), mean absolute error, ranking quality, and any systematic misses will all be reported, including the failures. The methodology caveat about Reform UK is the most likely failure mode; we will see whether the bands held.
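Two of those audit measures are simple enough to sketch ahead of time. The `audit` function and its inputs below are hypothetical and use invented numbers. Measuring absolute error against the P50 rather than the scenario point estimate is an assumption.

```python
import numpy as np

def audit(p10, p50, p90, realised):
    """Coverage of the P10-P90 band and mean absolute error against P50."""
    p10, p50, p90, realised = (
        np.asarray(a, dtype=float) for a in (p10, p50, p90, realised)
    )
    inside = (realised >= p10) & (realised <= p90)
    return {
        "coverage": float(inside.mean()),
        "mae": float(np.abs(realised - p50).mean()),
        "breaches_above_p90": int((realised > p90).sum()),
    }

# Made-up frozen bands and realised values for four authorities.
report = audit(
    p10=[5.0, 6.0, 4.0, 7.0],
    p50=[10.0, 11.0, 8.0, 12.0],
    p90=[15.0, 16.0, 12.0, 17.0],
    realised=[12.0, 18.0, 8.0, 9.0],
)
print(report)  # coverage 0.75, mae 3.0, one breach above P90
```

A well-calibrated P10-P90 band should contain roughly 80% of realised values over a large enough sample, so coverage far below that would indicate the bands were too narrow.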

That’s what the freeze enables. The “three things to watch” above are not rhetorical. They are the falsification criteria for an uncertainty model published before its data existed.

The most honest result is not a prediction. It is a warning about precision. The scenarios move the envelope, but historical uncertainty is still wider than the shocks.

For data scientists, that may be the main lesson: scenario analysis is most useful when it resists becoming a forecast.

The full interactive dashboard is published on Tableau Public. The pipeline, scenario model code, calculated fields, and Tableau build files are open-source at github.com/Wisabi-Analytics/civic-lens.

Obinna Iheanachor is a Senior AI/Data Engineer and founder of Wisabi Analytics, a UK-based data engineering and AI consultancy. He creates content around production AI systems, data pipelines, and applied analytics at @DataSenseiObi on X and Wisabi Analytics on YouTube. Civic Lens is an open-source political data project at github.com/Wisabi-Analytics/civic-lens.
