
    Bayesian Thinking for People Who Hated Statistics

By Editor Times Featured · March 16, 2026 · 13 Mins Read


A lecture hall, Tuesday morning. The professor uncaps a marker and writes across the whiteboard: P(A|B) = P(B|A) · P(A) / P(B). Your hand copies the formula. Your mind checks out somewhere around the vertical bar.

If that memory just surfaced, you’re in good company. Research suggests up to 80% of college students experience some form of statistics anxiety. For many, it’s the strongest predictor of their course grade (stronger than prior math ability, according to a University of Kansas study).

Here’s what most statistics courses never mention: you’ve been doing Bayesian reasoning since childhood. The formula on the whiteboard wasn’t teaching you something new. It was burying something you already understood under a pile of notation.


The Problem That Broke 82% of Doctors

Try this before reading further.

One percent of women aged 40 who participate in routine screening have breast cancer. A mammogram correctly identifies cancer 80% of the time. It also produces a false alarm 9.6% of the time, flagging cancer when none exists.

A woman gets a positive mammogram. What’s the probability she actually has cancer?

Take a moment.

In 1978, researchers at Harvard Medical School posed a similar base-rate problem to 60 physicians and medical students. Only 18% arrived at the correct answer. Nearly half guessed 95%.

The exact answer for the mammogram problem: 7.8%.

The trick is to count instead of calculate. Take 10,000 women:

• 100 have cancer (that’s 1%)
• Of those 100, 80 test positive (80% sensitivity)
• Of the 9,900 cancer-free women, about 950 get a false positive (9.6%)

Total positive mammograms: 80 + 950 = 1,030.

Women who actually have cancer among the positives: 80.

Probability: 80 ÷ 1,030 ≈ 7.8%.

The false positives from the vast healthy group swamp the true positives from the small cancer group. Image by the author.

No Greek letters required. Just counting.

In Python, it’s four lines:

prior = 0.01           # 1% base rate
sensitivity = 0.80     # P(positive | cancer)
false_pos = 0.096      # P(positive | no cancer)

posterior = (sensitivity * prior) / (
    sensitivity * prior + false_pos * (1 - prior)
)
print(f"{posterior:.1%}")  # 7.8%

German psychologist Gerd Gigerenzer spent decades studying this exact failure. When he and Ulrich Hoffrage rewrote probability problems using natural frequencies (counting actual people instead of juggling percentages), correct responses among naive participants jumped from the single digits to nearly 50%. Same math, different representation. The bottleneck was never intelligence. It was the format.
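The natural-frequency trick translates directly to code: count people instead of multiplying percentages. A sketch of the same mammogram calculation in frequency form:

```python
# Natural-frequency version of the mammogram problem: count people
# in an imagined population instead of juggling percentages.
population = 10_000

cancer = int(population * 0.01)              # 100 women with cancer
no_cancer = population - cancer              # 9,900 without

true_positives = int(cancer * 0.80)          # 80 correctly flagged
false_positives = round(no_cancer * 0.096)   # ~950 false alarms

posterior = true_positives / (true_positives + false_positives)
print(f"{posterior:.1%}")  # 7.8%
```

Same arithmetic as the four-line version above, but every intermediate value is a count of people you could point to.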


You’ve Been Bayesian Your Whole Life

You do this calculation unconsciously every day.

Your friend recommends a restaurant. “Best pad thai in the city,” she says. You open Google Maps: 4.2 stars, 1,200 reviews. Your prior (she knows Thai food, she’s been right before) meets the evidence (solid but not stellar reviews from strangers). Your updated belief: probably good, worth trying. You go.

That’s Bayes’ theorem in three seconds. Prior belief + new evidence = updated belief.

A noise at 3 AM. Your prior: the cat knocked something over (this happens twice a week). The evidence: it sounds like glass shattering, not a soft thud. Your posterior shifts. You get up to check. If you find the cat standing next to a broken vase, whiskers twitching, your belief updates again. Prior confirmed. Back to sleep.

You check the weather app: 40% chance of rain. You look outside at a blue sky with no clouds on the horizon. Your internal model disagrees with the app. You grab a light jacket but leave the umbrella.

You get an email from your CEO asking you to buy gift cards. Your prior: she has never made a request like this before. The evidence: the email came from a Gmail address, the grammar feels off, the tone is wrong. Your posterior: almost certainly phishing. You don’t click.

None of these feel like statistics. They feel like common sense. That’s the point.

The formula on the whiteboard was just notation for what your brain does between sensing a problem and making a decision.

The perceived gap between “statistics” and “common sense” is an artifact of how statistics is taught. Start with the formula, and you get confusion. Start with the intuition, and the formula writes itself.


Why Your Statistics Course Got It Backwards

This isn’t a fringe critique. The statistics establishment itself has started saying it out loud.

In 2016, the American Statistical Association (ASA) released its first formal guidance on a specific statistical method in 177 years of existence. The target: p-value misuse. Among the six principles: p-values don’t measure the probability that a hypothesis is true, and the 0.05 significance threshold is “conventional and arbitrary.”

Three years later, 854 scientists signed a Nature commentary titled “Scientists Rise Up Against Statistical Significance.” The same issue of The American Statistician carried 43 papers on what comes after p < 0.05.

Image by the author.

The core structural problem, as biostatistician Frank Harrell at Vanderbilt describes it: frequentist statistics asks “how unusual are my data, assuming nothing interesting is happening?” That’s P(data | hypothesis). What you actually want is: “given these data, how likely is my hypothesis?” That’s P(hypothesis | data).

These are not the same question. Confusing them is what mathematician Aubrey Clayton calls “Bernoulli’s Fallacy,” an error he traces to a specific mistake by Jacob Bernoulli in the 18th century that has been baked into curricula ever since.

How deep does this confusion go? A 2022 study found that 73% of statistics methodology instructors (not students, instructors) endorsed the most common misinterpretation of p-values, treating them as P(hypothesis | data).

“P-values condition on what is unknown and don’t condition on what is known. They are backward probabilities.”

Frank Harrell, Vanderbilt University

The downstream result: a replication crisis. The Reproducibility Project attempted to replicate 100 published psychology studies. Roughly 60% failed. Replicated effects were, on average, half the originally reported size. P-hacking (adjusting the analysis until p < 0.05 appears) was identified as a primary driver.


Bayes in Five Minutes, No Formulas

Every Bayesian calculation has exactly three parts.

The Prior. What you believed before seeing any evidence. In the mammogram problem, it’s the 1% base rate. In the restaurant decision, it’s your friend’s track record. Priors aren’t guesses; they can incorporate decades of data. They’re your starting position.

The Likelihood. How probable is the evidence you observed, under each possible state of reality? If cancer is present, how likely is a positive test? (80%.) If absent, how likely? (9.6%.) The ratio of these two numbers (80 ÷ 9.6 ≈ 8.3) is the likelihood ratio. It measures the diagnostic power of the evidence: how much should this evidence move your belief?

The Posterior. Your updated belief after combining prior with evidence. This is what you care about. In the mammogram case: 7.8%.

That’s the whole framework. Prior × Likelihood = Posterior (after normalizing). The formula P(A|B) = P(B|A) · P(A) / P(B) is shorthand for “update what you believed based on what you just learned.”
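The whole framework fits in one small helper. A sketch that works for any number of hypotheses, shown here on the two-hypothesis mammogram numbers:

```python
def update(priors, likelihoods):
    """Posterior is proportional to prior times likelihood, normalized to sum to 1.

    priors: dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> P(evidence | hypothesis)
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# The mammogram problem as a two-hypothesis update:
posterior = update(
    priors={"cancer": 0.01, "no cancer": 0.99},
    likelihoods={"cancer": 0.80, "no cancer": 0.096},
)
print(f"{posterior['cancer']:.1%}")  # 7.8%
```

The normalizing step (dividing by the total) is all that P(B) in the denominator of the formula is doing.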

One critical rule: a strong prior needs strong evidence to move. If you’re 95% sure your deployment is stable and a single noisy alert fires, your posterior barely budges. But if three independent monitoring systems all flag the same service at 3 AM, the evidence overwhelms the prior. Your belief shifts fast. This is why patterns matter more than single data points, and why accumulating evidence is more powerful than any single test.


The PRIOR Framework: Bayesian Reasoning at Work

Here’s a five-step process you can apply at your desk on Monday morning. No statistical software required.

P: Pin Your Prior

Before looking at any data, write down what you believe and why. Force a number. “I think there’s a 60% chance the conversion drop is caused by the new checkout flow.” This prevents anchoring to whatever the data shows first.

Worked example: Your team’s A/B test reports a 12% lift in sign-ups. Before interpreting it, ask: what was your prior? If nine out of ten similar experiments at your company produced lifts under 5%, a 12% result deserves scrutiny, not celebration. Your prior says large effects are rare here.

R: Rate the Evidence

Ask two questions:

• If my belief is correct, how likely is this evidence?
• If my belief is wrong, how likely is this evidence?

The ratio matters more than either number alone. A ratio near 1 means the evidence is equally consistent with both explanations (it’s weak, barely worth updating on). A ratio of 8:1 or higher means the evidence strongly favors one side. Move your belief accordingly.
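The effect of the ratio is easiest to see in the odds form of Bayes’ theorem: posterior odds = prior odds × likelihood ratio. A minimal sketch, with illustrative numbers:

```python
def update_odds(prior_prob, likelihood_ratio):
    """Bayes in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)   # back to a probability

# Weak evidence (ratio near 1) barely moves a 60% belief...
print(f"{update_odds(0.60, 1.2):.0%}")   # 64%
# ...while an 8:1 ratio shifts it substantially.
print(f"{update_odds(0.60, 8.0):.0%}")   # 92%
```

Same starting belief, very different endpoints, and the only thing that changed was the diagnostic power of the evidence.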

I: Invert the Question

Before concluding anything, check: am I answering the question I care about? “What’s the probability of seeing this data if my hypothesis were true” is not “what’s the probability my hypothesis is true given this data.” The first is a p-value. The second is what you want. Confusing them is the single most common statistical error in published research.

O: Output Your Updated Belief

Combine prior and evidence. Strong evidence with a high likelihood ratio shifts your belief substantially. Ambiguous evidence barely touches it. State the result explicitly: “I now estimate a 35% chance this effect is real, down from 60%.”

You don’t need exact numbers. Even rough categories (unlikely, plausible, probable, near-certain) beat binary thinking (significant vs. not significant).

R: Rinse and Repeat

Your posterior today becomes tomorrow’s prior. Run a follow-up experiment. Check a different data cut. Each piece of evidence refines the picture. The discipline: never throw away your accumulated knowledge and start from scratch with every new dataset.
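One way to make “today’s posterior is tomorrow’s prior” concrete is a conjugate Beta-Binomial update on a conversion rate; the experiment counts below are illustrative, not from any real dataset:

```python
# With a Beta prior on a conversion rate, each batch of successes and
# failures simply adds to the prior's counts: nothing is thrown away.
alpha, beta = 1, 1          # flat Beta(1, 1) prior

for successes, failures in [(12, 88), (9, 91), (15, 85)]:  # three experiments
    alpha += successes      # today's posterior Beta(alpha, beta)
    beta += failures        # becomes tomorrow's prior

mean = alpha / (alpha + beta)
print(f"posterior mean conversion rate: {mean:.1%}")
```

After each loop iteration the pair (alpha, beta) is a complete summary of everything seen so far, which is exactly the discipline the step describes.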

Image by the author.

From Spam Filters to Sunken Submarines

Bayesian reasoning isn’t just a thinking tool. It runs in production systems processing billions of decisions.

Spam filtering. In August 2002, Paul Graham published “A Plan for Spam,” introducing Bayesian classification for email. The system assigned each word a probability of appearing in spam versus legitimate mail (the likelihood), combined it with the base rate of spam (the prior), and computed a posterior for each message. Graham’s filter caught spam at a 99.5% rate with zero false positives on his personal corpus. Every major email provider now uses some descendant of this approach.
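A toy sketch of the idea (the word probabilities are invented for illustration; this simplifies Graham’s actual algorithm, which also selects the most extreme tokens and biases against false positives):

```python
import math

# Hypothetical per-word probabilities of appearing in spam vs. legitimate mail.
spam_prob = {"viagra": 0.99, "click": 0.80, "meeting": 0.10, "lunch": 0.05}

def spam_score(words):
    """Combine per-word spam probabilities naive-Bayes style (equal priors)."""
    probs = [spam_prob[w] for w in words if w in spam_prob]
    numerator = math.prod(probs)
    denominator = numerator + math.prod(1 - p for p in probs)
    return numerator / denominator

print(spam_score(["viagra", "click"]))    # near 1: almost certainly spam
print(spam_score(["meeting", "lunch"]))   # near 0: almost certainly legitimate
```

Each word’s probability is a likelihood, and the combining formula is just repeated Bayesian updating under an independence assumption.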

Hyperparameter tuning. Bayesian optimization has replaced grid search at companies running expensive training jobs. Instead of exhaustively testing every setting combination, it builds a probabilistic model of which configurations will perform well (the prior), evaluates the most promising candidate, observes the result, and updates (the posterior). Each iteration makes a smarter choice. For a model that takes hours to train, this can cut tuning time from weeks to days.

Uncertainty quantification. Probabilistic programming frameworks like PyMC and Stan build models that output full probability distributions instead of single numbers. Rather than “the coefficient is 0.42,” you get “the coefficient falls between 0.35 and 0.49 with 95% probability.” This is a Bayesian credible interval. Unlike a frequentist confidence interval, it actually means what most people think a confidence interval means: there’s a 95% chance the true value is in that range.
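You don’t need PyMC to see what a credible interval is. A standard-library sketch: simulate draws from an assumed Beta posterior (say, for a conversion rate after 42 successes in 100 trials under a flat prior) and read off the middle 95%:

```python
import random

# Draw from an assumed Beta(43, 59) posterior and take the middle 95%
# of the draws: a Monte Carlo credible interval.
random.seed(0)
draws = sorted(random.betavariate(43, 59) for _ in range(100_000))

lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

The interval is a direct statement about where the parameter probably is, given the data and the prior, which is what the frequentist confidence interval is so often mistaken for.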

But the most dramatic Bayesian success story involves a nuclear submarine at the bottom of the Atlantic.

In May 1968, the USS Scorpion failed to arrive at its home port in Norfolk, Virginia. Ninety-nine men aboard. The Navy knew the sub was somewhere in the Atlantic, but the search area spanned thousands of square miles of deep ocean floor.

Mathematician John Craven took a different approach than grid-searching the ocean. He assembled experts and had them assign probabilities to nine failure scenarios (hull implosion, torpedo malfunction, navigation error). He divided the search area into grid squares and assigned each a prior probability based on the combined estimates.

Then the search began. Each time a team cleared a grid square and found nothing, Craven updated the posteriors. Empty square 47? Probability mass shifted to the remaining squares. Each failed search was not wasted effort. It was evidence, systematically narrowing the possibilities.

Every grid square that turned up empty wasn’t a failure. It was data.

The method pinpointed the Scorpion within 220 yards of the predicted location, on the ocean floor at 10,000 feet. The same Bayesian search technique later located a hydrogen bomb lost after a 1966 B-52 crash near Palomares, Spain, and helped find the wreckage of Air France Flight 447 in the deep Atlantic in 2011.
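The search update itself is a few lines of code. A sketch with hypothetical squares and a made-up detection probability: searching a square and finding nothing shifts probability mass to the squares you haven’t cleared yet:

```python
# Hypothetical prior beliefs about which square holds the wreck.
priors = {"sq_A": 0.50, "sq_B": 0.30, "sq_C": 0.20}
p_detect = 0.80   # assumed chance of finding it if we search the right square

def search_and_miss(beliefs, square):
    """Update beliefs after `square` is searched and nothing is found."""
    # P(miss | wreck here) = 1 - p_detect; P(miss | wreck elsewhere) = 1
    unnormalized = {s: p * ((1 - p_detect) if s == square else 1.0)
                    for s, p in beliefs.items()}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

beliefs = search_and_miss(priors, "sq_A")
print({s: round(p, 2) for s, p in beliefs.items()})
```

After one empty search of the most probable square, its probability drops sharply but not to zero (the searchers might have missed the wreck), and the remaining mass flows to the other squares, which is exactly how each of Craven’s cleared squares became data.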


Return to the mammogram problem for a moment.

The reason 82% of doctors got it wrong wasn’t arithmetic. It was that nobody taught them to ask the one question that matters: how common is this condition in the population being tested?

That question (the prior) is the most neglected step in data interpretation. Skip it, and you mistake a false alarm for a diagnosis, a noisy experiment for a real effect, a coincidence for a pattern.

Every statistic you encounter this week is a mammogram result. The headline claiming a drug “doubles your risk.” The A/B test with p = 0.03. The performance review based on a single quarter of data.

Each is evidence. None is a conclusion.

The conclusion requires what you’ve always had: what you knew before you saw the number. Your statistics professor just never gave you permission to use it.


References

1. Casscells, W., Schoenberger, A., & Grayboys, T.B. (1978). “Interpretation by Physicians of Clinical Laboratory Results.” New England Journal of Medicine, 299(18), 999-1001.
2. Gigerenzer, G. & Hoffrage, U. (1995). “How to Improve Bayesian Reasoning Without Instruction: Frequency Formats.” Psychological Review, 102, 684-704.
3. American Statistical Association (2016). “The ASA Statement on Statistical Significance and P-Values.” The American Statistician, 70(2), 129-133.
4. Amrhein, V., Greenland, S., & McShane, B. (2019). “Scientists Rise Up Against Statistical Significance.” Nature, 567, 305-307.
5. Open Science Collaboration (2015). “Estimating the Reproducibility of Psychological Science.” Science, 349(6251), aac4716.
6. Graham, P. (2002). “A Plan for Spam.”
7. Harrell, F. (2017). “My Journey from Frequentist to Bayesian Statistics.” Statistical Thinking.
8. Clayton, A. (2021). Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press.
9. Badenes-Ribera, L., et al. (2022). “Persistent Misconceptions About P-Values Among Academic Psychologists.” PMC.
10. Kalid Azad. “An Intuitive (and Short) Explanation of Bayes’ Theorem.” BetterExplained.
11. Wikipedia contributors. “Bayesian Search Theory.” Wikipedia.


