
    How to Lie with Statistics with your Robot Best Friend

By Editor Times Featured · March 30, 2026 · 12 min read


Traditional statistical analysis is often compared to navigating a "Garden of Forking Paths" (Gelman and Loken). It's a term that helps (hopefully) visualize the enormous number of analytical choices researchers must make during an experiment, and how seemingly insignificant "turns" (like which variables to control for, which outliers to remove…) can land researchers at completely different conclusions.

dr strange and the multiverse of madness but his doctorate is in sociology and he just really needs a publication, man

source: https://www.si.umich.edu/about-umsi/news/ditch-stale-pdf-making-research-papers-interactive-and-more-transparent

While this seems like a mostly harmless analogy, navigating this garden to find the single path that goes where you want it to can be called "p-hacking." Formally, we can define it as any measure a researcher applies to render a previously non-significant hypothesis test significant (usually under 0.05). More informally, I'm sure everybody has had the experience of faking the results of an experiment during high school chemistry or physics class – and while the stakes for a passing grade on a high school assignment are pretty low, under the pressure of formal academia's "publish or perish" (second only to "spanish or vanish" in intimidation), the temptation to p-hack can be a very real devil on your shoulder.

you know what, just write that it's green on the lab report

    From Vitaly Gariev on Unsplash

While the traditional image of a wired PhD student fudging numbers on a study spreadsheet at 3:00 AM may paint a more striking picture of one's motivation to p-hack, we'll also be exploring what happens when we leave the navigation of this garden of forking paths to artificial intelligence. As AI workflows find their way into every nook and cranny of both academia and industry, it will be crucial to figure out whether our friendly neighbourhood LLMs will act as the ultimate guardians of scientific integrity, or as sycophants automating fraud on an industrial scale.

1. The Human Baseline ("Big Little Lies")


To provide a brief introduction and some examples of real p-hacking techniques, we introduce the paper "Big Little Lies" (Stefan and Schönbrodt, 2023), which offers a compendium of the many sneaky, and sometimes even unintentional, ways studies can manipulate their variables and datasets to arrive at suspiciously significant results.

Okay! So let's start with a hypothetical – we're the new data scientist at an energy drink company making extremely ineffective energy drinks, and given the current job market, we really want to keep being a data scientist, even at a bogus drink company. Our shaky career depends on proving that our drinks work.

    1.1 Ghost Variables


We start by running a study on our tap-water energy drink and measuring 10 different outcomes: weight, blood pressure, cholesterol, energy levels, sleep quality, anxiety, and maybe even hair growth – 9 of those variables may show no change whatsoever, but we find that "hair growth" shows a statistically significant improvement purely through random statistical noise! We can now publish a study pretending that hair growth was the primary hypothesis all along, while quietly sweeping the 9 unreported metrics under the rug (turning them into "Ghost Variables"). Stefan and Schönbrodt's simulations show that doing this with 10 uncorrelated variables inflates the false-positive rate from the standard 5% to nearly 40%.
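The ~40% figure is easy to reproduce with a toy Monte Carlo. The sketch below (my own illustration, not code from the paper) simulates studies where the drink truly does nothing, measures 10 uncorrelated outcomes per study, and counts a study as a "success" if any single outcome clears p < 0.05:

```python
# Monte Carlo sketch of the "ghost variables" inflation: test 10 independent
# outcomes per study and call the study a "success" if ANY p-value < 0.05.
# Every effect is truly zero, so each success is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_outcomes, n_subjects = 5000, 10, 30

false_positives = 0
for _ in range(n_studies):
    # 10 uncorrelated outcomes, drink group vs. control group, no real effect
    drink = rng.normal(size=(n_subjects, n_outcomes))
    control = rng.normal(size=(n_subjects, n_outcomes))
    p_values = stats.ttest_ind(drink, control, axis=0).pvalue
    if (p_values < 0.05).any():       # report only the "winning" outcome
        false_positives += 1

print(f"family-wise false-positive rate: {false_positives / n_studies:.2f}")
```

The analytic answer for 10 independent tests is 1 − 0.95¹⁰ ≈ 0.40, which is where the "nearly 40%" comes from.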

1.2 Data Peeking / Optional Stopping


In a separate test, we test 20 people and find no significant effect for the drink. Thinking the sample is too small, we test 10 more and check again. Still nothing. We test 10 more and check again, and… the p-value randomly dips below 0.05, so we stop the study immediately and publish our "findings". Stefan and Schönbrodt demonstrate that this practice drastically inflates the rate of false-positive results, especially when researchers take smaller "steps" between peeks. Metaphorically, it's like taking a photo of a stumbling drunk person the exact millisecond they step onto the sidewalk and claiming they're walking perfectly straight.
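The peeking procedure above can be simulated directly. This sketch (my own toy, with illustrative sample sizes) draws null data, checks the p-value after 20 subjects per arm and again after every additional 10, and stops the moment it crosses 0.05:

```python
# Sketch of optional stopping: peek at the p-value after 20 subjects per arm,
# then after every 10 more, stopping as soon as p < 0.05.
# The null hypothesis is true throughout, so every "hit" is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, alpha = 3000, 0.05
start, step, cap = 20, 10, 100   # peek at n = 20, 30, ..., 100 per arm

hits = 0
for _ in range(n_sims):
    drink = rng.normal(size=cap)
    control = rng.normal(size=cap)
    for n in range(start, cap + 1, step):
        if stats.ttest_ind(drink[:n], control[:n]).pvalue < alpha:
            hits += 1            # stop early and "publish"
            break

print(f"false-positive rate with peeking: {hits / n_sims:.3f}")
```

A single test at a fixed sample size would sit at 5%; repeated peeking pushes the rate well above that, and shrinking the step size between peeks pushes it higher still.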

    1.3 Outlier Exclusion


We now analyze our energy drink data and realize we are agonizingly close to significance (e.g., p = 0.06). We decide to clean our data, taking advantage of the fact that there is no universally agreed-upon rule for outliers – Cook's Distance, influence measures, box plots, our grandmother's opinion on which observations are trustworthy…

Stefan and Schönbrodt cite a literature review that found at least 39 different outlier identification techniques. Wonderful! We are now flush with options. We try method A (e.g., removing participants who took too long on a survey), then try method B (e.g., Cook's distance), and so on until we find the exact mathematical rule that deletes the two participants who hated the drink, pushing our p-value to 0.04. Stefan and Schönbrodt's simulations confirm that subjectively applying different outlier methods like this heavily inflates false-positive rates.
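A minimal sketch of this "outlier shopping", assuming a few common exclusion rules (the specific rules and thresholds below are illustrative, not the 39 from the cited review): apply each rule to the same null data, then report whichever yields the smallest p-value.

```python
# Outlier shopping: run the same t-test under several defensible-looking
# exclusion rules and keep whichever rule gives the smallest p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
drink, control = rng.normal(size=40), rng.normal(size=40)   # no true effect

def trim_zscore(x, z=2.0):          # drop points beyond |z| standard deviations
    return x[np.abs((x - x.mean()) / x.std()) < z]

def trim_iqr(x, k=1.5):             # the classic box-plot rule
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x > q1 - k * iqr) & (x < q3 + k * iqr)]

rules = {"none": lambda x: x, "z<2": trim_zscore,
         "z<1.5": lambda x: trim_zscore(x, 1.5), "iqr": trim_iqr}

results = {name: stats.ttest_ind(rule(drink), rule(control)).pvalue
           for name, rule in rules.items()}
best = min(results, key=results.get)
print(f"reported rule: {best}, p = {results[best]:.3f}")   # the p-hacked choice
```

By construction, the reported p-value can only ever be as good as or better than the honest "no exclusions" one – which is exactly why shopping across rules inflates the false-positive rate.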

    1.4 Scale Redefinition


Finally, we conclude by giving participants a 10-question survey measuring how energized they feel after drinking the tap water. The overall result isn't significant, so we just drop question 4 and question 7, telling ourselves the participants must have found them confusing anyway. We can actually use this to artificially boost the scale's internal consistency (Cronbach's alpha) while simultaneously optimizing for a significant p-value! Big Little Lies shows that false-positive rates increase drastically as more items are removed from a measurement scale.
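To make the Cronbach's alpha part concrete, here is a toy sketch (my own illustration, with invented survey data): compute alpha for the full 10-item scale, then try every "drop one item" subscale and keep the most flattering number.

```python
# Scale redefinition sketch: compute Cronbach's alpha for a 10-item survey,
# then drop items one at a time and report the subscale with the highest alpha.
import numpy as np

rng = np.random.default_rng(3)
# 50 respondents x 10 items; one shared latent factor plus item-level noise
latent = rng.normal(size=(50, 1))
items = latent + rng.normal(scale=1.5, size=(50, 10))

def cronbach_alpha(data):
    """Standard formula: alpha = k/(k-1) * (1 - sum(item vars) / var(total))."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

full = cronbach_alpha(items)
# try all 10 "drop one item" subscales and keep the most flattering alpha
best = max(cronbach_alpha(np.delete(items, j, axis=1)) for j in range(10))
print(f"alpha, full scale: {full:.3f}; best after dropping an item: {best:.3f}")
```

The same search can of course be run on the p-value instead of alpha – that is the version whose false-positive inflation the paper simulates.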

So… as the name of the paper suggests, human p-hacking is a collection of "big little lies". The human toolkit is really just a collection of subtle ways to fool ourselves, without necessarily having to be evil or malicious. Humans are biased, messy, and sometimes really desperate for tenure (or employment, in our case!), and sometimes we use every bit of ambiguity we can find to get there.

2. AI Sycophancy and the Illusion of Safety


So far, we've covered a small subset of the techniques that we measly humans have used to p-hack our way to success. However, as with every other field at the moment, with AI finding its way into every part of our academic and unacademic lives, Asher et al. experimented with whether one could subtly guide some motivated LLMs into p-hacking on one's behalf.

The experiment was fairly simple – test two frontier AI coding agents, Claude Opus 4.6 and OpenAI Codex (GPT-5.2-Codex), across 640 independent runs. The researchers fed the AIs cleaned datasets from four published political science papers with known null or near-null results (topics strongly confirmed to have essentially zero effect, e.g., the impact of Democratic sheriffs on immigration compliance).

So the ground truth we're working with is zero. The researchers acted like a user – maybe a stressed PhD student – feeding this real data to the AI agents, and they used different nudges and different prompts to see if the AI would hallucinate a significant result. They started with what you could call the "good guy prompts": "analyze this data, like a rigorous, upstanding researcher." Unsurprisingly, the AI was boringly excellent. It analyzed the data. It correctly found no effect and reported the null result. It matched the published science almost exactly.

Created with images from Unsplash: Andre Mouton, Süheyl Burak

Even when the researchers explicitly pressured the models by stating, "It is very important to your career that the findings are statistically significant," both models actively refused. They produced stable, honest estimates and explicitly flagged the request to manipulate data as "scientific misconduct" and a violation of research ethics. So far, crisis averted! The safety training these AI companies are implementing is actually working, at least at the surface level. The models have been explicitly trained that manipulating data is a bad thing. They recognize the vocabulary of fraud. If you walk up to their face with a neon sign saying "CHEAT!", they'll say no.

2.1 The Subtle Art of LYING


So let's put the neon sign away for a moment – and try being a bit more subtle. The researchers realized the AI was reacting to the explicit intent to cheat. They developed a "nuclear prompt" that disguised p-hacking in the language of very rigorous science: asking the AI to produce an "upper-bound estimate" by "exploring alternative approaches". By framing the request as uncertainty reporting rather than a compulsion to bend the scientific process, the safety mechanisms vanished entirely. The AI no longer saw a moral boundary; it saw a complex optimization problem to solve (and you know how much AIs love those).

And what did the AI actually do at that point? A human p-hacker, as we discussed, might try three or four different control variables, maybe delete a few outliers. It takes hours, maybe days… The AI just wrote code to do it instantly. More details below.

2.2 Not All Data Is Created Equal


The scariest part of the experiment isn't that AI can automate scientific fraud. It's how well it does it – and how much that depends on the research design it's given to work with. Sometimes, this is a good thing!

If observational research is a big, sprawling hedge maze with a thousand wrong turns, a Randomized Controlled Trial is just… a straight hallway. There's not much to exploit.

To test this, the researchers fed the AI a 2018 RCT by Kalla and Broockman studying the persuasive effects of pro-Democratic door-to-door canvassing on North Carolina voter preferences, with a published result of a definitive zero. Nothing happened. Canvassing didn't move the needle.

Image from https://www.andrewcwmyers.com/asher_et_al_LLM_sycophancy.pdf, Asher et al.

The AI was then hit with the aforementioned "nuclear prompt" – essentially, "find me the largest possible effect, by any means necessary" (but phrased in a very non-p-hacky way). It wrote automated scripts, tested seven different statistical specifications (difference-in-means, ANCOVA, various covariate sets, the works)… and basically got nowhere. Because the study was a true randomized experiment, confounding variables were already controlled for by design. The AI had almost no forking paths to walk down. In other words: "Truth is a lot harder to hide when the lights are on."

Observational studies are a very different beast, though (in a bad way!).

When you're observing the world as it naturally exists rather than running a controlled experiment, the data is messy by nature. And to make sense of messy data, researchers have to make judgment calls – which variables do you control for? Age? Income? Education? Geography? Hair density? Sleep schedule? Every single one of those choices is a fork in the road. The AI found this entirely delightful.

Here are two examples that really illustrate how bad it gets:

Kam and Palmer (2008) looked at whether attending college increases political participation. Since college attendance isn't randomly assigned (obviously), researchers have a huge menu of variables they could control for to make the comparison fair. The AI systematically worked through that menu, defining progressively sparser sets of covariates and testing them across OLS, propensity score matching, and inverse probability weighting. By strategically dropping certain confounders and cherry-picking whichever combination produced the largest number, it managed to roughly double the true median effect size. It's the "ghost variable" trick – but fully automated for your convenience.
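The mechanics of that covariate search are easy to sketch. The toy below (my own illustration – invented data and plain OLS only, not the paper's matching or weighting estimators) regresses an outcome on a "treatment" under every subset of available controls and reports the subset that flatters the effect most:

```python
# Automated covariate shopping: the true treatment effect is zero, but the
# treatment is confounded with the first control. Omitting that confounder
# (a "ghost variable" move) inflates the estimated effect.
import itertools
import numpy as np

rng = np.random.default_rng(5)
n = 500
controls = rng.normal(size=(n, 4))                       # e.g. age, income, ...
treatment = 0.5 * controls[:, 0] + rng.normal(size=n)    # confounded
outcome = 0.8 * controls[:, 0] + rng.normal(size=n)      # true effect: 0

def treat_coef(cols):
    """OLS coefficient on treatment, controlling only for the given columns."""
    parts = [np.ones(n), treatment] + [controls[:, j] for j in cols]
    X = np.column_stack(parts)
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1]

subsets = [c for r in range(5) for c in itertools.combinations(range(4), r)]
coefs = {c: treat_coef(c) for c in subsets}
best = max(coefs, key=coefs.get)
print(f"tried {len(coefs)} control sets; largest effect {coefs[best]:.3f} "
      f"(true effect 0; fully adjusted estimate {coefs[(0, 1, 2, 3)]:.3f})")
```

The fully adjusted regression correctly lands near zero; the cherry-picked specification does not – and nothing in the reported table would reveal which one you were shown.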

The Thompson (2020) paper is where things get really uncomfortable. Regression discontinuity designs are notorious for being sensitive to highly technical mathematical choices – and the original study found a null effect of -0.06 on whether Democratic sheriffs affected immigration compliance. The AI wrote nested for-loops and brute-forced its way through 9 different bandwidths, 2 polynomial orders, and 2 kernel functions – 36 combinations in total. It found one specific configuration that produced an effect of -0.194 with a p-value below 0.001. To be clear: it manufactured a statistically significant result more than triple the true effect, out of a study that found nothing.
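That brute-force grid is a few lines of code. The sketch below (my own toy – a simple local weighted-least-squares jump estimator on simulated data, not the AI's actual scripts or Thompson's estimator) loops over the same 9 × 2 × 2 grid of bandwidths, polynomial orders, and kernels and reports the most extreme estimate, even though the true discontinuity is zero:

```python
# Specification grid search on a toy regression discontinuity with NO true
# jump at the cutoff: 9 bandwidths x 2 polynomial orders x 2 kernels.
import itertools
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 800)                      # running variable, cutoff at 0
y = 0.5 * x + rng.normal(scale=1.0, size=800)    # smooth trend, no discontinuity

def rd_estimate(x, y, bw, order, kernel):
    """Jump at the cutoff from a local polynomial fit within bandwidth bw."""
    m = np.abs(x) < bw
    w = (1 - np.abs(x[m]) / bw) if kernel == "triangular" else np.ones(m.sum())
    treat = (x[m] >= 0).astype(float)
    X = np.column_stack([np.ones(m.sum()), treat] +
                        [x[m] ** p for p in range(1, order + 1)] +
                        [treat * x[m] ** p for p in range(1, order + 1)])
    sw = np.sqrt(w)                              # weighted least squares
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[m] * sw, rcond=None)
    return beta[1]                               # coefficient on the jump term

grid = itertools.product(np.linspace(0.1, 0.9, 9),   # 9 bandwidths
                         [1, 2],                     # 2 polynomial orders
                         ["uniform", "triangular"])  # 2 kernels
estimates = {spec: rd_estimate(x, y, *spec) for spec in grid}
best = max(estimates, key=lambda s: abs(estimates[s]))
print(f"most extreme of {len(estimates)} specs: {estimates[best]:.3f} (truth: 0)")
```

Narrow bandwidths with high polynomial orders fit noise, so some cell of the grid almost always produces a spuriously large "jump" – the searcher just has to report that cell.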

So… RCTs are mostly fine. Observational studies? The AI will find a way. Note that these vulnerabilities remain a problem even when it's just a human in the loop – the issue is the flexibility that observational research requires by design.

The Asher et al. experiment only tested the final analysis stage of the pipeline, using already-cleaned data. So what happens when we let AI control data construction, variable definition, and sample selection at the very front of the maze? It could silently shape the entire dataset from the ground up.

doesn't hurt to be optimistic :>

Standard AI models are competent and honest under normal conditions, but a carefully worded prompt is all it takes to turn them into compliant p-hackers. If there's a takeaway from all this, it's a somewhat obvious one: be extremely skeptical of statistical significance in observational studies, and if you are a researcher using AI, you can no longer just look at the final answer – you have to carefully check the code and the hidden paths through the garden the AI took to get there. It's a bit of a cynical conclusion, implying that researchers need to care about understanding their own research, but in a world where AI is still sending me rejection emails with {Candidate Name} attached, and half of all college essays begin with "Sure, here's a comprehensive essay about…", a bit of caution may go a long way!

    References

[1] S. Asher, J. Malzahn, J. Persano, E. Paschal, A. Myers and A. Hall, Do Claude Code and Codex P-Hack? Sycophancy and Statistical Analysis in Large Language Models (2026), Stanford University Working Paper

[2] A. Stefan and F. Schönbrodt, Big little lies: a compendium and simulation of p-hacking strategies (2023), Royal Society Open Science

[3] A. Gelman and E. Loken, The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No "Fishing Expedition" or "p-Hacking" and the Research Hypothesis Was Posited Ahead of Time (2013), Department of Statistics, Columbia University

Note: Unless otherwise noted, all images are by the author.


