
    The Hidden Trap of Fixed and Random Effects

    By Editor Times Featured | July 19, 2025 | 7 Mins Read


    What Are Random Effects and Fixed Effects?

    When designing a study, we often aim to isolate independent variables from those of no interest in order to examine their true effects on the dependent variables. For example, let’s say we want to study the effect of using GitHub Copilot (independent variable) on developer productivity (dependent variable). One approach is to measure how much time developers spend using Copilot and how quickly they complete coding tasks. At first glance, we might observe a strong positive correlation: more Copilot usage, faster task completion.

    However, other factors can also influence how quickly developers finish their work. For example, Company A might have faster CI/CD pipelines or deal with smaller and simpler tasks, while Company B might require lengthy code reviews or handle more complex and time-consuming tasks. If we don’t account for these organizational differences, we might mistakenly conclude that Copilot is less effective for developers in Company B, even though it’s the environment, not Copilot, that actually slows them down.

    These kinds of group-level differences (variations across teams, companies, or projects) are typically known as “random effects” or “fixed effects”.

    Fixed effects are variables of interest, where each group is treated individually using one-hot (dummy) coding. This way, since the within-group variability is captured neatly within each dummy variable, we assume the variance of each group is similar, or homoscedastic.

    \[ y_i = \beta_0 + \beta_1 x_i + \gamma_1 D_{1i} + \gamma_2 D_{2i} + \cdots + \varepsilon_i \]

    where D₁ᵢ, D₂ᵢ, … are dummy variables indicating membership in groups 1, 2, …, and γ₁, γ₂, … are the fixed-effect coefficients for each corresponding group.
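
    To make this concrete, here is a minimal sketch in R on a hypothetical data frame df_toy (outcome y, predictor x, grouping column group; none of these names come from the article), where factor() performs the dummy coding described above:

    set.seed(1)
    
    # Hypothetical data: 10 groups of 9 observations, each group
    # with its own baseline shift on top of a common x effect
    df_toy <- data.frame(
      x     = rnorm(90),
      group = rep(paste0("g", 1:10), each = 9))
    group_shift <- rnorm(10, sd = 2)
    df_toy$y <- 0.5 * df_toy$x +
                group_shift[as.integer(factor(df_toy$group))] + rnorm(90)
    
    # factor(group) expands into dummy variables D1i, D2i, ...,
    # so each group receives its own fixed-effect coefficient
    fit_fixed <- lm(y ~ x + factor(group), data = df_toy)
    summary(fit_fixed)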

    Random effects, on the other hand, are not variables of interest. We assume each group is part of a broader population, and each group effect lies somewhere within a broader probability distribution of that population. As such, the variance across groups is heterogeneous.

    \[ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij} \]

    where uⱼ is the random effect of group j for sample i, drawn from a distribution, typically a normal distribution 𝒩(0, σ²ᵤ).
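
    The random-intercept counterpart of the same sketch can be fit with the lme4 package (again on the hypothetical df_toy, not the article’s data):

    library(lme4)
    
    # (1 | group) gives each group its own random intercept u_j,
    # assumed to be drawn from a normal distribution N(0, sigma_u^2)
    fit_random <- lmer(y ~ x + (1 | group), data = df_toy)
    summary(fit_random)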

    Rethink Fixed and Random Effects Carefully

    However, these effects can mislead your analysis if you simply insert them into your model without thinking carefully about what kinds of differences they are actually capturing.

    I recently worked on a project analyzing the environmental impacts of AI models, in which I studied how certain architectural features (number of parameters, amount of compute, dataset size, and training time) and hardware choices (hardware type, number of hardware units) of AI models affect energy use during training. I found that Training_time, Hardware_quantity, and Hardware_type significantly affected energy usage. The relationship can be roughly modeled as:

    \[ \text{energy} = \text{Training\_time} + \text{Hardware\_quantity} + \text{Hardware\_type} \]

    Since I believed there might be differences between organizations, for example in coding style, code structure, or algorithm preferences, I thought that including Group as a random effect would help account for all of these unobserved potential differences. To test my assumption, I compared the results of two models, with and without Group, to see which one is a better fit. In both models, the dependent variable Energy was extremely right-skewed, so I applied a log transformation to stabilize its variance. Here I used Generalized Linear Models (GLM) because the distribution of my data was not normal.
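
    The snippets below use log_Energy but never show its construction; a one-line sketch, assuming the raw response column is named Energy (that name is my assumption, it is not shown in the post):

    # Assumption: the raw response column is named Energy
    df$log_Energy <- log(df$Energy)
    hist(df$log_Energy)  # sanity-check that the right skew is reduced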

    library(lme4)  # needed for glmer()
    
    glm <- glm(
      log_Energy ~ Training_time_hour + 
                   Hardware_quantity + 
                   Training_hardware,
      data = df)
    summary(glm)
    
    glm_random_effects <- glmer(
      log_Energy ~ Training_time_hour + 
                   Hardware_quantity + 
                   Training_hardware + 
                   (1 | Group),  # Random effects
      data = df)
    summary(glm_random_effects)
    AIC(glm_random_effects)

    The GLM model without Group produced an AIC of 312.55, and Training_time, Hardware_quantity, and certain types of Hardware were statistically significant.

    > summary(glm)
    
    Call:
    glm(formula = log_Energy ~ Training_time_hour + Hardware_quantity + 
        Training_hardware, data = df)
    
    Coefficients:
                                                     Estimate Std. Error t value Pr(>|t|)    
    (Intercept)                                     7.134e+00  1.393e+00   5.123 5.07e-06 ***
    Training_time_hour                              1.509e-03  2.548e-04   5.922 3.08e-07 ***
    Hardware_quantity                               3.674e-04  9.957e-05   3.690 0.000563 ***
    Training_hardwareGoogle TPU v3                  1.887e+00  1.508e+00   1.251 0.216956    
    Training_hardwareGoogle TPU v4                  3.270e+00  1.591e+00   2.055 0.045247 *  
    Training_hardwareHuawei Ascend 910              2.702e+00  2.485e+00   1.087 0.282287    
    Training_hardwareNVIDIA A100                    2.528e+00  1.511e+00   1.674 0.100562    
    Training_hardwareNVIDIA A100 SXM4 40 GB         3.103e+00  1.750e+00   1.773 0.082409 .  
    Training_hardwareNVIDIA A100 SXM4 80 GB         3.866e+00  1.745e+00   2.216 0.031366 *  
    Training_hardwareNVIDIA GeForce GTX 285        -4.077e+00  2.412e+00  -1.690 0.097336 .  
    Training_hardwareNVIDIA GeForce GTX TITAN X    -9.706e-01  1.969e+00  -0.493 0.624318    
    Training_hardwareNVIDIA GTX Titan Black        -8.423e-01  2.415e+00  -0.349 0.728781    
    Training_hardwareNVIDIA H100 SXM5 80GB          3.600e+00  1.864e+00   1.931 0.059248 .  
    Training_hardwareNVIDIA P100                   -1.663e+00  1.899e+00  -0.876 0.385436    
    Training_hardwareNVIDIA Quadro P600            -1.970e+00  2.419e+00  -0.814 0.419398    
    Training_hardwareNVIDIA Quadro RTX 4000        -1.367e+00  2.424e+00  -0.564 0.575293    
    Training_hardwareNVIDIA Quadro RTX 5000        -2.309e+00  2.418e+00  -0.955 0.344354    
    Training_hardwareNVIDIA Tesla K80               1.761e+00  1.988e+00   0.886 0.380116    
    Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   3.415e+00  1.833e+00   1.863 0.068501 .  
    Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  3.698e+00  2.413e+00   1.532 0.131852    
    Training_hardwareNVIDIA V100                   -3.638e-01  1.582e+00  -0.230 0.819087    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for gaussian family taken to be 3.877685)
    
        Null deviance: 901.45  on 69  degrees of freedom
    Residual deviance: 190.01  on 49  degrees of freedom
    AIC: 312.55
    
    Number of Fisher Scoring iterations: 2

    On the other hand, the model with Group produced an AIC of 300.38, much lower than the previous model, indicating a better fit. However, taking a closer look, I noticed a significant issue: the statistical significance of the other variables had gone away, as if Group had taken the significance away from them!

    > summary(glm_random_effects)
    Linear mixed model fit by REML ['lmerMod']
    Formula: log_Energy ~ Training_time_hour + Hardware_quantity + Training_hardware +  
        (1 | Group)
       Data: df
    
    REML criterion at convergence: 254.4
    
    Scaled residuals: 
         Min       1Q   Median       3Q      Max 
    -1.65549 -0.24100  0.01125  0.26555  1.51828 
    
    Random effects:
     Groups   Name        Variance Std.Dev.
     Group    (Intercept) 3.775    1.943   
     Residual             1.118    1.057   
    Number of obs: 70, groups:  Group, 44
    
    Fixed effects:
                                                     Estimate Std. Error t value
    (Intercept)                                     6.132e+00  1.170e+00   5.243
    Training_time_hour                              1.354e-03  2.111e-04   6.411
    Hardware_quantity                               3.477e-04  7.035e-05   4.942
    Training_hardwareGoogle TPU v3                  2.949e+00  1.069e+00   2.758
    Training_hardwareGoogle TPU v4                  2.863e+00  1.081e+00   2.648
    Training_hardwareHuawei Ascend 910              4.086e+00  2.534e+00   1.613
    Training_hardwareNVIDIA A100                    3.959e+00  1.299e+00   3.047
    Training_hardwareNVIDIA A100 SXM4 40 GB         3.728e+00  1.551e+00   2.404
    Training_hardwareNVIDIA A100 SXM4 80 GB         4.950e+00  1.478e+00   3.349
    Training_hardwareNVIDIA GeForce GTX 285        -3.068e+00  2.502e+00  -1.226
    Training_hardwareNVIDIA GeForce GTX TITAN X     4.503e-02  1.952e+00   0.023
    Training_hardwareNVIDIA GTX Titan Black         2.375e-01  2.500e+00   0.095
    Training_hardwareNVIDIA H100 SXM5 80GB          4.197e+00  1.552e+00   2.704
    Training_hardwareNVIDIA P100                   -1.132e+00  1.512e+00  -0.749
    Training_hardwareNVIDIA Quadro P600            -1.351e+00  1.904e+00  -0.710
    Training_hardwareNVIDIA Quadro RTX 4000        -2.167e-01  2.503e+00  -0.087
    Training_hardwareNVIDIA Quadro RTX 5000        -1.203e+00  2.501e+00  -0.481
    Training_hardwareNVIDIA Tesla K80               1.559e+00  1.445e+00   1.079
    Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   3.751e+00  1.536e+00   2.443
    Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  3.487e+00  1.761e+00   1.980
    Training_hardwareNVIDIA V100                    7.019e-01  1.434e+00   0.489
    
    Correlation matrix not shown by default, as p = 21 > 12.
    Use print(x, correlation=TRUE)  or
        vcov(x)        if you need it
    
    fit warnings:
    Some predictor variables are on very different scales: consider rescaling
    > AIC(glm_random_effects)
    [1] 300.3767

    Thinking it over carefully, it made a lot of sense. Certain organizations may consistently prefer specific types of hardware, or larger organizations may be able to afford more expensive hardware and the resources to train bigger AI models. In other words, the random effects here likely overlapped with and over-explained the variation in our available independent variables, hence absorbing a large portion of what we were trying to study.
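
    One quick way to check for this kind of overlap (a sketch, assuming the same column names as in the models above) is to look at how much of each predictor’s variation sits between groups rather than within them:

    # If each Group almost always uses a single hardware type, the random
    # intercepts and the hardware dummies compete for the same variation
    with(df, table(Group, Training_hardware))
    
    # Rough share of Training_time_hour variance that lies between groups
    group_means <- aggregate(Training_time_hour ~ Group, data = df, FUN = mean)
    var(group_means$Training_time_hour) / var(df$Training_time_hour)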

    This highlights an important point: while random or fixed effects are useful tools for controlling unwanted group-level differences, they can also unintentionally capture the underlying variation of our independent variables. We should carefully consider what these effects actually represent before blindly introducing them into our models, hoping they will happily absorb all the noise.
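
    One caveat on the AIC comparison above, sketched under the same assumptions: lmer and glmer fit by REML by default, and a REML-based AIC is not strictly comparable to the ML-based AIC of a plain glm, so a fairer comparison refits the mixed model with maximum likelihood:

    # Refit with ML (REML = FALSE) so both AICs are on the same scale
    glm_random_effects_ml <- lmer(
      log_Energy ~ Training_time_hour + 
                   Hardware_quantity + 
                   Training_hardware + 
                   (1 | Group),
      data = df, REML = FALSE)
    AIC(glm, glm_random_effects_ml)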


    References: Steve Midway, Data Analysis in R, https://bookdown.org/steve_midway/DAR/random-effects.html


