    I Measured Neural Network Training Every 5 Steps for 10,000 Iterations

By Editor Times Featured, November 15, 2025


I thought I knew how neural networks learned. Train them, watch the loss go down, save checkpoints each epoch. A standard workflow. Then I measured training dynamics at 5-step intervals instead of at epoch level, and everything I thought I knew fell apart.

The question that started this journey: does a neural network's capacity expand during training, or is it fixed from initialization? Until 2019, we all assumed the answer was obvious: parameters are fixed, so capacity must be fixed too. But Ansuini et al. discovered something that shouldn't be possible: effective representational dimensionality increases during training. Yang et al. confirmed it in 2024.

This changes everything. If the learning space expands while the network learns, how can we mechanistically understand what it is actually doing?

High-Frequency Training Checkpoints

When training a DNN for 10,000 steps, we used to set up checkpoints every 100 or 200 steps. Measuring at 5-step intervals generates a volume of data that is not easy to manage, but these high-frequency checkpoints reveal valuable information about how a DNN learns.

High-frequency checkpoints provide information about:

• Whether early training errors can be recovered from (they usually can't)
• Why some architectures work and others fail
• When interpretability analysis should happen (spoiler: much earlier than we thought)
• How to design better training approaches

During an applied research project, I measured DNN training at high resolution: every 5 steps instead of every 100 or 500. I used a basic MLP architecture with the same dataset I have been using for the last 10 years.
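
To make the setup concrete, here is a minimal sketch of such a measurement loop in PyTorch. The model, the synthetic data, and the fixed probe batch are illustrative assumptions, not the exact project code:

    import torch
    import torch.nn as nn

    # Toy stand-in for the experiment: a small MLP on synthetic data.
    model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    X, y = torch.randn(1024, 64), torch.randint(0, 10, (1024,))
    probe = X[:256]  # fixed probe batch so snapshots stay comparable

    snapshots = []
    for step in range(10_000):
        idx = torch.randint(0, len(X), (128,))
        loss = loss_fn(model(X[idx]), y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()

        if step % 5 == 0:  # high-frequency checkpoint: every 5 steps
            with torch.no_grad():
                acts = model[:-1](probe)  # hidden-layer activations
            snapshots.append({"step": step, "loss": loss.item(),
                              "acts": acts.clone()})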

Figure 1: Experimental setup. We detect discrete transitions using z-score analysis with rolling statistics.
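
The article does not include the detector itself; the following is one plausible reading of "z-score analysis with rolling statistics", where the window size and threshold are assumed values:

    import numpy as np

    def detect_transitions(series, window=20, z_thresh=3.0):
        """Return checkpoint indices where the step-to-step change in a
        metric (e.g. effective dimensionality) is an outlier relative to
        the rolling mean and std of recent changes."""
        diffs = np.diff(series)
        hits = []
        for t in range(window, len(diffs)):
            hist = diffs[t - window:t]
            mu, sigma = hist.mean(), hist.std()
            if sigma > 0 and abs(diffs[t] - mu) / sigma > z_thresh:
                hits.append(t + 1)  # diffs[t] corresponds to series[t+1]
        return hits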

The results were surprising. Deep neural networks, even simple architectures, expand their effective parameter space during training. I had assumed this space was predetermined by the architecture itself. Instead, DNNs undergo discrete transitions: small jumps that increase the effective dimensionality of their learning space.

Figure 2: Effective dimensionality of activation patterns during training, measured using stable rank. Three distinct phases emerge: initial collapse (steps 0-300), where dimensionality drops from 2500 to 500; expansion (steps 300-5000), where dimensionality climbs to 1000; and stabilization (steps 5000-8000), where dimensionality plateaus. This suggests that steps 0-2000 constitute a qualitatively distinct developmental window. Image by author.

Figure 2 tracks the effective dimensionality of activations across training. The transitions concentrate in the first 25% of training and are hidden at larger checkpoint intervals (100-1000 steps); high-frequency checkpointing (every 5 steps) was needed to detect most of them. The curve itself also shows interesting behavior. The initial collapse represents loss-landscape restructuring, where random initialization gives way to task-aligned structure. Then comes an expansion phase with gradual dimensionality growth. Between 2000-3000 steps, a stabilization appears that reflects the architectural capacity limits of the DNN.
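
Stable rank, the dimensionality measure behind Figures 2 and 3, has a standard definition: the squared Frobenius norm of the activation matrix divided by its squared spectral norm. A minimal sketch, with a usage line that assumes the snapshots from the earlier loop:

    import numpy as np

    def stable_rank(acts):
        """||A||_F^2 / ||A||_2^2 = sum(s_i^2) / s_max^2 over singular values."""
        s = np.linalg.svd(acts, compute_uv=False)  # sorted descending
        return float((s ** 2).sum() / (s[0] ** 2))

    # One dimensionality value per 5-step checkpoint:
    dims = [stable_rank(snap["acts"].numpy()) for snap in snapshots]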

Figure 3: Representational dimensionality (measured using stable rank) shows a strong negative correlation with loss (ρ = −0.951) and a moderate negative correlation with gradient magnitude (ρ = −0.701). As loss decreases from 2.0 to near zero, dimensionality expands from 9.0 to 9.6. Counterintuitively, improved performance correlates with expanded rather than compressed representations. Image by author.

This changes how we should think about DNN training, interpretability, and architecture design.

Exploration vs Expansion

Consider the following two scenarios:

Scenario A: Fixed Capacity (Exploration). Your network starts with a fixed representational capacity. Training explores different regions of this predetermined space. It's like navigating a map that exists from the start: early training just means "haven't found the good region yet."

Scenario B: Expanding Capacity (Innovation). Your network starts with minimal capacity. Training creates representational structures. It's like building roads while traveling: each road enables new destinations. Early training establishes what becomes learnable later.

    Which is it?

The question matters because if capacity expands, then early training isn't recoverable: you can't just "train longer" to fix early mistakes. Interpretability then has a timeline where features form in sequence, and understanding this sequence is crucial. Furthermore, architecture design becomes about expansion rate, not just final capacity. Finally, critical periods exist: if we miss the window, we miss the capability.

When We Need to Measure High-Frequency Checkpoints

Expansion vs Exploration

Figure 4: High-frequency vs. low-frequency sampling in the experiment described in Figure 1. We detect discrete transitions using z-score analysis with rolling statistics. High-frequency sampling captures rapid transitions that coarse-grained measurement misses. This comparison tests whether temporal resolution affects the observable dynamics.

As seen in Figures 2 and 3, high-frequency sampling reveals interesting information. We can identify three distinct phases:

Phase 1: Collapse (steps 0-300). The network restructures from random initialization. Dimensionality drops sharply as the loss landscape is reshaped around the task. This is not learning yet; it is preparation for learning.
Phase 2: Expansion (steps 300-5,000). Dimensionality climbs steadily. This is capacity expansion: the network is building representational structures, simple features that enable complex features that enable higher-order features.
Phase 3: Stabilization (steps 5,000-8,000). Growth plateaus. Architectural constraints bind. The network refines what it has rather than building new capacity.

This plot reveals expansion, not exploration. The network at step 5,000 can represent functions that were unreachable at step 300 because they did not yet exist.

Capacity Expands, Parameters Don't

Figure 5: Comparison of activation space to weight space. Weight-space dimensionality stays nearly constant (9.72-9.79), with only one detected "jump" across 8,000 steps. Image by author.

Comparing the activation and weight spaces shows that the two follow different dynamics under high-frequency sampling. The activation space shows approximately 85 discrete jumps (including Gaussian noise); the weight space shows only one, for the same network and the same training run. This confirms that the network at step 8,000 computes functions inaccessible at step 500 despite an identical parameter count. This is the clearest evidence for expansion.

DNNs innovate by generating new parameter-space options during training in order to represent complex tasks.

Transitions Are Fast and Early

We have seen how high-frequency sampling reveals many more transitions; low-frequency checkpointing would miss nearly all of them. These transitions concentrate early: two thirds of all transitions happen within the first 2,000 steps, just 25% of total training time. If we want to understand which features form and when, we need to look during steps 0-2,000, not at convergence. By step 5,000, the story is over.

Expansion Couples to Optimization

Looking again at Figure 3, we see that as loss decreases, dimensionality expands. The network does not simplify as it learns; it becomes more complex. Dimensionality correlates strongly with loss (ρ = −0.951) and moderately with gradient magnitude (ρ = −0.701). This may seem counterintuitive: improved performance correlates with expanded rather than compressed representations. We might expect networks to find simpler, more compressed representations as they learn. Instead, they expand into higher-dimensional spaces.
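
The ρ values here are rank correlations, which take one scipy call given the per-checkpoint series. The synthetic series below are stand-ins so the snippet runs on its own:

    import numpy as np
    from scipy.stats import spearmanr

    # Stand-ins for per-checkpoint loss and stable-rank series.
    t = np.linspace(0, 4, 2000)
    rng = np.random.default_rng(0)
    losses = np.exp(-t) + 0.01 * rng.standard_normal(t.size)
    dims = 9.0 + 0.6 * (1 - np.exp(-t)) + 0.01 * rng.standard_normal(t.size)

    rho, _ = spearmanr(dims, losses)
    print(f"stable rank vs loss: rho = {rho:.3f}")  # strongly negative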

    Why?

A possible explanation is that complex tasks require complex representations. The network does not find a simpler explanation; it builds the representational structure needed to separate classes, recognize patterns, and generalize.

Practical Deployment

What we have seen suggests a different way to understand and debug DNN training in any domain.

If we know when features form during training, we can analyze them as they crystallize rather than reverse-engineering a black box afterward.

In real deployment scenarios, we can monitor representational dimensionality in real time, detect when expansion phases occur, and run interpretability analyses at each transition point. This tells us precisely when our network is building new representational structures, and when it is finished. The measurement approach is architecture-agnostic: it works whether you are training CNNs for vision, transformers for language, RL agents for control, or multimodal models for cross-domain tasks.

Example 1: Intervention experiments that map causal dependencies. Disrupt training during specific windows and measure which downstream capabilities are lost. If corrupting data during steps 2,000-5,000 permanently damages texture recognition but the same corruption at step 6,000 has no effect, you have found when texture features crystallize and what they depend on. This works identically for object recognition in vision models, syntactic structure in language models, or state discrimination in RL agents.
Example 2: For production deployment, continuous dimensionality monitoring catches representational problems during training while you can still fix them. If layers stop expanding, you have architectural bottlenecks. If expansion becomes erratic, you have instability. If early layers saturate while late layers fail to expand, you have information-flow problems. Standard loss curves won't show these issues until it's too late; dimensionality monitoring surfaces them immediately (see the sketch after this list).
Example 3: The architecture design implications are equally practical. Measure expansion dynamics during the first 5-10% of training across candidate architectures. Select for clean phase transitions and structured bottom-up development. These networks aren't just more performant; they are fundamentally more interpretable because features form in clear sequential layers rather than tangled simultaneity.
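
As a concrete version of the monitoring idea in Example 2, here is a minimal health check over per-layer dimensionality histories. The thresholds and the stall criterion are assumptions for illustration, not the article's method:

    import numpy as np

    def stalled_layers(dim_history, window=200, min_growth=0.01):
        """dim_history: dict of layer name -> list of stable-rank values,
        one per checkpoint. Returns layers whose dimensionality has
        stopped expanding over the last `window` checkpoints."""
        flagged = []
        for layer, dims in dim_history.items():
            if len(dims) >= window:
                recent = np.asarray(dims[-window:])
                if recent[-1] - recent[0] < min_growth:
                    flagged.append(layer)  # plateaued or shrinking
        return flagged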

What's Next

So we have established that networks expand their representational space during training, that we can measure these transitions at high resolution, and that this opens new approaches to interpretability and intervention. The natural question: can you apply this to your own work?

I'm releasing the complete measurement infrastructure as open source. It includes validated implementations for MLPs, CNNs, ResNets, Transformers, and Vision Transformers, with hooks for custom architectures.

Everything runs with three lines added to your training loop.

The GitHub repository provides templates for the experiments discussed above: feature-formation mapping, intervention protocols, cross-architecture transfer prediction, and production monitoring setups. The measurement methodology is validated. What matters now is what you discover when you apply it to your domain.

Try it:

pip install ndtracker

Quickstart, instructions, and examples are in the repository: Neural Dimensionality Tracker (NDT)

The code is production-ready. The protocols are documented. The questions are open. I want to see what you find when you measure your training dynamics at high resolution, whatever the context and architecture.

You can share your results, open issues with your findings, or just ⭐️ the repo if this changes how you think about training. Remember: the interpretability timeline exists across all neural architectures.

    Javier Marín | LinkedIn | Twitter


References & Further Reading

• Achille, A., Rovere, M., & Soatto, S. (2019). Critical learning periods in deep networks. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=BkeStsCcKQ
• Frankle, J., Dziugaite, G. K., Roy, D. M., & Carbin, M. (2020). Linear mode connectivity and the lottery ticket hypothesis. In Proceedings of the 37th International Conference on Machine Learning (pp. 3259-3269). PMLR. https://proceedings.mlr.press/v119/frankle20a.html
• Ansuini, A., Laio, A., Macke, J. H., & Zoccolan, D. (2019). Intrinsic dimension of data representations in deep neural networks. In Advances in Neural Information Processing Systems (Vol. 32, pp. 6109-6119). https://proceedings.neurips.cc/paper/2019/hash/cfcce0621b49c983991ead4c3d4d3b6b-Abstract.html
• Yang, J., Zhao, Y., & Zhu, Q. (2024). ε-rank and the staircase phenomenon: New insights into neural network training dynamics. arXiv preprint arXiv:2412.05144. https://arxiv.org/abs/2412.05144
• Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature visualization. Distill, 2(11), e7. https://doi.org/10.23915/distill.00007
• Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., DasSarma, N., Drain, D., Ganguli, D., Hatfield-Dodds, Z., Hernandez, D., Jones, A., Kernion, J., Lovitt, L., Ndousse, K., Amodei, D., Brown, T., Clark, J., Kaplan, J., McCandlish, S., & Olah, C. (2021). A mathematical framework for transformer circuits. Transformer Circuits Thread. https://transformer-circuits.pub/2021/framework/index.html


