    Times Featured
    Agentic AI for Modern Deep Learning Experimentation

    By Editor Times Featured, February 18, 2026


    that reads your metrics, detects anomalies, applies predefined tuning guidelines, restarts jobs when necessary, and logs every decision, without you staring at loss curves at 2 a.m.

    In this article, I’ll present a lightweight agent designed for deep learning researchers and ML engineers that can:

    • Detect failures automatically
    • Visually reason over performance metrics
    • Apply your predefined hyperparameter strategies
    • Relaunch jobs
    • Document every action and outcome

    No architecture search. No AutoML. No invasive rewrites of your codebase.

    The implementation is intentionally minimal: containerize your training script, add a small LangChain-based agent, define hyperparameters in YAML, and express preferences in markdown. You’re probably doing 50% of this already.

    Drop this agent into your manual train.py workflow and go from 0️⃣ to 💯 overnight.

    The problem with your current experiments

    🤔 You endlessly deliberate over hyperparameters.

    ▶️ You run train.py.

    🐛 You fix the bug in train.py.

    🔁 You rerun train.py.

    👀 You stare at TensorBoard.

    🫠 You question reality.

    🔄 You repeat.

    Every practicing Deep Learning/Machine Learning Engineer in the field does this. Don’t be ashamed. Original photo by MART PRODUCTION via Pexels. GIF imagined by Grok

    Stop staring at your model spitting out numbers

    You aren’t a Jedi. No amount of staring will magically make your [validation loss | classification accuracy | perplexity | any other metric you can name] move in the direction you want.

    Babysitting a model past midnight to chase a vanishing/exploding-gradient NaN in a deep transformer-based network that you can’t track down, and that may never even appear again? Also a hard no.

    How are you supposed to solve real research problems when most of your time goes to work that technically has to be done, yet contributes very little to actual insight?

    If 70% of your day is consumed by operational drag, when does the thinking happen?

    Shift to agent-driven experiments

    Most of the deep learning engineers and researchers I work with still run experiments manually. A significant portion of the day goes to scanning Weights & Biases or TensorBoard for last night’s run, comparing runs, exporting metrics, adjusting hyperparameters, logging notes, and restarting jobs. Then repeating the cycle.

    It’s dry, tedious, and repetitive work.

    We’re going to offload these repetitive tasks so you can shift your focus to high-value work.

    The concept of AutoML is, frankly, laughable.

    Your [new] agent won’t make decisions about how to change your network topology or add complex features; that’s your job. It will replace the repetitive glue work that eats valuable time while adding little value.

    Agent-Driven Experiments (ADEs)

    Switching from manual experiments to an agent-driven workflow is simpler than it first appears. No rewriting your stack, no heavy systems, no tech debt.

    Image by Author

    At its core, an ADE requires three steps:

    1. Containerize your existing training script
      • Wrap your current train.py in a Docker container. No refactoring of model logic. No architectural changes. Just a reproducible execution boundary.
    2. Add a lightweight agent
      • Introduce a small LangChain-based script that reads metrics from your dashboard, applies your preferences, and decides when and where to relaunch, halt, or document; then schedule it with cron or any job scheduler.
    3. Define behavior and preferences with natural language
      • Use a YAML file for configuration and hyperparameters
      • Use a Markdown document to communicate with your agent

    That’s the whole system. Now, let’s review each step.

    Containerize your training script

    One could argue you should be doing this anyway. It makes restarting and scheduling much easier, and if you move to a Kubernetes cluster for training, the disruption to your existing process is much smaller.

    If you’re already doing this, skip to the next section. If not, here’s some helpful code you can use to get started.

    First, let’s define a project structure that works with Docker.

    your_experiment/
    ├── scripts/
    │   ├── train.py                 # Main training script
    │   └── health_server.py         # Health check server
    ├── requirements.txt             # Python dependencies
    ├── Dockerfile                   # Container definition
    └── run.sh                       # Script to start training + health check
    

    We need to make sure your train.py script can load a configuration file from the cloud, allowing the agent to edit it if needed.

    I recommend using GitHub for this. Here’s an example of how to read a remote config file. The agent will have a corresponding tool to read and modify this config file.

    import os
    import requests
    import torch
    import yaml
    from box import Box  # pip install python-box; gives attribute access like config.lr
    
    # add this to `train.py`
    GITHUB_RAW = (
        "https://raw.githubusercontent.com/"
        "{owner}/{repo}/{ref}/{path}"
    )
    
    def load_config_from_github(owner, repo, path, ref="main", token=None):
        """Fetch a YAML config from GitHub and return it with attribute access."""
        url = GITHUB_RAW.format(owner=owner, repo=repo, ref=ref, path=path)
    
        headers = {}
        if token:
            # only needed for private repositories
            headers["Authorization"] = f"Bearer {token}"
    
        r = requests.get(url, headers=headers, timeout=10)
        r.raise_for_status()
    
        return Box(yaml.safe_load(r.text))
    
    
    # pass your owner, repo, path (and a token, e.g. os.getenv("GITHUB_TOKEN"), for private repos)
    config = load_config_from_github(...)
    
    # use params throughout your `train.py` script (`model` is defined elsewhere in train.py)
    optimizer = torch.optim.Adam(model.parameters(), lr=config.lr)
    

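The modify_config tool itself lives in the author’s gist and isn’t reproduced here. As a rough illustration of the write side, here is a sketch against GitHub’s REST Contents API (a GET, then a PUT on /repos/{owner}/{repo}/contents/{path}). The function names are hypothetical, the sketch assumes a token with write access to the repository, and it only does a shallow merge of top-level keys.

```python
import base64
import json
import urllib.request

import yaml

CONTENTS_API = "https://api.github.com/repos/{owner}/{repo}/contents/{path}"


def build_update_payload(config: dict, sha: str, message: str, branch: str = "main") -> dict:
    """Serialize `config` to YAML and wrap it in a Contents API update body."""
    text = yaml.safe_dump(config, sort_keys=False)
    return {
        "message": message,
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        "sha": sha,  # blob SHA of the version being replaced; a stale SHA is rejected
        "branch": branch,
    }


def _github_request(url: str, token: str, method: str = "GET", body: dict = None) -> dict:
    """Send a JSON request to the GitHub API and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def update_config_on_github(owner, repo, path, updates: dict, token: str, branch: str = "main") -> dict:
    """Read the YAML config, merge `updates` into it (shallow), and commit the result."""
    url = CONTENTS_API.format(owner=owner, repo=repo, path=path)
    # GET returns the file as base64 plus the blob SHA required for the update
    meta = _github_request(f"{url}?ref={branch}", token)
    config = yaml.safe_load(base64.b64decode(meta["content"])) or {}
    config.update(updates)
    message = "agent: set " + ", ".join(f"{k}={v}" for k, v in updates.items())
    payload = build_update_payload(config, meta["sha"], message, branch)
    _github_request(url, token, method="PUT", body=payload)
    return config
```

A call such as update_config_on_github("me", "experiments", "vqvae/config.yaml", {"lr": 1e-4}, token) (all arguments illustrative) would commit a new learning rate, which train.py picks up on its next launch. Because train.py reads through raw.githubusercontent.com, a fresh commit may take a few minutes to become visible there.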
    We also include a health check server that runs alongside the main process. This allows container managers, such as Kubernetes, or your agent to monitor the job’s status without inspecting logs.

    If the container’s state changes unexpectedly, it can be restarted automatically. This simplifies agent inspection, as reading and summarizing log files can cost far more tokens than simply checking a container’s health.

    # health_server.py
    import time
    from pathlib import Path
    
    import uvicorn
    from fastapi import FastAPI, Response
    
    app = FastAPI()
    
    HEARTBEAT = Path("/tmp/heartbeat")  # train.py periodically writes time.time() here
    STATUS = Path("/tmp/status.json")  # optional richer state
    MAX_AGE = 300  # seconds
    
    def last_heartbeat_age():
        if not HEARTBEAT.exists():
            return float("inf")
        return time.time() - float(HEARTBEAT.read_text())
    
    @app.get("/health")
    def health():
        age = last_heartbeat_age()
    
        # stale -> training likely hung
        if age > MAX_AGE:
            return Response("stalled", status_code=500)
    
        # optional: detect NaNs or failure flags written by the trainer
        if STATUS.exists() and "failed" in STATUS.read_text():
            return Response("failed", status_code=500)
    
        return {"status": "ok", "heartbeat_age": age}
    
    if __name__ == "__main__":
        # run.sh launches this file directly, so start the server here (port 8000 is an example)
        uvicorn.run(app, host="0.0.0.0", port=8000)
    

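The health server only helps if train.py keeps the heartbeat file fresh and writes a failure marker to the status file. A minimal sketch of the trainer side, assuming the same /tmp/heartbeat and /tmp/status.json paths as health_server.py and that any status text containing "failed" should mark the job unhealthy (report_step is a hypothetical helper name):

```python
import json
import math
import time
from pathlib import Path

# Paths the health server reads (assumed shared inside the container)
HEARTBEAT = Path("/tmp/heartbeat")
STATUS = Path("/tmp/status.json")


def write_heartbeat(path: Path = HEARTBEAT) -> None:
    """Record the current Unix time; the health server compares it with time.time()."""
    path.write_text(str(time.time()))


def report_step(loss: float, step: int,
                status_path: Path = STATUS, heartbeat_path: Path = HEARTBEAT) -> bool:
    """Refresh the heartbeat and status file. Returns False if training should stop."""
    write_heartbeat(heartbeat_path)
    if not math.isfinite(loss):
        # The health server answers 500 whenever the status file contains "failed"
        status_path.write_text(json.dumps({"state": "failed", "reason": "non-finite loss", "step": step}))
        return False
    status_path.write_text(json.dumps({"state": "running", "step": step, "loss": loss}))
    return True
```

Inside the training loop, something like `if not report_step(loss.item(), step): break` works, called every step or every N steps if the file writes become noticeable. Treating a non-finite loss as a failure is one simple choice; other conditions could be flagged the same way.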
    A small shell script, run.sh, starts the health_server process alongside train.py:

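    #!/bin/bash
    
    # Start health check server in the background
    python3 scripts/health_server.py &
    # Capture its PID so we can stop it when training ends
    HEALTH_PID=$!
    # Start the main training script
    python3 scripts/train.py
    TRAIN_STATUS=$?
    # Stop the health server so the container exits with the trainer's status
    kill "$HEALTH_PID"
    exit "$TRAIN_STATUS"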
    

    And of course, our Dockerfile, which is built on NVIDIA’s base image so your container can use the host’s accelerator with minimal friction. The host still needs the NVIDIA Container Toolkit, and the container must be started with GPU access (e.g., docker run --gpus all). This example is for PyTorch, but you can easily extend it to JAX or TensorFlow if needed.

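    FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
    
    # Avoid interactive apt prompts during the build
    ENV DEBIAN_FRONTEND=noninteractive
    
    RUN apt-get update && apt-get install -y \
        python3 python3-pip git
    
    RUN python3 -m pip install --upgrade pip
    
    # Install PyTorch with CUDA support
    RUN pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
    
    WORKDIR /app
    
    # Install the remaining dependencies (requests, pyyaml, python-box, fastapi, uvicorn, ...)
    COPY requirements.txt /app/
    RUN pip3 install -r requirements.txt
    
    COPY . /app
    
    CMD ["sh", "run.sh"]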
    

    ✅ You’re containerized. Simple and minimal.

    Add a lightweight agent

    There are many agent frameworks to choose from. For this agent, I like LangChain.

    LangChain is a framework for building LLM-driven systems that combine reasoning and execution. It simplifies chaining model calls, managing memory, and integrating external functions so your LLM can do more than generate text.

    In LangChain, Tools are explicitly defined, schema-bound functions the model can call. Each tool wraps a single, well-scoped skill or task (e.g., reading a file, querying an API, modifying state).

    For our agent to work, we first need to define the tools it can use to achieve our objective.

    Tool definitions

    1. read_preferences
      • Reads user preferences and experiment notes from a markdown document
    2. check_tensorboard
      • Uses Selenium with a Chrome WebDriver to screenshot metrics
    3. analyze_metric
      • Uses multimodal LLM reasoning to understand what’s happening in the screenshot
    4. check_container_health
      • Checks our containerized experiment using the health check
    5. restart_container
      • Restarts the experiment if it is unhealthy or a hyperparameter needs to change
    6. modify_config
      • Modifies a remote config file and commits it to GitHub
    7. write_memory
      • Writes a sequence of actions to a persistent memory (markdown)

    This set of tools defines our agent’s operational boundaries. All interaction with our experiment happens through these tools, making behavior controllable and, hopefully, predictable.

    Instead of providing these tools inline, here’s a GitHub gist containing all the tools described above. You can plug these into your agent or modify them as you see fit.

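For a sense of scale, here is what a check_container_health tool can boil down to. This is a sketch, not the gist’s code: it assumes the health server is reachable on port 8000 (uvicorn’s default) and returns a compact dict, so the agent reasons over a few tokens rather than raw logs.

```python
import json
import urllib.error
import urllib.request


def check_container_health(url: str = "http://localhost:8000/health", timeout: float = 5.0) -> dict:
    """Poll the training container's /health endpoint and summarize the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return {"healthy": True, "detail": json.loads(resp.read().decode("utf-8"))}
    except urllib.error.HTTPError as err:
        # The health server answers 500 with "stalled" or "failed" in the body.
        # HTTPError must be caught before URLError, its parent class.
        return {"healthy": False, "detail": err.read().decode("utf-8", "replace")}
    except (urllib.error.URLError, OSError) as err:
        # Connection refused or timed out: the container is probably down
        return {"healthy": False, "detail": f"unreachable: {err}"}
```

Returning "unreachable" separately from "stalled" or "failed" lets the agent distinguish a dead container from a hung or diverged run before deciding whether a restart alone is enough.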
    The agent

    To be quite honest, the first time I tried to grok the official LangChain documentation, I was immediately turned off the idea altogether.

    It’s overly verbose and more complex than necessary. If you’re new to agents, or just don’t want to navigate the labyrinth that is the LangChain documentation, please keep reading.

    LangSmith? Random asides? Little tooltips everywhere? I’ll pass on smiting this worthy foe. Imagined by Grok

    In a nutshell, this is how LangChain agents work:

    Our agent uses a prompt to decide what to do at each step.

    Steps are created dynamically by filling in the prompt with the current context and previous outputs. Each LLM call [+ optional tool invocation] is a step, and its output feeds into the next, forming a chain.

    Using this conceptually recursive loop, the agent can reason and perform the intended action across all the required steps. How many steps it takes depends on the agent’s ability to reason and how clearly the termination condition is defined.

    It’s a Lang-chain. Get it? 🤗 
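To make the loop concrete, here is a framework-free sketch of the same idea. It is a simplification, not LangChain’s implementation: llm is any function from prompt string to reply string, tools maps names to functions, and the Action / Action Input / Final Answer markers mirror the ReAct convention that LangChain’s create_react_agent parses.

```python
import re
from typing import Callable, Dict, List

# Matches "Action: <tool>" followed by "Action Input: <text>" on the next line
ACTION_RE = re.compile(r"Action:\s*(\S+)\s*\nAction Input:\s*(.*)")


def run_react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_iterations: int = 15,
) -> str:
    scratchpad: List[str] = []
    for _ in range(max_iterations):
        # Each step re-fills the prompt with the task and every prior step
        prompt = f"Task: {task}\n" + "\n".join(scratchpad) + "\nThought:"
        reply = llm(prompt)
        if "Final Answer:" in reply:
            # Termination condition: the model says it is done
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            observation = "Could not parse an action; use the Action/Action Input format."
        else:
            name, arg = match.group(1), match.group(2).strip()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"Unknown tool: {name}"
        # The reply and the tool result feed into the next step: the "chain"
        scratchpad.append(f"Thought: {reply}\nObservation: {observation}")
    return "Stopped: max_iterations reached without a final answer."
```

The max_iterations cap plays the same role as AgentExecutor’s max_iterations: if the termination condition (a Final Answer) never shows up, the loop still ends.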

    The prompt

    As noted, the prompt is the recursive glue that maintains context across LLM and tool invocations. You’ll see placeholders in curly braces, which LangChain fills in when the agent is created and on every call.

    We use a bit of LangChain’s built-in memory abstractions, included with each tool call. Aside from that, the agent fills in the gaps, deciding both the next step and which tool to call.

    For readability, the main prompt is below. You can either plug it directly into the agent script or load it from the filesystem before running.

    You are an experiment automation agent responsible for monitoring
    and maintaining ML experiments.
    
    Current context:
    {chat_history}
    
    Your workflow:
    1. First, read preferences from preferences.md to understand thresholds and settings
    2. Check TensorBoard at the specified URL and capture a screenshot
    3. Analyze key metrics (validation loss, training loss, accuracy) from the screenshot
    4. Check Docker container health for the training container
    5. Take corrective actions based on the analysis:
       - Restart unhealthy containers
       - Adjust hyperparameters according to user preferences
         and anomalous patterns, restarting the experiment if necessary
    6. Log all observations and actions to memory
    
    Important guidelines:
    - Always read preferences first to get the current configuration
    - Use visual analysis to understand metric trends
    - Be conservative with config changes (only modify if clearly needed)
    - Write detailed memory entries for future reference
    - Check container health before and after any restart
    - When modifying config, use appropriate values from preferences
    
    Available tools: {tool_names}
    Tool descriptions: {tools}
    
    Use the following format:
    
    Thought: think about what to do next
    Action: the tool to use, one of [{tool_names}]
    Action Input: the input to the tool
    Observation: the result of the tool
    ... (Thought/Action/Action Input/Observation can repeat)
    Thought: I now know the final answer
    Final Answer: a summary of what you observed and did
    
    Current task: {input}
    
    Think step-by-step and use tools to complete the workflow.
    Thought: {agent_scratchpad}

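Because the prompt is loaded from a plain text file, a missing placeholder only surfaces at runtime. LangChain’s create_react_agent raises a ValueError when tools, tool_names, or agent_scratchpad is absent from the template, so a quick pre-flight check can save a failed cron run. A small standard-library sketch; the REQUIRED set is an assumption for this setup and also includes input and chat_history, which here come from invoke and memory:

```python
from string import Formatter

# tools, tool_names, agent_scratchpad: checked by create_react_agent.
# input, chat_history: supplied by agent.invoke(...) and the memory in this setup.
REQUIRED = {"tools", "tool_names", "agent_scratchpad", "input", "chat_history"}


def prompt_variables(template: str) -> set:
    """Return the {placeholder} names used in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # escaped braces like {{ }} produce no field name.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def missing_variables(template: str) -> set:
    """Placeholders the agent needs that the template does not contain."""
    return REQUIRED - prompt_variables(template)
```

Running missing_variables on the contents of prompt.txt before building the agent, and exiting with a clear message if the result is non-empty, turns a confusing stack trace into a one-line diagnosis.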
    Now, with roughly 100 lines, we have our agent. The agent is initialized, then we define a sequence of steps. For each step, the current task fills the input placeholder in our prompt, and each tool updates a shared memory instance, ConversationSummaryBufferMemory.

    We’re going to use OpenAI for this agent; however, LangChain supports alternatives, including self-hosted models. If cost is an issue, there are open-source models that can be used here.

    import os
    import sys
    from pathlib import Path
    from langchain.agents import AgentExecutor, create_react_agent
    from langchain_openai import ChatOpenAI
    from langchain.prompts import PromptTemplate
    from langchain.memory import ConversationSummaryBufferMemory
    from langchain_core.tools import Tool
    
    # Import tools from tools.py
    from tools import (
        read_preferences,
        check_tensorboard,
        analyze_metric,
        check_container_health,
        restart_container,
        modify_config,
        write_memory
    )
    
    # Resolve prompt.txt next to this file so cron can run the script from any directory
    PROMPT = Path(__file__).with_name("prompt.txt").read_text()
    
    class ExperimentAutomation:
        def __init__(self, openai_key=None):
            """Initialize the agent"""
            self.llm = ChatOpenAI(
                temperature=0.8,
                model="gpt-4-turbo-preview",
                api_key=openai_key or os.getenv("OPENAI_API_KEY")
            )
    
            # Initialize memory for conversation context
            self.memory = ConversationSummaryBufferMemory(
                llm=self.llm,
                max_token_limit=32000,
                memory_key="chat_history",
                # the ReAct prompt is a plain string template, so keep history as text
                return_messages=False,
                # the executor returns "output" and "intermediate_steps"; store only "output"
                output_key="output"
            )
    
        def create_agent(self):
            """Create LangChain agent with imported tools"""
            # create_react_agent needs named Tool objects, not bare lambdas.
            # Assumes each function in tools.py takes the agent's string input as its
            # first argument plus a `memory` keyword argument, and has a docstring.
            funcs = [
                read_preferences,
                check_tensorboard,
                analyze_metric,
                check_container_health,
                restart_container,
                modify_config,
                write_memory,
            ]
            tools = [
                Tool(
                    name=fn.__name__,
                    description=(fn.__doc__ or fn.__name__).strip(),
                    # bind fn as a default argument so each lambda keeps its own function
                    func=lambda tool_input, fn=fn: fn(tool_input, memory=self.memory),
                )
                for fn in funcs
            ]
    
            # Create the prompt template
            prompt = PromptTemplate.from_template(PROMPT)
    
            agent = create_react_agent(
                llm=self.llm,
                tools=tools,
                prompt=prompt
            )
    
            # Create agent executor with memory
            return AgentExecutor(
                agent=agent,
                tools=tools,
                memory=self.memory,
                verbose=True,
                max_iterations=15,
                handle_parsing_errors=True,
                return_intermediate_steps=True
            )
    
        def run_automation_cycle(self):
            """Execute the full automation cycle step by step"""
            # `category`, not `class`: `class` is a reserved word and cannot be a keyword argument
            write_memory(
                entry="Automation cycle started",
                category="SYSTEM",
                memory=self.memory
            )
    
            try:
                agent = self.create_agent()
    
                # Define the workflow as individual steps
                workflow_steps = [
                    "Read preferences from preferences.md to capture thresholds and settings",
                    "Check TensorBoard at the specified URL and capture a screenshot",
                    "Analyze validation loss, training loss, and accuracy from the screenshot",
                    "Check Docker container health for the training container",
                    "Restart unhealthy containers if needed",
                    "Adjust hyperparameters according to preferences and restart container if necessary",
                    "Write all observations and actions to memory"
                ]
    
                # Execute each step individually
                result = {}
                for step in workflow_steps:
                    result = agent.invoke({"input": step})
    
                    # Write step output to memory
                    if result.get("output"):
                        memory_summary = f"Step: {step}\nOutput: {result['output']}"
                        write_memory(entry=memory_summary, category="STEP", memory=self.memory)
    
                write_memory(
                    entry="Automation cycle completed successfully",
                    category="SYSTEM",
                    memory=self.memory
                )
    
                return result
    
            except Exception as e:
                error_msg = f"Automation cycle failed: {str(e)}"
                write_memory(entry=error_msg, category="ERROR", memory=self.memory)
                raise
    
    
    def main():
        try:
            automation = ExperimentAutomation(openai_key=os.environ["OPENAI_API_KEY"])
            result = automation.run_automation_cycle()
    
            if result.get("output"):
                print(f"\nFinal Output:\n{result['output']}")
    
            if result.get("intermediate_steps"):
                print(f"\nSteps Executed: {len(result['intermediate_steps'])}")
    
            print("\n✓ Automation cycle completed successfully")
    
        except Exception as e:
            print(f"\n✗ Automation failed: {e}")
            write_memory(entry=f"Critical failure: {str(e)}", category="ERROR")
            sys.exit(1)
    
    
    if __name__ == "__main__":
        main()
    

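One detail worth spelling out: cron starts a fresh Python process every hour, so ConversationSummaryBufferMemory is empty at the start of each cycle. Anything the agent should remember across runs has to live on disk, which is what the markdown memory file is for. A hypothetical sketch of such a write_memory backend; the function names, the `category` parameter, and the entry layout are assumptions, not the gist’s format:

```python
from datetime import datetime, timezone
from pathlib import Path


def append_memory_entry(entry: str, category: str = "NOTE", path: Path = Path("memory.md")) -> str:
    """Append a timestamped, categorized entry to a markdown memory file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    block = f"### [{category}] {stamp}\n\n{entry.strip()}\n\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(block)
    return block


def recent_entries(path: Path = Path("memory.md"), n: int = 5) -> list:
    """Return the last n entries so the next cycle can load them as context."""
    if not path.exists():
        return []
    # Naive split on the entry header; fine as long as entries don't contain it
    chunks = path.read_text(encoding="utf-8").split("### [")[1:]
    return ["### [" + chunk.strip() for chunk in chunks[-n:]]
```

Loading recent_entries(...) at the start of a cycle and passing it in as context is one way to let the agent see what it changed during the previous hour.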
    Now that we have our agent and tools, let’s discuss how we actually express our intent as researchers, which is the most important piece.

    Define behavior and preferences with natural language

    As described, defining what we’re looking for when we start an experiment is vital to getting the correct behavior from an agent.

    Although image reasoning models have come quite far and carry a good bit of context, they still have a way to go before they can understand what a good policy loss curve looks like in Hierarchical Policy Optimization, or what the codebook perplexity should look like in a Vector Quantized Variational Autoencoder, something I’ve been optimizing over the past week.

    For this, we initialize any automated reasoning with a preferences.md.

    Let’s start with some general settings:

    # Experiment Preferences
    
    This file defines my preferences for this experiment.
    The agent should always read this first before taking any action.
    
    ---
    
    ## General Settings
    
    - experiment_name: vqvae
    - container_name: vqvae-train
    - tensorboard_url: http://localhost:6006
    - memory_file: memory.md
    - maximum_adjustments_per_run: 4
    ---
    ## More details
    You can always add more sections here. The read_preferences tool will parse
    and reason over every section.

    Now, let’s define metrics of interest. This is especially important in the case of visual reasoning.

    Within the markdown document, define YAML blocks that the agent parses with the read_preferences tool. Adding this bit of structure helps when passing preferences as arguments to other tools.

    ```yaml
    metrics:
      - name: perplexity
        pattern: should remain high through the course of training
        restart_condition: premature collapse to zero
        hyperparameters: |
            if collapse, increase `perplexity_weight` from its current value to 0.2
      - name: prediction_loss
        pattern: should decrease over the course of training
        restart_condition: increases or stalls
        hyperparameters: |
            if it increases, raise `prediction_weight` from its current value to 0.4
      - name: codebook_usage
        pattern: should remain fixed at > 90%
        restart_condition: drops below 90% for many epochs
        hyperparameters: |
            decrease the `codebook_size` param from 512 to 256.
    
    ```

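Pulling these fenced YAML blocks out of preferences.md takes only a regex and PyYAML. A sketch, assuming every block holds a `metrics:` list whose entries have a `name` key (load_metric_rules is a hypothetical helper, not part of the gist):

```python
import re

import yaml

# Fenced yaml/yml blocks; `{3} avoids writing a literal fence inside this example
YAML_BLOCK_RE = re.compile(r"`{3}ya?ml\s*\n(.*?)`{3}", re.DOTALL)


def load_metric_rules(markdown: str) -> dict:
    """Merge every `metrics:` list found in fenced YAML blocks, keyed by metric name."""
    rules = {}
    for block in YAML_BLOCK_RE.findall(markdown):
        data = yaml.safe_load(block) or {}
        for metric in data.get("metrics", []):
            rules[metric["name"]] = metric
    return rules
```

A tool can then look up rules["perplexity"]["restart_condition"] or rules["perplexity"]["hyperparameters"] when the visual analysis flags that metric, instead of asking the LLM to re-read the whole file.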
    The key idea is that preferences.md should provide enough structured and descriptive detail so the agent can:

    • Compare its analysis against your intent, e.g., if the agent sees validation loss = 0.6 but preferences say val_loss_threshold should be 0.5, it knows what the corrective action should be.
    • Read the thresholds and constraints (YAML or key-value) for metrics, hyperparameters, and container management.
    • Understand intent or intent patterns described in human-readable sections, like “only adjust the learning rate if validation loss exceeds the threshold and accuracy is stagnating.”

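In this setup the comparison itself is left to the LLM. If you want a deterministic pre-check before anything is restarted, a small helper can flag metrics that exceed `<metric>_threshold` values while respecting maximum_adjustments_per_run. This is a hypothetical sketch; it assumes “higher is worse,” which holds for losses but not for accuracy or codebook usage.

```python
def plan_adjustments(observed: dict, thresholds: dict, max_adjustments: int) -> list:
    """Return at most `max_adjustments` (metric, observed, threshold) violations.

    Thresholds are looked up as `<metric>_threshold`; a value above its
    threshold counts as a violation.
    """
    violations = []
    for metric, value in observed.items():
        limit = thresholds.get(f"{metric}_threshold")
        if limit is not None and value > limit:
            violations.append((metric, value, limit))
    # Largest relative overshoot first, then cap to the per-run budget
    violations.sort(
        key=lambda v: (v[1] - v[2]) / abs(v[2]) if v[2] else float("inf"),
        reverse=True,
    )
    return violations[:max_adjustments]
```

With the example above, plan_adjustments({"val_loss": 0.6}, {"val_loss_threshold": 0.5}, 4) flags val_loss, and the agent only has to decide which corrective action from preferences.md applies.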
    Wiring it all together

    Now that we have a containerized experiment + an agent, we need to schedule the agent. This is as simple as running the agent process via a cron job. The entry below runs our agent once every hour, a tradeoff between cost (in tokens) and how quickly problems get caught.

    0 * * * * /usr/bin/python3 /path/to/agent.py >> /var/log/agent.log 2>&1
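Cron starts the agent every hour whether or not the previous cycle has finished, so two agents could in principle restart the same container at once. One hedged safeguard, assuming a Unix host (fcntl is not available on Windows) and a lock path of your choosing, is a non-blocking file lock taken at the top of main():

```python
import fcntl
from pathlib import Path

LOCK_PATH = Path("/tmp/experiment-agent.lock")  # hypothetical location


def acquire_single_run_lock(path: Path = LOCK_PATH):
    """Return an open, exclusively locked file, or None if another run holds it.

    Keep the returned handle alive for the whole run; the lock is released
    when the file is closed or the process exits.
    """
    handle = path.open("w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another agent process holds the lock: skip this cycle
        handle.close()
        return None
    return handle
```

If acquire_single_run_lock() returns None, exit quietly and let the next scheduled run try again. Because the lock is tied to the open file, it is released even if the agent crashes. On Linux, prefixing the cron command with the flock(1) utility (flock -n) achieves the same thing without code changes.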

    I’ve found that this agent doesn’t need the latest reasoning model and performs fine with previous generations from Anthropic and OpenAI.

    Wrapping up

    If research time is finite, it should be spent on research, not babysitting experiments.

    Your agent should handle monitoring, restarts, and parameter adjustments without constant supervision. When the drag disappears, what remains is the actual work: forming hypotheses, designing better models, and testing ideas that matter.

    Hopefully, this agent frees you up a bit to dream up the next big idea. Enjoy.



