
    How to Build an AI-Powered Weather ETL Pipeline with Databricks and GPT-4o: From API To Dashboard

By Editor Times Featured | December 26, 2025 | 12 min read


Once again, Databricks has shaken the data market. The company released a free edition of the Databricks platform with all functionalities included. It is a great resource for learning and testing, to say the least.

With that in mind, I created an end-to-end project to help you learn the fundamentals of the main resources within Databricks.

This project demonstrates a complete Extract, Transform, Load (ETL) workflow within Databricks. It integrates the OpenWeatherMap API for data retrieval and OpenAI's GPT-4o-mini model to provide personalized, weather-based dressing suggestions.

Let's learn more about it.

The Project

The project implements a full data pipeline within Databricks, following these steps.

1. Extract: Fetches current weather data for New York City via the OpenWeatherMap API [1].
2. Transform: Converts UTC timestamps to New York local time and uses OpenAI's [2] GPT-4o-mini to generate personalized dressing suggestions based on the temperature.
3. Load: Persists the data into the Databricks Unity Catalog as both raw JSON files and a structured Delta table (Silver layer).
4. Orchestration: The notebook with this ETL code is added to a job and scheduled to run every hour in Databricks.
5. Analytics: The Silver layer feeds a Databricks dashboard that displays relevant weather information alongside the LLM's suggestions.

Here is the architecture.

Project architecture. Image by the author.

Great. Now that we understand what we need to do, let's move on to the how of this tutorial.

Note: if you still don't have a Databricks account, go to the Databricks Free Edition page [3], click Sign Up for Free Edition, and follow the on-screen prompts to get your free access.

Extract: Integrating the API with Databricks

As I usually say, a data project needs data to begin, right? So our task here is integrating the OpenWeatherMap API to ingest data directly into a PySpark notebook within Databricks. This task may look complicated at first, but trust me, it's not.

On the Databricks home page, create a new notebook using the +New button, then select Notebook.

Create a new notebook. Image by the author.

For the Extract part, we will need:

1. An API key from OpenWeatherMap.

To get one, go to the API's signup page and complete the free registration process. Once logged in to the dashboard, click on the API keys tab, where you will be able to see it.

2. The imported packages.

    # Imports
    import requests
    import json

Next, we're going to create a Python class to modularize our code and make it production-ready as well.

• This class receives the API_KEY we just created, as well as the city and country for the weather fetch.
• It returns the response in JSON format.
# Creating a class to modularize our code

class Weather:

    # Define the constructor
    def __init__(self, API_KEY):
        self.API_KEY = API_KEY

    # Define a method to retrieve weather data
    def get_weather(self, city, country, units='imperial'):
        self.city = city
        self.country = country
        self.units = units

        # Make a GET request to an API endpoint that returns JSON data
        url = f"https://api.openweathermap.org/data/2.5/weather?q={city},{country}&APPID={self.API_KEY}&units={units}"
        response = requests.get(url)

        # Raise on a non-200 status; otherwise parse the response text with .json() and return it
        if response.status_code != 200:
            raise Exception(f"Error: {response.status_code} - {response.text}")
        return response.json()

Good. Now we can run this class. Notice that we use dbutils.widgets.get(). This command looks at the Parameters in the scheduled job, which we will see later in this article. It's a best practice to keep secrets safe.
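One practical note: `dbutils` only exists inside Databricks notebooks. If you want to prototype the same code locally, a common pattern (my own addition, not part of the original project) is to fall back to an environment variable when the widget is unavailable:

```python
import os

def get_secret(name: str) -> str:
    """Read a job parameter via dbutils.widgets when running in Databricks,
    otherwise fall back to an environment variable of the same name."""
    try:
        return dbutils.widgets.get(name)  # only defined inside Databricks notebooks
    except NameError:
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"Secret {name!r} not found in widgets or environment")
        return value
```

Inside a scheduled job the widget value is used; on a laptop you would export the variable (e.g. `export API_KEY=...`) before running.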

# Get the OpenWeatherMap API key
API_KEY = dbutils.widgets.get('API_KEY')

# Instantiate the class
w = Weather(API_KEY=API_KEY)

# Get the weather data
nyc = w.get_weather(city='New York', country='US')
nyc

Here is the response.

{'coord': {'lon': -74.006, 'lat': 40.7143},
 'weather': [{'id': 804,
   'main': 'Clouds',
   'description': 'overcast clouds',
   'icon': '04d'}],
 'base': 'stations',
 'main': {'temp': 54.14,
  'feels_like': 53.44,
  'temp_min': 51.76,
  'temp_max': 56.26,
  'pressure': 992,
  'humidity': 89,
  'sea_level': 992,
  'grnd_level': 993},
 'visibility': 10000,
 'wind': {'speed': 21.85, 'deg': 270, 'gust': 37.98},
 'clouds': {'all': 100},
 'dt': 1766161441,
 'sys': {'type': 1,
  'id': 4610,
  'country': 'US',
  'sunrise': 1766146541,
  'sunset': 1766179850},
 'timezone': -18000,
 'id': 5128581,
 'name': 'New York',
 'cod': 200}

With that response in hand, we can move on to the Transform part of our project, where we will clean and transform the data.

Transform: Formatting the Data

In this section, we will look at the cleaning and transformation tasks performed on the raw data. We will start by selecting the pieces of data needed for our dashboard. That is simply getting data out of a dictionary (or a JSON).

# Getting information
id = nyc['id']
timestamp = nyc['dt']
weather = nyc['weather'][0]['main']
temp = nyc['main']['temp']
tmin = nyc['main']['temp_min']
tmax = nyc['main']['temp_max']
country = nyc['sys']['country']
city = nyc['name']
sunrise = nyc['sys']['sunrise']
sunset = nyc['sys']['sunset']
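Direct indexing like the above will raise a `KeyError` if OpenWeatherMap ever omits a field. For an unattended hourly job, a slightly more defensive variant can be worth it; here is a sketch of my own (the `extract_fields` helper is not part of the original project), using `dict.get` with defaults over the same response structure:

```python
def extract_fields(payload: dict) -> dict:
    """Pull the dashboard fields out of an OpenWeatherMap response,
    tolerating missing keys instead of raising KeyError."""
    main = payload.get('main', {})
    sys_ = payload.get('sys', {})
    weather_list = payload.get('weather') or [{}]
    return {
        'id': payload.get('id'),
        'timestamp': payload.get('dt'),
        'weather': weather_list[0].get('main', 'Unknown'),
        'temp': main.get('temp'),
        'tmin': main.get('temp_min'),
        'tmax': main.get('temp_max'),
        'country': sys_.get('country'),
        'city': payload.get('name'),
        'sunrise': sys_.get('sunrise'),
        'sunset': sys_.get('sunset'),
    }
```

A malformed or partial response then yields `None`/`'Unknown'` fields instead of crashing the run, which you can filter out downstream.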

Next, let's convert the timestamps to the New York time zone, since the data comes in Greenwich (UTC) time.

# Transform sunrise and sunset to datetime in the NYC timezone
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Timestamp, sunrise and sunset to NYC timezone
target_timezone = ZoneInfo("America/New_York")
dt_utc = datetime.fromtimestamp(sunrise, tz=timezone.utc)
sunrise_nyc = str(dt_utc.astimezone(target_timezone).time())  # keep only the sunrise time
dt_utc = datetime.fromtimestamp(sunset, tz=timezone.utc)
sunset_nyc = str(dt_utc.astimezone(target_timezone).time())  # keep only the sunset time
dt_utc = datetime.fromtimestamp(timestamp, tz=timezone.utc)
time_nyc = str(dt_utc.astimezone(target_timezone))
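As a quick sanity check that runs anywhere with Python 3.9+ (`zoneinfo` in the standard library): the sample `dt` value from the response above is a December timestamp, so the conversion should land on Eastern Standard Time (UTC-5):

```python
from datetime import datetime, timezone, timedelta
from zoneinfo import ZoneInfo

# 'dt' epoch from the sample response above: a December timestamp,
# so New York should be on Eastern Standard Time (UTC-5)
sample_dt = 1766161441
nyc_time = datetime.fromtimestamp(sample_dt, tz=timezone.utc).astimezone(ZoneInfo("America/New_York"))

assert nyc_time.utcoffset() == timedelta(hours=-5)  # EST in winter
assert nyc_time.month == 12
```

If the offset came out as -4 hours you would know DST handling (EDT) kicked in, which `ZoneInfo` resolves automatically per date.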

Finally, we format it as a Spark DataFrame.

# Create a dataframe from the variables
df = spark.createDataFrame([[id, time_nyc, weather, temp, tmin, tmax, country, city, sunrise_nyc, sunset_nyc]], schema=['id', 'timestamp', 'weather', 'temp', 'tmin', 'tmax', 'country', 'city', 'sunrise', 'sunset'])

Data cleaned and transformed. Image by the author.

The final step in this section is adding the suggestion from an LLM. In this step, we're going to pick some of the data fetched from the API and pass it to the model, asking it to return a suggestion of how a person might dress to be prepared for the weather.

• You'll need an OpenAI API key.
• Pass the weather condition and the max and min temperatures (weather, tmax, tmin).
• Ask the LLM to return a suggestion about how to dress for the weather.
• Add the suggestion to the final dataframe.
%pip install openai --quiet
from openai import OpenAI
import pyspark.sql.functions as F
from pyspark.sql.functions import col

# Get the OpenAI key
OPENAI_API_KEY = dbutils.widgets.get('OPENAI_API_KEY')

client = OpenAI(
    # This is the default and can be omitted
    api_key=OPENAI_API_KEY
)

response = client.responses.create(
    model="gpt-4o-mini",
    instructions="You are a weatherman that gives suggestions about how to dress based on the weather. Answer in one sentence.",
    input=f"The weather is {weather}, with max temperature {tmax} and min temperature {tmin}. How should I dress?"
)

suggestion = response.output_text

# Add the suggestion to the df
df = df.withColumn('suggestion', F.lit(suggestion))
display(df)
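One caveat worth flagging: since this notebook will run unattended every hour, a transient OpenAI error would fail the whole job run. A small guard of my own (the `safe_suggestion` helper and its fallback text are not part of the original project) keeps the pipeline running with a generic suggestion instead:

```python
def safe_suggestion(fetch, fallback="Check the forecast and dress in layers."):
    """Run the LLM call, but swallow API failures so one outage
    doesn't fail the scheduled job; return a generic fallback instead."""
    try:
        return fetch()
    except Exception:
        return fallback

# Hypothetical usage with the client defined above:
# suggestion = safe_suggestion(lambda: client.responses.create(...).output_text)
```

Whether to fail loudly or degrade gracefully is a design choice; for a convenience feature like a dressing tip, a fallback seems preferable to a broken dashboard.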

Cool. We're almost done with the ETL. Now it's all about loading the data. That's the next section.

Load: Saving the Data and Creating the Silver Layer

The last piece of the ETL is loading the data. We will load it in two different ways.

1. Persisting the raw files in a Unity Catalog Volume.
2. Saving the transformed dataframe directly into the Silver layer, which is a Delta table ready for dashboard consumption.

Let's create a catalog that will hold all the weather data we get from the API.

-- Creating a catalog
CREATE CATALOG IF NOT EXISTS pipeline_weather
COMMENT 'This is the catalog for the weather pipeline';

Next, we create a schema for the Lakehouse. This one will store the volume with the raw JSON files fetched.

-- Creating a schema
CREATE SCHEMA IF NOT EXISTS pipeline_weather.lakehouse
COMMENT 'This is the schema for the weather pipeline';

    Now, we create the amount for the uncooked recordsdata.

    -- Let's create a quantity
    CREATE VOLUME IF NOT EXISTS pipeline_weather.lakehouse.raw_data
    COMMENT 'That is the uncooked information quantity for the climate pipeline';

We also create another schema to hold the Silver layer Delta table.

-- Creating a schema to hold transformed data
CREATE SCHEMA IF NOT EXISTS pipeline_weather.silver
COMMENT 'This is the schema for the weather pipeline';

Once we have everything set up, this is how our catalog looks.

Catalog ready to receive data. Image by the author.

Now, let's save the raw JSON response into our raw volume. To keep everything organized and prevent overwriting, we'll attach a unique timestamp to each filename.

By appending these files to the volume rather than just overwriting them, we're creating a reliable "audit trail". This acts as a safety net, meaning that if a downstream process fails or we run into data loss later, we can always go back to the source and re-process the original data whenever we need it.

# Get timestamp
stamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')

# Path to save to
json_path = f'/Volumes/pipeline_weather/lakehouse/raw_data/weather_{stamp}.json'

# Save the data into a JSON file
df.write.mode('append').json(json_path)

While we keep the raw JSON as our "source of truth," saving the cleaned data into a Delta table in the Silver layer is where the real magic happens. By using .mode("append") and the Delta format, we ensure our data is structured, schema-enforced, and ready for high-speed analytics or BI tools. This layer transforms messy API responses into a reliable, queryable table that grows with every pipeline run.

# Save the transformed data into a table (schema)
(
    df
    .write
    .format('delta')
    .mode("append")
    .saveAsTable('pipeline_weather.silver.weather')
)

Beautiful! With this all set, let's check how our table looks now.

Silver layer table. Image by the author.

Let's start automating this pipeline now.

Orchestration: Scheduling the Notebook to Run Automatically

Moving on with the project, it's time to make this pipeline run on its own, with minimal supervision. For that, Databricks has the Jobs & Pipelines tab, where we can easily schedule jobs to run.

1. Click the Jobs & Pipelines tab on the left panel.
2. Find the Create button and select Job.
3. Click on Notebook to add it to the job.
4. Configure it like the figure below.
5. Add the API keys to the Parameters.
6. Click Create task.
7. Click Run Now to test that it works.
Adding a notebook to the job. Image by the author.

Once you click the Run Now button, it should start running the notebook and display the Succeeded message.

Job runs. Image by the author.

If the job is working fine, it's time to schedule it to run automatically.

1. Click Add trigger on the right side of the screen, right under the Schedules & Triggers section.
2. Trigger type = Scheduled.
3. Schedule type: select Advanced.
4. Select Every 1 hour from the drop-downs.
5. Save it.

Excellent. Our pipeline is on auto mode now! Every hour, the system will hit the OpenWeatherMap API, get fresh weather information for NYC, and save it to our Silver layer table.
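For the curious: under the hood, the UI schedule is stored on the job as a Quartz cron expression. The relevant part of the job definition is shaped roughly like this (a sketch based on the Databricks Jobs API, with a placeholder job name, not an export from this project):

```json
{
  "name": "weather-etl-hourly",
  "schedule": {
    "quartz_cron_expression": "0 0 * * * ?",
    "timezone_id": "America/New_York",
    "pause_status": "UNPAUSED"
  }
}
```

In Quartz syntax, `0 0 * * * ?` means second 0, minute 0, every hour, every day, which matches the "Every 1 hour" drop-down selection.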

Analytics: Building a Dashboard for Data-Driven Decisions

The last piece of this puzzle is creating the analytics deliverable, which will show the weather information and give the user actionable guidance about how to dress for the weather outside.

1. Click on the Dashboards tab on the left side panel.
2. Click on the Create dashboard button.
3. It will open a blank canvas for us to work on.
Dashboard started. Image by the author.

Now, dashboards work based on data fetched from SQL queries. Therefore, before we start adding text and graphics to the canvas, first we need to create some metrics that will be the variables feeding the dashboard cards and graphics.

So, click on the +Create from SQL button to start a metric and give it a name. For example, for Location, to retrieve the latest fetched city name, I need the query that follows.

-- Get the latest city name fetched
SELECT city
FROM pipeline_weather.silver.weather
ORDER BY timestamp DESC
LIMIT 1
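The other metrics follow the same shape. For instance, a hypothetical card for the latest temperature readings and the LLM's suggestion (using the column names defined when we built the Silver table) could be:

```sql
-- Latest temperatures and dressing suggestion
SELECT temp, tmin, tmax, suggestion
FROM pipeline_weather.silver.weather
ORDER BY timestamp DESC
LIMIT 1
```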

And we must create one SQL query for each metric. You can see all of them in the GitHub repository [4].

Next, we click on the Dashboard tab and start dragging and dropping components onto the canvas.

Dashboard creation components menu. Image by the author.

Once you click on Text, it lets you insert a box into the canvas and edit the text. When you click on the graphic element, it inserts a placeholder for a graphic and opens the right-side menu for selecting the variables and configuration.

Interacting with dashboards in Databricks. Image by the author.

Okay. After all the elements are added, the dashboard will look like this.

Completed dashboard. Image by the author.

So good! And that concludes our project.

    Earlier than You Go

You can easily replicate this project in about an hour, depending on your experience with the Databricks ecosystem. While it's a quick build, it packs a lot in terms of the core engineering skills you get to exercise:

• Architectural design: You'll learn how to structure a modern Lakehouse environment from the ground up.
• Seamless data integration: You'll bridge the gap between external web APIs and the Databricks platform for real-time data ingestion.
• Clean, modular code: We move beyond simple scripts by using Python classes and functions to keep the codebase organized and maintainable.
• Automation & orchestration: You'll get hands-on experience scheduling jobs to ensure your project runs reliably on autopilot.
• Delivering real value: The goal isn't just to move data; it's to provide value. By transforming raw weather metrics into actionable dressing suggestions via AI, we turn "cold data" into a useful service for the end user.

If you liked this content, find my contacts and more about me on my website.

    https://gustavorsantos.me

    GitHub Repository

Here is the repository for this project.

    https://github.com/gurezende/Databricks-Weather-Pipeline

    References

[1] OpenWeatherMap API: https://openweathermap.org/

[2] OpenAI Platform: https://platform.openai.com/

[3] Databricks Free Edition: https://www.databricks.com/learn/free-edition

[4] GitHub Repository: https://github.com/gurezende/Databricks-Weather-Pipeline


