
    Is Your Model Time-Blind? The Case for Cyclical Feature Encoding

    By Editor Times Featured | December 25, 2025 | 8 Mins Read


    The Midnight Paradox

    Think about this. You're building a model to predict electricity demand or taxi pickups. So you feed it time (say, minutes) starting at midnight. Clean and simple. Right?

    Now your model sees 23:59 (minute 1439 of the day) and 00:01 (minute 1 of the day). To you, they're two minutes apart. To your model, they're very far apart. That's the midnight paradox. And yes, your model is probably time-blind.

    Why does this happen?

    Because most machine learning models treat numbers as straight lines, not circles.

    Linear regression, KNN, SVMs, and even neural networks take numbers literally, assuming higher numbers are "more" than lower ones. They don't know that time wraps around. Midnight is the edge case they never forgive.

    If you've ever added hourly features to your model without success, wondering later why it struggles around day boundaries, this is likely why.
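    To see the paradox in numbers, here's a minimal sketch comparing the naive linear distance with the true wrap-around distance between 23:59 and 00:01 (the variable names are mine, purely for illustration):

```python
# Minutes since midnight for 23:59 and 00:01
a, b = 1439, 1
period = 1440  # minutes in a day

# What a linear model "sees": 1438 minutes apart
linear_dist = abs(a - b)

# The true cyclic distance: go the short way around the clock
cyclic_dist = min(linear_dist, period - linear_dist)

print(linear_dist, cyclic_dist)  # 1438 vs 2
```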

    The Failure of Standard Encoding

    Let's talk about the standard approaches. You've probably used at least one of them.

    You encode hours as numbers from 0 to 23. Now there's an artificial cliff between hour 23 and hour 0: the model thinks midnight is the biggest jump of the day. But is midnight really more different from 11 PM than 10 PM is from 9 PM?

    Of course not. But your model doesn't know that.

    Here's how the hours look in "linear" mode.

    # Imports
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Generate data: the 24 hours of today
    date_today = pd.to_datetime('today').normalize()
    datetime_24_hours = pd.date_range(start=date_today, periods=24, freq='h')
    df = pd.DataFrame({'dt': datetime_24_hours})
    df['hour'] = df['dt'].dt.hour

    # Calculate sine and cosine
    df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
    df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)

    # Plot the hours in linear mode
    plt.figure(figsize=(15, 5))
    plt.plot(df['hour'], [1]*24, linewidth=3)
    plt.title('Hours in Linear Mode')
    plt.xlabel('Hour')
    plt.xticks(np.arange(0, 24, 1))
    plt.ylabel('Value')
    plt.show()

    Hours in linear mode. Image by the author.

    What if we one-hot encode the hours? Twenty-four binary columns. Problem solved, right? Well… partially. You fixed the artificial gap, but you lost proximity: 2 AM is no longer closer to 3 AM than to 10 PM. You also exploded dimensionality. For trees, that's annoying. For linear models, it's likely inefficient.
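    A quick sketch of the proximity problem, assuming `pandas.get_dummies` for the one-hot step: every pair of distinct hours ends up exactly the same distance apart.

```python
import numpy as np
import pandas as pd

# One-hot encode the 24 hours: 24 binary columns
hours = pd.Series(range(24))
one_hot = pd.get_dummies(hours, prefix='hour').astype(int)

# Euclidean distance between rows: 2 AM vs 3 AM, and 2 AM vs 10 PM
d_near = np.linalg.norm(one_hot.iloc[2] - one_hot.iloc[3])
d_far = np.linalg.norm(one_hot.iloc[2] - one_hot.iloc[22])

print(one_hot.shape)                        # 24 rows, 24 columns
print(round(d_near, 3), round(d_far, 3))    # identical: proximity is gone
```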

    So, let's move on to a better alternative.

    The Solution: Trigonometric Mapping

    Here's the mindset shift:

    Stop thinking about time as a line. Think about it as a circle.

    A 24-hour day loops back on itself, so your encoding should loop too. Each hour is an evenly spaced point on a circle. To represent a point on a circle, you don't use one number; you use two coordinates, x and y.

    That's where sine and cosine come in.

    The geometry behind it

    Every angle on a circle maps to a unique point via its sine and cosine. This gives your model a smooth, continuous representation of time.

    plt.figure(figsize=(5, 5))
    plt.scatter(df['hour_sin'], df['hour_cos'], linewidth=3)
    plt.title('Hours in Cyclical Mode')
    plt.xlabel('hour_sin')
    plt.ylabel('hour_cos')
    plt.show()

    Hours in cyclical mode after sine and cosine. Image by the author.

    Here's the math behind the cyclical encoding for hours of the day:

    • First, 2 * π * hour / 24 converts each hour into an angle. Midnight and 11 PM end up almost at the same place on the circle.
    • Then sine and cosine project that angle into two coordinates.
    • Together, these two values uniquely define the hour. Now 23:00 and 00:00 are close in feature space. Exactly what you wanted all along.

    The same idea works for minutes, days of the week, or months of the year.
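    As a sketch, the transformation generalizes to any period with a small helper (the function name `encode_cyclical` is my own, not from the article's repository):

```python
import numpy as np
import pandas as pd

def encode_cyclical(df, col, period):
    """Add <col>_sin and <col>_cos columns for a feature repeating every `period` units."""
    df[f'{col}_sin'] = np.sin(2 * np.pi * df[col] / period)
    df[f'{col}_cos'] = np.cos(2 * np.pi * df[col] / period)
    return df

df = pd.DataFrame({'hour': range(24), 'month': list(range(1, 13)) * 2})
df = encode_cyclical(df, 'hour', 24)    # hours of the day
df = encode_cyclical(df, 'month', 12)   # months of the year

# 23:00 and 00:00 are now close in feature space
gap = np.hypot(df.loc[23, 'hour_sin'] - df.loc[0, 'hour_sin'],
               df.loc[23, 'hour_cos'] - df.loc[0, 'hour_cos'])
print(round(gap, 3))  # ~0.261, instead of a "distance" of 23
```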

    Code

    Let's experiment with the Appliances Energy Prediction dataset [4]. We'll try to improve the predictions using a Random Forest Regressor (a tree-based model).

    Candanedo, L. (2017). Appliances Energy Prediction [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5VC8G. Creative Commons 4.0 License.

    # Imports
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import root_mean_squared_error
    from ucimlrepo import fetch_ucirepo

    Get the data.

    # Fetch dataset
    appliances_energy_prediction = fetch_ucirepo(id=374)

    # Data (as pandas dataframes)
    X = appliances_energy_prediction.data.features
    y = appliances_energy_prediction.data.targets

    # Combine into one dataframe and extract the time parts
    df = pd.concat([X, y], axis=1)
    df['date'] = df['date'].apply(lambda x: x[:10] + ' ' + x[11:])
    df['date'] = pd.to_datetime(df['date'])
    df['month'] = df['date'].dt.month
    df['day'] = df['date'].dt.day
    df['hour'] = df['date'].dt.hour
    df.head(3)

    Let's build a quick model with the linear time features first, as our baseline for comparison.

    # X and y
    # Alternative: X = df.drop(['Appliances', 'rv1', 'rv2', 'date'], axis=1)
    X = df[['hour', 'day', 'T1', 'RH_1', 'T_out', 'Press_mm_hg', 'RH_out', 'Windspeed', 'Visibility', 'Tdewpoint']]
    y = df['Appliances']

    # Train test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit the model
    rf = RandomForestRegressor().fit(X_train, y_train)

    # Training score
    print(f'Score: {rf.score(X_train, y_train)}')

    # Test RMSE
    y_pred = rf.predict(X_test)
    rmse = root_mean_squared_error(y_test, y_pred)
    print(f'RMSE: {rmse}')

    Here are the baseline results.

    Score: 0.9395797670166536
    RMSE: 63.60964667197874

    Next, we'll encode the cyclical time components (day and hour) and retrain the model.

    # Add cyclical sine and cosine features for hour and day
    df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
    df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
    df['day_sin'] = np.sin(2 * np.pi * df['day'] / 31)
    df['day_cos'] = np.cos(2 * np.pi * df['day'] / 31)

    # X and y
    X = df[['hour_sin', 'hour_cos', 'day_sin', 'day_cos', 'T1', 'RH_1', 'T_out', 'Press_mm_hg', 'RH_out', 'Windspeed', 'Visibility', 'Tdewpoint']]
    y = df['Appliances']

    # Train test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit the model
    rf_cycle = RandomForestRegressor().fit(X_train, y_train)

    # Training score
    print(f'Score: {rf_cycle.score(X_train, y_train)}')

    # Test RMSE
    y_pred = rf_cycle.predict(X_test)
    rmse = root_mean_squared_error(y_test, y_pred)
    print(f'RMSE: {rmse}')

    And the results: roughly a 1% improvement in the score and about 1 point off the RMSE.

    Score: 0.9416365489096074
    RMSE: 62.87008070927842

    I know this doesn't look like much, but remember that this toy example uses a simple out-of-the-box model without any data treatment or cleanup. What we're seeing is mostly the effect of the sine and cosine transformation.

    What's really happening here is that, in real life, electricity demand doesn't reset at midnight. And now your model finally sees that continuity.

    Why You Need Both Sine and Cosine

    Don't fall into the temptation of using only sine because it feels sufficient. One column instead of two. Cleaner, right?

    Unfortunately, it breaks uniqueness. On a 24-hour clock, 3 AM and 9 AM produce the same sine value. Different times with identical encodings are bad, because the model now confuses one with the other. Not ideal, unless you enjoy confused predictions.

    Using both sine and cosine fixes this. Together, they give each hour a unique fingerprint on the circle. Think of it like latitude and longitude: you need both to know where you are.
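    A quick sketch of the collision: sine alone maps 3 AM and 9 AM to the same value, and cosine is what breaks the tie (the helper name `to_circle` is mine, for illustration only).

```python
import numpy as np

def to_circle(hour):
    """Map an hour of the day to its (sin, cos) point on the unit circle."""
    angle = 2 * np.pi * hour / 24
    return np.sin(angle), np.cos(angle)

s3, c3 = to_circle(3)   # 3 AM -> sin ~0.707, cos ~0.707
s9, c9 = to_circle(9)   # 9 AM -> sin ~0.707, cos ~-0.707

print(np.isclose(s3, s9))   # True: sine alone can't tell them apart
print(np.isclose(c3, c9))   # False: cosine distinguishes them
```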

    Real-World Impact & Results

    So, does this actually help models? Yes. Especially certain ones.

    Distance-based models

    KNN and SVMs rely heavily on distance calculations. Cyclical encoding prevents fake "long distances" at boundaries. Your neighbors actually become neighbors again.

    Neural networks

    Neural networks learn faster with smooth feature spaces. Cyclical encoding removes the sharp discontinuity at midnight. That usually means faster convergence and better stability.

    Tree-based models

    Gradient boosted trees like XGBoost or LightGBM can eventually learn these patterns on their own, but cyclical encoding gives them a head start. If you care about performance and interpretability, it's worth it.

    When Should You Use This?

    Always ask yourself: does this feature repeat in a cycle? If yes, consider cyclical encoding.

    Common examples:

    • Hour of day
    • Day of week
    • Month of year
    • Wind direction (degrees)

    If it loops, you can try encoding it like a loop.
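    As a sketch for two of these, here's day of week (period 7) and wind direction (period 360°), where the encoded distance between a 350° and a 10° wind correctly comes out small (the column names are my own assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'day_of_week': [0, 6],        # Monday, Sunday
                   'wind_deg': [350.0, 10.0]})   # two nearly-north winds

# Encode each feature with its own period
df['dow_sin'] = np.sin(2 * np.pi * df['day_of_week'] / 7)
df['dow_cos'] = np.cos(2 * np.pi * df['day_of_week'] / 7)
df['wind_sin'] = np.sin(np.deg2rad(df['wind_deg']))   # period = 360 degrees
df['wind_cos'] = np.cos(np.deg2rad(df['wind_deg']))

# Raw degrees say 350 and 10 are 340 apart; the circle says otherwise
raw_gap = abs(df.loc[0, 'wind_deg'] - df.loc[1, 'wind_deg'])
circle_gap = np.hypot(df.loc[0, 'wind_sin'] - df.loc[1, 'wind_sin'],
                      df.loc[0, 'wind_cos'] - df.loc[1, 'wind_cos'])
print(raw_gap, round(circle_gap, 3))  # 340.0 vs ~0.347
```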

    Before You Go

    Time is not just a number. It's a coordinate on a circle.

    If you treat it like a straight line, your model can stumble at boundaries and struggle to see that variable as a cycle, something that repeats and has a pattern.

    Cyclical encoding with sine and cosine fixes this elegantly, preserving proximity, reducing artifacts, and helping models learn faster.

    So next time your predictions look weird around day changes, try this tool and let your model shine as it should.

    If you liked this content, find more of my work and my contacts on my website.

    https://gustavorsantos.me

    GitHub Repository

    Here's the complete code for this exercise.

    https://github.com/gurezende/Time-Series/tree/main/Sine%20Cosine%20Time%20Encode

    References & Further Reading

    [1] Encoding cyclical features (Stack Exchange): https://stats.stackexchange.com/questions/451295/encoding-cyclical-feature-minutes-and-hours

    [2] NumPy trigonometric functions: https://numpy.org/doc/stable/reference/routines.math.html

    [3] Practical discussion on cyclical features (Kaggle): https://www.kaggle.com/code/avanwyk/encoding-cyclical-features-for-deep-learning

    [4] Appliances Energy Prediction Dataset: https://archive.ics.uci.edu/dataset/374/appliances+energy+prediction


