
    Google Trends is Misleading You: How to Do Machine Learning with Google Trends Data

    By Editor Times Featured | January 21, 2026 | 11 Mins Read


    Google Trends. What a gift to society that is. If not for Google Trends, how would we have ever known that more Disney movies released in the 2000s led to fewer divorces in the UK? Or that drinking Coca-Cola is an unknown remedy for cat scratches?

    Wait, am I getting confused by correlation vs causation once more?

    If you prefer watching to reading, you can do so right here:

    Google Trends is one of the most widely used tools for analysing human behaviour at scale. Journalists use it. Data scientists use it. Entire papers are built on it. But there’s a fundamental property of Google Trends data that makes it very easy to misuse, especially if you’re working with time series or trying to build models, and most people never realise they’re doing it.

    All charts and screenshots are created by the author unless stated otherwise.

    The Problem with Google Trends Data

    Google doesn’t actually publish figures on their search volume. That information prints dollars for them and there’s no way they’d open that up for other people to monetise. But what they do give us is a way to see a time series, to understand changes in people’s searches for a particular term, and the way they do that is by giving us a normalised set of data.

    This doesn’t sound like a problem until you try to do some machine learning with it. Because when it comes to getting a machine to learn anything, we need to give it a lot of data.

    My initial idea was to grab a window of five years but I immediately had a problem: the larger the time window, the less granular the data. I couldn’t get daily data for five years and while I then thought “just take the maximum time period you can get daily data for and move that window”, that was a problem too. Because it was here that I discovered the true terror of normalisation:

    Whatever time period I use or whatever single search term I use, the data point with the highest number of searches is immediately set to 100. That means the meaning of 100 changes with every window I use.

    This entire post exists for this reason.

    Google Trends Basics

    Now, I don’t know if you’ve used Google Trends before but if you haven’t, I’m going to talk you through it so we can get to the meat of the problem.

    So I’m going to search the word “motivation” and it’s going to default to the UK, because that’s where I’m from, and to the past day. We have a lovely graph which shows how often people were searching the word “motivation” in the last 24 hours.

    24 Hours of Motivation in the UK, Screenshot by Author

    I love this because you can see really clearly that people are mostly searching for motivation during the working day, no one is searching it when most of the country is asleep and there’s definitely a couple of kids needing some encouragement for their homework. I don’t have an explanation for the late night searches but I’d kind of guess these are people not ready to go back to work tomorrow.

    Now this is lovely, but while eight-minute increments over 24 hours do give us a nice 180 data points to use, most of them are actually zero. I also don’t know if the past 24 hours have been incredibly demotivating compared to the rest of the year or if today represents the year’s highest GDP contribution, so I’m going to increase the window a little bit.

    The moment we go to a week, the first thing you notice is that the data is a lot less granular. We have a week of data but now it’s only hourly and I still have the same core problem of not knowing how representative this week is.

    I can keep zooming out. 30 days, 90 days. At each point we lose granularity and don’t have anywhere near as many data points as we did for 24 hours. If I’m going to build an actual model, this isn’t going to cut it. I need to go big.

    And when I select five years, we encounter the problem that motivated this entire video (excuse the pun, that was unintentional): I can’t get daily data. And also, why is today not at 100 anymore?

    Five years of UK motivation searches, Screenshot by Author

    Herein lies the real problem with Google Trends data

    As I mentioned earlier, Google Trends data is normalised. This means that whatever time period I use or whatever single search term I use, the data point with the highest number of searches is immediately set to 100. All the other points are scaled down accordingly. If the 1st of April had half the searches of the maximum, then the 1st of April is going to have a Google Trends score of 50.
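To make the effect concrete, here is a minimal sketch of that normalisation applied to made-up numbers. The function name `normalise` and the raw counts are purely illustrative assumptions, since Google does not publish real search volumes or its exact procedure.

```python
def normalise(raw_counts):
    """Scale a window so its largest value becomes 100, rounded to whole numbers."""
    peak = max(raw_counts)
    return [round(100 * value / peak) for value in raw_counts]


# Hypothetical raw search counts for two adjacent windows.
first_window = [400, 500, 450]
second_window = [300, 405, 350]

print(normalise(first_window))                  # first window on its own
print(normalise(second_window))                 # second window on its own
print(normalise(first_window + second_window))  # both in a single window
```

Each window contains a 100 when normalised on its own, even though the second window's peak is lower in raw terms. Only the combined window shows how the two peaks relate.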

    So let’s look at an example to illustrate the point. Let’s take the months of May and June 2025. Each is 30 or 31 days long, so we get daily data here (we lose it beyond 90 days). If I look at May, you can see we’re scaled so we hit 100 on the 13th, and in June we hit it on the 10th. So does that mean motivation was searched just as often on the 10th of June as it was on the 13th of May?

    Google trends data for May, Screenshot by Author
    Google trends data for June, Screenshot by Author

    If I zoom out now so that I have May and June on the same graph, you can immediately see that that’s not the case. When both months are included, we see that the searches for motivation had a Google Trends score of 81 on the 10th of June, meaning that, as a proportion of searches in the UK, it was 81% of the proportion of searches on the 13th of May. If we didn’t zoom out, we wouldn’t have known that.

    May and June on the same graph, screenshot by Author

    Now all is not lost. We did get a good bit of information from this experiment: we can see the relative difference between two data points if they’re both included in the same graph. So if we did load May and June separately, knowing that the 10th of June is 81% of the 13th of May means we can scale June down accordingly and the data will be comparable.

    So that’s what I decided I’d do. I’d fetch my Google Trends data with a one-day overlap on each window, so 1st of Jan to 31st of March, then 31st of March to 30th of June. Then I could use March 31st in both data sets to scale the second set to be comparable to the first.
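The one-day overlap idea can be sketched as follows. This is an illustration under assumptions, not the author's code: the function name `stitch_on_shared_day`, the `(date_label, score)` pair layout and the sample scores are all hypothetical.

```python
def stitch_on_shared_day(first, second):
    """Rescale `second` onto the scale of `first` using one shared day.

    Both arguments are lists of (date_label, score) pairs. The last entry
    of `first` must be the same calendar day as the first entry of `second`.
    """
    shared_label, anchor_score = first[-1]
    second_label, second_score = second[0]
    if shared_label != second_label:
        raise ValueError("windows do not share their boundary day")
    if second_score == 0:
        raise ValueError("cannot scale from a zero overlap value")
    # Multiply before dividing to keep the arithmetic as exact as possible.
    rescaled = [
        (label, score * anchor_score / second_score)
        for label, score in second[1:]
    ]
    return first + rescaled


# Made-up scores: 31 March reads 40 in one window and 50 in the next.
q1 = [("03-29", 60), ("03-30", 80), ("03-31", 40)]
q2 = [("03-31", 50), ("04-01", 100), ("04-02", 75)]
print(stitch_on_shared_day(q1, q2))
```

Everything in the second window is multiplied by 40/50, so its 100 becomes 80 on the first window's scale.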

    But while this is close to something we can use, there’s one more problem I need to make you aware of.

    Google Trends: Another Layer of Randomness

    So when it comes to Google Trends data, Google isn’t actually tracking every single search. That would be a computational nightmare. Instead, Google uses sampling techniques to build a representation of search volumes.

    This means that while the sample is likely very well-built, it is Google after all, each day will have some natural random variation. If by chance March 31st was a day where Google’s sample happened to be unusually high or low compared to the real world, our overlap method would introduce an error into our entire data set.

    On top of this, we also have to consider rounding. Google trends rounds everything to the nearest whole number. There’s no 50.5, it’s 50 or it’s 51. Now this seems like a small detail but it can actually become a big problem. Let me show you why.

    On the 4th of October 2021, there was a massive spike in searches for Facebook. This massive spike gets scaled to 100 and as a result everything else in that period is much closer to zero. When you’re rounding to the nearest whole number that tiny error of 0.5 suddenly becomes a huge proportional error when your number is only 1 or 2. This means that our solution has to be robust enough to handle noise, not just scaling.
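A quick back-of-the-envelope sketch shows how the same half-point of rounding weighs on small and large scores. The helper name `worst_case_relative_error` is an illustrative assumption, and the calculation only covers rounding, not sampling noise.

```python
def worst_case_relative_error(score):
    """Largest rounding error (0.5) expressed as a fraction of the displayed score."""
    return 0.5 / score


for score in (1, 2, 10, 50, 100):
    error = worst_case_relative_error(score)
    print(f"score {score:>3}: up to {error:.1%} off")
```

A displayed score of 1 could hide a true value anywhere from roughly 0.5 to 1.5, which is why a window dominated by one huge spike makes every other day unreliable as a scaling anchor.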

    So how do we solve this? Well we know that on average the samples will be representative, so let’s just take a bigger sample. If we use a larger window to get our overlap, the random variation and rounding errors have less of an impact.

    So here’s the final plan. I know I can get daily data for up to 90 days. I’m going to load a rolling window of 90-day periods but I’ll make sure each window overlaps by a full month with the next. That way, our overlap isn’t just one potentially noisy day but a stable month-long anchor that we can use to scale our data more accurately.
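As a rough sketch of that plan, the date ranges for the rolling windows could be generated like this. The function name `overlapping_windows` and its parameters are hypothetical, and treating the month-long overlap as exactly 30 days is an assumption.

```python
from datetime import date, timedelta


def overlapping_windows(start, end, window_days=90, overlap_days=30):
    """Yield (window_start, window_end) date pairs covering start..end inclusive.

    Each window spans `window_days` days and shares `overlap_days` days
    with the window that follows it. The last window may be shorter.
    """
    if not 0 < overlap_days < window_days:
        raise ValueError("overlap_days must be between 0 and window_days")
    step = timedelta(days=window_days - overlap_days)
    span = timedelta(days=window_days - 1)
    window_start = start
    while True:
        window_end = min(window_start + span, end)
        yield window_start, window_end
        if window_end >= end:
            break
        window_start += step


for window_start, window_end in overlapping_windows(date(2021, 1, 1), date(2021, 6, 30)):
    print(window_start.isoformat(), window_end.isoformat())
```

Each pair can then be turned into whatever date-range format the data source expects, with one request per window.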

    So it sounds like we’ve got a plan. I’ve got some concerns, mainly that by having lots of batches there’s going to be compounding errors and it could result in big numbers absolutely blowing up. But in order to see how this shakes out with real data we have to go and do it. So here’s one I made earlier.
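For readers who want to try the chaining step themselves, the following is a minimal sketch and not the author's implementation. The function `chain_windows`, the dict-of-scores layout and the choice to scale by the ratio of sums over the overlap are all assumptions. It also returns the scale factor applied to each window, which is where compounding errors would show up.

```python
def chain_windows(windows):
    """Stitch overlapping windows into one series on the first window's scale.

    `windows` is a list of dicts mapping a date key to a score, in
    chronological order. Returns (chained_series, factors), where
    factors[i] is the multiplier applied to windows[i].
    """
    chained = dict(windows[0])
    factors = [1.0]
    for window in windows[1:]:
        shared = [day for day in window if day in chained]
        if not shared:
            raise ValueError("consecutive windows must overlap")
        new_total = sum(window[day] for day in shared)
        if new_total == 0:
            raise ValueError("overlap is all zeros, cannot derive a scale factor")
        # Ratio of sums over the overlap (an assumed choice of estimator).
        factor = sum(chained[day] for day in shared) / new_total
        factors.append(factor)
        for day, score in window.items():
            if day not in chained:  # keep the earlier values on overlap days
                chained[day] = score * factor
    return chained, factors


# Made-up scores keyed by day number, each window normalised separately.
window_a = {1: 50, 2: 100, 3: 50, 4: 25}
window_b = {3: 100, 4: 50, 5: 20}
window_c = {5: 50, 6: 100}
series, factors = chain_windows([window_a, window_b, window_c])
print(series)
print(factors)
```

Because each factor is computed against values that were themselves rescaled, an error in one overlap carries into every later window, which is exactly the compounding risk raised above.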

    Writing Code to Figure Out Google Trends

    After writing up everything we’ve discussed in code form and, after having some fun getting temporarily banned from google trends for pulling too much data, I’ve put together some graphs. My immediate reaction when I saw this was: “Oh no, it blew up”.

    Those big spikes are a little scary for our project, Image by Author

    The graph below shows my chained-together five years of search volumes for Facebook. You’ll see a pretty steady downward trend but two spikes stand out. The first of these was the massive spike on 4th October 2021 that we mentioned earlier.

    These spikes are even scarier, Image by Author

    My first thought was to verify the spikes. I, unironically, googled it and found out about widespread Meta outages that day. I pulled data for Instagram and Whatsapp over the same period and saw similar spikes. So I knew the spike was real but I still had a question: Was it too big?

    When I put my time series side-by-side with Google Trends’ own graph, my heart sank. My spikes were huge in comparison. I started thinking about how to handle this. Should I cap the maximum spike value? That felt arbitrary and would lose information about the relative sizes of spikes. Should I apply an arbitrary scaling factor? Again, it felt like a guess.

    Five years of Facebook searches on google trends, Screenshot by Author

    That was until I had a bolt of inspiration. Remember, Google Trends is giving us weekly data for this period, that’s the whole reason we’re doing this. What if I averaged my data for that week to see how it compared to Google’s weekly value?

    This is where I breathed a huge sigh of relief. That week was the biggest spike on Google Trends so set to 100. When I averaged my data for the same week, I got 102.8. Incredibly close to Google Trends. We also finish in about the same place. This means the compounding errors from my scaling method haven’t blown up my data. I have something that looks and behaves just like the Google Trends data!
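The same sanity check can be sketched in a few lines: average the chained daily values over each week and compare the result with the weekly figure from the long-range graph. The function name `weekly_means` and the numbers below are illustrative assumptions, not the author's data, and the sketch assumes the daily series starts on the same weekday as Google's weekly buckets.

```python
from statistics import mean


def weekly_means(daily_scores, days_per_week=7):
    """Average consecutive 7-day blocks of a daily series.

    Any trailing partial week is dropped.
    """
    full = len(daily_scores) - len(daily_scores) % days_per_week
    return [
        mean(daily_scores[i:i + days_per_week])
        for i in range(0, full, days_per_week)
    ]


# Two made-up weeks of chained daily values, the second containing a spike.
daily = [10] * 7 + [30, 250, 120, 90, 80, 70, 60]
print(weekly_means(daily))
```

A single day can sit far above 100 on the chained scale while the weekly average still lands near the weekly value, which is why the comparison has to be made at the same granularity.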

    So now we have a robust methodology for creating a clean, comparable daily time series for any search term. Which is great. But what if we actually want to do something useful with it, like comparing search terms around the world for example?

    Because while Google Trends allows you to compare multiple search terms it doesn’t allow direct comparison of multiple countries. So I can grab a dataset of motivation for each country using the method we’ve discussed today, but how do I make them comparable? Facebook is part of the solution.

    But this solution is one for a later blog post, one in which we’re going to build a “basket of goods” to compare countries and see exactly how Facebook fits into all of this.

    So today we started with the question of whether we can model national motivation and in trying to do so immediately hit a wall. Because Google Trends daily data is misleading. Not due to an error, but by its very design. We’ve found a way to tackle that now, but in the life of a data scientist, there are always more problems lurking around the corner.


