I’ve been feeling a constant sense of AI FOMO. Every day, I see people sharing AI tips, new agents and skills they built, and vibe-coded apps. I’m increasingly realizing that adapting quickly to AI is becoming a requirement for staying competitive as a data scientist today.
But I’m not only talking about brainstorming with ChatGPT, generating code with Cursor, or polishing a report with Claude. The bigger shift is that AI can now participate in a much more end-to-end data science workflow.
To make the idea concrete, I tried it on a real project using my Apple Health data.
A Simple Example: Apple Health Analysis
Context
I’ve been wearing an Apple Watch every day since 2019 to track my health data, such as heart rate, energy burned, sleep quality, and so on. This data contains years of behavioral signals about my daily life, but the Apple Health app mostly surfaces it with simple trend views.
I tried to analyze a two-year Apple Health export six years ago. But it ended up becoming one of those side projects that you never finish… My goal this time is to extract more insights from the raw data quickly with the help of AI.
What I had to work with
Here are the relevant resources I have:
- Raw Apple Health export data: 1.85GB in XML, uploaded to my Google Drive.
- Sample code to parse the raw export into structured datasets in my GitHub repo from six years ago. But the code could be outdated.
Workflow without AI
A typical workflow without AI would look a lot like what I tried six years ago: inspect the XML structure, write Python to parse it into structured local datasets, conduct EDA with pandas and NumPy, and summarize the insights.
I’m sure every data scientist is familiar with this process. It’s not rocket science, but it takes time to build. To get to a polished insights report, it would take at least a full day. That’s why that six-year-old repo is still marked as WIP…
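To give a sense of what the manual parsing step involves, here is a minimal sketch. It is not the code from the old repo or anything Codex produced. It assumes the export stores each measurement as a `<Record>` element with `type`, `unit`, `startDate`, and `value` attributes; the helper names `iter_records` and `daily_totals` and the tiny sample XML are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed record type for active energy; adjust to whatever your export contains.
ENERGY = "HKQuantityTypeIdentifierActiveEnergyBurned"

# Made-up sample that mimics the assumed export layout.
SAMPLE_XML = b"""<HealthData>
 <Record type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="kcal" startDate="2024-01-01 08:00:00 -0800" endDate="2024-01-01 09:00:00 -0800" value="120.5"/>
 <Record type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="kcal" startDate="2024-01-01 18:00:00 -0800" endDate="2024-01-01 19:00:00 -0800" value="80.0"/>
 <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min" startDate="2024-01-01 08:05:00 -0800" endDate="2024-01-01 08:05:00 -0800" value="72"/>
 <Record type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="kcal" startDate="2024-01-02 07:30:00 -0800" endDate="2024-01-02 08:00:00 -0800" value="50.0"/>
</HealthData>"""


def iter_records(source, record_type=None):
    """Stream <Record> elements one at a time instead of loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Record":
            continue
        if record_type is None or elem.get("type") == record_type:
            yield {
                "type": elem.get("type"),
                "unit": elem.get("unit"),
                "start": elem.get("startDate"),
                "value": elem.get("value"),
            }
        elem.clear()  # drop attributes and children we no longer need


def daily_totals(records):
    """Sum numeric record values per calendar day (first 10 chars of startDate)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["start"][:10]] += float(rec["value"])
    return dict(totals)


energy_by_day = daily_totals(iter_records(io.BytesIO(SAMPLE_XML), ENERGY))
print(energy_by_day)
```

For a real export you would pass the path to the XML file instead of the in-memory sample. Streaming matters here because a file of nearly 2GB is expensive to load as a single tree, and `daily_totals` only makes sense for record types whose values are numeric.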
AI end-to-end workflow
My updated workflow with AI is:
- AI locates the raw data in my Google Drive and downloads it.
- AI references my old GitHub code and writes a Python script to parse the raw data.
- AI uploads the parsed datasets to Google BigQuery. Of course, the analysis can be done locally without BigQuery, but I set it up this way to better resemble a real work environment.
- AI runs SQL queries against BigQuery to conduct the analysis and compile an analysis report.
Essentially, AI handles nearly every step from data engineering to analysis, with me acting more as a reviewer and decision-maker.
AI-generated report
Now, let’s see what Codex was able to generate with my guidance and some back-and-forth in 30 minutes, excluding the time to set up the environment and tooling.
I chose Codex because I mainly use Claude Code at work, so I wanted to explore a different tool. I used this chance to set up my Codex environment from scratch so I could better evaluate all the effort required.
You can see that this report is well structured and visually polished. It summarized valuable insights into annual trends, exercise consistency, and the impact of travel on activity levels. It also provided recommendations and stated its limitations and assumptions. What impressed me most was not just the speed, but how quickly the output began to look like a stakeholder-facing analysis instead of a rough notebook.
Please note that the report is sanitized for my data privacy.



How I Actually Did It
Now that we have seen the impressive work AI can generate in 30 minutes, let me break it down and show you all the steps I took to make it happen. I used Codex for this experiment. Like Claude Code, it can run in the desktop app, an IDE, or the CLI.
1. Set up MCP
To enable Codex to access tools, including Google Drive, GitHub, and Google BigQuery, the next step was to set up Model Context Protocol (MCP) servers.
The easiest way to set up MCP is to ask Codex to do it for you. For example, when I asked it to set up the Google Drive MCP, it configured my local files quickly and gave me clear next steps on how to create an OAuth client in the Google Cloud Console.
It doesn’t always succeed on the first try, but persistence helps. When I asked it to set up the BigQuery MCP, it failed at least 10 times before the connection succeeded. But each time, it gave me clear instructions on how to test it and what information was helpful for troubleshooting.


2. Make a plan with Plan Mode
After setting up the MCPs, I moved to the actual project. For a complicated project that involves multiple data sources/tools/questions, I usually start with Plan Mode to decide on the implementation steps. In both Claude Code and Codex, you can enable Plan Mode with /plan. It works like this: you outline the task and your rough plan, then the model asks clarifying questions and proposes a more detailed implementation plan for you to review and refine. In the screenshots below, you can find my first iteration with it.



3. Execution and iteration
After I hit “Yes, implement this plan”, Codex started executing on its own, following the steps. It worked for 13 minutes and generated the first analysis below. It moved fast across different tools, but it did the analysis locally because it ran into more issues with the BigQuery MCP. After another round of troubleshooting, it was able to upload the datasets and run queries in BigQuery properly.

However, the first-pass output was still shallow, so I guided it to go deeper with follow-up questions. For example, I have flight tickets and travel plans from past trips in my Google Drive. I asked it to find them and analyze my activity patterns during trips. It successfully located these files, extracted my travel days, and ran the analysis.
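The travel comparison described above can be sketched roughly as follows. This is only an illustration of the idea, not the code Codex wrote: the function names `expand_trips` and `compare_travel_activity`, the trip date range, and the daily calorie numbers are all invented.

```python
from datetime import date, timedelta
from statistics import mean


def expand_trips(trips):
    """Turn (start, end) date ranges into a set of individual travel days."""
    days = set()
    for start, end in trips:
        current = start
        while current <= end:
            days.add(current)
            current += timedelta(days=1)
    return days


def compare_travel_activity(daily_kcal, trips):
    """Average daily energy burned on travel days versus all other days."""
    travel_days = expand_trips(trips)
    travel = [v for d, v in daily_kcal.items() if d in travel_days]
    home = [v for d, v in daily_kcal.items() if d not in travel_days]
    return {
        "travel_mean": mean(travel) if travel else None,
        "home_mean": mean(home) if home else None,
    }


# Made-up numbers purely for illustration.
daily_kcal = {
    date(2024, 5, 1): 400.0,
    date(2024, 5, 2): 700.0,
    date(2024, 5, 3): 500.0,
    date(2024, 5, 4): 300.0,
}
trips = [(date(2024, 5, 2), date(2024, 5, 3))]

summary = compare_travel_activity(daily_kcal, trips)
print(summary)
```

The hard part in practice is not this arithmetic but getting reliable trip dates out of tickets and itineraries, which is where access to Google Drive did the heavy lifting.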
After a few iterations, it was able to generate a much more comprehensive report, as I shared at the beginning, within 30 minutes. You can find its code here. That was probably one of the most important lessons from the exercise: AI moved fast, but depth still came from iteration and better questions.

Takeaways for Data Scientists
What AI Changes
Above is a small example of how I used Codex and MCPs to run an end-to-end analysis without manually writing a single line of code. What are the takeaways for data scientists at work?
- Think beyond coding assistance. Rather than using AI only for coding and writing, it’s worth expanding its role across the full data science lifecycle. Here, I used AI to locate raw data in Google Drive and upload parsed datasets to BigQuery. There are many more AI use cases related to data pipelining and model deployment.
- Context becomes a force multiplier. MCPs are what made this workflow much more powerful. Codex scanned my Google Drive to locate my travel dates and read my old GitHub code to find sample parsing code. Similarly, you can enable other company-approved MCPs to help your AI (and yourself) better understand the context. For example:
– Connect to Slack MCP and Gmail MCP to search for past relevant conversations.
– Use Atlassian MCP to access the table documentation on Confluence.
– Set up Snowflake MCP to explore the data schema and run queries.
- Rules and reusable skills matter. Although I didn’t demonstrate it explicitly in this example, you should customize rules and create skills to guide your AI and extend its capabilities. These topics are worth their own article next time 🙂
How the Role of Data Scientists Will Evolve
But does this mean AI will replace data scientists? This example also sheds light on how data scientists’ roles will pivot in the future.
- Less manual execution, more problem-solving. In the example above, the initial analysis Codex generated was very basic. The quality of AI-generated analysis depends heavily on the quality of your problem framing. You need to define the question clearly, break it into actionable tasks, identify the right approach, and push the analysis deeper.
- Domain knowledge is critical. Domain knowledge is still very much required to interpret results correctly and provide recommendations. For example, AI noticed my activity level had declined significantly since 2020. It couldn’t find a convincing explanation, but said: “Possible causes include routine changes, work schedule, lifestyle shifts, injury, motivation, or less structured training, but these are inferences, not findings.” But the real reason behind it, as you might have realized, is the pandemic. I started working from home in early 2020, so naturally, I burned fewer calories. This is a very simple example of why domain knowledge still matters: even if AI can access all the past docs in your company, it doesn’t mean it will understand all the business nuances, and that’s your competitive advantage.
- This example was relatively straightforward, but there are still many classes of work where I would not trust AI to operate independently today, especially projects that require stronger technical and statistical judgment, such as causal inference.
Important Caveats
Last but not least, there are some things you will need to keep in mind while using AI:
- Data security. I’m sure you’ve heard this many times already, but let me repeat it once more. The data security risk of using AI is real. For a personal side project, I can set things up however I want and take my own risk (honestly, granting AI full access to Google Drive feels like a risky move, so this is more for illustration purposes). But at work, always follow your company’s guidance on which tools are safe to use and how. And make sure to read through every single command before clicking “approve”.
- Double-check the code. For my simple project, AI can write accurate SQL without problems. But in more complicated business settings, I still see AI make mistakes in its code from time to time. Sometimes, it joins tables with different granularities, causing fan-out and double-counting. Other times, it misses important filters and conditions.
- AI is convenient, but it might accomplish your ask with unexpected side effects… Let me tell you a funny story to end this article. This morning, I turned on my laptop and saw an alert that there was no disk storage left. I have a 512GB SSD MacBook Pro, and I was pretty sure I had only used around half of the storage. Since I was playing with Codex last night, it became my first suspect. So I actually asked it, “hey did you do anything? My ‘system data’ had grown by 150GB overnight”. It responded, “No, Codex only takes xx MB”. Then I dug through my files and saw a 142GB “bigquery-mcp-wrapper.log”… Likely, Codex set up this log when it was troubleshooting the BigQuery MCP setup. Later, during the actual analysis task, it exploded into a huge file. So yes, this magical wishing machine comes at a cost.
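To make the fan-out problem from the “double-check the code” caveat concrete, here is a small sketch using an in-memory SQLite database. The tables `daily_energy` (one row per day) and `workouts` (possibly several rows per day) and their contents are hypothetical, chosen only to show how joining tables of different granularities inflates a sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE daily_energy (day TEXT PRIMARY KEY, kcal REAL);
    CREATE TABLE workouts (day TEXT, activity TEXT);
    INSERT INTO daily_energy VALUES ('2024-01-01', 500.0), ('2024-01-02', 300.0);
    INSERT INTO workouts VALUES
        ('2024-01-01', 'run'), ('2024-01-01', 'swim'), ('2024-01-02', 'walk');
    """
)

# Naive join: the day with two workouts appears twice, so its kcal is counted twice.
naive_total = conn.execute(
    """
    SELECT SUM(e.kcal)
    FROM daily_energy AS e
    JOIN workouts AS w ON w.day = e.day
    """
).fetchone()[0]

# Safer: aggregate workouts to one row per day before joining.
safe_total = conn.execute(
    """
    SELECT SUM(e.kcal)
    FROM daily_energy AS e
    JOIN (
        SELECT day, COUNT(*) AS n_workouts
        FROM workouts
        GROUP BY day
    ) AS w ON w.day = e.day
    """
).fetchone()[0]

print(naive_total)  # 1300.0, inflated by the duplicated day
print(safe_total)   # 800.0, the true total
```

A quick sanity check when reviewing AI-written SQL is to compare row counts or totals before and after each join; if they grow unexpectedly, the join keys probably do not match the table's grain.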
This experience summed up the tradeoff well for me: AI can dramatically compress the distance between raw data and useful analysis, but getting the most out of it still requires judgment, oversight, and a willingness to debug the workflow itself.

