    Become a Better Data Scientist with These Prompt Engineering Tips and Tricks

    By Editor Times Featured | July 1, 2025 | 11 min read


    I’m sharing with you my favourite prompts and prompt engineering tips that help me tackle Data Science and AI tasks.

    As Prompt Engineering is emerging as a required skill in most job descriptions, I thought it would be useful to share with you some tips and tricks to improve your Data Science workflows.

    We’re talking here about specific prompts for cleaning data, exploratory data analysis, and feature engineering.

    This is the first of a series of 3 articles I’m going to write about Prompt Engineering for Data Science:

    • Part 1: Prompt Engineering for Planning, Cleaning, and EDA (this article)
    • Part 2: Prompt Engineering for Features, Modeling, and Evaluation
    • Part 3: Prompt Engineering for Docs, DevOps, and Learning

    👉 All the prompts in this article are available at the end of this article as a cheat sheet 😉

    In this article:

    1. Why Prompt Engineering Is a Superpower for DSs
    2. The DS Lifecycle, Reimagined with LLMs
    3. Prompt Engineering for Planning, Cleaning, and EDA

    Why Prompt Engineering is a superpower for DSs

    I know, Prompt Engineering sounds just like a trending buzzword these days. I used to think that when I started hearing the term.

    I’d see it everywhere and think: it’s just writing a prompt. Why are people so hyped about it? What could be so difficult about it?

    After testing several prompts and watching several tutorials, I now understand that it is one of the most useful (and also underestimated) skills a data scientist can acquire right now.

    It’s already common to see prompt engineering listed in job descriptions as one of the required skills.

    Reflect with me: how often do you ask ChatGPT/Claude/your fav chatbot to help you rewrite code, clean data, or just brainstorm a project or some ideas you have? And how often do you get useful and meaningful, non-generic answers?

    Prompt Engineering is the art (and science) of getting large language models (LLMs) like GPT-4 or Claude to actually do what you want, when you want it, in a way that makes sense for your workflow.

    Because here’s the thing: LLMs are everywhere now.
    In your notebooks.
    In your IDE.
    In your BI dashboards.
    In your code review tools.

    And they’re only getting better.

    As data science work gets more complex (more tools, more expectations, more pipelines), being able to talk to AI in a precise, structured way becomes a serious advantage.

    I see prompt engineering as a superpower. Not just for junior folks trying to speed things up, but for experienced data scientists who want to work smarter.

    In this series, I’ll show you how prompt engineering can support you at every stage of the data science lifecycle, from brainstorming and cleaning to modeling, evaluation, documentation, and beyond.

    The DS lifecycle, reimagined with LLMs

    When you are building a Data Science or Machine Learning project, it really feels like a whole journey.

    From figuring out what problem you’re solving, all the way to making a stakeholder understand why your model matters (without showing them a single line of code).

    Here’s a typical DS lifecycle:

    • You plan & brainstorm, figuring out the right questions to ask and what problems need to be solved.
    • You gather data, or data is gathered for you.
    • You clean data and preprocess it: this is where you spend 80% of your time (and patience!).
    • The fun begins: you start doing exploratory data analysis (EDA), getting a feel for the data and finding stories in numbers.
    • You start building: feature engineering and modeling begin.
    • Then, you evaluate and validate whether things actually work.
    • Finally, you document and report your findings, so others can understand them too.

    Now… imagine having a helpful assistant that:

    • Writes solid starter code in seconds,
    • Suggests better ways to clean or visualize data,
    • Helps you explain model performance to non-tech people,
    • Reminds you to check for things you might miss (like data leakage or class imbalance),
    • And is available 24/7.

    That’s what LLMs can be, if you prompt them the right way!

    They won’t replace you, don’t worry. They are not able to do it!

    But they can and definitely will amplify you. You still need to know what you’re building and how (and why!), but now you have an assistant that allows you to do all of this in a smarter way.

    Now I’ll show you how prompt engineering can amplify you as a data scientist.

    Prompt Engineering for planning, cleaning, and EDA

    1. Planning & brainstorming: No more blank pages

    You’ve got a dataset. You’ve got a goal. Now what?

    You can prompt GPT-4 or Claude to list steps for an end-to-end project given a dataset description and goal.

    This phase is where LLMs can already give you a boost.

    Example: Planning an energy consumption prediction project

    Here’s an actual prompt I’ve used (with ChatGPT):

    “You are a senior data scientist. I have an energy consumption dataset (12,000 rows, hourly data over 18 months) including features like temperature, usage_kwh, region, and weekday.
    Task: Propose a step-by-step project plan to forecast future energy consumption. Include preprocessing steps, seasonality handling, feature engineering ideas, and model options. We’ll be deploying a dashboard for internal stakeholders.”

    This kind of structured prompt gives:

    • Context (dataset size, variables, goal)
    • Constraints (hourly data, seasonality to handle)
    • Hints at deployment
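
If you reuse this structure often, it can help to assemble the prompt from its parts. Here is a minimal sketch; the `build_prompt` helper and its parameter names are my own illustration (not part of any library), and the "handle seasonality" constraint is just one way to phrase what the prompt above asks for.

```python
def build_prompt(role, context, task, constraints=(), deployment=""):
    """Assemble a structured prompt from its parts (hypothetical helper).

    Empty sections are skipped so the prompt stays short.
    """
    parts = [f"You are {role}.", context, f"Task: {task}"]
    if constraints:
        parts.append("Constraints: " + "; ".join(constraints))
    if deployment:
        parts.append(deployment)
    return "\n".join(parts)


prompt = build_prompt(
    role="a senior data scientist",
    context=(
        "I have an energy consumption dataset (12,000 rows, hourly data over "
        "18 months) including features like temperature, usage_kwh, region, "
        "and weekday."
    ),
    task="Propose a step-by-step project plan to forecast future energy consumption.",
    constraints=["handle seasonality"],
    deployment="We'll be deploying a dashboard for internal stakeholders.",
)
print(prompt)
```

Keeping the parts separate makes it easy to notice when one of them (usually the constraints) is missing before you send the prompt.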

    Note: if you are using o3-pro, ChatGPT’s newest model at the time of writing, make sure to give it lots of context. This model thrives when you feed it full transcripts, docs, data, etc.

    A similar Claude prompt would work, as Claude also favors explicit instructions. Claude’s larger context window even allows including more dataset schema details or examples if needed, which can yield a more tailored plan.

    I re-tested this prompt with o3-pro as I was curious to see the results.

    The response from o3-pro was nothing less than a full data science project plan, from cleaning and feature engineering to model selection and deployment, but more importantly: with critical decision points, realistic timelines, and questions that challenge our assumptions upfront.

    Here is a snapshot of the response:

    [Image: snapshot of the o3-pro response. Image by author.]

    Bonus strategy: Clarify, Confirm, Complete

    If you need more complex planning, there’s a trick called “Clarify, Confirm, Complete” that you can use before the AI gives the final plan.

    You can ask the model to:

    1. Clarify what it needs to know first
    2. Confirm the best approach
    3. Then complete a full plan

    For example:

    “I want to analyze late deliveries for our logistics network.
    Before giving an analysis plan:

    1. Clarify what data or operational metrics might be relevant to delivery delays
    2. Confirm the best analysis approach for identifying delay drivers
    3. Then complete a detailed project plan (data cleaning, feature engineering, model or analysis methods, and reporting steps).”

    This approach forces the LLM to first ask questions or state assumptions (e.g., about available data or metrics). It makes the model slow down and think, just like we humans do!

    2. Data cleaning & preprocessing: Bye bye boilerplate

    Now that the plan’s ready, it’s time to roll up your sleeves. Cleaning data is 80% of the job, and for sure not a fun task.

    GPT-4 and Claude can both generate code snippets for common tasks like handling missing values or transforming variables, given a good prompt.

    Example: Write me some pandas code

    Prompt:

    “I have a DataFrame df with columns age, income, city.
    Some values are missing, and there are income outliers.
    Task:

    1. Drop rows where city is missing
    2. Fill missing age with the median
    3. Cap income outliers using IQR method
      Include comments in the code.”

    Within seconds, you get a code block with dropna(), fillna(), and the IQR logic, all with explanations.
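
To give an idea of what to expect, here is a sketch of the kind of code such a prompt tends to produce. It is not an actual model response: the toy data is made up, and the 1.5 × IQR fences are the usual convention, which I am assuming because the prompt does not specify a multiplier.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real df (values are made up for illustration).
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31, 58, 46],
    "income": [30_000, 42_000, 38_000, 1_000_000, 45_000, 41_000],
    "city": ["Lisbon", "Porto", None, "Faro", "Braga", "Lisbon"],
})

# 1. Drop rows where city is missing
clean = df.dropna(subset=["city"]).copy()

# 2. Fill missing age with the median of the remaining rows
clean["age"] = clean["age"].fillna(clean["age"].median())

# 3. Cap income outliers using the IQR method (1.5 * IQR fences, assumed)
q1 = clean["income"].quantile(0.25)
q3 = clean["income"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean["income"] = clean["income"].clip(lower=lower, upper=upper)

print(clean)
```

Note that capping (clip) keeps every row and only pulls extreme values back to the fences, which is different from dropping outliers.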

    Example: Guidance on cleaning strategies

    You can ask for conceptual advice as well.

    Prompt:

    “What are different approaches to handle outliers in a financial transactions dataset? Explain when to use each and the pros/cons.”

    The answer to a prompt like this will list multiple methods specific to your niche, instead of a one-size-fits-all solution.

    This helps avoid the simplistic or even misleading advice one might get from a too-general question (for example, asking “best way to handle outliers” will probably output an oversimplified “remove all outliers” suggestion).

    Try few-shot prompting for consistency

    Need variable descriptions in a consistent format?

    Just show the LLM how:

    Prompt:

    “Original: "Customer age" → Standardized: "Age of customer at time of transaction."
    Original: "purchase_amt" → Standardized: "Transaction amount in USD."

    Now standardize:

    • Original: "cust_tenure"
    • Original: "item_ct"”

    It follows the style closely. You can use this trick to standardize labels, define features, or even describe model steps later.
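
When you have many variables to standardize, you can generate the few-shot prompt instead of typing it. A minimal sketch, assuming a hypothetical `few_shot_prompt` helper of my own (not a library function) that takes (original, standardized) example pairs plus the new names:

```python
def few_shot_prompt(examples, new_items):
    """Build a few-shot prompt from (original, standardized) example pairs."""
    lines = [f'Original: "{orig}" → Standardized: "{std}"' for orig, std in examples]
    lines.append("")
    lines.append("Now standardize:")
    lines.extend(f'Original: "{item}"' for item in new_items)
    return "\n".join(lines)


examples = [
    ("Customer age", "Age of customer at time of transaction."),
    ("purchase_amt", "Transaction amount in USD."),
]
prompt = few_shot_prompt(examples, ["cust_tenure", "item_ct"])
print(prompt)
```

The examples do the work here: two or three well-chosen pairs are usually enough to fix the format, so there is no need to describe the style in words.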

    3. Exploratory data analysis (EDA): Ask better questions

    EDA is where we start asking, “What’s interesting here?” and this is where vague prompts can really hurt.

    A generic “analyze this dataset” will often return… generic suggestions.

    Examples: EDA tasks

    “I have an e-commerce dataset with customer_id, product, date, and amount.
    I want to understand:

    1. Purchase behavior patterns
    2. Products often bought together
    3. Changes in purchasing over time
      For each, suggest columns to analyze and Python methods.”

    The answer will probably include grouped stats, time trends, and even code snippets using groupby(), seaborn, and market basket analysis.
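
As a rough idea of what those suggestions look like in practice, here is a small sketch covering the three questions with pandas. The toy orders are made up, a "basket" is assumed to be one customer on one date, and the pair counting is a simple co-occurrence count rather than a full market basket analysis with association rules.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Toy orders standing in for the e-commerce dataset (made-up values).
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3],
    "product": ["tea", "mug", "tea", "mug", "tea", "kettle"],
    "date": pd.to_datetime([
        "2024-01-05", "2024-01-05", "2024-01-20",
        "2024-01-20", "2024-02-03", "2024-02-03",
    ]),
    "amount": [5.0, 12.0, 5.0, 12.0, 5.0, 30.0],
})

# 1. Purchase behavior: total, average, and number of order lines per customer
per_customer = orders.groupby("customer_id")["amount"].agg(["sum", "mean", "count"])

# 2. Products often bought together: count product pairs within each basket
#    (basket = one customer on one date, an assumption for this toy data)
pair_counts = Counter()
for _, basket in orders.groupby(["customer_id", "date"])["product"]:
    pair_counts.update(combinations(sorted(set(basket)), 2))

# 3. Changes over time: total revenue per calendar month
monthly = orders.groupby(orders["date"].dt.to_period("M"))["amount"].sum()

print(per_customer)
print(pair_counts.most_common(3))
print(monthly)
```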

    If you already have summary statistics, you can also paste them and ask:

    Prompt:

    “Based on these summary stats, what stands out or what potential issues should I look into?”

    GPT-4/Claude might point out a high variance in one feature or a suspicious number of missing entries in another. (Be careful: the model can only infer from what you provide; it may hallucinate patterns if asked to speculate without data.)
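
Rather than copying numbers by hand, you can build this prompt straight from the DataFrame. A minimal sketch with made-up data; the layout of the prompt text is my own choice, not a required format.

```python
import numpy as np
import pandas as pd

# Toy data with one suspicious value and one missing entry (made up).
df = pd.DataFrame({
    "price": [10.0, 12.5, 11.0, 250.0, np.nan],
    "quantity": [1, 2, 1, 3, 2],
})

# Render the summary statistics and missing counts as plain text
summary = df.describe().round(2).to_string()
missing = df.isna().sum().to_string()

prompt = (
    "Based on these summary stats, what stands out or what potential issues "
    "should I look into?\n\n"
    f"Summary statistics:\n{summary}\n\n"
    f"Missing values per column:\n{missing}"
)
print(prompt)
```

Including the missing-value counts matters because describe() silently skips NaN values, so the model would otherwise have no way to know about them.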

    Example prompt: Guided EDA

    “I have a dataset with 50 columns (mix of numeric and categorical). Suggest an exploratory data analysis plan: list 5 key analyses to perform (e.g., distribution checks, correlations, etc.). For each, specify which specific columns or pairs of columns to look at, given I want to understand sales performance drivers.”

    This prompt is specific about the goal (sales drivers), so the AI might recommend, say, a scatter plot of sales vs marketing_spend, a time series plot if date is present, etc., customized to “performance drivers.” Besides, the structured output (a list of 5 analyses) will be easier to follow than a long paragraph.

    Example: Let the LLM explain your plots

    You can even ask:

    “What can a box plot of income by occupation tell me?”

    It will explain quartiles, IQR, and what outliers might mean. This is especially helpful when mentoring juniors or preparing slides for reports, presentations, etc.

    Pitfalls to be careful about

    This early stage is where most people misuse LLMs. Here’s what to watch for:

    Broad or vague prompts

    If you say: “What should I do with this dataset?”
    You’ll get something like: “Clean the data, analyze it, build a model.”

    Instead, always include:

    • Context (data type, size, variables)
    • Goals (predict churn, analyze sales, etc.)
    • Constraints (imbalanced data, missing values, domain rules)

    Blind trust in the output

    Yes, LLMs write code fast. But test everything.

    I once asked for code to impute missing values. It used fillna() for all columns, including the categorical ones. It didn’t check data types, and neither did I… the first time. 😬
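
For reference, here is a sketch of an imputation that does check data types, plus the kind of quick test that would have caught my mistake. The `impute_by_dtype` helper is hypothetical, and median/mode are just one reasonable pair of defaults, not the only valid strategy.

```python
import numpy as np
import pandas as pd

# Toy data with one numeric and one categorical column (made up).
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Lisbon", None, "Porto", "Lisbon"],
})


def impute_by_dtype(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with the median and other columns with the mode."""
    out = frame.copy()
    for col in out.columns:
        if out[col].isna().sum() == 0:
            continue  # nothing to fill
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out


imputed = impute_by_dtype(df)

# Quick checks: nothing is left missing, and city still holds only strings
assert imputed.isna().sum().sum() == 0
assert imputed["city"].map(type).eq(str).all()
print(imputed)
```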

    Privacy and leakage

    If you’re working with real company data, don’t paste raw rows into the prompt unless you’re using a private/enterprise model. Describe the data abstractly instead. Or even better, consult your manager about this matter.
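
One way to describe the data abstractly is to share only its schema. Below is a minimal sketch, assuming a hypothetical `describe_schema` helper that reports column names, dtypes, and the share of missing values, but no cell values. Keep in mind that column names and row counts can be sensitive too, so this is a starting point and not a guarantee.

```python
import numpy as np
import pandas as pd

# Toy data standing in for real company data (made-up values).
df = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "income": [52_000.0, np.nan, 61_500.0],
    "city": ["Lisbon", "Porto", None],
})


def describe_schema(frame: pd.DataFrame) -> str:
    """Summarize column names, dtypes, and missing share without cell values."""
    lines = [f"{len(frame)} rows, {frame.shape[1]} columns"]
    for col in frame.columns:
        pct_missing = frame[col].isna().mean() * 100
        lines.append(f"- {col}: {frame[col].dtype}, {pct_missing:.0f}% missing")
    return "\n".join(lines)


schema_text = describe_schema(df)
print(schema_text)
```

You can paste the resulting text into a prompt in place of raw rows.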


    Thanks for reading!

    👉 Grab the Prompt Engineering Cheat Sheet with all the prompts of this article, organized. I’ll send it to you when you subscribe to Sara’s AI Automation Digest. You’ll also get access to an AI tool library and my free AI automation newsletter every week!

