Solving the Human Training Data Problem

Practice Makes Passing

My bachelor’s in computer science was anything but easy. I vividly remember reaching a breaking point around the end of the tenth week of my first semester. With just a few weeks until my first final, I sat staring at Calc 1 practice problems, spiraling into despair. I’d always been good at math. I did all the homework and paid attention in all the lectures. So how could it be that I didn’t even know where to start? Why wasn’t anything clicking?

I often joked with friends about dropping out of the program, even well into my final semester. Week 10 of Semester 1 was the only time I very seriously considered it.

It was January 2022, right on the heels of the COVID tech hiring boom. I’d tried my hand at frontend development and had a pretty good grasp of React. None of the introductory math courses I was taking made any sense. Plenty of acquaintances and friends of friends had gotten cushy tech jobs without degrees, so why couldn’t I? What use was knowing how to prove a function was continuous out in the real world?

Excerpt from Calc 1 lecture notes, circa 2021. Image by the author.

In hindsight, I understood that that was exactly what I was supposed to feel. That was when I truly decided to pursue my degree, not when I applied a year earlier. That feeling of impending doom was what lit a fire under me and drove me to study like a man possessed for the next few months.

To this day, I’ve never been happier to get back a grade than when I opened the scan of my graded Calc 1 exam to see “61/100” staring me back in the face: a passing grade with a cool margin of two points above failing. But all that mattered was that it was a passing grade, especially when almost half the students had failed the class, many for the second or third time.

Calc 1 grade distribution. 42.6% fail rate and a failing average grade of 55.5. Image by the author.

By all accounts, my first semester of undergrad was rough. Yes, this was by design, and yes, I learned a lot from it, both in terms of the material itself and (mostly) about resilience and perseverance. But it took moving to Germany and starting my master’s for me to understand how good I really had it back then, at least in one particular regard.

The Human Training Data Problem

One of the biggest surprises to me at my new university was that past exams are much less of a thing here. For all the stress and anxiety I had during my bachelor’s, one thing I knew I could always count on was the existence of plentiful and easily accessible scans of past exams and exam-relevant problem sets, especially for introductory courses.

For Discrete Math, I solved all the dozens of past exams going back almost a decade. I distinctly remember warming up for Linear Algebra 1 with questions from the 1990s. This was so ingrained in the culture of my program that I completely took it for granted. The only reason I managed to pass Calc 1 (by the skin of my teeth) was that I had spent hours on end solving hundreds of questions from exams.

I was so accustomed to exams from previous years being readily available that skimming over them had become part of my process for vetting classes I was considering taking. This meant that my rude awakening came fairly early on in my first semester of grad school, while trying to figure out my schedule.

So shocking was the revelation that I can map my reaction to the five stages of grief. At first, I was in denial, absolutely convinced that there must be some secret platform where all the past exams were hiding. Anger, bargaining, and depression soon followed. Acceptance didn’t really, but I was willing to postpone my concerns until finals came closer at the end of the semester.

As my first two finals (on back-to-back days, no less) approached in a hurry, I found myself faced with what I like to call the Human Training Data Problem. Granted, the human brain and machines are (very!) significantly different. But I couldn’t help but liken my situation to that of a machine learning model with insufficient training data. I was completely stumped on how to bridge the gap between lecture notes and potential exam questions.

My undergrad experience had granted me the insight of what human underfitting looks like, both at training time (studying) and test time (on exam day). I vividly remember more than one class where, for one reason or another, I preferred more in-depth review of lecture slides or notes to solving practice problems.

This was an approach I quickly dropped during my freshman year, and for good reason: even in theory-heavy classes, it yielded disastrous results. Knowing the proofs for all 40 theorems the professor required was much less help in passing Linear Algebra 2 than practicing applying them to solve problems would have been. That’s not to say an adequate grasp of the theory isn’t necessary; it absolutely is. But being able to recite the lecture notes by heart won’t save you if you can’t answer questions like the ones on the final.

Proof of the Riesz representation theorem (for an inner product space with a finite orthonormal basis), **written out one of many times while memorizing it during exam prep,** circa 2022. **Even while studying, this definitely didn’t feel like the best use of my time.** Image by the author.

And so, armed with hundreds of slides and a vague idea of the structure of each exam, I racked my brain for ways to avoid the pitfall of going in blind without any practice problems. Denial crept back in, and I desperately searched for past exams I knew didn’t exist. Eventually, I shifted my attention from finding the Holy Grail to turning my problem into one an LLM might be able to solve.

Synthetic Training Data for Humans

Researchers at IBM define synthetic data as “information that’s been generated on a computer to augment or replace real data to improve AI models” [1]. It has many benefits, from mitigating privacy concerns to lowering costs, leading to its widespread adoption for uses as varied as tooling for financial institutions [1] and 3D content generation [2].

In my case, the motivation was simple: the real-world (human) training data I needed to study just wasn’t available in the wild.

Of course, using synthetic data only makes sense if that data accurately imitates the data our trained model will encounter in the real world. I knew I had to be very intentional about how I generated the mock exams I wanted to use. Just telling Claude to write a practice test or two wouldn’t cut it, even if I gave it all the slides and material I had to work with. Only when setting out to write an exam does one realize how many decisions there are to be made, well beyond what’s in and what’s out in terms of the material.

Luckily, I wasn’t flying completely blind on that front. For one class, I had information about the exam’s structure and the kinds of questions on it from students who had taken it the year prior. For the other, the professor provided a breakdown of the exam into sections and a small handful of open-ended review questions.

Both classes had Q&A sessions after their respective final lectures. I paid special attention to anything that seemed like a hint as to what they might ask, which later proved to be very helpful.

Easy Mode: Replicating a Template

The first exam was straightforward since I had far more to work with. It also had a reputation for being relatively formulaic. I gave Claude the example questions and structure I had and asked it to stick to the same style.

Many of the questions lent themselves nicely to slight modifications that made them novel enough to be worth solving for practice without straying too far from what was typical for the actual exam. Apart from a few LaTeX formatting hiccups, which were fairly easily resolved, it was smooth sailing.

To insure myself against any surprises, I also had it generate some trickier questions based on the lecture slides and my notes from the Q&A session. Even though nothing unexpected was asked in the end, doing some targeted review tailored to my own personal blind spots was a great confidence booster.

Although I definitely would have been able to study for the first exam without the help of LLMs, I still felt like I gained a lot by using Claude. I could absolutely imagine how helpful it would have been for some of the newer or more advanced courses I took in undergrad, where there were only a small handful of past exams available.

Hard Mode: Building from Scratch

The second exam was a much tougher nut to crack. First of all, the breadth of the material was much wider. Secondly, the slides only very loosely reflected what was discussed in class. Most importantly, there was far less information available on what the exam would look like. What few details existed were hard to find and vague.

The first two concerns were at least partially mitigated by the fact that I made an effort to take comprehensive notes throughout the semester. As for hints on the structure and style of the exam, I scoured every possible platform and collected anything that seemed even remotely relevant. In that vein, the Q&A session ended up being a godsend. Transcribing the professor’s answers and comments left me with a much better (albeit still incomplete) idea of what to expect.

Admittedly, I was initially pessimistic about the prospect of Claude being able to generate mock exams of much value. Though I had used it fairly extensively for guided material review, I had my doubts about how it would fare with the uncertainty at play. Still, I gave it everything I knew about the exam and hoped for the best.

I was pleasantly surprised at the results. Although the first few attempts produced exams that didn’t feel quite right, the core did seem promising. They did appear to adequately cover the material and to be challenging enough. After some back and forth, Claude started producing tests that I could have been convinced were real.

**Overview of mock exams generated by Claude Sonnet 4.5 for Course #2.** Note the (quite typical) yes-man commentary. Image by the author.

I solved the improved tests and asked Claude to correct my solutions. The very act of solving practice tests made me feel great about my grasp of the material. Claude’s usual sycophancy was the cherry on top. (It did point out errors, but was exceptionally soft on deducting points and overly excited about correct answers.) Ultimately, however, I wouldn’t know how well Claude had done training me until test time. With the fateful day fast approaching, I hoped for the best.

Generalizing to Test Data and Preventing Dataset Pollution

When Synthetic Data Alone Doesn’t Cut It

While synthetic data certainly has its benefits, it has a critical drawback. What a model learns based on synthetic data will, at best, model the simulated world from which that data is drawn. That simulated world may diverge from reality in ways we’re completely unaware of until it’s too late [3].

As Dani Shanley puts it in “Synthetic data, real harm,”

“… just as generative AI models can produce plausible (but false) text or images, synthetic data generators may create datasets that appear statistically valid, while introducing subtle, hard-to-catch distortions and artificial patterns, or missing crucial real-world complexities.” [3]

Shanley also draws attention to the hidden and disproportionate impact that the humans tasked with synthesizing data have on how models ultimately behave. Largely arbitrary decisions on their part may have significant, possibly harmful, downstream effects [3].

I saw this impact in action while studying for my second exam. Slowly but surely, I had unintentionally skewed Claude’s outputs based on my personal interpretation of what the professor had said. My gut feeling on what the exam should look like became the arbiter of which questions were relevant and which weren’t.

It also became clearer as time went on that my training dataset was veering ever further into a biased take on reality. After the sixth mock exam, it was obvious that Claude had just settled on a fixed set of several dozen questions.

Even when prompted to introduce more variety, every output from there on out was just some cobbling together of questions I had already seen. Granted, these did include many key questions it was heavily implied would appear on the actual exam.
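This kind of convergence can be measured rather than just sensed. The following is a minimal sketch, not part of the workflow described above: it assumes each mock exam has been saved as a list of question strings, and the function names and the 0.8 similarity threshold are hypothetical choices made for illustration. It uses Python’s standard `difflib` to report, for each exam, the share of questions that are not near-duplicates of anything in earlier exams.

```python
from difflib import SequenceMatcher


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial edits don't hide repeats."""
    return " ".join(question.lower().split())


def is_repeat(question: str, seen: list[str], threshold: float = 0.8) -> bool:
    """Return True if the question closely matches any earlier question.

    The threshold is an assumed value; tune it on your own exams.
    """
    q = normalize(question)
    return any(SequenceMatcher(None, q, s).ratio() >= threshold for s in seen)


def novelty_per_exam(exams: list[list[str]], threshold: float = 0.8) -> list[float]:
    """For each exam, the fraction of questions not seen in earlier exams."""
    seen: list[str] = []
    scores: list[float] = []
    for exam in exams:
        new = [q for q in exam if not is_repeat(q, seen, threshold)]
        scores.append(len(new) / len(exam) if exam else 0.0)
        # Only compare against earlier exams, so extend after scoring.
        seen.extend(normalize(q) for q in exam)
    return scores
```

A novelty score that keeps sliding toward zero from one mock exam to the next is a sign that the generator has settled on a fixed question pool. Character-level similarity is a crude proxy, so a reworded question on the same idea may still count as new.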

On test day, I was shocked at how much the exam resembled the ones I had solved for practice. The gimmes the professor had hinted at were indeed there, but so were an impressive number of non-trivial questions I had solved while studying. Roughly 60% of the questions were identical or very similar to ones I had practiced. Many of the rest were on topics I had at least touched on.

However, one part of the exam ended up being a significant blind spot. It was a section on topics we had discussed only briefly at the beginning of the semester. While studying, I was unreasonably confident in hastily dismissing certain types of questions, be it because they seemed uncharacteristic (e.g., too mathematical) or because they were about things I had deemed too insignificant to include in the notes I took in class.

Unfortunately, these turned out to be the exact types of questions that were asked in that section. Some were about topics that only appeared on a single slide all semester. Others were deeply technical in a way I just didn’t expect. Though I did my best to answer them, I hadn’t trained my mental model on data that would enable it to generalize to these questions well enough.

The pill was all the more bitter to swallow since the kinds of questions I struggled with were ones Claude included in its first attempts at mock exams. These were precisely the ones I did away with early on based on little more than hunches.

In this case, the slip-up was far from catastrophic. In my opinion, it wasn’t even close to undoing the benefits of studying using synthetic mock exams. Even so, it serves as a cautionary tale that hearkens back to Shanley’s warnings about how synthetic data can insidiously exacerbate model subjectivity and bias [3].

Overcoming Overfitting: How to Make the Best of Synthetic Human Training Data

For many real-world applications, a synthetic dataset that yields a model with only 60% accuracy would probably be considered next to useless. With sufficient real-world data (i.e., actual past exams), there is no doubt in my mind that 90%+ accuracy would be achievable.

To be fair, though, the (human) model under consideration has flaws that machines don’t and is, in many ways, much harder to train. I can say with confidence that that 60% would almost certainly surpass the accuracy of any other method I could have tried.

I’ll absolutely stick with this method for future exams, with three key takeaways I plan to implement:

Separate chats are the way to go. The feedback loop that led Claude to converge on specific questions undoubtedly had a lot to do with me running the entire cycle of generating tests and checking answers in one big, long context. This meant any new mock exam was directly based on all of the previous ones. Beyond that, Claude tried to be helpful by tailoring the questions to what it thought were my weak spots, leading it to become even more entrenched in what it thought should be asked. General context rot⁽¹⁾ was also probably an important factor.
Keep an open mind. As mentioned above, the major blind spot I developed was largely the result of putting too much stock in my subjective assessment of what material would or should make the cut. Instead of challenging my assumptions and devoting some time to covering minor topics that seemed like long shots, I leaned into my biases.
Augment with real-world training data! This is, of course, easier said than done. It somewhat contradicts the very premise of this article. But what you can do as a student (or as an educator) is enrich the bank of known questions for future students. I managed to remember most of the questions that were on my second exam and document them for future students to use when studying.
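The first two takeaways can be combined into a small helper. The sketch below is an illustration under stated assumptions, not the prompts actually used for the exams in this article: the function names, the prompt wording, and the `long_shot_share` parameter are all hypothetical. It builds one self-contained prompt per mock exam, intended to be pasted into a brand-new chat, and reserves a fixed share of questions for topics that seem unlikely to appear.

```python
import random


def sample_topics(core: list[str], long_shots: list[str], n_questions: int,
                  long_shot_share: float, rng: random.Random) -> list[str]:
    """Pick topics for one mock exam, reserving slots for unlikely ones."""
    n_long = min(len(long_shots), round(n_questions * long_shot_share))
    n_core = min(len(core), n_questions - n_long)
    picked = rng.sample(long_shots, n_long) + rng.sample(core, n_core)
    rng.shuffle(picked)
    return picked


def build_prompt(exam_format: str, topics: list[str]) -> str:
    """Build a self-contained prompt meant for a fresh chat.

    Nothing from earlier mock exams is included, so the model cannot
    anchor on questions it has already produced.
    """
    lines = [
        "Write one mock exam in the format described below.",
        "Do not assume any earlier conversation or earlier mock exams.",
        "",
        "Exam format:",
        exam_format,
        "",
        "Write one question per topic:",
    ]
    lines += [f"- {topic}" for topic in topics]
    return "\n".join(lines)
```

Sampling the topics outside the chat, with a different seed for each exam, moves the choice of what gets covered away from both the model’s accumulated context and the student’s gut feeling. Grading can then happen in yet another chat, so feedback on weak spots does not leak into the next exam’s questions.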

Afterword: My Thoughts on LLMs as a Learning Aid

The elephant in the room is that none of the exam preparation workflow I described would have been even remotely feasible when I started my bachelor’s in late 2021. Maybe that is what made the process feel almost magical to me.

I remember wishing I had a way to automatically check and correct my answers on mock exams when studying in my freshman year. If you had told me back then that an AI tool, let alone a free one, would be able to do that (however imperfectly) in 2026, I would have thought you were crazy.

Much has been written about the new problems LLMs have brought about. Many of the points that have been made are especially relevant to students. And indeed, I can’t argue that claims like “AI is making people dumber” are completely unfounded. I’ve seen firsthand how these tools let a person outsource thinking and eliminate any intellectual discomfort. For an ever-growing range of complex tasks, they represent the ultimate shortcut [4].

Concerningly, I believe people who resist the temptation to take these shortcuts are increasingly being penalized, at least in the short run. A friend who was the only one not to vibe-code assignments in a certain class comes to mind. Others cruised to good grades on their homework despite threats about how AI-generated submissions would supposedly be rejected. He put in the work and ended up being docked significant points for minor errors, with little in the way of constructive feedback or recourse.

Still, in the long run, it’s a well-established fact that growth, in its myriad forms, involves some kind of stress. One of those forms is learning, and the necessary stress comes in the form of active engagement with the material. Few things are more rewarding in my opinion than the lightbulb moment of finally understanding a difficult concept after struggling with it for hours or days. Experiencing such moments with Fourier series, reductions, metric spaces, and many other concepts was a major part of what led me to choose to pursue a master’s degree in the field.

LLMs undoubtedly enable would-be learners to deprive themselves of this stress and, in turn, of actual learning. Often, though, I think too little attention is paid to the other side of the coin: with the right approach, they can personalize and democratize learning like no invention since the internet has.

Having experienced higher education both pre- and post-ChatGPT, I feel enormously fortunate to have tools like Claude and Gemini at my fingertips. Their utility for exam preparation was just the tip of the iceberg. It felt like my productivity was boosted tenfold throughout the semester. Things clicked much faster than they ever would have otherwise. LLMs were a game changer for everything from strategy (when and how to study what) to reviewing slides and notes to developing genuine interest and curiosity in the material.

To summarize with a platitude: “With great power comes great responsibility.” LLMs are what you make of them. With the right approach, they can coach you to take on the heavy lifting instead of doing it for you.

If you enjoyed this article, please consider following me on LinkedIn to keep up with future articles and projects.

Footnotes

(1) Engineering at Anthropic defines context rot as a phenomenon where “as the number of tokens in the context window increases, the model’s ability to accurately recall information from that context decreases.” [5]

References

[1] K. Martineau and R. Feris, “What is synthetic data?,” IBM Research Blog, Feb. 7, 2023. https://research.ibm.com/blog/what-is-synthetic-data.

[2] Y. Shi, P. Wang, J. Ye, M. Long, K. Li, and X. Yang, “MVDream: Multi-view diffusion for 3D generation,” arXiv preprint arXiv:2308.16512, 2023. https://doi.org/10.48550/arXiv.2308.16512.

[3] D. Shanley, “Synthetic data, real harm,” Ada Lovelace Institute Blog, Sep. 18, 2025. https://www.adalovelaceinstitute.org/blog/synthetic-data-real-harm/.

[4] S. Bogdanov, “In the long term, LLMs make us dumber,” @desunit (Sergey Bogdanov), Aug. 12, 2025. https://desunit.com/blog/in-the-long-run-llms-make-us-dumber/.

[5] P. Rajasekaran, E. Dixon, C. Ryan, and J. Hadfield, “Effective context engineering for AI agents,” Engineering at Anthropic, Sep. 29, 2025. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents.
