Ten Lessons of Building LLM Applications for Engineers

across industries. Traditional engineering domains are no exception.

Over the past two years, I’ve been building LLM-powered tools with engineering domain experts. These are process engineers, reliability engineers, cybersecurity analysts, and others who spend most of their day in logs, specs, schematics, and reports, doing tasks such as troubleshooting, failure mode analysis, test planning, and compliance checks.

The promise is compelling: thanks to their extensive pre-trained knowledge, LLMs can, in theory, reason like domain experts, speed up the tedious, pattern-matching parts of engineering work, and free up experts for higher-order decisions.

The practice, however, is messier. “Just add a chatbox” rarely translates into useful engineering tools. There is still quite a large gap between an impressive demo and a system that engineers actually trust and use.

Closing that gap has everything to do with how you frame the problem, how you structure the workflow, and how you integrate it into the engineer’s real environment.

In this post, I’d like to share 10 lessons I learned from my past projects. They are my collection of “field notes” rather than a comprehensive checklist. But if you also plan to build or are currently building LLM applications for domain experts, I hope these lessons can help you avoid a few painful dead ends.

Our roadmap. (Image by author)

I organize the lessons into three phases, which align with the phases of a typical LLM project:

Before you start: frame the right problem and set the right expectations.
During the project: design clear workflows and enforce structure everywhere.
After you’ve built: integrate where engineers work and evaluate with real cases.

With that in mind, let’s get started.

Phase 1: Before You Start

What you do before even writing a single line of code largely shapes whether an LLM project will succeed or fail.

That means if you are chasing the wrong problem or failing to set the right expectations upfront, your application will struggle to gain traction later, no matter how technically sound you make it.

In the following, I’d like to share some lessons on laying the right foundation.

Lesson 1: Not every problem can or should be addressed by LLMs

When I look at a new use case from engineers, I always try very hard to challenge my “LLM-first” reflex and really ask myself: can I solve the problem without using LLMs?

For the core reasoning logic, that is, the decision-making bottleneck you want to automate, there are usually at least three classes of methods you can consider:

Rule-based and analytical methods
Data-driven ML models
LLMs

Rule-based and analytical methods are cheap, transparent, and easy to test. However, they can be inflexible and have limited power in messy reality.

Classic ML models, even a simple regression or classification, can often give you fast, reliable, and easily scalable decisions. However, they require historical data (and quite often, labels as well) to learn the patterns.

LLMs, on the other hand, shine if the core challenge is about understanding, synthesizing, or generating language across messy artifacts. Think skimming through 50 incident reports to surface likely relevant ones, or turning free-text logs into labeled, structured events. But LLMs are expensive, slow, and usually don’t behave as deterministically as you might want.

Before deciding to use an LLM for a given problem, ask yourself:

Could 80% of the problem be solved with a rule engine, an analytical model, or a classic ML model? If yes, simply start there. You can always layer an LLM on top later if needed.
Does this task require precise, reproducible numerical results? If so, keep the computation in analytical code or ML models, and use LLMs only for explanation or contextualization.
Will there be no human in the loop to review and approve the output? If so, an LLM might not be a good choice, as it rarely provides strong guarantees.
At our expected speed and volume, would LLM calls be too expensive or too slow? If you need to process thousands of log lines or alerts per minute, relying on an LLM alone will quickly make you hit a wall on both cost and latency.

If your answers are mostly “no”, you’ve probably found a good candidate to explore with LLMs.
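
To make the “start with rules, layer an LLM on top later” idea concrete, here is a minimal sketch. The rule table, the labels, and the fake_llm stand-in are all hypothetical; in a real project the rules would come from your domain experts and the fallback would wrap an actual LLM call.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical rule table: regex pattern -> label. Real rules come from domain experts.
RULES = [
    (re.compile(r"over-?temp|temperature high", re.I), "thermal"),
    (re.compile(r"timeout|connection lost", re.I), "communication"),
]

def classify_by_rules(log_line: str) -> Optional[str]:
    """Return the first matching rule label, or None if no rule applies."""
    for pattern, label in RULES:
        if pattern.search(log_line):
            return label
    return None

def classify(log_line: str, llm_fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Try cheap, deterministic rules first; call the LLM only for the leftovers."""
    label = classify_by_rules(log_line)
    if label is not None:
        return label, "rule"
    return llm_fallback(log_line), "llm"

def fake_llm(log_line: str) -> str:
    # Stand-in for a real LLM call so the sketch runs offline.
    return "needs_review"

results = [classify(line, fake_llm) for line in [
    "Pump P-101 overtemp alarm",
    "PLC connection lost at 03:12",
    "Operator noted unusual vibration near valve",
]]
print(results)
```

Recording which path produced each label (“rule” or “llm”) also tells you how much traffic actually reaches the LLM, which feeds directly into the cost and latency question above.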

Lesson 2: Set the right mindset from day one

Once I’m convinced that an LLM-based solution is appropriate for a specific use case, the next thing I do is align on the right mindset with the domain experts.

One thing I find extremely important is the positioning of the tool. A framing I usually adopt that works very well in practice is this: the goal of our LLM tool is augmentation, not automation. The LLM only helps you (i.e., domain experts) analyze faster, triage faster, and explore more, but you remain the decision-maker.

That distinction matters a lot.

When you position the LLM tool as augmentation, engineers tend to engage with it enthusiastically, as they see it as something that could make their work faster and less tedious.

On the other hand, if they sense that the new tool may threaten their role or autonomy, they’ll distance themselves from the project and give you very limited support.

From a developer’s point of view (that is, yours and mine), setting this “amplify instead of replace” mindset also reduces anxiety. Why? Because it makes it much easier to talk about errors! When the LLM gets something wrong (and it will), the conversation won’t simply be “your AI failed,” but rather “the suggestion wasn’t quite right, but it’s still insightful and gives me some ideas.” That’s a very different dynamic.

Next time you are building LLM apps for domain experts, try to emphasize:

LLMs are, at best, junior assistants. They’re fast and work around the clock, but they’re not always right.
Experts are the reviewers and ultimate decision-makers. You’re experienced, careful, and accountable.

Once this mindset is in place, you’ll see engineers start to evaluate your solution through the lens of “Does this help me?” rather than “Can this replace me?” That matters a lot for building trust and improving adoption.

Lesson 3: Co-design with experts and define what “better” means

Once we’ve agreed that LLMs are appropriate for the task at hand and that the goal is augmentation, not automation, the next critical point I try to figure out is:

“What does better actually mean for this task?”

To get a good understanding of that, you need to bring the domain experts into the design loop as early as possible.

Concretely, you should spend time sitting down with the domain experts, walking through how they solve the problem today, and taking notes on which tools they use and which docs/specs they refer to. Remember to ask them to point out where the pain point really is, and to clarify what’s OK to be “approximate” and what kinds of errors are annoying or unacceptable.

A concrete outcome of these conversations is a shared definition of “better” in the experts’ own language. These are the metrics you’re optimizing for, which could be the amount of triage time saved, the number of false leads reduced, or the number of manual steps skipped.

Once the metrics are defined, you automatically have a realistic baseline (i.e., whatever the current manual process takes) to benchmark your solution against later.

Besides the technical effects, I’d say the psychological effects are just as important: by involving experts early, you’re showing them that you’re genuinely trying to learn how their world works. That alone goes a long way in earning trust.

Phase 2: During The Project

After setting the stage, you’re now ready to build. Exciting stuff!

In my experience, there are a couple of important decisions you need to make to ensure your hard work actually earns trust and gets adopted. Let’s talk about these decision points.

Lesson 4: It’s Co-pilot, not Auto-pilot

A temptation I see a lot (also in myself) is the desire to build something “fully autonomous”. As a data scientist, who can really resist building an AI system that gives the user the final answer with just one button push?

Well, the reality is less flashy but far more practical. In practice, this “autopilot” mindset rarely works well with domain experts, as it fundamentally goes against the fact that engineers are used to systems where they understand the logic and the failure modes.

If your LLM app simply does everything in the background and only presents a final result, two things usually happen:

Engineers don’t trust the results because they can’t see how it got there.
They cannot correct it, even if they see something clearly off.

Therefore, instead of defaulting to an “autopilot” mode, I prefer to intentionally design the system with multiple control points where experts can influence the LLM’s behavior. For example, instead of having the LLM auto-classify all 500 alarms and create tickets, we can design the system to first group alarms into 5 candidate incident threads, pause, and show the expert the grouping rationale and key log lines for each thread. Then, experts can merge or split groups. After experts approve the grouping, the LLM can proceed to generate draft tickets.
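
The alarm example can be sketched as a three-step function with an explicit review step in the middle. Everything here is illustrative: Thread, run_with_checkpoint, and the stub callables are hypothetical names, and in a real tool the review step would be a UI interaction rather than a Python function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Thread:
    thread_id: str
    alarm_ids: List[str]
    rationale: str  # shown to the expert at the checkpoint

def run_with_checkpoint(
    alarms: List[dict],
    propose_threads: Callable[[List[dict]], List[Thread]],
    expert_review: Callable[[List[Thread]], List[Thread]],
    draft_ticket: Callable[[Thread], str],
) -> List[str]:
    """Group alarms, pause for expert review, then draft tickets for approved threads only."""
    proposed = propose_threads(alarms)          # step 1: LLM (or clustering) proposes groups
    approved = expert_review(proposed)          # step 2: control point; expert may merge or split
    return [draft_ticket(t) for t in approved]  # step 3: drafts are built from approved groups

def propose_by_unit(alarms: List[dict]) -> List[Thread]:
    # Stand-in for the LLM grouping step: one thread per unit.
    groups: Dict[str, List[str]] = {}
    for a in alarms:
        groups.setdefault(a["unit"], []).append(a["id"])
    return [Thread(unit, ids, f"same unit {unit}") for unit, ids in groups.items()]

def merge_all(threads: List[Thread]) -> List[Thread]:
    # Simulated expert decision: merge every proposed thread into one incident.
    ids = [i for t in threads for i in t.alarm_ids]
    return [Thread("merged", ids, "expert merged all threads")]

alarms = [{"id": "A1", "unit": "pump"}, {"id": "A2", "unit": "valve"}, {"id": "A3", "unit": "pump"}]
tickets = run_with_checkpoint(
    alarms, propose_by_unit, merge_all,
    lambda t: f"[DRAFT] {t.thread_id}: {', '.join(t.alarm_ids)}",
)
print(tickets)
```

The point of the design is that the ticket-drafting step never sees the LLM’s raw grouping, only what the expert approved.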

Yes, from a UI perspective, this adds a bit of work, as you have to implement human-input mechanisms, expose intermediate reasoning traces and results clearly, and so on. But the payoff is real: your experts will actually trust and use your system because it gives them the sense that they’re in control.

Lesson 5: Focus on workflow, roles, and data flow before picking a framework

Once we get into the implementation phase, a common question many developers (including myself in the past) tend to ask first is:

“Which LLM app framework should I use? LangGraph? CrewAI? AutoGen? Or something else?”

This instinct is totally understandable. After all, there are so many shiny frameworks out there, and it does feel like choosing the “right” one is the first big decision. But for prototyping with engineering domain experts, I’d argue that this is usually not the right place to start.

In my own experience, for the first version, you can go a long way with the good old from openai import OpenAI or from google import genai (or any other LLM provider you prefer).

Why? Because at this stage, the most pressing question is not which framework to build upon, but:

“Does an LLM actually help with this specific domain task?”

And you need to verify it as quickly as possible.

To do that, I’d like to focus on three pillars instead of frameworks:

Pipeline design: How do we decompose the problem into clear steps?
Role design: How should we instruct the LLMs at each step?
Data flow & context design: What goes in and out of each step?

If you treat each LLM call as a pure function, like this:

inputs → LLM reasoning → output

Then, you can wire these “functions” together with just normal control flow, e.g., if/else conditions, for/while loops, retries, etc., which are already natural to you as a developer.
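
Here is a minimal sketch of that idea. The llm argument is simply any callable that maps a prompt string to a reply string; in practice you would wrap your provider’s SDK call behind it. The step names, the prompts, and the scripted stand-in below are hypothetical and exist only so the example runs offline.

```python
import json
from typing import Callable, List

LLM = Callable[[str], str]  # prompt in, text out; wrap your provider SDK call behind this

def extract_symptoms(report: str, llm: LLM, max_retries: int = 2) -> List[str]:
    """Step 1: ask for a JSON list of symptoms; retry if the reply is not valid JSON."""
    prompt = f"List the symptoms in this report as a JSON array of strings:\n{report}"
    for _ in range(max_retries + 1):
        reply = llm(prompt)
        try:
            symptoms = json.loads(reply)
        except json.JSONDecodeError:
            continue  # plain retry loop, no framework needed
        if isinstance(symptoms, list):
            return symptoms
    return []

def suggest_checks(symptoms: List[str], llm: LLM) -> str:
    """Step 2: only call the LLM when step 1 found something."""
    if not symptoms:
        return "No symptoms extracted; route to manual review."
    return llm("Suggest first checks for: " + ", ".join(symptoms))

def pipeline(report: str, llm: LLM) -> str:
    return suggest_checks(extract_symptoms(report, llm), llm)

# Scripted stand-in: first reply is malformed, second is valid JSON, third is the advice.
replies = iter(["not json", '["high vibration", "bearing noise"]', "Check bearing lubrication."])
scripted_llm = lambda prompt: next(replies)

result = pipeline("Pump P-101 shows high vibration and bearing noise.", scripted_llm)
print(result)
```

Because each step takes the LLM as a parameter, you can swap in a scripted stand-in like this one to test the control flow without spending a single token.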

This applies to tool calling, too. If the LLM decides it needs to call a tool, it can simply output the function name and the associated parameters, and your regular code can execute the actual function and feed the result back into the next LLM call.
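
A minimal sketch of one such round trip, written as ordinary code. The JSON reply convention, the get_sensor_max tool, and its fake data are assumptions made for illustration; provider SDKs also offer native tool-calling features that you may prefer once the idea is validated.

```python
import json
from typing import Callable, Dict

def get_sensor_max(sensor_id: str, hours: int) -> float:
    # Hypothetical tool; a real one would query a historian or database.
    fake_data = {"TT-101": 87.5}
    return fake_data.get(sensor_id, float("nan"))

TOOLS: Dict[str, Callable] = {"get_sensor_max": get_sensor_max}

def run_turn(question: str, llm: Callable[[str], str]) -> str:
    """One tool-calling round trip: ask, dispatch the requested tool, ask again."""
    first = llm(
        f"Question: {question}\n"
        'Reply with JSON: {"tool": <name or null>, "args": {...}, "answer": <text or null>}'
    )
    decision = json.loads(first)
    if decision.get("tool") is None:
        return decision["answer"]
    tool = TOOLS[decision["tool"]]  # raises KeyError if the model names an unknown tool
    result = tool(**decision["args"])
    return llm(f"Question: {question}\nTool result: {result}\nAnswer concisely.")

def scripted_llm(prompt: str) -> str:
    # Stand-in for a real model: requests the tool first, then phrases the result.
    if "Tool result" in prompt:
        value = prompt.split("Tool result: ")[1].split("\n")[0]
        return f"Peak temperature was {value} degC."
    return '{"tool": "get_sensor_max", "args": {"sensor_id": "TT-101", "hours": 24}, "answer": null}'

answer = run_turn("What was the peak temperature on TT-101 in the last day?", scripted_llm)
print(answer)
```

Looking tools up in an explicit dictionary keeps the model from invoking anything you did not deliberately expose.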

You really don’t need frameworks just to express the pipeline.

Of course, I’m not saying that you should avoid using frameworks. They’re quite helpful in production as they provide observability, concurrency, state management, etc., out of the box. But for the early stage, I think it’s a good strategy to just keep things simple, so that you can iterate faster with domain experts.

Once you have verified your key assumptions with your experts, it’s not going to be difficult to migrate your pipeline/role/data design to a more production-ready framework.

In my view, this is lean development in action.

Lesson 6: Try workflows before jumping to agents

Recently, there has been a lot of discussion around workflows vs. agents. Every major player in the field seems eager to emphasize that they’re “building agents,” instead of just “running predefined workflows.”

As developers, it’s very easy to feel the temptation:

“Yeah, we definitely should build autonomous agents that figure things out on their own, right?”

No.

On paper, AI agents sound super attractive. But in practice, especially in engineering domains, I’d argue that a well-orchestrated workflow with domain-specific logic can already solve a large fraction of the real problems.

And here is the thing: it does so with far less randomness.

In most cases, engineers already follow a certain workflow to solve that specific problem. Instead of letting LLM agents “rediscover” that workflow, it’s far better if you translate that “domain knowledge” directly into a deterministic, staged workflow. This immediately gives you a couple of benefits:

Workflows are way easier to debug. If your system starts to behave strangely, you can easily spot which step is causing the issue.
Domain experts can easily understand what you’re building, because a workflow maps naturally to their mental model.
Workflows naturally invite human feedback. They can easily be paused, accept new inputs, and then resume.
You get much more consistent behavior. The same input leads to a similar path or outcome, and that matters a ton in engineering problem-solving.

Again, I’m not saying that AI agents are useless. There are certainly many situations where more flexible, agentic behavior is justified. But I’d say always start with a clear, deterministic workflow that explicitly encodes domain knowledge, and validate with experts that it’s actually helpful. You can introduce more agentic behavior if you hit limitations that a simple workflow can’t solve.

Yes, it may sound boring. But your ultimate goal is to solve the problem in a predictable and explainable way that brings business value, not to build some fancy agentic demo. It’s good to always keep that in mind.

Lesson 7: Structure everything you can – inputs, outputs, and knowledge

A common perception of LLMs is that they’re good at handling free-form text. So the natural instinct is: let’s just feed reports and logs in and ask the model to reason, right?

No.

In my experience, especially in engineering domains, that’s leaving a lot of performance on the table. In fact, LLMs tend to behave much better when you give them structured input and ask them to produce structured output.

Engineering artifacts often come in semi-structured form already. Instead of dumping entire raw documents into the prompt, I find it very helpful to extract and structure the key information first. For example, for free-text incident reports, we can parse them into the following JSON:

{
  "incident_id": "...",
  "equipment": "...",
  "symptoms": ["..."],
  "start_time": "...",
  "end_time": "...",
  "suspected_causes": ["..."],
  "mitigations": ["..."]
}

That structuring step can be done in various ways: we can resort to classic regexes or develop small helper scripts. We can even employ a separate LLM whose only job is to normalize the free text into a consistent schema.
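
As a rough illustration of the regex route, the sketch below parses a made-up report layout into an incident record. The report format, the parse_report helper, and the exact field names are assumptions for this example; real reports will need patterns tuned with your experts.

```python
import json
import re

# Hypothetical report layout, invented for this example.
REPORT = """Incident: INC-0042
Equipment: Pump P-101
Start: 2024-03-01 02:15
End: 2024-03-01 04:40
Symptoms: high vibration; bearing noise
Operator notes: suspected misalignment after maintenance."""

def parse_report(text: str) -> dict:
    """Extract labeled fields with regexes; missing fields stay None or empty."""
    def field(label: str):
        match = re.search(rf"^{label}:\s*(.+)$", text, flags=re.MULTILINE)
        return match.group(1).strip() if match else None

    symptoms = field("Symptoms")
    return {
        "incident_id": field("Incident"),
        "equipment": field("Equipment"),
        "symptoms": [s.strip() for s in symptoms.split(";")] if symptoms else [],
        "start_time": field("Start"),
        "end_time": field("End"),
        # Free-text causes and mitigations are left for a later step, e.g. an LLM normalizer.
        "suspected_causes": [],
        "mitigations": [],
    }

record = parse_report(REPORT)
print(json.dumps(record, indent=2))
```

Fields the regexes cannot find stay empty instead of being guessed, which makes it obvious which reports still need the LLM-based normalizer or a human look.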

This way, you can give the main reasoning LLM a clean view of what happened. As a bonus, with this structure in place, you can ask the LLM to cite specific facts when reaching its conclusion, which saves you quite some time in debugging.

If you’re doing RAG, this structured layer is also what you should retrieve over, instead of the raw PDFs or logs. You’d get better precision and more reliable citations when retrieving over clean, structured artifacts.

Now, on the output side, structure is basically mandatory if you want to plug the LLM into a larger workflow. Concretely, this means instead of asking:

“Explain what happened and what we should do next.”

I prefer something like:

“Fill this JSON schema with your analysis.”

{
  "likely_causes": [
    {"cause": "...", "confidence": "medium"}
  ],
  "recommended_next_steps": [
    {"description": "...", "priority": 1}
  ],
  "summary": "short free-text summary for the human"
}

Usually, this is defined as a Pydantic model, and you can leverage the “Structured Output” feature to explicitly instruct the LLM to produce output that conforms to it.
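
To show what enforcing the shape buys you, here is a dependency-free sketch that uses standard-library dataclasses as a stand-in for a Pydantic model. The class names, field names, and checks are illustrative assumptions; with Pydantic and a provider’s structured-output feature, most of this validation would be handled for you.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class NextStep:
    description: str
    priority: int

@dataclass
class Analysis:
    likely_causes: List[dict]
    recommended_next_steps: List[NextStep]
    summary: str

def parse_analysis(raw: str) -> Analysis:
    """Parse the LLM reply and fail loudly if it does not match the expected shape."""
    data = json.loads(raw)
    steps = [NextStep(**s) for s in data["recommended_next_steps"]]
    for step in steps:
        if not isinstance(step.priority, int):
            raise ValueError(f"priority must be an int, got {step.priority!r}")
    if not isinstance(data["summary"], str):
        raise ValueError("summary must be a string")
    return Analysis(list(data["likely_causes"]), steps, data["summary"])

reply = (
    '{"likely_causes": [], '
    '"recommended_next_steps": [{"description": "Check alignment", "priority": 1}], '
    '"summary": "Likely misalignment."}'
)
analysis = parse_analysis(reply)
print(analysis.recommended_next_steps[0].description, "|", analysis.summary)
```

A reply that fails parsing can be retried or routed to a human, instead of silently flowing into the next step of the workflow.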

I used to see LLMs as “text in, text out”. But now I see them more as “structure in, structure out”, and this is especially true in engineering domains where we need precision and robustness.

Lesson 8: Don’t forget about analytical AI

I know we’re building LLM-based solutions. But as we learned in Lesson 1, LLMs are not the only tool in your toolbox. We also have the “old school” analytical AI models.

In many engineering domains, there is a long track record of applying classic analytical AI/ML methods to address various aspects of the problems, e.g., anomaly detection, time-series forecasting, clustering, classification, you name it.

These methods are still extremely valuable, and in many cases, they should be doing the heavy lifting instead of being thrown away.

To effectively solve the problem at hand, it is often worth considering a hybrid approach of analytical AI + GenAI: analytical ML handles the heavy lifting of pattern matching and detection, and LLMs operate on top to reason, explain, and recommend next steps.

For example, say you have thousands of incident events per week. You could start by using classical clustering algorithms to group similar events into patterns, and maybe also compute some aggregate stats for each cluster. Then, the workflow can feed these cluster-level analytical results into an LLM and ask it to label each pattern, describe what it means, and suggest what to check first. Afterward, engineers review and refine the labels.
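
The hybrid flow can be sketched end to end in a few lines. To keep the example dependency-free, it uses a toy greedy grouping based on Jaccard word overlap as a stand-in for a proper clustering algorithm; the event fields, the threshold, and the prompt wording are all assumptions.

```python
from collections import Counter
from typing import Dict, List

def tokens(text: str) -> set:
    return set(text.lower().split())

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def greedy_cluster(events: List[Dict], threshold: float = 0.5) -> List[List[Dict]]:
    """Assign each event to the first cluster whose seed event is similar enough."""
    clusters: List[List[Dict]] = []
    for event in events:
        for cluster in clusters:
            if jaccard(tokens(event["message"]), tokens(cluster[0]["message"])) >= threshold:
                cluster.append(event)
                break
        else:
            clusters.append([event])  # no similar cluster found: start a new one
    return clusters

def cluster_summary(cluster: List[Dict]) -> Dict:
    """Aggregate stats the LLM will see instead of the raw events."""
    units = Counter(e["unit"] for e in cluster)
    return {"count": len(cluster), "top_unit": units.most_common(1)[0][0],
            "example": cluster[0]["message"]}

def build_label_prompt(summaries: List[Dict]) -> str:
    lines = [f"- {s['count']} events, mostly on {s['top_unit']}, e.g. '{s['example']}'"
             for s in summaries]
    return "Label each pattern and suggest what to check first:\n" + "\n".join(lines)

events = [
    {"unit": "P-101", "message": "pump vibration high alarm"},
    {"unit": "P-101", "message": "pump vibration high warning"},
    {"unit": "V-7", "message": "valve position feedback mismatch"},
]
clusters = greedy_cluster(events)
prompt = build_label_prompt([cluster_summary(c) for c in clusters])
print(prompt)
```

The LLM only ever sees one compact line per pattern, so prompt size and cost grow with the number of patterns rather than the number of events.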

So why does this matter? Because analytical methods give you speed, reliability, and precision on structured data. They’re deterministic, they scale to millions of data points, and they don’t hallucinate. LLMs, on the other hand, excel at synthesis, context, and communication. You should use each for what it’s best at.

Phase 3: After You Have Built

You’ve built a system that works technically. Now comes the hardest part: getting it adopted. No matter how good your implementation is, a tool that’s put on a shelf is a tool that brings zero value.

In this section, I’d like to share two final lessons on integration and evaluation. You want to make sure your system lands in the real world and earns trust through evidence, right?

Lesson 9: Integrate where engineers actually work

A separate UI, such as a simple web app or a notebook, works perfectly fine for exploration and getting first-hand feedback. But for real adoption, you should think beyond what your app does and focus on where your app shows up.

Engineers already have a set of tools they rely on every day. Now, if your LLM tool presents itself as “yet another web app with a login and a chat box”, you can already see that it’s going to struggle to become part of the engineers’ routine. People will try it once or twice, then when things get busy, they just fall back to whatever they’re used to.

So, how do we address this issue?

I’d ask myself this question at this point:

“Where in the existing workflow would this app actually be used, and what would it look like there?”

In practice, what does this imply?

The most powerful integration is often UI-level embedding. That basically means you embed LLM capabilities directly into the tools engineers already use. For example, in a standard log viewer, besides the usual dashboard plots, you can add a side panel with buttons like “summarize the selected events” or “suggest next diagnostic steps”. This empowers the engineers with LLM intelligence without interrupting their usual workflow.

One caveat worth mentioning, though: UI-level embedding often requires buy-in from the team that owns that tool. If possible, start building these relationships early.

Then, instead of a generic chat window, I’d focus on buttons with concrete verbs that match how engineers think about their tasks, be it summarize, group, explain, or compare. A chat interface (or something similar) can still exist if engineers have follow-up questions, need clarifications, or want to enter free-form feedback after the LLM produces its initial output. But the primary interaction here should be task-specific actions, not open-ended conversation.

Also important: you should make the context of LLMs dynamic and adaptive. If the system already knows which incident or time window experts are looking at, pass that context directly to the LLM calls. Don’t make them copy-paste IDs, logs, or descriptions into yet another UI.

If this integration is done well, the barrier to trying it (and ultimately adopting it) becomes much lower. And for you as a developer, it’s much easier to get richer and more honest feedback, since the tool is tested under real conditions.

Lesson 10: Evaluation, evaluation, evaluation

Once you have shipped the first version, you might think your work is done. Well, the truth is, in practice, that’s exactly the point where the real work begins.

It’s the beginning of the evaluation.

There are two things I want to discuss here:

Make the system show its work in a way that engineers can inspect.
Sit down with experts and walk through real cases together.

Let’s discuss them in turn.

First, make the system show its work. When I say “show its work”, I don’t just mean a final answer. I want the system to display, at a reasonable level of detail, three concrete things: what it looked at, what steps it took, and how confident the LLM is.

What it looked at: this is essentially the evidence the LLM uses. It’s good practice to always instruct LLMs to cite specific evidence when they produce a conclusion or recommendation. That evidence can be the specific log lines, the specific incident IDs, or spec sections that support the claim. Remember the structured input we talked about in Lesson 7? It comes in handy for LLM citation management and verification.
What steps it took: this refers to the reasoning trace produced by the LLMs. Here, I’d expose the output produced in key intermediate steps of the pipeline. If you’re adopting a multi-step workflow (Lessons 5 & 6), you already have these steps as separate LLM calls or functions. And if you’re enforcing structured output (Lesson 7), surfacing them in the UI becomes easy.
How confident the LLM is: finally, I almost always ask the LLM to output a confidence level (low/medium/high), plus a short rationale for why it assigned this confidence level. In practice, what you’ll obtain is something like this: “The LLM said A, based on B and C, with medium confidence because of assumptions D and E.” Engineers are much more comfortable with that kind of statement, and again, this is a crucial step toward building trust.
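
One cheap way to back up the first and third items is to check every answer before it reaches the UI. The sketch below assumes a hypothetical answer format with evidence_ids, confidence, and confidence_rationale fields, and verifies that each cited ID was actually part of the input.

```python
from typing import Dict, List

ALLOWED_CONFIDENCE = {"low", "medium", "high"}

def check_answer(answer: Dict, known_evidence_ids: set) -> List[str]:
    """Return a list of problems; an empty list means the answer can be shown as-is."""
    problems = []
    cited = answer.get("evidence_ids", [])
    if not cited:
        problems.append("no evidence cited")
    unknown = [e for e in cited if e not in known_evidence_ids]
    if unknown:
        problems.append(f"cites unknown evidence: {unknown}")
    if answer.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append("confidence must be low, medium, or high")
    if not answer.get("confidence_rationale"):
        problems.append("missing confidence rationale")
    return problems

# IDs of the structured records that were actually passed to the model.
known = {"INC-0042", "LOG-17", "LOG-18"}
answer = {
    "conclusion": "Likely shaft misalignment",
    "evidence_ids": ["INC-0042", "LOG-99"],  # LOG-99 was never given to the model
    "confidence": "medium",
    "confidence_rationale": "Vibration pattern matches, but no alignment measurement yet.",
}
problems = check_answer(answer, known)
print(problems)
```

An answer that cites something the model never saw is a strong hint of hallucination, so it is worth flagging in the UI rather than hiding.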

Now, let’s move to the second point: evaluate with experts using real cases.

My suggestion is, once the system can properly show its work, you should schedule dedicated evaluation sessions with domain experts.

It’s like doing user testing.

A typical session might look like this: you and the expert select a set of real cases. These can be a mix of typical ones, edge cases, and a few historical cases with known outcomes. You run them through the tool together. During the process, ask the expert to think aloud: What do you expect the tool to do here? Is this summary accurate? Are these suggested next steps reasonable? Would you agree that the cited evidence actually supports the conclusion? Meanwhile, remember to take detailed notes on things like where the tool clearly saves time, where it still fails, and what important context is currently missing.

After a couple of sessions with the experts, you can tie the results back to the “better” we defined earlier (Lesson 3). This doesn’t have to be a “formal” quantitative evaluation, but trust me, even a handful of concrete before/after comparisons can be eye-opening, and give you a solid foundation to keep iterating on your solution.
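
Even an informal comparison benefits from being written down in a repeatable form. The sketch below assumes the agreed metric is triage time in minutes and uses made-up numbers; swap in whatever “better” means for your experts.

```python
from statistics import median

# Hypothetical session notes: minutes to triage each case, manually vs. with the tool.
cases = [
    {"case": "INC-0042", "manual_min": 45, "tool_min": 20},
    {"case": "INC-0051", "manual_min": 30, "tool_min": 25},
    {"case": "INC-0063", "manual_min": 60, "tool_min": 70},  # the tool made this one slower
]

def compare(cases):
    """Summarize time saved per case and list the cases where the tool hurt."""
    saved = [c["manual_min"] - c["tool_min"] for c in cases]
    return {
        "median_saved_min": median(saved),
        "cases_faster": sum(s > 0 for s in saved),
        "cases_slower": [c["case"] for c, s in zip(cases, saved) if s < 0],
    }

report = compare(cases)
print(report)
```

The list of slower cases is often the most useful output, since it tells you which cases to revisit with the expert in the next session.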

Conclusion

Now, looking back at these ten lessons, what recurring themes do you see?

Here’s what I see:

First, respect the domain expertise. Start from how domain engineers actually work, and genuinely learn their pain points and needs. Position your tool as something that helps them, not something that replaces them. Always let experts stay in control.

Second, engineer the system. Start with simple SDK calls, deterministic workflows, and structured inputs/outputs, and mix traditional analytics with the LLM if that makes sense. Remember, LLMs are just one component in a larger system, not the entire solution.

Third, treat deployment as the beginning, not the end. The moment you ship the first working version is when you can finally start having meaningful conversations with experts: walking through real cases together, collecting their feedback, and continuing to iterate.

Of course, these lessons are just my current reflections on what seems to work when building LLM applications for engineers, and they are certainly not the only way to go. Still, they’ve served me well, and I hope they can spark some ideas for you, too.

Happy building!
