is one of the most promising applications of LLMs, and CrewAI has quickly become a popular framework for building agent teams. But one of its most important features, the hierarchical manager-worker process, simply doesn’t function as documented. In real workflows, the manager doesn’t effectively coordinate agents; instead, CrewAI executes tasks sequentially, leading to incorrect reasoning, unnecessary tool calls, and extremely high latency. This issue has been highlighted in several online forums with no clear resolution.
In this article, I demonstrate why CrewAI’s hierarchical process fails, show the evidence from actual Langfuse traces, and provide a reproducible pathway to make the manager-worker pattern work reliably using custom prompting.
Multi-agent Orchestration
Before we get into the details, let us understand what orchestration means in an agentic context. In simple terms, orchestration is managing and coordinating multiple inter-dependent tasks in a workflow. But haven’t workflow management tools (e.g., RPA) been available forever to do just that? So what changed with LLMs?
The answer is the ability of LLMs to understand meaning and intent from natural language instructions, just as people in a team would. While earlier workflow tools were rule-based and rigid, with LLMs functioning as agents, the expectation is that they will be able to understand the intent of the user’s query, use reasoning to create a multi-step plan, infer the tools to be used, derive their inputs in the correct formats, and synthesize all the different intermediate results into a precise response to the user’s query. The orchestration frameworks are meant to guide the LLM with appropriate prompts for planning, tool-calling, generating responses and so on.
Among the orchestration frameworks, CrewAI, with its natural-language-based definition of tasks, agents and crews, depends the most on the LLM’s ability to understand language and manage workflows. While not as deterministic as LangGraph (since LLM outputs cannot be fully deterministic), it abstracts away most of the complexity of routing, error handling and so on into simple, user-friendly constructs with parameters, which the user can tune for appropriate behavior. So it is a good framework for creating prototypes by product teams and even non-developers.
Except that the manager-worker pattern doesn’t work as intended…
To illustrate, let’s take a use case to work with. We will also evaluate the response based on the following criteria:
- Quality of orchestration
- Quality of final response
- Explainability
- Latency and usage cost
Use Case
Take the case where a team of customer support agents resolves technical or billing tickets. When a ticket comes in, a triage agent categorizes the ticket, then assigns it to the technical or billing support specialist for resolution. There is a Customer Support Manager who coordinates the team, delegates tasks and validates the quality of the response.
Together they will be solving queries such as:
- Why is my laptop overheating?
- Why was I charged twice last month?
- My laptop is overheating and also, I was charged twice last month?
- My invoice amount is incorrect after system glitch?
The first query is purely technical, so only the technical support agent should be invoked by the manager; the second is billing only; and the third and fourth require answers from both the technical and billing agents.
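To make the expected routing explicit, here is a minimal sketch of how the four queries above could be checked against the agents actually invoked in a trace. The dictionary names and the short agent labels ("technical", "billing") are illustrative assumptions of mine and are not part of CrewAI.

```python
from typing import Dict, List

# Hypothetical labels for the two specialists; not CrewAI identifiers.
EXPECTED_SPECIALISTS: Dict[str, List[str]] = {
    "Technical": ["technical"],
    "Billing": ["billing"],
    "Both": ["technical", "billing"],
}

# The four test tickets from this article and the category each should receive.
TEST_TICKETS: Dict[str, str] = {
    "Why is my laptop overheating?": "Technical",
    "Why was I charged twice last month?": "Billing",
    "My laptop is overheating and also, I was charged twice last month?": "Both",
    "My invoice amount is incorrect after system glitch?": "Both",
}


def routing_is_correct(ticket: str, invoked: List[str]) -> bool:
    """Return True when exactly the expected specialists ran, in order."""
    category = TEST_TICKETS[ticket]
    return invoked == EXPECTED_SPECIALISTS[category]


# Example: a purely technical ticket that was also sent to billing fails the check.
print(routing_is_correct("Why is my laptop overheating?", ["technical"]))
print(routing_is_correct("Why is my laptop overheating?", ["technical", "billing"]))
```

The list of invoked agents would have to be read off the trace by hand or extracted from it; the sketch only encodes what "correct orchestration" means for this use case.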
Let’s build this team of CrewAI agents and see how well it works.
Hierarchical Process
According to the CrewAI documentation, “adopting a hierarchical approach allows for a clear hierarchy in task management, where a ‘manager’ agent coordinates the workflow, delegates tasks, and validates outcomes for streamlined and effective execution.” Also, the manager agent can be created in two ways: automatically by CrewAI or explicitly set by the user. In the latter case, you have more control over the instructions to the manager agent. We will try both ways for our use case.
CrewAI Code
Following is the code for the use case. I have used gpt-4o as the LLM and Langfuse for observability.
from crewai import Agent, Crew, Process, Task, LLM
from dotenv import load_dotenv
import os
from observe import *  # Langfuse trace

load_dotenv()
verbose = False
max_iter = 4
API_VERSION = os.getenv('API_VERSION')

# Create your LLM
llm_a = LLM(
    model="gpt-4o",
    api_version=API_VERSION,
    temperature=0.2,
    max_tokens=8000,
)

# Define the manager agent
supervisor = Agent(
    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=( """
    You do not try to find an answer to the user ticket {ticket} yourself.
    You delegate tasks to coworkers based on the following logic:
    Note the category of the ticket first by using the triage agent.
    If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists.
    ELSE
    If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step.
    Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions.
    ELSE
    If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist.
    Finally, compile and present the final response to the customer based on the outputs from the assigned agents.
    """
    ),
    llm=llm_a,
    allow_delegation=True,
    verbose=verbose,
)

# Define the triage agent
triage_agent = Agent(
    role="Query Triage Specialist",
    goal="Categorize the user query into technical or billing related issues. If a query requires both aspects, reply with 'Both'.",
    backstory=(
        "You are a seasoned expert in analysing intent of user query. You answer precisely with one word: 'Technical', 'Billing' or 'Both'."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define the technical support agent
technical_support_agent = Agent(
    role="Technical Support Specialist",
    goal="Resolve technical issues reported by customers promptly and effectively",
    backstory=(
        "You are a highly skilled technical support specialist with a strong background in troubleshooting software and hardware issues. "
        "Your primary responsibility is to assist customers in resolving technical problems, ensuring their satisfaction and the smooth operation of their products."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define the billing support agent
billing_support_agent = Agent(
    role="Billing Support Specialist",
    goal="Manage customer inquiries related to billing, payments, and account management",
    backstory=(
        "You are an experienced billing support specialist with expertise in handling customer billing inquiries. "
        "Your main objective is to provide clear and accurate information regarding billing processes, resolve payment issues, and assist with account management to ensure customer satisfaction."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define tasks
categorize_tickets = Task(
    description="Categorize the incoming customer support ticket: '{ticket}' based on its content to determine if it is technical or billing-related. If a query requires both aspects, reply with 'Both'.",
    expected_output="A categorized ticket labeled as 'Technical' or 'Billing' or 'Both'. Do not be verbose, just reply with one word.",
    agent=triage_agent,
)
resolve_technical_issues = Task(
    description="Resolve technical issues described in the ticket: '{ticket}'",
    expected_output="Detailed solutions provided to each technical issue.",
    agent=technical_support_agent,
)
resolve_billing_issues = Task(
    description="Resolve billing issues described in the ticket: '{ticket}'",
    expected_output="Comprehensive responses to each billing-related inquiry.",
    agent=billing_support_agent,
)

# Instantiate your crew with a custom manager and hierarchical process
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm=llm_a,  # Uncomment for auto-created manager
    manager_agent=supervisor,  # Comment out for auto-created manager
    process=Process.hierarchical,
    verbose=verbose,
)
As is evident, the program mirrors the team of human agents. Not only is there a manager, a triage agent, and technical and billing support agents, but the CrewAI objects such as Agent, Task and Crew are self-evident in their meaning and easy to visualize. Another observation is that there is very little Python code; most of the reasoning, planning and behavior is natural-language based, which depends on the ability of the LLM to derive meaning and intent from language, then reason and plan for the goal.
CrewAI code, therefore, scores high on ease of development. It is a low-code way of creating a flow quickly, with most of the heavy lifting of the workflow being done by the orchestration framework rather than the developer.
How well does it work?
As we are testing the hierarchical process, the process parameter is set to Process.hierarchical in the Crew definition. We will try different features of CrewAI as follows and measure performance:
- Manager agent auto-created by CrewAI
- Using our custom manager agent
1. Auto-created manager agent
Input query: Why is my laptop overheating?
Here is the Langfuse trace:

The key observations are as follows:
- First, the output is “Based on the provided context, it seems there is a misalignment between the nature of the issue (laptop overheating) and its categorization as a billing concern. To clarify the connection, it would be important to determine if the customer is requesting a refund for the laptop due to the overheating issue, disputing a charge related to the purchase or repair of the laptop, or seeking compensation for repair costs incurred due to the overheating…” For a query that was clearly a technical issue, this is a poor response.
- Why does it happen? The left panel shows that the execution first went to the triage specialist, then to technical support and then, strangely, to the billing support specialist as well. The following graphic depicts this as well:

Looking closely, we find that the triage specialist correctly identified the ticket as “Technical” and the technical support agent gave a great answer as follows:

But then, instead of stopping and replying with the above as the response, the Crew Manager went to the Billing support specialist and tried to find a non-existent billing issue in the purely technical user query.

This resulted in the Billing agent’s response overwriting the Technical agent’s response, with the Crew Manager doing a sub-optimal job of validating the quality of the final response against the user’s query.
Why did it happen?
Because in the Crew definition, I specified the tasks as categorize_tickets, resolve_technical_issues, resolve_billing_issues, and although the process is meant to be hierarchical, the Crew Manager does not perform any orchestration, instead simply executing all the tasks sequentially.
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    manager_llm=llm_a,
    process=Process.hierarchical,
    verbose=verbose,
)
If you now ask a billing-related query, it will appear to give a correct answer simply because resolve_billing_issues is the last task in the sequence.
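The effect can be reproduced with a toy model. The sketch below is not CrewAI’s internal code; it is a simplified simulation, under my own assumptions, of the behaviour seen in the trace (every task runs and the last output wins) next to the behaviour one would expect from a manager that routes on the triage result. The stub agents and keyword-based triage are hypothetical stand-ins for LLM calls.

```python
# Toy stand-ins for the LLM-backed agents (hypothetical, for illustration only).
def triage(ticket: str) -> str:
    text = ticket.lower()
    is_technical = "overheating" in text or "glitch" in text
    is_billing = "charged" in text or "invoice" in text
    if is_technical and is_billing:
        return "Both"
    return "Billing" if is_billing else "Technical"


def technical_agent(ticket: str) -> str:
    return "technical answer"


def billing_agent(ticket: str) -> str:
    return "billing answer"


def run_sequential(ticket: str) -> str:
    """Observed behaviour: all tasks run in order; the last output is returned."""
    outputs = [triage(ticket), technical_agent(ticket), billing_agent(ticket)]
    return outputs[-1]


def run_conditional(ticket: str) -> str:
    """Expected behaviour: only the specialists matching the category run."""
    category = triage(ticket)
    parts = []
    if category in ("Technical", "Both"):
        parts.append(technical_agent(ticket))
    if category in ("Billing", "Both"):
        parts.append(billing_agent(ticket))
    return " + ".join(parts)


ticket = "Why is my laptop overheating?"
print(run_sequential(ticket))   # billing output, even though the ticket is technical
print(run_conditional(ticket))  # technical output only
```

In the sequential version a billing ticket looks correct only by accident, because the billing task happens to be last in the list.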
What about a query that requires both technical and billing support, such as “My laptop is overheating and also I was charged twice last month?” In this case too, the triage agent correctly categorizes the ticket type as “Both”, and the technical and billing agents give correct answers to their individual queries, but the manager is unable to combine all the responses into a coherent answer to the user’s query. Instead, the final response only considers the billing response, since that is the last task to be called in the sequence.

Latency and Usage: As can be seen from the above image, the Crew execution took almost 38 secs and spent 15,759 tokens. The final output is only about 200 tokens. The rest of the tokens were spent on all the thinking, agent calling, generating intermediate responses and so on, all to generate an unsatisfactory response at the end. The performance can be classified as “Poor”.
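As a quick back-of-the-envelope check on those figures, the share of tokens that ended up in the final answer can be computed as follows. The 200-token figure is the approximation stated above, so the result is indicative only.

```python
total_tokens = 15759        # total tokens reported in the trace
final_output_tokens = 200   # approximate size of the final answer

# Tokens spent on reasoning, delegation and intermediate responses.
overhead_tokens = total_tokens - final_output_tokens
final_share = final_output_tokens / total_tokens

print(f"Overhead tokens: {overhead_tokens}")
print(f"Share of tokens in the final answer: {final_share:.1%}")
```

In other words, roughly 1.3% of the tokens spent made it into the answer the customer would see.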
Evaluation of this approach
- Quality of orchestration: Poor
- Quality of final output: Poor
- Explainability: Poor
- Latency and Usage: Poor
But perhaps the above result is due to the fact that we relied on CrewAI’s built-in manager, which did not have our custom instructions. Therefore, in our next approach we replace the CrewAI automatic manager with our custom Manager agent, which has detailed instructions on what to do in case of Technical, Billing or Both tickets.
2. Using Custom Manager Agent
Our Customer Support Manager is defined with the following very specific instructions. Note that this requires some experimentation to get it working, and a generic manager prompt such as the one mentioned in the CrewAI documentation will give the same erroneous results as the built-in manager agent above.
    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=( """
    You do not try to find an answer to the user ticket {ticket} yourself.
    You delegate tasks to coworkers based on the following logic:
    Note the category of the ticket first by using the triage agent.
    If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists.
    ELSE
    If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step.
    Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions.
    ELSE
    If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist.
    Finally, compile and present the final response to the customer based on the outputs from the assigned agents.
    """
    ),
And in the Crew definition, we use the custom manager instead of the built-in one:
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm=llm_a,
    manager_agent=supervisor,
    process=Process.hierarchical,
    verbose=verbose,
)
Let’s repeat the test cases.
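A small harness for repeating the four queries and recording wall-clock latency might look like the sketch below. The function time_tickets and the stub fake_run are hypothetical names of mine; the stub only stands in for the crew so that the example runs without an LLM. With the crew defined above, one would presumably pass a callable that wraps crew_q.kickoff(inputs={"ticket": ...}) instead. Token usage is not measured here; in this article it comes from the Langfuse traces.

```python
import time
from typing import Callable, Dict, List

TICKETS: List[str] = [
    "Why is my laptop overheating?",
    "Why was I charged twice last month?",
    "My laptop is overheating and also, I was charged twice last month?",
    "My invoice amount is incorrect after system glitch?",
]


def time_tickets(run_ticket: Callable[[str], object], tickets: List[str]) -> List[Dict[str, object]]:
    """Run each ticket through run_ticket and record the elapsed seconds."""
    results = []
    for ticket in tickets:
        start = time.perf_counter()
        answer = run_ticket(ticket)
        elapsed = time.perf_counter() - start
        results.append({"ticket": ticket, "answer": str(answer), "seconds": elapsed})
    return results


def fake_run(ticket: str) -> str:
    """Stub used only to make this sketch runnable without CrewAI."""
    return f"stub answer for: {ticket}"


# Assumed real usage (not executed here):
#   time_tickets(lambda t: crew_q.kickoff(inputs={"ticket": t}), TICKETS)
for row in time_tickets(fake_run, TICKETS):
    print(f"{row['seconds']:.4f}s  {row['ticket']}")
```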
Input query: Why is my laptop overheating?
The trace is the following:


An important observation is that now, for this technical query, the flow did not go to the Billing support specialist agent. The manager correctly followed instructions, categorized the query as technical and stopped execution once the Technical Support Specialist had generated its response. From the response preview displayed, it is evident that it is a good response to the user query. Also, the latency is 24 secs and token usage is 10k.
Input query: Why was I charged twice last month?
The trace is as follows:


As can be seen, the manager correctly skipped executing the Technical Support Specialist, even though that came before the Billing agent in the Crew definition. Instead, the response generated is of good quality and comes from the Billing Support Specialist only. Latency is 16 secs and token usage is only 7,700.
Input query: My laptop is overheating and also, I was charged twice last month?
The trace shows the Manager executed both Technical and Billing support agents and provided a combined response.

The response preview in the figure above does not show the full response, which is as follows, and combines responses from both support agents. Latency is 38 secs and token usage is 20k, which is commensurate with the multi-agent orchestration and the detailed response generated.
Dear Customer,
Thank you for reaching out to us regarding the issues you're experiencing. We sincerely apologize for any inconvenience caused. Below are the detailed solutions to address your concerns:
**1. Laptop Overheating Issue:**
- **Check for Proper Ventilation**: Ensure your laptop is placed on a hard, flat surface to allow proper airflow. Avoid using it on soft surfaces like beds or couches that can block the vents. Consider using a laptop cooling pad or stand with built-in fans to improve airflow.
- **Clean the Laptop's Vents and Fans**: Dust and debris can accumulate in the vents and fans, restricting airflow. Power off the laptop, unplug it, and use a can of compressed air to gently blow out dust from the vents. If you are comfortable, you can clean the internal fans and components more thoroughly, or take the laptop to a professional technician for internal cleaning.
- **Monitor Running Applications and Processes**: Open the Task Manager (Windows: Ctrl + Shift + Esc, macOS: Activity Monitor) and check for processes consuming high CPU or GPU usage. Close unnecessary applications or processes to reduce the load on the system.
- **Update Drivers and Software**: Update your operating system, drivers (especially graphics drivers), and any other critical software to the latest versions.
- **Check for Malware or Viruses**: Run a full system scan using a reputable antivirus program to detect and remove any malware.
- **Adjust Power Settings**: Adjust your power settings to "Balanced" or "Power Saver" mode (Windows: Control Panel > Power Options, macOS: System Preferences > Energy Saver).
- **Inspect the Laptop's Hardware**: If the laptop is still overheating, there may be an issue with the hardware, such as a failing fan or thermal paste that needs replacement. Consult a professional technician to inspect and replace the thermal paste or faulty hardware components if necessary.
- **Environmental Factors**: Operate the laptop in a cool, well-ventilated environment. Avoid using the laptop in direct sunlight or near heat sources.
- **Consider Upgrading Components**: If the laptop is older, consider upgrading components such as RAM or switching to an SSD to reduce the strain on the system and help with heat management.
- **Monitor Temperature Levels**: Install a temperature monitoring tool (e.g., HWMonitor, Core Temp, or Macs Fan Control) to keep track of the CPU and GPU temperatures. This can help identify if the laptop is consistently running at high temperatures and provide insights into when the overheating occurs.
If the problem persists after trying all the above solutions, please contact the laptop manufacturer’s support team or a certified repair technician for further assistance.
**2. Duplicate Billing Issue:**
- **Verification of Account Details**: I reviewed your account information to ensure accuracy and confirmed that the account is active and matches the reported issue.
- **Billing Records Check**: Upon examining the billing records for last month, I identified two identical charges processed on the same day. This confirms the duplicate billing issue.
- **Payment Method Details**: I verified the payment method used for the transactions and confirmed that both charges were processed using the same payment method.
- **Resolution**:
  - Since the duplicate charge was confirmed, I have initiated a refund for the second transaction. The refund will be processed back to the original payment method within 5-7 business days.
  - Additionally, I have applied a credit to your account as a gesture of goodwill for the inconvenience caused.
- **Customer Communication**: You will receive an email or phone call from us to inform you of the resolution, including the refund timeline and the applied credit.
We sincerely apologize for the inconvenience caused by these issues. Please let us know if you need further assistance or have any additional questions. We're here to help.
And our final, more nuanced query is as follows:
Input query: My invoice amount is incorrect after system glitch?
The trace shows that the query was correctly classified as ‘Both’, requiring responses from the Technical and Billing agents, and a combined response was provided. Latency is 24 secs and tokens used are 12k.

The full response is included here:
Thank you for bringing this issue to our attention. Based on the technical investigation, here are the steps we are taking to resolve the system glitch that caused the incorrect invoice amount:
1. We have verified the details of the incorrect invoice amount and identified the issue in the system.
2. Our technical team has analyzed the system logs and reviewed the invoice calculation logic to pinpoint the root cause of the error.
3. The necessary fixes have been implemented to address the glitch, and the system has been thoroughly tested to ensure the issue is resolved.
4. A corrected invoice has been generated, and we have verified its accuracy.
Additionally, here’s the complete resolution for the billing issue related to the incorrect invoice amount caused by the system glitch:
1. **Corrected Invoice Verification**: The technical team has resolved the system glitch and generated a corrected invoice. The corrected invoice has been personally verified to ensure its accuracy, including cross-checking the amounts, dates, and any applicable charges or credits.
2. **Customer Communication**: We will promptly notify you about the resolution of the issue. This communication includes:
  - An apology for the inconvenience caused by the incorrect invoice.
  - Confirmation that the system glitch has been resolved.
  - Assurance that the corrected invoice has been thoroughly reviewed for accuracy.
  - A copy of the corrected invoice for your records.
3. **Additional Steps Taken**: To prevent similar issues in the future, the technical team has implemented measures to ensure system stability and accuracy in invoice generation.
4. **Account Adjustment (if applicable)**: If the incorrect invoice resulted in any overpayment or underpayment, the necessary adjustments will be made to your account. This includes issuing a refund for any overpayment or providing clear instructions for settling any outstanding balance.
5. **Follow-Up**: We are here to assist you with any further questions or concerns regarding your account or billing. Please do not hesitate to reach out to us, and we will be glad to help. For your convenience, we have provided direct contact information for further communication.
We sincerely apologize for any inconvenience this may have caused and assure you that we are taking steps to prevent similar issues in the future. Thank you for your understanding and patience.
Evaluation of this approach
- Quality of orchestration: Good
- Quality of final output: Good
- Explainability: Good (we understand why it did what it did)
- Latency and Usage: Fair (commensurate with the complexity of the output)
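For reference, the latency and token figures quoted above for the four custom-manager runs can be lined up side by side. The numbers are taken as reported in this article (token counts are rounded there), and the run labels are my own shorthand.

```python
# (label, latency in seconds, approximate tokens) as reported above.
runs = [
    ("Technical only", 24, 10_000),
    ("Billing only", 16, 7_700),
    ("Both: overheating + double charge", 38, 20_000),
    ("Both: invoice after glitch", 24, 12_000),
]

print(f"{'Query type':<36}{'Secs':>6}{'Tokens':>10}")
for label, secs, tokens in runs:
    print(f"{label:<36}{secs:>6}{tokens:>10,}")

avg_secs = sum(secs for _, secs, _ in runs) / len(runs)
avg_tokens = sum(tokens for _, _, tokens in runs) / len(runs)
print(f"{'Average':<36}{avg_secs:>6.1f}{avg_tokens:>10,.0f}")
```

The single-specialist tickets are the cheapest, and cost rises when both specialists are invoked, which is the pattern one would expect from working conditional delegation.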
Takeaway
In summary, the hierarchical manager-worker pattern in CrewAI does not function as documented. The core orchestration logic is weak: instead of allowing the manager to selectively delegate tasks, CrewAI executes all tasks sequentially, causing incorrect agent invocation, overwritten outputs, and inflated latency and token usage. Why it failed comes down to the framework’s internal routing: in the traces above, hierarchical mode did not enforce conditional branching or true delegation, so the final response was effectively determined by whichever task happened to run last. The fix is introducing a custom manager agent with explicit, step-wise instructions: it uses the triage result, conditionally calls only the required agents, synthesizes their outputs, and terminates execution at the right point, restoring correct routing, improving output quality, and significantly optimising token costs.
Conclusion
CrewAI, in the spirit of keeping the LLM at the center of orchestration, depends on it for most of the heavy lifting of orchestration, utilising user prompts combined with detailed scaffolding prompts embedded in the framework. Unlike LangGraph and AutoGen, this approach sacrifices determinism for developer-friendliness, and sometimes results in unexpected behavior for critical features such as the manager-worker pattern, which is crucial for many real-life use cases. This article attempts to demonstrate a pathway for achieving the desired orchestration for this pattern using careful prompting. In future articles, I intend to explore more features of CrewAI, LangGraph and others for their applicability in practical use cases.
You can use CrewAI to design an interactive conversational assistant on a document store and further make the responses truly multimodal. Refer to my articles on GraphRAG Design and Multimodal RAG.
Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI
All images in this article were drawn by me or generated using Copilot or Langfuse. Code shared is written by me.

