, the usual “text in, text out” paradigm will only take you so far.
Real applications that deliver actual value should be able to examine visuals, reason through complex problems, and produce results that systems can actually use.
In this post, we’ll design this stack by bringing together three powerful capabilities: multimodal input, reasoning, and structured output.
To illustrate this, we’ll walk through a hands-on example: building a time-series anomaly detection system for e-commerce order data using OpenAI’s o3 model. Specifically, we’ll show how to pair o3’s reasoning capability with image input and emit validated JSON, so that the downstream system can easily consume it.
By the end, our app will:
- See: analyze charts of e-commerce order volume time series
- Think: identify unusual patterns
- Integrate: output a structured anomaly report
You’ll leave with functional code you can reuse for various use cases that go beyond just anomaly detection.
Let’s dive in.
Interested in the broader landscape of how LLMs are being applied to anomaly detection? Check out my earlier post, Boosting Your Anomaly Detection With LLMs, where I summarized 7 emerging application patterns that you shouldn’t miss.
1. Case Study
In this post, we aim to build an anomaly detection solution for identifying abnormal patterns in e-commerce order time series data.
For this case study, we generated three sets of synthetic daily order data. The datasets represent three different profiles of daily orders over roughly one month. To make seasonality obvious, we have shaded the weekends. The x-axis shows the day of the week.
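The generation code for the synthetic data is not shown in this post. As a minimal sketch of how a comparable spike profile could be produced, the snippet below builds one month of daily orders with a weekend dip and one injected spike. The function name make_synthetic_orders, the baseline levels, the noise range, and the spike date are all illustrative assumptions, not the values behind the figures.

```python
import random
from datetime import date, timedelta

def make_synthetic_orders(start=date(2025, 8, 4), n_days=30,
                          weekday_level=125, weekend_level=115,
                          noise=4, spike_on=None, spike_size=40, seed=0):
    """Return rows of {"date", "orders"} with a weekly pattern.

    Every default here is an illustrative assumption, not the article's data.
    """
    rng = random.Random(seed)  # seeded so the profile is reproducible
    rows = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        # Saturday (5) and Sunday (6) use the lower weekend baseline
        base = weekend_level if day.weekday() >= 5 else weekday_level
        orders = base + rng.randint(-noise, noise)
        if day == spike_on:
            orders += spike_size  # inject a single-day spike
        rows.append({"date": day.isoformat(), "orders": orders})
    return rows

rows = make_synthetic_orders(spike_on=date(2025, 8, 20))
print(len(rows))  # → 30
```

Passing the rows to pd.DataFrame(rows) would give a dataframe with the "date" and "orders" columns that the plotting code later in this post expects.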


Each figure contains one specific type of anomaly (can you find them?). We’ll later use these figures to test our anomaly detection solution and see if it can accurately recover these anomalies.
2. Our Solution
2.1 Overview
Unlike traditional machine learning approaches that require tedious feature engineering and model training, our current approach is much simpler. It works with the following steps:
- We prepare the figure for visualizing the e-commerce order time series data.
- We prompt the reasoning model o3, ask it to take a closer look at the time series image we fed to it, and determine if an unusual pattern exists.
- The o3 model will then output its findings in a pre-defined JSON format.
And that’s it. Easy.
Of course, to deliver this solution, we need to enable the o3 model to take image input and emit structured output. We’ll see how to do that shortly.
2.2 Setting up the reasoning model
As mentioned before, we’ll use the o3 model, which is the flagship reasoning model from OpenAI that can handle complex multi-step problems with state-of-the-art performance. Specifically, we’ll use the Azure OpenAI endpoint to call the model.
Make sure you have put the endpoint, API key, and deployment name in a .env file. We can then proceed to setting up the LLM client:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from openai import AzureOpenAI
from dotenv import load_dotenv
import os
load_dotenv()
# Set up the LLM client
endpoint = os.getenv("api_base")
api_key = os.getenv("o3_API_KEY")
api_version = "2025-04-01-preview"
model_name = "o3"
deployment = os.getenv("deployment_name")
LLM_client = AzureOpenAI(
    api_key=api_key,
    api_version=api_version,
    azure_endpoint=endpoint
)
We use the following instruction as the system message for the o3 model (tuned by GPT-5):
instruction = f"""
[Role]
You are a meticulous data analyst.
[Task]
You will be given a line chart image related to daily e-commerce orders.
Your task is to identify prominent anomalies in the data.
[Rules]
The anomaly kinds can be spike, drop, level_shift, or seasonal_outlier.
A level_shift is a sustained baseline change (≥ 5 consecutive days), not a single point.
A seasonal_outlier happens if a weekend/weekday behaves unlike peers in its category.
For example, weekend orders are usually lower than the weekdays'.
Read dates/values from axes; if you can’t read exactly, snap to the nearest tick and note uncertainty in explanation.
The weekends are shaded in the figure.
"""
In the above instruction, we clearly defined the role of the LLM, the task that the LLM should complete, and the rules the LLM should follow.
To limit the complexity of our case study, we intentionally specified only four anomaly types that the LLM needs to identify. We also provided clear definitions of those anomaly types to remove ambiguity.
Finally, we injected a bit of domain knowledge about e-commerce patterns, i.e., lower weekend orders are expected compared to weekdays. Incorporating domain know-how is generally considered good practice for guiding the model’s analytical process.
Now that we have our model set up, let’s discuss how to prepare the image for the o3 model to consume.
2.3 Image preparation
To enable o3’s multimodal capabilities, we need to provide figures in a specific format, i.e., either as publicly accessible web URLs or as base64-encoded data URLs. Since our figures are generated locally, we’ll use the second approach.
What’s Base64 encoding anyway? Base64 is a way to represent binary data (like our image files) using only text characters that are safe to transmit over the internet. It converts binary image data into a string of letters, numbers, and a few symbols.
And what about data URLs? A data URL is a type of URL that embeds the file content directly in the URL string, rather than pointing to a file location.
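To make both ideas concrete, here is a small standard-library sketch that wraps a few raw bytes in a data URL and then recovers them. The helper names to_data_url and from_data_url are hypothetical and exist only for this illustration; the eight bytes used are the fixed signature that starts every PNG file.

```python
import base64

def to_data_url(raw: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URL."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def from_data_url(url: str) -> bytes:
    """Recover the original bytes from a base64 data URL."""
    header, encoded = url.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(encoded)

png_signature = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
url = to_data_url(png_signature, "image/png")
print(url)  # → data:image/png;base64,iVBORw0KGgo=
print(from_data_url(url) == png_signature)  # → True
```

The round trip shows that nothing is lost in the encoding: the URL is just a text-safe wrapper around the same bytes.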
We can use the following function to handle this conversion automatically:
import io
import base64

def fig_to_data_url(fig, fmt="png"):
    """
    Converts a Matplotlib figure to a base64 data URL without saving to disk.
    Args:
    -----
    fig (matplotlib.figure.Figure): The figure to convert.
    fmt (str): The format of the image ("png", "jpeg", etc.)
    Returns:
    --------
    str: The data URL representing the figure.
    """
    # Render the figure into an in-memory buffer
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, bbox_inches="tight")
    buf.seek(0)
    # Encode the image bytes as base64 text and wrap them in a data URL
    base64_encoded_data = base64.b64encode(buf.read()).decode("utf-8")
    mime_type = f"image/{fmt.lower()}"
    return f"data:{mime_type};base64,{base64_encoded_data}"
Essentially, our function first saves the matplotlib figure to a memory buffer. It then encodes the binary PNG data as base64 text and wraps it in the desired data URL format.
Assuming we have access to the synthetic daily order data, we can use the following function to generate the plot and convert it into a proper data URL format in one go:
def create_fig(df):
    """
    Create a Matplotlib figure and convert it to a base64 data URL.
    Weekends (Sat–Sun) are shaded.
    Args:
    -----
    df: dataframe containing one profile of the daily order time series.
        The dataframe has "date" and "orders" columns.
    Returns:
    --------
    image_url: The data URL representing the figure.
    """
    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])
    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.plot(df["date"], df["orders"], linewidth=2)
    ax.set_xlabel('Date', fontsize=14)
    ax.set_ylabel('Daily Orders', fontsize=14)
    # Weekend shading
    start = df["date"].min().normalize()
    end = df["date"].max().normalize()
    cur = start
    while cur <= end:
        if cur.weekday() == 5:  # Saturday
            span_start = cur  # Sat 00:00
            # Sun 00:00: daily points sit at midnight, so this span covers the Sat and Sun points
            span_end = cur + pd.Timedelta(days=1)
            ax.axvspan(span_start, span_end, alpha=0.12, zorder=0)
            cur += pd.Timedelta(days=2)  # skip Sunday
        else:
            cur += pd.Timedelta(days=1)
    # Title
    title = f'Daily Orders: {df["date"].min():%b %d, %Y} - {df["date"].max():%b %d, %Y}'
    ax.set_title(title, fontsize=16)
    # Format x-axis dates
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
    plt.tight_layout()
    # Obtain the data URL
    image_url = fig_to_data_url(fig)
    return image_url
Figures 1-3 are generated by the above plotting routine.
2.4 Structured output
In this section, let’s discuss how to ensure the o3 model outputs a consistent JSON format instead of free-form text. This is what’s known as “structured output,” and it’s one of the key enablers for integrating LLMs into existing automated workflows.
To achieve that, we start by defining the schema that governs the expected output structure. We’ll be using a Pydantic model:
from pydantic import BaseModel, Field
from typing import Literal
from datetime import date

AnomalyKind = Literal["spike", "drop", "level_shift", "seasonal_outlier"]

class DateWindow(BaseModel):
    start: date = Field(description="Earliest plausible date the anomaly begins (ISO YYYY-MM-DD)")
    end: date = Field(description="Latest plausible date the anomaly ends, inclusive (ISO YYYY-MM-DD)")

class AnomalyReport(BaseModel):
    when: DateWindow = Field(
        description=(
            "Minimal window that contains the anomaly. "
            "For single-point anomalies, use the interval that covers reading uncertainty, if the tick labels are unclear"
        )
    )
    y: int = Field(description="Approx value at the anomaly’s most representative day (peak/lowest), rounded")
    kind: AnomalyKind = Field(description="The type of the anomaly")
    why: str = Field(description="One-sentence reason for why this window is unusual")
    date_confidence: Literal["low","medium","high"] = Field(
        default="medium", description="Confidence that the window localization is correct"
    )
Our Pydantic schema tries to capture both the quantitative and qualitative aspects of the detected anomalies. For each field, we specify its data type (e.g., int for numerical values, Literal for a fixed set of choices, etc.).
Also, we use the Field function to provide detailed descriptions of each key. These descriptions are especially important as they effectively serve as inline instructions for o3, so that it understands the semantic meaning of each component.
Now that we have covered multimodal input and structured output, it’s time to put them together in a single LLM call.
2.5 o3 model invocation
To interact with o3 using multimodal input and structured output, we use the LLM_client.beta.chat.completions.parse() API. Some of the key arguments include:
- model: the deployment name;
- messages: the message object sent to the o3 model;
- max_completion_tokens: an upper bound on the number of tokens the model can generate for a completion. Note that reasoning models like o3 generate reasoning tokens internally to “think through” the problem, and these count toward max_completion_tokens together with the visible output tokens, so the budget should leave room for both;
- response_format: the Pydantic model that defines the expected JSON schema structure;
- reasoning_effort: a control knob that dictates how much computational effort o3 should use for reasoning. The available options include low, medium, and high.
We can define a helper function to interact with the o3 model:
def anomaly_detection(instruction, fig_path,
                      response_format, prompt=None,
                      deployment="o3", reasoning_effort="high"):
    # Compose messages
    messages = [
        {"role": "system", "content": instruction},
        {"role": "user", "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": fig_path,
                    "detail": "high"
                }
            },
        ]}
    ]
    # Add the text prompt if it is given
    if prompt is not None:
        messages[1]["content"].append({"type": "text", "text": prompt})
    # Invoke LLM API
    response = LLM_client.beta.chat.completions.parse(
        model=deployment,
        messages=messages,
        max_completion_tokens=4000,
        reasoning_effort=reasoning_effort,
        response_format=response_format
    )
    # Convert the parsed Pydantic object into a plain dictionary
    return response.choices[0].message.parsed.model_dump()
Note that the messages object accepts both text and image content. Since we’ll only use figures to prompt the model, the text prompt is optional.
We set "detail": "high" to enable high-resolution image processing. For our current case study, this is most likely necessary, as we need o3 to better read fine details like axis tick labels, data point values, and subtle visual patterns. However, keep in mind that high-detail processing incurs more tokens and higher API costs.
Finally, by using .parsed.model_dump(), we turn the JSON output into a regular Python dictionary.
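A schema guarantees the shape of the dictionary, but not that its contents are sensible. As a hedged sketch, a downstream consumer could add a few semantic checks that the schema alone does not express. The function check_report, its messages, and the plotted date range below are hypothetical; the 5-day minimum simply mirrors the level_shift rule in the system message.

```python
from datetime import date

def check_report(report, data_start, data_end, min_shift_days=5):
    """Return a list of problems in a report dict (empty list means none found)."""
    problems = []
    start, end = report["when"]["start"], report["when"]["end"]
    if start > end:
        problems.append("window start is after window end")
    if end < data_start or start > data_end:
        problems.append("window lies outside the plotted date range")
    # The system message defines a level_shift as lasting at least 5 days
    n_days = (end - start).days + 1  # inclusive window length
    if report["kind"] == "level_shift" and n_days < min_shift_days:
        problems.append("level_shift window is shorter than the minimum length")
    return problems

# Hand-written sample in the shape of AnomalyReport (not real model output)
sample = {
    "when": {"start": date(2025, 8, 19), "end": date(2025, 8, 21)},
    "y": 166,
    "kind": "spike",
    "why": "illustrative placeholder",
    "date_confidence": "medium",
}
# The date range here is an assumed plotting range, chosen only for the example
print(check_report(sample, date(2025, 8, 1), date(2025, 9, 5)))  # → []
```

Checks like these are cheap to run on every response and give the calling system a clear reason to retry or flag a report.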
That’s it for the implementation. Let’s see some results next.
3. Results
In this section, we’ll input the previously generated figures into the o3 model and ask it to identify potential anomalies.
3.1 Spike anomaly
# df_spike_anomaly is the dataframe of the first set of synthetic data (Figure 1)
spike_anomaly_url = create_fig(df_spike_anomaly)
# Anomaly detection
result = anomaly_detection(instruction,
                           spike_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
In the call above, spike_anomaly_url is the data URL for Figure 1. The output is shown below:
{
    'when': {'start': datetime.date(2025, 8, 19), 'end': datetime.date(2025, 8, 21)},
    'y': 166,
    'kind': 'spike',
    'why': 'Single day orders jump to ~166, far above adjacent days that sit near 120–130.',
    'date_confidence': 'medium'
}
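Note that the dictionary holds datetime.date objects, which the standard json module cannot serialize on its own. If the report has to be passed on as JSON text, one option is a small encoder like the sketch below; to_json is a hypothetical helper, and the sample dictionary is hand-written in the shape of the report rather than taken from the model. Pydantic v2 also offers model_dump(mode="json") and model_dump_json() for the same purpose.

```python
import json
from datetime import date

def to_json(report: dict) -> str:
    """Serialize a report dict, writing date objects as ISO strings."""
    def encode(value):
        if isinstance(value, date):
            return value.isoformat()
        raise TypeError(f"cannot serialize {type(value).__name__}")
    return json.dumps(report, default=encode, ensure_ascii=False)

# Hand-written sample with the same keys as the report
report = {
    "when": {"start": date(2025, 8, 19), "end": date(2025, 8, 21)},
    "y": 166,
    "kind": "spike",
    "date_confidence": "medium",
}
payload = to_json(report)
print(json.loads(payload)["when"]["start"])  # → 2025-08-19
```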
We see that the o3 model faithfully returned the output exactly in the format we designed. Now, we can grab this result and generate a visualization programmatically:
# Create the figure
fig, ax = plt.subplots(figsize=(8, 4.5))
df_spike_anomaly['date'] = pd.to_datetime(df_spike_anomaly['date'])
ax.plot(df_spike_anomaly["date"], df_spike_anomaly["orders"], linewidth=2)
ax.set_xlabel('Date', fontsize=14)
ax.set_ylabel('Daily Orders', fontsize=14)
# Format x-axis dates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
# Add anomaly overlay
start_date = pd.to_datetime(result['when']['start'])
end_date = pd.to_datetime(result['when']['end'])
# Add shaded region
ax.axvspan(start_date, end_date, alpha=0.3, color='pink', label=f"Anomaly ({result['kind']})")
# Add text annotation
mid_date = start_date + (end_date - start_date) / 2  # Middle of anomaly window
ax.annotate(
    result['why'],
    xy=(mid_date, result['y']),
    xytext=(10, 20),  # Offset from the point
    textcoords='offset points',
    bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.7),
    arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.1'),
    fontsize=10,
    wrap=True
)
# Add legend
ax.legend()
plt.xticks(rotation=0)
plt.tight_layout()
The generated visualization looks like this:

We can see that the o3 model correctly identified the spike anomaly present in this first set of synthetic data.
Not bad, especially considering that we didn’t do any conventional model training and relied only on prompting an LLM.
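Because the data is synthetic, the injected anomaly date is known, so this visual check could also be scored automatically. The sketch below shows two simple measures: whether the known day falls inside the reported window, and the overlap between the reported and true windows. The function names are hypothetical, and the true date used here is a made-up placeholder, not the actual injection date behind Figure 1.

```python
from datetime import date

def window_hits(window, true_day):
    """True if the known anomaly day falls inside the inclusive window."""
    return window["start"] <= true_day <= window["end"]

def window_iou(pred, truth):
    """Intersection over union of two inclusive date windows, counted in days."""
    overlap_start = max(pred["start"], truth["start"])
    overlap_end = min(pred["end"], truth["end"])
    overlap = max(0, (overlap_end - overlap_start).days + 1)
    pred_len = (pred["end"] - pred["start"]).days + 1
    truth_len = (truth["end"] - truth["start"]).days + 1
    return overlap / (pred_len + truth_len - overlap)

predicted = {"start": date(2025, 8, 19), "end": date(2025, 8, 21)}
true_day = date(2025, 8, 20)  # hypothetical ground truth, not from the article
truth = {"start": true_day, "end": true_day}
print(window_hits(predicted, true_day))        # → True
print(round(window_iou(predicted, truth), 3))  # → 0.333
```

The hit test rewards any window that covers the anomaly, while the overlap score also penalizes windows that are wider than they need to be.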
3.2 Level shift anomaly
# df_level_shift_anomaly is the dataframe of the second set of synthetic data (Figure 2)
level_shift_anomaly_url = create_fig(df_level_shift_anomaly)
# Anomaly detection
result = anomaly_detection(instruction,
                           level_shift_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
The output is shown below:
{
    'when': {'start': datetime.date(2025, 8, 26), 'end': datetime.date(2025, 9, 2)},
    'y': 150,
    'kind': 'level_shift',
    'why': 'Orders abruptly jump from the 120-135 range to ~150 on Aug 26 and remain elevated for all subsequent days, indicating a sustained baseline change.',
    'date_confidence': 'high'
}
Again, we see that the model accurately identified that a “level_shift” anomaly is present in the plot:

3.3 Seasonality anomaly
# df_seasonality_anomaly is the dataframe of the third set of synthetic data (Figure 3)
seasonality_anomaly_url = create_fig(df_seasonality_anomaly)
# Anomaly detection
result = anomaly_detection(instruction,
                           seasonality_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
The output is shown below:
{
    'when': {'start': datetime.date(2025, 8, 23), 'end': datetime.date(2025, 8, 24)},
    'y': 132,
    'kind': 'seasonal_outlier',
    'why': 'Weekend of Aug 23-24 shows order volumes (~130+) on par with surrounding weekdays, whereas other weekends consistently drop to ~115, making it an out-of-season spike.',
    'date_confidence': 'high'
}
This is a challenging case. Nonetheless, our o3 model managed to handle it properly, with accurate localization and a clear reasoning trace. Pretty impressive:

4. Summary
Congratulations! We’ve successfully built an anomaly detection solution for time-series data that works entirely through visualization and prompting.
By feeding daily order plots into the o3 reasoning model and constraining its output to a JSON schema, the LLM managed to identify three different anomaly types with accurate localization. All of this was achieved without training any ML model. Impressive!
If we take a step back, we can see that the solution we built illustrates the broader pattern of combining three capabilities:
- See: multimodal input to let the model consume figures directly.
- Think: step-by-step reasoning capability to tackle complex problems.
- Integrate: structured output that downstream systems can easily consume (e.g., generating visualizations).
The combination of multimodal input + reasoning + structured output creates a versatile foundation for useful LLM applications.
You now have the building blocks ready. What do you want to build next?

