Quickly turning tedious lab work into actionable insights
Throughout my time as a Physics student, manually extracting and analysing experimental measurements was an unavoidable and irritating part of Physics labs. Reading values from devices, writing them down, transferring them into spreadsheets, and finally plotting the results was slow, repetitive, and error-prone.
Now that I work in Generative AI, I wondered: why not automate this with AI?
This led me to build AI-OCR, an open-source prototype that uses AI to extract numerical data from images and turn it into insightful plots. The process of extracting text or numbers from images is commonly known as Optical Character Recognition (OCR) – hence the name of this project.
How it works:
- Upload images of measurements (or structured PDFs such as financial reports)
- Prompt the AI to extract specific values into a clean DataFrame
- Prompt the AI to generate visualisations such as time series, histograms, scatter plots, etc.
By automating what was tedious, AI-OCR helps reduce manual work while also breaking free from vendor lock-in. In many lab and industrial environments, even digital data often lives in proprietary formats, requiring expensive and/or restrictive software for access and analysis. With AI-OCR, you can simply photograph the measurements, extract the data from the image, and analyse as well as visualise the results with a simple prompt.
While conceived with simplifying lab workflows in mind, the tool's applications extend far beyond science. From tracking health metrics to analysing utility bills or financial statements, AI-OCR can support a wide range of everyday data tasks.
In this article, I'll walk through:
- Real-world use cases for the prototype
- A breakdown of how it works under the hood
- Challenges, limits, and trade-offs encountered
- Potential approaches for further development
Practical use cases: Where AI-OCR shines
Since I no longer work in a physics lab and unfortunately don't have one in my basement, I was not able to test AI-OCR in its originally intended environment. Instead, I discovered several everyday use cases where this prototype proved surprisingly useful.
In this section, I'll walk through four real-world examples. I used AI-OCR to extract numerical data from everyday images/documents like the ones in the image below and generate meaningful plots with minimal effort. For each of these use cases, I used OpenAI's API to the GPT-4.1 model for both the OCR and the data visualisation (more technical details in section 3).
Blood pressure monitoring
In this first use case, I used AI-OCR to track my blood pressure and heart rate throughout the day. You can see a full demonstration of this use case in the following video:
🎥 https://youtu.be/pTk9RgQ5SkM
Here is how I used it in practice:
- I recorded my blood pressure roughly every 30 minutes by taking photos of the monitor's display.
- I uploaded the images and prompted the AI to extract: systolic pressure, diastolic pressure, and heart rate.
- AI-OCR returned a pandas.DataFrame with the extracted values, timestamped using the image metadata.
- Finally, I asked the AI to plot systolic and diastolic pressure as a time series, including horizontal lines indicating standard healthy ranges, as well as the heart rate in a separate subplot.
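To give a concrete idea of what the LLM generates for such a prompt, here is a minimal matplotlib sketch with made-up readings. The column names and values are illustrative; in AI-OCR the DataFrame comes from the OCR step, and the plotting code is produced by the model rather than written by hand:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Illustrative readings; real ones are timestamped via the image metadata
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-05-01 08:00", "2024-05-01 08:30", "2024-05-01 09:00"]
    ),
    "systolic pressure": [135, 138, 132],
    "diastolic pressure": [88, 90, 85],
    "heart rate": [72, 75, 70],
})

fig, (ax_bp, ax_hr) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
ax_bp.plot(df["timestamp"], df["systolic pressure"], marker="o", label="systolic")
ax_bp.plot(df["timestamp"], df["diastolic pressure"], marker="o", label="diastolic")
ax_bp.axhline(120, linestyle="--", color="grey")  # healthy systolic reference
ax_bp.axhline(80, linestyle="--", color="grey")   # healthy diastolic reference
ax_bp.set_ylabel("pressure (mmHg)")
ax_bp.legend()
ax_hr.plot(df["timestamp"], df["heart rate"], marker="o", color="tab:red")
ax_hr.set_ylabel("heart rate (bpm)")
fig.autofmt_xdate()
```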

The result? A visual overview of my (slightly elevated) blood pressure fluctuations throughout the day, with a clear drop after lunch at 1 PM. What's particularly encouraging is that the plot doesn't show any obvious outliers, a sanity check that indicates the AI extracted the values from the images correctly.
Most modern blood pressure monitors only store a limited number of readings internally. The device I used, for example, can hold up to 120 values. However, many affordable models (like mine) don't support data export. Even when they do, they often require proprietary apps, locking your health data into closed ecosystems. As you can see, this isn't the case here.
Body weight tracking
In another health-related use case, I used AI-OCR to track my body weight over several weeks during a personal weight-loss effort.
Traditionally, you might weigh yourself and manually enter the result into a fitness app. Some modern scales offer synchronisation via Bluetooth, but again the data is often locked inside proprietary apps. These apps typically limit both data access and the kinds of visualisations you can generate, making it difficult to truly own or analyse your own health data.
With AI-OCR, I simply took a photo of my scale reading every morning. For someone who is not exactly a morning person, this felt far easier than fiddling with an app before my breakfast tea. Once I had a batch of images, I uploaded them and asked AI-OCR to extract the weight values and generate a time series plot of my weight.

From the resulting graph, you can see that I lost around 3 kg over roughly two months. I also asked the AI to perform a linear regression, estimating a weight loss rate of ~0.4 kg/week. With this approach, the user has full control over the analysis: I can ask the AI to generate a trend line, estimate my weight loss rate, or apply any custom logic I want.
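For readers who want to reproduce the trend-line estimate themselves, the regression the LLM generates boils down to an ordinary least-squares fit. A sketch with invented weekly weights (numbers chosen to mimic a rate of roughly -0.4 kg/week, not my real data):

```python
import numpy as np

# Invented weekly scale readings over ~two months (kg)
days = np.arange(0, 56, 7)
weights = np.array([80.0, 79.7, 79.2, 78.9, 78.5, 78.1, 77.6, 77.2])

# A first-degree polynomial fit is a linear regression
slope_per_day, intercept = np.polyfit(days, weights, 1)
rate_per_week = slope_per_day * 7
print(f"estimated rate: {rate_per_week:.2f} kg/week")  # → estimated rate: -0.40 kg/week
```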
Financial data analysis
AI-OCR isn't just useful for health monitoring. It can also help make sense of your personal finances. In my case, I found that the analytics provided by my brokerage app offered only basic summaries of my portfolio and often missed key insights about my investment strategy. Some numbers were even inaccurate or incomplete.
One example: after moving my portfolio to a new brokerage, I wanted to verify that my buy-in values had been transferred correctly. This can be cumbersome, especially when shares are accumulated over time through savings plans or multiple partial purchases. Doing this manually would mean digging through many PDFs, copying numbers into spreadsheets, and double-checking formulas, all of which is time-consuming and error-prone.
AI-OCR automated the entire workflow. I uploaded all the PDF purchase confirmations from my previous broker and prompted the AI to extract share name, nominal value, and purchase price. In a second step, I asked it to compute the buy-in values for each share and generate a bar plot of the results. In the prompt, I explained how to calculate the buy-in value:
“Buy-in value = share price × nominal value, normalised over total nominal value.”
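In plain pandas, that formula amounts to a nominal-value-weighted average purchase price per share. A minimal sketch with invented purchases (in AI-OCR these rows come from the extracted PDF confirmations):

```python
import pandas as pd

# Invented purchase confirmations; column names mirror the prompt
purchases = pd.DataFrame({
    "share": ["ACME", "ACME", "ACME", "Globex"],
    "nominal value": [2.0, 3.0, 5.0, 4.0],
    "share price": [100.0, 110.0, 90.0, 50.0],
})

# Buy-in value = sum(share price x nominal value) / sum(nominal value)
purchases["cost"] = purchases["share price"] * purchases["nominal value"]
totals = purchases.groupby("share")[["cost", "nominal value"]].sum()
totals["buy-in value"] = totals["cost"] / totals["nominal value"]
print(totals["buy-in value"])
```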
The generated plot let me quickly spot inconsistencies in the transfer of the buy-in values. In fact, this plot allowed me to catch a few errors in the numbers from my new brokerage app.
Similarly, you can prompt AI-OCR to calculate realised gains or losses over time, based on your transaction history. This is a metric my brokerage app doesn't even provide.
Electricity meter readings
For the final use case, I'll demonstrate how I digitised and tracked my electricity consumption using this prototype.
Like many older houses in Germany, mine still uses an analogue electricity meter, which makes daily monitoring nearly impossible with modern (digital) technology. If I want to analyse consumption over a time interval, I have to read the meter manually at the beginning and end of the interval. Then I have to repeat this for each interval/day. Doing this over several days quickly becomes mundane and error-prone.
Instead, I photographed the meter (almost) every day for a few weeks and uploaded the images to AI-OCR. With two simple prompts, the tool extracted the readings and generated a time-series plot of my cumulative electricity consumption in kWh.

The plot shows a generally linear trend, a sign that my daily consumption was relatively steady. However, three outliers can be seen. These were not caused by my secret bitcoin mining rigs but instead resulted from misread digits during the OCR process. In three out of the 27 images, the model simply made a recognition error.
These glitches point to current limitations of AI-OCR, which I'll explore in more detail shortly. But first, let's take a closer look at how this prototype actually works under the hood.
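Misread digits of this kind are straightforward to flag after the fact: in a cumulative series, a single wrong reading produces an implausible jump in the day-to-day difference. A sketch with invented meter readings, one deliberately corrupted to simulate a misread digit:

```python
import pandas as pd

# Invented daily kWh readings; the fourth value simulates a misread digit
readings = pd.Series([10250.3, 10258.1, 10266.0, 10373.9, 10281.6, 10289.4])

daily = readings.diff()   # day-to-day consumption
typical = daily.median()  # robust estimate of normal daily usage
outliers = daily[(daily - typical).abs() > 3 * typical]
print(outliers)
```

Note that one corrupted reading flags two differences — the jump up to the bad value and the drop back down after it.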
Under the hood: How AI-OCR works
AI-OCR is split into two main components: a frontend and a backend. The frontend is built using Streamlit, a Python library that lets you quickly turn Python scripts into web apps with little effort. It's a popular choice for machine learning prototypes and proofs of concept, thanks to its simplicity. That said, Streamlit is not meant for production-scale applications.
This is why the main focus of this article is on the backend, which is where data extraction and visualisation take place. It's designed around two distinct processes:
- OCR (Optical Character Recognition): Recognising the numerical data in images or documents using AI.
- Data visualisation: Transforming the extracted data into insightful plots.
One of AI-OCR's strengths is its flexibility: it's model-agnostic. You aren't locked into a single Large Language Model (LLM) vendor. Both commercial and open-source models can be configured and swapped depending on the use case. Each process is powered by configurable LLMs. Besides OpenAI models such as GPT-4.1, the prototype supports (so far) quantised models in GGUF format, a binary file format that packages model weights and metadata together. These are loaded and run locally via the llama.cpp Python library.
For the OCR task, Hugging Face offers a huge selection of quantised models such as LLaVA, DeepSeek-VL, or Llama-3-Vision. For the code generation of the visualisation component, models with strong coding capabilities are ideal. Due to a lack of computational resources at home (I don't have access to a powerful GPU), I have only thoroughly tested this prototype with OpenAI models via the API.
The OCR component: Extracting the data
To turn images into insights, the relevant data must be recognised in the images, which is handled by the OCR component. The process begins when the user uploads images and submits a prompt describing which values should be recognised in the image, plus optional additional context to support the model. The output is a pandas.DataFrame containing the extracted values alongside the timestamps of the images.
The diagram below illustrates the design of the data extraction pipeline. The outer box represents the Streamlit-based frontend, while the inner section details the backend architecture, a REST API. Arrows connecting the frontend and backend represent API calls. Within the backend, each icon symbolises a distinct component.

At the core of the backend is the OCR Modelling object. When a prompt is submitted, this object receives it together with the chosen model configuration. It loads the appropriate model and accesses the uploaded images.
One particularly instructive part of this design is the way the prompt is handled. Before the actual OCR task is performed, the user's prompt is enhanced with the help of a Small Language Model (SLM). The SLM's role is to identify the specific values mentioned in the user's prompt and return them as a list. For example, in the blood pressure use case, the SLM would return:
["heart rate", "diastolic pressure", "systolic pressure"].
This information is used to automatically enhance the original user prompt. The LLM is always asked to return structured output. Thus, the prompt needs to be extended with the specific JSON output format, which for the blood pressure case reads:
{"heart rate": "value", "diastolic pressure": "value", "systolic pressure": "value"}.
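The enhancement step itself is simple string assembly once the SLM has returned the field list. A sketch of what it might look like — the function name and prompt wording are illustrative, not the project's actual code:

```python
import json

def enhance_prompt(user_prompt: str, fields: list[str]) -> str:
    """Append the required JSON output schema to the user's prompt."""
    schema = json.dumps({field: "value" for field in fields})
    return (
        f"{user_prompt}\n"
        f"Return ONLY a JSON object of the form: {schema}"
    )

enhanced = enhance_prompt(
    "Extract the readings from this blood pressure monitor photo.",
    ["heart rate", "diastolic pressure", "systolic pressure"],  # from the SLM
)
print(enhanced)
```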
Note that the SLM used here runs locally using llama.cpp. For the use cases discussed previously, I used Gemma-2 9B (in quantised GGUF format). This approach highlights how smaller, lightweight models can be used for efficient and automatic prompt optimisation.
This enhanced prompt is then sent sequentially, together with each image, to the chosen LLM. The model infers the requested values from the image. The responses are then aggregated into a pandas.DataFrame, which is finally returned to the user for viewing and downloading.
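The aggregation step can be pictured as follows: one JSON response per image, parsed and stacked into a DataFrame with the image timestamps attached. The responses here are mocked; in the prototype they come back from the LLM one per upload:

```python
import json
import pandas as pd

# Mocked per-image LLM responses in the requested JSON format
responses = [
    '{"heart rate": 72, "diastolic pressure": 88, "systolic pressure": 135}',
    '{"heart rate": 75, "diastolic pressure": 90, "systolic pressure": 138}',
]
timestamps = ["2024-05-01 08:00", "2024-05-01 08:30"]  # from image metadata

df = pd.DataFrame([json.loads(r) for r in responses])
df["timestamp"] = pd.to_datetime(timestamps)
print(df)
```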
Visualising the result
The second part of turning your images into insights is the visualisation process. Here, the numerical data extracted into the DataFrame during the OCR process is transformed into meaningful plots based on the user's request.
The user provides a prompt describing the type of visualisation they want (e.g., time series, histogram, scatter plot). The LLM then generates Python code to create the requested plot. This generated code is executed on the frontend, and the resulting visualisation is displayed directly within the frontend.
The diagram below once again illustrates this process in detail. The core of this particular process is the Plot Modelling object. It receives two key inputs:
- The user's prompt describing the desired visualisation
- The pandas.DataFrame generated by the OCR process

Before the prompt and metadata about the DataFrame are passed to the LLM, the prompt first passes through a Governance Gateway. Its job is to ensure security by preventing the generation or execution of malicious code. It is implemented as an SLM. As before, I used Gemma-2 9B (in quantised GGUF format) running locally via llama.cpp.
Specifically, the Governance Gateway first verifies via this SLM that the user's prompt contains a valid data visualisation request and doesn't include any harmful or suspicious instructions. Only if the prompt passes this initial check is it forwarded to the LLM to generate the Python plotting code. After the code is generated, it is sent back to the SLM for a second security assessment to ensure the code is safe to execute.
After passing the second security validation, the generated code is sent back to the frontend, where it is executed to generate the requested plot. This two-factor governance approach helps ensure that AI-generated code runs safely and securely, while giving the user the flexibility to generate any desired data visualisation within the Matplotlib ecosystem.
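To make the gating logic concrete, here is a toy version of the second check. The real gateway asks the local SLM whether the generated code is safe; the keyword filter below is a deliberately naive stand-in so the control flow is visible:

```python
# Naive stand-in for the SLM safety check; the real project queries Gemma-2
BANNED_TOKENS = ("os.system", "subprocess", "eval(", "__import__", "open(")

def looks_safe(code: str) -> bool:
    """Reject code containing obviously dangerous constructs."""
    return not any(token in code for token in BANNED_TOKENS)

def governed_execute(code: str, namespace: dict) -> bool:
    """Execute generated plotting code only if it passes the safety check."""
    if not looks_safe(code):
        return False
    exec(code, namespace)
    return True

ns: dict = {}
ok = governed_execute("x = 1 + 1", ns)                            # allowed
blocked = governed_execute("import os; os.system('rm -rf /')", ns)  # rejected
```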
Challenges, limits, and trade-offs
As already touched upon in the use case section, this prototype, particularly the OCR component, has notable limitations due to constraints of the underlying language models.
In this section, I want to explicitly demonstrate two scenarios (illustrated in the image below) where the tool currently struggles significantly, and why, in some cases, it might not be the optimal solution. Both scenarios require interpreting analogue data. Despite the increasing digitisation of lab equipment, this is still an important requirement for the application of such a tool in many physics labs.

On the left is an attempt to measure the length of an object (in this case, a book) using a physical ruler. On the right is an image of my car's analogue RPM meter. In both cases, I processed multiple images with the prototype: static photos for measuring the length of the book and video frames for reading the RPM meter. Despite supplying high-quality inputs and carefully crafted prompts, the resulting measurements were imprecise. While the extracted values always fell within the expected numeric range, they were consistently too far off for real-world applications.
While AI-OCR offers convenience, in some cases the overall cost might outweigh its benefits. In cases like the body weight tracker, the tool provides convenience, but at a cost in memory and token usage. Each image may be several megabytes, while the extracted data (a single float) is just a few bytes. Image analysis with LLMs can also be expensive. These trade-offs highlight the need to always align AI applications with clear business value.
Conclusion: Custom AI agents for tomorrow's lab
In this article, we explored how to build an LLM-powered prototype that transforms measurement images into structured data and insightful plots. Users can upload images and describe both the values they want recognised in the images and the type of data visualisation to be performed. The user then receives both raw values and visual interpretations.
If you have tried ChatGPT or other LLM platforms, you may have noticed: they can already do much of this and perhaps more. Simply upload an image to the chat, describe your desired data visualisation (optionally adding context), and the system (e.g. ChatGPT) figures out the rest. Under the hood, this likely relies on a system of AI agents working in concert.
That same kind of architecture is what a future version of AI-OCR could embrace. But why bother building it if one could simply use ChatGPT instead? Because of customisation and control. Unlike ChatGPT, the AI agents in AI-OCR can be tailored to your specific needs (like those of a lab assistant), and with local models, you retain full control over your data. For instance, you would very likely prefer not to upload your personal finance documents to ChatGPT.
A possible architecture for such a system of AI agents (one that ChatGPT very likely relies on as well) is illustrated in the diagram below:

At the top level, a Root Agent receives the user's input and delegates tasks via an Agent Communication Protocol (ACP). It can choose between two auxiliary agents:
- OCR Agent: Extracts relevant numerical data from images and interfaces with a Model Context Protocol (MCP) server that manages CSV data storage.
- Data Vis(ualisation) Agent: Connects to a separate MCP plot server capable of executing Python code. This server includes the Governance Gateway powered by an SLM, which ensures all code is safe and appropriate before execution.
Unlike ChatGPT, this setup can be fully tailored: from local LLMs for data privacy to system prompt tuning of the agents for niche tasks. AI-OCR is not meant to replace ChatGPT, but rather to complement it. It could evolve into an autonomous lab assistant that streamlines data extraction, plotting, and analysis in specialised environments.
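The delegation pattern at the top of that architecture can be sketched in a few lines. Agent names and routing keys here are purely illustrative, not part of any existing implementation:

```python
# Toy Root Agent: routes a task to the appropriate auxiliary agent
def ocr_agent(task: str) -> str:
    return f"[OCR Agent] extracting values for: {task}"

def data_vis_agent(task: str) -> str:
    return f"[Data Vis Agent] generating plot code for: {task}"

def root_agent(task_type: str, task: str) -> str:
    handlers = {"ocr": ocr_agent, "visualise": data_vis_agent}
    return handlers[task_type](task)

print(root_agent("ocr", "blood pressure photos"))
```

In a real system, the routing decision would itself be made by an LLM and the handlers would be separate services speaking ACP/MCP, but the control flow is the same.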
Acknowledgements
If you're curious about the future of AI-OCR or interested in exploring ideas and collaborations, feel free to connect with me on LinkedIn.
Finally, I would like to thank Oliver Sharif, Tobias Jung, Sascha Niechciol, Oisín Culhane, and Justin Mayer for their feedback and sharp proofreading. Your insights greatly improved this article.

