Building a Python Workflow That Catches Bugs Before Production

Python is one of those languages that can make you feel productive almost immediately.

That is a big part of why it’s so popular. Moving from idea to working code can be very quick. You don’t need a lot of scaffolding just to test an idea. Some input parsing, a few functions maybe, stitch them together, and very often you’ll have something useful in front of you within minutes.

The downside is that Python can also be very forgiving in places where sometimes you wish it weren’t.

It’ll quite happily let you assume a dictionary key exists when it doesn’t. It’ll let you pass around data structures with slightly different shapes until one finally breaks at runtime. It’ll let a typo survive longer than it should. And perhaps most sneakily, it’ll let the code be “correct” while still being far too slow for real-world use.

That’s why I’ve become more interested in code development workflows in general rather than in any single testing technique.

When people talk about code quality, the conversation usually goes straight to tests. Tests matter, and I use them constantly, but I don’t think they should carry the whole burden. It would be better if most mistakes were caught before the code is even run. Maybe some issues should be caught as soon as you save your code file. Others, when you commit your changes to your Git repository. And if those pass OK, perhaps you should run a series of tests to verify that the code behaves properly and performs well enough to withstand real-world contact.

In this article, I want to walk through a set of tools you can use to build a Python workflow that automates the tasks mentioned above. Not a giant enterprise setup or an elaborate DevOps platform. Just a practical, relatively simple toolchain that helps catch bugs in your code before deployment to production.

To make that concrete, I’m going to use a small but realistic example. Imagine I’m building a Python module that processes order payloads, calculates totals, and generates recent-order summaries. Here’s a deliberately rough first pass.

from datetime import datetime
import json

def normalize_order(order):
    created = datetime.fromisoformat(order["created_at"])
    return {
        "id": order["id"],
        "customer_email": order.get("customer_email"),
        "items": order["items"],
        "created_at": created,
        "discount_code": order.get("discount_code"),
    }

def calculate_total(order):
    total = 0
    discount = None

    for item in order["items"]:
        total += item["price"] * item["quantity"]

    if order.get("discount_code"):
        discount = 0.1
        total *= 0.9

    return round(total, 2)

def build_order_summary(order):
    normalized = normalize_order(order)
    total = calculate_total(order)
    return {
        "id": normalized["id"],
        "email": normalized["customer_email"].lower(),
        "created_at": normalized["created_at"].isoformat(),
        "total": total,
        "item_count": len(normalized["items"]),
    }

def recent_order_totals(orders):
    summaries = []
    for order in orders:
        summaries.append(build_order_summary(order))

    summaries.sort(key=lambda x: x["created_at"], reverse=True)
    return summaries[:10]

There’s a lot to like about code like this when you’re “moving fast and breaking things”. It’s short and readable, and probably even works on the first couple of sample inputs you try.

But there are also several bugs or design problems waiting in the wings. If customer_email is missing, for example, .lower() is called on None and raises an AttributeError. There is also an assumption that every entry in items always contains the expected keys (price and quantity). There’s an unused import and a leftover variable from what looks like an incomplete refactor. And in the final function, the entire result set is sorted even though only the ten most recent items are needed. That last point matters because we want our code to be as efficient as possible. If we only need the top ten, we should avoid fully sorting the dataset whenever possible.
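To make the first of those failures concrete, here is a minimal reproduction. It is only a sketch: the payload is a hypothetical order with no customer_email key, and the variable names are chosen for illustration.

```python
# Minimal reproduction of the missing-email crash described above.
order = {"id": "order-123"}  # hypothetical payload without customer_email

# dict.get() returns None when the key is absent, so no error is raised here.
email = order.get("customer_email")

try:
    email.lower()  # None has no .lower() method
except AttributeError as exc:
    message = str(exc)
    print(f"AttributeError: {message}")
```

The lookup itself succeeds quietly. The failure only appears one step later, when the value is used as if it were a string.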

It’s code like this where a good workflow starts paying for itself.

With that being said, let’s look at some of the tools you can use in your code development pipeline to give your code the best possible chance of being correct, maintainable and performant. All the tools I’ll discuss are free to download, install and use.

Note that some of the tools I mention are multi-purpose. For example, some of the formatting that Black does can also be done with Ruff. Often it’s just down to personal preference which ones you use.

Tool #1: Readable code with no formatting noise

The first tool I usually install is called Black. Black is a Python code formatter. Its job is very simple: it takes your source code and automatically applies a consistent style and format.

Install and use

Install it using pip or your preferred Python package manager. After that, you can run it like this:

$ black your_python_file.py

or

$ python -m black your_python_file.py

Black requires Python version 3.10 or later to run.

Using a code formatter might seem cosmetic, but I think formatters are more important than people sometimes admit. You don’t want to spend mental energy deciding how a function call should wrap, where a line break should go, or whether you have formatted a dictionary “well enough.” Your code should be consistent so you can focus on logic rather than presentation.

Suppose you have written this function in a hurry.

def build_order_summary(order):
    normalized=normalize_order(order); total=calculate_total(order)
    return {"id":normalized["id"],"email":normalized["customer_email"].lower(),"created_at":normalized["created_at"].isoformat(),"total":total,"item_count":len(normalized["items"])}

It’s messy, but Black turns that into this.

def build_order_summary(order):
    normalized = normalize_order(order)
    total = calculate_total(order)
    return {
        "id": normalized["id"],
        "email": normalized["customer_email"].lower(),
        "created_at": normalized["created_at"].isoformat(),
        "total": total,
        "item_count": len(normalized["items"]),
    }

Black hasn’t fixed any business logic here. But it has done something extremely useful: it has made the code easier to inspect. When the formatting disappears as a source of friction, any real coding problems become much easier to see.

Black is configurable in a number of ways, which you can read about in its official documentation. (Links to this and all the tools mentioned are at the end of the article.)

Tool #2: Catching the small suspicious errors

Once formatting is handled, I usually add Ruff to the pipeline. Ruff is a Python linter written in Rust. Ruff is fast, efficient and very good at what it does.

Install and use

Like Black, Ruff can be installed with any Python package manager.

$ pip install ruff

$ # And used like this
$ ruff check your_python_code.py

Linting is useful because many bugs start life as little suspicious details. Not deep logic flaws or clever edge cases. Just slightly wrong code.

For example, take this cut-down version of our sample module. It has a couple of unused imports and a variable that is assigned but never used:

from datetime import datetime
import json

def calculate_total(order):
    total = 0
    discount = 0

    for item in order["items"]:
        total += item["price"] * item["quantity"]

    if order.get("discount_code"):
        total *= 0.9

    return round(total, 2)

Ruff can catch these immediately:

$ ruff check test1.py

F401 [*] `datetime.datetime` imported but unused
 --> test1.py:1:22
  |
1 | from datetime import datetime
  |                      ^^^^^^^^
2 | import json
  |
help: Remove unused import: `datetime.datetime`

F401 [*] `json` imported but unused
 --> test1.py:2:8
  |
1 | from datetime import datetime
2 | import json
  |        ^^^^
3 |
4 | def calculate_total(order):
  |
help: Remove unused import: `json`

F841 Local variable `discount` is assigned to but never used
 --> test1.py:6:5
  |
4 | def calculate_total(order):
5 |     total = 0
6 |     discount = 0
  |     ^^^^^^^^
7 |
8 |     for item in order["items"]:
  |
help: Remove assignment to unused variable `discount`

Found 3 errors.
[*] 2 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).

Tool #3: Python starts feeling a lot safer

Formatting and linting help, but neither really addresses the source of much of the trouble in Python: assumptions about data.

That’s where mypy comes in. Mypy is a static type checker for Python.

Install and use

Install it with pip, then run it like this:

$ pip install mypy

$ # To run use this

$ mypy test3.py

Mypy will run a type check on your code (without actually executing it). This is an important step because many Python bugs are really data-shape bugs. You assume a field exists. You assume a value is a string, or that a function returns one thing when in reality it sometimes returns another.
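As a small illustration of the “sometimes returns another” case, here is a sketch of a lookup that may return None and a caller that deals with it explicitly. The DISCOUNT_RATES table and both function names are hypothetical, invented for this example rather than taken from the order module.

```python
from typing import Optional

# Hypothetical lookup table of discount codes, for illustration only.
DISCOUNT_RATES = {"SAVE10": 0.10}


def discount_rate(code: str) -> Optional[float]:
    """Return the rate for a known code, or None for an unknown one."""
    return DISCOUNT_RATES.get(code)


def apply_discount(total: float, code: str) -> float:
    """Apply the discount when the code is known, otherwise leave total as is."""
    rate = discount_rate(code)
    if rate is None:  # handle the "no such code" case before doing arithmetic
        return total
    return round(total * (1 - rate), 2)


print(apply_discount(45.0, "SAVE10"))
print(apply_discount(45.0, "UNKNOWN"))
```

Because the return type says Optional[float], a type checker has what it needs to question any caller that multiplies by the result without first checking for None.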

To see it in action, let’s add some types to our order example.

from datetime import datetime
from typing import NotRequired, TypedDict

class Item(TypedDict):
    price: float
    quantity: int

class RawOrder(TypedDict):
    id: str
    items: list[Item]
    created_at: str
    customer_email: NotRequired[str]
    discount_code: NotRequired[str]

class NormalizedOrder(TypedDict):
    id: str
    customer_email: str | None
    items: list[Item]
    created_at: datetime
    discount_code: str | None

class OrderSummary(TypedDict):
    id: str
    email: str
    created_at: str
    total: float
    item_count: int

Now we can annotate our functions.

def normalize_order(order: RawOrder) -> NormalizedOrder:
    return {
        "id": order["id"],
        "customer_email": order.get("customer_email"),
        "items": order["items"],
        "created_at": datetime.fromisoformat(order["created_at"]),
        "discount_code": order.get("discount_code"),
    }

def calculate_total(order: RawOrder) -> float:
    total = 0.0

    for item in order["items"]:
        total += item["price"] * item["quantity"]

    if order.get("discount_code"):
        total *= 0.9

    return round(total, 2)

def build_order_summary(order: RawOrder) -> OrderSummary:
    normalized = normalize_order(order)
    total = calculate_total(order)

    return {
        "id": normalized["id"],
        "email": normalized["customer_email"].lower(),
        "created_at": normalized["created_at"].isoformat(),
        "total": total,
        "item_count": len(normalized["items"]),
    }

Now the bug is much harder to hide. For example,

$ mypy test3.py
test3.py:36: error: Item "None" of "str | None" has no attribute "lower"  [union-attr]
Found 1 error in 1 file (checked 1 source file)

customer_email comes from order.get("customer_email"), which means it may be missing and therefore evaluates to None. Mypy tracks that as str | None, and correctly rejects calling .lower() on it without first handling the None case.

It may seem a simple thing, but I think it’s a big win. Mypy forces you to be more honest about the shape of the data that you’re actually handling. It turns vague runtime surprises into early, clearer feedback.
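One way to satisfy that check is to handle the None case before calling .lower(). The sketch below is an assumption rather than the only reasonable fix: the helper name summary_email is hypothetical, the TypedDict is trimmed to the fields that matter here, and returning an empty string for a missing email is a design choice (raising a clear error would be another).

```python
from datetime import datetime
from typing import Optional, TypedDict


class NormalizedOrder(TypedDict):
    # Trimmed to the fields this sketch needs.
    id: str
    customer_email: Optional[str]
    created_at: datetime


def summary_email(normalized: NormalizedOrder) -> str:
    """Return the lowercased email, or "" when the order has none."""
    email = normalized["customer_email"]
    if email is None:  # after this check the remaining type is plain str
        return ""
    return email.lower()


# Hypothetical sample data for illustration.
with_email: NormalizedOrder = {
    "id": "order-1",
    "customer_email": "Customer@Example.com",
    "created_at": datetime(2025, 1, 15, 10, 30),
}
without_email: NormalizedOrder = {
    "id": "order-2",
    "customer_email": None,
    "created_at": datetime(2025, 1, 15, 10, 30),
}
print(summary_email(with_email), repr(summary_email(without_email)))
```

Inside build_order_summary, the "email" entry could then be built with a helper like this instead of calling .lower() directly on a value that may be None.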

Tool #4: Testing, testing 1..2..3

At the start of this article, we identified three problems in our order-processing code: a crash when customer_email is missing, unchecked assumptions about item keys, and an inefficient sort, which we’ll return to later. Black, Ruff and Mypy have already helped us address the first two structurally. But tools that analyse code statically can only go so far. At some point, you need to verify that the code actually behaves correctly when it runs. That’s what pytest is for.

Install and use

$ pip install pytest
$
$ # run it with
$ pytest your_test_file.py

Pytest has a great deal of functionality, but its simplest and most useful feature is also its most direct: the assert statement. If the condition you assert is false, the test fails. That’s it. No elaborate framework to learn before you can write something useful.

Assuming we now have a version of the code that handles missing emails gracefully, along with a sample base_order, here is a test that protects the discount logic:

import pytest

@pytest.fixture
def base_order():
    return {
        "id": "order-123",
        "customer_email": "customer@example.com",
        "created_at": "2025-01-15T10:30:00",
        "items": [
            {"price": 20, "quantity": 2},
            {"price": 5, "quantity": 1},
        ],
    }

def test_calculate_total_applies_10_percent_discount(base_order):
    base_order["discount_code"] = "SAVE10"

    total = calculate_total(base_order)

    subtotal = (20 * 2) + (5 * 1)
    expected = subtotal * 0.9

    assert total == expected
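The exact comparison above works because these particular numbers happen to be representable cleanly. With fractional prices it is usually safer to compare within a tolerance. The following is a self-contained sketch of that idea: calculate_total is repeated so the block runs on its own, and the prices and tolerance are assumptions picked for illustration.

```python
import math


def calculate_total(order):
    """Sum price * quantity and apply a flat 10% discount if a code is present."""
    total = 0.0
    for item in order["items"]:
        total += item["price"] * item["quantity"]
    if order.get("discount_code"):
        total *= 0.9
    return round(total, 2)


def test_discounted_total_with_fractional_prices():
    order = {
        "items": [{"price": 19.99, "quantity": 3}],
        "discount_code": "SAVE10",
    }
    # 19.99 * 3 * 0.9 is 53.973 before rounding to two decimal places.
    assert math.isclose(calculate_total(order), 53.97, abs_tol=1e-9)
```

Pytest collects any function whose name starts with test_, so this needs no extra setup. Pytest also ships pytest.approx, which serves the same purpose with a more compact syntax.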

And here are the tests that protect the email handling, specifically the crash we flagged at the start, where calling .lower() on a missing email would bring the whole function down:

def test_build_order_summary_returns_valid_email(base_order):
    summary = build_order_summary(base_order)

    assert "email" in summary
    assert summary["email"].endswith("@example.com")

def test_build_order_summary_when_email_missing(base_order):
    base_order.pop("customer_email")

    summary = build_order_summary(base_order)

    assert summary["email"] == ""

That second test is important too. Without it, a missing email is a silent assumption: code that works fine in development and then throws an AttributeError the first time a real order comes in without that field. With it, the assumption is explicit and checked every time the test suite runs.

This is the division of labour worth keeping in mind. Ruff catches unused imports and dead variables. Mypy catches bad assumptions about data types. Pytest catches something different: it protects behaviour. When you change the way build_order_summary handles missing fields, or refactor calculate_total, pytest is what tells you whether you’ve broken something that was previously working. That’s a different kind of safety net, and it operates at a different level from everything that came before it.
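The same approach can cover the item-key assumption mentioned at the start of this section. Type hints describe what the data should look like, but a payload arriving at runtime can still be incomplete. The sketch below assumes a hypothetical validate_items helper and takes the required keys from the examples in this article. It is one possible shape for such a check, not part of the original module.

```python
# Keys every item is expected to carry, assumed from the examples above.
REQUIRED_ITEM_KEYS = ("price", "quantity")


def validate_items(order):
    """Raise ValueError naming the first item that lacks a required key."""
    for index, item in enumerate(order["items"]):
        missing = [key for key in REQUIRED_ITEM_KEYS if key not in item]
        if missing:
            raise ValueError(f"item {index} is missing {', '.join(missing)}")


def test_validate_items_rejects_missing_quantity():
    order = {"items": [{"price": 20, "quantity": 2}, {"price": 5}]}
    try:
        validate_items(order)
    except ValueError as exc:
        # The message should point at the second item and the absent key.
        assert "item 1" in str(exc)
        assert "quantity" in str(exc)
    else:
        raise AssertionError("expected ValueError for the incomplete item")
```

In a real test suite, pytest.raises is the more usual way to express the expected exception. The plain try/except here keeps the sketch free of imports.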

Tool #5: Because your memory isn’t a reliable quality-control system

Even with a good toolchain, there’s still one obvious weakness: you can forget to run it. That’s where a tool like pre-commit comes into its own. Pre-commit is a framework for managing and maintaining multi-language Git hooks, such as those that run when you commit code or push it to a remote repository such as GitHub.

Install and use

The standard setup is to pip install it, add a .pre-commit-config.yaml file, and run pre-commit install so the hooks run automatically before each commit to your source control system, e.g., Git.

A simple config might look like this:

repos:
  - repo: https://github.com/psf/black
    rev: 24.10.0
    hooks:
      - id: black

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.11.13
    hooks:
      - id: ruff
      - id: ruff-format

  - repo: local
    hooks:
      - id: mypy
        name: mypy
        entry: mypy
        language: system
        types: [python]
        stages: [pre-push]

      - id: pytest
        name: pytest
        entry: pytest
        language: system
        pass_filenames: false
        stages: [pre-push]

Now install the hooks with:

$ pre-commit install

pre-commit installed at .git/hooks/pre-commit

$ pre-commit install --hook-type pre-push

pre-commit installed at .git/hooks/pre-push

From that point on, the checks run automatically whenever you commit or push your changes.

git commit → triggers black, ruff, ruff-format
git push → triggers mypy and pytest

Here’s an example.

Let’s say we have the following Python code in file test1.py

from datetime import datetime
import json


def calculate_total(order):
    total = 0
    discount = 0

    for item in order["items"]:
        total += item["price"] * item["quantity"]

    if order.get("discount_code"):
        total *= 0.9

    return round(total, 2)

Create a file called .pre-commit-config.yaml with the YAML code from above. Now, if test1.py is being tracked by git, here’s the kind of output to expect when you commit it.

$ git commit test1.py

[INFO] Initializing environment for https://github.com/psf/black.
[INFO] Initializing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted test1.py

All done! ✨ 🍰 ✨
1 file reformatted.

ruff (legacy alias)......................................................Failed
- hook id: ruff
- exit code: 1

test1.py:1:22: F401 [*] `datetime.datetime` imported but unused
  |
1 | from datetime import datetime
  |                      ^^^^^^^^ F401
2 | import json
  |
  = help: Remove unused import: `datetime.datetime`

test1.py:2:8: F401 [*] `json` imported but unused
  |
1 | from datetime import datetime
2 | import json
  |        ^^^^ F401
  |
  = help: Remove unused import: `json`

test1.py:7:5: F841 Local variable `discount` is assigned to but never used
  |
5 | def calculate_total(order):
6 |     total = 0
7 |     discount = 0
  |     ^^^^^^^^ F841
8 |
9 |     for item in order["items"]:
  |
  = help: Remove assignment to unused variable `discount`

Found 3 errors.
[*] 2 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).

Tool #6: Because “correct” code can still be broken

There is one final category of problems that I think gets underestimated when developing code: performance. A function can be logically correct and still be wrong in practice if it’s too slow or too memory-hungry.

A profiling tool I like for this is called py-spy. Py-spy is a sampling profiler for Python programs. It can profile Python without restarting the process or modifying the code. This tool is different from the others we’ve discussed, as you typically wouldn’t use it in an automated pipeline. Instead, it is more of a one-off process to be run against code that has already been formatted, linted, type checked and tested.

Install and use

$ pip install py-spy

Now let’s revisit the “top ten” example. Here is the original function again:

def recent_order_totals(orders):
    summaries = []
    for order in orders:
        summaries.append(build_order_summary(order))

    summaries.sort(key=lambda x: x["created_at"], reverse=True)
    return summaries[:10]

If all I have is an unsorted collection in memory, then yes, I still need some ordering logic to know which ten are the most recent. The point is not to avoid ordering entirely, but to avoid doing a full sort of the entire dataset if I only need the top ten. A profiler helps you get to that more precise level.

There are many different commands you can run to profile your code using py-spy. Perhaps the simplest is:

$ py-spy top -- python test3.py

Collecting samples from 'python test3.py' (python v3.11.13)
Total Samples 100
GIL: 22.22%, Active: 51.11%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 16.67%  16.67%   0.160s    0.160s   _path_stat ()
 13.33%  13.33%   0.120s    0.120s   get_data ()
  7.78%   7.78%   0.070s    0.070s   _compile_bytecode ()
  5.56%   6.67%   0.060s    0.070s   _init_module_attrs ()
  2.22%   2.22%   0.020s    0.020s   _classify_pyc ()
  1.11%   1.11%   0.010s    0.010s   _check_name_wrapper ()
  1.11%  51.11%   0.010s    0.490s   _load_unlocked ()
  1.11%   1.11%   0.010s    0.010s   cache_from_source ()
  1.11%   1.11%   0.010s    0.010s   _parse_sub (re/_parser.py)
  1.11%   1.11%   0.010s    0.010s    (importlib/metadata/_collections.py)
  0.00%  51.11%   0.010s    0.490s   _find_and_load ()
  0.00%   4.44%   0.000s    0.040s    (pygments/formatters/__init__.py)
  0.00%   1.11%   0.000s    0.010s   _parse (re/_parser.py)
  0.00%   0.00%   0.000s    0.010s   _path_importer_cache ()
  0.00%   4.44%   0.000s    0.040s    (pygments/formatter.py)
  0.00%   1.11%   0.000s    0.010s   compile (re/_compiler.py)
  0.00%  50.00%   0.000s    0.470s    (_pytest/_code/code.py)
  0.00%  27.78%   0.000s    0.250s   get_code ()
  0.00%   1.11%   0.000s    0.010s    (importlib/metadata/_adapters.py)
  0.00%   1.11%   0.000s    0.010s    (email/charset.py)
  0.00%  51.11%   0.000s    0.490s    (pytest/__init__.py)
  0.00%  13.33%   0.000s    0.130s   _find_spec ()

Press Control-C to quit, or ? for help.

top gives you a live view of which functions are consuming the most time, which makes it the quickest way to get oriented before doing anything more detailed.

Once we realise there may be an issue, we can consider alternative implementations of our code. In our example case, one option would be to use heapq.nlargest in our function:

from datetime import datetime
from heapq import nlargest

def recent_order_totals(orders):
    return nlargest(
        10,
        (build_order_summary(order) for order in orders),
        key=lambda x: datetime.fromisoformat(x["created_at"]),
    )

The new code still performs comparisons, but it avoids fully sorting every summary just to discard almost all of them. In my tests on large inputs, the version using heapq was 2–3 times faster than the original function. And in a real system, the best optimisation is often not to solve this in Python at all. If the data comes from a database, I would usually prefer to ask the database for the ten most recent rows directly.

The reason I bring this up is that performance advice gets vague very quickly. “Make it faster” isn’t useful. “Avoid sorting everything when I only need ten results” is useful. A profiler helps you get to that more precise level.
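If you want to check a claim like this on your own machine, a rough timing comparison is easy to set up. The sketch below is an assumption-laden stand-in rather than the benchmark behind the figures above: it uses synthetic rows with a random numeric created_at instead of real order summaries, and the function names, row count and repeat count are arbitrary.

```python
import random
import timeit
from heapq import nlargest

random.seed(0)
# Synthetic stand-in for order summaries: only a sortable created_at field.
summaries = [{"created_at": random.random()} for _ in range(100_000)]


def top_ten_by_sort(rows):
    """Sort everything, then keep the first ten."""
    return sorted(rows, key=lambda row: row["created_at"], reverse=True)[:10]


def top_ten_by_heap(rows):
    """Track only the ten largest while scanning the rows once."""
    return nlargest(10, rows, key=lambda row: row["created_at"])


# Both approaches should agree before their timings are worth comparing.
assert top_ten_by_sort(summaries) == top_ten_by_heap(summaries)

sort_seconds = timeit.timeit(lambda: top_ten_by_sort(summaries), number=3)
heap_seconds = timeit.timeit(lambda: top_ten_by_heap(summaries), number=3)
print(f"full sort: {sort_seconds:.3f}s  nlargest: {heap_seconds:.3f}s")
```

The numbers will vary with data size, key function and Python version, so treat any single run as a hint rather than a result.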

Resources

Here are the official GitHub links for each tool:

+------------+---------------------------------------------+
| Tool       | Official page                               |
+------------+---------------------------------------------+
| Ruff       | https://github.com/astral-sh/ruff           |
| Black      | https://github.com/psf/black                |
| mypy       | https://github.com/python/mypy              |
| pytest     | https://github.com/pytest-dev/pytest        |
| pre-commit | https://github.com/pre-commit/pre-commit    |
| py-spy     | https://github.com/benfred/py-spy           |
+------------+---------------------------------------------+

Note also that many modern IDEs, such as VSCode and PyCharm, have plugins for these tools that provide feedback as you type, making them even more useful.

Summary

Python’s greatest strength, the speed at which you can go from idea to working code, is also the thing that makes disciplined tooling worth investing in. The language won’t stop you from making assumptions about data shapes, leaving dead code around, or writing a function that works perfectly on your test input but falls over in production. That’s not a criticism of Python. It’s just the trade-off you’re making.

The tools in this article help recover some of that safety without sacrificing speed.

Black handles formatting so you never have to think about it again. Ruff catches the small suspicious details, such as unused imports and assigned-but-ignored variables, before they quietly survive into a release. Mypy forces you to be honest about the shape of the data you’re actually passing around, turning vague runtime crashes into early, specific feedback. Pytest protects behaviour so that when you change something, you know immediately what you broke. Pre-commit makes all of this automatic, removing the single biggest weakness in any manual process: remembering to run it.

Py-spy sits slightly apart from the others. You don’t run it on every commit. You reach for it when something correct is still too slow, when you need to move from “make it faster” to something precise enough to actually act on.

None of these tools is a substitute for thinking carefully about your code. What they do is give mistakes fewer places to hide. And in a language as permissive as Python, that’s worth a lot.

Note that there are several tools that can replace any one of those mentioned above, so if you have a favourite linter that’s not Ruff, for example, feel free to use it in your workflow instead.
