Multi-object tracking (MOT) is a task in which an algorithm must detect and track multiple objects in a video. Most well-known algorithms are based on simple detectors (e.g., YOLO) designed for processing individual images. The general strategy consists of running a detector separately on consecutive video frames and then matching the bounding boxes across frames that belong to the same objects.
The core part that distinguishes MOT algorithms is how they perform matching between video frames. They can take several factors into account to perform matching:
- bounding box positions;
- occlusions (when bounding boxes of several objects intersect with each other);
- object motion;
- physical object similarity.
In some cases, for every pair of bounding boxes on consecutive frames A and B, these characteristics are combined into a single number that describes the likelihood that a pair of bounding boxes detected in frames A and B belongs to the same object.
These values are calculated for all pairs of bounding boxes in frames A and B. Then, the MOT algorithm attempts to identify the best possible matching between all bounding boxes. More concretely, given n detected bounding boxes in both frames A and B, the goal is to map every bounding box in A to a bounding box in B in such a way that every bounding box is used only once.
Hungarian algorithm
The Hungarian algorithm is usually studied in algorithms and data structures courses. However, it also has applications in matching problems and, in particular, is frequently used to solve the tracklet matching problem mentioned above.
We’re going to study the workflow of the Hungarian algorithm in a simpler setting. Once we’ve understood how it works, we can easily apply it to MOT problems as well.
The algorithm is called Hungarian because it “was largely based on the earlier works of two Hungarian mathematicians, Dénes Kőnig and Jenő Egerváry” (Hungarian Algorithm | Wikipedia).
Formulation
There are many examples that demonstrate the Hungarian algorithm. I like the one with workers and tasks. Here is the formulation:
There are n workers available, and n tasks must be completed by them. The salary every worker receives for every task is known. For the company director, the problem consists of optimally assigning tasks to workers under the following conditions:
- every worker gets assigned exactly one task;
- all tasks get completed;
- the money spent on salaries is minimized.
We’re going to solve this problem by using the following 4 x 4 cost matrix as an example:

Geometrically, given the matrix above, the objective consists of choosing n matrix elements such that no two chosen elements share a row or a column, and the total sum of the chosen elements is minimal.
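To make the objective concrete, here is a minimal brute-force sketch that simply tries every possible assignment. The cost matrix is a hypothetical one chosen for illustration (it is not the matrix from the figure), and the function name is ours. Trying all n! permutations is only feasible for tiny matrices, which is exactly why a smarter method is needed.

```python
from itertools import permutations

# Hypothetical 4 x 4 cost matrix (NOT the one from the article's figure):
# cost[i][j] is the salary worker i receives for task j.
cost = [
    [5, 8, 8, 9],
    [4, 9, 7, 10],
    [7, 3, 4, 6],
    [11, 8, 13, 6],
]


def brute_force_assignment(cost):
    """Try every one-to-one assignment and return the cheapest one found."""
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        # perm[i] is the task given to worker i, so every row and
        # every column contributes exactly one element to the sum.
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


total, assignment = brute_force_assignment(cost)
print(total, assignment)
```

For this matrix the cheapest total is 21. Two different assignments reach it, so the optimal assignment is not always unique.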
Concept
The Hungarian method involves transforming the initial cost matrix into a new form that facilitates the solution search. For this, we will use several matrix transformations. Although the matrix will be modified, the problem will always remain equivalent, meaning that the optimal assignment will still be the same.
To keep things simple, we aren’t going to prove mathematically why each matrix transformation keeps the problem invariant. Instead, we will provide some logical reasoning to explain why the solution stays the same.
Example
1. Row & column reduction
The first step consists of identifying the minimum element in every row of the matrix and subtracting it from every element of that row. The idea here is to get at least one zero in every row. In practice, having more zeros simplifies the problem.
Suppose that some number m is subtracted from a given row. While the objective value (the total minimized salary) changes during the transformation (it decreases by m), the relative cost between the assignments for the same worker remains unchanged. Therefore, the ranking of solutions remains unchanged.
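This argument can be written down in one line. The notation is introduced here for illustration only: c is the cost matrix, c' is the matrix after subtracting m from row r, and σ is any assignment that gives worker i the task σ(i).

```latex
\sum_{i=1}^{n} c'_{i,\sigma(i)}
  = \sum_{i \neq r} c_{i,\sigma(i)} + \bigl(c_{r,\sigma(r)} - m\bigr)
  = \sum_{i=1}^{n} c_{i,\sigma(i)} - m
```

Every assignment uses exactly one element of row r, so every total drops by the same m and the cheapest assignment stays the cheapest.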
The analogous procedure is then performed on columns: the minimum element in every column is subtracted from every element of that column.

After the first two transformations, we obtain a matrix with some zeros representing potential assignments.
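The two reductions can be sketched in a few lines of plain Python. The cost matrix below is a hypothetical example (not the one from the figure), and the function name is ours.

```python
def reduce_rows_and_columns(cost):
    """Subtract each row's minimum, then each column's minimum."""
    n = len(cost)
    # Row reduction: afterwards every row contains at least one zero.
    reduced = []
    for row in cost:
        row_min = min(row)
        reduced.append([value - row_min for value in row])
    # Column reduction: afterwards every column contains at least one zero.
    col_mins = [min(reduced[i][j] for i in range(n)) for j in range(n)]
    return [[reduced[i][j] - col_mins[j] for j in range(n)] for i in range(n)]


# Hypothetical 4 x 4 cost matrix (NOT the one from the article's figure).
cost = [
    [5, 8, 8, 9],
    [4, 9, 7, 10],
    [7, 3, 4, 6],
    [11, 8, 13, 6],
]

reduced = reduce_rows_and_columns(cost)
for row in reduced:
    print(row)
```

In this example the row minima are 5, 4, 3, and 6, and only the third column still lacks a zero after row reduction, so the column step subtracts 1 from it.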
The next step consists of drawing the minimum number of horizontal and vertical lines in such a way that they pass through all zeros in the matrix. In the image below, we can draw k = 3 lines in total to cover all zeros.

If the number of drawn lines equals the dimension n of the matrix, then we’ve found a solution. The only step left in such a case is to choose n zeros such that no zero is on the same horizontal or vertical line as another zero.
Since, in our example, n ≠ k (the matrix dimension n = 4; the number of drawn lines k = 3), we must perform an adjustment step.
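For small matrices, the minimum number of covering lines can be found by simply trying all combinations of rows and columns, starting from the smallest. The sketch below takes that brute-force route as an assumption made for readability; practical implementations use a more efficient procedure. The input is a hypothetical reduced matrix, and the function name is ours.

```python
from itertools import combinations


def min_line_cover(matrix):
    """Return (rows, cols) of a smallest set of lines covering every zero."""
    n = len(matrix)
    zeros = [(i, j) for i in range(n) for j in range(n) if matrix[i][j] == 0]
    # Lines 0..n-1 stand for rows, lines n..2n-1 stand for columns.
    for k in range(2 * n + 1):
        for lines in combinations(range(2 * n), k):
            rows = {line for line in lines if line < n}
            cols = {line - n for line in lines if line >= n}
            if all(i in rows or j in cols for i, j in zeros):
                return rows, cols


# Hypothetical reduced matrix (every row and column already contains a zero).
reduced = [
    [0, 3, 2, 4],
    [0, 5, 2, 6],
    [4, 0, 0, 3],
    [5, 2, 6, 0],
]

rows, cols = min_line_cover(reduced)
k = len(rows) + len(cols)
print(k, rows, cols)
```

Here three lines are enough while n = 4, so this hypothetical matrix would also require an adjustment step.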
2. Adjustment step
So far, we’ve drawn several lines, and we can classify the matrix elements into three categories:
- Uncovered elements;
- Covered elements (only once);
- Corner elements (elements that are covered twice: horizontally and vertically).
The idea of the adjustment step consists of identifying the minimum element among uncovered elements and subtracting it from all uncovered elements. At the same time, this value gets added to all corner elements.
In our example, the minimum uncovered element is 2. Consequently, we subtract 2 from all uncovered elements (in red) and add 2 to all corner elements (in green).

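Although it might not be obvious why the problem stays equivalent after the adjustment step, it can be proven mathematically. The adjustment is equivalent to subtracting the minimum uncovered value from every uncovered row and adding it to every covered column: uncovered elements decrease, elements covered once are unchanged, and elements covered twice increase. Since subtracting a constant from a row or adding it to a column shifts the cost of every assignment by the same amount, the optimal solution stays the same.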
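A single adjustment step can be sketched as follows. The reduced matrix and the covering lines (rows 2 and 3, column 0, using zero-based indices) are hypothetical values chosen for illustration, and the function name is ours.

```python
def adjust(matrix, covered_rows, covered_cols):
    """Perform one adjustment step and return (minimum, new matrix)."""
    n = len(matrix)
    # Smallest element that no line passes through.
    m = min(
        matrix[i][j]
        for i in range(n)
        for j in range(n)
        if i not in covered_rows and j not in covered_cols
    )
    adjusted = [row[:] for row in matrix]  # leave the input untouched
    for i in range(n):
        for j in range(n):
            row_covered = i in covered_rows
            col_covered = j in covered_cols
            if not row_covered and not col_covered:
                adjusted[i][j] -= m  # uncovered element
            elif row_covered and col_covered:
                adjusted[i][j] += m  # corner element (covered twice)
    return m, adjusted


# Hypothetical reduced matrix whose zeros are covered by 3 lines.
reduced = [
    [0, 3, 2, 4],
    [0, 5, 2, 6],
    [4, 0, 0, 3],
    [5, 2, 6, 0],
]

m, adjusted = adjust(reduced, covered_rows={2, 3}, covered_cols={0})
print(m)
for row in adjusted:
    print(row)
```

In this example the minimum uncovered element is also 2, and the step creates new zeros in the first two rows.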
This transformation was a single iteration of the adjustment step, which led us to another equivalent matrix form. As before, we check whether the minimum number of lines needed to cover all zeros has now reached n.

As we can see, this time we indeed need to draw k = n = 4 lines to cover all zeros. It means that we can finally retrieve our solution!
Otherwise, if fewer than 4 lines had been enough to cover all zeros (k < 4), we would have had to repeat the adjustment step until the number of lines reached k = 4.
3. Solution retrieval
The only step left is to find n zeros on different vertical and horizontal lines. If doing it manually, it’s better to start with lines that have fewer zeros.
After finding the positions of the zeros in the matrix, we can switch back to the original matrix and choose the initial elements at the same positions as the found zeros. And that will be the final assignment!
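Solution retrieval can be sketched with a small backtracking search that picks one zero per row without reusing a column. Both matrices below are hypothetical (the second is assumed to be the final transformed form of the first), and the function name is ours. The search goes through rows in order rather than starting with the lines that have the fewest zeros, which is fine for a matrix this small.

```python
def find_independent_zeros(matrix, row=0, used_cols=frozenset()):
    """Pick one zero per row so that no column is used twice (backtracking)."""
    n = len(matrix)
    if row == n:
        return []
    for col in range(n):
        if matrix[row][col] == 0 and col not in used_cols:
            rest = find_independent_zeros(matrix, row + 1, used_cols | {col})
            if rest is not None:
                return [(row, col)] + rest
    return None  # no valid choice for this row: backtrack


# Hypothetical original cost matrix and its final transformed form.
cost = [
    [5, 8, 8, 9],
    [4, 9, 7, 10],
    [7, 3, 4, 6],
    [11, 8, 13, 6],
]
final = [
    [0, 1, 0, 2],
    [0, 3, 0, 4],
    [6, 0, 0, 3],
    [7, 2, 6, 0],
]

zeros = find_independent_zeros(final)
# Read the real salaries from the ORIGINAL matrix at the zero positions.
total = sum(cost[i][j] for i, j in zeros)
print(zeros, total)
```

The total is read from the original costs at the chosen positions; for this hypothetical matrix it equals 21.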

The complexity of the Hungarian algorithm is O(n³), where n is the matrix dimension.
The assignment problem we’ve just seen can also be solved by linear programming methods.
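For reference, the standard linear programming formulation looks as follows. The symbols are introduced here for illustration: c_ij is the cost of giving task j to worker i, and x_ij = 1 means that worker i gets task j.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \dots, n \\
& \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \dots, n \\
& x_{ij} \ge 0
\end{aligned}
```

Although the variables are only required to be non-negative, this particular linear program is known to have an optimal solution in which every x_ij is either 0 or 1, so it describes a valid assignment.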
Applications
One of the most obvious applications of the Hungarian algorithm consists of using it for assignment problems, where, given n tasks, the goal is to optimally associate them with people or objects (e.g., machines) that will complete them.
There’s also a specific application in computer vision. Many video tracking (MOT) algorithms are based on the combination of standard image detection algorithms (e.g., YOLO) and logic that merges detection results from multiple independent frames into a single video stream.
Let us take a simplified example of two consecutive image frames of a video:

We run a MOT algorithm for object tracking based on YOLO. The predictions of YOLO are shown below in gray boxes.

The objective is to associate bounding boxes between both frames to keep track of objects. One possible way to do this involves analyzing how far bounding boxes move between the two frames. It’s logical to assume that a pair of bounding boxes belongs to the same object if their positions don’t change much between the two frames.
This is where the Hungarian algorithm comes into play. We can construct a matrix of pairwise distances (which serve as the costs) between the coordinates of the bounding boxes in the two frames, and then find the mapping that minimizes the total distance between matched bounding boxes.

By running the Hungarian algorithm, we get the following mappings:
(A₁, C₂), (B₁, A₂), (C₁, B₂).
We can verify that they result in a total cost of 8 + 1 + 5 = 14, which is the minimum possible total cost for this matrix. Hence, the found mappings are optimal.
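The same pipeline can be sketched in code. Everything below is an assumption made for illustration: the box coordinates are made up (they are not taken from the figures), the cost is the Euclidean distance between box centers, and a brute-force search over permutations stands in for the Hungarian algorithm because there are only three boxes.

```python
from itertools import permutations
from math import hypot

# Hypothetical detections as (x_min, y_min, x_max, y_max).
# The detector returns boxes in an arbitrary order in each frame.
frame_1 = {"A1": (10, 10, 30, 50), "B1": (100, 20, 140, 60), "C1": (200, 100, 240, 160)}
frame_2 = {"A2": (104, 23, 144, 63), "B2": (206, 108, 246, 168), "C2": (13, 14, 33, 54)}


def center(box):
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def distance(box_a, box_b):
    """Euclidean distance between the centers of two boxes."""
    (xa, ya), (xb, yb) = center(box_a), center(box_b)
    return hypot(xa - xb, ya - yb)


names_1, names_2 = list(frame_1), list(frame_2)
# cost[i][j] is the distance between box i of frame 1 and box j of frame 2.
cost = [[distance(frame_1[a], frame_2[b]) for b in names_2] for a in names_1]

# Brute-force stand-in for the Hungarian algorithm (fine for 3 boxes).
best = min(
    permutations(range(len(names_2))),
    key=lambda perm: sum(cost[i][perm[i]] for i in range(len(names_1))),
)
matches = {names_1[i]: names_2[j] for i, j in enumerate(best)}
total = sum(cost[i][j] for i, j in enumerate(best))
print(matches, total)
```

With these made-up boxes, every object moves only a few pixels, so each box in the first frame gets matched to the nearest box in the second frame.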

Certainly, modern MOT algorithms consider additional factors when matching bounding boxes, including trajectory analysis, object speed and direction, and physical similarity, among others. For simplicity, we only considered a single factor: the distance between the bounding boxes.
Conclusion
In this article, we’ve looked at the Hungarian algorithm used to solve task assignment problems. By performing simple operations on the initial data matrix, the Hungarian algorithm transforms it into other forms while keeping the problem invariant.
Despite its cubic complexity, the Hungarian algorithm has a wide range of applications in matching and computer vision problems where the number of objects is not too large.
Sources
All images unless otherwise noted are by the author.

