    YOLOv1 Loss Function Walkthrough: Regression for All



    In my previous article I explained how YOLOv1 works and how you can build the architecture from scratch with PyTorch. In today's article, I'm going to focus on the loss function used to train the model. I highly recommend you read my earlier YOLOv1 article before this one, since it covers several fundamentals you need to know. Click on the link at reference number [1] to get there.

    What’s a Loss Perform?

    I believe we all already know that the loss function is an extremely important component in deep learning (and machine learning in general), where it is used to evaluate how good our model is at predicting the ground truth. Generally speaking, a loss function takes two inputs, namely the target and the prediction made by the model. The function returns a large value whenever the prediction is far from the ground truth. Conversely, the loss value will be small whenever the model successfully produces a prediction close to the target.

    Usually, a model is used for either classification or regression only. However, YOLOv1 is a bit special: it combines a classification task (classifying the detected objects) with a regression task, since the objects are enclosed in bounding boxes whose coordinates and sizes are continuous numbers. We typically use cross-entropy loss for a classification task, and for regression we can use something like MAE, MSE, SSE, or RMSE. But since the prediction made by YOLOv1 includes both classification and regression at once, we need to create a custom loss function that accommodates both tasks. And here's where things start to get interesting.
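
    To make the contrast concrete, here is a minimal sketch of the two loss families (my own illustration, not from the original article's code):

    import torch
    import torch.nn as nn
    
    ce  = nn.CrossEntropyLoss()          # typical choice for classification
    mse = nn.MSELoss()                   # typical choice for regression
    
    logits = torch.randn(4, 20)          # 4 samples, 20 class scores each
    labels = torch.randint(0, 20, (4,))
    print(ce(logits, labels))            # classification error
    
    pred_wh = torch.rand(4, 2)           # predicted box width/height
    true_wh = torch.rand(4, 2)           # ground-truth width/height
    print(mse(pred_wh, true_wh))         # regression error

    YOLOv1's loss function has to merge both kinds of error into a single number.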


    Breaking Down the Components

    Now let’s take a look on the loss perform itself. Under is what it appears like in keeping with the unique YOLOv1 paper [2].

    Figure 1. The loss function of YOLOv1 [2].
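
    Since the loss function appears in the paper as an image, here it is transcribed into LaTeX for reference:

    \begin{aligned}
    \mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
    +{}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
    +{}& \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2 \\
    +{}& \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
    +{}& \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
    \end{aligned}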

    Yes, the above equation looks scary at a glance, and that's exactly how I felt when I first saw it. But don't worry, you will find it easy as we dig deeper into it. I'll definitely try my best to explain everything in simple terms.

    Here you can see that the loss function basically consists of five rows. Now let's go through them one by one.

    Row #1: Midpoint Loss

    Figure 2. The part for calculating the midpoint coordinate prediction loss [2].

    The first term of the loss function focuses on evaluating the object midpoint coordinate prediction. You can see in Figure 2 above that it essentially just compares the predicted midpoint (x_hat, y_hat) with the corresponding target midpoint (x, y) by subtraction, before summing the squared results of the x and y components. We do this iteratively for the two predicted bounding boxes (B) inside all cells (S) and sum the error values from all of them. In other words, what we basically do here is compute the SSE (Sum of Squared Errors) of the coordinate predictions. Assuming we use the default YOLOv1 configuration (i.e., S=7 and B=2), the first and second sigmas iterate 49 and 2 times, respectively.

    Moreover, the 1^obj variable you see here is a binary mask, whose value is 1 whenever there is an object midpoint within the corresponding cell in the ground truth. If no object midpoint is contained inside, the value is 0 instead, which cancels out all operations inside that cell, since there is indeed nothing to predict.
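
    As a minimal sketch (toy tensors of my own, not part of the final implementation), this row boils down to a masked sum of squared errors:

    import torch
    
    S = 7
    obj = torch.zeros(S, S, 1)         # the 1^obj mask over the 7×7 grid
    obj[3, 3, 0] = 1.0                 # pretend one cell holds an object midpoint
    
    xy_target = torch.rand(S, S, 2)    # ground-truth (x, y) per cell
    xy_pred   = torch.rand(S, S, 2)    # predicted (x, y) of the responsible box
    
    # Squared x/y errors, zeroed wherever no midpoint exists, then summed.
    midpoint_loss = (obj * (xy_target - xy_pred) ** 2).sum()
    print(midpoint_loss)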

    Row #2: Dimension Loss

    Figure 3. The part for calculating the bounding box dimension prediction loss [2].

    The focus of the second row is to evaluate the correctness of the bounding box dimensions. I believe the variables here are quite straightforward: w denotes the width and h denotes the height, where the ones with hats are the predictions made by the model. If you take a closer look at this row, you will notice that it is basically the same as the previous one, except that here we take the square root of the variables first before doing the remaining computation.

    Using the square root here is actually a very clever idea. Naturally, if we computed the variables directly as they are (without the square root), the same inaccuracy on a small bounding box would be weighted the same as on a large bounding box. That is not a good thing, because the same deviation in the number of pixels on a small box visually appears more misaligned from the ground truth than on a larger box. Take a look at Figure 4 below to better understand this idea. There you can see that even though the deviation in both cases is 60 pixels along the height axis, on the smaller bounding box the error appears worse. This is because in the case of the smaller box, the 60-pixel deviation is 75% of the actual object height, whereas on the larger box it only deviates 25% from the target height.

    Figure 4. The same deviation in the number of pixels appears worse on a small object than on a larger one [3].

    By taking the square root of w and h, inaccuracy in a smaller box is penalized more heavily than in a larger one. Let's do a little bit of math to prove this. To keep things simple, I gave the two examples in Figure 4 to Gemini and let it compute the height prediction error based on the equation in Figure 3. You can see in the result below that the error of the small bounding box prediction is larger than that of the large bounding box (8.349 vs 3.345).

    Figure 5. Proof that the square root operation allows us to give a higher penalty for inaccuracy on a smaller box [3].
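
    We can also reconstruct the arithmetic by hand. Assuming the heights implied by the percentages above (a small box of height 80 px predicted as 140 px, and a large box of height 240 px predicted as 300 px, both off by 60 px):

    \left(\sqrt{80} - \sqrt{140}\right)^2 \approx (8.944 - 11.832)^2 \approx 8.34

    \left(\sqrt{240} - \sqrt{300}\right)^2 \approx (15.492 - 17.321)^2 \approx 3.34

    These agree with the figure's 8.349 and 3.345 up to rounding, whereas without the square root both errors would be an identical 60² = 3600.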

    Row #3: Object Loss

    Figure 6. The part for computing the object loss [2].

    Moving on to the third row, this part of the YOLOv1 loss function is used to measure how confident the model is in predicting whether or not there is an object inside a cell. Whenever an object is present in the ground truth, we need to set C to the IoU of the bounding box. Assuming the predicted box perfectly matches the target box, we essentially want our model to produce a C_hat close to 1. But if the predicted box is not quite accurate, say it has an IoU of 0.8, then we expect our model to produce a C_hat close to 0.8 as well. Just think of it like this: if the bounding box itself is inaccurate, then we should expect our model to know that the object is not perfectly contained within that box. Meanwhile, whenever an object is not present in the ground truth, the variable C should be exactly 0. We then sum all the squared differences between C and C_hat across all predictions made throughout the entire image to obtain the object loss of a single image.

    It’s price noting that C_hat is designed to replicate two issues concurrently: the chance that the thing being there (a.ok.a. objectness) and the accuracy of the bounding field (IoU). That is basically the rationale that we outline floor fact C because the multiplication of the objectness and the IoU as talked about within the paper. By doing so, we implicitly ask the mannequin to present C_hat, whose worth incorporates each parts.

    Figure 7. Bounding box confidence is defined as the multiplication of objectness and IoU [2].
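
    In the paper's notation, the target confidence is

    C = \Pr(\text{Object}) \cdot \text{IoU}^{\text{truth}}_{\text{pred}}

    so C collapses to the IoU whenever an object is present (objectness is 1), and to 0 otherwise.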

    As a refresher, IoU is a metric we commonly use to measure how good our bounding box prediction is compared to the ground truth in terms of area coverage. The way to compute IoU is simply to take the ratio of the intersection of the target and predicted bounding boxes to their union, hence the name: Intersection over Union.
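
    In formula form:

    \text{IoU} = \frac{\text{area}(B_{\text{pred}} \cap B_{\text{gt}})}{\text{area}(B_{\text{pred}} \cup B_{\text{gt}})}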

    Figure 8. An illustration of how to compute IoU [3]. The IoU of two bounding boxes that perfectly overlap each other is 1, whereas if two bounding boxes do not overlap at all, the IoU is 0.

    Row #4: No Object Loss

    Figure 9. The so-called no-object loss term in the YOLOv1 loss function [2].

    The so-called no-object loss is quite unique. Despite having a similar computation to the object loss in the third row, the binary mask 1^noobj makes this part work something like the inverse of the object loss. This is because the binary mask value is 1 if there is no object midpoint present inside a cell in the ground truth. Otherwise, if an object midpoint is present, the binary mask is 0, causing the remaining operations for that single cell to be canceled out. So in short, this row returns a non-zero number whenever a cell contains no object in the ground truth but is predicted to contain an object midpoint.

    Row #5: Classification Loss

    Figure 10. The part for computing the object classification loss [2].

    The last row in the YOLOv1 loss function is the classification loss. This part of the loss function is the most straightforward, if I may say so, because what we essentially do here is just compare the actual and predicted classes, similar to a typical multi-class classification task. However, keep in mind that we still use the same regression loss (i.e., SSE) to compute the error. The paper mentions that the authors decided to use this regression loss for both the regression and classification parts for the sake of simplicity.

    Adjustable Parameters

    Notice that I haven't actually discussed the λ_coord and λ_noobj parameters yet. The former is used to give more weight to the bounding box prediction, which is why it is applied to the first and second rows of the loss function. You can go back to Figure 1 to verify this. The λ_coord parameter is set to a large value (i.e., 5) by default because we want our model to focus on the correctness of the bounding box creation. So, any small inaccuracy in the xywh prediction is penalized 5 times more heavily than it would be otherwise.

    Meanwhile, λ_noobj is used to control the no-object loss, i.e., the fourth row of the loss function. The paper mentions that the authors set a default value of 0.5 for this parameter, which causes the no-object loss to not be weighted as much. This is mainly because in object detection the number of objects is usually much lower than the total number of cells, so the majority of the cells contain no object. Thus, if we didn't give the term a small multiplier, the no-object loss would contribute very heavily to the total loss even though it is in fact not that important. By setting λ_noobj to a small number, we can suppress the contribution of this loss.


    Code Implementation

    I do acknowledge that our earlier discussion was very mathy. Don't worry if you haven't grasped the entire idea of the loss function just yet. I believe you will eventually understand once we get into the code implementation.

    So now, let’s begin the code by importing the required modules as proven in Codeblock 1 beneath.

    # Codeblock 1
    import torch
    import torch.nn as nn

    The IoU Function

    Before we get into the YOLOv1 loss, we'll first create a helper function to calculate IoU, which will be used inside the main YOLOv1 loss function. Take a look at Codeblock 2 below to see how I implement it.

    # Codeblock 2
    def intersection_over_union(boxes_targets, boxes_predictions):
    
        # Convert the target boxes from midpoint format (x, y, w, h)
        # to corner format (x1, y1, x2, y2).
        box2_x1 = boxes_targets[..., 0:1] - boxes_targets[..., 2:3] / 2
        box2_y1 = boxes_targets[..., 1:2] - boxes_targets[..., 3:4] / 2
        box2_x2 = boxes_targets[..., 0:1] + boxes_targets[..., 2:3] / 2
        box2_y2 = boxes_targets[..., 1:2] + boxes_targets[..., 3:4] / 2
        
        # Do the same for the predicted boxes.
        box1_x1 = boxes_predictions[..., 0:1] - boxes_predictions[..., 2:3] / 2
        box1_y1 = boxes_predictions[..., 1:2] - boxes_predictions[..., 3:4] / 2
        box1_x2 = boxes_predictions[..., 0:1] + boxes_predictions[..., 2:3] / 2
        box1_y2 = boxes_predictions[..., 1:2] + boxes_predictions[..., 3:4] / 2
    
        # Corner coordinates of the overlapping region.
        x1 = torch.max(box1_x1, box2_x1)
        y1 = torch.max(box1_y1, box2_y1)
        x2 = torch.min(box1_x2, box2_x2)
        y2 = torch.min(box1_y2, box2_y2)
    
        # clamp(0) handles boxes that do not overlap at all.
        intersection = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)    #(1)
    
        box1_area = torch.abs((box1_x2 - box1_x1) * (box1_y2 - box1_y1))
        box2_area = torch.abs((box2_x2 - box2_x1) * (box2_y2 - box2_y1))
    
        # 1e-6 prevents division by zero when both areas are 0.
        union = box1_area + box2_area - intersection + 1e-6       #(2)
    
        iou = intersection / union    #(3)
    
        return iou

    The intersection_over_union() function above takes two input parameters, namely the ground truth (boxes_targets) and the predicted bounding boxes (boxes_predictions). These two inputs are basically arrays of length 4, storing the x, y, w, and h values. Note that x and y are the coordinates of the box midpoint, not the top-left corner. The bounding box information is then extracted so that we can compute the intersection (#(1)) and the union (#(2)). We can finally obtain the IoU using the code at line #(3). Additionally, at line #(2) we need to add a very small value at the end of the operation (1e-6 = 0.000001). This number is there to prevent a division-by-zero error in case the area of the predicted bounding box is 0 for some reason.

    Now let’s run the intersection_over_union() perform we simply created on a number of check circumstances so as to verify if it really works correctly. The three examples in Determine 11 beneath present intersections with excessive, medium, and low IoU (from left to proper, respectively).

    Figure 11. Bounding boxes with different overlaps [3].

    All the boxes you see here have a size of 200×200 px, and what makes the three cases different is only the area of their intersections. If you take a closer look at Codeblock 3 below, you will see that the predicted boxes (pred_{0,1,2}) are shifted by 20, 100, and 180 pixels from their respective targets (target_{0,1,2}) along both the horizontal and vertical axes.

    # Codeblock 3
    target_0 = torch.tensor([[0., 0., 200., 200.]])
    pred_0   = torch.tensor([[20., 20., 200., 200.]])
    iou_0    = intersection_over_union(target_0, pred_0)
    print('iou_0:', iou_0)
    
    target_1 = torch.tensor([[0., 0., 200., 200.]])
    pred_1   = torch.tensor([[100., 100., 200., 200.]])
    iou_1    = intersection_over_union(target_1, pred_1)
    print('iou_1:', iou_1)
    
    target_2 = torch.tensor([[0., 0., 200., 200.]])
    pred_2   = torch.tensor([[180., 180., 200., 200.]])
    iou_2    = intersection_over_union(target_2, pred_2)
    print('iou_2:', iou_2)

    When the above code is run, you can see that our example on the left has the highest IoU of 0.6807, followed by the one in the middle and the one on the right with scores of 0.1429 and 0.0050, a trend that is exactly what we anticipated earlier. This essentially proves that our intersection_over_union() function works well.

    # Codeblock 3 Output
    iou_0: tensor([[0.6807]])
    iou_1: tensor([[0.1429]])
    iou_2: tensor([[0.0050]])

    The YOLOv1 Loss Function

    There is actually one more thing we need to do before creating the loss function, namely instantiating an nn.MSELoss object which will help us compute the error values across all cells. As the name suggests, this function by default computes the MSE (Mean Squared Error). Since we want the error values to be summed instead of averaged, we need to set the reduction parameter to "sum" as shown in Codeblock 4 below. Next, we initialize the lambda_coord, lambda_noobj, S, B, and C parameters, which I set to the default values mentioned in the original paper. Here I also initialize the BATCH_SIZE parameter, which indicates the number of samples we are going to process in a single forward pass.

    # Codeblock 4
    sse = nn.MSELoss(reduction="sum")
    
    lambda_coord = 5
    lambda_noobj = 0.5
    
    S = 7
    B = 2
    C = 20
    
    BATCH_SIZE = 1
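
    If you are curious about what the reduction parameter actually changes, here is a quick check with toy values of my own (reusing the modules imported in Codeblock 1):

    a = torch.tensor([1., 2.])
    b = torch.tensor([0., 0.])
    print(nn.MSELoss()(a, b))                  # mean: (1 + 4) / 2 = 2.5
    print(nn.MSELoss(reduction="sum")(a, b))   # sum:  1 + 4 = 5.0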

    Alright, now that all prerequisite variables have been initialized, let's actually define the loss() function for the YOLOv1 model. This function is quite long, so I decided to break it down into several parts. Just make sure everything is placed within the same cell if you want to try running this code in your own notebook.

    You can see in Codeblock 5a below that this function takes two input arguments: target and prediction (#(1)). Remember that the raw output of YOLOv1 (the prediction) is a long one-dimensional tensor of length 1470, whereas the length of the target tensor is 1225. The first thing we need to do inside the loss() function is reshape them into 7×7×30 (#(3)) and 7×7×25 (#(2)), respectively, so that we can easily process the information contained in both tensors.

    # Codeblock 5a
    def loss(target, prediction):    #(1)
        
        target = target.reshape(-1, S, S, C+5)                #(2)
        prediction = prediction.reshape(-1, S, S, C+B*5)      #(3)
    
        obj = target[..., 20].unsqueeze(3)      #(4)
        noobj = 1 - obj                         #(5)

    Next, the code at lines #(4) and #(5) is how we implement the 1^obj and 1^noobj binary masks. At line #(4) we take the value at index 20 from the target tensor and store it in the obj variable. Index 20 itself corresponds to the bounding box confidence (see Figure 12): if there is an object midpoint within the cell, the value at that index is 1. Otherwise, if no object midpoint is present, the value is 0. Conversely, the noobj variable I initialize at line #(5) acts as the inverse of obj, whose value is 1 if there is no object midpoint present in the grid cell.

    Figure 12. What the target and prediction vectors for each grid cell look like. The target bounding box confidence is stored at index 20, while the predicted bounding box confidences are at indices 20 and 25 of their corresponding vectors [3]. Read more about this in my previous article at reference number [1].

    Now let’s transfer on to Codeblock 5b, the place we compute the bounding field error, which corresponds to the primary and the second rows of the loss perform. What we basically do initially is to take the xywh values from the goal tensor (indices 21, 22, 23, and 24). This may be performed with a easy array slicing approach as proven at line #(1). Subsequent, we do the identical factor to the predicted tensor. Nevertheless, do not forget that since our mannequin generates two bounding packing containers for every cell, we have to retailer their xywh values into two separate variables: pred_bbox0 and pred_bbox1 (#(2–3)).

    In Figure 12, the sliced indices are the ones labeled x1, y1, w1, h1 and x2, y2, w2, h2. Of the two bounding box predictions, we only take the one that best approximates the target box. Hence, we need to compute the IoU between each predicted box and the target box using the code at lines #(4) and #(5). The predicted bounding box with the higher IoU is selected using torch.max() at line #(6). The xywh values of the best bounding box prediction are then stored in best_bbox, while the corresponding information of the box with the lower IoU is discarded (#(8)). At lines #(7) and #(8) we multiply both the actual xywh and the best predicted xywh by obj, which is how we apply the 1^obj mask.

    At this point we already have our x and y values ready to be processed with the sse function we initialized earlier. However, remember that we still need to apply the square root to w and h beforehand, which I do at lines #(9) and #(10) for the target and the best prediction vectors, respectively. One thing to keep in mind at line #(10) is that we should take the absolute value of the numbers before applying torch.sqrt(), simply to prevent us from computing the square root of negative numbers. Not only that, it is also necessary to add a very small number (1e-6) to ensure we won't take the square root of exactly 0, whose gradient is undefined and would cause numerical instability during training. On the same line, we then multiply the resulting tensor by its original sign, which we preserve using torch.sign().

    Lastly, as we’ve got utilized torch.sqrt() to the w and h parts of target_bbox and best_bbox, we will now cross each tensors to the sse() perform as proven at line #(11). Be aware that the loss worth saved in bbox_loss already contains each the error from the primary and the second row of the YOLOv1 loss perform.

    # Codeblock 5b
        target_bbox = target[..., 21:25]      #(1)
        
        pred_bbox0 = prediction[..., 21:25]   #(2)
        pred_bbox1 = prediction[..., 26:30]   #(3)
        
        iou_pred_bbox0 = intersection_over_union(pred_bbox0, target_bbox)  #(4)
        iou_pred_bbox1 = intersection_over_union(pred_bbox1, target_bbox)  #(5)
        
        iou_pred_bboxes = torch.cat([iou_pred_bbox0.unsqueeze(0), 
                                     iou_pred_bbox1.unsqueeze(0)], 
                                    dim=0)
        
        best_iou, best_bbox_idx = torch.max(iou_pred_bboxes, dim=0)    #(6)
        
        target_bbox = obj * target_bbox                                #(7)
        best_bbox   = obj * (best_bbox_idx*pred_bbox1                  #(8)
                             + (1-best_bbox_idx)*pred_bbox0)
    
        target_bbox[..., 2:4] = torch.sqrt(target_bbox[..., 2:4])      #(9)
        best_bbox[..., 2:4]   = torch.sign(best_bbox[..., 2:4]) * torch.sqrt(torch.abs(best_bbox[..., 2:4]) + 1e-6)  #(10)
    
        bbox_loss = sse(          #(11)
            torch.flatten(target_bbox, end_dim=-2),
            torch.flatten(best_bbox, end_dim=-2)
        )

    The next component we'll implement is the object loss. Take a look at Codeblock 5c below to see how I do that.

    # Codeblock 5c
        target_bbox_confidence = target[..., 20:21]      #(1)
        pred_bbox0_confidence = prediction[..., 20:21]   #(2)
        pred_bbox1_confidence = prediction[..., 25:26]   #(3)
        
        target_bbox_confidence = obj * target_bbox_confidence                   #(4)
        best_bbox_confidence   = obj * (best_bbox_idx*pred_bbox1_confidence     #(5)
                                        + (1-best_bbox_idx)*pred_bbox0_confidence)
        
        object_loss = sse(      #(6)
            torch.flatten(obj * target_bbox_confidence * best_iou),           #(7)
            torch.flatten(obj * best_bbox_confidence),
        )

    What we first do in the codeblock above is take the value at index 20 from the target vector (#(1)). Meanwhile, for the prediction vector, we need to take the values at indices 20 and 25 (#(2–3)), which correspond to the confidence scores of each of the two boxes generated by the model. You can go back to Figure 12 to verify this.

    Next, at line #(5) I take the confidence of the box prediction that has the higher IoU. The code at line #(4) is actually not necessary, because obj and target_bbox_confidence are basically the same thing. You can verify this by checking the code at line #(4) in Codeblock 5a. I do it anyway for the sake of clarity, because we essentially have both C and C_hat multiplied by 1^obj in the original equation (see Figure 6).

    Afterwards, we compute the SSE between the ground truth confidence (target_bbox_confidence) and the predicted confidence (best_bbox_confidence) (#(6)). It is important to note at line #(7) that we need to multiply the ground truth confidence by the IoU of the best bounding box prediction (best_iou). This is because the paper mentions that whenever there is an object midpoint inside a cell, we want the predicted confidence to equal that IoU score. And this concludes our discussion of the object loss implementation.

    Now, Codeblock 5d below focuses on computing the no-object loss. The code is quite simple, since here we reuse the target_bbox_confidence and pred_bbox{0,1}_confidence variables we initialized in the previous codeblock. These variables need to be multiplied by the noobj mask before the SSE computation is performed. Note that the errors made by the two predicted boxes need to be summed, which is why you see the addition operation at line #(1).

    # Codeblock 5d
        no_object_loss = sse(
            torch.flatten(noobj * target_bbox_confidence),
            torch.flatten(noobj * pred_bbox0_confidence),
        )
        
        no_object_loss += sse(          #(1)
            torch.flatten(noobj * target_bbox_confidence),
            torch.flatten(noobj * pred_bbox1_confidence),
        )

    Finally, we compute the classification loss using Codeblock 5e below, which corresponds to the fifth row of the original equation. Remember that the original YOLOv1 was trained on the 20-class PASCAL VOC dataset. This is basically the reason we take the first 20 indices from the target and prediction vectors (#(1–2)). Then, we can simply pass the two into the sse() function (#(3)).

    # Codeblock 5e
        target_class = target[..., :20]      #(1)
        pred_class = prediction[..., :20]    #(2)
        
        
        class_loss = sse(      #(3)
            torch.flatten(obj * target_class, end_dim=-2),
            torch.flatten(obj * pred_class, end_dim=-2),
        )

    As we’ve got already accomplished the 5 parts of the YOLOv1 loss perform, what we have to do now could be to sum the whole lot up utilizing the next codeblock. Don’t neglect to present weightings to bbox_loss and no_object_loss by multiplying them with their corresponding lambda parameters we initialized earlier (#(1–2)).

    # Codeblock 5f
        total_loss = (
            lambda_coord * bbox_loss           #(1)
            + object_loss
            + lambda_noobj * no_object_loss    #(2)
            + class_loss
        )
        
        return bbox_loss, object_loss, no_object_loss, class_loss, total_loss
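
    Before the dedicated test cases below, a quick smoke test with dummy tensors of my own confirms that the function runs and that it expects the flattened shapes discussed earlier, namely 1×1225 for the target and 1×1470 for the prediction:

    dummy_target = torch.zeros(BATCH_SIZE, S*S*(C+5))     # 1×1225
    dummy_pred   = torch.rand(BATCH_SIZE, S*S*(C+B*5))    # 1×1470
    components = loss(dummy_target, dummy_pred)
    print([c.item() for c in components])   # bbox, object, no-object, class, total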

    Test Cases

    In this section I'm going to demonstrate how to run the loss() function we just created on a few test cases. Pay attention to Figure 13 below, as I will construct the following test cases based on this image.

    Figure 13. The image I use as the basis of the test cases [1].

    Bounding Box Loss Example

    The bbox_loss_test() function in Codeblock 6 below focuses on testing whether the bounding box loss works properly. On the lines marked #(1) and #(2) I initialize two all-zero tensors, which I refer to as target and prediction. I set the sizes of these two tensors to 1×7×7×25 and 1×7×7×30, respectively, so that we can modify the elements intuitively. We treat the image in Figure 13 as the ground truth, hence we need to store the bounding box information in the corresponding indices of the target tensor.

    The indexer [0] in the 0th axis means we access the first (and only) image in the batch (#(3)). Next, [3,3] in the 1st and 2nd axes denotes the location of the grid cell where the object midpoint is located. We slice the tensor with [21:25] because we want to fill those indices with [0.4, 0.5, 2.4, 3.2], which correspond to the x, y, w, and h values of the bounding box. The value at index 20, which is where the target bounding box confidence is stored, is set to 1 since the object midpoint is located inside this cell (#(4)). Next, the index corresponding to the class cat (the class at index 7) also needs to be set to 1 (#(5)), just like how we create a one-hot encoded label in a typical classification task. You can refer back to Figure 12 to verify that the class cat is indeed at the 7th index.

    # Codeblock 6
    def bbox_loss_test():
        target = torch.zeros(BATCH_SIZE, S, S, (C+5))        #(1)
        prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))  #(2)
        
        target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])    #(3)
        target[0, 3, 3, 20] = 1.0    #(4)
        target[0, 3, 3, 7] = 1.0     #(5)
        
        prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])       #(6)
        #prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.8, 4.0])      #(7)
        #prediction[0, 3, 3, 21:25] = torch.tensor([0.3, 0.2, 3.2, 4.3])      #(8)
        
        target = target.reshape(BATCH_SIZE, S*S*(C+5))            #(9)
        prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))  #(10)
    
        bbox_loss = loss(target, prediction)[0]    #(11)
        
        return bbox_loss
    
    bbox_loss_test()

    You can see in the above codeblock that I prepared three test cases at lines #(6–8). The one at line #(6) is a scenario where the predicted bounding box midpoint and dimensions match the ground truth exactly. In that case, our bbox_loss is 1.8474e-13, an extremely small number. Remember that it doesn't return exactly 0 because of the 1e-6 we added during the IoU and square root calculations. Meanwhile, in the second test case, I assume the midpoint prediction is correct but the box dimensions are a bit too large. If you run this, our bbox_loss increases to 0.0600. Third, I enlarge the bounding box prediction even further and also shift it from its actual position. In that case, our bbox_loss gets even larger, at 0.2385.

    By the way, it is important to remember that the loss function we defined earlier expects the target and prediction tensors to have sizes of 1×1225 and 1×1470, respectively. Hence, we need to reshape them accordingly (#(9–10)) before finally computing the loss value (#(11)).

    # Codeblock 6 Output
    Case 1: tensor(1.8474e-13)
    Case 2: tensor(0.0600)
    Case 3: tensor(0.2385)

    Object Loss Example

    To check whether the object loss is correct, we need to focus on the value at index 20. What we do first in the object_loss_test() function below is similar to the previous one, namely creating the target and prediction tensors (#(1–2)) and initializing the ground truth vector for cell (3, 3) (#(3–5)). Here we assume that the bounding box prediction perfectly aligns with the actual bounding box (#(6)).

    # Codeblock 7
    def object_loss_test():
        target = torch.zeros(BATCH_SIZE, S, S, (C+5))        #(1)
        prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))  #(2)
        
        target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])      #(3)
        target[0, 3, 3, 20] = 1.0    #(4)
        target[0, 3, 3, 7] = 1.0     #(5)
        
        prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])  #(6)
        
        prediction[0, 3, 3, 20] = 1.0    #(7)
        #prediction[0, 3, 3, 20] = 0.9   #(8)
        #prediction[0, 3, 3, 20] = 0.6   #(9)
        
        target = target.reshape(BATCH_SIZE, S*S*(C+5))
        prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))
    
        object_loss = loss(target, prediction)[1]
        
        return object_loss
    
    object_loss_test()

    I set up three test cases specifically for the object loss. The first one is the case where the model is perfectly confident that there is a box midpoint within the cell, in other words, a scenario where the confidence is 1 (#(7)). If you run this, the resulting object loss is 1.4211e-14, again a value very close to zero. You can also see in the output below that the object loss increases to 0.0100 and 0.1600 as we decrease the predicted confidence to 0.9 and 0.6 (#(8–9)), which is exactly what we anticipated.

    # Codeblock 7 Output
    Case 1: tensor(1.4211e-14)
    Case 2: tensor(0.0100)
    Case 3: tensor(0.1600)

    Classification Loss Example

    Speaking of the classification loss, let's now see whether our loss function can really penalize misclassifications. Just like before, in Codeblock 8 below I prepared three test cases. In the first one, the model correctly gives full confidence to the class cat while leaving all other class probabilities at 0 (#(1)). If you run this, the resulting classification loss is exactly 0. Next, if you decrease the confidence for cat to 0.9 while slightly increasing the confidence for the class chair (index 8) to 0.1, as shown at line #(2), our classification loss increases to 0.0200. The loss value gets even larger, at 1.2800, when I assume the model misclassifies the cat as a chair by assigning a very low confidence to cat (0.2) and a high confidence to chair (0.8) (#(3)). This essentially shows that our loss function implementation measures classification errors properly.

    # Codeblock 8
    def class_loss_test():
        target = torch.zeros(BATCH_SIZE, S, S, (C+5))
        prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))
        
        target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])
        target[0, 3, 3, 20] = 1.0
        target[0, 3, 3, 7] = 1.0
        
        prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])
        
        prediction[0, 3, 3, 7] = 1.0    #(1)
        #prediction[0, 3, 3, 7:9] = torch.tensor([0.9, 0.1])    #(2)
        #prediction[0, 3, 3, 7:9] = torch.tensor([0.2, 0.8])    #(3)
        
        target = target.reshape(BATCH_SIZE, S*S*(C+5))
        prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))
    
        class_loss = loss(target, prediction)[3]
        
        return class_loss
    
    class_loss_test()
    # Codeblock 8 Output
    Case 1: tensor(0.)
    Case 2: tensor(0.0200)
    Case 3: tensor(1.2800)

    No Object Loss Example

    Now, to test our implementation of the no-object loss, we are going to examine a cell that doesn't contain any object midpoint, for which I choose the grid cell at coordinate (1, 1). Since the only object in the image is the one located at grid cell (3, 3), the target bounding box confidence for coordinate (1, 1) should be set to 0, as shown at line #(1) in Codeblock 9. In fact, this step isn't strictly necessary, because we already set the tensors to all zeros in the first place, but I do it anyway for clarity. Remember that the no-object loss is activated only when the target bounding box confidence is 0 like this. Otherwise, whenever the target box confidence is 1 (i.e., there is an object midpoint within the cell), the no-object loss part always returns 0.

    Here I prepared two test cases. The first one is when the values at indices 20 and 25 of the prediction tensor are both 0, as written at lines #(2) and #(3), namely when our YOLOv1 model correctly predicts that there is no bounding box midpoint within the cell. The loss value increases when we use the code at lines #(4) and #(5) instead, which simulates the model somewhat believing that there should be objects there when there actually aren't. You can see in the output below that the loss value increases to 0.1300, which is expected.

    # Codeblock 9
    def no_object_loss_test():
        target = torch.zeros(BATCH_SIZE, S, S, (C+5))
        prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))
        
        target[0, 1, 1, 20] = 0.0        #(1)
    
        prediction[0, 1, 1, 20] = 0.0    #(2)
        prediction[0, 1, 1, 25] = 0.0    #(3)
    
        #prediction[0, 1, 1, 20] = 0.2   #(4)
        #prediction[0, 1, 1, 25] = 0.3   #(5)
        
        target = target.reshape(BATCH_SIZE, S*S*(C+5))
        prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))
    
        no_object_loss = loss(target, prediction)[2]
        
        return no_object_loss
    
    no_object_loss_test()
    # Codeblock 9 Output
    Case 1: tensor(0.)
    Case 2: tensor(0.1300)

    Ending

    And well, I think that's pretty much everything about the loss function of the YOLOv1 model. We have thoroughly discussed the formal mathematical expression of the loss function, implemented it from scratch, and tested each of its components. Thank you very much for reading, I hope you learned something new from this article. Please let me know if you spot any errors in my explanation or in the code. See you in my next article!

    By the way, you can also find the code in my GitHub repository. Click the link at reference number [4].


    References

    [1] Muhammad Ardi. YOLOv1 Paper Walkthrough: The Day YOLO First Saw the World. Towards Data Science. https://towardsdatascience.com/yolov1-paper-walkthrough-the-day-yolo-first-saw-the-world/ [Accessed December 18, 2025].

    [2] Joseph Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. arXiv. https://arxiv.org/pdf/1506.02640 [Accessed July 25, 2024].

    [3] Image originally created by the author.

    [4] MuhammadArdiPutra. Regression For All — YOLOv1 Loss Function. GitHub. https://github.com/MuhammadArdiPutra/medium_articles/blob/main/Regression%20For%20All%20-%20YOLOv1%20Loss%20Function.ipynb [Accessed July 25, 2024].


