This is a chapter of the in-progress e-book on linear algebra. The table of contents so far:
- Chapter-1: The basics
- Chapter-2: Measure of a map (current)
Stay tuned for future chapters.
Linear algebra is the tool of many dimensions. No matter what you might be doing, as soon as you scale to 𝑛 dimensions, linear algebra comes into the picture.
In the previous chapter, we described abstract linear maps. In this one, we roll up our sleeves and start to deal with matrices. Practical considerations like numerical stability, efficient algorithms, etc. will now start to be explored.
Note: all images in this article, unless otherwise stated, are by the author.
I) How to quantify a linear map
Determinants are one of the oldest concepts in linear algebra. The roots of the subject lie in solving systems of linear equations, and determinants would "determine" if there even was a solution worth seeking. But in most cases, where the system does have a solution, the determinant provides further useful information. In the modern framework of linear maps, determinants provide a single quantification of linear maps.
We discussed in the previous chapter the concept of vector spaces (basically n-dimensional collections of numbers, and more generally collections of elements of a field) and linear maps that operate between two of those vector spaces, taking objects in one to the other.
As an example of these kinds of maps, one vector space could be the surface of the planet you're sitting on and the other could be the surface of the table you might be sitting at. Literal maps of the world are also maps in this sense since they "map" every point on the surface of the Earth to a point on a piece of paper or the surface of a table, although they aren't linear maps since they don't preserve relative areas (Greenland appears much larger than it is, for example, in some of the projections).
Once we pick a basis for the vector space (a set of n "independent" vectors in the space; there could be infinitely many choices in general), all linear maps on that vector space get unique matrices assigned to them.
For the moment, let's restrict our attention to maps that take vectors from an 𝑛-dimensional space back to the same 𝑛-dimensional space (we'll generalize later). The matrices corresponding to these linear maps are 𝑛×𝑛 (see section III of chapter 1). It might be useful to "quantify" such a linear map: to express its effect on the vector space ℝⁿ in a single number. The kind of map we're dealing with effectively takes vectors from ℝⁿ and "distorts" them into some other vectors in the same space. Both the original vector 𝑣 and the vector 𝑢 that the map converted it into have some lengths (say |𝑣| and |𝑢|). We can think about how much the length of the vector is changed by the map, |𝑢|∕|𝑣|. Maybe that can quantify the impact of the map? How much it "stretches" vectors?
This approach has a fatal flaw. The ratio depends not just on the linear map, but also on the vector 𝑣 it acts on. It's therefore not strictly a property of the linear map itself.
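To see the flaw concretely, here is a minimal sketch using only the Python standard library. The matrix `A` and the helper names `apply` and `length` are made up for illustration; the map stretches the x-axis by 3 and leaves the y-axis alone, so the length ratio comes out differently for different input vectors.

```python
import math

def apply(matrix, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

# Hypothetical map: stretch the x-axis by 3, leave the y-axis unchanged.
A = [[3.0, 0.0],
     [0.0, 1.0]]

ratio_x = length(apply(A, [1.0, 0.0])) / length([1.0, 0.0])
ratio_y = length(apply(A, [0.0, 1.0])) / length([0.0, 1.0])

print(ratio_x, ratio_y)  # 3.0 1.0
```

Same map, two different "stretch factors", so the length ratio cannot be a property of the map alone.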
What if we take two vectors instead, 𝑣₁ and 𝑣₂, which are converted by the linear map into the vectors 𝑢₁ and 𝑢₂? Just as the measure of the single vector 𝑣 was its length, the measure of two vectors is the area of the parallelogram contained between them.

Just as we considered the amount by which the length of 𝑣 changed, we can now talk in terms of the amount by which the area between 𝑣₁ and 𝑣₂ changes once they pass through the linear map and become 𝑢₁, 𝑢₂. And alas, this again depends not just on the linear map, but also on the vectors chosen (as long as the space has more than two dimensions).
Next, we can go to three vectors and consider the change in volume of the parallelepiped between them, and we run into the same problem of the initial vectors having a say.

But now consider an n-dimensional region in the original vector space. This region will have some "n-dimensional measure". To understand this, a two-dimensional measure is an area (measured in square kilometers). A three-dimensional measure is the volume used for measuring water (in liters). A four-dimensional measure has no counterpart in the physical world we're used to, but is just as mathematically sound: a measure of the amount of four-dimensional space enclosed within a parallelepiped formed of four 4-D vectors, and so on.

The 𝑛 original vectors (𝑣₁, 𝑣₂, …, 𝑣ₙ) form a parallelepiped which is transformed by the linear map into 𝑛 new vectors, 𝑢₁, 𝑢₂, …, 𝑢ₙ, which form their own parallelepiped. We can then ask about the 𝑛-dimensional measure of the new region in relation to the original one. And this ratio, it turns out, is indeed a function only of the linear map. Regardless of what the original region looked like, where it was positioned and so on, the ratio of its measure after the linear map acted on it to its measure before will be the same: a function purely of the linear map. This ratio of 𝑛-dimensional measures (after to before) is what we've been looking for: an exclusive property of the linear map that quantifies its effect in a single number.
This ratio by which the measure of any 𝑛-dimensional patch of space is changed by the linear map is a good way to quantify the effect it has on the space it acts on. It's called the determinant of the linear map (the reason for that name will become apparent in section V).
For now, we have simply stated the fact that the amount by which a linear map from ℝⁿ to ℝⁿ "stretches" any patch of 𝑛-dimensional space depends only on the map, without offering a proof, since the purpose here was motivation. We'll cover a proof later (section VI), once we arm ourselves with some weapons.
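The claim can be spot-checked numerically in two dimensions, where the 𝑛-dimensional measure is just area. The sketch below is an illustration, not a proof: the matrix `A` and the three parallelograms are arbitrary choices, and the area formula |x₁y₂ − y₁x₂| for a parallelogram in the plane is assumed here rather than derived.

```python
def apply(matrix, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def area(v1, v2):
    """Unsigned area of the parallelogram spanned by two 2-D vectors."""
    return abs(v1[0] * v2[1] - v1[1] * v2[0])

# Hypothetical 2x2 map.
A = [[2.0, 1.0],
     [0.0, 3.0]]

# Three unrelated parallelograms, each given by its two edge vectors.
parallelograms = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([2.0, 1.0], [-1.0, 4.0]),
    ([0.5, -3.0], [2.0, 2.0]),
]

ratios = []
for v1, v2 in parallelograms:
    u1, u2 = apply(A, v1), apply(A, v2)
    ratios.append(area(u1, u2) / area(v1, v2))

print(ratios)  # [6.0, 6.0, 6.0]
```

All three ratios agree, in contrast to the length ratio, which changed with the input vector.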
II) Calculating determinants
Now, how do we find this determinant given a linear map from the vector space ℝⁿ back to ℝⁿ? We can take any 𝑛 vectors, find the measure of the parallelepiped between them and the measure of the new parallelepiped once the linear map has acted on all of them. Finally, divide the latter by the former.
We need to make these steps more concrete. First, let's start playing around in this ℝⁿ vector space.
A vector in ℝⁿ is just a collection of 𝑛 real numbers. The simplest vector is just 𝑛 zeros: [0, 0, …, 0]. This is the zero vector. If we multiply it by a scalar, we just get the zero vector back. Not interesting. For the next simplest vector, we can replace the first 0 with a 1. This leads to the vector 𝑒₁ = [1, 0, 0, …, 0]. Now, multiplying by a scalar 𝑐 gives us a different vector.
$$c \cdot [1, 0, 0, \dots, 0] = [c, 0, 0, \dots, 0]$$
We can "span" an infinite number of vectors with 𝑒₁ depending on the scalar 𝑐 we choose.
If 𝑒₁ is the vector with just the first element being 1 and the rest being 0, then what is 𝑒₂? The second element being 1 and the rest being 0 seems like a logical choice.
$$e_2 = [0, 1, 0, 0, \dots, 0]$$
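Taking this to its logical conclusion, we get a set of n vectors:
$$e_1 = [1, 0, 0, \dots, 0], \quad e_2 = [0, 1, 0, \dots, 0], \quad \dots, \quad e_n = [0, 0, 0, \dots, 1]$$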
These vectors form a basis of the vector space ℝⁿ. What does this mean? Any vector 𝑣 in ℝⁿ can be expressed as a linear combination of these 𝑛 vectors. Which means that for some scalars 𝑐₁, 𝑐₂, …, 𝑐ₙ:
$$v = c_1 e_1 + c_2 e_2 + \dots + c_n e_n$$
All vectors 𝑣 are "spanned" by the set of vectors 𝑒₁, 𝑒₂, …, 𝑒ₙ.
This particular collection of vectors isn't the only basis. Any set of 𝑛 vectors works, with one caveat: none of the 𝑛 vectors may be "spanned" by the rest. In other words, the 𝑛 vectors must be linearly independent. If we draw 𝑛 random numbers from most continuous distributions and repeat the process 𝑛 times to create 𝑛 vectors, we get a set of linearly independent vectors with 100% probability ("almost surely" in probability terms). It's just very, very unlikely that a random vector happens to be "spanned" by some other 𝑘 < 𝑛 random vectors.
Going back to our recipe at the start of this section for finding the determinant of a linear map, we now have a basis to express our vectors in. Fixing the basis also means our linear map can be expressed as a matrix (see section III of chapter 1). Since this linear map takes vectors from ℝⁿ back to ℝⁿ, the corresponding matrix is 𝑛 × 𝑛.
Next, we needed 𝑛 vectors to form our parallelepiped. Why not take the standard basis 𝑒₁, 𝑒₂, …, 𝑒ₙ we defined before? The measure of the patch of space contained between these vectors happens to be 1, by definition. The picture below for ℝ³ will hopefully make this clear.

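If we collect these vectors from the standard basis into a matrix (as rows or columns), we get the identity matrix (1's on the main diagonal, 0's everywhere else):
$$I = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}$$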
When we said we could apply our linear transform to any n-dimensional patch of space, we might as well apply it to this "standard" patch.
But it's easy to show that multiplying any matrix by the identity matrix results in the same matrix. So, the resulting vectors after the linear map is applied are the columns of the matrix representing the linear map itself. So, the amount by which the linear map changed the volume of the "standard patch" is the same as the n-dimensional measure of the parallelepiped between the column vectors of the matrix representing the map itself.
To recap, we started by motivating the determinant as the ratio by which a linear map changes the measure of an n-dimensional patch of space. And now, we have shown that this ratio is itself an n-dimensional measure: namely, the measure contained between the column vectors of any matrix representing the linear map.
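The step "the images of the standard basis vectors are the columns of the matrix" is easy to check directly. A small sketch with an arbitrary 3 × 3 matrix (the matrix and helper names are illustrative only):

```python
def apply(matrix, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

# Hypothetical 3x3 matrix representing some linear map.
A = [[2, 1, 0],
     [0, 3, 4],
     [5, 0, 1]]
n = len(A)

# Standard basis e_1, ..., e_n (the rows/columns of the identity matrix).
standard_basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]

# Where each basis vector lands under the map.
images = [apply(A, e) for e in standard_basis]

# The columns of A, read off directly.
columns = [[A[i][j] for i in range(n)] for j in range(n)]

print(images == columns)  # True
```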
III) Motivating the basic properties
We described in the previous section how the determinant of a linear map should simply be the measure contained between the vectors of any of its matrix representations. In this section, we use two-dimensional space (where measures are areas) to motivate some fundamental properties a determinant must have.
The first property is multi-linearity. A determinant is a function that takes a bunch of vectors (collected in a matrix) and maps them to a single scalar. Since we're restricting ourselves to two-dimensional space, we'll consider two vectors, both two-dimensional. Our determinant (since we've motivated it to be the area of the parallelogram between the vectors) can be expressed as:
$$\det = A(v_1, v_2)$$
How should this function behave if we add a vector to one of the two vectors? The multi-linearity property requires:
$$A(v_1 + v_3, v_2) = A(v_1, v_2) + A(v_3, v_2) \tag{1}$$
This is apparent from the moving picture below (note the new area getting added).

And this visualization can also be used to see (by scaling one of the vectors instead of adding another vector to it):
$$A(c \cdot v_1, v_2) = c \cdot A(v_1, v_2) \tag{2}$$
This second property has an important implication. What if we plug a negative 𝑐 into the equation?
The area 𝐴(𝑣₁, 𝑣₂) should then have the opposite sign to 𝐴(𝑐·𝑣₁, 𝑣₂).
Which means we need to introduce the notion of negative area and a negative determinant.
This makes a lot of sense if we're okay with the concept of negative lengths. If lengths (measures in 1-D space) can be positive or negative, then it stands to reason that areas (measures in 2-D space) should also be allowed to be negative. And so should measures in space of any dimensionality.
Together, equations (1) and (2) are the multi-linearity property.
Another important property, which has to do with the sign of the determinant, is the alternating property. It requires:
$$A(v_1, v_2) = -A(v_2, v_1)$$
Swapping the order of two vectors negates the sign of the determinant (or measure between them). If you learned about the cross product of 3-D vectors, this property will feel very natural. To motivate it, let's think first of the one-dimensional signed distance between two position vectors, 𝑑(𝑣₁, 𝑣₂). It's clear that 𝑑(𝑣₁, 𝑣₂) = −𝑑(𝑣₂, 𝑣₁), since when we go from 𝑣₂ to 𝑣₁, we're traveling in the opposite direction to when we go from 𝑣₁ to 𝑣₂. Similarly, if the area spanned between vectors 𝑣₁ and 𝑣₂ is positive, then that between 𝑣₂ and 𝑣₁ must be negative. This property holds in 𝑛-dimensional space as well. If in 𝐴(𝑣₁, 𝑣₂, …, 𝑣ₙ) we swap two of the vectors, the sign switches.
The alternating property also implies that if one of the vectors is simply a scalar multiple of the other, the determinant must be 0. To see this, first take the two vectors to be identical; swapping them leaves the input unchanged, yet must negate the determinant:
$$\begin{align}A(v_1, v_1) &= -A(v_1, v_1) \\
\implies 2 A(v_1, v_1) &= 0 \\
\implies A(v_1, v_1) &= 0\end{align}$$
Then, by multi-linearity (equation 2), the same holds for any scalar multiple:
$$A(v_1, c \cdot v_1) = c \cdot A(v_1, v_1) = 0$$
This makes sense geometrically, since if two vectors are parallel to each other, the area between them is 0.
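These properties can be checked numerically for one concrete candidate for the area function. The sketch below assumes the signed-area formula 𝐴(𝑣₁, 𝑣₂) = x₁y₂ − y₁x₂ for vectors in the plane (not derived in the text so far); the vectors and the scalar are arbitrary integers, so the comparisons are exact.

```python
def area(v1, v2):
    """Signed area of the parallelogram spanned by two 2-D vectors (assumed formula)."""
    return v1[0] * v2[1] - v1[1] * v2[0]

def add(v, w):
    return [a + b for a, b in zip(v, w)]

def scale(c, v):
    return [c * a for a in v]

v1, v2, v3 = [2, 1], [1, 3], [-1, 4]
c = -3

# Equation (1): additivity in the first argument.
additive = area(add(v1, v3), v2) == area(v1, v2) + area(v3, v2)

# Equation (2): scaling one vector scales the area, sign included.
homogeneous = area(scale(c, v1), v2) == c * area(v1, v2)

# Alternating property: swapping the arguments flips the sign.
alternating = area(v1, v2) == -area(v2, v1)

# Parallel vectors enclose zero area.
degenerate = area(v1, scale(c, v1)) == 0

print(additive, homogeneous, alternating, degenerate)  # True True True True
```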
The video [6] covers the geometric motivation of these properties with really good visualizations, and video [4] visualizes the alternating property quite well.
IV) Getting algebraic: Deriving the Leibniz formula
In this section, we move away from geometric intuition and approach the topic of determinants from an alternate route: that of cold, algebraic calculations.
See, the multi-linearity and alternating properties which we motivated in the last section with geometry are (remarkably) enough to give us a very specific algebraic formula for the determinant, called the Leibniz formula.
That formula helps us see properties of the determinant that would be really, really hard to observe from the geometric approach or with other algebraic formulas.
The Leibniz formula can then be reduced to the Laplace expansion, which involves going along a row or column and calculating cofactors, and which many people see in high school.
Let's derive the Leibniz formula. We need a function that takes the 𝑛 column vectors 𝛼₁, 𝛼₂, …, 𝛼ₙ of the matrix (written with arrows in the equations) as input and converts them into a scalar, 𝑐.
$$c = f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n})$$
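We can express each column vector in terms of the standard basis of the space. Writing 𝑎ᵢⱼ for the 𝑗-th component of the 𝑖-th column:
$$\vec{a_i} = a_{i1} e_1 + a_{i2} e_2 + \dots + a_{in} e_n = \sum\limits_{j=1}^n a_{ij} e_j$$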
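Now, we can apply the property of multi-linearity. For now, to the first column, 𝛼₁ (whose 𝑗-th component is 𝑎₁ⱼ):
$$f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n}) = f\Big(\sum\limits_{j=1}^n a_{1j} e_j, \vec{a_2}, \dots, \vec{a_n}\Big) = \sum\limits_{j=1}^n a_{1j} f(e_j, \vec{a_2}, \dots, \vec{a_n})$$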
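We can do the same for the second column. Let's take just the first term from the summation above and look at the resulting terms.
$$a_{11} f(e_1, \vec{a_2}, \dots, \vec{a_n}) = a_{11} a_{21} f(e_1, e_1, \vec{a_3}, \dots, \vec{a_n}) + a_{11} a_{22} f(e_1, e_2, \vec{a_3}, \dots, \vec{a_n}) + \dots + a_{11} a_{2n} f(e_1, e_n, \vec{a_3}, \dots, \vec{a_n})$$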
Note that in the first term, we get the vector 𝑒₁ appearing twice. And by the alternating property, the function 𝑓 for that term becomes 0.
In order for two 𝑒₁'s to appear, the second indices of the two 𝑎's in the product must both be 1.
So, once we do this for all the columns, the terms that won't become zero by the alternating property will be the ones where the second indices of the 𝑎's have no repetition, so they are all distinct numbers from 1 to 𝑛. In other words, we're looking for permutations of 1 to 𝑛 to appear in the second indices of the 𝑎's.
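What about the first indices of the 𝑎's? These are simply the numbers 1 to 𝑛 in order, since we pull out the 𝑎₁ₓ's first, then the 𝑎₂ₓ's, and so on. In more compact algebraic notation,
$$f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n}) = \sum\limits_{j_1=1}^n \sum\limits_{j_2=1}^n \dots \sum\limits_{j_n=1}^n \Big(\prod\limits_{i=1}^n a_{i, j_i}\Big) f(e_{j_1}, e_{j_2}, \dots, e_{j_n})$$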
In the expression on the right, the areas 𝑓(𝑒_{𝑗₁}, 𝑒_{𝑗₂}, …, 𝑒_{𝑗ₙ}) can only be +1, −1, or 0, since the 𝑒ⱼ's are all unit vectors orthogonal to each other. We already established that any term with a repeated 𝑒ⱼ becomes 0, leaving us with just permutations (no repetition). Among these permutations, we'll sometimes get +1 and sometimes −1.
The concept of permutations carries with it signs: +1 if the permutation can be reached from the identity ordering by an even number of swaps, −1 if it takes an odd number. Since each swap flips the sign of 𝑓 (the alternating property), the signs of the areas are equal to the signs of the permutations. If we denote by 𝑆ₙ the set of all permutations of [1, 2, …, 𝑛], then we get the Leibniz formula of the determinant:
$$\det([\vec{a_1}, \vec{a_2}, \dots, \vec{a_n}]) = |A| = \sum\limits_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod\limits_{i=1}^n a_{i,\sigma(i)} \tag{3}$$
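As a worked instance of the formula, consider the two smallest non-trivial cases. For 𝑛 = 2 there are only two permutations, the identity (sign +1) and the swap (sign −1), which gives the familiar expression:

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21}$$

For 𝑛 = 3 there are 3! = 6 permutations, three even and three odd:

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{13} a_{22} a_{31} - a_{12} a_{21} a_{33} - a_{11} a_{23} a_{32}$$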
This formula is also described in detail in the mathexchange post, [3]. And to make things concrete, here is some simple Python code that implements it (along with a test case).
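A minimal sketch of such an implementation, using only the standard library. The function names `permutation_sign` and `leibniz_det` and the test matrix are chosen here for illustration; the sign is computed by counting inversions, which is one of several equivalent ways to get the parity of a permutation.

```python
from itertools import permutations

def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one (inversion count)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def leibniz_det(matrix):
    """Determinant via equation (3): sum over all permutations of signed products."""
    n = len(matrix)
    total = 0
    for sigma in permutations(range(n)):
        term = permutation_sign(sigma)
        for i in range(n):
            term *= matrix[i][sigma[i]]  # the factor a_{i, sigma(i)}
        total += term
    return total

# Test case: a 3x3 matrix whose determinant is easy to check by hand.
A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(leibniz_det(A))  # 18
```

The loop visits all 𝑛! permutations, so this is only practical for very small matrices.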
One shouldn't actually use this formula to calculate the determinant of a matrix (unless it's just for fun or exposition). It works, but is comically inefficient given the sum over all permutations (there are 𝑛! of them, which is super-exponential).
However, many theoretical properties of the determinant become trivial to see with the Leibniz formula, whereas they would be very hard to decipher or prove if we started from another of its forms. For example:
- Proposition-1: With this formula it becomes apparent that a matrix and its transpose have the same determinant: |𝐴| = |𝐴ᵀ|. It's a simple consequence of the symmetry of the formula.
- Proposition-2: A very similar derivation to the above can be used to show that for two matrices 𝐴 and 𝐵, |𝐴𝐵| = |𝐴| ⋅ |𝐵|. See the mathexchange post, [7]. This is a very convenient property, since matrix multiplication comes up all the time in various decompositions of matrices, and reasoning about the determinants of those decompositions can be a powerful tool.
- Proposition-3: With the Leibniz formula, we can easily see that if the matrix is upper triangular or lower triangular (lower triangular means every element of the matrix above the diagonal is zero), the determinant is simply the product of the entries on the diagonal. This is because all permutations bar one, (𝑎₁₁ ⋅ 𝑎₂₂ ⋯ 𝑎ₙₙ) (the main diagonal), pick up at least one zero entry, which makes their terms in the summation 0.
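The three propositions can be spot-checked on small integer matrices. This is a numerical illustration rather than a proof; the matrices are arbitrary, and `det` below is a compact re-statement of the Leibniz formula so that the snippet stands on its own.

```python
from itertools import permutations

def det(m):
    """Leibniz-formula determinant (suitable for tiny matrices only)."""
    n = len(m)
    total = 0
    for sigma in permutations(range(n)):
        inversions = sum(sigma[i] > sigma[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][sigma[i]]
        total += term
    return total

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
B = [[1, 2, 0], [0, 1, 5], [3, 0, 1]]
U = [[2, 7, -1], [0, 3, 4], [0, 0, 5]]  # upper triangular

print(det(A) == det(transpose(A)))           # Proposition-1: True
print(det(matmul(A, B)) == det(A) * det(B))  # Proposition-2: True
print(det(U) == 2 * 3 * 5)                   # Proposition-3: True
```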

The third fact actually leads to the most efficient algorithm for calculating a determinant, the one most linear algebra libraries use. A matrix can be decomposed efficiently into lower and upper triangular matrices (called the LU decomposition, which we'll cover in the next chapter). After doing this decomposition, the third fact is used to multiply the diagonals of those lower and upper matrices to get their determinants. And finally, the second fact is used to multiply those two determinants and get the determinant of the original matrix.
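To give a feel for the idea before the LU chapter, here is a rough sketch that reduces the matrix to upper triangular form with Gaussian elimination (with partial pivoting) and multiplies the pivots. It is a simplified stand-in, not what a production library actually runs, and the name `det_by_elimination` is made up for this example. Each row swap flips the sign (the alternating property), and subtracting a multiple of one row from another leaves the determinant unchanged.

```python
def det_by_elimination(matrix):
    """Determinant via Gaussian elimination with partial pivoting, O(n^3)."""
    a = [[float(x) for x in row] for row in matrix]  # work on a float copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Choose the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # no usable pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]  # accumulate the diagonal of the triangular form
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(det_by_elimination(A))  # approximately 18, up to floating-point rounding
```

This takes on the order of 𝑛³ arithmetic operations instead of the 𝑛! terms of the Leibniz sum.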
A lot of people, when first exposed to the determinant in high school or college, learn about the Laplace expansion, which involves expanding along a row or column, finding cofactors for each element and summing. That can be derived from the above Leibniz expansion by collecting similar terms. See the answer in the mathexchange post, [2].
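For comparison, a short recursive sketch of the Laplace expansion along the first row. The function name `laplace_det` is illustrative, and expanding along the first row is just one choice; any row or column works.

```python
def laplace_det(m):
    """Determinant by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: delete the first row and the j-th column.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * laplace_det(minor)
    return total

A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(laplace_det(A))  # 18
```

Like the Leibniz sum, this still does factorially many multiplications, so it is for exposition only.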
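V) Historical motivation
The determinant was first discovered in the context of linear systems of equations. Say we have 𝑛 equations in 𝑛 variables (𝑥₁, 𝑥₂, …, 𝑥ₙ).
$$\begin{align}a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn} x_n &= b_n\end{align}$$
This system can be expressed in matrix form:
$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$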
And more compactly:
$$A x = b$$
An important question is whether or not the system above has a unique solution, 𝑥. And the determinant is a function that "determines" this. There is a unique solution if and only if the determinant of 𝐴 is non-zero.
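A small 2 × 2 illustration of this "determining" role. The sketch uses Cramer's rule, a technique not covered in this chapter, purely because it makes the division by the determinant explicit; the function name `solve_2x2` and the two example systems are made up.

```python
from fractions import Fraction

def solve_2x2(A, b):
    """Solve a 2x2 system with Cramer's rule; return None when det(A) == 0."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        return None  # either no solution or infinitely many, never exactly one
    x0 = Fraction(b[0] * A[1][1] - A[0][1] * b[1], det)
    x1 = Fraction(A[0][0] * b[1] - b[0] * A[1][0], det)
    return x0, x1

# Non-zero determinant: exactly one solution, (4/5, 7/5).
print(solve_2x2([[2, 1], [1, 3]], [3, 5]))

# Zero determinant (second row is twice the first): no unique solution.
print(solve_2x2([[1, 2], [2, 4]], [3, 6]))  # None
```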
This historically inspired approach motivates the determinant as a polynomial that arises when we try to solve a linear system of equations associated with the linear map. We will cover this in more depth in chapter 5.
For more on this, see the excellent answer in the mathexchange post, [8].
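VI) Proof of the property we motivated with
We started this chapter by motivating the determinant as the amount by which the ℝⁿ → ℝⁿ linear map changes the measure of an n-dimensional patch of space. We also mentioned that this doesn't work for 1, 2, …, n − 1 dimensional measures. Below is a proof of this, where we use some of the properties we encountered in the rest of the sections.
Let 𝐴 be the 𝑛 × 𝑛 matrix of the map, and define 𝑉 and 𝑈 as the 𝑛 × 𝑘 matrices whose columns are the vectors before and after the map:
$$ V = (v_1, v_2, \dots, v_k), \quad U = (u_1, u_2, \dots, u_k) = AV $$
By definition,
$$|v_1, v_2, \dots, v_k| = \sqrt{\det(V^t V)} $$ and
$$ |u_1, u_2, \dots, u_k| = \sqrt{\det(U^t U)} = \sqrt{\det((AV)^t (AV))} = \sqrt{\det(V^t A^t A V)} $$
Only when 𝑛 = 𝑘 is 𝑉 a square matrix, so that the determinant of the product splits into a product of determinants (Proposition-2). Using also det(𝐴ᵗ) = det(𝐴) (Proposition-1):
$$|u_1, u_2, \dots, u_k| = \sqrt{\det(V^t A^t A V)}$$
$$= \sqrt{\det(V^t) \det(A^t) \det(A) \det(V)} $$
$$= |\det(A)| \sqrt{\det(V^t V)} = |\det(A)| \, |v_1, v_2, \dots, v_k| $$
The absolute value appears because the measure defined through the square root is unsigned. When 𝑘 < 𝑛, 𝑉 is not square, det(𝑉) is undefined, and the factor involving 𝐴 cannot be pulled out, so the ratio depends on 𝑉.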
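The argument can be spot-checked numerically in ℝ³. In the sketch below (matrices chosen arbitrarily, helper names illustrative), `measure` implements the definition √det(𝑉ᵗ𝑉) from above. For 𝑘 = 𝑛 = 3 the ratio equals |det 𝐴| = 18 for both choices of 𝑉, while for 𝑘 = 2 the ratio changes with the vectors chosen.

```python
import math
from itertools import permutations

def det(m):
    """Leibniz-formula determinant (suitable for tiny matrices only)."""
    n = len(m)
    total = 0
    for sigma in permutations(range(n)):
        inversions = sum(sigma[i] > sigma[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][sigma[i]]
        total += term
    return total

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def measure(V):
    """k-dimensional measure of the parallelepiped spanned by the columns of V."""
    return math.sqrt(det(matmul(transpose(V), V)))

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]  # det(A) = 18

def ratio(V):
    """Measure after the map divided by measure before."""
    return measure(matmul(A, V)) / measure(V)

# k = n = 3: two different parallelepipeds, same ratio.
V1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
V2 = [[1, 2, 0], [0, 1, 5], [3, 0, 1]]
print(ratio(V1), ratio(V2))  # both approximately 18

# k = 2 < n: two different parallelograms in 3-D, different ratios.
W1 = [[1, 0], [0, 1], [0, 0]]  # spanned by e1, e2
W2 = [[0, 0], [1, 0], [0, 1]]  # spanned by e2, e3
print(ratio(W1), ratio(W2))    # roughly 6.63 and 10.49
```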
References
[1] Mathexchange post: Determinant of a linear map doesn't depend on the bases: https://math.stackexchange.com/questions/962382/determinant-of-linear-transformation
[2] Mathexchange post: Determinant of a matrix Laplace expansion (high school formula): https://math.stackexchange.com/a/4225580/155881
[3] Mathexchange post: Understanding Leibniz formula for determinants: https://math.stackexchange.com/questions/319321/understanding-the-leibniz-formula-for-determinants#:~:text=The%20formula%20says%20that%20det,permutation%20get%20a%20minus%20sign.&text=where%20the%20minus%20signs%20correspond%20to%20the%20odd%20permutations%20from%20above.
[4] Youtube video: 3B1B on determinants: https://www.youtube.com/watch?v=Ip3X9LOh2dk&t=295s
[5] Mathexchange post: Connecting Leibniz formula with geometry: https://math.stackexchange.com/questions/593222/leibniz-formula-and-determinants
[6] Youtube video: Leibniz formula is area: https://www.youtube.com/watch?v=9IswLDsEWFk
[7] Mathexchange post: product of determinants is determinant of product: https://math.stackexchange.com/questions/60284/how-to-show-that-detab-deta-detb
[8] Mathexchange post: Historical context for motivating determinant: https://math.stackexchange.com/a/4782557/155881

