Let us illustrate the diagrammatic expansion of integrals from Lecture 13 in a specific case: we will bring things full circle and look at “Stirling’s quantum field theory.” Going back to the very beginning, start with the Euler integral

$$N! = \int_0^\infty t^N e^{-t}\,\mathrm{d}t.$$

Now make the change of variables $t = Nx$ to get

$$N! = N^{N+1} \int_0^\infty e^{-N(x - \log x)}\,\mathrm{d}x,$$
and we’re in position to apply our diagrammatic expansion of the remaining integral.

The nice thing about the diagrammatic expansion is that most of it is “universal,” and it is not until the last step that we have to worry about what the smooth function in the exponent of the integrand actually is. In particular, from Lecture 13, we have that the coefficient of is the sum of the following three numbers

where are the three graphs shown at the end of Lecture 13, in the same order listed there. These are precisely the graphs of cyclomatic number in which all vertices have degree at most the exponent of is the number of vertices, and the monomials encode the sequence of vertex degrees. Now we just have to calculate the order of the automorphism groups of these graphs: we obtain

so that the coefficient of in the asymptotic expansion of the integral is

Note that this has absolutely no dependence on what the integral actually is, so far: this dependence only enters at the last stage, where we plug in the values of the -coefficients, which for the integral in Stirling’s QFT are all identically equal to So the first two terms in the sum above cancel, leaving us with the conclusion that

as Now to get the next order in the asymptotic expansion, the coefficient of you have to write out all graphs of cyclomatic number with vertices of degree at least calculate the order of their automorphism groups, and evaluate as above. I am going to come back later and do this for you, but in the meantime you may want to try it yourself. It’s pretty tedious, but instructive. Actually, the fact that it’s tedious is an important takeaway: you can simplify your life greatly if you are willing to settle for the asymptotic expansion of since then only connected graphs will enter into the calculation of the coefficients in the asymptotic expansion, and there are a lot fewer of those.
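As a quick numerical sanity check, the first "quantum correction" to Stirling's approximation is the familiar $1/(12N)$ term; a minimal Python sketch (the helper `stirling` is my own illustration, not a function from the lectures):

```python
import math

# Numerical sanity check: Stirling's approximation and its first
# correction term 1/(12N).
def stirling(N, corrected=False):
    approx = math.sqrt(2 * math.pi * N) * (N / math.e) ** N
    if corrected:
        approx *= 1 + 1 / (12 * N)
    return approx

N = 20
exact = math.factorial(N)
print(exact / stirling(N))        # about 1.004, an error of order 1/(12N)
print(exact / stirling(N, True))  # much closer to 1
```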

For a more involved version of the same procedure, where the “graphs” involved are of a quite different nature, you can check out my lecture yesterday in the UC Berkeley combinatorics seminar. Don’t worry if you don’t understand much; we are going to look at matrix integrals together very soon.

We continue with the material from Lecture 10. Let be a smooth function on of the form

where is itself a smooth function with Taylor expansion of the form

where are real numbers such that the corresponding path integral

converges. The function is called the “potential,” and we view the action as a deformation of which corresponds to choosing all so that the potential and we get the exact evaluation

Then, by the Laplace principle, we know that we have an asymptotic expansion

and we want to calculate the coefficients , which are functions of the parameters .

Taking this all back to Lecture 1, if we set and take the action

then we are looking at “Stirling’s QFT,” in which the potential has coefficients . The result of our computation should then be some sort of understanding of the subleading corrections in Stirling’s approximation,

which are the “quantum corrections” in Stirling’s QFT, whatever that means.

To begin, let’s not yet think about the integral , and focus solely on the integrand

Let us focus even more narrowly on the non-Gaussian part of the integrand, and apply the Exponential Formula as in Lecture 3. First, we write

where . Then we have that the coefficients in the expansion

are given by

where the sum is over all partitions of , and the product is over the blocks of the partition . Since is the number of ways to cyclically order the subset , we have that

where denotes the number of -cycles in the disjoint cycle decomposition of the permutation , and is the total number of cycles. Thus the polynomial is a generating function for the permutations in which keeps track of cycle type; this polynomial is called the “augmented cycle index polynomial” of . The actual (i.e. non-augmented) cycle index polynomial is , and it is the basic tool in Polya’s theory of enumeration under group action. Equivalently, we can write the modified cycle index as

where the sum is over Young diagrams with cells and is the conjugacy class of permutations of cycle type .
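For small symmetric groups the coefficients of the cycle index can be tallied by brute force; a sketch in Python (the helper `cycle_lengths` is my own scaffolding):

```python
from itertools import permutations
from collections import Counter

def cycle_lengths(perm):
    """Sorted cycle lengths of a permutation in one-line notation (0-indexed)."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

# Tally the permutations of a 4-element set by cycle type; the resulting
# multiplicities are the coefficients of the cycle index polynomial of S(4).
tally = Counter(cycle_lengths(p) for p in permutations(range(4)))
print(dict(tally))
# cycle types 1+1+1+1, 1+1+2, 2+2, 1+3, 4 occur 1, 6, 3, 8, 6 times
```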

Let’s integrate. We write

We know how to compute moments of the Gaussian distribution: we recall from Lecture 4 that

where is the number of fixed point free involutions in , i.e. permutations which factor into disjoint cycles of order (so is zero if is odd). We thus have the series

where , and the number is equal to the number of pairs in which is a fixed point free involution, and is a permutation in of cycle type . Such pairs of permutations are special objects, and they have a special name.

Definition 1: A pair of permutations such that is a fixed point free involution and all cycles of have length at least is called a combinatorial map.

The use of the word “map” in Definition 1 may seem strange, since you are probably more accustomed to seeing it appear in the following definition.

Definition 2: A topological map consists of a closed oriented surface together with a set of distinct points on and a set of curves on , each of which joins a point of to a point of in such a way that the curves in intersect only at points of , and moreover if and are deleted from then what remains are disjoint open sets, called “faces,” each of which is homeomorphic to a disc.

Topological maps are basic objects in graph theory: for example, the famous four color theorem asserts that in order to color the faces of a map on the sphere in such a way that faces which are bounded by a common curve have different colors, it is necessary and sufficient to have four colors. Indeed there is a clear connection between graphs and topological maps, in that starting from a graph we can try to embed it on a surface by choosing a set of points on in bijection with , and a set of curves on in bijection with , such that the requirements to be a topological map are satisfied (this may not be possible if the connectivity of is high and the genus of is low, and the minimal genus of such that this can be done is called the genus of ).

The connection between combinatorial maps and topological maps is not quite as obvious, but it is still fairly straightforward: the construction is simple and natural. First, draw every cycle of as a polygon, with the elements permuted by the cycle being taken as labeling the edges of the polygon, and the edges being directed counterclockwise around the polygon. Now, identify the edges of these polygons in pairs according to the cycles of the fixed point free involution , ensuring that when we identify two edges they are glued with opposite orientations, i.e. forming a two-way street. This procedure produces from the pair a closed oriented surface together with a collection of points on the surface connected by curves meeting only at those points — a topological map. The points are images of the former vertices of the polygons after the identification, the curves are images of the former edges, and the faces of the map are the interiors of the polygons.

As a small example of the above construction, let us consider combinatorial maps of degree , i.e. pairs such that is a product of -cycles and is a -cycle. There are thus possibilities for and possibilities for , for a total of combinatorial maps of degree .
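The discussion below works with three combinatorial maps sharing a fixed full cycle, which suggests the degree in play is 4 (there are exactly three fixed point free involutions of a 4-element set). Under that reading, the counts can be verified by enumeration; helper names are my own:

```python
from itertools import permutations

def is_fpf_involution(p):
    """True if p (one-line notation, 0-indexed) is a fixed point free involution."""
    return all(p[p[i]] == i and p[i] != i for i in range(len(p)))

def is_full_cycle(p):
    """True if p consists of a single cycle through all points."""
    j, steps = 0, 0
    while True:
        j = p[j]
        steps += 1
        if j == 0:
            break
    return steps == len(p)

perms = list(permutations(range(4)))
involutions = [p for p in perms if is_fpf_involution(p)]
full_cycles = [p for p in perms if is_full_cycle(p)]
print(len(involutions), len(full_cycles))   # 3 = (4-1)!! and 6 = 3!
print(len(involutions) * len(full_cycles))  # 18 pairs in total
```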

We consider those combinatorial maps as above with , the full forward cycle in . These are the following:

Now we depict the corresponding gluing construction as described above:

So, the three different combinatorial maps described above gave us only two different topological maps: the combinatorial maps and produced the same topological map. Thus, the correspondence between combinatorial maps and topological maps is many-to-one (as is the correspondence between graphs and topological maps). But the general picture that is emerging is that the asymptotic expansion of is a generating function for topological maps, with face degrees determined by the Maclaurin series of the potential . That is, topological maps are the Feynman diagrams of zero-dimensional scalar-valued quantum field theory. We will continue to develop this connection next week.

In this lecture we will actually start down the path to Feynman diagrams; how far down this path we will progress remains to be seen.

Feynman diagrams are combinatorial objects closely associated with an area of theoretical physics called quantum field theory, and so we will begin with a necessarily brief and sketchy discussion of what QFT is. The truth is that nobody knows what quantum field theory “is” in the sense of a mathematical definition, but I can at least try to give you a sense of the mathematical structures that are involved, and the sorts of computations that physicists perform.

Let be a manifold, representing a “universe” which we want to understand. We consider functions on as measurements on this universe, or “observables.” We must say what the outputs of these observations are: they may be numbers, or vectors, or operators, or something else. In other words, we want to specify a “target” for observations , and typically is some other manifold. So the space of observables is the function space . For example, one could take , a manifold which is supposed to model our physical experience of “spacetime,” and , meaning that our observables are numerical measurements on spacetime, which sounds reasonable and not at all problematic.

The next step is to define a “field,” which is a measure on , the space of all functions . Physicists would like to prescribe this measure by giving its density against the background flat Lebesgue measure, and this density is supposed to be of the form

where is called the “semiclassical parameter” and is a functional called the “action.” The total flatness and uniformity of the Lebesgue measure represents entropy, while the density above represents energy in the sense that it leads to measure concentrating near the minimizers of the action, much as we saw when we initially discussed the Laplace method, and as is familiar from the setting of statistical mechanics.

Once we have prescribed the spaces and and the action , we are looking at something which is more or less what physicists call a QFT, and we start asking questions about it. One of the main things that physicists would like to do is to evaluate the integral

which is called the “partition function” or “path integral.” It is not expected that this can be done exactly in most cases; when is such that an exact evaluation of the corresponding is possible, we are looking at what is called a “free theory.” The goal then is to select a new action which is a “perturbation” of , i.e. is close to in some sense. One then tries to estimate the new path integral corresponding to in the limit . What is supposed to happen is that in this so-called “semiclassical limit,” there emerges an approximation of the form

The terms are called “quantum terms,” and the sum of all these terms is typically a divergent series, i.e. it has radius of convergence zero. So what is meant by the above is that admits a complete asymptotic expansion: we have

as for any nonnegative integer . Now the question that physicists want to address is how to compute the quantum coefficients . The idea is that these coefficients should all be computable in terms of the “correlation functions” of the free theory, meaning integrals of the form

The computation of these correlation functions is typically very intricate, and Feynman diagrams are a graphical bookkeeping device introduced by Feynman in order to help organize such computations. The theory is very well developed from a physical perspective, and is in regular, constant use in contemporary theoretical physics.

The problem is that none of the above makes sense mathematically: even when one makes very reasonable, realistic choices like and the corresponding space of functions is infinite-dimensional, and hence does not admit a Lebesgue measure. This means that the definition of the field as a measure which is absolutely continuous with respect to Lebesgue measure makes no sense, and the path integral

This leaves us with two options. One is to try to make the above general construction rigorous by developing some kind of theory of functional integration that makes mathematical sense. This is an extremely interesting undertaking, but not one that we will discuss. The second option is to choose in such a way that is finite-dimensional. The only way this can happen is if one of them is the zero-dimensional manifold consisting of a single point. Choosing is not advisable, since then is also zero-dimensional, which leads to a trivial theory. However, if we choose then , and we have something non-trivial, namely a toy model aptly named zero-dimensional quantum field theory.

What we are going to do now is develop the zero-dimensional quantum field theory corresponding to the data

The path integral corresponding to this quantum field theory is

which is non-problematic provided we choose such that the integral converges. We have even previously identified an action which gives us a free theory: if we choose

the corresponding path integral evaluation is Lord Kelvin’s favorite formula:

$$\int_{-\infty}^{\infty} e^{-\frac{N}{2}x^2}\,\mathrm{d}x = \sqrt{\frac{2\pi}{N}}.$$
And the next part of the construction, the semiclassical limit of the path integral corresponding to a more general action , is just the classical Laplace method, which we now state formally.

Theorem 1: Let be a (possibly infinite) interval, and let be a smooth function that attains a global minimum at a unique point . Then

where is a smooth function on such that

Theorem 1 tells us that, for any nonnegative integer , we have

where the sum in the brackets is just the th Maclaurin polynomial and the error term follows from Taylor’s theorem (it is important to note that is smooth, but not necessarily analytic, i.e. its Maclaurin series may be divergent, and even if it is convergent it need not sum to ). So we are in a position where we have the perfectly well-defined problem of computing the “quantum corrections” . This will be our goal in the next several lectures, and we will see that the Feynman diagrams which accompany this problem are classical combinatorial objects, namely maps on surfaces.
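The leading-order statement of Theorem 1 can be checked numerically. Here I take the action of Stirling's QFT to be $S(x) = x - \log x$ on $(0,\infty)$ (an assumption consistent with the Euler-integral setup), which has its minimum at $x_0 = 1$ with $S(1) = 1$ and $S''(1) = 1$; the quadrature window and step count below are ad hoc choices:

```python
import math

# Leading Laplace asymptotics for S(x) = x - log(x): the integral of
# exp(-N*S(x)) over (0, inf) should behave like exp(-N)*sqrt(2*pi/N).
def laplace_integral(N, lo=1e-9, hi=20.0, steps=200_000):
    h = (hi - lo) / steps
    vals = [math.exp(-N * (x - math.log(x)))
            for x in (lo + i * h for i in range(steps + 1))]
    # trapezoid rule
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

N = 50
ratio = laplace_integral(N) / (math.exp(-N) * math.sqrt(2 * math.pi / N))
print(ratio)  # close to 1, with corrections of order 1/N
```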

In this lecture we continue our long march towards Stirling’s formula. Let me begin by saying that, while we seem to be spending an absurd amount of time on this, Stirling’s approximation is being used as a case study in something much more general, Laplace’s method, and we will understand the general thing almost automatically once we understand the specific thing.

We return to the integral representation

In Lecture 2, we performed some elementary estimates which showed that, for all sufficiently small we have

where is a positive number depending on but not on This quantifies the intuition that the bulk contribution to the integral comes from a small neighborhood of the minimum of the action.

We now come to the problem of estimating this bulk contribution

The action

is holomorphic on the open unit disc and its Maclaurin series is

this series converging absolutely and uniformly on compact subsets of to the correct value In Lecture 2, we observed that if we throw away all terms in the series expansion of beyond the quadratic term, our integral becomes the truncated Gaussian integral

which can then be replaced by a total Gaussian integral at the cost of introducing an exponentially small error,

this being beneficial because the total Gaussian integral can be exactly evaluated,

Thus, the remaining question is to quantify the error incurred by replacing

We now want to quantify the error incurred by throwing away all terms of the action beyond its dominant (quadratic) term. For small we naturally expect this error to be a small number which decays as increases, but this decay turns out to be much slower than exponential and represents the main source of error in the approximation scheme we’re building.

To get started, we write

Next, we Maclaurin-expand the non-Gaussian part of the integrand, writing

where is the th derivative of at This series converges uniformly on the relevant domain: indeed, it is the Maclaurin series of the function and we aren’t changing the location of singularities by subtracting an entire function. We can thus interchange the order of summation and integration, thereby obtaining

We have thus reduced the initial problem to two problems.

Problem 1: Compute the coefficients

Problem 2: Compute the integrals

We will solve Problem 1 today, and Problem 2 on Thursday (in case you’d like to think ahead).

Problem 1 is pure algebraic combinatorics, being a question about composition in the algebra of formal power series This question can be phrased in greater generality, as follows. Let

be elements of such that the constant term of is zero. Notice that we are writing these series as if their coefficients were the derivatives of analytic functions and at but we are not actually imposing any convergence conditions on them. Series of this form are often referred to as exponential generating functions. The composition

is well-defined. Let and expand

The problem is to understand the coefficients of in terms of the coefficients of and This problem was posed and solved by Hurwitz in the nineteenth century in connection with what is now a classical problem in enumerative geometry, namely the question of counting branched covers of the Riemann sphere with specified ramification data (I hope to touch on this problem more extensively later in the course).

Exercise 1: Compute the first few coefficients of by hand.

Exercise 2: Does it make sense to do this computation using the chain rule? Why or why not? Do you get correct formulas by this method?
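Exercise 1 can also be done mechanically: convert to ordinary power-series coefficients, compose truncated polynomials, and convert back. The function `compose_egf` below is my own sketch, not part of the course:

```python
from math import factorial

def compose_egf(f, g, n):
    """First n+1 coefficients h_k = (F∘G)^{(k)}(0) of a composition of
    exponential generating functions, given f[k] = F^{(k)}(0) and
    g[k] = G^{(k)}(0) with g[0] = 0."""
    assert g[0] == 0, "formal composition needs zero constant term"
    a = [f[k] / factorial(k) for k in range(n + 1)]  # ordinary coefficients
    b = [g[k] / factorial(k) for k in range(n + 1)]
    c = [0.0] * (n + 1)
    for coeff in reversed(a):                        # Horner: c = c*b + coeff
        new = [0.0] * (n + 1)
        for i in range(n + 1):
            for j in range(n + 1 - i):               # truncate at degree n
                new[i + j] += c[i] * b[j]
        new[0] += coeff
        c = new
    return [c[k] * factorial(k) for k in range(n + 1)]

# Sanity check: exp(log(1+x)) = 1 + x.  F = exp has all derivatives 1 at 0,
# and G = log(1+x) has derivatives 0, 1, -1, 2, -6 there.
print(compose_egf([1, 1, 1, 1, 1], [0, 1, -1, 2, -6], 4))
# approximately [1, 1, 0, 0, 0]
```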

Let’s consider first an easier problem, which will serve as a stepping stone. Consider two series

and let

be their product. It is then immediate that

This algebraic expression also has combinatorial meaning, which becomes clear if we sacrifice its elegance and rewrite it in the garish form

The enumerative meaning of this expression is the following. Suppose that is the number of combinatorial structures of Type A which can be built on and similarly is the number of combinatorial structures of Type B which can be built on Then, is the number of ways to build a hybrid combinatorial structure of Type AB on by choosing disjoint subsets and building a Type A structure on and a Type B structure on This can be iterated: the product of exponential generating functions is an exponential generating function whose coefficients count the number of ways to select a list of disjoint subsets of whose union is and then build a Type A_j structure on for each
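The binomial-convolution rule for products of exponential generating functions is easy to test; a small sketch (function and variable names are mine):

```python
from math import comb

# Binomial convolution: if C(x) = A(x) B(x) as exponential generating
# functions, then c_n = sum_k binom(n, k) a_k b_{n-k}.
def egf_product(a, b):
    return [sum(comb(n, k) * a[k] * b[n - k] for k in range(n + 1))
            for n in range(min(len(a), len(b)))]

# Example: a_n = 2^n (the EGF of e^{2x}) times b_n = 1 (the EGF of e^x)
# gives e^{3x}, whose coefficient sequence is 3^n.
print(egf_product([2 ** n for n in range(6)], [1] * 6))
# [1, 3, 9, 27, 81, 243]
```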

Exercise 3: Read this (easy), this (harder), or this (hardest).

We can now state the main theorem on composition of exponential generating functions. A partition of is a set of disjoint nonempty subsets of whose union is The elements of are called its blocks, and we denote by the number of blocks in Let denote the set of all partitions of

Theorem (The Compositional Formula): With notation as above, we have

Proof: Write

use the combinatorial expansion and reorganize the sum.

— Q.E.D.

Corollary (The Exponential Formula): If then

The combinatorial meaning of the Exponential Formula is the following: if is the exponential generating function for a class of “connected” combinatorial structures, then its exponential is the exponential generating function for disjoint assemblies of those structures. The Exponential Formula gets used all over the place — probabilists call it the moment-cumulant formula, while physicists often call it the linked cluster expansion and invoke it using colorful phrases like “connected vacuum bubbles exponentiate.” A basic example is the following: let be the number of simple graphs with vertex set let be the number of connected simple graphs with vertex set and form the generating functions

Then and the sequences and are related as in the Exponential Formula.
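The graph example can be verified by extracting connected counts from total counts; the recurrence below, $g_n = \sum_{k=1}^{n} \binom{n-1}{k-1} c_k\, g_{n-k}$, is the standard one implied by the Exponential Formula:

```python
from math import comb

def connected_counts(g):
    """Given g[n] = number of structures on an n-element set (g[0] = 1),
    recover c[n] = number of connected structures via the recurrence
    g_n = sum_{k=1}^{n} C(n-1, k-1) c_k g_{n-k}."""
    c = [0] * len(g)
    for n in range(1, len(g)):
        c[n] = g[n] - sum(comb(n - 1, k - 1) * c[k] * g[n - k]
                          for k in range(1, n))
    return c

# All simple graphs on n labeled vertices: 2^C(n,2).
g = [2 ** comb(n, 2) for n in range(6)]  # [1, 1, 2, 8, 64, 1024]
print(connected_counts(g))               # [0, 1, 1, 4, 38, 728]
```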

Now we return to Problem 1 above, which is to compute the coefficients in the expansion

We do this using the Exponential Formula: we have

where

The Exponential Formula now tells us that the coefficients we seek are given by

Thus, we have solved Problem 1: we see that

with the number of permutations of consisting of cycles, each of length at least
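Assuming the relevant constraint is that every cycle has length at least 3 (consistent with the first non-Gaussian term of the action being cubic, which is my reading of the elided exponent), these counts can be generated by brute force for small sets:

```python
from itertools import permutations
from collections import Counter

def cycle_lengths(perm):
    """Cycle lengths of a permutation in one-line notation (0-indexed)."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths

def long_cycle_counts(n, min_len=3):
    """Permutations of an n-set with all cycles of length >= min_len,
    tallied by number of cycles."""
    tally = Counter()
    for p in permutations(range(n)):
        lens = cycle_lengths(p)
        if all(l >= min_len for l in lens):
            tally[len(lens)] += 1
    return dict(tally)

print(long_cycle_counts(5))  # {1: 24}: only the 5-cycles qualify
print(long_cycle_counts(6))  # {1: 120, 2: 40}: 6-cycles, pairs of 3-cycles
```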

Exercise 4: Do the same computation instead using the Compositional Formula with

Let be an -dimensional vector space equipped with a scalar product Recall from Lecture 16 that an operator is said to be selfadjoint (or symmetric) if

Also recall from Lecture 18 that is said to be semisimple if there exists a basis of consisting of eigenvectors of The goal of this lecture is to prove the following cornerstone result in linear algebra.

Theorem 1 (Spectral Theorem for selfadjoint operators): If is selfadjoint, then it is semisimple.

The proof of this important theorem occupies the remainder of this lecture. It is a constructive argument that builds an eigenbasis for one vector at a time. A nice feature of the construction is that the eigenbasis it outputs is an orthonormal basis of

Let us begin with an important observation on a special subspecies of selfadjoint operators.

Definition 1: A selfadjoint operator is said to be nonnegative if the associated quadratic form is nonnegative, i.e. if the function defined by

satisfies for all

Any nonnegative selfadjoint operator has the property that membership in its kernel is certified by vanishing of

Lemma 1: If is a nonnegative selfadjoint operator, then if and only if

Proof: One direction of this equivalence is obvious: if then

The proof of the converse statement is similar to the proof of the Cauchy-Schwarz inequality. More precisely, suppose that and let be any number and let be an arbitrary vector. We have

Using the definition of together with the fact that is selfadjoint, this simplifies to

and since this further simplifies to

Now, as a function of the righthand side of this equation is a parabola, and since this parabola is upward-opening. Moreover, since the lefthand side satisfies the lowest point of this parabola cannot lie below the line and this forces

But the vector was chosen arbitrarily, so the above equation holds for any in particular We thus have

which means that i.e.

— Q.E.D.

Now, let be any selfadjoint operator. We are going to use the Lemma just established to prove that admits an eigenvector ; the argument even gives a description of the corresponding eigenvalue

Consider the unit sphere in the Euclidean space i.e. the set

of all vectors of length The quadratic form is a continuous function, and hence by the Extreme Value Theorem the minimum value of on the sphere,

does indeed exist, and is moreover achieved at some vector, i.e.

Theorem 2: The minimum of on the unit sphere is an eigenvalue of and the minimizer lies in the eigenspace

Proof: By definition of as the minimum value of we have that

Since for any the above inequality can be rewritten as

But actually, this implies that

since every vector in is a nonnegative scalar multiple of a vector of unit length (make sure you understand this). We thus have that

This says that the selfadjoint operator is nonnegative. Moreover, we have that

Thus, by Lemma 1, we have that meaning that

or equivalently

— Q.E.D.

Theorem 2 has established that an arbitrary selfadjoint operator has an eigenvector. However, this seems to be a long way from Theorem 1, which makes the much stronger assertion that has linearly independent eigenvectors. In fact, the distance from Theorem 2 to Theorem 1 is not so long as it may seem. To see why, we need to introduce one more very important concept.

Definition 2: Let be a linear operator, and let be a subspace of We say that is invariant under if

The meaning of this definition is that if is invariant under then may be considered as a linear operator on the smaller space i.e. as an element of the algebra

Let us adorn the eigenvalue/eigenvector pair produced by Theorem 2 with a subscript, writing this pair as Consider the orthogonal complement of the line spanned by i.e. the subspace of given by

Proposition 1: The subspace is invariant under

Proof: We have to prove that if is orthogonal to the eigenvector of then so is This follows easily from the fact that is selfadjoint:

— Q.E.D.

The effect of Proposition 1 is that we may consider as a selfadjoint operator defined on the -dimensional subspace But this means that we can simply apply Theorem 2 again, with replacing We will then get a new eigenvector/eigenvalue pair where

is the minimum value of on the unit sphere in the Euclidean space and is a vector at which the minimum is achieved,

By construction, is a unit vector orthogonal to so that in particular is a linearly independent set in Moreover, we have that since is a subset of
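The variational characterization in Theorem 2 can be checked numerically for a small symmetric matrix; a sketch with an arbitrary 2x2 example of my own, where a grid search over the unit circle stands in for the Extreme Value Theorem:

```python
import math

# Theorem 2 for a symmetric 2x2 matrix: the minimum of the quadratic
# form <v, Av> over the unit circle is the smallest eigenvalue.
A = [[2.0, 1.0],
     [1.0, 3.0]]

def quad_form(A, v):
    x, y = v
    return (A[0][0] * x + A[0][1] * y) * x + (A[1][0] * x + A[1][1] * y) * y

steps = 100_000
m = min(quad_form(A, (math.cos(2 * math.pi * t / steps),
                      math.sin(2 * math.pi * t / steps)))
        for t in range(steps))

# Eigenvalues of [[a, b], [b, d]] by the quadratic formula.
a, b, d = A[0][0], A[0][1], A[1][1]
disc = math.sqrt((a - d) ** 2 + 4 * b * b)
lam_min = (a + d - disc) / 2
print(m, lam_min)  # both approximately (5 - sqrt(5))/2 = 1.3819...
```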

Let be a vector space, and let us consider the algebra as a kind of ecosystem consisting of various life forms of varying complexity. We now move on to the portion of the course which is concerned with the taxonomy of linear operators — their classification and division into various particular classes.

The simplest organisms in the ecosystem are operators which act by scaling every vector by a fixed number ; these are the single-celled organisms of the operator ecosystem.

Definition 1: An operator is said to be simple if there exists a scalar such that


Simple operators really are very simple, in the sense that they are no more complicated than numbers. Indeed, Definition 1 is equivalent to saying that where is the identity operator, which plays the role of the number in the algebra meaning that it is the multiplicative identity in this algebra. Simple operators are extremely easy to manipulate algebraically: if then we have

for any nonnegative integer and more generally if is any polynomial in a single variable then we have

Exercise 1: Prove the above formula.

The formula even works in the case that is a negative integer, provided that equivalently, the simple operator is invertible if and only if its inverse being If and are simple operators, then they commute,

just like ordinary numbers, and more generally

for any polynomial in two variables.

Exercise 2: Prove the above formula.

Another way to appreciate how truly simple simple operators are is to look at their matrices. In order to do this, we have to restrict to the case that the vector space is finite-dimensional. If is -dimensional, and is any basis of then the matrix of relative to is simply

where the off-diagonal matrix elements are all equal to zero. For this reason, simple operators are often called diagonal operators.

Most operators in are not simple operators — they are complicated multicellular organisms. So, to understand them we have to dissect them and look at their organs one at a time. Mathematically, this means that, given an operator we look for special vectors in on which acts as if it was simple.

Definition 2: A nonzero vector is said to be an eigenvector of an operator if

for some The scalar is said to be an eigenvalue of

The best case scenario is that we can find a basis of entirely made up of eigenvectors of

Definition 3: An operator is said to be semisimple if there exists a basis of consisting of eigenvectors of Such a basis is called an eigenbasis for

As the name suggests, semisimple operators are pretty simple, but not quite as simple as simple operators. In particular, every simple operator is semisimple, because if is simple then every nonzero vector in is an eigenvector of and hence any basis in is an eigenbasis for The converse, however, is not true.

Let be an -dimensional vector space, and let be a semisimple operator. By definition, this means that there exists a basis in consisting of eigenvectors of This in turn means that there exist numbers such that

If then is simple, but if these numbers are not all the same then it is not. However, even if all these numbers are different, the matrix of relative to will still be a diagonal matrix, i.e. it will have the form

For this reason, semisimple operators are often called diagonalizable operators. Note the shift in terminology from “diagonal,” for simple, to “diagonalizable,” for semisimple. The former term suggests an immutable characteristic, independent of basis, whereas the latter indicates that some action must be taken, in that a special basis must be found to reveal diagonal form. More precisely, the matrix of a semisimple operator is not diagonal with respect to an arbitrary basis; the definition only says that the matrix of is diagonal relative to some basis.

Most linear operators are not semisimple — indeed, there are plenty of operators that have no eigenvectors at all. Consider the operator

which rotates a vector counterclockwise through the angle The matrix of this operator relative to the standard basis

of is

If then , so that is a simple operator: are eigenvectors, with eigenvalues If then and again is simple, with the same eigenvectors and eigenvalues However, taking any other value of for example rotation through a right angle, it is geometrically clear that is never a scalar multiple of so that has no eigenvectors at all. In particular, it is not semisimple.
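The dichotomy above is visible in the characteristic polynomial of the rotation matrix, which is $x^2 - 2\cos(\theta)x + 1$ with discriminant $4\cos^2\theta - 4 = -4\sin^2\theta$: a real eigenvalue exists exactly when this is nonnegative, i.e. when the angle is an integer multiple of $\pi$. A sketch:

```python
import math

# The rotation matrix [[cos t, -sin t], [sin t, cos t]] has characteristic
# polynomial x^2 - 2*cos(t)*x + 1; its discriminant is -4*sin(t)^2, so a
# real eigenvalue exists only when t is an integer multiple of pi.
def has_real_eigenvalue(theta):
    return 4 * math.cos(theta) ** 2 - 4 >= 0

print(has_real_eigenvalue(0.0))          # True: the identity
print(has_real_eigenvalue(math.pi))      # True: minus the identity
print(has_real_eigenvalue(math.pi / 2))  # False: no eigenvectors at all
```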

Let us now formulate necessary and sufficient conditions for an operator to be semisimple. In this endeavor it is psychologically helpful to reorganize the eigenvector/eigenvalue definition by thinking of eigenvalues as the primary objects, and eigenvectors as secondary objects associated to them.

Definition 4: The spectrum of an operator is the set defined by

For each the set defined by

is called the –eigenspace of The dimension of is called the geometric multiplicity of

In these terms, saying that is a simple operator means that the spectrum of consists of a single number,

and that the corresponding eigenspace exhausts

At the other extreme, the rotation operator considered above has empty spectrum,

and thus does not have any eigenspaces.

Proposition 1: For any , for each the eigenspace is a subspace of

Proof: First, observe that because

Second, is closed under scalar multiplication: if then

Third, is closed under vector addition: if then

— Q.E.D.

So, the eigenspaces of an operator constitute a collection of subspaces of indexed by the numbers A key feature of these subspaces is that they are independent of one another.

Theorem 1: Suppose that are distinct eigenvalues of an operator Let be nonzero vectors such that for each Then is a linearly independent set.

Proof: We prove this by induction on The base case is and in this case the assertion is simply that the set consisting of a single eigenvector of is linearly independent. This is true, since eigenvectors are nonzero by definition.

For the induction step, suppose that is a linearly dependent set. Then, there exist numbers not all equal to zero, such that

Let us suppose that Applying the operator to both sides of the above vector equation, we get

On the other hand, we can multiply the original vector equation by any scalar and it remains true; in particular, we have

Now, subtracting this third equation from the second equation, we obtain

By the induction hypothesis, is a linearly independent set, and hence all the coefficients in this vector equation are zero. In particular, we have

But this is impossible, since and Hence, the set cannot be linearly dependent — it must be linearly independent.

— Q.E.D.

Restricting to the case that is finite-dimensional, Theorem 1 has the following crucial consequences.

Corollary 1: $A \in \operatorname{End} V$ is semisimple if and only if

$$\sum_{\lambda \in \sigma(A)} \dim E_\lambda(A) = \dim V.$$

Proof: Suppose first that $A$ is semisimple. By definition, this means that the span of the eigenspaces of $A$ is all of $V$:

$$E_{\lambda_1}(A) + \dots + E_{\lambda_k}(A) = V,$$

where $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of $A$. Thus

$$\dim\left(E_{\lambda_1}(A) + \dots + E_{\lambda_k}(A)\right) = \dim V.$$

By Theorem 1, we have

$$\dim\left(E_{\lambda_1}(A) + \dots + E_{\lambda_k}(A)\right) = \sum_{i=1}^k \dim E_{\lambda_i}(A),$$

and hence

$$\sum_{i=1}^k \dim E_{\lambda_i}(A) = \dim V.$$

Conversely, suppose that the sum of the dimensions of the eigenspaces of $A$ is equal to the dimension of $V$. For each $1 \leq i \leq k$, let $B_i$ be a basis of the eigenspace $E_{\lambda_i}(A)$. Then, by Theorem 1, the set

$$B = B_1 \cup \dots \cup B_k$$

is a linearly independent set, and hence a basis of the subspace

$$E_{\lambda_1}(A) + \dots + E_{\lambda_k}(A)$$

of $V$. Thus

$$\dim\left(E_{\lambda_1}(A) + \dots + E_{\lambda_k}(A)\right) = \sum_{i=1}^k \dim E_{\lambda_i}(A).$$

Since by hypothesis we have

$$\sum_{i=1}^k \dim E_{\lambda_i}(A) = \dim V,$$

this implies that

$$\dim\left(E_{\lambda_1}(A) + \dots + E_{\lambda_k}(A)\right) = \dim V,$$

which in turn implies that

$$E_{\lambda_1}(A) + \dots + E_{\lambda_k}(A) = V.$$

Thus $B$ is a basis of $V$ consisting of eigenvectors of $A$, whence $A$ is semisimple.

— Q.E.D.
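Corollary 1 suggests a concrete numerical test for semisimplicity: compute the geometric multiplicity of each eigenvalue and compare the total with $\dim V$. A sketch with NumPy; the helper function and the example matrices below are our own illustrations, not taken from the lecture.

```python
import numpy as np

def geometric_multiplicity(A, lam):
    """dim E_lam(A), computed as the nullity of A - lam * I."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n))

# Semisimple example: geometric multiplicities add up to dim V = 3.
D = np.diag([2.0, 2.0, 7.0])
assert geometric_multiplicity(D, 2.0) + geometric_multiplicity(D, 7.0) == 3

# Non-semisimple example: a 2x2 Jordan block. Its only eigenvalue has
# geometric multiplicity 1 < 2 = dim V, so the criterion of Corollary 1 fails.
J = np.array([[3.0, 1.0],
              [0.0, 3.0]])
assert geometric_multiplicity(J, 3.0) == 1
```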

Corollary 2: If $|\sigma(A)| = \dim V$, then $A$ is semisimple.

Proof: To say that $|\sigma(A)| = \dim V$ is equivalent to saying that the spectrum of $A$ consists of $n = \dim V$ distinct numbers,

$$\sigma(A) = \{\lambda_1, \dots, \lambda_n\}.$$

Sampling a collection of nonzero vectors from each corresponding eigenspace,

$$v_1 \in E_{\lambda_1}(A), \dots, v_n \in E_{\lambda_n}(A),$$

we get a set $\{v_1, \dots, v_n\}$ of $n$ eigenvectors of $A$. By Theorem 1, this is a linearly independent set, hence it is a basis of $V$.

— Q.E.D.

In Lecture 13, we discussed matrix representations of linear transformations between finite-dimensional vector spaces. In this lecture, we consider linear transformations between finite-dimensional Euclidean spaces, and discuss the relationship between the scalar product and the matrix representation of linear transformations. Note that any vector space $V$ can be promoted to a Euclidean space by choosing a basis $E = \{e_1, \dots, e_n\}$ in $V$ and defining $\langle \cdot, \cdot \rangle$ to be the unique scalar product on $V$ such that $E$ is orthonormal.

Let $V$ and $W$ be Euclidean spaces; by abuse of notation, we will denote the scalar product in each of these spaces by the same symbol $\langle \cdot, \cdot \rangle$. Let $E = \{e_1, \dots, e_n\}$ be an orthonormal basis in $V$, and let $F = \{f_1, \dots, f_m\}$ be an orthonormal basis in $W$. Let $T \colon V \to W$ be a linear transformation.

Definition 1: The matrix elements of $T$ relative to the bases $E$ and $F$ are the scalar products

$$\langle f_i, Te_j \rangle, \quad 1 \leq i \leq m,\ 1 \leq j \leq n.$$

The reason the number $\langle f_i, Te_j \rangle$ is called a "matrix element" of $T$ is that it is exactly the $(i,j)$-element of the matrix of $T$ defined in Lecture 13. Indeed, if

$$Te_j = \sum_{k=1}^m a_{kj} f_k,$$

then

$$\langle f_i, Te_j \rangle = \left\langle f_i, \sum_{k=1}^m a_{kj} f_k \right\rangle = \sum_{k=1}^m a_{kj} \langle f_i, f_k \rangle = a_{ij},$$

where the last equality follows from the orthonormality of $F$. Note that it is not actually necessary to assume that $V$ and $W$ are finite-dimensional in order for the matrix elements of $T$ to be well-defined; however, we will always make this assumption. Thus, in more visual form, we have that

$$[T] = \begin{bmatrix} \langle f_1, Te_1 \rangle & \dots & \langle f_1, Te_n \rangle \\ \vdots & \ddots & \vdots \\ \langle f_m, Te_1 \rangle & \dots & \langle f_m, Te_n \rangle \end{bmatrix}.$$
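In coordinates, this is easy to check with NumPy: taking the standard bases of $\mathbb{R}^3$ and $\mathbb{R}^2$, which are orthonormal for the standard scalar product, the matrix elements $\langle f_i, Te_j \rangle$ reproduce the matrix of $T$ entry by entry. The particular matrix below is an arbitrary illustration of ours.

```python
import numpy as np

# An arbitrary linear map T : R^3 -> R^2, written as a matrix.
T = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
E = np.eye(3)  # standard orthonormal basis e_1, e_2, e_3 of R^3
F = np.eye(2)  # standard orthonormal basis f_1, f_2 of R^2
# Matrix elements <f_i, T e_j>, assembled into a matrix...
M = np.array([[F[i] @ (T @ E[j]) for j in range(3)] for i in range(2)])
# ...recover the matrix of T exactly.
assert np.allclose(M, T)
```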

The connection between matrices and scalar products is often very useful for performing computations which would be much more annoying without the use of scalar products. A good example is change of basis for linear operators. The setup here is that $W = V$, so that $E = \{e_1, \dots, e_n\}$ and $F = \{f_1, \dots, f_n\}$ are two (possibly) different orthonormal bases of the same Euclidean space $V$. Given an operator $A \in \operatorname{End} V$, we would like to understand the relationship between the two matrices

$$[\langle e_i, Ae_j \rangle] \quad \text{and} \quad [\langle f_i, Af_j \rangle],$$

which represent the operator $A$ relative to the bases $E$ and $F$, respectively. In order to do this, let us consider the linear operator $U \in \operatorname{End} V$ uniquely defined by the equations

$$Ue_j = f_j, \quad 1 \leq j \leq n.$$

Why do these equations uniquely determine $U$? Because, for any $v = x_1 e_1 + \dots + x_n e_n \in V$, we have

$$Uv = x_1 Ue_1 + \dots + x_n Ue_n = x_1 f_1 + \dots + x_n f_n.$$

Let us observe that the operator $U$ we have defined is an automorphism of $V$, i.e. it has an inverse. Indeed, it is clear that the linear operator uniquely determined by the equations

$$U^{-1} f_j = e_j, \quad 1 \leq j \leq n,$$

is the inverse of $U$. Operators which transform orthonormal bases into orthonormal bases have a special name.

Definition 2: An operator $U \in \operatorname{End} V$ is said to be an orthogonal operator if it preserves orthonormal bases: for any orthonormal basis $\{e_1, \dots, e_n\}$ in $V$, the set $\{Ue_1, \dots, Ue_n\}$ is again an orthonormal basis in $V$.

Note that every orthogonal operator is invertible, since we can always define $U^{-1}$ just as we did above. In particular, the operators $U$ we defined above by $Ue_j = f_j$ are orthogonal operators.

Proposition 1: An operator $U \in \operatorname{End} V$ is orthogonal if and only if

$$\langle Uv, Uw \rangle = \langle v, w \rangle \quad \text{for all } v, w \in V.$$

Proof: Observe that, by linearity of $U$ and bilinearity of $\langle \cdot, \cdot \rangle$, it is sufficient to prove the claim in the case that $v = e_i$ and $w = e_j$ for some $1 \leq i, j \leq n$, where $E = \{e_1, \dots, e_n\}$ is an orthonormal basis of $V$.

Suppose that $U$ is an orthogonal operator. Then $\{Ue_1, \dots, Ue_n\}$ is an orthonormal basis of $V$, and consequently we have

$$\langle Ue_i, Ue_j \rangle = \delta_{ij} = \langle e_i, e_j \rangle.$$

Conversely, suppose that

$$\langle Ue_i, Ue_j \rangle = \langle e_i, e_j \rangle = \delta_{ij}.$$

We then have that $\{Ue_1, \dots, Ue_n\}$ is a set of $n$ pairwise orthogonal unit vectors, so that it is an orthonormal basis of $V$, and thus $U$ is an orthogonal operator.

— Q.E.D.
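Numerically, orthogonal operators are easy to manufacture: the QR factorization of a random matrix produces an orthogonal matrix $Q$. The sketch below (an illustration of ours, not from the lecture) checks both characterizations: $Q^{\mathsf T} Q = I$, and preservation of scalar products as in Proposition 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# QR factorization of a random matrix gives an orthogonal matrix Q.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
# Q sends the standard orthonormal basis to an orthonormal basis...
assert np.allclose(Q.T @ Q, np.eye(4))
# ...and, as in Proposition 1, preserves all scalar products.
v = rng.standard_normal(4)
w = rng.standard_normal(4)
assert np.isclose((Q @ v) @ (Q @ w), v @ w)
```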

Proposition 2: An operator $U \in \operatorname{End} V$ is orthogonal if and only if it is invertible and

$$\langle U^{-1} v, w \rangle = \langle v, Uw \rangle \quad \text{for all } v, w \in V.$$

Proof: Suppose first that $U$ is orthogonal. Then, $U$ is invertible and $U^{-1}$ is also orthogonal, and hence for any $v, w \in V$ we have

$$\langle U^{-1} v, w \rangle = \langle U U^{-1} v, Uw \rangle = \langle v, Uw \rangle.$$

Conversely, suppose that $U$ is invertible and

$$\langle U^{-1} v, w \rangle = \langle v, Uw \rangle \quad \text{for all } v, w \in V.$$

Then, for any $v, w \in V$ we have

$$\langle Uv, Uw \rangle = \langle U^{-1} U v, w \rangle = \langle v, w \rangle,$$

whence $U$ is orthogonal by Proposition 1.

— Q.E.D.

Now let us return to the problem that we were working on prior to our digression into the generalities of orthogonal operators, namely that of computing the relationship between the matrices $[\langle e_i, Ae_j \rangle]$ and $[\langle f_i, Af_j \rangle]$. We have

$$\langle f_i, Af_j \rangle = \langle Ue_i, AUe_j \rangle = \langle e_i, U^{-1}AUe_j \rangle,$$

where we used Proposition 2 to obtain the second equality. Thus, we have the matrix equation

$$[\langle f_i, Af_j \rangle] = [\langle e_i, U^{-1}AUe_j \rangle] = [\langle e_i, U^{-1}e_j \rangle]\,[\langle e_i, Ae_j \rangle]\,[\langle e_i, Ue_j \rangle],$$

where on the right hand side we are using the fact that

$$A \mapsto [\langle e_i, Ae_j \rangle]$$

is an algebra isomorphism, as in Lecture 13, which means that the matrix representing a product of operators is the product of the matrices representing each operator individually. This relationship between $[\langle e_i, Ae_j \rangle]$ and $[\langle f_i, Af_j \rangle]$ is usually phrased as the statement that the matrix representing the operator $A$ in the "new" basis $F$ is obtained from the matrix representing $A$ in the "old" basis $E$ by "conjugating" it by the matrix $[\langle e_i, Ue_j \rangle]$, where $U$ is the orthogonal operator that transforms the old basis into the new basis.
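The conjugation formula can be verified numerically. In the sketch below (our own illustration), the new basis vectors $f_j = Ue_j$ are the columns of an orthogonal matrix $U$, and the matrix of $A$ assembled from the matrix elements $\langle f_i, Af_j \rangle$ agrees with $U^{-1} [A]_E U$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))                    # [A]_E, the old-basis matrix
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal: f_j = U e_j
# Matrix of A in the new basis, directly from the matrix elements
# <f_i, A f_j>, where f_j is the j-th column of U.
A_new = np.array([[U[:, i] @ A @ U[:, j] for j in range(n)] for i in range(n)])
# It agrees with conjugation by U: U^{-1} [A]_E U.
assert np.allclose(A_new, np.linalg.inv(U) @ A @ U)
```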

We met linear transformations for the first time back in Lecture 2, and have encountered them a handful of times in subsequent lectures. In this lecture, we begin a systematic study of linear transformations that will take up the rest of the course. As with the study of any new species, there will be a lot of classifying: we will divide up linear transformations into various classes defined by certain common features which to a large extent determine their behavior. Before classifying linear transformations into various special species and families, let us consider the ecosystem of all linear transformations as a collective whole.

Let $V$ and $W$ be vector spaces, and let $\operatorname{Hom}(V, W)$ be the set of all linear transformations $T \colon V \to W$. This seemingly strange notation stems from the fact that a fancy alternative name for linear transformations is homomorphisms, which roughly means "same shape" in Greek.

Theorem 1: $\operatorname{Hom}(V, W)$ is a vector space.

Proof: In order to promote $\operatorname{Hom}(V, W)$ from simply a set to a vector space, we need to give a rule for adding and scaling linear transformations. These rules are simple and natural. If $S, T \in \operatorname{Hom}(V, W)$ are linear transformations and $a, b$ are scalars, we define a new linear transformation $aS + bT$ by

$$(aS + bT)(v) = aS(v) + bT(v), \quad v \in V.$$

One must check first that the function from $V$ to $W$ so defined satisfies the linear transformation axioms, and then that the set $\operatorname{Hom}(V, W)$ equipped with these notions of addition and scalar multiplication satisfies the vector space axioms. This is left as an exercise to the reader (it is a long but easy exercise which you should do at least once).

— Q.E.D.

Now let us start asking questions about the vectors in $\operatorname{Hom}(V, W)$, i.e. considering what properties a linear transformation from $V$ to $W$ may or may not have. Given $T \in \operatorname{Hom}(V, W)$, one of the first things we would like to know is whether it is injective and/or surjective. These questions reduce to questions about certain subspaces of $V$ and $W$ associated to $T$.

Definition 1: The kernel of $T \in \operatorname{Hom}(V, W)$ is the subset of $V$ defined by

$$\ker T = \{v \in V \colon Tv = 0_W\},$$

and the image of $T$ is the subset of $W$ defined by

$$\operatorname{im} T = \{w \in W \colon w = Tv \text{ for some } v \in V\}.$$

You can think of the kernel and image of $T$ as the solution sets of two different equations involving $T$. First, the kernel of $T$ is the set of solutions of the equation

$$Tv = 0_W,$$

where $v$ is an unknown vector in $V$. Second, the image of $T$ is the set of all $w \in W$ such that the equation

$$Tv = w$$

has a solution.
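For a concrete example, take the map $T(x, y, z) = (x + z, y + z)$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ (an illustration of ours). A short NumPy check confirms that its kernel is the line spanned by $(1, 1, -1)$ and that its image is all of $\mathbb{R}^2$:

```python
import numpy as np

# The map T(x, y, z) = (x + z, y + z) as a 2x3 matrix.
T = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
# Its kernel is the line spanned by k = (1, 1, -1): T k = 0.
k = np.array([1.0, 1.0, -1.0])
assert np.allclose(T @ k, 0.0)
# Its image is all of R^2: the rank is 2, and the nullity is dim V - rank = 1.
rank = np.linalg.matrix_rank(T)
assert rank == 2 and 3 - rank == 1
```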

Proposition 1: The kernel of $T$ is a subspace of $V$, and the image of $T$ is a subspace of $W$.

Proof: Since $T$ is a linear transformation, it is by definition the case that $T0_V = 0_W$, so that $0_V \in \ker T$. Now let $v_1, v_2$ be any two vectors in the kernel of $T$, and let $a_1, a_2$ be any two scalars. We then have

$$T(a_1 v_1 + a_2 v_2) = a_1 T v_1 + a_2 T v_2 = a_1 0_W + a_2 0_W = 0_W,$$

which shows that $\ker T$ is closed under taking linear combinations. Hence, $\ker T$ is a subspace of $V$.

Now consider the image of $T$. Since $T$ is a linear transformation, it is by definition the case that $T0_V = 0_W$, so that $0_W \in \operatorname{im} T$. Now let $w_1, w_2$ be any two vectors in the image of $T$. Then, by definition of $\operatorname{im} T$, there exist vectors $v_1, v_2 \in V$ such that $Tv_1 = w_1$ and $Tv_2 = w_2$. So, for any scalars $a_1, a_2$ we have that

$$a_1 w_1 + a_2 w_2 = a_1 T v_1 + a_2 T v_2 = T(a_1 v_1 + a_2 v_2),$$

which shows that $\operatorname{im} T$ is closed under taking linear combinations.

— Q.E.D.

Proposition 2: The linear transformation $T$ is injective if and only if $\ker T = \{0_V\}$, and surjective if and only if $\operatorname{im} T = W$.

Proof: First we consider the relationship between the injectivity of $T$ and its kernel. Suppose that $\ker T = \{0_V\}$. Let $v_1, v_2 \in V$ be vectors such that $Tv_1 = Tv_2$. This is equivalent to

$$T(v_1 - v_2) = 0_W,$$

which says that $v_1 - v_2 \in \ker T$. Since the only vector in $\ker T$ is $0_V$, this forces $v_1 - v_2 = 0_V$, which means that $v_1 = v_2$, and we conclude that $T$ is injective. Conversely, suppose we know that $T$ is injective. Let $v_1, v_2$ be any two vectors in the kernel of $T$. Then $Tv_1 = 0_W = Tv_2$. Since $T$ is injective, this forces $v_1 = v_2$, which says that any two vectors in $\ker T$ are equal to one another. But this means that any vector in $\ker T$ is equal to $0_V$, and we conclude that $\ker T = \{0_V\}$.

Now let us consider the relationship between the surjectivity of $T$ and its image. Suppose that $\operatorname{im} T = W$. This is exactly what it means for the function $T$ to be surjective: every element in the codomain of $T$ is in fact in the range of $T$. Conversely, suppose we know that $T$ is surjective. By definition of surjectivity, this means that $\operatorname{im} T = W$.

— Q.E.D.

We can use the above propositions to gain some insight into linear equations, meaning equations of the form

$$Tv = w,$$

where $T$ is a given linear transformation from $V$ to $W$, $w$ is a given vector in $W$, and $v$ is an unknown vector in $V$.

Theorem 2: The solution set of the above linear equation is $v_0 + \ker T$, i.e. the set of all vectors in $V$ of the form $v_0 + k$, where $v_0$ is any particular vector such that $Tv_0 = w$, and $k \in \ker T$.

Proof: We begin by considering the homogeneous equation associated to the linear equation we want to solve, which is

$$Tv = 0_W.$$

By definition, the solution set of this homogeneous equation is $\ker T$. Now, if $v_0$ is any vector such that $Tv_0 = w$, then for any $k \in \ker T$ we also have

$$T(v_0 + k) = Tv_0 + Tk = w + 0_W = w,$$

which shows that all vectors in $v_0 + \ker T$ are solutions of the equation $Tv = w$.

Conversely, suppose that $v$ is a vector in $V$ such that $Tv = w$. Then, we have

$$T(v - v_0) = Tv - Tv_0 = w - w = 0_W,$$

which shows that $v - v_0$ is a vector in $\ker T$. That is, $v - v_0 = k$ for some $k \in \ker T$, or equivalently $v = v_0 + k$. This shows that all solutions of $Tv = w$ belong to the set $v_0 + \ker T$.

— Q.E.D.
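The coset description of the solution set is easy to test numerically. In the sketch below (an illustration of ours, using the map $T(x, y, z) = (x + z, y + z)$), every vector $v_0 + tk$ with $k$ spanning $\ker T$ solves $Tv = w$:

```python
import numpy as np

# An illustrative map T(x, y, z) = (x + z, y + z) and a target vector w.
T = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
w = np.array([2.0, 3.0])
v0 = np.array([2.0, 3.0, 0.0])    # one particular solution: T v0 = w
k = np.array([1.0, 1.0, -1.0])    # spans ker T: T k = 0
assert np.allclose(T @ v0, w)
assert np.allclose(T @ k, 0.0)
# Every vector of the coset v0 + ker T is again a solution.
for t in (-2.0, 0.5, 10.0):
    assert np.allclose(T @ (v0 + t * k), w)
```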

We can now make the following statements about solutions of the linear equation $Tv = w$:

The equation has a solution if and only if $w \in \operatorname{im} T$.

If the equation has a solution, this solution is unique if and only if $\ker T = \{0_V\}$.

If the equation has a solution but the solution is not unique, then the equation has infinitely many solutions.

Exercise 1: Arrange the above information into a tree diagram which shows the relationship between the various cases.

In Lecture 12, we will think in more detail about the structure of the vector space $\operatorname{Hom}(V, W)$ in the case that $V$ and $W$ are finite-dimensional. In the finite-dimensional case, we shall see that everything we might want to know can be expressed in the language of matrices.