Let us illustrate the diagrammatic expansion of integrals from Lecture 13 in a specific case: we will bring things full circle and look at “Stirling’s quantum field theory.” Going back to the very beginning, start with the Euler integral

$$N! = \int_0^\infty x^N e^{-x}\,dx = N\int_0^\infty x^{N-1}e^{-x}\,dx.$$

Now make the change of variables $x = Ne^w$ to get

$$N! = N^{N+1}e^{-N}\int_{-\infty}^{\infty} e^{-N(e^w - w - 1)}\,dw,$$
and we’re in position to apply our diagrammatic expansion of the remaining integral.
The nice thing about the diagrammatic expansion is that most of it is “universal,” and it is not until the last step that we have to worry about what the smooth function in the exponent of the integrand actually is. In particular, from Lecture 13, we have that the coefficient of $N^{-1}$ is the sum of the following three numbers

$$\frac{t_3^2}{|\mathrm{Aut}(\Gamma_1)|},\qquad \frac{-t_4}{|\mathrm{Aut}(\Gamma_2)|},\qquad \frac{t_3^2}{|\mathrm{Aut}(\Gamma_3)|},$$

where $\Gamma_1, \Gamma_2, \Gamma_3$ are the three graphs shown at the end of Lecture 13, in the same order listed there. These are precisely the connected graphs of cyclomatic number $2$ in which all vertices have degree at least $3$; the total degree of the monomial in the $t$-variables is the number of vertices, and the monomials encode the sequence of vertex degrees. Now we just have to calculate the order of the automorphism groups of these graphs: we obtain

$$|\mathrm{Aut}(\Gamma_1)| = 8, \qquad |\mathrm{Aut}(\Gamma_2)| = 8, \qquad |\mathrm{Aut}(\Gamma_3)| = 12,$$

so that the coefficient of $N^{-1}$ in the asymptotic expansion of the integral is

$$\frac{t_3^2}{8} - \frac{t_4}{8} + \frac{t_3^2}{12}.$$
Note that this has absolutely no dependence on what the integral actually is, so far: this dependence only enters at the last stage, where we plug in the values of the $t$-coefficients, which for the integral in Stirling’s QFT (where the action is $e^w - w - 1 = \frac{w^2}{2} + \sum_{k \geq 3}\frac{w^k}{k!}$) are all identically equal to $1$. So the first two terms in the sum above cancel, leaving us with the conclusion that

$$N! = \sqrt{2\pi N}\left(\frac{N}{e}\right)^N\left(1 + \frac{1}{12N} + O\!\left(\frac{1}{N^2}\right)\right)$$
as $N \to \infty$. Now to get the next order in the asymptotic expansion, the coefficient of $N^{-2}$, you have to write out all graphs of cyclomatic number $3$ in which all vertices have degree at least $3$, calculate the order of their automorphism groups, and evaluate as above (the classical answer you should recover is $\frac{1}{288}$). I am going to come back later and do this for you, but in the meantime you may want to try it yourself. It’s pretty tedious, but instructive. Actually, the fact that it’s tedious is an important takeaway: you can simplify your life greatly if you are willing to settle for the asymptotic expansion of the logarithm of the integral, since then only connected graphs will enter into the calculation of the coefficients in the asymptotic expansion, and there are far fewer of those.
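Since everything above is concrete, it can be sanity-checked numerically. Here is a minimal Python sketch (helper name and test values my own) comparing $N!$ with the Stirling approximation, with and without the $\frac{1}{12N}$ correction; the correction improves the relative error by roughly a factor of $N$.

```python
import math

def stirling(N, corrected=False):
    """Leading Stirling approximation, optionally with the 1/(12N) correction."""
    base = math.sqrt(2 * math.pi * N) * (N / math.e) ** N
    return base * (1 + 1 / (12 * N)) if corrected else base

for N in (5, 10, 20):
    exact = math.factorial(N)
    for corrected in (False, True):
        rel_err = abs(stirling(N, corrected) - exact) / exact
        print(f"N={N:2d} corrected={corrected!s:5s} relative error={rel_err:.2e}")
```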
For a more involved version of the same procedure, where the “graphs” involved are of a quite different nature, you can check out my lecture yesterday in the UC Berkeley combinatorics seminar. Don’t worry if you don’t understand much; we are going to look at matrix integrals together very soon.
We continue with the material from Lecture 10. Let $S$ be a smooth function on $\mathbb{R}$ of the form

$$S(x) = \frac{x^2}{2} + V(x),$$

where $V$ is itself a smooth function with Taylor expansion of the form

$$V(x) = \sum_{k=3}^{\infty} t_k \frac{x^k}{k},$$

where $t_3, t_4, t_5, \dots$ are real numbers such that the corresponding path integral

$$\mathcal{Z}(N) = \int_{-\infty}^{\infty} e^{-NS(x)}\,dx$$

converges. The function $V$ is called the “potential,” and we view the action $S$ as a deformation of the Gaussian action $\frac{x^2}{2}$, which corresponds to choosing all $t_k = 0$, so that the potential vanishes and we get the exact evaluation

$$\mathcal{Z}(N) = \int_{-\infty}^{\infty} e^{-N\frac{x^2}{2}}\,dx = \sqrt{\frac{2\pi}{N}}.$$
Then, by the Laplace principle, we know that we have an asymptotic expansion

$$\mathcal{Z}(N) \sim \sqrt{\frac{2\pi}{N}}\left(1 + \frac{a_1}{N} + \frac{a_2}{N^2} + \frac{a_3}{N^3} + \cdots\right),$$

and we want to calculate the coefficients $a_1, a_2, a_3, \dots$, which are functions of the parameters $t_3, t_4, t_5, \dots$.
Taking this all back to Lecture 1, if we set $x = N(1+w)$ in the Euler integral and take the action

$$S(w) = w - \log(1+w) = \frac{w^2}{2} + \sum_{k=3}^{\infty} (-1)^k \frac{w^k}{k},$$

then we are looking at “Stirling’s QFT,” in which the potential has coefficients $t_k = (-1)^k$. The result of our computation should then be some sort of understanding of the subleading corrections in Stirling’s approximation,

$$N! \sim \sqrt{2\pi N}\left(\frac{N}{e}\right)^N\left(1 + \frac{a_1}{N} + \frac{a_2}{N^2} + \cdots\right),$$
which are the “quantum corrections” in Stirling’s QFT, whatever that means.
To begin, let’s not yet think about the integral $\mathcal{Z}(N)$, and focus solely on the integrand

$$e^{-NS(x)} = e^{-N\frac{x^2}{2}}\,e^{-NV(x)}.$$

Let us focus even more narrowly on the non-Gaussian part $e^{-NV(x)}$ of the integrand, and apply the Exponential Formula as in Lecture 3. First, we write

$$-NV(x) = \sum_{k=3}^{\infty} (-Nt_k)(k-1)!\,\frac{x^k}{k!},$$

where we have used $\frac{x^k}{k} = (k-1)!\,\frac{x^k}{k!}$.
Then we have that the coefficients in the expansion

$$e^{-NV(x)} = \sum_{n=0}^{\infty} c_n(N)\,\frac{x^n}{n!}$$

are given by

$$c_n(N) = \sum_{\pi \in \mathrm{P}(n)} \prod_{B \in \pi} (-Nt_{|B|})\,(|B|-1)!,$$

where the sum is over all partitions of $\{1, \dots, n\}$, the product is over the blocks of the partition $\pi$, and $t_1 = t_2 = 0$ is understood. Since $(|B|-1)!$ is the number of ways to cyclically order the subset $B$, we have that

$$c_n(N) = \sum_{\varphi \in \mathrm{S}(n)} \prod_{k=1}^{n} (-Nt_k)^{c_k(\varphi)},$$
where $c_k(\varphi)$ denotes the number of $k$-cycles in the disjoint cycle decomposition of the permutation $\varphi$, and $c_1(\varphi) + c_2(\varphi) + \cdots + c_n(\varphi)$ is the total number of cycles. Thus the polynomial

$$\sum_{\varphi \in \mathrm{S}(n)} \prod_{k=1}^{n} t_k^{c_k(\varphi)}$$

is a generating function for the permutations in $\mathrm{S}(n)$ which keeps track of cycle type; this polynomial is called the “augmented cycle index polynomial” of $\mathrm{S}(n)$. The actual (i.e. non-augmented) cycle index polynomial is

$$\frac{1}{n!}\sum_{\varphi \in \mathrm{S}(n)} \prod_{k=1}^{n} t_k^{c_k(\varphi)},$$

and it is the basic tool in Polya’s theory of enumeration under group action. Equivalently, we can write the modified cycle index as

$$\sum_{\lambda \vdash n} |C_\lambda| \prod_{i} t_{\lambda_i},$$

where the sum is over Young diagrams $\lambda$ with $n$ cells and $C_\lambda$ is the conjugacy class of permutations of cycle type $\lambda$.
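As a quick sanity check on this combinatorics, here is a brute-force Python sketch (function names my own) that computes the augmented cycle index of $\mathrm{S}(n)$ by listing all permutations and tallying cycle types; for $n = 3$ it reports the polynomial $t_1^3 + 3t_1t_2 + 2t_3$, as it should.

```python
from itertools import permutations
from collections import Counter

def cycle_type(perm):
    """Sorted tuple of cycle lengths of a permutation of {0,...,n-1}."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        j, length = start, 0
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))

def augmented_cycle_index(n):
    """Counter: cycle type -> number of permutations in S(n) with that type."""
    return Counter(cycle_type(p) for p in permutations(range(n)))

print(augmented_cycle_index(3))  # Counter({(1, 2): 3, (3,): 2, (1, 1, 1): 1})
```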
Let’s integrate. We write

$$\mathcal{Z}(N) = \int_{-\infty}^{\infty} e^{-N\frac{x^2}{2}}\,e^{-NV(x)}\,dx = \sum_{n=0}^{\infty} \frac{c_n(N)}{n!} \int_{-\infty}^{\infty} x^n e^{-N\frac{x^2}{2}}\,dx.$$

We know how to compute moments of the Gaussian distribution: we recall from Lecture 4 that

$$\sqrt{\frac{N}{2\pi}}\int_{-\infty}^{\infty} x^n e^{-N\frac{x^2}{2}}\,dx = \frac{(n-1)!!}{N^{n/2}}$$

for $n$ even, where $(n-1)!!$ is the number of fixed point free involutions in $\mathrm{S}(n)$, i.e. permutations which factor into disjoint cycles of order $2$ (and the moment is zero if $n$ is odd).
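Here is a short numerical check of the moment formula (a sketch of my own, with $N = 1$): the even moments of the standard Gaussian are the double factorials $1, 3, 15, 105, \dots$, i.e. the counts of fixed point free involutions.

```python
import numpy as np

def gaussian_moment(n):
    """Approximate the n-th moment of the standard Gaussian by quadrature."""
    x, dx = np.linspace(-12, 12, 2_000_001, retstep=True)
    density = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    return float(np.sum(x**n * density) * dx)

def double_factorial(m):
    return 1 if m <= 0 else m * double_factorial(m - 2)

for n in range(0, 9):
    predicted = double_factorial(n - 1) if n % 2 == 0 else 0
    print(n, round(gaussian_moment(n), 6), predicted)
```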
We thus have the series

$$\mathcal{Z}(N) \sim \sqrt{\frac{2\pi}{N}}\,\sum_{n=0}^{\infty} \frac{1}{n!} \sum_{\lambda \vdash n} T_\lambda\, \frac{\prod_i (-Nt_{\lambda_i})}{N^{n/2}},$$

where $\lambda = (\lambda_1, \lambda_2, \dots)$ runs over Young diagrams with $n$ cells, all of whose rows have length at least $3$, and the number $T_\lambda$ is equal to the number of pairs $(\alpha, \varphi)$ in which $\alpha$ is a fixed point free involution, and $\varphi$ is a permutation in $\mathrm{S}(n)$ of cycle type $\lambda$. Such pairs of permutations are special objects, and they have a special name.
Definition 1: A pair of permutations $(\alpha, \varphi) \in \mathrm{S}(n) \times \mathrm{S}(n)$ such that $\alpha$ is a fixed point free involution and all cycles of $\varphi$ have length at least $3$ is called a combinatorial map.
The use of the word “map” in Definition 1 may seem strange, since you are probably more accustomed to seeing it appear in the following definition.
Definition 2: A topological map consists of a closed oriented surface $X$, together with a finite set $P$ of distinct points on $X$ and a finite set $C$ of curves on $X$, each of which joins a point of $P$ to a point of $P$, in such a way that the curves in $C$ intersect only at points of $P$, and moreover if $P$ and $C$ are deleted from $X$ then what remains is a disjoint union of open sets, called “faces,” each of which is homeomorphic to a disc.
Topological maps are basic objects in graph theory: for example, the famous four color theorem asserts that in order to color the faces of a map on the sphere in such a way that faces which are bounded by a common curve have different colors, it is necessary and sufficient to have four colors. Indeed, there is a clear connection between graphs and topological maps, in that starting from a graph $G$ we can try to embed it on a surface $X$ by choosing a set of points on $X$ in bijection with the vertices of $G$, and a set of curves on $X$ in bijection with the edges of $G$, such that the requirements to be a topological map are satisfied (this may not be possible if the connectivity of $G$ is high and the genus of $X$ is low, and the minimal genus of $X$ such that this can be done is called the genus of $G$).
The connection between combinatorial maps and topological maps is not quite as obvious, but the construction relating them is simple and natural. First, draw every cycle of $\varphi$ as a polygon, with the elements permuted by the cycle being taken as labeling the edges of the polygon, and the edges being directed counterclockwise around the polygon. Now, identify the edges of these polygons in pairs according to the cycles of the fixed point free involution $\alpha$, ensuring that when we identify two edges they are glued with opposite orientations, i.e. forming a two-way street. This procedure produces from the pair $(\alpha, \varphi)$ a closed oriented surface together with a collection of points on the surface connected by curves meeting only at those points — a topological map. The points are images of the former vertices of the polygons after having been identified, the curves are images of the former edges of the polygons after having been identified, and the faces of the map are the interiors of the polygons.
As a small example of the above construction, let us consider combinatorial maps of degree $4$, i.e. pairs $(\alpha, \varphi) \in \mathrm{S}(4) \times \mathrm{S}(4)$ such that $\alpha$ is a product of two disjoint $2$-cycles and $\varphi$ is a $4$-cycle. There are thus $3$ possibilities for $\alpha$ and $6$ possibilities for $\varphi$, for a total of $18$ combinatorial maps of degree $4$.
We consider those combinatorial maps as above with $\varphi = (1\,2\,3\,4)$, the full forward cycle in $\mathrm{S}(4)$. These are the following:

$$\alpha_1 = (1\,2)(3\,4), \qquad \alpha_2 = (1\,3)(2\,4), \qquad \alpha_3 = (1\,4)(2\,3).$$
Now we depict the corresponding gluing construction as described above:
So, the three different combinatorial maps described above gave us only two different topological maps: the combinatorial maps and produced the same topological map. Thus, the correspondence between combinatorial maps and topological maps is many-to-one (as is the correspondence between graphs and topological maps). But the general picture that is emerging is that the asymptotic expansion of is a generating function for topological maps, with face degrees determined by the Maclaurin series of the potential . That is, topological maps are the Feynman diagrams of zero-dimensional scalar-valued quantum field theory. We will continue to develop this connection next week.
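The gluings can be checked without drawing pictures. Under the usual dictionary for combinatorial maps, faces correspond to cycles of $\varphi$, edges to cycles of $\alpha$, and vertices to cycles of the product $\varphi\alpha$ (the composition convention doesn’t matter here, since $\varphi\alpha$ and $\alpha\varphi$ are conjugate), so the Euler characteristic $V - E + F$ identifies the surface. A small Python sketch of this computation, with labels shifted to $\{0,1,2,3\}$:

```python
def num_cycles(perm):
    """Number of cycles of a permutation of {0,...,n-1} in one-line notation."""
    seen, count = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

phi = [1, 2, 3, 0]  # the full forward cycle (1 2 3 4)
alphas = {"(12)(34)": [1, 0, 3, 2], "(13)(24)": [2, 3, 0, 1], "(14)(23)": [3, 2, 1, 0]}

for name, alpha in alphas.items():
    sigma = [phi[alpha[i]] for i in range(4)]       # the product phi∘alpha: vertices
    V, E, F = num_cycles(sigma), num_cycles(alpha), num_cycles(phi)
    chi = V - E + F
    print(name, "Euler characteristic:", chi, "genus:", (2 - chi) // 2)
# (12)(34) and (14)(23) give chi = 2 (sphere); (13)(24) gives chi = 0 (torus).
```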
In this lecture we will actually start down the path to Feynman diagrams; how far down this path we will progress remains to be seen.
Feynman diagrams are combinatorial objects closely associated with an area of theoretical physics called quantum field theory, and so we will begin with a necessarily brief and sketchy discussion of what QFT is. The truth is that nobody knows what quantum field theory “is” in the sense of a mathematical definition, but I can at least try to give you a sense of the mathematical structures that are involved, and the sorts of computations that physicists perform.
Let $M$ be a manifold, representing a “universe” which we want to understand. We consider functions on $M$ as measurements on this universe, or “observables.” We must say what the outputs of these observations are: they may be numbers, or vectors, or operators, or something else. In other words, we want to specify a “target” $T$ for observations $f \colon M \to T$, and typically $T$ is some other manifold. So the space of observables is the function space $\mathcal{F} = \{f \colon M \to T\}$. For example, one could take $M = \mathbb{R}^4$, a manifold which is supposed to model our physical experience of “spacetime,” and $T = \mathbb{R}$, meaning that our observables are numerical measurements on spacetime, which sounds reasonable and not at all problematic.
The next step is to define a “field,” which is a measure on $\mathcal{F}$, the space of all functions $f \colon M \to T$. Physicists would like to prescribe this measure by giving its density against the background flat Lebesgue measure, and this density is supposed to be of the form

$$e^{-\frac{1}{\hbar}S(f)},$$

where $\hbar > 0$ is called the “semiclassical parameter” and $S \colon \mathcal{F} \to \mathbb{R}$ is a functional called the “action.” The total flatness and uniformity of the Lebesgue measure represents entropy, while the density above represents energy, in the sense that it leads to measure concentrating near the minimizers of the action, much as we saw when we initially discussed the Laplace method, and as is familiar from the setting of statistical mechanics.
Once we have prescribed the spaces $M$ and $T$ and the action $S$, we are looking at something which is more or less what physicists call a QFT, and we start asking questions about it. One of the main things that physicists would like to do is to evaluate the integral

$$\mathcal{Z} = \int_{\mathcal{F}} e^{-\frac{1}{\hbar}S(f)}\,\mathcal{D}f,$$

which is called the “partition function” or “path integral.” It is not expected that this can be done exactly in most cases; when $S$ is such that an exact evaluation of the corresponding $\mathcal{Z}$ is possible, we are looking at what is called a “free theory.” The goal then is to select a new action $S'$ which is a “perturbation” of $S$, i.e. is close to $S$ in some sense. One then tries to estimate the new path integral $\mathcal{Z}'$ corresponding to $S'$ in the limit $\hbar \to 0$. What is supposed to happen is that in this so-called “semiclassical limit,” there emerges an approximation of the form

$$\mathcal{Z}' \approx C(\hbar)\left(1 + a_1\hbar + a_2\hbar^2 + a_3\hbar^3 + \cdots\right),$$

with $C(\hbar)$ an explicitly computable prefactor.
The terms $a_k\hbar^k$ are called “quantum terms,” and the sum of all these terms is typically a divergent series, i.e. it has radius of convergence zero. So what is meant by the above is that $\mathcal{Z}'$ admits a complete asymptotic expansion: we have

$$\mathcal{Z}' = C(\hbar)\left(1 + a_1\hbar + a_2\hbar^2 + \cdots + a_n\hbar^n + O(\hbar^{n+1})\right)$$

as $\hbar \to 0$, for any nonnegative integer $n$. Now the question physicists want to address is how to compute the quantum coefficients $a_k$. The idea is that these coefficients should all be computable in terms of the “correlation functions” of the free theory, meaning integrals of the form

$$\langle F \rangle = \frac{1}{\mathcal{Z}}\int_{\mathcal{F}} F(f)\, e^{-\frac{1}{\hbar}S(f)}\,\mathcal{D}f$$

for suitable functionals $F$ on the space of fields.
The computation of these correlation functions is typically very intricate, and Feynman diagrams are a graphical bookkeeping device introduced by Feynman in order to help organize such computations. The theory is very well developed from a physical perspective, and is in regular, constant use in contemporary theoretical physics.
The problem is that none of the above makes sense mathematically: even when one makes very reasonable, realistic choices like $M = \mathbb{R}^4$ and $T = \mathbb{R}$, the corresponding space of functions $\mathcal{F}$ is infinite-dimensional, and hence does not admit a Lebesgue measure. This means that the definition of the field as a measure which is absolutely continuous with respect to Lebesgue measure makes no sense, and the path integral

$$\int_{\mathcal{F}} e^{-\frac{1}{\hbar}S(f)}\,\mathcal{D}f$$

is devoid of meaning.
This leaves us with two options. One is to try to make the above general construction rigorous by developing some kind of theory of functional integration that makes mathematical sense. This is an extremely interesting undertaking, but not one that we will discuss. The second option is to choose $M$ and $T$ in such a way that $\mathcal{F}$ is finite-dimensional. The only way this can happen is if one of $M$, $T$ is the zero-dimensional manifold consisting of a single point. Choosing $T = \{\mathrm{pt}\}$ is not advisable, since then $\mathcal{F}$ is also zero-dimensional, which leads to a trivial theory. However, if we choose $M = \{\mathrm{pt}\}$ then $\mathcal{F} = T$, and we have something non-trivial, namely a toy model aptly named zero-dimensional quantum field theory.
What we are going to do now is develop the zero-dimensional quantum field theory corresponding to the data

$$M = \{\mathrm{pt}\}, \qquad T = \mathbb{R}, \qquad \hbar = \frac{1}{N}.$$

The path integral corresponding to this quantum field theory is

$$\mathcal{Z}(N) = \int_{-\infty}^{\infty} e^{-NS(x)}\,dx,$$

which is non-problematic provided we choose the action $S$ such that the integral converges. We have even identified previously an action which gives us a free theory: if we choose

$$S(x) = \frac{x^2}{2},$$

the corresponding path integral evaluation is Lord Kelvin’s favorite formula:

$$\int_{-\infty}^{\infty} e^{-N\frac{x^2}{2}}\,dx = \sqrt{\frac{2\pi}{N}}.$$
And the next part of the construction, the semiclassical limit of the path integral corresponding to a more general action $S$, is just the classical Laplace method, which we now state formally.
Theorem 1: Let $I \subseteq \mathbb{R}$ be a (possibly infinite) interval, and let $S$ be a smooth function on $I$ that attains a global minimum at a unique point $x_0 \in I$, with $S''(x_0) > 0$. Then

$$\int_I e^{-NS(x)}\,dx = e^{-NS(x_0)}\sqrt{\frac{2\pi}{N S''(x_0)}}\; F\!\left(\frac{1}{N}\right),$$

where $F$ is a smooth function on a neighborhood of $0$ such that $F(0) = 1$.
Theorem 1 tells us that, for any nonnegative integer $n$, we have

$$\int_I e^{-NS(x)}\,dx = e^{-NS(x_0)}\sqrt{\frac{2\pi}{N S''(x_0)}}\left(1 + \frac{a_1}{N} + \cdots + \frac{a_n}{N^n} + O\!\left(\frac{1}{N^{n+1}}\right)\right),$$

where the sum in the brackets is just the $n$th Maclaurin polynomial of $F$, with $a_k = \frac{F^{(k)}(0)}{k!}$, and the error term follows from Taylor’s theorem (it is important to note that $F$ is smooth, but not necessarily analytic, i.e. its Maclaurin series may be divergent, and even if it is convergent it need not sum to $F$). So we are in a position where we have the perfectly well-defined problem of computing the “quantum corrections” $a_k$. This will be our goal in the next several lectures, and we will see that the Feynman diagrams which accompany this problem are classical combinatorial objects, namely maps on surfaces.
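Theorem 1 is easy to probe numerically. Here is a small Python sketch (the test action is my own choice, not one from the lectures) that compares $\int e^{-NS}$ with the leading Laplace prediction; the quantity $N(\text{ratio} - 1)$ should settle toward the first quantum correction $a_1$, which for this particular action works out to $-\frac{3}{4}$.

```python
import numpy as np

def laplace_ratio(S, x0, d2S, N, half_width=3.0, grid=600_001):
    """Ratio of the integral of e^{-N(S - S(x0))} to the leading Laplace term."""
    x, dx = np.linspace(x0 - half_width, x0 + half_width, grid, retstep=True)
    integral = np.sum(np.exp(-N * (S(x) - S(x0)))) * dx
    return integral / np.sqrt(2 * np.pi / (N * d2S))

# Test action (my choice): S(x) = x^2/2 + x^4/4, minimized at x0 = 0, S''(0) = 1.
S = lambda x: x**2 / 2 + x**4 / 4
for N in (10, 100, 1000):
    r = laplace_ratio(S, x0=0.0, d2S=1.0, N=N)
    print(N, r, N * (r - 1))   # N*(r - 1) tends to a_1 = -3/4 for this action
```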
In this lecture we continue our long march towards Stirling’s formula. Let me begin by saying that, while we seem to be spending an absurd amount of time on this, Stirling’s approximation is being used as a case study in something much more general, Laplace’s method, and we will understand the general thing almost automatically once we understand the specific thing.
We return to the integral representation

$$N! = N^{N+1}e^{-N}\int_{-1}^{\infty} e^{-N(w - \log(1+w))}\,dw.$$

In Lecture 2, we performed some elementary estimates which showed that, for all sufficiently small $\varepsilon > 0$, we have

$$\int_{-1}^{\infty} e^{-N(w - \log(1+w))}\,dw = \int_{-\varepsilon}^{\varepsilon} e^{-N(w - \log(1+w))}\,dw + O(e^{-c(\varepsilon)N}),$$

where $c(\varepsilon)$ is a positive number depending on $\varepsilon$ but not on $N$. This quantifies the intuition that the bulk contribution to the integral comes from a small neighborhood of the minimum of the action.
We now come to the problem of estimating this bulk contribution

$$\int_{-\varepsilon}^{\varepsilon} e^{-NS(w)}\,dw, \qquad S(w) = w - \log(1+w) = \sum_{k=2}^{\infty} (-1)^k \frac{w^k}{k},$$

this series converging absolutely and uniformly on compact subsets of $(-1,1)$ to the correct value $S(w)$. In Lecture 2, we observed that if we throw away all terms in the series expansion of $S(w)$ beyond the quadratic term, our $w$-integral becomes the truncated Gaussian integral

$$\int_{-\varepsilon}^{\varepsilon} e^{-N\frac{w^2}{2}}\,dw,$$

which can then be replaced by a total Gaussian integral at the cost of introducing an exponentially small error,

$$\int_{-\varepsilon}^{\varepsilon} e^{-N\frac{w^2}{2}}\,dw = \int_{-\infty}^{\infty} e^{-N\frac{w^2}{2}}\,dw + O(e^{-\frac{\varepsilon^2}{2}N}),$$

this being beneficial because the total Gaussian integral can be exactly evaluated,

$$\int_{-\infty}^{\infty} e^{-N\frac{w^2}{2}}\,dw = \sqrt{\frac{2\pi}{N}}.$$
Thus, the remaining question is to quantify the error incurred by throwing away all terms of the action beyond its dominant (quadratic) term. For small $\varepsilon$ we naturally expect this error to be a small number which decays as $N$ increases, but this decay turns out to be much slower than exponential, and it represents the main source of error in the approximation scheme we’re building.
To get started, we write

$$e^{-NS(w)} = e^{-N\frac{w^2}{2}}\, e^{-NV(w)}, \qquad V(w) = S(w) - \frac{w^2}{2} = \sum_{k=3}^{\infty} (-1)^k \frac{w^k}{k}.$$

Next, we Maclaurin-expand the non-Gaussian part of the integrand, writing

$$e^{-NV(w)} = \sum_{k=0}^{\infty} c_k(N)\,\frac{w^k}{k!},$$

where $c_k(N)$ is the $k$th derivative of $e^{-NV(w)}$ at $w = 0$. This series converges uniformly on $[-\varepsilon, \varepsilon]$: indeed, it is the Maclaurin series of the function $e^{-N(S(w) - \frac{w^2}{2})}$, whose nearest singularity sits at $w = -1$, and we aren’t changing the location of singularities by subtracting an entire function. We can thus interchange the order of summation and integration, thereby obtaining

$$\int_{-\varepsilon}^{\varepsilon} e^{-NS(w)}\,dw = \sum_{k=0}^{\infty} \frac{c_k(N)}{k!} \int_{-\varepsilon}^{\varepsilon} w^k e^{-N\frac{w^2}{2}}\,dw.$$
We have thus reduced the initial problem to two problems.
Problem 1: Compute the coefficients $c_k(N)$.
Problem 2: Compute the integrals

$$\int_{-\varepsilon}^{\varepsilon} w^k e^{-N\frac{w^2}{2}}\,dw.$$
We will solve Problem 1 today, and Problem 2 on Thursday (in case you’d like to think ahead).
Problem 1 is pure algebraic combinatorics, being a question about composition in the algebra $\mathbb{R}[[w]]$ of formal power series. This question can be phrased in greater generality, as follows. Let

$$A(w) = \sum_{k=0}^{\infty} a_k \frac{w^k}{k!}, \qquad B(w) = \sum_{k=1}^{\infty} b_k \frac{w^k}{k!}$$

be elements of $\mathbb{R}[[w]]$ such that the constant term of $B$ is zero. Notice that we are writing these series as if their coefficients were the derivatives of analytic functions $A$ and $B$ at $w = 0$, but we are not actually imposing any convergence conditions on them. Series of this form are often referred to as exponential generating functions. The composition

$$C(w) = A(B(w))$$

is well-defined, and we expand it as

$$C(w) = \sum_{k=0}^{\infty} c_k \frac{w^k}{k!}.$$
The problem is to understand the coefficients $c_k$ of $C$ in terms of the coefficients of $A$ and $B$. This problem was posed and solved by Hurwitz in the nineteenth century in connection with what is now a classical problem in enumerative geometry, namely the question of counting branched covers of the Riemann sphere with specified ramification data (I hope to touch on this problem more extensively later in the course).
Exercise 1: Compute the first few coefficients of $C$ by hand.
Exercise 2: Does it make sense to do this computation using the chain rule? Why or why not? Do you get correct formulas by this method?
Let’s consider first an easier problem, which will serve as a stepping stone. Consider two series

$$A(w) = \sum_{k=0}^{\infty} a_k \frac{w^k}{k!}, \qquad B(w) = \sum_{k=0}^{\infty} b_k \frac{w^k}{k!},$$

and let

$$C(w) = A(w)B(w)$$

be their product. It is then immediate that

$$c_n = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}.$$

This algebraic expression also has combinatorial meaning, which becomes clear if we sacrifice its elegance and rewrite it in the garish form

$$c_n = \sum_{S \subseteq \{1, \dots, n\}} a_{|S|}\, b_{n-|S|}.$$

The enumerative meaning of this expression is the following. Suppose that $a_k$ is the number of combinatorial structures of Type A which can be built on a $k$-element set, and similarly $b_k$ is the number of combinatorial structures of Type B which can be built on a $k$-element set. Then, $c_n$ is the number of ways to build a hybrid combinatorial structure of Type AB on $\{1, \dots, n\}$ by choosing disjoint subsets $S$ and $S^c$ whose union is $\{1, \dots, n\}$, and building a Type A structure on $S$ and a Type B structure on $S^c$. This can be iterated: the product $A_1(w) \cdots A_r(w)$ of exponential generating functions is an exponential generating function whose coefficients count the number of ways to select a list of disjoint subsets $S_1, \dots, S_r$ of $\{1, \dots, n\}$ whose union is $\{1, \dots, n\}$, and then build a Type $A_j$ structure on $S_j$ for each $1 \leq j \leq r$.
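Here is a tiny Python illustration of the binomial convolution rule (example mine): taking $A(w) = B(w) = e^w$, so that $a_k = b_k = 1$, the product coefficients must be $c_n = 2^n$, matching the count of ways to choose the subset $S$.

```python
from math import comb

def egf_product(a, b):
    """Coefficients of A(w)B(w) from coefficients a_k, b_k via binomial convolution."""
    n = min(len(a), len(b))
    return [sum(comb(m, k) * a[k] * b[m - k] for k in range(m + 1)) for m in range(n)]

a = b = [1] * 8          # A(w) = B(w) = e^w
print(egf_product(a, b)) # [1, 2, 4, 8, 16, 32, 64, 128]
```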
Exercise 3: Read this (easy), this (harder), or this (hardest).
We can now state the main theorem on composition of exponential generating functions. A partition of $\{1, \dots, n\}$ is a set $\pi$ of disjoint nonempty subsets of $\{1, \dots, n\}$ whose union is $\{1, \dots, n\}$. The elements of $\pi$ are called its blocks, and we denote by $b(\pi)$ the number of blocks in $\pi$. Let $\mathrm{P}(n)$ denote the set of all partitions of $\{1, \dots, n\}$.
Theorem (The Compositional Formula): With notation as above, we have $c_0 = a_0$ and

$$c_n = \sum_{\pi \in \mathrm{P}(n)} a_{b(\pi)} \prod_{B \in \pi} b_{|B|}, \qquad n \geq 1.$$
The proof is to use the combinatorial expansion of the powers $B(w)^k$ described above and reorganize the sum.
Corollary (The Exponential Formula): If $A(w) = e^w$, i.e. $a_k = 1$ for all $k$, then

$$c_n = \sum_{\pi \in \mathrm{P}(n)} \prod_{B \in \pi} b_{|B|}.$$

The combinatorial meaning of the Exponential Formula is the following: if $B(w)$ is the exponential generating function for a class of “connected” combinatorial structures, then $C(w) = e^{B(w)}$ is the exponential generating function counting disjoint assemblies of structures from this class. The Exponential Formula gets used all over the place — probabilists call it the moment-cumulant formula, while physicists often call it the linked cluster expansion and invoke it using colorful phrases like “connected vacuum bubbles exponentiate.” A basic example is the following: let $g_n$ be the number of simple graphs with vertex set $\{1, \dots, n\}$, let $c_n$ be the number of connected simple graphs with vertex set $\{1, \dots, n\}$, and form the generating functions

$$G(w) = 1 + \sum_{n=1}^{\infty} g_n \frac{w^n}{n!}, \qquad C(w) = \sum_{n=1}^{\infty} c_n \frac{w^n}{n!}.$$
Then $G(w) = e^{C(w)}$, and the sequences $(g_n)$ and $(c_n)$ are related as in the Exponential Formula.
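The graph example is easy to verify by machine: extracting coefficients from $G(w) = e^{C(w)}$ yields the recursion $g_n = \sum_{k=1}^{n} \binom{n-1}{k-1} c_k\, g_{n-k}$ (classify graphs by the connected component containing the vertex $1$), which can be solved for the $c_n$. A Python sketch of this check, with helper names of my own:

```python
from math import comb

def connected_counts(n_max):
    """c_n = #connected simple graphs on n labeled vertices, from g_n = 2^C(n,2)."""
    g = [2 ** comb(n, 2) for n in range(n_max + 1)]  # g_0 = 1 by convention
    c = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # classify by the connected component containing vertex 1:
        rest = sum(comb(n - 1, k - 1) * c[k] * g[n - k] for k in range(1, n))
        c[n] = g[n] - rest
    return c[1:]

print(connected_counts(6))  # [1, 1, 4, 38, 728, 26704]
```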
Now we return to Problem 1 above, which is to compute the coefficients $c_k(N)$ in the expansion

$$e^{-NV(w)} = \sum_{k=0}^{\infty} c_k(N) \frac{w^k}{k!}.$$

We do this using the Exponential Formula: we have

$$-NV(w) = \sum_{k=3}^{\infty} (-1)^{k+1} N \frac{w^k}{k} = \sum_{k=3}^{\infty} (-1)^{k+1} N (k-1)!\, \frac{w^k}{k!}.$$

The Exponential Formula now tells us that the coefficients we seek are given by

$$c_k(N) = \sum_{\substack{\pi \in \mathrm{P}(k) \\ |B| \geq 3 \text{ for all } B \in \pi}} \prod_{B \in \pi} (-1)^{|B|+1} N\, (|B|-1)!.$$

Thus, we have solved Problem 1: since $(|B|-1)!$ counts the cyclic orderings of the block $B$, we see that

$$c_k(N) = (-1)^k \sum_{j \geq 0} e(k,j)\,(-N)^j,$$

with $e(k,j)$ the number of permutations of $\{1, \dots, k\}$ consisting of $j$ cycles, each of length at least $3$.
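The numbers $e(k,j)$ are easy to tabulate by brute force, which makes for a nice check of the formula; here is a Python sketch (helper names mine). For example, a permutation of $\{1,\dots,6\}$ with all cycles of length at least $3$ is either a $6$-cycle ($120$ of them) or a product of two $3$-cycles ($40$ of them).

```python
from itertools import permutations
from collections import Counter

def cycle_lengths(perm):
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        j, length = start, 0
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths

def e_counts(k):
    """Dict: number of cycles j -> #permutations of k symbols, all cycles >= 3."""
    counts = Counter()
    for p in permutations(range(k)):
        lengths = cycle_lengths(p)
        if min(lengths) >= 3:
            counts[len(lengths)] += 1
    return dict(counts)

for k in range(3, 7):
    print(k, e_counts(k))  # 3: {1: 2}, 4: {1: 6}, 5: {1: 24}, 6: {1: 120, 2: 40}
```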
Exercise 4: Do the same computation instead using the Compositional Formula with $A(w) = e^{-Nw}$ and $B(w) = V(w)$.
Let $\mathbf{V}$ be an $n$-dimensional vector space equipped with a scalar product $\langle \cdot, \cdot \rangle$. Recall from Lecture 16 that an operator $A$ on $\mathbf{V}$ is said to be selfadjoint (or symmetric) if

$$\langle Av, w \rangle = \langle v, Aw \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

Also recall from Lecture 18 that $A$ is said to be semisimple if there exists a basis of $\mathbf{V}$ consisting of eigenvectors of $A$. The goal of this lecture is to prove the following cornerstone result in linear algebra.
Theorem 1 (Spectral Theorem for selfadjoint operators): If $A$ is selfadjoint, then it is semisimple.
The proof of this important theorem occupies the remainder of this lecture. It is a constructive argument that builds an eigenbasis for $A$ one vector at a time. A nice feature of the construction is that the eigenbasis it outputs is an orthonormal basis of $\mathbf{V}$.
Let us begin with an important observation on a special subspecies of selfadjoint operators.
Definition 1: A selfadjoint operator $A$ is said to be nonnegative if the associated quadratic form is nonnegative, i.e. if the function $Q_A \colon \mathbf{V} \to \mathbb{R}$ defined by

$$Q_A(v) = \langle v, Av \rangle$$

satisfies $Q_A(v) \geq 0$ for all $v \in \mathbf{V}$.
Any nonnegative selfadjoint operator has the property that membership in its kernel is certified by vanishing of its quadratic form.
Lemma 1: If $A$ is a nonnegative selfadjoint operator, then $Av = 0$ if and only if $Q_A(v) = 0$.
Proof: One direction of this equivalence is obvious: if $Av = 0$, then

$$Q_A(v) = \langle v, Av \rangle = \langle v, 0 \rangle = 0.$$

The proof of the converse statement is similar to the proof of the Cauchy-Schwarz inequality. More precisely, suppose that $Q_A(v) = 0$, let $t$ be any real number, and let $w$ be an arbitrary vector. We have

$$0 \leq Q_A(v + tw) = \langle v + tw, A(v + tw) \rangle.$$

Using the definition of $Q_A$ together with the fact that $A$ is selfadjoint, this simplifies to

$$Q_A(v + tw) = Q_A(v) + 2t\langle w, Av \rangle + t^2 Q_A(w),$$

and since $Q_A(v) = 0$ this further simplifies to

$$Q_A(v + tw) = 2t\langle w, Av \rangle + t^2 Q_A(w).$$
Now, as a function of $t$, the righthand side of this equation is a parabola, and since $Q_A(w) \geq 0$ this parabola is upward-opening. Moreover, since the lefthand side satisfies $Q_A(v + tw) \geq 0$, the lowest point of this parabola cannot lie below the line $y = 0$, and this forces

$$\langle w, Av \rangle = 0.$$

But the vector $w$ was chosen arbitrarily, so the above equation holds for any $w \in \mathbf{V}$, in particular for $w = Av$. We thus have

$$\langle Av, Av \rangle = 0,$$

which means that $Av$ has length zero, i.e. $Av = 0$.
Now, let $A$ be any selfadjoint operator. We are going to use the Lemma just established to prove that $A$ admits an eigenvector $m$; the argument even gives a description of the corresponding eigenvalue $\lambda$.
Consider the unit sphere in the Euclidean space $\mathbf{V}$, i.e. the set

$$S = \{v \in \mathbf{V} \colon \|v\| = 1\}$$

of all vectors of length $1$. The quadratic form $Q_A$ is a continuous function, and hence by the Extreme Value Theorem the minimum value of $Q_A$ on the sphere,

$$\lambda = \min_{v \in S} Q_A(v),$$

does indeed exist, and is moreover achieved: there is a vector $m \in S$ at which the minimum is attained, i.e.

$$Q_A(m) = \lambda.$$
Theorem 2: The minimum $\lambda$ of $Q_A$ on the unit sphere is an eigenvalue of $A$, and the minimizer $m$ lies in the eigenspace of $\lambda$.
Proof: By definition of $\lambda$ as the minimum value of $Q_A$ on $S$, we have that

$$\langle v, Av \rangle \geq \lambda, \qquad v \in S.$$

Since $\langle v, v \rangle = 1$ for any $v \in S$, the above inequality can be rewritten as

$$\langle v, Av \rangle - \lambda\langle v, v \rangle \geq 0, \qquad v \in S.$$

But actually, this implies that

$$\langle v, Av \rangle - \lambda\langle v, v \rangle \geq 0, \qquad v \in \mathbf{V},$$

since every vector in $\mathbf{V}$ is a nonnegative scalar multiple of a vector of unit length (make sure you understand this). We thus have that

$$Q_{A - \lambda I}(v) = \langle v, (A - \lambda I)v \rangle \geq 0, \qquad v \in \mathbf{V}.$$

This says that the selfadjoint operator $A - \lambda I$ is nonnegative. Moreover, we have that

$$Q_{A - \lambda I}(m) = \langle m, Am \rangle - \lambda\langle m, m \rangle = \lambda - \lambda = 0.$$

Thus, by Lemma 1, we have that $(A - \lambda I)m = 0$, meaning that $Am = \lambda m$.
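Theorem 2 is easy to illustrate numerically (a numpy sketch of my own, not part of the lecture): for a random symmetric matrix, the smallest eigenvalue reported by a library routine is attained by $Q_A$ at an eigenvector, and $Q_A$ on the unit sphere never dips below it.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # a selfadjoint (symmetric) operator

eigenvalues, eigenvectors = np.linalg.eigh(A)
lam, m = eigenvalues[0], eigenvectors[:, 0]   # smallest eigenvalue and minimizer

Q = lambda v: v @ A @ v
print(np.allclose(A @ m, lam * m))     # m lies in the eigenspace of lam: True
print(np.isclose(Q(m), lam))           # Q_A attains the value lam at m: True

# Q_A(v) >= lam for every unit vector v, as in the proof of Theorem 2:
vs = rng.standard_normal((10_000, 5))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)
print((np.einsum('ij,jk,ik->i', vs, A, vs) >= lam - 1e-12).all())  # True
```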
Theorem 2 has established that an arbitrary selfadjoint operator has an eigenvector. However, this seems to be a long way from Theorem 1, which makes the much stronger assertion that $A$ has $\dim \mathbf{V}$ linearly independent eigenvectors. In fact, the distance from Theorem 2 to Theorem 1 is not so long as it may seem. To see why, we need to introduce one more very important concept.
Definition 2: Let $A$ be a linear operator, and let $\mathbf{W}$ be a subspace of $\mathbf{V}$. We say that $\mathbf{W}$ is invariant under $A$ if $Aw \in \mathbf{W}$ for every $w \in \mathbf{W}$.
The meaning of this definition is that if $\mathbf{W}$ is invariant under $A$, then $A$ may be considered as a linear operator on the smaller space $\mathbf{W}$, i.e. as an element of the algebra of operators on $\mathbf{W}$.
Let us adorn the eigenvalue/eigenvector pair produced by Theorem 2 with a subscript, writing this pair as $(\lambda_1, m_1)$. Consider the orthogonal complement of the line spanned by $m_1$, i.e. the subspace of $\mathbf{V}$ given by

$$\mathbf{W} = \{v \in \mathbf{V} \colon \langle v, m_1 \rangle = 0\}.$$
Proposition 1: The subspace $\mathbf{W}$ is invariant under $A$.
Proof: We have to prove that if $v$ is orthogonal to the eigenvector $m_1$ of $A$, then so is $Av$. This follows easily from the fact that $A$ is selfadjoint:

$$\langle Av, m_1 \rangle = \langle v, Am_1 \rangle = \lambda_1 \langle v, m_1 \rangle = 0.$$
The effect of Proposition 1 is that we may consider $A$ as a selfadjoint operator defined on the $(n-1)$-dimensional subspace $\mathbf{W}$. But this means that we can simply apply Theorem 2 again, with $\mathbf{W}$ replacing $\mathbf{V}$. We will then get a new eigenvalue/eigenvector pair $(\lambda_2, m_2)$, where

$$\lambda_2 = \min_{\substack{v \in \mathbf{W} \\ \|v\| = 1}} Q_A(v)$$

is the minimum value of $Q_A$ on the unit sphere in the Euclidean space $\mathbf{W}$, and $m_2$ is a vector at which the minimum is achieved,

$$Q_A(m_2) = \lambda_2.$$

By construction, $m_2$ is a unit vector orthogonal to $m_1$, so that in particular $\{m_1, m_2\}$ is a linearly independent set in $\mathbf{V}$. Moreover, we have that $\lambda_2 \geq \lambda_1$, since the unit sphere of $\mathbf{W}$ is a subset of the unit sphere of $\mathbf{V}$.
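The induction we are running is effectively an algorithm, and it can be implemented directly: repeatedly minimize $Q_A$ on the unit sphere of the orthogonal complement of the eigenvectors found so far. Below is a numpy sketch of this construction (I use a library eigensolver on the compressed matrix as a stand-in for the minimization step, so this illustrates the structure of the proof rather than an independent algorithm).

```python
import numpy as np

def eigenbasis_by_deflation(A):
    """Constructive spectral theorem: repeatedly minimize Q_A on the unit
    sphere of the orthogonal complement of the eigenvectors found so far."""
    n = A.shape[0]
    comp = np.eye(n)                # orthonormal basis of the current complement W
    values, vectors = [], []
    while comp.shape[1] > 0:
        M = comp.T @ A @ comp       # A compressed to W (still symmetric)
        w, U = np.linalg.eigh(M)    # stand-in for minimizing Q_A on the sphere of W
        values.append(w[0])                 # lambda = min of Q_A on sphere of W
        vectors.append(comp @ U[:, 0])      # the minimizer m, lifted back to V
        comp = comp @ U[:, 1:]              # new complement: orthogonal to m in W
    return np.array(values), np.column_stack(vectors)

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
lam, E = eigenbasis_by_deflation(A)
print(np.allclose(A @ E, E * lam))        # an eigenbasis: A E = E diag(lam)
print(np.allclose(E.T @ E, np.eye(4)))    # and it is orthonormal
print(np.all(np.diff(lam) >= -1e-12))     # eigenvalues come out nondecreasing
```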
Let $\mathbf{V}$ be a vector space, and let us consider the algebra of linear operators on $\mathbf{V}$ as a kind of ecosystem consisting of various life forms of varying complexity. We now move on to the portion of the course which is concerned with the taxonomy of linear operators — their classification and division into various particular classes.
The simplest organisms in the ecosystem are operators which act by scaling every vector by a fixed number $\lambda$; these are the single-celled organisms of the operator ecosystem.
Definition 1: An operator $A$ is said to be simple if there exists a scalar $\lambda$ such that

$$Av = \lambda v \qquad \text{for all } v \in \mathbf{V}.$$
Simple operators really are very simple, in the sense that they are no more complicated than numbers. Indeed, Definition 1 is equivalent to saying that $A = \lambda I$, where $I$ is the identity operator, which plays the role of the number $1$ in the algebra of operators, meaning that it is the multiplicative identity in this algebra. Simple operators are extremely easy to manipulate algebraically: if $A = \lambda I$, then we have

$$A^k = \lambda^k I$$

for any nonnegative integer $k$, and more generally if $p$ is any polynomial in a single variable then we have

$$p(A) = p(\lambda) I.$$
Exercise 1: Prove the above formula.
The formula even works in the case that $k$ is a negative integer, provided that $\lambda \neq 0$; equivalently, the simple operator $A = \lambda I$ is invertible if and only if $\lambda \neq 0$, its inverse being $A^{-1} = \lambda^{-1} I$. If $A_1 = \lambda_1 I$ and $A_2 = \lambda_2 I$ are simple operators, then they commute,

$$A_1 A_2 = \lambda_1\lambda_2 I = A_2 A_1,$$

just like ordinary numbers, and more generally

$$p(A_1, A_2) = p(\lambda_1, \lambda_2) I$$

for any polynomial $p$ in two variables.
Exercise 2: Prove the above formula.
Another way to appreciate how truly simple simple operators are is to look at their matrices. In order to do this, we have to restrict to the case that the vector space $\mathbf{V}$ is finite-dimensional. If $\mathbf{V}$ is $n$-dimensional, and $E = \{e_1, \dots, e_n\}$ is any basis of $\mathbf{V}$, then the matrix of $A = \lambda I$ relative to $E$ is simply

$$\begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix},$$

where the off-diagonal matrix elements are all equal to zero. For this reason, simple operators are often called diagonal operators.
Most operators are not simple operators — they are complicated multicellular organisms. So, to understand them we have to dissect them and look at their organs one at a time. Mathematically, this means that, given an operator $A$, we look for special vectors in $\mathbf{V}$ on which $A$ acts as if it was simple.
Definition 2: A nonzero vector $v \in \mathbf{V}$ is said to be an eigenvector of an operator $A$ if

$$Av = \lambda v$$

for some scalar $\lambda$. The scalar $\lambda$ is said to be an eigenvalue of $A$.
The best case scenario is that we can find a basis of $\mathbf{V}$ entirely made up of eigenvectors of $A$.
Definition 3: An operator $A$ is said to be semisimple if there exists a basis of $\mathbf{V}$ consisting of eigenvectors of $A$. Such a basis is called an eigenbasis for $A$.
As the name suggests, semisimple operators are pretty simple, but not quite as simple as simple operators. In particular, every simple operator is semisimple, because if $A$ is simple then every nonzero vector in $\mathbf{V}$ is an eigenvector of $A$, and hence any basis of $\mathbf{V}$ is an eigenbasis for $A$. The converse, however, is not true.
Let $\mathbf{V}$ be an $n$-dimensional vector space, and let $A$ be a semisimple operator. By definition, this means that there exists a basis $E = \{e_1, \dots, e_n\}$ of $\mathbf{V}$ consisting of eigenvectors of $A$. This in turn means that there exist numbers $\lambda_1, \dots, \lambda_n$ such that

$$Ae_i = \lambda_i e_i, \qquad 1 \leq i \leq n.$$

If $\lambda_1 = \dots = \lambda_n$, then $A$ is simple, but if these numbers are not all the same then it is not. However, even if all these numbers are different, the matrix of $A$ relative to $E$ will still be a diagonal matrix, i.e. it will have the form

$$\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}.$$
For this reason, semisimple operators are often called diagonalizable operators. Note the shift in terminology from “diagonal,” for simple, to “diagonalizable,” for semisimple. The former term suggests an immutable characteristic, independent of basis, whereas the latter indicates that some action must be taken, in that a special basis must be found to reveal diagonal form. More precisely, the matrix of a semisimple operator is not diagonal with respect to an arbitrary basis; the definition only says that the matrix of $A$ is diagonal relative to some basis.
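As a concrete illustration (example matrix mine), here is a numpy computation showing a matrix which is not diagonal in the standard basis, but becomes diagonal once conjugated into its eigenbasis:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # not diagonal in the standard basis

lam, E = np.linalg.eig(A)           # eigenvalues [3, 1], eigenbasis in columns
D = np.linalg.inv(E) @ A @ E        # the matrix of the operator in the eigenbasis
print(np.round(D, 10))              # diagonal: diag(3, 1)
```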
Most linear operators are not semisimple — indeed, there are plenty of operators that have no eigenvectors at all. Consider the operator

$$R_\theta \colon \mathbb{R}^2 \to \mathbb{R}^2,$$

which rotates a vector counterclockwise through the angle $\theta$. The matrix of this operator relative to the standard basis is

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

If $\theta = 0$, then $R_\theta = I$, so that $R_\theta$ is a simple operator: all nonzero vectors are eigenvectors, with eigenvalue $1$. If $\theta = \pi$, then $R_\theta = -I$, and again $R_\theta$ is simple, with the same eigenvectors and eigenvalue $-1$. However, taking any other value of $\theta$, for example $\theta = \frac{\pi}{2}$, rotation through a right angle, it is geometrically clear that $R_\theta v$ is never a scalar multiple of $v$, so that $R_\theta$ has no eigenvectors at all. In particular, it is not semisimple.
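A quick numerical confirmation (sketch mine): the eigenvalues of the quarter-turn matrix are $\pm i$, so there are no real eigenvalues, and hence no eigenvectors in $\mathbb{R}^2$.

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])     # rotation through a right angle
print(np.linalg.eigvals(R))     # [0.+1.j, 0.-1.j]: no real eigenvalues
```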
Let us now formulate necessary and sufficient conditions for an operator to be semisimple. In this endeavor it is psychologically helpful to reorganize the eigenvector/eigenvalue definition by thinking of eigenvalues as the primary objects, and eigenvectors as secondary objects associated to them.
Definition 4: The spectrum of an operator $A$ is the set defined by

$$\sigma(A) = \{\lambda \in \mathbb{R} \colon \lambda \text{ is an eigenvalue of } A\}.$$

For each $\lambda \in \sigma(A)$, the set defined by

$$\mathbf{V}_\lambda = \{v \in \mathbf{V} \colon Av = \lambda v\}$$

is called the $\lambda$-eigenspace of $A$. The dimension of $\mathbf{V}_\lambda$ is called the geometric multiplicity of $\lambda$.
In these terms, saying that $A$ is a simple operator means that the spectrum of $A$ consists of a single number,

$$\sigma(A) = \{\lambda\},$$

and that the corresponding eigenspace exhausts $\mathbf{V}$,

$$\mathbf{V}_\lambda = \mathbf{V}.$$

At the other extreme, the rotation operator $R_{\pi/2}$ considered above has empty spectrum,

$$\sigma(R_{\pi/2}) = \emptyset,$$

and thus does not have any eigenspaces.
Proposition 1: For any operator $A$ and each $\lambda \in \sigma(A)$, the eigenspace $\mathbf{V}_\lambda$ is a subspace of $\mathbf{V}$.
Proof: First, observe that $0 \in \mathbf{V}_\lambda$, because $A0 = 0 = \lambda 0$.
Second, $\mathbf{V}_\lambda$ is closed under scalar multiplication: if $v \in \mathbf{V}_\lambda$ and $t$ is a scalar, then

$$A(tv) = tAv = t\lambda v = \lambda(tv).$$

Third, $\mathbf{V}_\lambda$ is closed under vector addition: if $v_1, v_2 \in \mathbf{V}_\lambda$, then

$$A(v_1 + v_2) = Av_1 + Av_2 = \lambda v_1 + \lambda v_2 = \lambda(v_1 + v_2).$$
So, the eigenspaces of an operator $A$ constitute a collection of subspaces of $\mathbf{V}$ indexed by the numbers $\lambda \in \sigma(A)$. A key feature of these subspaces is that they are independent of one another.
Theorem 1: Suppose that $\lambda_1, \dots, \lambda_k$ are distinct eigenvalues of an operator $A$. Let $v_1, \dots, v_k$ be nonzero vectors such that $Av_i = \lambda_i v_i$ for each $1 \leq i \leq k$. Then $\{v_1, \dots, v_k\}$ is a linearly independent set.
Proof: We prove this by induction on $k$. The base case is $k = 1$, and in this case the assertion is simply that the set $\{v_1\}$ consisting of a single eigenvector of $A$ is linearly independent. This is true, since eigenvectors are nonzero by definition.
For the induction step, suppose that $\{v_1, \dots, v_k\}$ is a linearly dependent set. Then, there exist numbers $t_1, \dots, t_k$, not all equal to zero, such that

$$t_1 v_1 + \dots + t_k v_k = 0.$$

Let us suppose that $t_1 \neq 0$ (we can always relabel so that a nonzero coefficient comes first). Applying the operator $A$ to both sides of the above vector equation, we get

$$t_1 \lambda_1 v_1 + \dots + t_k \lambda_k v_k = 0.$$

On the other hand, we can multiply the original vector equation by any scalar and it remains true; in particular, we have

$$t_1 \lambda_k v_1 + \dots + t_k \lambda_k v_k = 0.$$

Now, subtracting this third equation from the second equation, we obtain

$$t_1(\lambda_1 - \lambda_k)v_1 + \dots + t_{k-1}(\lambda_{k-1} - \lambda_k)v_{k-1} = 0.$$
By the induction hypothesis, $\{v_1, \dots, v_{k-1}\}$ is a linearly independent set, and hence all the coefficients in this vector equation are zero. In particular, we have

$$t_1(\lambda_1 - \lambda_k) = 0.$$

But this is impossible, since $t_1 \neq 0$ and $\lambda_1 \neq \lambda_k$. Hence, the set $\{v_1, \dots, v_k\}$ cannot be linearly dependent — it must be linearly independent.
Restricting to the case that $\mathbf{V}$ is finite-dimensional, Theorem 1 has the following crucial consequences.
Corollary 1: $A$ is semisimple if and only if

$$\sum_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.$$
Proof: Suppose first that $A$ is semisimple. By definition, this means that the span of the eigenspaces of $A$ is all of $\mathbf{V}$,

$$\operatorname{span}\bigcup_{\lambda \in \sigma(A)} \mathbf{V}_\lambda = \mathbf{V}.$$

By Theorem 1, we have

$$\sum_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.$$

Conversely, suppose that the sum of the dimensions of the eigenspaces of $A$ is equal to the dimension of $\mathbf{V}$. For each $\lambda \in \sigma(A)$, let $E_\lambda$ be a basis of the eigenspace $\mathbf{V}_\lambda$. Then, by Theorem 1, the set

$$E = \bigcup_{\lambda \in \sigma(A)} E_\lambda$$

is a linearly independent set, and hence a basis of the subspace of $\mathbf{V}$ that it spans. Thus

$$\dim \operatorname{span} E = \sum_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda.$$
Since by hypothesis we have

$$\sum_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V},$$

this implies that

$$\dim \operatorname{span} E = \dim \mathbf{V},$$

which in turn implies that

$$\operatorname{span} E = \mathbf{V}.$$

Thus $E$ is a basis of $\mathbf{V}$ consisting of eigenvectors of $A$, whence $A$ is semisimple.
Corollary 3: If $|\sigma(A)| = \dim \mathbf{V}$, then $A$ is semisimple.
Proof: To say that $|\sigma(A)| = \dim \mathbf{V} = n$ is equivalent to saying that the spectrum of $A$ consists of $n$ distinct numbers,

$$\sigma(A) = \{\lambda_1, \dots, \lambda_n\}.$$

Sampling a collection of nonzero vectors from each corresponding eigenspace,

$$v_1 \in \mathbf{V}_{\lambda_1}, \dots, v_n \in \mathbf{V}_{\lambda_n},$$

we get a set $\{v_1, \dots, v_n\}$ of eigenvectors of $A$. By Theorem 1, $\{v_1, \dots, v_n\}$ is a linearly independent set, hence it is a basis of $\mathbf{V}$.
In Lecture 13, we discussed matrix representations of linear transformations between finite-dimensional vector spaces. In this lecture, we consider linear transformations between finite-dimensional Euclidean spaces, and discuss the relationship between the scalar product and the matrix representation of linear transformations. Note that any vector space $\mathbf{V}$ can be promoted to a Euclidean space by choosing a basis $E$ in $\mathbf{V}$ and defining $\langle \cdot, \cdot \rangle$ to be the unique scalar product on $\mathbf{V}$ such that $E$ is orthonormal.
Let $\mathbf{V}$ and $\mathbf{W}$ be Euclidean spaces; by abuse of notation, we will denote the scalar product in each of these spaces by the same symbol $\langle \cdot, \cdot \rangle$. Let $E = \{e_1, \dots, e_n\}$ be an orthonormal basis in $\mathbf{V}$, and let $F = \{f_1, \dots, f_m\}$ be an orthonormal basis in $\mathbf{W}$. Let $T \colon \mathbf{V} \to \mathbf{W}$ be a linear transformation.
Definition 1: The matrix elements of $T$ relative to the bases $E$ and $F$ are the scalar products

$$\langle f_i, Te_j \rangle, \qquad 1 \leq i \leq m,\ 1 \leq j \leq n.$$
The reason the number $\langle f_i, Te_j \rangle$ is called a “matrix element” of $T$ is that this number is exactly the $(i,j)$-element of the matrix of $T$ defined in Lecture 13. Indeed, if

$$Te_j = \sum_{k=1}^{m} t_{kj} f_k,$$

then

$$\langle f_i, Te_j \rangle = \Big\langle f_i, \sum_{k=1}^{m} t_{kj} f_k \Big\rangle = \sum_{k=1}^{m} t_{kj} \langle f_i, f_k \rangle = t_{ij},$$

where the last equality follows from the orthonormality of $F$. Note that it is not actually necessary to assume that $\mathbf{V}$ and $\mathbf{W}$ are finite-dimensional in order for the matrix elements of $T$ to be well-defined; however, we will always make this assumption. Thus, in more visual form, we have that

$$[T] = \begin{pmatrix} \langle f_1, Te_1 \rangle & \cdots & \langle f_1, Te_n \rangle \\ \vdots & \ddots & \vdots \\ \langle f_m, Te_1 \rangle & \cdots & \langle f_m, Te_n \rangle \end{pmatrix}.$$
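Here is a short numpy sketch of Definition 1 (setup and names mine): random orthonormal bases are produced by QR factorization, and the array of scalar products $\langle f_i, Te_j \rangle$ coincides with the coefficient matrix of $T$ obtained by expanding each $Te_j$ in the basis $F$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4
E, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal basis of V (columns)
F, _ = np.linalg.qr(rng.standard_normal((m, m)))   # orthonormal basis of W (columns)
T = rng.standard_normal((m, n))                    # a linear transformation V -> W

# Matrix elements <f_i, T e_j> ...
elements = np.array([[F[:, i] @ (T @ E[:, j]) for j in range(n)] for i in range(m)])

# ... agree with the expansion coefficients t_{ij} in T e_j = sum_i t_{ij} f_i,
# obtained here by solving the linear system F t = T e_j for each j:
coeffs = np.linalg.solve(F, T @ E)
print(np.allclose(elements, coeffs))  # True
```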
The connection between matrices and scalar products is often very useful for performing computations which would be much more annoying without the use of scalar products. A good example is change of basis for linear operators. The setup here is that $\mathbf{W} = \mathbf{V}$, so that $E = \{e_1, \dots, e_n\}$ and $F = \{f_1, \dots, f_n\}$ are two (possibly) different orthonormal bases of the same Euclidean space. Given an operator $A$ on $\mathbf{V}$, we would like to understand the relationship between the two matrices

$$[A]_E = [\langle e_i, Ae_j \rangle], \qquad [A]_F = [\langle f_i, Af_j \rangle],$$

which represent the operator $A$ relative to the bases $E$ and $F$, respectively. In order to do this, let us consider the linear operator $U$ uniquely defined by the equations

$$Ue_j = f_j, \qquad 1 \leq j \leq n.$$

Why do these equations uniquely determine $U$? Because, for any $v = x_1e_1 + \dots + x_ne_n \in \mathbf{V}$, we have

$$Uv = U(x_1e_1 + \dots + x_ne_n) = x_1Ue_1 + \dots + x_nUe_n = x_1f_1 + \dots + x_nf_n.$$
Let us observe that the operator $U$ we have defined is an automorphism of $\mathbf{V}$, i.e. it has an inverse. Indeed, it is clear that the linear operator uniquely determined by the equations

$$f_j \mapsto e_j, \qquad 1 \leq j \leq n,$$

is the inverse of $U$. Operators which transform orthonormal bases into orthonormal bases have a special name.
Definition 2: An operator $U$ is said to be an orthogonal operator if it preserves orthonormal bases: for any orthonormal basis $\{e_1, \dots, e_n\}$ in $\mathbf{V}$, the set $\{Ue_1, \dots, Ue_n\}$ is again an orthonormal basis in $\mathbf{V}$.
Note that every orthogonal operator is invertible, since we can always define the inverse just as we did above. In particular, the operator $U$ we defined above by $Ue_j = f_j$ is an orthogonal operator.
Proposition 1: An operator $U$ is orthogonal if and only if

$$\langle Uv, Uw \rangle = \langle v, w \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

Proof: Observe that, by linearity of $U$ and bilinearity of $\langle \cdot, \cdot \rangle$, it is sufficient to prove the claim in the case that $v = e_i$ and $w = e_j$, where $\{e_1, \dots, e_n\}$ is an orthonormal basis of $\mathbf{V}$.
Suppose that $U$ is an orthogonal operator. Let $f_i = Ue_i$. Then $\{f_1, \dots, f_n\}$ is an orthonormal basis of $\mathbf{V}$, and consequently we have

$$\langle Ue_i, Ue_j \rangle = \langle f_i, f_j \rangle = \langle e_i, e_j \rangle,$$

both sides being $1$ if $i = j$ and $0$ otherwise.
Conversely, suppose that

$$\langle Uv, Uw \rangle = \langle v, w \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

We then have that $\langle Ue_i, Ue_j \rangle = \langle e_i, e_j \rangle$, so that $\{Ue_1, \dots, Ue_n\}$ is an orthonormal basis of $\mathbf{V}$, and thus $U$ is an orthogonal operator.
Proposition 2: An operator $U$ is orthogonal if and only if it is invertible and

$$\langle Uv, w \rangle = \langle v, U^{-1}w \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

Proof: Suppose first that $U$ is orthogonal. Then, $U$ is invertible and $U^{-1}$ is also orthogonal, and hence for any $v, w \in \mathbf{V}$ we have

$$\langle Uv, w \rangle = \langle Uv, UU^{-1}w \rangle = \langle v, U^{-1}w \rangle,$$

where the second equality is Proposition 1.
Conversely, suppose that $U$ is invertible and

$$\langle Uv, w \rangle = \langle v, U^{-1}w \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

Then, for any $v, w \in \mathbf{V}$, we have

$$\langle Uv, Uw \rangle = \langle v, U^{-1}Uw \rangle = \langle v, w \rangle,$$

whence $U$ is orthogonal by Proposition 1.
Now let us return to the problem that we were working on prior to our digression into the generalities of orthogonal operators, namely that of computing the relationship between the matrices $[A]_E$ and $[A]_F$. We have

$$\langle f_i, Af_j \rangle = \langle Ue_i, AUe_j \rangle = \langle e_i, U^{-1}AUe_j \rangle,$$

where we used Proposition 2 to obtain the second equality. Thus, we have the matrix equation

$$[A]_F = [U^{-1}AU]_E = [U]_E^{-1}\,[A]_E\,[U]_E,$$

where on the right hand side we are using the fact that

$$A \mapsto [A]_E$$

is an algebra isomorphism, as in Lecture 13, which means that the matrix representing a product of operators is the product of the matrices representing each operator individually. This relationship between $[A]_E$ and $[A]_F$ is usually phrased as the statement that the matrix representing the operator $A$ in the “new” basis $F$ is obtained from the matrix representing $A$ in the “old” basis $E$ by “conjugating” it by the matrix $[U]_E$ of the orthogonal operator $U$ that transforms the old basis into the new basis.
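The change of basis formula is easy to test numerically. In the following numpy sketch (mine), $E$ is the standard basis of $\mathbb{R}^n$ and $F = \{Ue_1, \dots, Ue_n\}$ for a random orthogonal matrix $U$, so that $[U]_E = U$ and $[U]_E^{-1} = U^{\mathsf{T}}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))                   # [A]_E in the standard basis E
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal U: f_j = U e_j

# Matrix elements of A relative to the new basis F = {U e_1, ..., U e_n}:
new_matrix = np.array([[U[:, i] @ (A @ U[:, j]) for j in range(n)] for i in range(n)])

# The change of basis formula [A]_F = [U]^{-1} [A]_E [U], with [U]^{-1} = [U]^T:
print(np.allclose(new_matrix, U.T @ A @ U))  # True
```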
We met linear transformations for the first time back in Lecture 2, and have encountered them a handful of times in subsequent lectures. In this lecture, we begin a systematic study of linear transformations that will take up the rest of the course. As with the study of any new species, there will be a lot of classifying: we will divide up linear transformations into various classes defined by certain common features which to a large extent determine their behavior. Before classifying linear transformations into various special species and families, let us consider the ecosystem of all linear transformations as a collective whole.
Let $\mathbf{V}$ and $\mathbf{W}$ be vector spaces, and let $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ be the set of all linear transformations $T \colon \mathbf{V} \to \mathbf{W}$. This seemingly strange notation stems from the fact that a fancy alternative name for linear transformations is homomorphisms, which roughly means “same shape” in Greek.
Theorem 1: $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ is a vector space.
Proof: In order to promote $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ from simply a set to a vector space, we need to give a rule for adding and scaling linear transformations. These rules are simple and natural. If $T_1, T_2 \in \operatorname{Hom}(\mathbf{V}, \mathbf{W})$ are linear transformations and $t_1, t_2$ are scalars, we define a new linear transformation $t_1T_1 + t_2T_2$ by

$$(t_1T_1 + t_2T_2)(v) = t_1T_1(v) + t_2T_2(v), \qquad v \in \mathbf{V}.$$

One must check first that the function from $\mathbf{V}$ to $\mathbf{W}$ so defined satisfies the linear transformation axioms, and then that the set $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ equipped with these notions of addition and scalar multiplication satisfies the vector space axioms. This is left as an exercise to the reader (it is a long but easy exercise which you should do at least once).
Now let us start asking questions about the vectors in $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$, i.e. considering what properties a linear transformation from $\mathbf{V}$ to $\mathbf{W}$ may or may not have. Given $T \in \operatorname{Hom}(\mathbf{V}, \mathbf{W})$, one of the first things we would like to know is whether it is injective and/or surjective. These questions reduce to questions about certain subspaces of $\mathbf{V}$ and $\mathbf{W}$ associated to $T$.
Definition 1: The kernel of $T$ is the subset of $\mathbf{V}$ defined by

$$\ker T = \{v \in \mathbf{V} \colon Tv = 0_{\mathbf{W}}\},$$

and the image of $T$ is the subset of $\mathbf{W}$ defined by

$$\operatorname{im} T = \{w \in \mathbf{W} \colon w = Tv \text{ for some } v \in \mathbf{V}\}.$$
You can think of the kernel and image of $T$ as the solution sets of two different equations involving $T$. First, the kernel of $T$ is the set of solutions to the equation

$$Tv = 0_{\mathbf{W}},$$

where $v$ is an unknown vector in $\mathbf{V}$. Second, the image of $T$ is the set of all $w \in \mathbf{W}$ such that the equation

$$Tv = w$$

has a solution.
Proposition 1: The kernel of $T$ is a subspace of $\mathbf{V}$, and the image of $T$ is a subspace of $\mathbf{W}$.
Proof: Since $T$ is a linear transformation, it is by definition the case that $T0_{\mathbf{V}} = 0_{\mathbf{W}}$, so that $0_{\mathbf{V}} \in \ker T$. Now let $v_1, v_2$ be any two vectors in the kernel of $T$, and let $t_1, t_2$ be any two scalars. We then have

$$T(t_1v_1 + t_2v_2) = t_1Tv_1 + t_2Tv_2 = t_10_{\mathbf{W}} + t_20_{\mathbf{W}} = 0_{\mathbf{W}},$$

which shows that $\ker T$ is closed under taking linear combinations. Hence, $\ker T$ is a subspace of $\mathbf{V}$.
Now consider the image of $T$. Since $T$ is a linear transformation, it is by definition the case that $T0_{\mathbf{V}} = 0_{\mathbf{W}}$, so that $0_{\mathbf{W}} \in \operatorname{im} T$. Now let $w_1, w_2$ be any two vectors in the image of $T$. Then, by definition of $\operatorname{im} T$, there exist vectors $v_1, v_2 \in \mathbf{V}$ such that $Tv_1 = w_1$ and $Tv_2 = w_2$. So, for any scalars $t_1, t_2$, we have that

$$t_1w_1 + t_2w_2 = t_1Tv_1 + t_2Tv_2 = T(t_1v_1 + t_2v_2),$$

which shows that $\operatorname{im} T$ is closed under taking linear combinations.
Proposition 2: The linear transformation $T$ is injective if and only if $\ker T = \{0_{\mathbf{V}}\}$, and surjective if and only if $\operatorname{im} T = \mathbf{W}$.
Proof: First we consider the relationship between the injectivity of $T$ and its kernel. Suppose that $\ker T = \{0_{\mathbf{V}}\}$. Let $v_1, v_2$ be vectors such that $Tv_1 = Tv_2$. This is equivalent to

$$T(v_1 - v_2) = 0_{\mathbf{W}},$$

which says that $v_1 - v_2 \in \ker T$. Since the only vector in $\ker T$ is $0_{\mathbf{V}}$, this forces $v_1 - v_2 = 0_{\mathbf{V}}$, which means that $v_1 = v_2$, and we conclude that $T$ is injective. Conversely, suppose we know that $T$ is injective. Let $v_1, v_2$ be any two vectors in the kernel of $T$. Then $Tv_1 = 0_{\mathbf{W}} = Tv_2$. Since $T$ is injective, this forces $v_1 = v_2$, which says that any two vectors in $\ker T$ are equal to one another. Since $0_{\mathbf{V}} \in \ker T$, this means that any vector in $\ker T$ is equal to $0_{\mathbf{V}}$, and we conclude that $\ker T = \{0_{\mathbf{V}}\}$.
Now let us consider the relationship between the surjectivity of $T$ and its image. Suppose that $\operatorname{im} T = \mathbf{W}$. This is exactly what it means for the function $T$ to be surjective: every element in the codomain of $T$ is in fact in the range of $T$. Conversely, suppose we know that $T$ is surjective. By definition of surjectivity, this means that $\operatorname{im} T = \mathbf{W}$.
We can use the above propositions to gain some insight into linear equations, meaning equations of the form

$$Tv = w,$$

where $T$ is a given linear transformation from $\mathbf{V}$ to $\mathbf{W}$, $w$ is a given vector in $\mathbf{W}$, and $v$ is an unknown vector in $\mathbf{V}$.
Theorem 2: The solution set of the above linear equation is $v_0 + \ker T$, i.e. the set of all vectors in $\mathbf{V}$ of the form $v_0 + k$, where $v_0$ is any particular vector such that $Tv_0 = w$, and $k \in \ker T$.
Proof: We begin by first considering the homogeneous equation associated to the linear equation we want to solve, which is

$$Tv = 0_{\mathbf{W}}.$$

By definition, the solution set of this homogeneous equation is $\ker T$. Now, if $v_0$ is any vector such that $Tv_0 = w$, then for any $k \in \ker T$ we also have

$$T(v_0 + k) = Tv_0 + Tk = w + 0_{\mathbf{W}} = w,$$

which shows that all vectors in $v_0 + \ker T$ are solutions of the equation $Tv = w$.
Conversely, suppose that $v_1$ is a vector in $\mathbf{V}$ such that $Tv_1 = w$. Then, we have

$$T(v_1 - v_0) = Tv_1 - Tv_0 = w - w = 0_{\mathbf{W}},$$

which shows that $v_1 - v_0$ is a vector in $\ker T$. That is, $v_1 - v_0 = k$ for some $k \in \ker T$, or equivalently $v_1 = v_0 + k$. This shows that all solutions of $Tv = w$ belong to the set $v_0 + \ker T$.
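Here is a numpy illustration of the theorem just proved (matrix and vector chosen by me): for a rank-deficient $T$ and a vector $w$ in its image, a particular solution plus any element of $\ker T$ is again a solution of $Tv = w$.

```python
import numpy as np

T = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # rank 1: T has a 2-dimensional kernel
w = np.array([6.0, 12.0])           # w is in the image of T (w = T @ [1, 1, 1])

v0, *_ = np.linalg.lstsq(T, w, rcond=None)   # a particular solution: T v0 = w
print(np.allclose(T @ v0, w))                # True

# An orthonormal basis of ker T from the SVD (rows of Vt beyond the rank):
_, s, Vt = np.linalg.svd(T)
kernel = Vt[(s > 1e-12).sum():]              # here: 2 basis vectors

# v0 + (any linear combination of kernel vectors) is again a solution:
v = v0 + 2.0 * kernel[0] - 1.5 * kernel[1]
print(np.allclose(T @ v, w))                 # True
```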
We can now make the following statements about solutions of the linear equation $Tv = w$:
The equation has a solution if and only if $w \in \operatorname{im} T$.
If the equation has a solution, this solution is unique if and only if $\ker T = \{0_{\mathbf{V}}\}$.
If the equation has a solution but the solution is not unique, then the equation has infinitely many solutions.
Exercise 1: Arrange the above information into a tree diagram which shows the relationship between the various cases.
In Lecture 12, we will think in more detail about the structure of the vector space $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ in the case that $\mathbf{V}$ and $\mathbf{W}$ are finite-dimensional. In the finite-dimensional case, we shall see that everything we might want to know can be expressed in the language of matrices.