## Math 262A: Lecture 14

Let us illustrate the diagrammatic expansion of integrals from Lecture 13 in a specific case: we will bring things full circle and look at “Stirling’s quantum field theory.” Going back to the very beginning, start with the Euler integral

$N! = \int_0^\infty t^Ne^{-t} \mathrm{d}t.$

Now make the change of variables $t=Ne^x$ to get

$N! = \int\limits_{\mathbb{R}} (Ne^x)^N e^{-Ne^x} \mathrm{d}x \\ =N^{N+1} \int\limits_\mathbb{R} e^{Nx-Ne^x} \mathrm{d}x \\ = e^{-N}N^N \int\limits_\mathbb{R} e^{N+Nx-Ne^x} \mathrm{d}x \\ =e^{-N}N^{N+1} \int\limits_\mathbb{R} e^{-N(\frac{x^2}{2}+\sum_{d=3}^\infty \frac{x^d}{d!})} \mathrm{d}x \\ =\sqrt{\frac{2\pi}{N}}e^{-N}N^{N+1} \int\limits_\mathbb{R} e^{-N(\frac{x^2}{2}+\sum_{d=3}^\infty \frac{x^d}{d!})} \frac{\mathrm{d}x}{\sqrt{\frac{2\pi}{N}}},$

and we’re in position to apply our diagrammatic $N\to \infty$ expansion of the remaining integral.

The nice thing about the diagrammatic expansion is that most of it is “universal,” and it is not until the last step that we have to worry about what the smooth function in the exponent of the integrand actually is. In particular, from Lecture 13, we have that the coefficient of $N^{-1}$ is the sum of the following three numbers

$\frac{1}{|\mathrm{Aut}\Gamma_1|}(-1)^1p_4 + \frac{1}{|\mathrm{Aut}\Gamma_2|}(-1)^2p_3^2+\frac{1}{|\mathrm{Aut}\Gamma_3|}(-1)^2p_4,$

where $\Gamma_1,\Gamma_2,\Gamma_3$ are the three graphs shown at the end of Lecture 13, in the same order listed there. These are precisely the graphs of cyclomatic number $2$ in which all vertices have degree at most $3,$ the exponent of $(-1)$ is the number of vertices, and the monomials $p_3^2,p_3^2,p_4$ encode the sequence of vertex degrees. Now we just have to calculate the order of the automorphism groups of these graphs: we obtain

$|\mathrm{Aut} \Gamma_1| = 8,\ |\mathrm{Aut} \Gamma_2| = 8,\ |\mathrm{Aut} \Gamma_3|=12,$

so that the coefficient of $N^{-1}$ in the asymptotic expansion of the integral is

$\frac{1}{8}(-1)^1p_4 + \frac{1}{8}(-1)^2p_3^2+\frac{1}{12}(-1)^2p_4.$

Note that this has absolutely no dependence on what the integral actually is, so far: this dependence only enters at the last stage, where we plug in the values of the $p$-coefficients, which for the integral in Stirling’s QFT are all identically equal to $1.$ So the first two terms in the sum above cancel, leaving us with the conclusion that

$N! = \sqrt{2\pi} e^{-N}N^{N+\frac{1}{2}}(1+\frac{1}{12N} + o(\frac{1}{N^2}))$

as $N \to \infty.$ Now to get the next order in the asymptotic expansion, the coefficient of $N^{-2},$ you have to write out all graphs of cyclomatic number $3$ with vertices of degree at least $3,$ calculate the order of their automorphism groups, and evaluate as above. I am going to come back later and do this for you, but in the meantime you may want to try it yourself. It’s pretty tedious, but instructive. Actually, the fact that it’s tedious is an important takeaway: you can simplify your life greatly if you are willing to settle for the asymptotic expansion of $\log N!,$ since then only connected graphs will enter into the calculation of the coefficients in the asymptotic expansion, and there are a lot less of those.

For a more involved version of the same procedure, where the “graphs” involved are of a quite different nature, you can check out my lecture yesterday in the UC Berkeley combinatorics seminar. Don’t worry if you don’t understand much; we are going to look at matrix integrals together very soon.

## Math 262A: Lecture 11

We continue with the material from Lecture 10. Let $S$ be a smooth function on $\mathbb{R}$ of the form

$S(x) = \frac{x^2}{2} + V(x),$

where $V$ is itself a smooth function with Taylor expansion of the form

$V(x) = \sum_{d=3}^\infty t_d \frac{x^d}{d},$

where $t_3,t_4,t_3,\dots$ are real numbers such that the corresponding path integral

$\mathcal{Z} = \int\limits_{\mathbb{R}} e^{-\frac{1}{\hbar}S(x)} \mathrm{d}x$

converges. The function $V$ is called the “potential,” and we view the action $S(x)$ as a deformation of $S_0(x) = \frac{x^2}{2}$ which corresponds to choosing all $t_i=0$ so that the potential $V=0$ and we get the exact evaluation

$\mathcal{Z}_0 = \int\limits_{\mathbb{R}} e^{-\frac{1}{\hbar}S(x)} \mathrm{d}x = \sqrt{2\pi \hbar}.$

The by the Laplace principle we know that we have an $\hbar \to 0$ asymptotic expansion

$\frac{1}{\mathcal{Z}_0} \int\limits_{\mathbb{R}} e^{-\frac{1}{\hbar}S(x)} \mathrm{d}x \sim \sum\limits_{k=0}^\infty a_k \hbar^k,$

and we want to calculate the coefficients $a_k$, which are functions of the parameters $t_i$.

Taking this all back to Lecture 1, if we set $\hbar = 1/N$ and take the action

$S(x) = x-\log(x) = \frac{x^2}{2} - \frac{x^3}{3} + \frac{x^4}{4}-\dots,$

then we are looking at “Stirling’s QFT,” in which the potential $V(x)$ has coefficients $t_d=(-1)^d$. The result of our computation should then be a some sort of understanding of the subleading corrections $a_k$ in Stirling’s approximation,

$N! \sim \sqrt{2\pi}\frac{N^{N+\frac{1}{2}}}{e^N}(1+\frac{a_1}{N} + \frac{a_2}{N^2} + \dots),$

which are the “quantum corrections” in Stirling’s QFT, whatever that means.

To begin, let’s not yet think about the integral $\mathcal{Z}_N$, and focus solely on the integrand

$e^{-\frac{1}{\hbar}(\frac{x^2}{2} + V(x))} = e^{-\frac{x^2}{2\hbar}}e^{-\frac{1}{\hbar}\sum_{d \geq 3} t_d \frac{x^d}{d}}.$

Let us focus even more narrowly on the non-Gaussian part $e^{-\frac{1}{\hbar}V(x)}$ of the integrand, and apply the Exponential Formula as in Lecture 3. First, we write

$V(x) = \sum\limits_{d=1}^\infty t_d\frac{x^d}{d} = \sum\limits_{d=1}^\infty t_d(d-1)!\frac{x^d}{d!},$

where $t_1=t_2=0$. Then we have that the coefficients $u_d$ in the expansion

$e^{-\frac{1}{\hbar}V(x)} = 1 + \sum_{d=1}^\infty u_d \frac{x^d}{d!}$

are given by

$u_d = \sum\limits_{P \in \mathrm{Par}(d)} \prod\limits_{B \in P} (-\frac{1}{\hbar})t_{|B|}(|B|-1)!,$

where the sum is over all partitions $P$ of $\{1,\dots,d\}$, and the product is over the blocks $B$ of the partition $P$. Since $(|B|-1)!$ is the number of ways to cyclically order the subset $B \subseteq \{1,\dots,d\}$, we have that

$u_d = \sum\limits_{\sigma \in \mathrm{S}(d)} \left(-\frac{1}{\hbar} \right)^{c(\sigma)} t_1^{c_1(\sigma)} \dots t_d^{c_d(\sigma)},$

where $c_k(\sigma)$ denotes the number of $k$-cycles in the disjoint cycle decomposition of the permutation $\sigma \in \mathrm{S}(d)$, and $c(\sigma)=c_1(\sigma)+\dots+c_d(\sigma)$ is the total number of cycles. Thus the polynomial $u_d=u_d(t_1,\dots,t_d)$ is a generating function for the permutations in $\mathrm{S}(d)$ which keeps track of cycle type; this polynomial is called the “augmented cycle index polynomial” of $\mathrm{S}(d)$. The actual (i.e. non-augmented) cycle index polynomial is $\frac{1}{d!} u_d$, and it is the basic tool in Polya’s theory of enumeration under group action. Equivalently, we can write the modified cycle index as

$u_d = \sum\limits_{\mu \vdash d} |C_\mu| \left(-\frac{1}{\hbar} \right)^{\ell(\mu)}\prod\limits_{i=1}^{\ell(\mu)} t_{\mu_i},$

where the sum is over Young diagrams with $d$ cells and $C_\mu \subseteq \mathrm{S}(d)$ is the conjugacy class of permutations of cycle type $\mu$.

Let’s integrate. We write

$\frac{\mathcal{Z}}{\mathcal{Z}_0} = \frac{1}{\mathcal{Z}_0} \int\limits_{\mathbb{R}} e^{-\frac{1}{\hbar}(\frac{x^2}{2} + V(x))} \mathrm{d}x = 1+\sum\limits_{d=1}^\infty \frac{u_d}{d!} \int\limits_{\mathbb{R}} x^d e^{-\frac{x^2}{2\hbar}} \frac{\mathrm{d}x}{\mathcal{Z}_0}.$

We know how to compute moments of the Gaussian distribution: we recall from Lecture 4 that

$\int\limits_{\mathbb{R}} x^d e^{-\frac{x^2}{2\hbar}} \frac{\mathrm{d}x}{\mathcal{Z}_0} = \hbar^{\frac{d}{2}} (d-1)!!,$

where $(d-1)!!$ is the number of fixed point free involutions in $\mathrm{S}(d)$, i.e. permutations which factor into disjoint cycles of order $2$ (so $(d-1)!!$ is zero if $d$ is odd). We thus have the series

$\frac{\mathcal{Z}}{\mathcal{Z}_0} = 1 + \sum\limits_{d=1}^\infty \frac{u_d}{d!} \hbar^{\frac{d}{2}} (d-1)!! = 1 + \sum\limits_{d=1}^\infty \frac{1}{d!} \left( \sum\limits_{\mu \vdash d} (-1)^{\ell(\mu)}(d-1)!!|C_\mu| \hbar^{\frac{d}{2}-\ell(\mu)} \right)t_\mu,$

where $t_\mu = \prod_{i=1}^{\ell(\mu)} t_{\mu_i}$, and the number $(d-1)!!|C_\mu|$ is equal to the number of pairs $(\alpha,\varphi)$ in which $\alpha \in \mathrm{S}(d)$ is a fixed point free involution, and $\varphi \in C_\mu$ is a permutation in $\mathrm{S}(d)$ of cycle type $\mu$. Such pairs of permutations are special objects, and they have a special name.

Definition 1: A pair of permutations $(\alpha,\varphi) \in \mathrm{S}(d) \times \mathrm{S}(d)$ such that $\alpha$ is a fixed point free involution and all cycles of $\varphi$ have length at least $3$ is called a combinatorial map.

The use of the word “map” in Definition 1 may seem strange, since you are probably more accustomed to seeing it appear in the following definition.

Definition 2: A topological map consists of a closed oriented surface $\mathbf{X}$ together with a set $P$ of distinct points on $\mathbf{X}$ and a set $C$ of curves on $\mathbf{X}$, each of which joins a point of $P$ to a point of $P$ in such a way that the curves in $C$ intersect only at points of $P$, and moreover if $P$ and $C$ are deleted from $\mathbf{X}$ then what remains are disjoint open sets, called “faces,” each of which is homeomorphic to a disc.

Topological maps are basic objects in graph theory: for example, the famous four color theorem asserts that in order to color the faces of a map on the sphere in such a way that faces which are bounded by a common curve have different colors, it is necessary and sufficient to have four colors. Indeed there is a clear connection between graphs and topological maps, in that starting from a graph $\mathrm{G}=(V,E)$ we can try to embed it on a surface $\mathbf{X}$ by choosing a set of points $P$ on $\mathbf{X}$ in bijection with $V$, and a set of curves $C$ on $\mathbf{X}$ in bijection with $E$, such that the requirements to be a topological map are satisfied (this may not be possible if the connectivity of $\mathrm{G}$ is high and the genus of $X$ is low, and the minimal genus of $\mathbf{X}$ such that this can be done is called the genus of $\mathrm{G}$.

The connection between combinatorial maps and topological maps is not quite as obvious, but it is still fairly straightforward. The construction is quite simple and natural. First, draw every cycle of $\varphi$ as a polygon, with the elements permuted by the cycle being taken as labeling the edges of the polygon, and the edges being directed counterclockwise around the polygon. Now, identify the edges of these polygons in pairs according to the cycles of the fixed point free involution $\alpha$, ensuring that when we identify two edges they are glued with opposition orientations, i.e. forming a two-way street. This procedure produces from the pair $(\alpha,\varphi)$ a closed oriented surface $\mathbf{X},$ together with a collection of points on the surface connected by curves meeting only at those points — a topological map. The points $P$ are images of the former vertices of the polygons after having been identified, and its curves are former edges of the polygons after having been identified; the faces of the map are the interiors of the polygons.

As a small example of the above construction, let us consider combinatorial maps of degree $d=4$, i.e. pairs $(\alpha,\varphi) \in \mathrm{S}(4) \times \mathrm{S}(4)$ such that $\alpha = (**)(**)$ is a product of $2$-cyles and $\varphi=(****)$ is a $4$-cycle. There are thus $(4-1)!! = 3 \cdot 1$ possibilities for $\alpha$ and $(4-1)!=3\cdot 2 \cdot 1$ possibilities for $\varphi$, for a total of $18$ combinatorial maps of degree $4$.

We consider those combinatorial maps as above with $\varphi=(1\ 2\ 3\ 4)$, the full forward cycle in $\mathrm{S}(4)$. These are the following:

$A = \left[(1\ 2)(3\ 4), (1\ 2\ 3\ 4) \right],\ B=\left[(1\ 3)(2\ 4), (1\ 2\ 3\ 4) \right],\ C=\left[(1\ 4)(2\ 3), (1\ 2\ 3\ 4) \right].$

Now we depict the corresponding gluing construction as described above:

So, the three different combinatorial maps described above gave us only two different topological maps: the combinatorial maps $A$ and $C$ produced the same topological map. Thus, the correspondence between combinatorial maps and topological maps is many-to-one (as is the correspondence between graphs and topological maps). But the general picture that is emerging is that the asymptotic expansion of $\frac{\mathcal{Z}}{\mathcal{Z}_0}$ is a generating function for topological maps, with face degrees determined by the Maclaurin series of the potential $V$. That is, topological maps are the Feynman diagrams of zero-dimensional scalar-valued quantum field theory. We will continue to develop this connection next week.

## Math 262A: Lecture 10

In this lecture we will actually start down the path to Feynman diagrams; how far down this path we will progress remains to be seen.

Feynman diagrams are combinatorial objects closely associated with an area of theoretical physics called quantum field theory, and so we will begin with a nesessarily brief and sketchy discussion of what QFT is. The truth is that nobody knows what quantum field theory “is” in the sense of a mathematical definition, but I can at least try to give you a sense of the mathematical structures that are involved, and the sorts of computations that physicists perform.

Let $X$ be a manifold, representing a “universe” which we want to understand. We consider functions on $X$ as measurements on this universe, or “observables.” We must say what the outputs of these observations: they may be numbers, or vectors, or operators, or something else. In other words, we want to specify a “target” $Y$ for observations $f \colon X \to Y$, and typically $Y$ is some other manifold. So the space of observables is the function space $\mathrm{Fun}(X,Y)$. For example, one could take $X=\mathbb{R}^4$, a manifold which is supposed to model our physical experience of “spacetime,” and $Y=\mathbb{R}$, meaning that our observables are numerical measurements on spacetime, which sounds reasonable and not at all problematic.

The next step is to define a “field,” which a measure on $\mathrm{Fun}(X,Y)$, the space of all functions $f \colon X \to Y$. Physicists would like to prescribe this measure by giving its density against the background flat Lebesgue measure, and this density is supposed to be of the form

$e^{-\frac{1}{\hbar} S(f)},$

where $\hbar > 0$ is called the “semiclassical parameter” and $S \colon \mathrm{Fun}(X,Y) \to \mathbb{R}$ is a functional called the “action.” The total flatness and uniformity of the Lebesgue measure represents entropy, while the density above represents energy in the sense that it leads to measure concentrating near the minimizers of the action, much as we saw when we initially discussed the Laplace method, and as is familiar from the setting of statistical mechanics.

Once we have prescribed the spaces $X$ and $Y$ and the action $S \colon \mathrm{Fun}(X,Y) \to \mathbb{R}$, we are looking at something which is more or less what physicists call a QFT, and we start asking questions about it. One of the main things that physicists would like to do is to evaluate the integral

$\mathcal{Z} = \int\limits_{\mathrm{Fun}(X,Y)} e^{-\frac{1}{\hbar}S(f)}\mathrm{d}f,$

which is called the “partition function” or “path integral.” It is not expected that this can be done exactly in most cases; when $S=S_0$ is such that an exact evaluation of the corresponding $\mathcal{Z}=\mathcal{Z}_0$ is possible, we are looking at what is called a “free theory.” The goal then is to select a new action $S$ which is a “perturbation” of $S_0$, i.e. $S$ is close to $S_0$ in some sense. One then tries to estimate the new path integral $\mathcal{Z}$ corresponding to $S$ in the limit $\hbar \to 0$. What is supposed to happen is that in this so-called “semiclassical limit,” there emerges an approximation of the form

$\mathcal{Z} \sim \mathcal{Z}_0(1+a_1\hbar + a_2\hbar^2 + a_3\hbar^3+\dots).$

The terms $a_k\hbar^k$ are called “quantum terms,” and the sum of all these terms is typically a divergent series, i.e. it has radius of convergence zero. So what this meant by the above is that there is that $\mathcal{Z}$ admits a complete asymptotic expansion: we have

$\mathcal{Z} = \mathcal{Z}_0(1 + a_1\hbar + \dots +a_k\hbar^k + o(\hbar^k))$

as $\hbar \to 0$ for any nonnegative integer $k$. Now the question that what physicists want to address is how to compute the quantum coefficients $a_k$. The idea is that these coefficients should all be computable in terms of the “correlation functions” of the free theory, meaning integrals of the form

$\int\limits_{\mathrm{Fun}(X,Y)} A(f) e^{-\frac{1}{\hbar}S_0(f)} \mathrm{d}f.$

The computation of these correlation functions is typically very intricate, and Feynman diagrams are a graphical bookkeeping device introduced by Feynman in order to help organize such computations. The theory is very well developed from a physical perspective, and is in regular, constant use in contemporary theoretical physics.

The problem is that none of the above makes sense mathematically: even when one makes very reasonable, realistic choices like $X=\mathbb{R}^4$ and $Y=\mathbb{R},$ the corresponding space of functions $\mathrm{Fun}(X,Y)$ is infinite-dimensional, and hence does not admit a Lebesgue measure. This means that the definition of the field as a measure which is absolutely continuous with respect to Lebesgue measure makes no sense, and the path integral

$\mathcal{Z} = \int\limits_{\mathrm{Fun}(X,Y)} e^{-\frac{1}{\hbar}S(f)}\mathrm{d}f$

is an ill-defined functional integral.

This leaves us with two options. One is to try make the above general construction rigorous by developing some kind of theory of functional integration that makes mathematical sense. This is an extremely interesting undertaking, but not one that we will discuss. The second option is to choose $X,Y$ in such a way that $\mathrm{Fun}(X,Y)$ is finite-dimensional. The only way this can happen is if one of $X,Y$ is there zero-dimensional manifold $\{\bullet\}$ consisting of a single point. Choosing $Y=\{\bullet\}$ is not advisable, since then $\mathrm{Fun}(X,Y)$ is also zero-dimensional, which leads to a trivial theory. However, if we choose $X=\{\bullet\},$ then $\mathrm{Fun}(X,Y) \simeq Y$, and we have something non-trivial, namely a toy model aptly named zero-dimensional quantum field theory.

What we are going to do now is develop the zero-dimensional quantum field theory corresponding to the data

$X=\{\bullet\},$ $Y=\mathbb{R},$ $S=\text{ a smooth function on }\mathbb{R}.$

The path integral corresponding this quantum field theory is

$\mathcal{Z} = \int\limits_{\mathbb{R}} e^{-\frac{1}{\hbar} S(x)} \mathrm{d}x,$

which is non-problematic provided we choose $S$ such that the integral converges. We have even identified previously an action which gives us a free theory: if we choose

$S_0(x) = \frac{x^2}{2},$

the corresponding path integral evaluation is Lord Kelvin’s favorite formula:

$\mathcal{Z}_0 = \sqrt{2\pi \hbar}.$

And the next part of the construction, the semiclassical limit of the path integral corresponding to a more general action $S$, is just the classical Laplace method, which we now state formally.

Theorem 1: Let $[a,b] \subseteq \mathbb{R}$ be a (possibly infinite) interval, and let $S \colon [a,b] \to \mathbb{R}$ be a smooth function that attains a global minimum at a unique point $c \in (a,b)$. Then

$\int\limits_{[a,b]} e^{-\frac{1}{\hbar}S(x)} \mathrm{d}x = e^{-\frac{S(c)}{\hbar}} \sqrt{\frac{2\pi\hbar}{S''(c)}}A(\hbar),$

where $A$ is a smooth function on $[0,\infty)$ such that $A(0)=1.$

Theorem 1 tells us that, for any nonnegative integer $k$, we have

$\int\limits_{[a,b]} e^{-\frac{1}{\hbar}S(x)} \mathrm{d}x = e^{-\frac{S(c)}{\hbar}} \sqrt{\frac{2\pi\hbar}{S''(c)}}(1+a_1\hbar + \dots +a_k\hbar^k + o(\hbar^k)),$

where the sum in the brackets is just the $k$th Maclaurin polynomial and the error term follows from Taylor’s theorem (it is important to note that $A$ is smooth, but not necessarily analytic, i.e. its Maclaurin series may be divergent, and even if it is convergent it need not sum to $A(\hbar)$). So we are in a position where we have the perfectly well-defined problem of computing the “quantum corrections” $a_k$. This will be our goal in the next several lectures, and we will see that the Feynman diagrams which accompany this problem are classical combinatorial objects, namely maps on surfaces.

## Math 262A: Lecture 3

In this lecture we continue our long march towards Stirling’s formula. Let me begin by saying that, while we seem to be spending an absurd amount of time on this, Stirling’s approximation is being used as a case study in something much more general, Laplace’s method, and we will understand the general thing almost automatically once we understand the specific thing.

$N! = \frac{N^{N+1}}{e^N}\int_{-1}^\infty e^{-N(w-\log(1+w))}\mathrm{d}w.$

In Lecture 2, we performed some elementary estimates which showed that, for all sufficiently small $\varepsilon >0,$ we have

$\int_{-1}^\infty e^{-N(w-\log(1+w))}\mathrm{d}w = \int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w + O(e^{-c(\varepsilon)N}),$

where $c(\varepsilon)$ is a positive number depending on $\varepsilon$ but not on $N.$ This quantifies the intuition that the bulk contribution to the integral comes from a small neighborhood of the minimum of the action.

We now come to the problem of estimating this bulk contribution

$\int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w.$

The action

$S(w) = w-\log(1+w)$

is holomorphic on the open unit disc $\mathbb{D}_1 = \{w \in \mathbb{C} \colon |w|<1\},$ and its Maclaurin series is

$S(w) = \sum\limits_{d=2}^{\infty} (-1)^d\frac{w^d}{d},$

this series converging absolutely and uniformly on compact subsets of $\mathbb{D}_1$ to the correct value $S(w).$ In Lecture 2, we observed that if we throw away all terms in the series expansion of $S(w)$ beyond the quadratic term, our integral $\int_{-\varepsilon}^{+\varepsilon}$-integral becomes the truncated Gaussian integral

$\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}\mathrm{d}w,$

which can then be replaced by a total Gaussian integral at the cost of introducing an exponentially small error,

$\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}\mathrm{d}w =\int_{-\infty}^{+\infty} e^{-\frac{N}{2}w^2}\mathrm{d}w + O(e^{-c(\varepsilon)N}),$

this being beneficial because the total Gaussian integral can be exactly evaluated,

$\int_{-\infty}^{+\infty} e^{-\frac{N}{2}w^2}\mathrm{d}w = \sqrt{\frac{2\pi}{N}}.$

Thus, the remaining question is to quantify the error incurred by replacing

$\int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w \quad\text{with}\quad \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}\mathrm{d}w.$

We now want quantify the error incurred by throwing away all terms of the action $S$ beyond its dominant (quadratic) term. For small $w,$ we naturally expect this error to be a small number which decays as $N$ increases, but this decay turns out to be much slower than exponential and represents the main source of error in the approximation scheme we’re building.

To get started, we write

$\int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w = \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}}\mathrm{d}w.$

Next, we Maclaurin-expand the non-Gaussian part of the integrand, writing

$e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}} = \sum\limits_{d=0}^\infty m_d(N) \frac{w^d}{d!},$

where $a_d(N)$ is the $d$th derivative of $e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}}$ at $w=0.$ Assuming $\varepsilon < 1,$ this series converges uniformly on $\overline{\mathbb{D}}_\varepsilon = \{w \in \mathbb{C} \colon |w| \leq \varepsilon\};$ indeed, it is the Maclaurin series of the function $S(w)-\frac{w^2}{2},$ and we aren’t changing the location of singularities by subtracting an entire function. We can thus interchange the order of summation and integration, thereby obtaining

$\int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w = \sum\limits_{d=0}^\infty \frac{m_d(N)}{d!} \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}w^d \mathrm{d}w.$

We have thus reduced the initial problem to two problems.

Problem 1: Compute the coefficients $m_d(N).$

Problem 2: Compute the integrals $\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}w^d \mathrm{d}w.$

We will solve Problem 1 today, and Problem 2 on Thursday (in case you’d like to think ahead).

Problem 1 is pure algebraic combinatorics, being a question about composition in the algebra of formal power series $\mathbb{C}[[z]].$ This question can be phrased in greater generality, as follows. Let

$E(z) = \sum\limits_{d=0}^\infty e_d \frac{z^d}{d!} \quad\text{ and }\quad C(z) = \sum\limits_{d=1}^\infty c_d \frac{z^d}{d!}$

be elements of $\mathbb{C}[[z]]$ such that the constant term of $C(z)$ is zero. Notice that we are writing these series as if their coefficients were the derivatives of analytic functions $E(z)$ and $C(z)$ at $z=0,$ but we are not actually imposing any convergence conditions on them. Series of this form are often referred to as exponential generating functions. The composition

$E(C(z)) = \sum\limits_{d=0}^\infty e_d \frac{C(z)^d}{d!} = e_0 + e_1\frac{1}{1!}\left(c_1\frac{z^1}{1!} + c_2\frac{z^2}{2!} + \dots \right)^1 + e_2\frac{1}{2!}\left( c_1\frac{z^1}{1!} + c_2\frac{z^2}{2!} + \dots\right)^2 + \dots$

is well-defined. Let $M(z) := E(C(z)),$ and expand

$M(z) = \sum\limits_{d=0}^\infty m_d \frac{z^d}{d!}.$

The problem is to understand the coefficients of $M(z)$ in terms of the coefficients of $E(z)$ and $C(z).$ This problem was posed and solved by Hurwitz in the nineteenth century in connection with what is now a classical problem in enumerative geometry, namely the question of counting branched covers of the Riemann sphere $\mathbf{P}^1(\mathbb{C})$ with specified ramification data (I hope to touch on this problem more extensively later in the course).

Exercise 1: Compute the first few coefficients of $M(z)$ by hand.

Exercise 2: Does it make sense to do this computation using the chain rule? Why or why not? Do you get correct formulas by this method?

Let’s consider first an easier problem, which well serve as a stepping stone. Consider two series

$A(z) = \sum\limits_{d=0}^\infty a_d \frac{z^d}{d!} \quad\text{ and }\quad B(z) = \sum\limits_{d=0}^\infty b_d \frac{z^d}{d!},$

and let

$P(z) = A(z)B(z) = \sum\limits_{d=0}^\infty p_d \frac{z^d}{d!}$

be their product. It is then immediate that

$p_d = \sum\limits_{k=0}^d {d \choose k} a_k b_{n-k}.$

This algebraic expression also has combinatorial meaning, which becomes clear if we sacrifice its elegance and rewrite it in the garish form

$p_d = \sum\limits_{\substack{ (S_A,S_B) \\ S_A,S_B \subseteq [d] \\ S_A \cap S_B = \{\}}} a_{|S_A|}b_{|S_B|}.$

The enumerative meaning of this expression is the following. Suppose that $a_d$ is the number of combinatorial structures of Type A which can be built on $[d]= \{1,\dots,d\},$ and similarly $b_d$ is the number of combinatorial structures of Type B which can be built on $[d].$ Then, $p_d$ is the number of ways to build a hybrid combinatorial structure of Type AB on $[d]$ by choosing disjoint subsets $S_A,S_B \subseteq \{1,\dots,d\}$ and building a type A structure on $S_A$ and a Type B structure on $S_B.$ This can be iterated: the product of $k$ exponential generating functions $A_1(z),A_2(z),\dots,A_k(z)$ is an exponential generating function $P(z)$ whose coefficients $p_d$ count the number of ways to select a list $(S_{1},\dots,S_{k})$ of disjoin subsets of $[d]$ whose union is $[d],$ and then build a Type A_j structure on $S_j$ for each $1 \leq j \leq k.$

Exercise 3: Read this (easy), this (harder), or this (hardest).

We can now state the main theorem on composition of exponential generating functions, $M(z)=E(C(z)).$ A partition of $[d]$ is a set $\pi$ of disjoint nonempty subsets of $[d]$ whose union is $d.$ The elements $B$ of $\pi$ are called its blocks, and we denote by $b(\pi)$ the number of blocks in $\pi.$ Let $\mathrm{P}(d)$ denote the set of all partitions of $[d].$

Theorem (The Compositional Formula): With notation as above, we have

$m_d = \sum\limits_{\pi \in \mathrm{P}(d)} e_{b(\pi)} \prod\limits_{B \in \pi} c_{|B|}.$

Proof: Write

$M(z) = \sum\limits_{k=0}^\infty e_k \frac{1}{k!} C(z)^k,$

use the combinatorial expansion $C(z)^k,$ and reorganize the sum.

— Q.E.D.

Corollary (The Exponential Formula): If $E(z) = e^z,$ then

$m_d = \sum\limits_{\pi \in \mathrm{P}(d)} \prod\limits_{B \in \pi} c_{|B|}.$

The combinatorial meaning of the Exponential Formula is the following: if $C(z)$ is the exponential generating function for a class of “connected” combinatorial structures The Exponential Formula gets used all over the place — probabilists call it the moment-cumulant formula, while physicists often call it the linked cluster expansion and invoke it using colorful phrases like “connected vacuum bubbles exponentiate.” A basic example is the following: let $m_d$ be the number of simple graphs with vertex set $[d],$ let $c_d$ be the number of connected simply graphs with vertex set $[d],$ and form the generating functions

$M(z) = 1 + \sum\limits_{d=1}^\infty m_d\frac{z^d}{d!} \quad\text{ and }\quad C(z) = \sum\limits_{d=1}^\infty c_d \frac{z^d}{d!}.$

Then $M(z) = e^{C(z)},$ and the sequences $(m_d)_{d=1}^\infty$ and $(c_d)_{d=1}^\infty$ are related as in the Exponential Formula.

Now we return to Problem 1 above, which is to compute the coefficients $m_d(N)$ in the expansion

$e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}} = \sum\limits_{d=0}^\infty m_d(N) \frac{w^d}{d!}.$

We do this using the Exponential Formula: we have

$e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}} = e^{C(w)},$

where

$C(w) = \sum\limits_{d=3}^\infty (-N)(-1)^d(d-1)! \frac{w^d}{d!}.$

The Exponential Formula now tells us that the coefficients $m_d(N)$ we seek are given by

$m_d(N) = \sum\limits_{\pi \in \mathrm{P}(d)} \prod\limits_{B \in \pi} (-N)(-1)^{|B|} \delta_{|B| \geq 3}(|B|-1)!.$

Thus, we have solved Problem 1: we see that

$m_d(N) = (-1)^d \sum\limits_{k=1}^{d} (-N)^k t_k(d),$

with $t_k(d)$ the number of permutations of $1,\dots,d$ consisting of $k$ cycles, each of length at least $3.$

Exercise 4: Do the same computation instead using the Compositional Formula with $E(z) = e^{-Nz}.$

## Math 31AH: Lecture 19

Let $\mathbf{V}$ be an $n$-dimensional vector space equipped with a scalar product $\langle \cdot,\cdot \rangle.$ Recall from Lecture 16 that an operator $A \in \mathrm{End}\mathbf{V}$ is said to be selfadjoint (or symmetric) if

$\langle \mathbf{v},A\mathbf{w} \rangle = \langle A\mathbf{v},\mathbf{w}\rangle \quad \forall\ \mathbf{v},\mathbf{w} \in \mathbf{V}.$

Also recall from Lecture 18 that $A \in \mathrm{End}\mathbf{V}$ is said to be semisimple if there exists a basis of $\mathbf{V}$ consisting of eigenvectors of $A.$ The goal of this lecture is to prove the following cornerstone result in linear algebra.

Theorem 1 (Spectral Theorem for selfadjoint operators): If $A \in \mathrm{End}\mathbf{V}$ is selfadjoint, then it is semisimple.

The proof of this important theorem occupies the remainder of this lecture. It is a constructive argument that builds an eigenbasis for $A$ one vector at a time. A nice feature of the construction is that the eigenbasis it outputs is an orthonormal basis of $\mathbf{V}.$

Let us begin with an important observation on a special subspecies of selfadjoint operators.

Definition 1: A selfadjoint operator $B \in \mathrm{End}\mathbf{V}$ is said to be nonnegative if the associated quadratic form is nonnegative, i.e. if the function defined by

$Q_B(\mathbf{v}) := \langle \mathbf{v},B \mathbf{v} \rangle, \quad v \in \mathbf{V},$

satisfies $Q_B(\mathbf{v}) \geq 0$ for all $\mathbf{v} \in \mathbf{V}.$

Any nonnegative selfadjoint operator $B$ has the property that membership in its kernel is certified by vanishing of $Q_B.$

Lemma 1: If $B \in \mathrm{End}\mathbf{V}$ is a nonnegative selfadjoint operator, then $\mathbf{v} \in \mathrm{Ker} B$ if and only if $Q_B(\mathbf{v})=0.$

Proof: One direction of this equivalence is obvious: if $\mathbf{v} \in \mathrm{Ker}B,$ then

$Q_B(\mathbf{v}) = \langle \mathbf{v},B\mathbf{v}\rangle = \langle \mathbf{v},\mathbf{0} \rangle = 0.$

The proof of the converse statement is similar to the proof of the Cauchy-Schwarz inequality. More precisely, suppose that $Q_B(\mathbf{v})=0,$ and let $t \in \mathbb{R}$ be any number and let $\mathbf{w} \in \mathbf{V}$ be an arbitrary vector. We have

$Q_B(\mathbf{v}+t\mathbf{w}) = \langle \mathbf{v}+t\mathbf{w},B\mathbf{v}+tB\mathbf{w}\rangle \\= \langle \mathbf{v},B\mathbf{v} \rangle + \langle \mathbf{v},tB\mathbf{w} \rangle + \langle t\mathbf{w},B\mathbf{v} \rangle + \langle t\mathbf{w},tB\mathbf{w} \rangle.$

Using the definition of $Q_B$ together with the fact that $B$ is selfadjoint, this simplifies to

$Q_B(\mathbf{v}+t\mathbf{w}) = Q_B(\mathbf{v}) + 2t\langle B\mathbf{v},\mathbf{w} \rangle + t^2Q_B(\mathbf{w}),$

and since $Q_B(\mathbf{v})=0$ this further simplifies to

$Q_B(\mathbf{v}+t\mathbf{w}) = 2t\langle B\mathbf{v},\mathbf{w} \rangle + t^2Q_B(\mathbf{w}).$

Now, as a function of $t \in \mathbb{R}$ the righthand side of this equation is a parabola, and since $Q_B(\mathbf{w}) \geq 0$ this parabola is upward=opening. Moreover, since the lefthand side satisfies $Q_B(\mathbf{v}+t\mathbf{w}) \geq 0,$ the lowest point of this parabola cannot lie below the line $t=0,$ and this forces

$\langle B\mathbf{v},\mathbf{w} \rangle = 0.$

But the vector $\mathbf{w}$ was chosen arbitrarily, so the above equation holds for any $\mathbf{w} \in \mathbf{V},$ in particular $\mathbf{w}=B\mathbf{v}.$ We thus have

$\langle B\mathbf{v},B\mathbf{v}\rangle = \|B\mathbf{v}\|^2=0,$

which means that $B\mathbf{v}=\mathbf{0},$ i.e. $\mathbf{v} \in \mathrm{Ker}B.$

— Q.E.D.

Now, let $A \in \mathrm{End}\mathbf{V}$ be any selfadjoint operator. We are going to use the Lemma just established to prove that $A$ admits an eigenvector $\mathbf{e}$; the argument even gives a description of the corresponding eigenvalue $\lambda.$

Consider the unit sphere in the Euclidean space $\mathbf{V},$ i.e. the set

$S(\mathbf{V}) = \{ \mathbf{v} \in \mathbf{V} \colon \|\mathbf{v}\|=1\}$

of all vectors of length $1.$ The quadratic form $Q_A(\mathbf{v}) = \langle \mathbf{v},A\mathbf{v}\rangle$ is a continuous function, and hence by the Extreme Value Theorem the minimum value of $Q_A$ on the sphere,

$\lambda = \min\limits_{\mathbf{v} \in S(\mathbf{V})} Q_A(\mathbf{v}),$

does indeed exist, and is moreover achieved at a vector $\mathbf{e} \in S(\mathbf{V})$ at which the minimum is achieved, i.e.

$Q_A(\mathbf{e})=\lambda.$

Theorem 2: The minimum $\lambda$ of $Q_A$ on the unit sphere is an eigenvalue of $A,$ and the minimizer $\mathbf{e}$ lies in the eigenspace $\mathbf{V}_\lambda.$

Proof: By definition of $\lambda$ as the minimum value of $Q_A,$ we have that

$\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda \quad \forall \mathbf{v} \in S(\mathbf{V}).$

Since $\mathbf{v},\mathbf{v} \rangle =1$ for any $\mathbf{v} \in S_1(\mathbf{V}),$ the above inequality can be rewritten as

$\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda\langle \mathbf{v},\mathbf{v} \rangle \quad \forall \mathbf{v} \in S_1(\mathbf{V}).$

But actually, this implies that

$\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda\langle \mathbf{v},\mathbf{v} \rangle \quad \forall \mathbf{v} \in \mathbf{V},$

since every vector in $\mathbf{V}$ is a nonnegative scalar multiple of a vector of unit length (make sure you understand this). We thus have that

$\langle \mathbf{v},(A-\lambda I)\mathbf{v} \rangle \geq 0 \quad \forall v \in \mathbf{V}.$

This says that the selfadjoint operator $B:= A-\lambda I$ is nonnegative. Moreover, we have that

$Q_B(\mathbf{e}) = \langle \mathbf{e},(A-\lambda I)\mathbf{e} \rangle = Q_A(\mathbf{e})-\lambda \langle \mathbf{e},\mathbf{e}\rangle = \lambda - \lambda = 0.$

Thus, by Lemma 1, we have that $\mathbf{e} \in \mathrm{Ker}(A-\lambda)I,$ meaning that

$(A-\lambda I)\mathbf{e} = \mathbf{0}.$

or equivalently

$A\mathbf{e} = \lambda \mathbf{e}.$

— Q.E.D.

Theorem 2 has established that an arbitrary selfadjoint operator $A$ has an eigenvector. However, this seems to be a long way from Theorem 1, which makes the much stronger assertion that $A$ has $n$ linearly independent eigenvectors. In fact, the distance from Theorem 2 to Theorem 1 is not so long as it may seem. To see why, we need to introduce one more very important concept.

Defintion 2: Let $T\in \mathrm{End}\mathbf{V}$ be a linear operator, and let $\mathbf{W}$ be a subspace of $\mathbf{V}.$ We say that $\mathbf{W}$ is invariant under $T$ if

$T\mathbf{w} \in \mathbf{W} \quad \forall\ \mathbf{w} \in \mathbf{W}.$

The meaning of this definition is that if $\mathbf{W}$ is invariant under $T,$ then $T$ may be considered as a linear operator on the smaller space $\mathbf{W},$ i.e. as an element of the algebra $\mathrm{End}\mathbf{W}.$

Let us adorn the eigenvalue/eigenvector pair produced by Theorem 2 with a subscript, writing this pair as $(\mathbf{e}_1,\lambda_1).$ Consider the orthogonal complement of the line spanned by $\mathbf{e}_1,$ i.e. the subspace of $\mathbf{V}$ given by

$\mathbf{V}_2 = \{ \mathbf{v} \in \mathbf{V} \colon \langle \mathbf{v},\mathbf{e}_1 \rangle = 0\}.$

Proposition 1: The subspace $\mathbf{V}_2$ is invariant under $A.$

Proof: We have to prove that if $\mathbf{v}$ is orthogonal to the eigenvector $\mathbf{e}_1$ of $A,$ then so is $A\mathbf{v}.$ This follows easily from the fact that $A$ is selfadjoint:

$\langle A\mathbf{v},\mathbf{e}_1 \rangle = \langle \mathbf{v},A\mathbf{e}_1 \rangle = \langle \mathbf{v},\lambda_1\mathbf{e}_1 \rangle = \lambda_1 \langle \mathbf{v},\mathbf{e}_1 \rangle=0.$

— Q.E.D.

The effect of Proposition 1 is that we may consider $A$ as a selfadjoint operator defined on the $(n-1)$-dimensional subspace $\mathbf{V}_2.$ But this means that we can simply apply Theorem 2 again, with $\mathbf{V}_2$ replacing $\mathbf{V}.$ We will then get a new eigenvector/eigenvalue pair $(\mathbf{e}_2,\lambda_1),$ where

$\lambda_2 = \min\limits_{\mathbf{v} \in S(\mathbf{V}_2} Q_A(\mathbf{v})$

is the minimum value of $Q_A$ on the unit sphere in the Euclidean space $\mathbf{V}_2,$ and $e_2 \in S_(\mathbf{V}_2)$ is a vector at which the minimum is achieved,

$Q_A(\mathbf{e}_2) = \lambda_2.$

By construction, $\mathbf{e}_2$ is a unit vector orthogonal to $\mathbf{e}_1,$ so that in particular $\{\mathbf{e}_1,\mathbf{e}_2\}$ is a linearly independent set in $\mathbf{V}.$ Moreover, we have that $\lambda_1 \leq \lambda_2,$ since $S(\mathbf{V}_2)$ is a subset of $S(\mathbf{V}_1).$

## Math 31AH: Lecture 18

Let $\mathbf{V}$ be a vector space, and let us consider the algebra $\mathrm{End}\mathbf{V}$ as a kind of ecosystem consisting of various life forms of varying complexity. We now move on to the portion of the course which is concerned with the taxonomy of linear operators — their classification and division into various particular classes.

The simplest organisms in the ecosystem $\mathrm{End}\mathbf{V}$ are operators which act by scaling every vector $\mathbf{v} \in \mathbf{V}$ by a fixed number $\lambda \in \mathbb{R}$; these are the single-celled organisms of the operator ecosystem.

Definition 1: An operator $A \in \mathrm{End}\mathbf{V}$ is said to be simple if there exists a scalar $\lambda \in \mathbb{R}$ such that

$A\mathbf{v}=\lambda \mathbf{v} \quad \forall\ \mathbf{v} \in \mathbf{V}.$

— Q.E.D.

Simple operators really are very simple, in the sense that they are no more complicated than numbers. Indeed, Definition 1 is equivalent to saying that $A=\lambda I,$ where $I \in \mathrm{End}\mathbf{V}$ is the identity operator, which plays the role of the number $1$ in the algebra $\mathrm{End}\mathbf{V},$ meaning that it is the multiplicative identity in this algebra. Simple operators are extremely easy to manipulate algebraically: if $A=\lambda I,$ then we have

$A^k = \underbrace{(\lambda I)(\lambda I) \dots (\lambda I)}_{k \text{ factors }} =\lambda^kI,$

for any nonnegative integer $k,$ and more generally if $p(x)$ is any polynomial in a single variable then we have

$p(A) = p(\lambda)I.$

Exercise 1: Prove the above formula.

The formula $A^k=\lambda^kI$ even works in the case that $k$ is a negative integer, provided that $\lambda \neq 0;$ equivalently, the simple operator $A=\lambda I$ is invertible if and only if $\lambda \neq 0,$ its inverse being $A^{-1} = \lambda^{-1}I.$ If $A =\lambda I$ and $B = \mu I$ are simple operators, then they commute,

$AB = (\lambda I)(\mu I)=(\lambda\mu)I = (\mu I)(\lambda I) = BA,$

just like ordinary numbers, and more generally

$p(A,B) = p(A,B)I$

for any polynomial $p(x,y)$ in two variables.

Exercise 2: Prove the above formula.

Another way to appreciate how truly simple simple operators are is to look at their matrices. In order to do this, we have to restrict to the case that the vector space $\mathbf{V}$ is finite-dimensional. If $\mathbf{V}$ is $n$-dimensional, and $E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ is any basis of $\mathbf{V},$ then the matrix of $A=\lambda I$ relative to $E$ is simply

$[A]_E = \begin{bmatrix} \lambda & {} & {} \\ {} & \ddots & {} \\ {} & {} & \lambda \end{bmatrix},$

where the off-diagonal matrix elements are all equal to zero. For this reason, simple operators are often called diagonal operators.

Most operators in $\mathrm{End}\mathbf{V}$ are not simple operators — they are complicated multicellular organisms. So, to understand them we have to dissect them and look at their organs one at a time. Mathematically, this means that, given an operator $A \in \mathrm{End}\mathbf{V},$ we look for special vectors in $\mathbf{V}$ on which $A$ acts as if it was simple.

Definition 2: A nonzero vector $\mathbf{e} \in \mathbf{V}$ is said to be an eigenvector of an operator $A \in \mathbf{End} \mathbf{V}$ if

$A\mathbf{e} = \lambda \mathbf{e}$

for some $\lambda \in \mathbf{R}.$ The scalar $\lambda$ is said to be an eigenvalue of $A.$

The best case scenario is that we can find a basis of $\mathbf{V}$ entirely made up of eigenvectors of $A.$

Defintion 3: An operator $A \in \mathrm{End} \mathbf{V}$ is said to be semisimple if there exists a basis $E$ of $\mathbf{V}$ consisting of eigenvectors of $A.$ Such a basis is called an eigenbasis for $A.$

As the name suggests, semisimple operators are pretty simple, but not quite as simple as simple operators. In particular, every simple operator is semisimple, because if $A$ is simple then every nonzero vector in $\mathbf{V}$ is an eigenvector of $A,$ and hence any basis in $\mathbf{V}$ is an eigenbasis for $A.$ The converse, however, is not true.

Let $\mathbf{V}$ be an $n$-dimensional vector space, and let $A \in \mathrm{End} \mathbf{V}$ be a semisimple operator. By definition, this means that there exists a basis $E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ in $\mathbf{V}$ consisting of eigenvectors of $A.$ This in turn means that there exist numbers $\lambda_1,\dots,\lambda_n \in \mathbb{R}$ such that

$A\mathbf{e}_i = \lambda_i \mathbf{e}_i \quad \forall\ 1 \leq i \leq n.$

If $\lambda_1=\dots=\lambda_n,$ then $A$ is simple, but if these numbers are not all the same then it is not. However, even if all these numbers are different, the matrix of $A$ relative to $E$ will still be a diagonal matrix, i.e. it will have the form

$[A]_E = \begin{bmatrix} \lambda_1 & {} & {} \\ {} & \ddots & {} \\ {} & {} & \lambda_n \end{bmatrix}.$

For this reason, semisimple operators are often called diagonalizable operators. Note the shift in terminology from “diagonal,” for simple, to “diagonalizable,” for semisimple. The former term suggest an immutable characteristic, independent of basis, whereas the latter indicates that some action must be taken, in that a special basis must be found to reveal diagonal form. More precisely, the matrix of a semisimple operator $A$ is not diagonal with respect to an arbitrary basis; the definition only says that the matrix of $A$ is diagonal relative to some basis.

Most linear operators are not semisimple — indeed, there are plenty of operators that have no eigenvectors at all. Consider the operator

$R_\theta \colon \mathbb{R}^2 \to \mathbb{R}^2$

which rotates a vector $\mathbf{v} \in \mathbb{R}^2$ counterclockwise through the angle $\theta \in [0,2\pi).$ The matrix of this operator relative to the standard basis

$\mathbf{e}_1 = (1,0),\ \mathbf{e}_2 = (0,1)$

of $\mathbb{R}^2$ is

$\begin{bmatrix} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}.$

If $\theta = 0,$ then $R_\theta = I$, so that $R_\theta$ is a simple operator: $\mathbf{e}_1,\mathbf{e}_2$ are eigenvectors, with eigenvalues $\lambda_1=\lambda_2=1.$ If $\theta = \pi,$ then $R_\theta=-I$ and again $R_\theta$ is simple, with the same eigenvectors and eigenvalues $\lambda_1=\lambda_2=-1.$ However, taking any other value of $\theta,$ for example $\theta = \frac{\pi}{2},$ rotation through a right angle, it is geometrically clear that $R_\theta \mathbf{v}$ is never a scalar multiple of $\mathbf{v},$ so that $R_\theta$ has no eigenvectors at all. In particular, it is not semisimple.

Let us now formulate necessary and sufficient conditions for an operator to be semisimple. In this endeavor it is psychologically helpful to reorganize the eigenvector/eigenvalue definition by thinking of eigenvalues as the primary objects, and eigenvectors as secondary objects associated to them.

Defintion 4: The spectrum of an operator $A \in \mathrm{End}\mathbf{V}$ is the set $\sigma(A) \subseteq \mathbb{R}$ defined by

$\sigma(A) = \{ \lambda \in \mathbb{R} \colon \lambda \text{ is an eigenvalues of } A\}.$

For each $\lambda \in \sigma(A),$ the set $\mathbf{V}_\lambda \subseteq \mathbf{V}$ defined by

$\mathbf{V}_\lambda = \{\mathbf{v} \in \mathbf{V} \colon A\mathbf{v} = \lambda \mathbf{v}\}$

is called the $\lambda$eigenspace of $A.$ The dimension of $\mathbf{V}_\lambda$ is called the geometric multiplicity of $\lambda.$

In these terms, saying that $A \in \mathrm{End}\mathbf{V}$ is a simple operator means that the spectrum of $A$ consists of a single number,

$\sigma(A) = \{\lambda\},$

and that the corresponding eigenspace exhausts $\mathbf{V},$

$\mathbf{V}_\lambda = \mathbf{V}.$

At the other extreme, the rotation operator $R_{\pi/2}$ considered above has empty spectrum,

$\sigma(R_{\pi/2}) = \{\},$

and thus does not have any eigenspaces.

Proposition 1: For any $A \in \mathrm{End}\mathbf{V}$, for each $\lambda \in \sigma(A)$ the eigenspace $\mathbf{V}_\lambda$ is a subspace of $\mathbf{V}.$

Proof: First, observe that $\mathbf{0} \in \mathbf{V}_\lambda,$ because

$A\mathbf{0} = \mathbf{0} = \lambda \mathbf{0}.$

Second, $\mathbf{V}_\lambda$ is closed under scalar multiplication: if $\mathbf{v} \in \mathbf{V}_\lambda,$ then

$A(t\mathbf{v}) = tA\mathbf{v} = t\lambda\mathbf(v)=\lambda(t\mathbf{v}).$

Third, $\mathbf{V}_\lambda$ is closed under vector addition: if $\mathbf{v},\mathbf{w} \in \mathbf{V}_\lambda,$ then

$A(\mathbf{v}+\mathbf{w}) = A\mathbf{v}+A\mathbf{w}=\lambda\mathbf{v}+\lambda\mathbf{w}=\lambda(\mathbf{v}+\mathbf{w}).$

— Q.E.D.

So, the eigenspaces of an operator $A \in \mathrm{End}\mathbf{V}$ constitute a collection of subspaces of $\mathbf{V}_\lambda$ of $\mathbf{V}$ indexed by the numbers $\lambda \in \sigma(A).$ A key feature of these subspaces is that they are independent of one another.

Theorem 1: Suppose that $\lambda_1,\dots,\lambda_k$ are distinct eigenvalues of an operator $A \in \mathrm{End}\mathbf{V}.$ Let $\mathbf{e}_1,\dots,\mathbf{e}_k$ be nonzero vectors such that $\mathbf{e}_i \in \mathbf{V}_{\lambda_i}$ for each $1 \leq i\leq k.$ Then $\{\mathbf{e}_1,\dots,\mathbf{e}_k\}$ is a linearly independent set.

Proof: We prove this by induction on $k.$ The base case is $k=1,$ and in this case the assertion is simply that the set $\{\mathbf{e}_{\lambda_1}\}$ consisting of a single eigenvector of $A$ is linearly independent. This is true, since eigenvectors are nonzero by definition.

For the induction step, suppose that $\{\mathbf{e}_1,\dots,\mathbf{e}_k\}$ is a linearly dependent set. Then, there exist numbers $t_1,\dots,t_k \in \mathbb{R},$ not all equal to zero, such that

$\sum_{i=1}^k t_i\mathbf{e}_i = \mathbf{0}.$

Let us suppose that $t_1 \neq 0.$ Applying the operator $A$ to both sides of the above vector equation, we get

$\sum_{i=1}^k t_i\lambda_i\mathbf{e}_i = \mathbf{0}.$

On the other hand, we can multiply the original vector equation by any scalar and it remains true; in particular, we have

$\sum_{i=1}^k t_i\lambda_k\mathbf{e}_i = \mathbf{0}.$

Now, subtracting this third equation from the second equation, we obtain

$\sum_{i=1}^{k-1} t_i(\lambda_i-\lambda_k)\mathbf{e}_i = \mathbf{0}.$

By the induction hypothesis, $\{\mathbf{e}_1,\dots,\mathbf{e}_{k-1}\}$ is a linearly independent set, and hence all the coefficients in this vector equation are zero. In particular, we have

$t_1(\lambda_1-\lambda_k) \neq 0.$

But this is impossible, since $t_1 \neq 0$ and $\lambda_1 \neq \lambda_k.$ Hence, the set $\{\mathbf{e}_1,\dots,\mathbf{e}_k\}$ cannot be linearly dependent — it must be linearly independent.

— Q.E.D.

Restricting to the case that $\mathbf{V}$ is finite-dimensional, $\dim \mathbf{V}=n,$ Theorem 1 has the following crucial consequences.

Corollary 1: $A \in \mathrm{End}\mathbf{V}$ is semisimple if and only if

$\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.$

Proof: Suppose first that $A$ is semisimple. By definition, this means that the span of the eigenspaces of $A$ is all of $\mathbf{V},$

$\mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda = \mathbf{V}.$

Thus

$\dim \mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda = \dim \mathbf{V}.$

By Theorem 1, we have

$\dim \mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda =\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda,$

and hence

$\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.$

Conversely, suppose that the sum of the dimensions of the eigenspaces of $A$ is equal to the dimension of $\mathbf{V}.$ For each $\lambda \in \sigma(A),$ let $E_\lambda$ be a basis of the eigenspace $\mathbf{V}_\lambda.$ Then, by Theorem 1, the set

$E = \bigcup\limits_{\lambda \in \sigma(A)} E_\lambda$

is a linearly independent set, and hence a basis of the subspace $\mathrm{Span}(E)$ of $\mathbf{V}.$ Thus

$\dim \mathrm{Span}(E) = \sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda.$

Since by hypothesis we have

$\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V},$

this implies that

$\dim \mathrm{Span}(E) =\dim \mathbf{V},$

which in turn implies that

$\mathrm{Span}(E) =\mathbf{V}.$

Thus $E$ is a basis of $\mathbf{V}$ consisting of eigenvectors of $A,$ whence $A$ is semisimple.

— Q.E.D.

Corollay 3: If $|\sigma(A)| = \dim \mathbf{V},$ then $A$ is semisimple.

Proof: To say that $|\sigma(A)|=\dim \mathbf{V}$ is equivalent to saying that the spectrum of $A$ consists of $n=\dim \mathbf{V}$ distinct numbers,

$\sigma(A) = \{\lambda_1,\dots,\lambda_n\}.$

Sampling a collection of nonzero vectors from each corresponding eigenspace,

$e_i \in \mathbf{V}_{\lambda_i}, \quad 1 \leq i \leq n,$

we get a set $E= \{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ of eigenvectors of $A.$ By Theorem 1, $E$ is a linearly independent set, hence it is a basis of $\mathbf{V}.$

— Q.E.D.

## Math 31AH: Lecture 14

In Lecture 13, we discussed matrix representations of linear transformations between finite-dimensional vector spaces. In this lecture, we consider linear transformations between finite-dimensional Euclidean spaces, and discuss the relationship between the scalar product and the matrix representation of linear transformations. Note that any vector space $\mathbf{V}$ can be promoted to a Euclidean space $(\mathbf{V},\langle \cdot,\cdot \rangle)$ by choosing a basis $E$ in $\mathbf{V}$ and defining $\langle \cdot,\cdot \rangle$ to be the unique scalar product on $\mathbf{V}$ such that $E$ is orthonormal.

Let $\mathbf{V}$ and $\mathbf{W}$ be Euclidean spaces; by abuse of notation, we will denote the scalar product in each of these spaces by the same symbol $\langle \cdot,\cdot \rangle$. Let $E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ be an orthonormal basis in $\mathbf{V},$ and let $F=\{\mathbf{f}_1,\dots,\mathbf{f}_m\}$ be an orthonormal basis in $\mathbf{W}.$ Let $A \in \mathrm{Hom}(\mathbf{V},\mathbf{W})$ be a linear transformation.

Definition 1: The matrix elements of $A$ relative to the bases $E$ and $F$ are the scalar products

$\langle \mathbf{f}_i, A\mathbf{e}_j \rangle, \quad 1 \leq i \leq m,\ 1 \leq j \leq n.$

The reason the number $\langle \mathbf{f}_i,A\mathbf{e}_j \rangle$ is called a “matrix element” of $A$ is that this number is exactly the $(i,j)$-element of the matrix $[A]_{E,F}$ of defined in Lecture 13. Indeed, if

$A\mathbf{e}_j = \sum_{k=1}^m a_{kj} \mathbf{f}_k,$

then

$\langle \mathbf{f}_i,A\mathbf{e}_j \rangle = \left\langle \mathbf{f}_i,\sum_{k=1}^m a_{kj} \mathbf{f}_k\right\rangle = \sum_{k=1}^m a_{kj} \langle \mathbf{f}_i,\mathbf{f}_k \rangle = a_{ij},$

where the last equality follows from the orthonormality of $F.$ However, one can note that it is not actually necessary to assume that $\mathbf{V}$ and $\mathbf{W}$ are finite-dimensional in order for the matrix elements of $A$ to be well-defined. However, we will always make this assumption, and thus in more visual form, we have that

$[A]_{E,F} = \begin{bmatrix} {} & \vdots & {} \\ \dots & \langle \mathbf{f}_i,A\mathbf{e}_j \rangle & \dots \\ {} & \vdots & {} \end{bmatrix}_{1 \leq i \leq m, 1 \leq j \leq n}.$

The connection between matrices and scalar products is often very useful for performing computations which would be much more annoying without the use of scalar products. A good example is change of basis for linear operators. The setup here is that $\mathbf{V}=\mathbf{W},$ so that $m=n$ and $E,F$ are two (possibly) different orthonormal bases of the same Euclidean space. Given an operator $A \in \mathrm{End}\mathbf{V},$ we would like to understand the relationship between the two $n \times n$ matrices

$[A]_E \quad\text{ and }\quad [A]_F$

which represent the operator $A$ relative to the bases $E$ and $F,$ respectively. In order to do this, let us consider the linear operator $U \in \mathrm{End}\mathbf{V}$ uniquely defined by the $n$ equations

$U\mathbf{e}_i = \mathbf{f}_i, \quad 1 \leq i \leq n.$

Why do these $n$ equations uniquely determine $U$? Because, for any $\mathbf{v} \in \mathbf{V},$ we have

$U\mathbf{v} = U\sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i = \sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle U\mathbf{e}_i = \sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{f}_i.$

Let us observe that the operator $U$ we have defined is an automorphism of $\mathbf{V},$ i.e. it has an inverse. Indeed, it is clear that the linear operator $U^{-1}$ uniquely determined by the $n$ equations

$U^{-1}\mathbf{f}_i=\mathbf{e}_i, \quad 1 \leq i \leq n$

is the inverse of $U.$ Operators which transform orthonormal bases into orthonormal bases have a special name.

Definition 2: An operator $U \in \mathrm{End}\mathbf{V}$ is said to be an orthogonal operator if it preserves orthonormal bases: for any orthonormal basis $\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ in $\mathbf{V},$ the set $\{U\mathbf{e}_1,\dots,U\mathbf{e}_n\}$ is again an orthonormal basis in $\mathbf{V}$.

Note that every orthogonal operator is invertible, since we can always define $U^{-1}$ just as we did above. In particular, the operators $U,U^{-1}$ we defined above by $U\mathbf{e}_i=\mathbf{f}_i,$ $U^{-1}\mathbf{f}_i=\mathbf{e}_i$ are orthogonal operators.

Proposition 1: An operator $U \in \mathrm{End}\mathbf{V}$ is orthogonal if and only if

$\langle U\mathbf{v},U\mathbf{w} \rangle = \langle \mathbf{v},\mathbf{w} \rangle, \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V}.$

Proof: Observe that, by linearity of $U$ and bilinearity of $\langle \cdot,\cdot \rangle,$ it is sufficient to prove the claim in the case that $\mathbf{v}=\mathbf{e}_i$ and $\mathbf{w}=\mathbf{e}_j$ for some $1 \leq i,j \leq n,$ where $\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ is an orthonormal basis of $\mathbf{V}.$

Suppose that $U$ is an orthogonal operator. Let $\mathbf{f}_i=U\mathbf{e}_i,$ $1 \leq i \leq n.$ Then $\{\mathbf{f}_1,\dots,\mathbf{f}_n\}$ is an orthonormal basis of $\mathbf{V},$ and consequently we have

$\langle U\mathbf{e}_i,U\mathbf{e}_j \rangle = \langle \mathbf{f}_i,\mathbf{f}_j \rangle = \delta_{ij} = \langle \mathbf{e}_i,\mathbf{e}_j \rangle.$

Conversely, suppose that

$\langle U\mathbf{e}_i,U\mathbf{e}_j \rangle = \langle \mathbf{e}_i,\mathbf{e}_j \rangle.$

We then have that $\langle \mathbf{f}_i,\mathbf{f}_j \rangle = \delta_{ij},$ so that $\{\mathbf{f}_1,\dots,\mathbf{f}_n\}$ is an orthonormal basis of $\mathbf{V},$ and thus $U$ is an orthogonal operator.

— Q.E.D.

Proposition 2: An operator $U \in \mathrm{End}\mathbf{V}$ is orthogonal if and only if it is invertible and

$\langle \mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},\mathbf{w} \rangle, \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V}.$

Proof: Suppose first that $U$ is orthogonal. Then, $U$ is invertible and $U^{-1}$ is also orthonal, and hence for any $\mathbf{v},\mathbf{w} \in \mathbf{W},$ we have

$\langle \mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},U^{-1}U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},\mathbf{w} \rangle.$

Conversely, suppose that $U$ is invertible and

$\langle \mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},\mathbf{w} \rangle, \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V}.$

Then, for any $\mathbf{v},\mathbf{w} \in \mathbf{V},$ we have

$\langle U\mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}U\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v},\mathbf{w} \rangle,$

whence $U$ is orthogonal by Proposition 1.

— Q.E.D.

Now let us return to the problem that we were working on prior to our digression into the generalities of orthogonal operators, namely that of computing the relationship between the matrices $[A]_E,[A]_F$. We have

$\langle \mathbf{f}_i,A\mathbf{f}_j \rangle = \langle U\mathbf{e}_i,AU\mathbf{e}_j \rangle = \langle \mathbf{e}_i,U^{-1}AU\mathbf{e}_j \rangle, \quad \forall 1 \leq i, j \leq n,$

where we used Proposition 2 to obtain the second equality. Thus, we have the matrix equation

$[A]_F = [UAU^{-1}]_E = [U]_E [A]_E [U^{-1}]_E = [U]_E [A]_E [U]_E^{-1}$

where on the right hand side we are using the fact that

$[\cdot]_E \colon \mathrm{End}\mathrm{V} \to \mathbb{R}^{n\times n}$

is an algebra isomorphism, as in Lecture 13, which means that the matrix representing a product of operators is the product of the matrices representing each operator individually. This relationship between is usually phrased as the statement that the matrix $[A]_F$ representing the operator $A$ in the “new” basis $F$ is obtained from the matrix $[A]_E$ representing $A$ in the “old” basis $F$ by “conjugating” it by the of the matrix $[A]_E$ by the matrix $[U]_E,$ where $U$ is the orthogonal operator that transforms the old basis into the new basis.

## Math 31AH: Lecture 11

We met linear transformations for the first time back in Lecture 2, and have encountered them a handful of times in subsequent lectures. In this lecture, we begin a systematic study of linear transformation that will take up the rest of the course. As with the study of any new species, there will be a lot of classifying: we will divide up linear transformations into various classes defined by certain common features which to a large extent determine their behavior. Before classifying linear transformations into various special species and families, let us consider the ecosystem of all linear transformations as a collective whole.

Let $\mathbf{V}$ and $\mathbf{W}$ be vector spaces, and let $\mathrm{Hom}(\mathbf{V},\to\mathbf{W})$ be the set of all linear transformations $T \colon \mathbf{V} \to \mathbf{W}.$ This seemingly strange notation stems from the fact that is a fancy alternative name for linear transformations is homomorphisms, which roughly means
“same shape” in Greek.

Theorem 1: $\mathrm{Hom}(\mathbf{V},\mathbf{W})$ is a vector space.

Proof: In order to promote $\mathrm{Hom}(\mathbf{V},\mathbf{W})$ from simply a set to a vector space, we need to give a rule for adding and scaling linear transformations. These rules are simply and natural. If $T_1,T_2 \in \mathrm{Hom}(\mathbf{V},\mathbf{W})$ linear transformations and $a_1,a_2 \in \mathbb{R}$ are scalars we define a new linear transformation $a_1T_1 + a_2T_2$ by

$[a_1T_1+a_2T_2]\mathbf{v} := a_1T_1\mathbf{v} + a_2T_2\mathbf{v} \quad \forall \mathbf{v} \in \mathbf{V}.$

One must check first that the function from $\mathbf{V}$ to $\mathbf{W}$ so defined satisfies the linear transformation axioms, and then that the set $\mathrm{Hom}(\mathbf{V},\mathbf{W})$ equipped with these notions of addition and scalar multiplication satisfies the vector spaces axioms. This is left as an exercise to the reader (it is a long but easy exercise which you should do at least once).

— Q.E.D.

Now let us start asking questions about the vectors in $\mathrm{Hom}(\mathbf{V},\mathbf{W}),$ i.e. considering what properties a linear transformation $T$ from $\mathbf{V}$ to $\mathbf{W}$ may or may not have. Given $T \in \mathrm{Hom}(\mathbf{V},\mathbf{W}),$ one of the first things we would like to know is whether it is injective and/or surjective. These questions reduce to questions about certain subspaces of $\mathbf{V}$ and $\mathbf{W}$ associated to $T.$

Definition 1: The kernel of $T$ is the subset of $\mathbf{V}$ defined by

$\mathrm{Ker}T = \{\mathbf{v} \in \mathbf{V} \colon T(\mathbf{v}) = \mathbf{0}_\mathbf{W}\},$

and the image of $T$ is the subset of $\mathbf{W}$ defined by

$\mathrm{Im}T = \{ \mathbf{w} \in \mathbf{W} \colon \exists\ \mathbf{v} \in \mathbf{V} \text{ such that } T\mathbf{v} = \mathbf{w}\}.$

You can think of the kernel and image of $T$ as the solution sets to two different equations involving $T.$ The kernel in terms of certain vector equations associated to $T.$ First, the kernel of $T$ is the set of solutions to the equation

$T\mathbf{v}=\mathbf{0}_\mathbf{W},$

where $\mathbf{v}$ is an unknown vector in $\mathbf{V}.$ Second, the image of $T$ is the set of all $\mathbf{w} \in \mathbf{W}$ such that the equation

$T\mathbf{v}=\mathbf{w}$

has a solution.

Proposition 1: The kernel of $T$ is a subspace of $\mathbf{V}$, and the image of $T$ is a subspace of $\mathbf{W}.$

Proof: Since $T$ is a linear transformation, it is by definition the case that $T\mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{W},$ so that $\mathbf{0}_\mathbf{V} \in \mathrm{Ker}T.$ Now let $\mathbf{v}_1,\mathbf{v}_2 \in \mathrm{Ker}T$ be any two vectors in the kernel of $T,$ and let $a_1,a_2 \in \mathbb{R}$ be any two scalars. We then have

$T(a_1\mathbf{v}_1 + a_2\mathbf{v}_2) = a_1T\mathbf{v}_1+a_2T\mathbf{v}_2=a_1\mathbf{0}_\mathbf{W} + a_2\mathbf{0}_\mathbf{W} = \mathbf{0}_\mathbf{W},$

which shows that $\mathrm{Ker}T$ is closed under taking linear combinations. Hence, $\mathrm{Ker}T$ is a subspace of $\mathbf{V}.$

Now consider the image of $T.$ Since $T$ is a linear transformation, it is by definition the case that $T\mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{W},$ so that $\mathbf{0}_\mathbf{W} \in \mathrm{Im}T.$ Now let $\mathbf{w}_1,\mathbf{w}_2 \in \mathrm{Im}T$ be any two vectors in the image of $T.$ Then, by definition of $\mathrm{Im}T,$ there exist vectors $\mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}$ such that $T\mathbf{v}_1=\mathbf{w}_1,\ T\mathbf{v}_2=\mathbf{w}_2.$ So, for any scalars $a_1,a_2 \in \mathbb{R},$ we have that

$a_1\mathbf{w}_1+a_2\mathbf{w}_2 = a_1T\mathbf{v}_1+a_2T\mathbf{v}_2 = T(a_1\mathbf{v}_1+a_2\mathbf{v}_2),$

which shows that $\mathrm{Im}T$ is closed under taking linear combinations.

— Q.E.D.

Proposition 2: The linear transformation $T$ is injective if and only if $\mathrm{Ker}T = \{\mathbf{0}_\mathbf{V}\},$ and surjective if and only if $\mathrm{Im}T = \mathbf{W}.$

Proof: First we consider the relationship between the injectivity of $T$ and its kernel. Suppose that $\mathrm{Ker}T = \{\mathbf{0}_\mathbf{V}\}.$ Let $\mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}$ be vectors such that $T\mathbf{v}_1=T\mathbf{v}_2.$ This is equivalent to

$T\mathbf{v}_1-T\mathbf{v}_2=T(\mathbf{v}_1-\mathbf{v}_2)=\mathbf{0}_\mathbf{W},$

which says that $\mathbf{v}_1-\mathbf{v}_2 \in \mathrm{Ker}T.$ Since the only vector in $\mathrm{Ker}T$ is $\mathbf{0}_\mathbf{V},$ this forces $\mathbf{v}_1-\mathbf{v}_2=\mathbf{0}_\mathbf{V},$ which means that $\mathbf{v}_1=\mathbf{v}_2,$ and we conclude that $T$ is injective. Conversely, suppose we know that $T$ is injective. Let $\mathbf{v}_1,\mathbf{v}_2 \in \mathrm{Ker}T$ be any two vectors in the kernel of $T.$ Then $T\mathbf{v}_1=T\mathbf{v}_2=\mathbf{0}_\mathbf{W}.$ Since $T$ is injective, this forces $\mathbf{v}_1=\mathbf{v}_2,$ which says that any two vectors in $\mathrm{Ker}T$ are equal to one another. But this means that any vector in $\mathrm{Ker}T$ is equal to $\mathbf{0}_\mathbf{V},$ and we conclude that $\mathrm{Ker}T=\{\mathbf{0}_\mathbf{V}\}.$

Now let us consider the relationship between the surjectivity of $T$ and its image. Suppose that $\mathrm{Im}T=\mathbf{W}.$ This is exactly what it means for the function $T$ to be surjective: every element in the codomain of $T$ is in fact in the range of $T.$ Conversely, suppose we know that $T$ is surjective. By definition of surjectivity, this means that $\mathrm{Im}T=\mathbf{W}.$

— Q.E.D.

We can use the above propositions to gain some insight into linear equations, meaning equations of the form

$T \mathbf{v}=\mathbf{w},$

where $T$ is a given linear transformation from $\mathbf{V}$ to $\mathbf{W},$ $\mathbf{w}$ is a given vector in $\mathbf{W},$ and $\mathbf{v}$ is an unknown vector in $\mathbf{V}.$

Theorem 1: The solution set of the above linear equation is $\mathbf{v}_0+\mathrm{Ker}T,$ i.e. the set of of all vectors in $\mathbf{V}$ of the form $\mathbf{v}_0 + \mathbf{k},$ where $\mathbf{v}_0$ is any particular vector such that $T\mathbf{v}_0=\mathbf{w},$ and $\mathbf{k} \in \mathrm{Ker}T.$

Proof: We begin by first consider the homogenous equation associated to the linear equation we want to solve, which is

$T\mathbf{v}=\mathbf{0}_\mathbf{W}.$

By definition, the solution set of this homogeneous equation is $\mathrm{Ker}T.$ Now, if $\mathbf{v}_0 \in \mathbf{V}$ is any vector such that $T\mathbf{v}_0 = \mathbf{w},$ then we also have

$T(\mathbf{v}_0+\mathbf{k})= T\mathbf{v}_0+T\mathbf{k}=\mathbf{w}+\mathbf{0}_\mathbf{W} = \mathbf{w},$

which shows that all vectors in $\mathbf{v}_0+\mathrm{Ker}T$ are solutions of the equation $T\mathbf{v}=\mathbf{w}.$

Conversely, suppose that $\mathbf{v}_1$ is a vector in $\mathbf{V}$ such that $T\mathbf{v}_1=\mathbf{w}.$ Then, we have

$T(\mathbf{v}_1-\mathbf{v}_0) = T\mathbf{v}_1-T\mathbf{v}_0=\mathbf{w}-\mathbf{w}=\mathbf{0}_\mathbf{W},$

which shows that $\mathbf{v}_1-\mathbf{v}_0$ is a vector in $\mathrm{Ker}T.$ That is, $\mathbf{v}_1-\mathbf{v}_0=\mathbf{k}$ for some $\mathbf{k} \in \mathrm{Ker}T,$ or equivalently $\mathbf{v}_1=\mathbf{v}_0+\mathbf{k}.$ This shows that all solutions of $T\mathbf{v}=\mathbf{W}$ belong to the set $\mathbf{v}_0 + \mathrm{Ker}T.$

— Q.E.D.

We can now make the following statements about solutions of the linear equation $T\mathbf{v}=\mathbf{w}$:

• The equation has a solution if and only if $\mathbf{w} \in \mathrm{Im}T$.
• If the equation has a solution, this solution is unique if and only if $\mathrm{Ker}T = \{\mathbf{0}_\mathbf{V}\}.$
• If the equation has a solution but the solution is not unique, then the equation has infinitely many solutions.

Exercise 1: Arrange the above information into a tree diagram which shows the relationship between the various cases.

In Lecture 12, we will think in more detail about the structure of the vector space $\mathrm{Hom}(\mathbf{V},\mathbf{W})$ in the case that $\mathbf{V}$ and $\mathbf{W}$ are finite-dimensional. In the finite-dimensional case, we shall see that everything we might want to know can be expressed in the language of matrices.