Math 202A: Lecture 6

Last lecture, we gave a definition of what it means for a Hilbert space to be finite-dimensional. Let \mathbf{FVec} be the category whose objects are finite-dimensional Hilbert spaces V and whose morphisms are linear transformations. Note that we do not require maps in \mathrm{Hom}(V,W) to interact with the scalar products on V and W in any particular way. This means that even though the objects in \mathbf{FVec} are both algebraic and geometric in nature, as described in Lecture 4, the morphisms of \mathbf{FVec} are only required to respect their algebraic structure.

Of course, we are free to consider special linear transformations in \mathrm{Hom}(V,W) which do interact with the geometry of the source and target spaces. In particular, the following definition is quite important.

Definition 6.1. A transformation A \in \mathrm{Hom}(V,W) is said to be an isometry if \|Av\|=\|v\| for all v \in V.

Thus an isometry is a linear transformation which preserves norms and hence distances.

Theorem 6.2. A transformation A \in \mathrm{Hom}(V,W) is an isometry if and only if \langle Av_1,Av_2\rangle = \langle v_1,v_2\rangle for all v_1,v_2 \in V.

Proof: It is clear that if A preserves scalar products it preserves norms. That the converse also holds follows from the fact that norms uniquely determine scalar products via polarization. \square
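
As a quick sanity check on Theorem 6.2, here is a small numerical experiment in Python with numpy (an illustration of mine, not part of the formal development): a matrix with orthonormal columns represents an isometry relative to standard bases, and it preserves norms and scalar products alike.

    import numpy as np

    rng = np.random.default_rng(0)

    # An isometry C^2 -> C^3: a matrix with orthonormal columns,
    # obtained from the QR factorization of a random complex matrix.
    M = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
    A, _ = np.linalg.qr(M)          # columns of A are orthonormal

    v1 = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    v2 = rng.standard_normal(2) + 1j * rng.standard_normal(2)

    # Norm preservation: ||Av|| = ||v||
    assert np.isclose(np.linalg.norm(A @ v1), np.linalg.norm(v1))

    # Scalar product preservation: <Av1, Av2> = <v1, v2>
    # (np.vdot conjugates its first argument, matching our convention)
    assert np.isclose(np.vdot(A @ v1, A @ v2), np.vdot(v1, v2))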

Proposition 6.3. If A \in \mathrm{Hom}(V,W) is an isometry then it is injective.

Proof: Suppose v_1,v_2 \in V are such that Av_1=Av_2. Then A(v_1-v_2) = 0_W, and consequently \|v_1-v_2\|=\|A(v_1-v_2)\|=0, whence v_1=v_2. \square

For any A \in \mathrm{Hom}(V,W) we define the kernel of A by

\mathrm{Ker}(A) = \{v \in V \colon Av = 0_W\}.

Certainly you already know that \mathrm{Ker}(A) is a subspace of V. We will discuss subspaces in a systematic way soon. For now, let us simply note that in the course of the preceding argument we observed the following.

Proposition 6.4. A transformation A \in \mathrm{Hom}(V,W) is injective if and only if \mathrm{Ker}(A) = \{0_V\}.

Now let us discuss isomorphism in the category \mathbf{FVec}. The general categorical definition is that two Hilbert spaces V and W are isomorphic if and only if there exist transformations A \in \mathrm{Hom}(V,W) and B \in \mathrm{Hom}(W,V) such that

B \circ A = I_V \quad\text{and}\quad A \circ B=I_W,

where I_V \in \mathrm{End}(V) and I_W \in \mathrm{End}(W) are the identity operators on V and W, respectively. We showed already in a previous homework problem that A \in \mathrm{Hom}(V,W) is an isomorphism if and only if it is bijective. Here is a restatement of this as an extremal property of linear transformations.

Proposition 6.5. A transformation A \in \mathrm{Hom}(V,W) is an isomorphism if and only if its kernel is minimal and its image is maximal, i.e. \mathrm{Ker}(A)=\{0_V\} and \mathrm{Im}(A)=W.

Now we come to the fact that dimension completely characterizes isomorphism in \mathbf{FVec}.

Theorem 6.6. Two Hilbert spaces V and W are isomorphic if and only if they have the same dimension.

Proof: Suppose first that V and W are two Hilbert spaces of the same dimension, and let X \subset V and Y \subset W be two orthonormal bases. Then |X|=|Y|. Let f \colon X \to Y be any bijective function (i.e. set isomorphism) and define a corresponding linear transformation

A_f \colon V \longrightarrow W

by

A_f(v) = \sum\limits_{x \in X} \langle x,v\rangle f(x).

Since f is a bijection from X to Y, we also have

A_f(v) = \sum\limits_{y \in Y} \langle f^{-1}(y),v\rangle y,

so that A_f(v_1) = A_f(v_2) if and only if v_1,v_2 \in V have the same coordinates relative to X, in which case they are the same vector. Next, let w \in W be an arbitrary vector and let

w = \sum\limits_{y \in Y} \langle y,w \rangle y

be its coordinate expansion relative to the orthonormal basis Y. Then, the vector v \in V defined by

v = \sum\limits_{y \in Y} \langle y,w\rangle f^{-1}(y)

satisfies A_f(v)=w, whence A_f is surjective as well as injective.

Conversely, suppose that A \colon V \to W and B \colon W \to V are linear transformations such that AB=I_W and BA=I_V, where I_V is the identity transformation of V and I_W is the identity transformation of W. Let X \subset V be a basis of V and consider the |X|=\dim V vectors in W defined by

Ax, \quad x \in X.

These vectors form the image of X under A in W, and we claim they are linearly independent. Indeed, suppose that

\sum\limits_{x \in X} \alpha_x Ax = 0_W.

Then, applying B to each side of this equation we get

\sum\limits_{x \in X} \alpha_x x=0_V,

whence \alpha_x=0 for each x \in X. Thus Y=\{Ax \colon x \in X\} is a linearly independent set in W of cardinality |X|, so that \dim W \geq |X|=\dim V. Applying exactly the same argument with B in place of A we get that \dim V \geq \dim W. We thus conclude \dim V=\dim W.

-QED
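
The construction in the first half of the proof is easy to carry out numerically. Below is a Python/numpy sketch (my own illustration, with hypothetical names): the columns of X and Y hold orthonormal bases of \mathbb{C}^3, the bijection f is encoded as a permutation perm of column indices, and A_f becomes a matrix product.

    import numpy as np

    rng = np.random.default_rng(1)

    def random_onb(n):
        """Return an n x n matrix whose columns form an orthonormal basis of C^n."""
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        Q, _ = np.linalg.qr(M)
        return Q

    X = random_onb(3)                # basis of V, columns x_1, x_2, x_3
    Y = random_onb(3)                # basis of W, columns y_1, y_2, y_3
    perm = [2, 0, 1]                 # a bijection f: x_i -> y_perm[i]

    # A_f v = sum_i <x_i, v> f(x_i), i.e. A_f = sum_i y_perm[i] x_i^*
    A_f = Y[:, perm] @ X.conj().T

    # A_f is an isomorphism (in fact unitary, since it maps an ONB to an ONB)
    assert np.allclose(A_f.conj().T @ A_f, np.eye(3))
    assert np.allclose(A_f @ X[:, 0], Y[:, perm[0]])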

Having discussed isomorphisms in \mathbf{FVec}, let us now consider isomorphisms which preserve geometric structure.

Definition 6.7. A transformation U \in \mathrm{Hom}(V,W) is called an isometric isomorphism if it is both an isometry and an isomorphism.

Isometric isomorphisms in \mathbf{FVec} are also referred to as unitary transformations. Since every isometry is injective we have the following.

Proposition 6.8. An isometry A \colon V \to W is an isomorphism if and only if it is surjective.

Having discussed isomorphisms of Hilbert spaces, we move on to automorphisms. As we have already seen, the automorphisms of any object in a locally small category form a group under composition. The automorphism group \mathrm{Aut}(V) of a Hilbert space is usually called the general linear group of V and denoted \mathrm{GL}(V). Thus, automorphisms A \in \mathrm{GL}(V) are linear bijections A \colon V \to V.

Proposition 6.9. A transformation A \in \mathrm{End}(V) is bijective if and only if it is injective.

Proof: One direction is obvious: bijections are injective. Conversely, suppose that A \colon V \to V is an injective linear operator and let X \subset V be a basis. Then, A(X) = \{Ax \colon x \in X\} is also a basis of V. Indeed, injectivity ensures that |A(X)|=|X|, and if \alpha_x, x \in X are scalars such that

\sum\limits_{x \in X} \alpha_x Ax = 0_V

then by linearity we have

A \sum\limits_{x \in X} \alpha_x x=0_V.

Since A has minimal kernel this forces

\sum\limits_{x \in X} \alpha_x x = 0_V

which in turn implies \alpha_x=0 for all x \in X because X is a linearly independent set. \square

Automorphisms of V are not required to preserve scalar products, but nevertheless there is a characterization of automorphisms which does involve the scalar product.

Theorem 6.10. An operator A \in \mathrm{End}(V) is an automorphism if and only if the function

\langle \cdot,\cdot \rangle_A \colon V \times V \longrightarrow \mathbb{C}

defined by \langle v,w \rangle_A = \langle Av,Aw\rangle is a scalar product on V.

Proof: Suppose first that \langle \cdot,\cdot\rangle_A is a scalar product. Then \langle v,v \rangle_A = 0 if and only if v=0_V. Consequently, \|Av\|=0 if and only if v = 0_V, hence A is injective and thus also bijective by Proposition 6.9.

Now suppose A \in \mathrm{GL}(V). We have to check that \langle v,w\rangle_A is consistent with the scalar product axioms. First, observe that the set of vectors v \in V such that \langle v,v\rangle_A = \langle Av,Av\rangle=0 is exactly the kernel of A, which is minimal by hypothesis. Hermitian symmetry is clear,

\langle w,v\rangle_A = \langle Aw,Av\rangle = \overline{\langle v,w\rangle_A}.

Linearity in the second slot follows from the fact that A is linear and \langle \cdot,\cdot \rangle is a scalar product. \square
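
Concretely, if we identify \mathrm{End}(V) with matrices via an orthonormal basis, the new scalar product is computed by the Gram matrix A^*A, and Theorem 6.10 says this matrix is positive definite exactly when A is invertible. A quick Python/numpy sketch (an illustration; the variable names are mine):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3

    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    assert abs(np.linalg.det(A)) > 1e-8      # A is (almost surely) invertible

    G = A.conj().T @ A                       # Gram matrix: <v,w>_A = v^* G w

    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    w = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    # <v,w>_A agrees with <Av, Aw>
    assert np.isclose(np.vdot(v, G @ w), np.vdot(A @ v, A @ w))

    # positive definiteness: all eigenvalues of the Hermitian matrix G are positive
    assert np.all(np.linalg.eigvalsh(G) > 0)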

Having discussed automorphisms we move on to isometric automorphisms, i.e. unitary operators. Let \mathrm{U}(V) be the set of operators U \in \mathrm{End}(V) such that \langle Uv,Uw \rangle = \langle v,w\rangle for all v,w \in V.

Proposition 6.11. The set \mathrm{U}(V) is a subgroup of \mathrm{GL}(V).

The proof of Proposition 6.11 is straightforward and left as an exercise. You can also check for yourself that \mathrm{U}(V) is not a normal subgroup of \mathrm{GL}(V) when \dim V \geq 2.

Math 202A: Lecture 5

In Lecture 4 we covered the basic algebraic and geometric aspects of Hilbert space and Euclidean space. In this lecture we cover Descartes’ idea to encode points in such spaces as finite lists of numbers — the method of coordinates. A straightforward implementation of this idea only works when finitely many coordinates suffice, so we first need to introduce the notion of dimension, which is done using the concept of linear independence. This material will be mostly familiar from prior exposure to linear algebra, so several proofs are omitted (though I may fill them in later).

Let V be a vector space.

Definition 5.1. A finite set X \subset V is said to be linearly independent if the identity

\sum\limits_{x \in X} \alpha_xx=0_V

holds precisely when \alpha_x = 0 for each x \in X.

Note that Definition 5.1 applies in the case where X is the empty set, which is a finite set of cardinality zero.

Definition 5.2. We say that V is finite-dimensional if there exists a nonnegative integer n such that V contains a linearly independent set of cardinality n, but does not contain a linearly independent set of cardinality n+1. In this case the number n is called the dimension of V.

For example, the zero space V=\{0_V\} is a finite-dimensional vector space of dimension zero.

Now let V be a finite-dimensional vector space, fixed for the remainder of the lecture. Let X \subset V be a linearly independent set such that |X|=\dim V.

Theorem 5.3. (Coordinate Theorem) For every vector v \in V, there are scalars \alpha_x such that

v=\sum\limits_{x \in X} \alpha_x x.

Moreover, if also

v=\sum\limits_{x \in X} \beta_xx,

then \beta_x = \alpha_x for each x \in X.

A set X \subset V as in Theorem 5.3 is said to be a basis of V.

Now suppose V is equipped with a scalar product, i.e. it is either a Hilbert space (scalar field \mathbb{C}) or a Euclidean space (scalar field \mathbb{R}).

Theorem 5.4. (Gram-Schmidt) There exists a linearly independent set X \subset V of cardinality |X|=\dim V such that each x,y \in X satisfy

\langle x,y \rangle = \begin{cases} 1, \text{ if } x=y \\ 0, \text{ if }x \neq y \end{cases}.

A set X \subset V as in Theorem 5.4 is said to be an orthonormal basis of V.
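
Theorem 5.4 is proved by the Gram-Schmidt procedure, which converts any basis into an orthonormal one. Here is a minimal Python/numpy sketch of that procedure (an illustration of mine; recall that np.vdot conjugates its first argument, matching our convention that the scalar product is conjugate-linear in the first slot):

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalize a list of linearly independent vectors in C^n."""
        basis = []
        for v in vectors:
            # subtract the components along the vectors produced so far
            for u in basis:
                v = v - np.vdot(u, v) * u
            basis.append(v / np.linalg.norm(v))
        return basis

    rng = np.random.default_rng(3)
    vecs = [rng.standard_normal(3) + 1j * rng.standard_normal(3) for _ in range(3)]
    X = gram_schmidt(vecs)

    # check orthonormality: <x, y> = 1 if x = y and 0 otherwise
    G = np.array([[np.vdot(x, y) for y in X] for x in X])
    assert np.allclose(G, np.eye(3))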

Theorem 5.5. (Cartesian Coordinates) Let X \subset V be an orthonormal basis. Then, for every v \in V we have

v = \sum\limits_{x \in X} \langle x,v \rangle x.

Proof: By the coordinate theorem, we have

v = \sum\limits_{y \in X} \alpha_yy,

and therefore

\langle x,v \rangle =\sum\limits_{y \in X} \alpha_y\langle x,y\rangle=\alpha_x.

-QED

The |X|=\dim V scalar products \langle x,v\rangle are called the Cartesian coordinates of v \in V relative to the orthonormal basis X.

Now suppose we choose an ordering (aka labeling, aka enumeration) x_1,\dots,x_n of X. Then, X is called an ordered orthonormal basis of V and the Cartesian coordinates of any vector v \in V relative to X can be stored as an ordered list,

[v] = \begin{bmatrix} \langle x_1,v \rangle \\ \vdots \\ \langle x_n,v\rangle \end{bmatrix},

i.e. as a spatial vector in \mathbb{F}^n where \mathbb{F} \in \{\mathbb{R},\mathbb{C}\}. In this way abstract vectors in a Euclidean space or Hilbert space can be represented as spatial vectors once an ordered orthonormal basis is chosen. However, for most purposes it is not necessary to choose an ordering on X and one may simply work directly with unordered Cartesian coordinates. For example, we have

\langle v,w \rangle = \sum\limits_{x \in X} \overline{\langle x,v\rangle}\langle x,w\rangle = \sum\limits_{x \in X}\langle v,x\rangle \langle x,w\rangle

for every pair of vectors v,w \in V, and in particular

\|v\|^2 = \sum\limits_{x \in X} |\langle x,v\rangle|^2.
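
These identities are easy to test numerically. In the Python/numpy sketch below (an illustration with made-up data), the columns of Q form an orthonormal basis X, the Cartesian coordinates of a vector are computed all at once as Q^*v, and the last two lines check the formulas just displayed.

    import numpy as np

    rng = np.random.default_rng(4)
    M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    Q, _ = np.linalg.qr(M)                   # columns of Q: an orthonormal basis X

    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    coords_v = Q.conj().T @ v                # entries <x, v>, x in X
    coords_w = Q.conj().T @ w

    # v = sum_x <x,v> x
    assert np.allclose(Q @ coords_v, v)

    # <v,w> = sum_x conj(<x,v>) <x,w>  and  ||v||^2 = sum_x |<x,v>|^2
    assert np.isclose(np.vdot(coords_v, coords_w), np.vdot(v, w))
    assert np.isclose(np.sum(np.abs(coords_v) ** 2), np.linalg.norm(v) ** 2)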

The use of Cartesian coordinates without imposing an ordering is a middle path between the asceticism of never using any coordinates and the indulgence of always using ordered coordinates. Another important computation is change of coordinates. Suppose that Y\subset V is a novel orthonormal basis generating its own Cartesian coordinates

v = \sum\limits_{y \in Y} \langle y,v\rangle y.

To express the novel coordinates in terms of the old ones, we compute

v = \sum\limits_{y \in Y} \left\langle \sum\limits_{x \in X}\langle x,y\rangle x,v\right\rangle y = \sum\limits_{y \in Y} \left(\sum\limits_{x \in X} \overline{\langle x,y\rangle} \langle x,v\rangle \right)y.

Thus, by uniqueness of coordinates in a given basis, we conclude that

\langle y,v \rangle = \sum\limits_{x \in X}  \overline{\langle x,y\rangle} \langle x,v\rangle =\sum\limits_{x \in X} \langle y,x\rangle\langle x,v\rangle

holds for every y \in Y. This is how change-of-basis is done without ordering bases.
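
In coordinates, the rule just derived says that the vector of new coordinates is obtained from the vector of old coordinates by applying the matrix of scalar products \langle y,x\rangle. A Python/numpy sketch (illustration only):

    import numpy as np

    rng = np.random.default_rng(5)

    def random_onb(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return np.linalg.qr(M)[0]

    Qx = random_onb(3)                # old orthonormal basis X (columns)
    Qy = random_onb(3)                # new orthonormal basis Y (columns)

    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    old = Qx.conj().T @ v             # Cartesian coordinates <x, v>
    new = Qy.conj().T @ v             # Cartesian coordinates <y, v>

    # change-of-basis matrix with entries <y, x>
    C = Qy.conj().T @ Qx
    assert np.allclose(C @ old, new)  # <y,v> = sum_x <y,x><x,v>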

Definition 5.6. A function A \colon V \to W from one Hilbert space to another is said to be linear if A(0_V)=0_W and

A(\alpha_1 v_1 + \alpha_2 v_2) = \alpha_1 Av_1 +\alpha_2 Av_2.

For some reason, linear functions between Hilbert spaces are called linear transformations; they are generally denoted by capital letters and the usual brackets surrounding the argument to the function are omitted.

Previously, we discussed functions between finite sets and showed how these can be encoded as binary matrices (“one-hot encoding”). A non-trivial Hilbert space is an uncountably infinite set and functions between such objects can be very complicated. However, linear functions between finite-dimensional Hilbert spaces are not much more complicated than functions between finite sets: they are encoded by matrices whose entries may be any elements of the ground field.

Let V and W be Hilbert spaces with specified orthonormal bases X \subset V and Y \subset W, and let A \colon V \to W be a linear transformation. For any v \in V, we have

Av = A\sum\limits_{x \in X} \langle x,v \rangle x=\sum\limits_{x \in X} \langle x,v \rangle Ax.

Thus, A is uniquely determined by the |X|=\dim V vectors Ax, x \in X, and for each of these we have

Ax = \sum\limits_{y \in Y} \langle y,Ax\rangle y.

It follows that A is uniquely determined by the |Y||X|=(\dim W)(\dim V) scalar products

\langle y,Ax \rangle, \quad x \in X,\ y \in Y,

which we call the matrix elements of A relative to X and Y. If one wishes, orderings x_1,\dots,x_n and y_1,\dots,y_m of X and Y may be chosen and the matrix elements of A may be arranged into the m \times n matrix [A] with entries

[A]_{ij} = \langle y_i, Ax_j\rangle,

but for most purposes it is not necessary to do this and one can work directly with the matrix elements of A relative to two unordered orthonormal bases.
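
For the record, here is how the matrix elements are assembled in Python/numpy once orderings are chosen (a sketch of mine; the columns of Qx and Qy hold the ordered bases):

    import numpy as np

    rng = np.random.default_rng(6)

    Qx = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))[0]
    Qy = np.linalg.qr(rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)))[0]

    A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))   # a linear map V -> W

    # matrix elements [A]_{ij} = <y_i, A x_j> relative to the bases X, Y
    A_mat = Qy.conj().T @ A @ Qx

    # A is recovered from its matrix elements: A = Qy [A] Qx^*
    assert np.allclose(Qy @ A_mat @ Qx.conj().T, A)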

As an example, let us discuss change of basis. In the category of finite sets with functions f \colon X \to Y the only thing that can occur is a relabeling of the points of X and Y, which will change the one-hot encoding of f by permuting its rows and columns. For linear maps between Hilbert spaces V and W, in which X and Y are sitting as orthonormal bases, we may consider genuinely different orthonormal bases \tilde{X} \subset V and \tilde{Y} \subset W. We then have

\langle \tilde{y}, A \tilde{x}\rangle = \left\langle \sum\limits_{y \in Y} \langle y,\tilde{y}\rangle y,\sum\limits_{x \in X} \langle x,\tilde{x}\rangle Ax \right\rangle=\sum\limits_{y \in Y}\sum\limits_{x \in X} \langle \tilde{y},y \rangle \langle y,Ax \rangle \langle x,\tilde{x} \rangle,

a formula which gives the new matrix elements in terms of the old ones and the Cartesian coordinates of \tilde{X},\tilde{Y} relative to the original bases X,Y.

Now let us consider linear transformations A \colon V \to \tilde{V} and B \colon \tilde{V} \to W, where V,\tilde{V},W are Hilbert spaces with orthonormal bases X,\tilde{X},Y respectively. Then, the composition BA:=B \circ A is a function V \to W, and you can (and should) check that it is a linear transformation. We compute the matrix elements of the composition transformation in terms of the matrix elements of its constituents as follows:

\langle y,BA x\rangle=\left\langle y,\sum\limits_{\tilde{x} \in \tilde{X}} \langle \tilde{x},Ax\rangle B\tilde{x} \right\rangle =\sum\limits_{\tilde{x} \in \tilde{X}} \langle y,B\tilde{x}\rangle \langle \tilde{x},Ax\rangle.
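
In ordered coordinates this computation is exactly the statement that the matrix of a composition is the product of the matrices, which we can spot-check numerically (a sketch in the same spirit as the previous ones):

    import numpy as np

    rng = np.random.default_rng(7)

    def onb(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return np.linalg.qr(M)[0]

    Qx, Qt, Qy = onb(3), onb(4), onb(2)      # bases of V, V~, W (columns)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))   # V -> V~
    B = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))   # V~ -> W

    A_mat = Qt.conj().T @ A @ Qx             # [A]  relative to X, X~
    B_mat = Qy.conj().T @ B @ Qt             # [B]  relative to X~, Y
    BA_mat = Qy.conj().T @ (B @ A) @ Qx      # [BA] relative to X, Y

    # <y, BAx> = sum_{x~} <y, Bx~><x~, Ax>: composition becomes matrix product
    assert np.allclose(BA_mat, B_mat @ A_mat)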

Math 202A: Lecture 4

Definition 4.1. A Hilbert space is a complex vector space V equipped with a scalar product: a function

\langle \cdot,\cdot \rangle \colon V \times V \longrightarrow \mathbb{C}

satisfying:

  1. \langle v,v \rangle \geq 0 with equality if and only if v=0_V;
  2. \langle v,w \rangle=\overline{\langle w,v\rangle};
  3. \langle v, \beta_1w_1+\beta_2w_2\rangle=\beta_1\langle v,w_1\rangle + \beta_2 \langle v,w_2\rangle.

The term “scalar product” indicates that \langle \cdot,\cdot \rangle is a rule for multiplying two vectors v,w in V such that the product \langle v,w\rangle is a complex number rather than a vector in V. Using these axioms you can check that the scalar product satisfies the following augmented FOIL identity from high school algebra, which is known as sesquilinearity:

\langle \alpha_1v_1+\alpha_2v_2,\beta_1w_1+\beta_2w_2\rangle=\overline{\alpha_1}\beta_1\langle v_1,w_1\rangle + \overline{\alpha_1}\beta_2\langle v_1,w_2\rangle + \overline{\alpha_2}\beta_1\langle v_2,w_1\rangle + \overline{\alpha_2}\beta_2\langle v_2,w_2\rangle.

Definition 4.1 is nonstandard in that the usual definition includes an extra analytic condition which we have omitted (this will be discussed further below). One may also see the pair (V,\langle \cdot,\cdot\rangle) referred to as a Hermitian space.

A vector space V without a scalar product abstracts the familiar operations of addition and scaling of spatial vectors \vec{v}, which are standard in science and engineering. In particular you have already in your mind the image of vectors as directed line segments representing things like velocity or acceleration; these are added tail-to-tip and stretched or squished by scalar multiplication. The reason to generalize from spatial vectors \vec{v} to abstract vectors v is that many other types of objects can be manipulated in the same way as spatial vectors, and we can study all such systems simultaneously through an axiomatic development of general vector spaces. The motivation behind allowing for scaling by arbitrary complex numbers instead of just real ones is a bit more involved, but one may point to the use of complex vector spaces in quantum mechanics as one reason. The role of the scalar product is to provide an axiomatic foundation for abstracting familiar geometric aspects of spatial vectors, such as the length of a vector and the angle between two vectors, to the setting of an arbitrary vector space.

Definition 4.2. A normed vector space is a complex vector space V equipped with a function

\|\cdot\| \colon V \longrightarrow [0,\infty)

such that

  1. \|v\|=0 if and only if v=0_V;
  2. \|\alpha v\| = |\alpha|\|v\|;
  3. \|v+w\| \leq \|v\| + \|w\|.

The norm \|v\| abstracts the notion of vector length in a way which is compatible with our geometric intuition. The first axiom says that only the zero vector has zero length. The second says that scaling a vector and then measuring its length is the same thing as multiplying the original length measurement by the magnitude of the scaling. The third is the triangle inequality: it abstracts the fact that if we add two spatial vectors \vec{v},\vec{w} tail-to-tip we get a triangle with three directed sides \vec{v},\vec{w} and \vec{v}+\vec{w}. Any normed vector space can be promoted to a metric space with distance defined by \mathrm{d}(v,w) = \|v-w\|.

We now claim that in a Hilbert space V the scalar product gives us a norm defined by

\|v\| = \sqrt{\langle v,v \rangle}.

To verify this claim, we have to check that the three conditions stipulated by Definition 4.2 do in fact hold. The first two are easy to check. The third is a bit more problematic: we compute

\|v+w\| = \sqrt{\langle v+w,v+w\rangle} = \sqrt{\|v\|^2+2\Re \langle v,w\rangle + \|w\|^2}

and we have to control the quantity \Re \langle v,w\rangle. Since the real part of any complex number is bounded by its modulus, where equality holds precisely for nonnegative real numbers, we have

\|v+w\| \leq \sqrt{\|v\|^2+2|\langle v,w\rangle| + \|w\|^2}

with equality if and only if \langle v,w \rangle \geq 0.

Theorem 4.3. (Cauchy-Schwarz inequality) For any vectors v,w \in V we have

|\langle v,w\rangle| \leq \|v\| \|w\|,

with equality if and only if v and w are linearly dependent.

Problem 4.1. Prove the Cauchy-Schwarz inequality.

Combining the Cauchy-Schwarz inequality with the estimate above yields \|v+w\| \leq \sqrt{\|v\|^2+2\|v\|\|w\|+\|w\|^2}=\|v\|+\|w\|, so the triangle inequality holds. At this point we have shown that the scalar product on a Hilbert space V gives rise to a genuine norm defined by \|v\|=\sqrt{\langle v,v \rangle} and hence to a genuine metric defined by \mathrm{d}(v,w) = \|v-w\|. The reason Definition 4.1 is nonstandard is that at this stage one typically includes an extra clause: the metric space (V,\mathrm{d}) must be complete. We are not going to make metric completeness part of our definition of Hilbert space because we will for the most part not need to take limits of sequences of vectors in Hilbert space — that would be analysis, and our focus is algebra. Furthermore, we will soon define a notion of dimension for Hilbert spaces and then restrict our study to the finite-dimensional ones, where metric completeness is automatic.

Although we are working over the complex numbers, one may just as well consider real vector spaces equipped with a scalar product: these are called Euclidean spaces because these are indeed the most elementary and natural setting in which to carry out an axiomatic abstraction of Euclidean geometry. So far, everything we have said about Hilbert spaces holds verbatim for Euclidean spaces. However, differences between the two cases will now start to emerge. Let us begin with the fact that in both Euclidean space and Hilbert space the scalar product can be recovered from the norm, but the recipe is a bit different depending on whether one is working over the real or complex numbers.

Theorem 4.4. (Euclidean Polarization) For any vectors v,w in a Euclidean space V, we have

\langle v,w \rangle = \frac{1}{4}\left( \|v+w\|^2 - \|v-w\|^2\right).

Proof: We have

\|v+w\|^2 = \langle v+w,v+w\rangle = \|v\|^2 +2\langle v,w \rangle + \|w\|^2

and

\|v-w\|^2 = \langle v-w,v-w\rangle = \|v\|^2 -2\langle v,w \rangle + \|w\|^2.

Subtracting the second expression from the first, we obtain

\|v+w\|^2 - \|v-w\|^2 = 4\langle v,w \rangle.

-QED

Theorem 4.5. (Hermitian Polarization) For any vectors v,w in a Hilbert space V, we have

\langle v,w \rangle = \frac{1}{4}\left( \|v+w\|^2 - \|v-w\|^2\right) + \frac{i}{4}\left( \|v-iw\|^2 - \|v+iw\|^2\right).

Proof: Same as above: expand the right hand side and simplify.

-QED
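
Polarization identities are easy to spot-check numerically, and doing so is a good way to catch sign errors in one's conventions. A Python/numpy sketch (illustration only; np.vdot conjugates its first argument, matching linearity in the second slot):

    import numpy as np

    rng = np.random.default_rng(8)
    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    nsq = lambda u: np.linalg.norm(u) ** 2

    # Hermitian polarization, with <.,.> linear in the second slot
    lhs = np.vdot(v, w)
    rhs = 0.25 * (nsq(v + w) - nsq(v - w)) + 0.25j * (nsq(v - 1j * w) - nsq(v + 1j * w))
    assert np.isclose(lhs, rhs)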

In both Euclidean space and Hilbert space, two vectors are said to be orthogonal if \langle v,w \rangle = 0 and in this case we have the following.

Theorem 4.6. (Pythagorean Theorem) For any orthogonal vectors v,w we have \|v+w\|^2 = \|v\|^2 + \|w\|^2.

Proof: Expand \|v+w\|^2 = \langle v+w,v+w\rangle and simplify using orthogonality.

-QED

Observe that the above argument also shows that we have \|v-w\|^2=\|v\|^2+\|w\|^2 for orthogonal vectors. If we drop orthogonality, the correct statement is the following, which is again the same in Euclidean and Hilbert space.

Theorem 4.7. (Parallelogram Law) For any two vectors v,w in Euclidean space or Hilbert space, we have

\|v+w\|^2 + \|v-w\|^2 = 2\|v\|^2 + 2\|w\|^2.

Proof: We have

\|v+w\|^2 = \langle v,v \rangle + \langle v,w\rangle + \langle w,v \rangle + \langle w,w\rangle

and

\|v-w\|^2 = \langle v,v \rangle - \langle v,w\rangle - \langle w,v \rangle + \langle w,w\rangle.

Adding these two expressions gives the stated identity.

-QED

Orthogonality is an abstraction of the notion of perpendicularity for spatial vectors. Now let us consider the notion of angles in general Euclidean spaces and Hilbert spaces. In both settings the definition is based on the Cauchy-Schwarz inequality, which implies

-1 \leq \frac{\Re \langle v,w \rangle}{\|v\|\|w\|} \leq 1.

Definition 4.8. The Euclidean angle between nonzero vectors v,w is the unique \theta \in [0,\pi] such that

\cos \theta = \frac{\Re \langle v,w \rangle}{\|v\|\|w\|}.

The definition of the Euclidean angle between two vectors is valid in both Euclidean space and Hilbert space. In Euclidean space, the scalar product is real so we just write

\cos \theta = \frac{\langle v,w \rangle}{\|v\|\|w\|}, \quad \theta \in [0,\pi].

This indeed corresponds to the intuitive notion of angle: if v,w are orthogonal then \theta=\frac{\pi}{2}, and if w=\alpha v then \theta=0 when \alpha >0 and \theta = \pi when \alpha <0. However, while Definition 4.8 is logically valid for vectors in Hilbert space, it produces counterintuitive results: for example, if w=iv then \|v\|=\|w\| and \Re \langle v,w \rangle =0, so \theta=\frac{\pi}{2} even though w is a scalar multiple of v. We therefore introduce a different notion of angle measure as follows.

Definition 4.9. The Hermitian angle between nonzero vectors v,w is the unique \theta \in [0,\frac{\pi}{2}] such that

\cos \theta = \frac{|\langle v,w \rangle|}{\|v\|\|w\|}.

The Hermitian angle concept is much better adapted to complex scalars than the Euclidean angle: in particular the Hermitian angle between v and iv is zero. On the other hand, we now see a different kind of counterintuitive behavior in that the Hermitian angle between v and -v is also zero. The explanation for this phenomenon is that -v = e^{i\pi}v, i.e. in complex geometry a ray is a real plane, and v and -v point in the same “direction.” In other words, there are no obtuse angles in complex geometry because the real concept of rotating a vector is a special case of complex scaling. This is a feature, not a bug.
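
Here is a short Python/numpy sketch computing both angle measures (the helper functions are my own), which exhibits the behavior just described for w=iv and w=-v:

    import numpy as np

    def euclidean_angle(v, w):
        c = np.vdot(v, w).real / (np.linalg.norm(v) * np.linalg.norm(w))
        return np.arccos(np.clip(c, -1, 1))

    def hermitian_angle(v, w):
        c = abs(np.vdot(v, w)) / (np.linalg.norm(v) * np.linalg.norm(w))
        return np.arccos(np.clip(c, 0, 1))

    v = np.array([1.0 + 0j, 2.0 - 1j])

    assert np.isclose(euclidean_angle(v, 1j * v), np.pi / 2)   # counterintuitive
    assert np.isclose(hermitian_angle(v, 1j * v), 0.0)         # iv "points the same way"
    assert np.isclose(hermitian_angle(v, -v), 0.0)             # no obtuse angles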

Theorem 4.10. (Euclidean Law of Cosines) For any nonzero vectors v,w we have

\|v-w\|^2 = \|v\|^2 +\|w\|^2-2\|v\|\|w\|\cos \theta,

where \theta is the Euclidean angle between v and w.

Proof: We have

\langle v-w,v-w\rangle = \|v\|^2+\|w\|^2 - 2\Re \langle v,w\rangle = \|v\|^2 + \|w\|^2 -2\|v\|\|w\|\frac{\Re \langle v,w \rangle}{\|v\|\|w\|},

and the result now follows from the definition of the Euclidean angle between v and w.

-QED

Theorem 4.11. (Hermitian Law of Cosines) For any nonzero vectors v,w in a Hilbert space V, we have

\min\limits_{\phi \in \mathbb{R}} \|v-e^{i\phi}w\|^2 = \|v\|^2+\|w\|^2-2\|v\|\|w\|\cos \theta,

where \theta is the Hermitian angle between v and w.

The following corollary of Theorem 4.11 is useful in the applied context of phase retrieval problems.

Corollary 4.12 (Best Phase Law) For any unit vectors v,w in a Hilbert space V, we have

\min\limits_{\phi \in \mathbb{R}} \|v-e^{i\phi}w\|=2\sin\left( \frac{\theta}{2}\right),

where \theta is the Hermitian angle between v and w.

Problem 4.2. Prove the Hermitian Law of Cosines for vectors in Hilbert space. Hint: modify the proof of the Euclidean case.
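
Before attempting the proof, it may be reassuring to verify the Hermitian Law of Cosines and the Best Phase Law numerically. In the Python/numpy sketch below (an illustration; the minimization over \phi is approximated on a fine grid), both identities check out for random unit vectors:

    import numpy as np

    rng = np.random.default_rng(9)

    def unit(u):
        return u / np.linalg.norm(u)

    v = unit(rng.standard_normal(3) + 1j * rng.standard_normal(3))
    w = unit(rng.standard_normal(3) + 1j * rng.standard_normal(3))

    theta = np.arccos(np.clip(abs(np.vdot(v, w)), 0, 1))   # Hermitian angle

    phis = np.linspace(0, 2 * np.pi, 100001)
    dists = np.linalg.norm(v - np.exp(1j * phis)[:, None] * w, axis=1)

    # Hermitian Law of Cosines (unit vectors): min ||v - e^{i phi} w||^2 = 2 - 2 cos(theta)
    assert np.isclose(dists.min() ** 2, 2 - 2 * np.cos(theta), atol=1e-6)

    # Best Phase Law: min ||v - e^{i phi} w|| = 2 sin(theta / 2)
    assert np.isclose(dists.min(), 2 * np.sin(theta / 2), atol=1e-6)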

A lingering question is whether we could use some other class of normed vector spaces apart from Euclidean and Hilbert spaces to axiomatically develop real and complex geometry. The following nice result says that the answer is no.

Theorem 4.13. Let V be either a real or complex normed vector space in which the parallelogram law holds. Then, there exists a scalar product on V such that \|v\|=\sqrt{\langle v,v\rangle}.

Let us consider how one would prove Theorem 4.13, say in the real case. The basic idea is to define a polarization-inspired function

\langle \cdot,\cdot \rangle \colon V \times V \longrightarrow \mathbb{R}

by

\langle v,w \rangle = \frac{1}{4}(\|v+w\|^2 - \|v-w\|^2)

and show that this is a scalar product which induces the given norm. Indeed, we have

\langle v,v \rangle = \frac{1}{4}(\| v+v\|^2 - \|v-v\|^2) = \|v\|^2,

and since \|\cdot\| is a norm we immediately get that the first scalar product axiom holds. Symmetry is also straightforward,

4\langle v,w \rangle = \|v+w\|^2-\|v-w\|^2=\|w+v\|^2-\|w-v\|^2=4\langle w,v \rangle.

Problem 4.3. Complete the proof of Theorem 4.13 in the real case. Hint: one strategy is to first prove additivity,

\langle v,w_1+w_2\rangle=\langle v,w_1\rangle + \langle v,w_2\rangle,

using the Parallelogram Law as your main tool. From here you can show that \langle v,nw\rangle = n \langle v,w\rangle for n \in \mathbb{Z}. Then think about how to bootstrap this to scalars in \mathbb{Q} and finally in \mathbb{R}. If you want to be a hero, adapt your argument to the complex setting as an optional add-on.

Math 202A: Lecture 3

Here is a brief recap of Week 1. We defined combinatorics to be the branch of mathematics which studies the category \mathbf{FSet} whose objects are finite sets and whose morphisms are functions. We noted that a function f \in \mathrm{Hom}(X,Y) can be encoded as a matrix over \{0,1\}. This is done by choosing a labeling x_1,\dots,x_n of X, a labeling y_1,\dots,y_m of Y, and building the m \times n matrix [f] with entries

[f]_{ij} = \begin{cases} 1,\text{ if } f(x_j)=y_i \\ 0,\text{ otherwise}\end{cases}.

In data science, the matrix [f] is referred to as a one-hot encoding of the function f. The one-hot encoding of a function depends on a labeling of the points of its domain and codomain: a relabeling of X will cause [f] to be right-multiplied by an n \times n permutation matrix, while a relabeling of Y requires us to left-multiply [f] by an m \times m permutation matrix. Also note that if f \in \mathrm{Hom}(X,Y) and g \in \mathrm{Hom}(Y,Z) then [g \circ f]=[g][f]. So we see that combinatorics can be encoded in terms of structured matrices defined over \mathbb{Z}_2.
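
As a quick illustration of this recap, here is the one-hot encoding and the composition identity in Python/numpy (a sketch with hypothetical example functions on the label sets \{0,\dots,n-1\}):

    import numpy as np

    def one_hot(f, n, m):
        """One-hot encoding of f: {0,...,n-1} -> {0,...,m-1} as an m x n matrix."""
        F = np.zeros((m, n), dtype=int)
        for j in range(n):
            F[f(j), j] = 1
        return F

    f = lambda j: (j + 1) % 3          # a function on a 3-point set
    g = lambda j: min(j, 1)            # a function from 3 points to 2 points

    F, G = one_hot(f, 3, 3), one_hot(g, 3, 2)
    GF = one_hot(lambda j: g(f(j)), 3, 2)

    assert np.array_equal(GF, G @ F)   # [g o f] = [g][f]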

We next defined linear algebra to be the study of the category \mathbf{FVec} whose objects are finite-dimensional complex vector spaces equipped with a scalar product, which we refer to as Hilbert spaces, and whose morphisms are linear transformations between Hilbert spaces. We defined the quantization functor

\mathcal{F} \colon \mathbf{FSet} \longrightarrow \mathbf{FVec},

which sends each finite set X to the Hilbert space \mathcal{F}(X) of functions a \colon X \to \mathbb{C} equipped with the pointwise operations and scalar product

\langle a,b \rangle = \sum\limits_{x \in X} \overline{a(x)}b(x).

The Hilbert space \mathcal{F}(X) has an orthonormal basis consisting of the elementary functions

e_x(y) = \begin{cases} 1, \text{ if }x=y \\ 0, \text{ if } x\neq y \end{cases}.

Any function a \in \mathcal{F}(X) can be written as a linear combination of elementary functions,

a = \sum\limits_{x \in X} \alpha_x e_x,

where

\alpha_x = a(x)=\langle e_x,a\rangle.

Here we noted an alternative notation favored by algebraists: we simply write x instead of e_x, so that \mathcal{F}(X) becomes the space of formal \mathbb{C}-linear combinations of elements of X,

a= \sum\limits_{x \in X} \alpha_x x.

This saves us one layer of indexing, and it is a good notation for many purposes. When thought of in this way, \mathcal{F}(X) is referred to as the free vector space on the set X, and X sits inside \mathcal{F}(X) as an orthonormal basis. We will mostly use this formal notation going forward, although in other contexts (mostly analysis) the original functional notation is a better fit.

It remains to stipulate how \mathcal{F} transports morphisms in \mathbf{FSet} to morphisms in \mathbf{FVec}, i.e. how \mathcal{F} transforms functions between finite sets into linear transformations between Hilbert spaces. The recipe is so simple that it looks tautological: \mathcal{F} sends a given function f \in \mathrm{Hom}(X,Y) to the transformation \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) defined by

[\mathcal{F}(f)]x = f(x), \quad x \in X.

Thus, if we choose an ordering x_1,\dots,x_n of X and an ordering y_1,\dots,y_m of Y so that X and Y become ordered orthonormal bases of \mathcal{F}(X) and \mathcal{F}(Y), the matrix of the linear transformation \mathcal{F}(f) relative to these bases is simply the one-hot encoding of the original function f,

[\mathcal{F}(f)] =[f].

The reason that this apparently tautological translation of a set function f into a linear transformation \mathcal{F}(f) is useful is that it gives us many, many more matrix representations. Indeed, the only operations we could perform on the one-hot encoding [f] are permuting its rows and columns. However, for the linear transformation \mathcal{F}(f) we can multiply on the left and/or right by arbitrary invertible matrices to get new representations of the same function which are more advantageous and reveal properties of the function f which are not apparent from the one-hot encoding. The fact that this can be done in a systematic way is the Singular Value Decomposition, which is in some sense the main theorem of Math 202A: it tells us we can always find orthonormal bases R \subset \mathcal{F}(X) and L \subset \mathcal{F}(Y) such that the matrix of \mathcal{F}(f) is real diagonal. The bases L and R are called the left singular vectors and the right singular vectors of \mathcal{F}(f), respectively, and the nontrivial entries of the corresponding diagonal form of \mathcal{F}(f) are called its singular values. In fact, this powerful result holds not just for transformations between Hilbert spaces which are functorial images of finite set functions, but for arbitrary linear transformations.

Next week, after a basic review of Hilbert spaces, we will move towards formulating and proving the SVD. In fact, for such a powerful result the proof is quite simple and straightforward, and in the case of linear transformations which come from set functions as above it has a nice interpretation in terms of bipartite graphs which is a good aid to intuition. We finish this week by introducing one more functor which transports us from the combinatorial category \mathbf{FSet} into a different part of mathematics.

Let X be any set, not necessarily finite. Recall that a binary relation on X is a subset of X \times X, the Cartesian square of X. This is a very general concept, and the binary relations that arise in practice are more structured. For example, a function f \colon X \to X is a special type of binary relation on X. Also familiar are equivalence relations. Formally, we say that R \subseteq X \times X is an equivalence relation if it is:

  • Reflexive: for all x \in X we have (x,x) \in R.
  • Symmetric: for all x,y \in X we have (x,y) \in R iff (y,x) \in R.
  • Transitive: if x,y,z \in X are such that (x,y) \in R and (y,z) \in R, then (x,z) \in R.

The simplest and most obvious example of an equivalence relation is equality,

R = \{(x,x) \colon x \in X\}.

A different type of binary relation generalizes our intuitive notion of inequality. A partial order on X is a relation R \subseteq X \times X which is reflexive and transitive, but antisymmetric rather than symmetric: (x,y) \in R and (y,x) \in R iff x=y. For partial orders, the notation x \leq y is used to indicate that (x,y) \in R because it suggestively makes the antisymmetry into

x \leq y \text{ and }y \leq x \iff x=y.

A set X equipped with a partial order \leq is called a partially ordered set, often abbreviated to poset. If X and Y are posets, a function f \colon X \to Y is said to be order-preserving if

x_1 \leq x_2 \text{ in }X \implies f(x_1) \leq f(x_2) \text{ in }Y.

We can now give a categorical definition of one more branch of mathematics.

Definition 3.1. Order theory is the study of the category \mathbf{PSet} whose objects are posets and whose morphisms are order-preserving functions.

We now define a functor

\mathcal{P} \colon \mathbf{Set} \longrightarrow \mathbf{PSet}

as follows. First, for any set X we declare

\mathcal{P}(X) = \{S \subseteq X\},

so \mathcal{P}(X) is the power set of X. We equip \mathcal{P}(X) with the partial order defined by set containment,

S \leq T \iff S \subseteq T.

If I were allowed to assign problems on Fridays, I would ask you to verify that the above really is a partial order; this is easy but worth doing. In order for \mathcal{P} to be a functor, it remains to stipulate how it acts on morphisms in \mathbf{Set}, i.e. how it converts a function f \colon X \to Y between sets into an order-preserving function \mathcal{P}(f) \colon \mathcal{P}(X) \to \mathcal{P}(Y). As with the functor \mathcal{F}, the recipe is extremely simple: if S \in \mathcal{P}(X) is a subset of X, we simply define

[\mathcal{P}(f)](S) = f(S) = \{f(x) \colon x \in S\}.

If not for the Friday ban I would ask you to verify that \mathcal{P}(f) \colon \mathcal{P}(X) \to \mathcal{P}(Y) is an order-preserving function, but I don’t want to risk a civil rights case.
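
For those who want the machine to do the Friday problem anyway, here is a Python sketch of \mathcal{P} on morphisms for a three-point set (an illustration; the helper names are mine):

    from itertools import chain, combinations

    def power_set(X):
        """All subsets of X, as frozensets."""
        X = list(X)
        subsets = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
        return [frozenset(c) for c in subsets]

    def P(f):
        """P(f) sends a subset S to its image f(S)."""
        return lambda S: frozenset(f(x) for x in S)

    f = {1: 'a', 2: 'a', 3: 'b'}.get        # a function X -> Y
    Pf = P(f)

    # order preservation: S <= T (i.e. S a subset of T) implies f(S) <= f(T)
    for S in power_set({1, 2, 3}):
        for T in power_set({1, 2, 3}):
            if S <= T:
                assert Pf(S) <= Pf(T)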

In fact, \mathcal{P}(X) is a special kind of poset in which every pair of elements has both a least upper bound and a greatest lower bound defined by

\sup(S,T) = S \cup T \quad\text{ and }\quad \inf(S,T) = S \cap T.

A poset in which every pair of elements has a least upper bound and a greatest lower bound is called a lattice. We define the category \mathbf{Lat} to be the full subcategory of \mathbf{PSet} whose objects are lattices, so that in fact

\mathcal{P} \colon \mathbf{Set} \longrightarrow \mathbf{Lat}.

Next week, we will see an analogous functor from vector spaces to lattices.

Math 202A: Lecture 2

In this lecture we formulate the basic organizing principle of Math 202A: linear algebra quantizes combinatorics. We will make this precise using the language of category theory, which we began to introduce in Lecture 1. In particular, we defined combinatorics as the study of the category \mathbf{FSet} whose objects are finite sets and whose morphisms are functions. It remains to give a corresponding definition of linear algebra.

Definition 2.1. Linear algebra is the study of the category \mathbf{FVec} whose objects are finite-dimensional complex vector spaces equipped with a scalar product, and whose morphisms are linear transformations.

We are going to call objects in \mathbf{FVec} “Hilbert spaces,” so that we don’t have to say “finite-dimensional complex vector spaces equipped with a scalar product” every time we reference such an object. However, you should be aware that in the world outside Math 202A the term Hilbert space is used in a broader sense and also includes infinite-dimensional complex inner product spaces which are complete for the norm induced by the scalar product. In the finite-dimensional case, completeness is automatic but for infinite-dimensional spaces it is an extra assumption. I repeat: in Math 202A the term Hilbert space exclusively refers to finite-dimensional complex vector spaces which come with a given scalar product. Note that two Hilbert spaces which have the same underlying vector space but different scalar products are distinct objects of \mathbf{FVec}, but they are isomorphic, with the identity transformation being a particular isomorphism. In particular, isomorphisms in \mathbf{FVec} are not required to preserve scalar products, but those which do (isometric isomorphisms) are particularly important and useful. Also note that we sometimes (often) reference Hilbert spaces without explicitly mentioning their scalar product.

Problem 2.1. Prove that two Hilbert spaces are isomorphic if and only if there exists a linear bijection between them.

We will commence a careful discussion of \mathbf{FVec} next week. Right now, we want to construct some sort of a mapping

\mathbf{FSet} \longrightarrow \mathbf{FVec}

which will justify our statement that linear algebra is quantized combinatorics. Therefore, we first need to introduce the general notion of mapping from one category to another. As discussed in Lecture 1, categories generalize sets insofar as \mathbf{Set} is just one example of a category, so mappings between categories generalize functions — they are called functors and defined as follows.

Let \mathbf{Cat} and \mathbf{Dat} be categories. A functor 

\mathcal{F} \colon \mathbf{Cat} \longrightarrow \mathbf{Dat}

is first of all a rule which assigns to each object X \in \mathrm{Ob}(\mathbf{Cat}) a corresponding object \mathcal{F}(X) \in \mathrm{Ob}(\mathbf{Dat}). Second, \mathcal{F} should map morphisms f in \mathbf{Cat} to morphisms \mathcal{F}(f) in \mathbf{Dat} in a way which is compatible with the composition laws in these categories. More precisely, we require that for any objects X,Y,Z \in \mathrm{Ob}(\mathbf{Cat}) and morphisms f \in \mathrm{Hom}(X,Y) and g \in \mathrm{Hom}(Y,Z), the corresponding morphisms \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) and \mathcal{F}(g) \in \mathrm{Hom}(\mathcal{F}(Y),\mathcal{F}(Z)) satisfy the equality

\mathcal{F}(g \circ f) = \mathcal{F}(g) \circ \mathcal{F}(f)

in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Z)). We also require that \mathcal{F}(\mathrm{id}_X) = \mathrm{id}_{\mathcal{F}(X)} for every object X \in \mathrm{Ob}(\mathbf{Cat}).

We are now ready to define a specific functor

\mathcal{F} \colon \mathbf{FSet} \longrightarrow \mathbf{FVec}

which we will refer to as the quantization functor. The quantization \mathcal{F}(X) of a given finite set X is the Hilbert space whose vectors are functions \varphi \colon X \to \mathbb{C}, with the vector space operations defined pointwise and the scalar product given by

\langle \varphi,\psi\rangle = \sum\limits_{x \in X}\overline{\varphi(x)}\psi(x).

We should verify that \mathcal{F}(X) is in fact a finite-dimensional Hilbert space. First, this requires us to check that the putative scalar product introduced above really is one.

Problem 2.2. Prove that \langle \varphi,\psi\rangle as defined above really is a scalar product on \mathcal{F}(X).

Second, we need to show that \mathcal{F}(X) really is a finite-dimensional vector space, and we do this by giving an explicit basis. For each x \in X define the corresponding Dirac function

\delta_x(y) = \begin{cases} 1, \text{ if } x=y \\ 0,\text{ if } x \neq y\end{cases}.

Thus, \Delta=\{\delta_x\colon x \in X\} is a set of |X| functions in \mathcal{F}(X), and in fact it is an orthonormal set,

\langle \delta_x,\delta_y\rangle = \sum\limits_{z \in X} \delta_x(z)\delta_y(z) = \delta_x(y).

So \Delta is a linearly independent set in \mathcal{F}(X) and only one check remains.

Problem 2.3. Prove that \Delta spans \mathcal{F}(X).

With Problem 2.3 in hand, we can introduce an alternative point of view for the space \mathcal{F}(X) which is often favored by algebraists: we can think of it as the “space of formal linear combinations” of points in X. That is, elements of \mathcal{F}(X) are linear combinations

\varphi = \sum\limits_{x \in X} \varphi_x x,

where \varphi_x \in \mathbb{C} are scalars. This doesn’t really mean anything since we have no notion of scaling a point x \in X by a scalar \varphi_x \in \mathbb{C}, and such a formal linear combination is really just a shorthand for the coordinate decomposition

\varphi = \sum\limits_{x \in X} \varphi(x) \delta_x

of a function \varphi \colon X \to \mathbb{C} with respect to the Dirac basis \Delta = \{\delta_x \colon x \in X\}.

We are not yet done with defining the functor \mathcal{F} because we need to say how it assigns a linear transformation \mathcal{F}(f) to a given set function f. Before doing this let us explain why we are referring to X \mapsto \mathcal{F}(X) as quantization. Consider the canonical finite set X=\{1,\dots,n\}. We can view this as the state space of a particle \bullet which can occupy any one position on a finite lattice of n sites. Equivalently, you can think of \bullet being a ball which may be in any one of n labeled boxes or, as probabilists like to say, “urns.” So X is the state space of a classical particle, which is certain to be in one of these boxes. On the other hand, \mathcal{F}(X) is the state space of a quantum particle \bullet, whose state is a superposition of classical states until a measurement is performed. More precisely, a (pure) quantum state is a unit vector \psi \in \mathcal{F}(X), the physical meaning of which is that the probability of observing the quantum particle in the classical state x is |\langle \delta_x,\psi \rangle|^2, the squared modulus of the amplitude \langle \delta_x,\psi\rangle.

Now to complete the definition of quantization as a functor \mathcal{F} from \mathbf{FSet} to \mathbf{FVec} it remains to say how a function f \colon X \to Y between finite sets becomes a corresponding linear transformation \mathcal{F}(f) \colon \mathcal{F}(X) \to \mathcal{F}(Y) between the corresponding Hilbert spaces. As in Lecture 1, let

\lambda \colon \{1,\dots,n\} \to X \quad\text{and}\quad \mu\colon \{1,\dots,m\} \to Y

be labelings of X and Y, and write x_1=\lambda(1),\dots,x_n=\lambda(n) and y_1=\mu(1),\dots,y_m=\mu(m). Then, the function f is encoded by a corresponding lookup table, namely the m \times n matrix [f] whose (i,j)-entry is 1 if f(x_j)=y_i and 0 otherwise. Now for any f \in \mathrm{Hom}(X,Y) we simply define the corresponding linear transformation

\mathcal{F}(f) \colon \mathcal{F}(X) \longrightarrow \mathcal{F}(Y)

to be the linear transformation whose matrix with respect to the bases \{\delta_x \colon x \in X\} and \{\delta_y \colon y \in Y\} is the matrix of the original set function f. That is,

\mathcal{F}(f)\delta_x=\delta_{f(x)}.

In fact, we can give a coordinate free description of the linear transformation \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)): for every function \varphi \colon X \to \mathbb{C}, the function \mathcal{F}(f)\varphi \colon Y \to \mathbb{C} is the pushforward of \varphi through f.

Problem 2.4. Show that

[\mathcal{F}(f)\varphi](y) = \sum\limits_{x \in f^{-1}(y)} \varphi(x).
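
The formula in Problem 2.4 can be taken as an executable description of \mathcal{F}(f). Here is a Python/numpy sketch (my own illustration) in which functions \varphi \colon X \to \mathbb{C} are stored as arrays indexed by X=\{0,\dots,n-1\}:

    import numpy as np

    def pushforward(f, n, m):
        """Return the matrix of F(f) for f: {0,...,n-1} -> {0,...,m-1}."""
        A = np.zeros((m, n))
        for j in range(n):
            A[f(j), j] = 1          # F(f) delta_x = delta_{f(x)}
        return A

    f = lambda j: j % 2             # a function from 4 points to 2 points
    A = pushforward(f, 4, 2)

    phi = np.array([1.0, 2.0, 3.0, 4.0])
    # [F(f) phi](y) = sum over x in f^{-1}(y) of phi(x)
    assert np.allclose(A @ phi, [1.0 + 3.0, 2.0 + 4.0])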

Now that we have justified our contention that linear algebra is quantized combinatorics, let us indicate briefly why this might be a useful functor to apply to a combinatorial problem. The first reason is that there are many more linear transformations \mathcal{F}(X) \to \mathcal{F}(Y) than there are functions X \to Y. Technically one says that the functor \mathcal{F} is not full, and while this sounds like a negative it is actually a good thing. Indeed if X,Y are two finite sets of combinatorial objects which we suspect have the same cardinality, then working in the category \mathbf{FSet} our only option is to demonstrate the existence of a bijection f \colon X \to Y, and this could be hard. However, if we quantize our problem we now have the option of showing that there exists a linear isomorphism A \colon \mathcal{F}(X) \to \mathcal{F}(Y), and this opens up a whole new realm of possibilities since we are allowed to map superpositions of objects in X onto superpositions of objects in Y. Another example comes from graph theory, where we can replace a graph with vertex set X by the function space \mathcal{F}(X), and encode the adjacency relation \sim in X as the operator

A \delta_x = \sum\limits_{y \sim x} \delta_y.

By thinking about the adjacency operator A in different bases of \mathcal{F}(X) we may be able to gain more insight into our graph. In particular, if we can find a superposition of vertices on which A acts diagonally we have access to some very useful information, as we will discuss later in the course.
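
As a small Python/numpy illustration (mine, on the 4-cycle graph), the adjacency operator is a matrix in the Dirac basis, and its eigenvectors are exactly the superpositions of vertices on which it acts diagonally:

    import numpy as np

    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # the 4-cycle graph
    n = 4

    A = np.zeros((n, n))
    for x, y in edges:
        A[y, x] = A[x, y] = 1      # A delta_x = sum over y ~ x of delta_y

    # eigenvectors of A are "superpositions of vertices" on which A acts diagonally
    eigvals = np.linalg.eigvalsh(A)
    assert np.allclose(eigvals, [-2, 0, 0, 2])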

A few more remarks. First, every Hilbert space V is isomorphic to a space of the form \mathcal{F}(X) for some finite set X. Indeed, every vector space has a basis, and hence by the Gram-Schmidt procedure every Hilbert space has an orthonormal basis. Now, V is isomorphic to \mathcal{F}(X) for any set X we use to label the elements of this basis. Second, there is a different functor

\mathcal{F}^* \colon \mathbf{FSet} \longrightarrow \mathbf{FVec}

worth mentioning. As the notation suggests this is closely related to the functor \mathcal{F}, and indeed \mathcal{F}^*(X)= \mathcal{F}(X) for every X \in \mathrm{Ob}(\mathbf{FSet}). The difference is that we send a function f \colon X \to Y to the pullback along f, which is the linear transformation

\mathcal{F}^*(f) \colon \mathcal{F}^*(Y) \to \mathcal{F}^*(X)

defined by

\mathcal{F}^*(f)\psi=\psi \circ f, \quad \psi \in \mathcal{F}^*(Y).

Technically this does not match our definition of a functor given above, since it reverses arrows. Our initial definition in fact specifies the notion of a covariant functor, whereas this new construction is an example of a contravariant functor.

Problem 2.5. Show that the matrix of the pullback \mathcal{F}^*(f) of a set function f \in \mathrm{Hom}(X,Y) is the adjoint of the lookup table of f.

Math 202A: Lecture 1

As in common parlance, a mathematical category \mathbf{Cat} consists firstly of a collection \mathrm{Ob}(\mathbf{Cat}) of objects which are distinct but similar in some way. In category theory the relationships between the objects of \mathbf{Cat} are morphisms, and for each pair of objects X,Y we have a corresponding collection of morphisms \mathrm{Hom}(X,Y). Morphisms in \mathrm{Hom}(X,X) are called endomorphisms, and the collection \mathrm{Hom}(X,X) itself is denoted \mathrm{End}(X). Morphisms can be composed: given objects X,Y,Z and morphisms f \in \mathrm{Hom}(X,Y) and g \in \mathrm{Hom}(Y,Z) we have a corresponding morphism g \circ f \in \mathrm{Hom}(X,Z). Composition is both associative and unital. Associativity means that for any morphisms such that h \circ (g \circ f) and (h \circ g) \circ f both make sense, they in fact coincide. Unitality means that for every pair of objects X,Y there exist morphisms \mathrm{id}_X \in \mathrm{Hom}(X,X) and \mathrm{id}_Y \in \mathrm{Hom}(Y,Y) such that

f \circ \mathrm{id}_X = f =\mathrm{id}_Y \circ f

holds for all morphisms f \in \mathrm{Hom}(X,Y). We say that two objects X,Y are isomorphic if there exist morphisms f \in \mathrm{Hom}(X,Y) and g \in \mathrm{Hom}(Y,X) such that

g \circ f = \mathrm{id}_X \quad\text{and}\quad f \circ g = \mathrm{id}_Y.

We denote by \mathrm{Iso}(X,Y) the subcollection of \mathrm{Hom}(X,Y) consisting of isomorphisms. Thus, X and Y are isomorphic if \mathrm{Iso}(X,Y) is a nonempty collection. Isomorphism is a weaker notion than equality, since \mathrm{Iso}(X,Y) may contain multiple isomorphisms. We write \mathrm{Aut}(X) = \mathrm{Iso}(X,X) and call morphisms in \mathrm{Aut}(X) automorphisms of X.

Problem 1.1. Show that \mathrm{Aut}(X) is nonempty.

You might ask: what is the point of the category formalism introduced above? My preferred answer to this question is another question: what is the point of anything? This ontological response is sincere but runs the risk of being perceived as facetious. A more practical answer is that category theory provides a very general and flexible way to organize objects and relationships, and is applied as such in the sciences. For more about this you can browse the Topos Institute website.

The fundamental example of a category is the category \mathbf{Set} whose objects are sets and whose morphisms are functions. The law of composition in \mathbf{Set} is composition of functions.

Problem 1.2. Prove that two sets X,Y are isomorphic if and only if there exists a bijective function f \colon X \to Y. 

Looking at the category \mathbf{Set} calls attention to the use of the term “collection” in the general definition of a category. This term is used because the collection \mathrm{Ob}(\mathbf{Set}) of all sets is not itself a set and attempting to view it as such leads to paradoxes. Thus, in order to think about categories in general we need to consider aggregations of objects which are in some sense larger than sets, and we refer to these as collections. Here you may feel the urge to object that the term “collection” has not been clearly defined. However, unless you also took issue with the use of the word “set” during your undergraduate studies, you forfeited the right to this objection long ago and I will happily ignore it. Of course, Math 202A does not occupy all of reality and you are welcome to examine these foundational issues yourself in its complement.

Having pointed out that the collection of all sets is a somewhat subtle concept, let us point out that the collections \mathrm{Hom}(X,Y) which make up the morphisms are not so bad: they can be considered as sets without running into any foundational issues. Indeed, if you believe that X and Y being sets is enough to certify

X \times Y = \{(x,y) \colon x \in X,\ y \in Y\}

as a set, then there is no further issue because a function f \colon X \to Y is by definition a subset f \subseteq X \times Y with the property that f contains exactly one point of the form (x,*) for each x \in X. Thus, the collection of all functions \mathrm{Hom}(X,Y) is contained in the collection of all subsets of the set X \times Y, and hence is contained in a set.

The upshot is that categories are more general than sets, with \mathbf{Set} being just one example of a category. A helpful heuristic to have in mind is that categories are like directed graphs, possibly very large ones, with objects being vertices and morphisms being directed edges. Composing morphisms is analogous to concatenating arrows. A subcategory should then be like a subgraph, and a function from one category to another should be like a graph homomorphism. We will come back to this.

Definition 1.1. Combinatorics is the study of the category \mathbf{FSet} whose objects are finite sets and whose morphisms are functions.

From the general perspective of Definition 1.1, the typical problems of combinatorics can be formulated along the following lines. First there is the counting problem: given a finite nonempty set X defined in some explicit but not transparent way (for example, the set of alternating sign matrices), what is the natural number n \in \mathbb{N} such that X is isomorphic to \{1,\dots,n\}? The study of questions of this kind is called enumerative combinatorics. Another type of question is: given two finite nonempty sets X,Y, again defined in some explicit but non-transparent way, is \mathrm{Iso}(X,Y) nonempty? If so, can we explicitly describe an isomorphism f \in \mathrm{Iso}(X,Y)? This endeavor is usually called bijective combinatorics.

The category \mathbf{FSet} is a subcategory of \mathbf{Set}. Hoping that the collection of all finite sets can be thought of as a set without running into paradoxes is a bit too optimistic (to see why, observe that for any set X, the set \{X\} is finite…). However, the set \mathrm{Hom}(X,Y) of functions between two finite sets is finite, and more than this functions between finite sets can be efficiently encoded as matrices. Indeed, suppose |X|=n and |Y|=m. By definition, this means that there exist isomorphisms

\lambda \colon \{1,\dots,n\} \longrightarrow X

and

\mu \colon \{1,\dots,m\} \longrightarrow Y.

It is natural to think of these isomorphisms as labelings of the elements of X and Y, writing

x_1=\lambda(1),\ \dots,\ x_n=\lambda(n)

and

y_1 = \mu(1),\ \dots,\ y_m=\mu(m).

With these labelings in place, we can represent the function f as the m \times n matrix [f] = [f]_{\lambda\mu} whose (i,j)-entry is 1 if f(x_j)=y_i and 0 if not. Thus, every f \in \mathrm{Hom}(X,Y) is represented by an element of \mathbb{Z}_2^{m \times n}, i.e. an m \times n matrix with entries in \mathbb{Z}_2=\{0,1\}. The cardinality of the set of all such binary matrices is |\mathbb{Z}_2^{m \times n}|=2^{mn}, but the cardinality |\mathrm{Hom}(X,Y)| is only m^n = 2^{n\log_2 m} because the matrix representation of a function has exactly one 1 in each column. It is important to note that the matrix representation of a given function f \in \mathrm{Hom}(X,Y) is not unique because it depends on the labelings \lambda,\mu. In particular, the number of distinct matrix representations of a given function f \in \mathrm{Hom}(X,Y) is at most m!n!.

Since every finite set is a set, every object of \mathbf{FSet} is an object of \mathbf{Set}. Moreover, for any two finite sets X,Y the collection of morphisms \mathrm{Hom}(X,Y) is the same whether we view X,Y as objects of \mathbf{Set} or as objects of \mathbf{FSet}. We say that \mathbf{FSet} is a full subcategory of \mathbf{Set} to indicate that no morphisms are lost. Contrast this with the category \mathbf{FSet}_* whose objects are finite sets and whose morphisms are injective functions; this is a subcategory of \mathbf{Set} but not a full subcategory. Indeed, \mathbf{FSet}_* is not even a full subcategory of \mathbf{FSet} — it has the same collection of objects but a restricted collection of morphisms. One says that \mathbf{FSet}_* is a wide subcategory of \mathbf{FSet}.

Problem 1.3. Verify that \mathbf{FSet}_* really is a category, i.e. obeys the axioms stipulated above. For finite sets X,Y, calculate |\mathrm{Hom}(X,Y)| and |\mathrm{Hom}(Y,X)| in \mathbf{FSet}_*.

Continuing onward, most courses you took as an undergraduate can be defined as the study of a particular category. Here is another example.

Definition 1.2. Group theory is the study of the category \mathbf{Grp} whose objects are groups and whose morphisms are group homomorphisms.

Groups form an extremely important and fundamental category because they occur in every category, more or less. The “more or less” is due to the fact that a group is, by definition, a set. A category \mathbf{Cat} is said to be locally small if \mathrm{Hom}(X,Y) is a set for every pair of objects X,Y \in \mathrm{Ob}(\mathbf{Cat}). In particular, every subcategory of \mathbf{Set} is a locally small category.

Problem 1.4. Prove that in any locally small category, \mathrm{Aut}(X) is a group for every object X.

Thus, every locally small category gives us a group for each of its objects. The subject known as representation theory is concerned with reversing this observation by representing the elements of a given group G as automorphisms in a preferred locally small category \mathbf{Cat}. For any object X \in \mathrm{Ob}(\mathbf{Cat}) we can consider actions of G on X, which by definition are group homomorphisms

\varphi \colon G \longrightarrow \mathrm{Aut}(X)

Now we can consider all ways in which G can act on the objects of \mathbf{Cat}, and assemble these into a new category \mathbf{Cat}_G whose objects are pairs (X,\varphi) as above. Such pairs are called representations of G, and for every representation (X,\varphi) the image \varphi(G)=\{\varphi(g) \colon g \in G\} is a subgroup of \mathrm{Aut}(X). Thus, we are representing the elements of G as automorphisms of the object X. This might result in losing some of the structure of G, since several group elements could map to the same automorphism of X. The extreme case is the trivial representation, where \varphi(g)=\mathrm{id}_X for every g \in G. In the opposite extreme case, where \varphi is injective and \varphi(G) is thus a subgroup of \mathrm{Aut}(X) isomorphic to G, we say that (X,\varphi) is a faithful representation of G.

It remains to define the morphisms of our nascent category \mathbf{Cat}_G. For any two representations (X,\varphi) and (Y,\psi) we define the corresponding set of morphisms to be the subset \mathrm{Hom}_G(X,Y) of \mathrm{Hom}(X,Y) consisting of so-called G-equivariant morphisms f \in \mathrm{Hom}(X,Y). By definition, these are morphisms which intertwine the actions \varphi,\psi in the sense that

f \circ \varphi(g) = \psi(g) \circ f

holds for every group element g \in G.
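
The intertwining condition is easy to check mechanically. Below is a small Python sketch (a toy example of my own, under the assumption that we represent automorphisms by explicit functions): G = \mathbb{Z}/3 acts on itself by translation, a translation map is G-equivariant, while the doubling map is not.

```python
def equivariant(f, phi, psi, G, X):
    """Check the intertwining relation f(phi(g)(x)) == psi(g)(f(x))
    for every g in G and every x in X."""
    return all(f(phi(g)(x)) == psi(g)(f(x)) for g in G for x in X)

n = 3
G = X = range(n)                             # Z/3, acting on itself
phi = psi = lambda g: lambda x: (g + x) % n  # action by translation

shift = lambda x: (x + 1) % n   # translation by 1: equivariant
double = lambda x: (2 * x) % n  # doubling: breaks the relation

assert equivariant(shift, phi, psi, G, X)
assert not equivariant(double, phi, psi, G, X)
```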

Let us consider the above construction in the case where \mathbf{Cat}=\mathbf{Set}. For a set X, the group \mathrm{Aut}(X) is called the permutation group of X and its elements are called permutations of X. Consequently, objects (X,\varphi) of \mathbf{Set}_G are called permutation representations of G. Now, since a group is by definition a set, G is itself an object of \mathbf{Set}. Furthermore, there is a canonical action

\rho \colon G \longrightarrow \mathrm{Aut}(G)

of G on itself defined by \rho(g)h=gh, where g,h are elements of G and gh is the product of these elements in G. This gives a permutation representation (G,\rho) called the regular representation of G.
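
As a numerical illustration of the regular representation (this sketch is my own and does not substitute for the proofs requested in Problem 1.5 below), take G = \mathbb{Z}/n with addition mod n and check directly that each \rho(g) permutes the underlying set of G and that \rho respects the group operation.

```python
n = 4
G = list(range(n))  # the cyclic group Z/n, with addition mod n

def rho(g):
    """rho(g) acts on the underlying set of G by left translation."""
    return lambda h: (g + h) % n

# Each rho(g) is a bijection of G (a permutation) ...
for g in G:
    assert sorted(rho(g)(h) for h in G) == G
# ... and rho is a homomorphism: rho(g1 g2) = rho(g1) o rho(g2).
for g1 in G:
    for g2 in G:
        assert all(rho((g1 + g2) % n)(h) == rho(g1)(rho(g2)(h)) for h in G)
```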

Problem 1.5. Prove that \rho as defined above really is a group homomorphism, and that \rho(g) really is a set-theoretic automorphism of G for every g \in G. Finally, prove Cayley’s theorem: the groups G and \rho(G) are isomorphic.

Cayley’s theorem is a very familiar result from elementary group theory which, in the language introduced above, says that every group has a faithful representation in the category of sets. Representations of groups in the category of vector spaces are called linear representations, and we will study these in Math 202B. In Math 202A, we have a much more modest task, namely that of studying the category of vector spaces as is, without any group acting in the background.

Definition 1.3. Linear algebra is the study of the category \mathbf{Vec} whose objects are vector spaces and whose morphisms are linear transformations.

Technically, Definition 1.3 is incomplete because we must specify the field over which our vector spaces are defined. We will do this next lecture, where we further argue that finite-dimensional linear algebra can be viewed as a “quantization” of combinatorics.

Math 202A: Lecture 0

Math 202A is the first quarter of a three-quarter applied algebra class. It is a PhD level course on linear algebra, and the assumption is that you know linear algebra at the undergraduate level. Furthermore, the way in which the material is delivered will probably cause Math 202 to feel quite different from a standard undergraduate lecture course. Below is a lecture schedule.

Date | Modality | Topic
Sep 26 | Asynchronous | Course information
Sep 29 | In Person | Categories I
Oct 1 | In Person | Categories II
Oct 3 | In Person | Categories III
Oct 6 | In Person | Lengths
Oct 8 | In Person | Coordinates
Oct 10 | Asynchronous | Transformations
Oct 13 | None | Columbus Day
Oct 15 | In Person | Enrichment
Oct 17 | In Person | Covectors
Oct 20 | In Person | Direct sums
Oct 22 | In Person | Subspaces
Oct 24 | Asynchronous | Projections
Oct 27 | Asynchronous | Combinatorial SVD I
Oct 29 | Asynchronous | Combinatorial SVD II
Oct 31 | None | Halloween
Nov 3 | In Person | Geometric SVD I
Nov 5 | In Person | Geometric SVD II
Nov 7 | In Person | Geometric SVD III
Nov 10 | In Person | Adjoint and transpose
Nov 12 | In Person | Projection, inversion
Nov 14 | In Person | More SVD review
Nov 17 | None | Travel
Nov 19 | Field Trip | Weingarten Calculus
Nov 21 | Asynchronous | Review notes
Nov 24 | In Person | Singular value geom.
Nov 26 | In Person | Low rank approx.
Nov 28 | None | Thanksgiving
Dec 1 | In Person | Normality
Dec 3 | In Person | Selfadjointness
Dec 5 | In Person | Spectrum

Here is a bit more detail on the topics we will cover. This is not exhaustive and is intended only as a broad outline of some of the main themes.

We will begin by introducing the rudiments of category theory and discuss its usefulness in organizing our existing knowledge and future endeavors. For us, category theory is an organizational device, but it has applications in the sciences and is a branch of applied algebra in its own right.

We will discuss the categories of sets, groups, posets, and vector spaces in some detail, and give a category-theoretic definition of group representation theory. We will then consider the “quantization” functor which transports us from the category of finite sets to the category of finite-dimensional complex vector spaces, which is where we shall stay for most of the course.

We will study finite-dimensional complex vector spaces with a scalar product, which we call Hilbert spaces. Morphisms in this category are linear functions, which for historical reasons are called linear transformations. Any function between finite sets can be encoded as a matrix with entries in \mathbb{Z}_2, and any linear transformation between Hilbert spaces can be encoded as a matrix with entries in \mathbb{C}. We prove the Singular Value Decomposition (SVD), which says that any linear transformation between Hilbert spaces can be represented as a diagonal matrix with real entries, this representation being unique up to signed permutations. This result is of fundamental importance in applications of linear algebra to the sciences.

A nice feature of the set of all linear transformations from one Hilbert space to another is that it is also a Hilbert space with a canonically defined scalar product called the Frobenius form. The Hilbert space structure on linear transformations plays a significant role in applications of linear algebra since it provides a way to discuss the magnitude of a matrix. There are many other useful matrix norms, and we will discuss a few of these from the point of view of the SVD.

After concluding our discussion of homomorphisms between Hilbert spaces we will concentrate on endomorphisms of a Hilbert space, which are called linear operators. Linear operators on a Hilbert space quantize self-maps on a finite set. For example, the quantization of an indicator function is an orthogonal projection and the quantization of a permutation is a unitary operator. Using the SVD, we establish the polar decomposition of an arbitrary operator and the spectral decomposition of a normal operator.

We then move on to a detailed study of the spectral decomposition of selfadjoint operators: the theory of eigenvalues. We will obtain many interesting results about eigenvalues, including results on dominance, interlacing, and perturbation. A very interesting class of selfadjoint operators consists of the adjacency operators of finite graphs; we will discuss these in some detail and prove a few famous results on their eigenvalues, such as the Perron-Frobenius theorem on the largest eigenvalue and the Alon-Boppana theorem on the spectral gap.

It is natural to wonder what the spectrum of a typical selfadjoint operator looks like — what are the expected features of its eigenvalue distribution? This question motivates the subject of random matrix theory. We will prove the famous Wigner semicircle law which describes the expected eigenvalue distribution of large random Hermitian matrices.

Not all vector spaces which arise in practice have complex scalars. In particular, real coefficients seem more natural in applications, though one should keep in mind that quantum mechanics tells us this is not really so. Throughout the course, we will regularly comment on when our Hilbert space results remain valid for real finite-dimensional vector spaces with scalar product, which are called Euclidean spaces. Typically this is quite straightforward. More significant differences occur when we move to the category of finite-dimensional vector spaces defined over a finite field. This category also has many important applications, and can be thought of as a different kind of quantization of the category of finite sets. We will spend a few lectures discussing this interesting category.

We will end with a discussion of the most important special function of a linear operator: the exponential function. For operators on a Hilbert space of dimension greater than one, it is not in general true that the exponential function converts addition into multiplication. The correct statement is the Campbell-Baker-Hausdorff formula, which is the first step down the road to Lie theory. We will present an elementary proof of this result due to Eichler.

This completes our rough outline of topics. There is no official textbook for the class and I will post typed notes following each lecture. However, while the course is running I recommend consulting the following books:

  • Seven Sketches in Compositionality: An Invitation to Applied Category Theory, by Brendan Fong and David Spivak. Just browse this one.
  • Lectures on Linear Algebra, by Israel Gelfand. Perhaps the best book on linear algebra ever written; I recommend reading the whole thing.
  • Matrix Analysis, by Roger Horn and Charles Johnson. Use this as a comprehensive reference.
  • Groups, Matrices, and Vector Spaces, by James B. Carrell. A nice book worth consulting regularly to broaden your horizons.
  • All the Mathematics You Missed but Need to Know for Graduate School, by Thomas Garrity. Read the chapter on linear algebra as a refresher. This book can be useful for qualifying exam study.

Your grade in Math 202A will be based on weekly problem sets (70%) and a final exam (30%). There is no midterm exam. Problem sets will typically be due on Sunday at 23:59, and will be submitted and returned via GradeScope (the course TA, Runqiu Xu, will send out an email during the first week of classes explaining this protocol in more detail). Your lowest problem set score will be dropped. The final exam is scheduled for 12/09/2025 at 15:00 in our regular lecture room, B412. I have no control over these coordinates, and if you cannot sit for the exam you should not take the course.