Math 31BH: Lecture 1

Let us begin Math 31BH by reviewing some aspects of Math 31AH. Last quarter, you encountered the following definition.

Definition 1: A Euclidean space is a pair (\mathbf{V},\langle \cdot,\cdot \rangle) consisting of a finite-dimensional real vector space together with a scalar product on that space.

Equivalently, a Euclidean space is a finite-dimensional real inner product space.

Given a Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle), Math 31AH is largely about the associated vector space \mathrm{End} \mathbf{V} whose elements are linear operators

A \colon \mathbf{V} \to \mathbf{V}

(The “End” here comes from the word endomorphism, a more general term for a structure-preserving function from some object to itself.) Every linear operator on \mathbf{V} is completely determined by its action on any basis \mathcal{B} of \mathbf{V}, which gives us a concrete description of A as a matrix [A]_\mathcal{B}. The form of this matrix depends heavily on the basis \mathcal{B}, and one of the main goals of Math 31AH was to treat the following problem: given an operator A \in \mathrm{End}\mathbf{V}, find a basis \mathcal{B} of \mathbf{V} such that [A]_\mathcal{B} is as simple as possible.
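To make this concrete, here is a minimal numerical sketch (in NumPy, with a hypothetical operator A on \mathbb{R}^3; nothing is special about this example) of how the matrix [A]_\mathcal{B} is assembled from the action of A on an orthonormal basis:

```python
import numpy as np

# A hypothetical linear operator A on R^3, specified by its action on vectors.
def A(v):
    return np.array([2 * v[0] + v[1], v[1] - v[2], 3 * v[2]])

# An orthonormal basis B of R^3 (here simply the standard basis).
B = [np.eye(3)[:, j] for j in range(3)]

# For an orthonormal basis, the (i, j) entry of [A]_B is <e_i, A e_j>,
# so the j-th column records the B-coordinates of A e_j.
matrix_A = np.array([[np.dot(e_i, A(e_j)) for e_j in B] for e_i in B])
print(matrix_A)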

It turns out that this problem has a lot to do with how the operator A interacts with the scalar product \langle \cdot,\cdot \rangle on \mathbf{V}. You will recall that associated to every A \in \mathrm{End} \mathbf{V} is another operator A^*, called the adjoint of A, defined by the condition that

\langle \mathbf{v},A\mathbf{w} \rangle = \langle A^*\mathbf{v},\mathbf{w} \rangle \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V}.

Another way to describe the relationship between A and A^* is by saying that, for any orthonormal basis \mathcal{B} of \mathbf{V}, the matrix [A^*]_\mathcal{B} is the transpose of the matrix [A]_\mathcal{B}, and vice versa, and indeed sometimes A^* is referred to as the transpose of A.
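As a quick sanity check (a sketch with random data; the standard dot product on \mathbb{R}^4, whose standard basis is orthonormal), the transpose matrix does satisfy the defining property of the adjoint:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))          # [A]_B in an orthonormal basis B
v, w = rng.normal(size=4), rng.normal(size=4)

# The defining property of the adjoint, <v, A w> = <A* v, w>, holds with
# the transpose M.T playing the role of [A*]_B.
print(np.isclose(v @ (M @ w), (M.T @ v) @ w))  # True
```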

Definition 2: An operator A \in \mathrm{End} \mathbf{V} is said to be self-adjoint (or symmetric) if A^*=A.

We denote by \mathrm{Sym} \mathbf{V} the subset of \mathrm{End}\mathbf{V} consisting of symmetric operators.

Proposition 1: \mathrm{Sym}\mathbf{V} is a subspace of \mathrm{End}\mathbf{V}.

Proof: The scalar product is bilinear, i.e. we have

\langle a_1\mathbf{v}_1 + a_2\mathbf{v}_2,b_1\mathbf{w}_1 + b_2\mathbf{w}_2 \rangle = a_1b_1 \langle \mathbf{v}_1,\mathbf{w}_1 \rangle + a_1b_2 \langle \mathbf{v}_1,\mathbf{w}_2 \rangle + a_2b_1\langle \mathbf{v}_2,\mathbf{w}_1 \rangle + a_2b_2\langle \mathbf{v}_2,\mathbf{w}_2 \rangle,

and from this it follows (why?) that taking the adjoint is itself a linear operation on \mathrm{End}\mathbf{V}: we have

(a_1A_1+a_2A_2)^*= a_1A_1^*+a_2A_2^*.

In particular, if A_1,A_2 are symmetric operators, we have that

(a_1A_1+a_2A_2)^*= a_1A_1+a_2A_2,

which says that \mathrm{Sym}\mathbf{V} is closed under taking linear combinations, i.e. it is a subspace of \mathrm{End}\mathbf{V}.

Q.E.D.

We now come to what is arguably the central result of Math 31AH, the Spectral Theorem.

Theorem 1: Given a symmetric operator S \in \mathrm{Sym}\mathbf{V}, there exists an orthonormal basis \mathcal{B}=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbf{V} such that

S\mathbf{e}_1 = s_1 \mathbf{e}_1,\ \dots,\ S\mathbf{e}_n=s_n\mathbf{e}_n,

where s_1\geq \dots \geq s_n are real numbers.

The orthonormal vectors \mathbf{e}_1,\dots,\mathbf{e}_n are eigenvectors of the symmetric operator S, and the numbers s_1 \geq \dots \geq s_n are its eigenvalues.
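Numerically, the Spectral Theorem is exactly what np.linalg.eigh computes. Here is a small sketch with a random symmetric operator (eigh returns the eigenvalues in increasing order, so we reverse to match the convention s_1 \geq \dots \geq s_n):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 4))
S = (X + X.T) / 2                      # a random symmetric operator on R^4

s, Q = np.linalg.eigh(S)               # eigenvalues (increasing) and eigenvectors
s, Q = s[::-1], Q[:, ::-1]             # reorder so that s_1 >= ... >= s_n

print(np.allclose(S @ Q, Q * s))           # S e_k = s_k e_k, column by column
print(np.allclose(Q.T @ Q, np.eye(4)))     # the eigenvectors are orthonormal
```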

Once the Spectral Theorem is known, Math 31AH tends to devolve into a veritable orgy of diagonalization in which one is compelled to find the eigenvalues of all manner of particular symmetric operators. Diagonalization is an important skill which has many applications in science and engineering, from quantum mechanics to data science, but it is not the perspective taken in Math 31BH.

To elaborate, the Spectral Theorem gives us a function

f \colon \mathrm{Sym} \mathbf{V} \longrightarrow \mathbb{R}^n

which sends a symmetric operator on \mathbf{V} to the list of its eigenvalues arranged in weakly decreasing order, which is a vector in \mathbb{R}^n (here n = \dim \mathbf{V}),

f(S) = (s_1,\dots,s_n).

In Math 31AH, a basic problem is: given S, compute f(S). That is, one wants to calculate the output of the function f for a given input. In Math 31BH, we would like to analyze the function f itself and see what kind of a function it is — what can we say about the function which sends a symmetric matrix to its eigenvalues?

First of all, it is important to understand that the function f is non-linear: given S,T \in \mathrm{Sym} \mathbf{V}, it is typically not the case that f(S+T) = f(S)+f(T), i.e. the eigenvalue vector of S+T is usually not simply the sum of the eigenvalue vector of S and the eigenvalue vector of T. This can occur when S and T have a common set of eigenvectors, a situation which is equivalent to saying that they commute, ST=TS, and that is atypical behavior. So f is not a linear transformation, and even though this function arises from the context of linear algebra, we need to go beyond linear algebra to understand it.
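Here is a two-dimensional sketch of this failure of linearity (these particular S and T do not commute; random symmetric matrices would exhibit the same behavior):

```python
import numpy as np

def f(S):
    # the eigenvalues of a symmetric matrix, in weakly decreasing order
    return np.linalg.eigvalsh(S)[::-1]

S = np.array([[1.0, 0.0], [0.0, -1.0]])
T = np.array([[0.0, 1.0], [1.0, 0.0]])   # note that ST != TS

print(f(S) + f(T))   # [ 2. -2.]
print(f(S + T))      # [ 1.41421356 -1.41421356], so f(S+T) != f(S) + f(T)
```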

Probably the most natural question to be addressed is that of continuity: if two symmetric operators S,T are close together, are their eigenvalues also close together? Before we can answer this question, we need to formulate it precisely, and to do this we will briefly leave the world of vector spaces in order to axiomatize the notion of distance.

Definition 3: A metric space is a pair (X,\mathrm{d}) consisting of a set X together with a function

\mathrm{d} \colon X \times X \longrightarrow \mathbb{R}

which has the following properties:

  1. \mathrm{d}(x_1,x_2) \geq 0 for all x_1,x_2 \in X, with equality if and only if x_1=x_2;
  2. \mathrm{d}(x_1,x_2) = \mathrm{d}(x_2,x_1) for all x_1,x_2 \in X;
  3. \mathrm{d}(x_1,x_2) \leq \mathrm{d}(x_1,y) + \mathrm{d}(y,x_2) for all x_1,x_2,y \in X.

The function \mathrm{d} is referred to as a distance function or metric on X, and indeed the three axioms above are chosen so that \mathrm{d} has all the features we intuitively associate with the notion of distance: the first axiom says that the distance between two points cannot be negative, and is zero if and only if the two points are the same point; the second axiom says that the distance from home to work is the same as the distance from work to home; the third says that the shortest distance between two points is a straight line. The notion of distance is all we need to define the concepts of limit and continuity.
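As a sanity check, here is a small sketch that spot-checks the three axioms for one concrete metric, the Euclidean distance on \mathbb{R}^2 (which, as we will see below, arises from a norm):

```python
import numpy as np

# Euclidean distance on R^2, one concrete example of a metric.
def d(x1, x2):
    return np.linalg.norm(x1 - x2)

rng = np.random.default_rng(2)
for _ in range(1000):
    x1, x2, y = rng.normal(size=(3, 2))
    assert d(x1, x2) >= 0                              # axiom 1 (non-negativity)
    assert np.isclose(d(x1, x2), d(x2, x1))            # axiom 2 (symmetry)
    assert d(x1, x2) <= d(x1, y) + d(y, x2) + 1e-12    # axiom 3 (triangle)
print("the three metric axioms hold at all sampled points")
```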

Definition 4: Let (X,\mathrm{d}_X) and (Y,\mathrm{d}_Y) be metric spaces, and let f \colon X \to Y be a function. Given points p \in X and q \in Y, we say that the limit of f(x) as x approaches p is q, written

\lim\limits_{x \rightarrow p} f(x) = q,

if the following holds: for any \varepsilon > 0, there exists a corresponding \delta > 0 such that

\mathrm{d}_X(x,p) < \delta \implies \mathrm{d}_Y(f(x),q) < \varepsilon.

We say that f is continuous at p if the above holds with q=f(p). We say that f is continuous on a set P \subseteq X if it is continuous at every point p \in P.

In order to get a feel for the above definition, you should try to prove a familiar result from calculus in this general context: the composition of two continuous functions is continuous. Here is the precise formulation of this result.

Proposition 2: Let (X,\mathrm{d}_X), (Y,\mathrm{d}_Y), and (Z,\mathrm{d}_Z) be metric spaces, and let

f \colon X \to Y \quad\text{ and }\quad g \colon Y \to Z

be functions such that f is continuous at p and g is continuous at f(p). Then, the composite function h= g \circ f is continuous at p.

Proof: Try it! If you aren’t able to write down a proof of this theorem, go back to Definition 4 and read it again. Iterate this procedure as many times as necessary, and don’t feel bad if a few iterations are required.

Q.E.D.

We are now quite close to having the conceptual tools needed to meaningfully ask whether the function which sends a symmetric operator to its eigenvalues is continuous. The remaining conceptual step is to understand that the scalar product on a Euclidean space gives us a very natural metric. To see this, let us take an intermediate step.

Definition 5: Given a vector space \mathbf{V}, a norm on \mathbf{V} is a function

\|\cdot\| \colon \mathbf{V} \longrightarrow \mathbb{R}

which has the following properties:

  1. We have \|\mathbf{v}\|\geq 0 for all \mathbf{v} \in \mathbf{V}, with equality if and only if \mathbf{v}=\mathbf{0}_\mathbf{V} is the zero vector;
  2. We have \|t\mathbf{v}\| = |t| \|\mathbf{v}\| for all scalars t \in \mathbb{R} and vectors \mathbf{v} \in \mathbf{V}, where |\cdot| is the usual absolute value function on \mathbb{R};
  3. We have \|\mathbf{v}_1+\mathbf{v}_2\| \leq \|\mathbf{v}_1\|+\|\mathbf{v}_2\| for all \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

A normed vector space is a pair (\mathbf{V},\|\cdot\|) consisting of a vector space together with a norm on that space.

Reading through the norm axioms, it is apparent that they are chosen to abstract the features of the familiar absolute value function |\cdot| on the real numbers \mathbb{R}. This is relevant because the distance between x_1,x_2 \in \mathbb{R} is the absolute value of their difference,

\mathrm{d}(x_1,x_2) = |x_1-x_2|.

Proposition 3: If \|\cdot\| is a norm on \mathbf{V}, then \mathrm{d}(\mathbf{v}_1,\mathbf{v}_2)= \|\mathbf{v}_1-\mathbf{v}_2\| defines a metric on \mathbf{V}.

Proof: We have to prove that \mathrm{d} satisfies the metric axioms. The first distance axiom follows immediately from the first norm axiom. For the second, we have

\mathrm{d}(\mathbf{v}_1,\mathbf{v}_2) = \|\mathbf{v}_1-\mathbf{v}_2\| = \|(-1)(\mathbf{v}_2-\mathbf{v}_1)\| = |-1| \|\mathbf{v}_2-\mathbf{v}_1\|= \mathrm{d}(\mathbf{v}_2,\mathbf{v}_1),

where we made use of the second norm axiom. Finally, for any three vectors \mathbf{v}_1,\mathbf{v}_2,\mathbf{w} \in \mathbf{V}, we have

\mathrm{d}(\mathbf{v}_1,\mathbf{v}_2) = \|\mathbf{v}_1-\mathbf{v}_2\| = \|\mathbf{v}_1-\mathbf{w} + \mathbf{w}-\mathbf{v}_2\| \leq \|\mathbf{v}_1-\mathbf{w}\| + \|\mathbf{w}-\mathbf{v}_2\| = \mathrm{d}(\mathbf{v}_1,\mathbf{w}) + \mathrm{d}(\mathbf{w},\mathbf{v}_2),

where we made use of the third norm axiom.

Q.E.D.

In view of Proposition 2, we will have found a metric on (\mathbf{V},\langle \cdot,\cdot \rangle) as soon as we’ve found a norm. The recipe to get a norm out of the scalar product is inspired by the familiar relation between the absolute value and the dot product in the most familiar of Euclidean spaces, \mathbb{R}^n:

|(x_1,\dots,x_n)| = \sqrt{(x_1,\dots,x_n) \cdot (x_1,\dots,x_n)}.

First we need the following important lemma.

Lemma 1 (Cauchy-Schwarz inequality): In any Euclidean space, the following inequality holds for every two vectors: we have

\langle \mathbf{v}_1,\mathbf{v}_2 \rangle^2 \leq \langle \mathbf{v}_1,\mathbf{v}_1 \rangle \langle \mathbf{v}_2,\mathbf{v}_2 \rangle,

with equality if and only if \mathbf{v}_1 and \mathbf{v}_2 are linearly dependent.

Proof: The proof is not difficult, and actually it’s a beautiful argument involving the quadratic formula, which you’ve known for many years. However, it’s a bit long, and I don’t want to break up our flow, so look it up!

Q.E.D.
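While the proof is omitted here, a numerical sanity check costs nothing (a sketch using the standard scalar product on \mathbb{R}^5 with random vectors):

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(1000):
    v1, v2 = rng.normal(size=(2, 5))
    # <v1, v2>^2 <= <v1, v1> <v2, v2>
    assert np.dot(v1, v2) ** 2 <= np.dot(v1, v1) * np.dot(v2, v2) + 1e-12

# Equality in the linearly dependent case, e.g. v2 = 3 v1:
v1 = rng.normal(size=5)
v2 = 3 * v1
print(np.isclose(np.dot(v1, v2) ** 2, np.dot(v1, v1) * np.dot(v2, v2)))  # True
```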

Proposition 4: If \langle \cdot, \cdot \rangle is a scalar product on \mathbf{V}, then \|\mathbf{v}\| = \sqrt{\langle \mathbf{v},\mathbf{v} \rangle} defines a norm on \mathbf{V}.

Proof: We have to check that the norm axioms are satisfied. The first norm axiom follows immediately from the first scalar product axiom, namely that \langle \mathbf{v},\mathbf{v} \rangle \geq 0 with equality holding only for the zero vector. For the second norm axiom, we have

\| t \mathbf{v}\| = \sqrt{\langle t\mathbf{v},t\mathbf{v} \rangle} = \sqrt{t^2}\sqrt{\langle \mathbf{v},\mathbf{v} \rangle} = |t| \|\mathbf{v}\|.

Last is the triangle inequality, and to verify this property we use the Cauchy-Schwarz inequality. We have

\|\mathbf{v}_1+\mathbf{v}_2\|^2 = \langle \mathbf{v}_1+\mathbf{v}_2,\mathbf{v}_1+\mathbf{v}_2\rangle = \langle \mathbf{v}_1,\mathbf{v}_1 \rangle +2\langle \mathbf{v}_1,\mathbf{v}_2 \rangle+ \langle \mathbf{v}_2,\mathbf{v}_2 \rangle,

which gets us to

\|\mathbf{v}_1+\mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + 2\langle \mathbf{v}_1,\mathbf{v}_2 \rangle+\|\mathbf{v}_2\|^2 \leq \|\mathbf{v}_1\|^2 +2\|\mathbf{v}_1\| \|\mathbf{v}_2\|+\|\mathbf{v}_2\|^2 = \left( \|\mathbf{v}_1\|+\|\mathbf{v}_2\|\right)^2,

so that taking the square root on both sides we obtain the triangle inequality,

\|\mathbf{v}_1+\mathbf{v}_2\| \leq \|\mathbf{v}_1\|+\|\mathbf{v}_2\|.

Q.E.D.

So, is the function which sends a symmetric matrix to its eigenvalues continuous? From the above developments, we are almost at the point where we can meaningfully ask this question: all that remains is to find a metric on the vector space \mathrm{Sym}\mathbf{V}, which reduces to finding a norm on \mathrm{Sym}\mathbf{V}, which in turn reduces to finding a scalar product on \mathrm{Sym}\mathbf{V}. In fact, we will define a scalar product on all of \mathrm{End} \mathbf{V}, and the subspace \mathrm{Sym}\mathbf{V} will inherit this scalar product.

The construction of a scalar product on \mathrm{End} \mathbf{V} uses the concept of trace, which you encountered in Math 31AH and which we now review. Let \mathcal{B} = \{\mathbf{e}_1,\dots,\mathbf{e}_n\} be an orthonormal basis of \mathbf{V}, and define the trace relative to \mathcal{B} to be the function

\mathrm{Tr}_\mathcal{B} \colon \mathrm{End}\mathbf{V} \longrightarrow \mathbb{R}

given by

\mathrm{Tr}_\mathcal{B}A = \sum\limits_{i=1}^n \langle \mathbf{e}_i,A\mathbf{e}_i\rangle,

which is simply the sum of the diagonal elements of the matrix [A]_\mathcal{B}.
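In matrix terms this is the familiar trace of a square matrix. As a sketch (standard basis of \mathbb{R}^4, with a random matrix M standing in for [A]_\mathcal{B}), the defining sum agrees with np.trace:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))      # [A]_B relative to an orthonormal basis B
E = np.eye(4)                    # the standard (orthonormal) basis of R^4

# Tr_B A = sum_i <e_i, A e_i> is the sum of the diagonal entries of [A]_B.
trace_from_definition = sum(E[:, i] @ (M @ E[:, i]) for i in range(4))
print(np.isclose(trace_from_definition, np.trace(M)))  # True
```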

Proposition 5: For any operator A \in \mathrm{End}\mathbf{V}, we have

\mathrm{Tr}_\mathcal{B}A^* = \mathrm{Tr}_\mathcal{B}A,

and for any pair of operators A,B \in \mathrm{End}\mathbf{V}, we have

\mathrm{Tr}_\mathcal{B}(aA+bB) = a\mathrm{Tr}_\mathcal{B} A+b\mathrm{Tr}_\mathcal{B} B

for all a,b \in \mathbb{R}, and

\mathrm{Tr}_\mathcal{B}(AB)=\mathrm{Tr}_\mathcal{B}(BA).

Proof: By definition of the adjoint, we have

\mathrm{Tr}_\mathcal{B}A^* = \sum\limits_{i=1}^n \langle \mathbf{e}_i,A^*\mathbf{e}_i\rangle = \sum\limits_{i=1}^n \langle A\mathbf{e}_i,\mathbf{e}_i\rangle = \sum\limits_{i=1}^n \langle \mathbf{e}_i,A\mathbf{e}_i\rangle = \mathrm{Tr}_\mathcal{B}A,

where symmetry of the scalar product on \mathbf{V} was used in the second to last equality.

The second identity says that \mathrm{Tr}_\mathcal{B} is a linear function from \mathrm{End}\mathbf{V} to \mathbb{R}, and this follows immediately from the definition of \mathrm{Tr}_\mathcal{B} together with the bilinearity of the scalar product on \mathbf{V} (work this out to make sure you understand why!).

For the third identity, using Problem 1 in Assignment 1 we have

\mathrm{Tr}_\mathcal{B}AB = \sum\limits_{i=1}^n \langle \mathbf{e}_i,AB\mathbf{e}_i \rangle = \sum\limits_{i=1}^n \left\langle \mathbf{e}_i,A\sum\limits_{j=1}^n \langle \mathbf{e}_j,B\mathbf{e}_i\rangle \mathbf{e}_j \right\rangle = \sum\limits_{i=1}^n\sum\limits_{j=1}^n \langle \mathbf{e}_i,A\mathbf{e}_j\rangle \langle \mathbf{e}_j,B\mathbf{e}_i\rangle,

and likewise

\mathrm{Tr}_\mathcal{B}BA =\sum\limits_{i=1}^n\sum\limits_{j=1}^n \langle \mathbf{e}_i,B\mathbf{e}_j\rangle \langle \mathbf{e}_j,A\mathbf{e}_i\rangle.

Thus, the (i,j)-term in the expansion of \mathrm{Tr}_\mathcal{B}AB coincides with the (j,i)-term in the expansion of \mathrm{Tr}_\mathcal{B}BA, and since these expansions run over all pairs (i,j) \in \{1,\dots,n\} \times \{1,\dots,n\}, they are in fact the same sum.

Q.E.D.

Theorem 2: The function \langle A,B \rangle = \mathrm{Tr}_\mathcal{B} A^*B is a scalar product on \mathrm{End}\mathbf{V}.

Proof: We have to check the scalar product axioms; they all follow directly from the properties of the trace established above. For nonnegativity, we have

\langle A,A \rangle = \mathrm{Tr}_\mathcal{B} A^*A = \sum\limits_{i,j=1}^n \langle \mathbf{e}_i,A^*\mathbf{e}_j \rangle \langle \mathbf{e}_j,A\mathbf{e}_i \rangle = \sum_{i,j=1}^n \langle A\mathbf{e}_i,\mathbf{e}_j \rangle\langle A\mathbf{e}_i,\mathbf{e}_j \rangle.

This is a sum of squares, hence it is nonnegative, and equal to zero if and only if all of its terms are equal to zero. But if \langle A\mathbf{e}_i,\mathbf{e}_j \rangle=0 for all 1 \leq i,j \leq n, then A is the zero operator (make sure you understand why).

For symmetry, using the fact that \mathrm{Tr}_\mathcal{B}C^* = \mathrm{Tr}_\mathcal{B}C together with the identity (A^*B)^* = B^*A, we have

\langle A,B \rangle = \mathrm{Tr}_\mathcal{B}(A^*B) = \mathrm{Tr}_\mathcal{B}\left((A^*B)^*\right) = \mathrm{Tr}_\mathcal{B}(B^*A) = \langle B,A \rangle.

Finally, for (bi)linearity we have

\langle a_1A_1+a_2A_2,B \rangle = \mathrm{Tr}_\mathcal{B} (a_1A_1^*B + a_2A_2^*B) = a_1\mathrm{Tr}_\mathcal{B} A_1^*B + a_2\mathrm{Tr}_\mathcal{B}A_2^*B = a_1 \langle A_1,B\rangle + a_2\langle A_2,B \rangle.

Q.E.D.
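In matrix form (a sketch; random 3x3 matrices standing in for [A]_\mathcal{B} and [B]_\mathcal{B}), this scalar product is \mathrm{Tr}([A]^T[B]), and the three verified properties can be spot-checked numerically:

```python
import numpy as np

rng = np.random.default_rng(5)
A, B = rng.normal(size=(2, 3, 3))

def inner(A, B):
    # <A, B> = Tr(A^T B), the matrix form of Tr_B A* B
    return np.trace(A.T @ B)

print(np.isclose(inner(A, B), inner(B, A)))      # symmetry
print(inner(A, A) >= 0)                          # non-negativity
print(np.isclose(inner(A, A), np.sum(A ** 2)))   # <A, A> is a sum of squares
```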

In view of Proposition 4, Theorem 2 immediately gives us the following.

Corollary 1: The function \|A\|=\sqrt{\mathrm{Tr}_\mathcal{B} A^*A} defines a norm on \mathrm{End}\mathbf{V}.

The above norm on operators is called the Frobenius norm, and it is very important in matrix analysis and data science. The Frobenius norm is defined using the trace, and since our definition of the trace depends on a choice of orthonormal basis \mathcal{B}, so does our definition of the Frobenius norm. In fact, this dependence is illusory.

Proposition 6: For any orthonormal basis \mathcal{C}=\{\mathbf{f}_1,\dots,\mathbf{f}_n\} of \mathbf{V}, the function \mathrm{Tr}_\mathcal{C} \colon \mathrm{End}\mathbf{V} \to \mathbb{R} defined by

\mathrm{Tr}_\mathcal{C}A = \sum\limits_{i=1}^n \langle \mathbf{f}_i,A\mathbf{f}_i\rangle

coincides with \mathrm{Tr}_\mathcal{B}.

Proof: The linear transformation U\colon \mathbf{V} \to \mathbf{V} defined by U\mathbf{e}_i=\mathbf{f}_i for i=1,\dots,n is invertible (its inverse transforms the basis \mathcal{C} back into the basis \mathcal{B}), and moreover U^{-1}=U^*, i.e. U is an orthogonal transformation of \mathbf{V}. For any operator A \in \mathrm{End}\mathbf{V}, we have

\mathrm{Tr}_\mathcal{C}A = \sum\limits_{i=1}^n \langle \mathbf{f}_i,A\mathbf{f}_i \rangle = \sum\limits_{i=1}^n \langle U\mathbf{e}_i,AU\mathbf{e}_i \rangle = \sum\limits_{i=1}^n \langle \mathbf{e}_i,U^*AU\mathbf{e}_i \rangle = \mathrm{Tr}_\mathcal{B}U^*AU=\mathrm{Tr}_\mathcal{B}(AUU^{-1})=\mathrm{Tr}_\mathcal{B}A.

Q.E.D.

In view of the above, we may simply refer to “the trace,” and write \mathrm{Tr} for this function without specifying a basis.
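A quick numerical sketch of this basis independence: conjugating by a random orthogonal matrix U (obtained here from a QR factorization) re-expresses an operator in a different orthonormal basis without changing the sum of the diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.normal(size=(4, 4))                      # [A]_B in one orthonormal basis
U, _ = np.linalg.qr(rng.normal(size=(4, 4)))     # a random orthogonal matrix

# U^T M U is the matrix of the same operator in another orthonormal basis,
# and its trace is unchanged.
print(np.isclose(np.trace(U.T @ M @ U), np.trace(M)))  # True
```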

Now that \mathrm{End}\mathbf{V} has been promoted from a vector space to a Euclidean space, the function which sends a symmetric matrix to its ordered list of eigenvalues is a function between Euclidean spaces, and we can legitimately ask if this function is continuous. Next lecture, we will answer this question.
