Let $f \colon \mathcal{V} \to \mathcal{W}$ be a function from one Euclidean space to another. In making this declaration, we did not make the scalar products on the source and target spaces explicit. This omission is commonly made for the sake of convenience, and by abuse of notation we just denote all scalar products $\langle \cdot, \cdot \rangle$, since the vector space on which a given scalar product is defined can always be deduced from context. The resulting notational ambiguity is ultimately less confusing than requiring distinct symbols for all scalar products in play at any given time.
As discussed in Lecture 2, our goal is to do calculus with functions $f \colon \mathcal{V} \to \mathcal{W}$ which take vector inputs and produce vector outputs. Since $f$ is a function of only one variable, the vector $v \in \mathcal{V}$, let us explain why this subject is also known as multivariable calculus. Let $E = \{e_1, \dots, e_n\}$ be an orthonormal basis of $\mathcal{V}$, so that every $v \in \mathcal{V}$ is given by

$$v = \langle e_1, v \rangle e_1 + \dots + \langle e_n, v \rangle e_n.$$

If $v$ is viewed as a variable (rather than a particular vector), then its coordinates

$$x_1 = \langle e_1, v \rangle, \quad \dots, \quad x_n = \langle e_n, v \rangle$$

relative to the orthonormal basis $E$ also become variables. In other words, associated to the function $f$ is another function $F \colon \mathbb{R}^n \to \mathcal{W}$ defined by

$$F(x_1, \dots, x_n) = f(x_1 e_1 + \dots + x_n e_n),$$

which is a function of the scalar variables $x_1, \dots, x_n$, where $n = \dim \mathcal{V}$. The objects $f$ and $F$ contain exactly the same information, even though the former is a function of a single vector variable $v$ and the latter is a function of $n$ scalar variables $x_1, \dots, x_n$. Notice that the construction of $F$ from $f$ makes no mention of the scalar product on $\mathcal{W}$, nor does it reference a basis in $\mathcal{W}$.
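As a quick illustration (a sketch of ours, not part of the lecture), here is the passage from $f$ to $F$ in code, taking $\mathcal{V} = \mathbb{R}^2$ with the standard orthonormal basis; the sample function and the names `f`, `F` are our choices.

```python
import numpy as np

# Sample function of a single vector variable: f(v) = <v, v> v.
def f(v):
    return np.dot(v, v) * v

# Associated function of the scalar coordinates x1, x2 relative to the
# standard orthonormal basis e1, e2 of R^2.
def F(x1, x2):
    e1 = np.array([1.0, 0.0])
    e2 = np.array([0.0, 1.0])
    return f(x1 * e1 + x2 * e2)

v = np.array([3.0, 4.0])
assert np.allclose(f(v), F(3.0, 4.0))  # same information, two perspectives
```

The two functions agree on corresponding inputs, which is exactly the sense in which $f$ and $F$ carry the same information.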
Now suppose we choose an orthonormal basis $\tilde{E} = \{\tilde{e}_1, \dots, \tilde{e}_m\}$ of $\mathcal{W}$. Then, for any $v \in \mathcal{V}$, we have

$$f(v) = \langle \tilde{e}_1, f(v) \rangle \tilde{e}_1 + \dots + \langle \tilde{e}_m, f(v) \rangle \tilde{e}_m,$$

and since we are thinking of $v$ as a variable, it is natural to think of the coordinates of $f(v)$ in the basis $\tilde{E}$ as functions of $v$,

$$f_1(v) = \langle \tilde{e}_1, f(v) \rangle, \quad \dots, \quad f_m(v) = \langle \tilde{e}_m, f(v) \rangle.$$

The functions $f_1, \dots, f_m \colon \mathcal{V} \to \mathbb{R}$ so defined are called the component functions of $f$ relative to the basis $\tilde{E}$, and they contain exactly the same information as the function $f$ itself. In particular, we have the following.
Proposition 1: Given a function $f \colon \mathcal{V} \to \mathcal{W}$ and vectors $v \in \mathcal{V}$ and $w \in \mathcal{W}$, we have

$$\lim_{u \to v} f(u) = w$$

if and only if

$$\lim_{u \to v} f_j(u) = w_j \quad \text{for each } 1 \leq j \leq m,$$

where $w_1, \dots, w_m$ are the components of the vector $w$ relative to the orthonormal basis $\tilde{E}$ of $\mathcal{W}$. In particular, the vector-valued function $f$ is continuous at $v$ if and only if the scalar-valued functions $f_1, \dots, f_m$ are continuous at $v$.
Proof: Since the basis $\tilde{E}$ is fixed, let us write $w_j$ as an abbreviation for $\langle \tilde{e}_j, w \rangle$. We have

$$\|f(u) - w\|^2 = \sum_{j=1}^m (f_j(u) - w_j)^2.$$
Suppose first that $f(u) \to w$ as $u \to v$. Then, for any $\varepsilon > 0$, there exists $\delta > 0$ such that $0 < \|u - v\| < \delta$ implies

$$\|f(u) - w\|^2 = \sum_{j=1}^m (f_j(u) - w_j)^2 < \varepsilon^2.$$

Since each term in the sum of squares is bounded by the total sum, this gives

$$(f_j(u) - w_j)^2 < \varepsilon^2, \quad \text{i.e.} \quad |f_j(u) - w_j| < \varepsilon,$$

for each $1 \leq j \leq m$, which shows that $f_j(u) \to w_j$ as $u \to v$.
Conversely, suppose that for each $1 \leq j \leq m$ we have $f_j(u) \to w_j$ as $u \to v$, and let $\varepsilon > 0$ be given. Then, for each $j$ there is a $\delta_j > 0$ such that $0 < \|u - v\| < \delta_j$ implies

$$|f_j(u) - w_j| < \frac{\varepsilon}{\sqrt{m}}.$$

Setting $\delta = \min(\delta_1, \dots, \delta_m)$, we thus have that $0 < \|u - v\| < \delta$ implies

$$\|f(u) - w\|^2 = \sum_{j=1}^m (f_j(u) - w_j)^2 < m \cdot \frac{\varepsilon^2}{m} = \varepsilon^2,$$

which shows that $f(u) \to w$ as $u \to v$.
Technicalities aside, the structure of the above proof is very simple: the argument is that if a sum of nonnegative numbers is small, then each term in the sum must be small, and conversely the sum of a finite number of small numbers is still small.
Now let us consider the above paragraphs simultaneously, meaning that we have chosen orthonormal bases $E = \{e_1, \dots, e_n\}$ of $\mathcal{V}$ and $\tilde{E} = \{\tilde{e}_1, \dots, \tilde{e}_m\}$ of $\mathcal{W}$. Then each coordinate function $f_j \colon \mathcal{V} \to \mathbb{R}$ gives rise to an associated function $F_j \colon \mathbb{R}^n \to \mathbb{R}$ defined by

$$F_j(x_1, \dots, x_n) = f_j(x_1 e_1 + \dots + x_n e_n).$$

In particular, upon choosing orthonormal bases $E$ and $\tilde{E}$, every function $f \colon \mathcal{V} \to \mathcal{W}$ gives rise to an associated function $F \colon \mathbb{R}^n \to \mathbb{R}^m$ defined by

$$F(x_1, \dots, x_n) = (F_1(x_1, \dots, x_n), \dots, F_m(x_1, \dots, x_n)).$$
Example 1: Let $f \colon \mathcal{V} \to \mathcal{V}$ be the function defined by $f(v) = v + b$, where $b \in \mathcal{V}$ is a specified vector. This function is called "translation by $b$." Choose an orthonormal basis $E = \{e_1, \dots, e_n\}$, and suppose that the coordinate vector of $b$ relative to this basis is $(b_1, \dots, b_n)$. Then, the function $F \colon \mathbb{R}^n \to \mathbb{R}^n$ is given by

$$F(x_1, \dots, x_n) = (x_1 + b_1, \dots, x_n + b_n).$$
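Here is Example 1 in code (a sketch of ours, with $\mathcal{V} = \mathbb{R}^3$, the standard basis, and a sample vector $b$ of our choosing).

```python
import numpy as np

# A sample translation vector b, with coordinates (b1, b2, b3).
b = np.array([1.0, -2.0, 0.5])

# Translation by b, as a function of one vector variable.
def f(v):
    return v + b

# The same function in coordinates relative to the standard basis.
def F(x1, x2, x3):
    return np.array([x1 + b[0], x2 + b[1], x3 + b[2]])

assert np.allclose(f(np.array([1.0, 1.0, 1.0])), F(1.0, 1.0, 1.0))
```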
The above discussion shows that the perspective of vector calculus, in which we consider functions of a single vector variable $v$, is equivalent to the perspective of multivariable calculus, in which we consider functions of multiple scalar variables $x_1, \dots, x_n$. From this perspective, one might wonder about the prospect of a "multivector" calculus in which we consider functions of multiple vector variables $v_1, \dots, v_k$, where it may even be that each vector variable $v_i$ ranges over its own Euclidean space $\mathcal{V}_i$. In fact, this is already included in vector calculus, because such $k$-tuples of vectors are themselves single vectors in an enlarged Euclidean space.
Definition 1: Given Euclidean spaces $\mathcal{V}_1, \dots, \mathcal{V}_k$, their direct product is the Euclidean space consisting of the Cartesian product

$$\mathcal{V}_1 \times \dots \times \mathcal{V}_k = \{(v_1, \dots, v_k) : v_i \in \mathcal{V}_i\},$$

with vector addition and scalar multiplication defined component-wise, i.e.

$$(u_1, \dots, u_k) + (v_1, \dots, v_k) = (u_1 + v_1, \dots, u_k + v_k), \qquad t(v_1, \dots, v_k) = (t v_1, \dots, t v_k),$$

and scalar product defined by

$$\langle (u_1, \dots, u_k), (v_1, \dots, v_k) \rangle = \langle u_1, v_1 \rangle + \dots + \langle u_k, v_k \rangle.$$
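The component-wise operations and summed scalar product of Definition 1 can be sketched concretely (our illustration, realizing $\mathbb{R}^2 \times \mathbb{R}^3$ as pairs of arrays):

```python
import numpy as np

# Elements of R^2 x R^3 are pairs (a, b) with a in R^2 and b in R^3.
def add(p, q):
    return (p[0] + q[0], p[1] + q[1])          # component-wise addition

def scale(t, p):
    return (t * p[0], t * p[1])                # component-wise scaling

def inner(p, q):
    # scalar product on the direct product: sum of the two scalar products
    return float(np.dot(p[0], q[0]) + np.dot(p[1], q[1]))

p = (np.array([1.0, 2.0]), np.array([0.0, 1.0, -1.0]))
q = (np.array([3.0, -1.0]), np.array([2.0, 2.0, 2.0]))
# <p, q> = (1*3 + 2*(-1)) + (0*2 + 1*2 + (-1)*2) = 1 + 0 = 1
assert abs(inner(p, q) - 1.0) < 1e-12
```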
It is good to be comfortable with both perspectives; the former is better for conceptual understanding, while the latter is useful for visualization and calculation.
Thus the calculus of functions of two vector variables, $f(v_1, v_2)$, is just the calculus of functions on the direct product Euclidean space $\mathcal{V}_1 \times \mathcal{V}_2$. Equivalently, the calculus of functions $f \colon \mathcal{V}_1 \times \dots \times \mathcal{V}_k \to \mathcal{W}$ is the same thing as the calculus of functions $f \colon \mathcal{V} \to \mathcal{W}$, where $\mathcal{V} = \mathcal{V}_1 \times \dots \times \mathcal{V}_k$. There is in fact a further generalization of vector calculus called tensor calculus, which is very useful in physics and engineering (particularly in the theory of relativity and in materials science), but that is beyond the scope of this course.
Example 2: It may be tempting to throw away the more abstract perspective entirely, and in the previous lectures I have been arguing against doing this by holding up the example of the function

$$\lambda \colon \operatorname{Sym} \mathcal{V} \to \mathbb{R}^n,$$

which sends each symmetric operator on $\mathcal{V}$ to the list of its eigenvalues arranged in weakly decreasing order. Conceptually, the function which sends a symmetric operator to its eigenvalues is very natural, and something you can hold in your mind quite easily. However, it is not easy to work concretely with this function by choosing coordinates. On Problem Set 1, we showed how the choice of a basis $E = \{e_1, \dots, e_n\}$ of $\mathcal{V}$ leads to a corresponding basis of $\operatorname{Sym} \mathcal{V}$, and in particular that if $\dim \mathcal{V} = n$ then $\dim \operatorname{Sym} \mathcal{V} = \binom{n+1}{2}$. So, according to our discussion above we have an associated function

$$\Lambda \colon \mathbb{R}^{\binom{n+1}{2}} \to \mathbb{R}^n.$$
Moreover, if we choose the standard basis of $\mathbb{R}^n$, then we have component functions

$$\lambda_1, \dots, \lambda_n \colon \operatorname{Sym} \mathcal{V} \to \mathbb{R},$$

which send a symmetric operator $S$ to its $i$th largest eigenvalue $\lambda_i(S)$, and writing down a formula for these functions in terms of the coordinates of $S$ relative to the basis of $\operatorname{Sym} \mathcal{V}$ amounts to writing down a formula for the eigenvalues of a symmetric matrix in terms of its entries; doing this for $n \geq 5$ is in a sense impossible, since the characteristic polynomial then has degree at least five and there is no general formula for its roots in terms of radicals. You'll work out the first nontrivial case on PSet 2. Again, this all points to the need to be able to do approximations à la calculus, a question which we return to now.
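Although no general closed form exists for large $n$, the case $n = 2$ is explicit, and we can check it numerically (our sketch; the formula for a symmetric $2 \times 2$ matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ is the quadratic formula applied to its characteristic polynomial):

```python
import numpy as np

def eig_desc(S):
    # numerical eigenvalues of a symmetric matrix, weakly decreasing
    return np.sort(np.linalg.eigvalsh(S))[::-1]

def eig2x2(a, b, c):
    # closed form: lambda = ((a + c) +/- sqrt((a - c)^2 + 4 b^2)) / 2
    d = np.sqrt((a - c) ** 2 + 4 * b ** 2)
    return np.array([(a + c + d) / 2, (a + c - d) / 2])

a, b, c = 2.0, 1.0, 2.0
S = np.array([[a, b], [b, c]])
assert np.allclose(eig_desc(S), eig2x2(a, b, c))  # both give [3.0, 1.0]
```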
We now come back to our general discussion of functions from one Euclidean space to another. In linear algebra, we consider only the case where $f$ is linear, and in that context what matters are associated vector spaces like the kernel and image of $f$. However, to study more general (i.e. non-linear) functions between Euclidean spaces we need a more general vocabulary that includes a larger variety of special subsets of Euclidean space.
Definition 1: Given a vector $v \in \mathcal{V}$ and a number $r \in \mathbb{R}$, the open ball of radius $r$ centered at $v$ is the subset of $\mathcal{V}$ defined by

$$B_r(v) = \{u \in \mathcal{V} : \|u - v\| < r\}.$$

This is the set of vectors whose distance to $v$ is strictly less than $r$. Observe that $B_r(v)$ is the empty set unless $r > 0$.
In terms of open balls, the continuity of a function $f \colon \mathcal{V} \to \mathcal{W}$ at a point $v \in \mathcal{V}$ may be formulated as follows: $f$ is continuous at $v$ if and only if it has the property that, for any given $\varepsilon > 0$, there is a corresponding $\delta > 0$ such that the image of $B_\delta(v)$ under $f$ is contained in $B_\varepsilon(f(v))$.
Definition 2: A subset $O$ of a Euclidean space $\mathcal{V}$ is said to be open if for any $v \in O$ there exists $r > 0$ such that $B_r(v) \subseteq O$. A subset $C$ of $\mathcal{V}$ is said to be closed if its complement $\mathcal{V} \setminus C$ is open.
There is a characterization of continuous functions in terms of open and closed sets.
Theorem 1: A function $f \colon \mathcal{V} \to \mathcal{W}$ is continuous if and only if the preimage $f^{-1}(O)$ of any open set $O \subseteq \mathcal{W}$ is open. Equivalently, $f$ is continuous if and only if the preimage of any closed set is closed.
We won’t use Theorem 1 much, so we shall skip the proof – you will see this result again in a real analysis course.
We can also characterize continuity of a function as continuity of its components.
Theorem 2: A function $f \colon \mathcal{V} \to \mathcal{W}$ is continuous at $v \in \mathcal{V}$ if and only if its component functions $f_1, \dots, f_m$ relative to an arbitrary orthonormal basis of $\mathcal{W}$ are continuous at $v$.
Definition 3: A set $K \subseteq \mathcal{V}$ is said to be bounded if there exists $r > 0$ such that $K \subseteq B_r(0)$, i.e. if $\|v\| < r$ for all $v \in K$.

Definition 4: A set $K \subseteq \mathcal{V}$ is said to be compact if it is closed and bounded.
Theorem (Extreme Value Theorem): If $K \subseteq \mathcal{V}$ is compact, every continuous function $f \colon K \to \mathbb{R}$ attains a maximum and a minimum: there are points $p, q \in K$ such that

$$f(q) \leq f(v) \leq f(p) \quad \text{for all } v \in K.$$

The point $p$ is said to be a maximizer of $f$, and $q$ is said to be a minimizer of $f$.
There is a particular situation in which we can say more about maximizers and minimizers. Recall that a linear combination of vectors $v_1, \dots, v_k \in \mathcal{V}$ is an expression of the form

$$t_1 v_1 + \dots + t_k v_k,$$

where $t_1, \dots, t_k \in \mathbb{R}$ are arbitrary scalars, and that the linear span of $v_1, \dots, v_k$ is the subset of $\mathcal{V}$ consisting of all linear combinations of these vectors,

$$\operatorname{Span}(v_1, \dots, v_k) = \{t_1 v_1 + \dots + t_k v_k : t_1, \dots, t_k \in \mathbb{R}\}.$$
There is a constrained version of this in which we consider only linear combinations whose scalar coefficients are nonnegative and sum to $1$. These special linear combinations of $v_1, \dots, v_k$ are called convex combinations, and the set

$$\operatorname{Conv}(v_1, \dots, v_k) = \{t_1 v_1 + \dots + t_k v_k : t_1, \dots, t_k \geq 0, \ t_1 + \dots + t_k = 1\}$$

of all convex combinations of $v_1, \dots, v_k$ is called their convex hull. For example, the convex hull of two vectors $v_1, v_2$ may be visualized as the line segment whose endpoints are $v_1$ and $v_2$, while the convex hull of three vectors $v_1, v_2, v_3$ may be visualized as the triangular region whose vertices are $v_1, v_2, v_3$.
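The triangle picture can be checked by sampling (our sketch, using the vertices $(0,0)$, $(1,0)$, $(0,1)$ in $\mathbb{R}^2$, for which membership in the triangle has a simple description):

```python
import numpy as np

v1, v2, v3 = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])

rng = np.random.default_rng(1)
t = rng.dirichlet(np.ones(3))          # t1, t2, t3 >= 0 with t1 + t2 + t3 = 1
p = t[0] * v1 + t[1] * v2 + t[2] * v3  # a convex combination of the vertices

# For these particular vertices, the triangle is {(x, y) : x, y >= 0, x + y <= 1}.
assert p[0] >= 0 and p[1] >= 0 and p[0] + p[1] <= 1
```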
Theorem (Convex Optimization Theorem): For any finite set of vectors $v_1, \dots, v_k \in \mathcal{V}$, every linear function $f \colon \operatorname{Conv}(v_1, \dots, v_k) \to \mathbb{R}$ has a maximizer and a minimizer in $\{v_1, \dots, v_k\}$.
Here is a very interesting example of a convex hull. Let $\mathcal{V}$ be a Euclidean space, and let $E = \{e_1, \dots, e_n\}$ be an orthonormal basis in $\mathcal{V}$. Recall that a permutation is a bijective function

$$\pi \colon \{1, \dots, n\} \to \{1, \dots, n\}.$$

If we write the table of values of such a function as a matrix,

$$\begin{pmatrix} 1 & 2 & \dots & n \\ \pi(1) & \pi(2) & \dots & \pi(n) \end{pmatrix},$$

then the bottom row of the matrix consists of the numbers $1, \dots, n$ arranged in some order. For example, in the case $n = 3$, the $3! = 6$ permutations consist of the following matrices:

$$\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}.$$
For each permutation $\pi$, the associated permutation operator $P_\pi$ on $\mathcal{V}$ is defined by its action on the basis $E$, which is given by

$$P_\pi e_i = e_{\pi(i)}, \quad 1 \leq i \leq n.$$

For example, in the case $n = 3$, the matrices of these operators relative to the basis $E$ are, in the same order, as follows:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$
Since these matrices have exactly one $1$ in every row and column, with all other entries $0$, they obviously have the property that each row and column sums to $1$. There are many more such matrices, however, and the following theorem characterizes them as the convex hull of the permutation matrices.

Theorem (Birkhoff-von Neumann theorem): The convex hull of the permutation operators consists of all operators on $\mathcal{V}$ whose matrices relative to the basis $E$ have nonnegative entries, and whose rows and columns sum to $1$.
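The easy direction of the Birkhoff-von Neumann theorem can be checked numerically (our sketch): any convex combination of permutation matrices has nonnegative entries and rows and columns summing to $1$. The hard direction, that every such matrix arises this way, is what the theorem adds.

```python
import numpy as np
from itertools import permutations

n = 3
perm_mats = []
for p in permutations(range(n)):
    P = np.zeros((n, n))
    for i, j in enumerate(p):
        P[j, i] = 1.0          # column i has its 1 in row p(i)
    perm_mats.append(P)

# Random convex weights: nonnegative, summing to 1.
weights = np.random.default_rng(0).dirichlet(np.ones(len(perm_mats)))
Q = sum(w * P for w, P in zip(weights, perm_mats))

assert np.all(Q >= 0)
assert np.allclose(Q.sum(axis=0), 1.0)  # column sums
assert np.allclose(Q.sum(axis=1), 1.0)  # row sums
```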
We now have everything we need to prove that the eigenvalue map $\lambda \colon \operatorname{Sym} \mathcal{V} \to \mathbb{R}^n$ is continuous. In fact, we prove the following stronger result.
Theorem (Hoffman-Wielandt inequality): Let $\lambda \colon \operatorname{Sym} \mathcal{V} \to \mathbb{R}^n$ be the function which sends each symmetric operator on $\mathcal{V}$ to its eigenvalues listed in weakly decreasing order. Then, for any $S, T \in \operatorname{Sym} \mathcal{V}$ we have

$$\|\lambda(S) - \lambda(T)\| \leq \|S - T\|.$$
Proof: Let us make sure we understand which norms we are using. On the LHS of the inequality, we have the norm on $\mathbb{R}^n$ corresponding to the standard inner product, so that if $x = (x_1, \dots, x_n)$ then

$$\|x\| = \sqrt{x_1^2 + \dots + x_n^2}.$$
On the right hand side, we have the Frobenius norm for operators,

$$\|A\| = \sqrt{\langle A, A \rangle} = \sqrt{\operatorname{Tr}(A^* A)}.$$

Expanding

$$\|S - T\|^2 = \|S\|^2 - 2\operatorname{Tr}(ST) + \|T\|^2, \qquad \|\lambda(S) - \lambda(T)\|^2 = \sum_{i=1}^n \lambda_i(S)^2 - 2\sum_{i=1}^n \lambda_i(S)\lambda_i(T) + \sum_{i=1}^n \lambda_i(T)^2,$$

and using the fact that $\|S\|^2 = \sum_{i=1}^n \lambda_i(S)^2$ and $\|T\|^2 = \sum_{i=1}^n \lambda_i(T)^2$, we thus have that proving $\|\lambda(S) - \lambda(T)\| \leq \|S - T\|$, which is our objective, is equivalent to proving

$$\operatorname{Tr}(ST) \leq \sum_{i=1}^n \lambda_i(S) \lambda_i(T),$$

which we do now.
Since $S$ is a symmetric operator on $\mathcal{V}$, by the Spectral Theorem there exists an orthonormal basis $u_1, \dots, u_n$ of $\mathcal{V}$ such that

$$S u_i = \lambda_i(S) u_i, \quad 1 \leq i \leq n.$$

From Lecture 1, we have the formula

$$\operatorname{Tr}(ST) = \sum_{i=1}^n \langle u_i, S T u_i \rangle = \sum_{i=1}^n \lambda_i(S) \langle u_i, T u_i \rangle.$$

Invoking the Spectral Theorem again, there is an orthonormal basis $w_1, \dots, w_n$ of $\mathcal{V}$ such that

$$T w_j = \lambda_j(T) w_j, \quad 1 \leq j \leq n.$$
We then have that

$$\langle u_i, T u_i \rangle = \sum_{j=1}^n \lambda_j(T) \langle u_i, w_j \rangle^2,$$

and plugging this into the above we finally have

$$\operatorname{Tr}(ST) = \sum_{i=1}^n \sum_{j=1}^n \lambda_i(S) \lambda_j(T) \langle u_i, w_j \rangle^2.$$
Now observe that the matrix $Q = [Q_{ij}]_{i,j=1}^n$ with entries

$$Q_{ij} = \langle u_i, w_j \rangle^2$$

has nonnegative entries, and also each row and column of $Q$ sums to $1$ (why?). Thus, if we define a function $h$ on the convex hull of the permutation matrices by

$$h(A) = \sum_{i=1}^n \sum_{j=1}^n \lambda_i(S) \lambda_j(T) A_{ij},$$
then this function is linear and we have $h(Q) = \operatorname{Tr}(ST)$. It now follows from the convex optimization theorem together with the Birkhoff-von Neumann theorem that

$$\operatorname{Tr}(ST) = h(Q) \leq \max_\pi h(P_\pi),$$

where the maximum is over all permutations $\pi$ of $\{1, \dots, n\}$. Evaluating the right hand side, we get that

$$\max_\pi h(P_\pi) = \max_\pi \sum_{i=1}^n \lambda_i(S) \lambda_{\pi(i)}(T) = \sum_{i=1}^n \lambda_i(S) \lambda_i(T),$$

where the final equality follows from the fact that $\lambda_1(S) \geq \dots \geq \lambda_n(S)$ and $\lambda_1(T) \geq \dots \geq \lambda_n(T)$: a sum of products of two weakly decreasing sequences is maximized when they are paired in order. This proves $\operatorname{Tr}(ST) \leq \sum_{i=1}^n \lambda_i(S) \lambda_i(T)$, and hence the theorem.
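As a sanity check (ours, not part of the lecture), the Hoffman-Wielandt inequality can be verified numerically on random symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def eig_desc(S):
    # eigenvalues in weakly decreasing order
    return np.sort(np.linalg.eigvalsh(S))[::-1]

for _ in range(100):
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    S, T = (A + A.T) / 2, (B + B.T) / 2           # random symmetric matrices
    lhs = np.linalg.norm(eig_desc(S) - eig_desc(T))  # Euclidean norm in R^n
    rhs = np.linalg.norm(S - T, 'fro')               # Frobenius norm
    assert lhs <= rhs + 1e-10
```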
Corollary 1: The eigenvalue function $\lambda \colon \operatorname{Sym} \mathcal{V} \to \mathbb{R}^n$ is continuous.
Proof: A function $f \colon X \to Y$ from one metric space to another is said to be a "contraction" if

$$d_Y(f(x_1), f(x_2)) \leq d_X(x_1, x_2) \quad \text{for all } x_1, x_2 \in X.$$

Thus, a contraction is a function which brings points closer together, or more precisely doesn't spread them farther apart. It is immediate to check that the definition of continuity holds for any contraction (in fact, we can choose $\delta = \varepsilon$ when checking continuity). The Hoffman-Wielandt inequality says that the eigenvalue mapping is a contraction: the distance between the eigenvalue vectors of symmetric operators $S$ and $T$ is at most the Frobenius distance between $S$ and $T$.
Thus, a contraction is a function which brings points closer together, or more precisely doesn’t spread them farther apart. It is immediate to check that the definition of continuity holds for any contraction (in fact, we can choose when checking continuity). The Hoffman-Wielandt inequality says that the eigenvalue mapping is a contraction: the distance between the eigenvalue vectors of symmetric operators is at most Frobenius distance between and