Massively Multivariable Open Online Calculus Course Notes


Massively Multivariable Open Online Calculus
Jim Fowler and Steve Gubkin
EXPERIMENTAL DRAFT
This document was typeset on April 27, 2014.


1 An n-dimensional space

We package lists of numbers as "vectors." In this course we will be studying calculus of many variables. That means that instead of just seeing how one quantity depends on another, we will see how two quantities could affect a third, or how five inputs might cause changes in three outputs. The very first step of this journey is to give a convenient mathematical framework for talking about lists of numbers. To that end we define:

Definition 1. $\mathbb{R}^n$ is the set of all ordered lists containing $n$ real numbers. That is,
$$\mathbb{R}^n = \{(x_1, x_2, \ldots, x_n) : x_1, x_2, \ldots, x_n \in \mathbb{R}\}.$$

The number $n$ is called the dimension of $\mathbb{R}^n$, and $\mathbb{R}^n$ is called $n$-dimensional space. When speaking aloud, it is also acceptable to say "are en." We call the elements of $\mathbb{R}^n$ points or $n$-tuples.

Example 2. $\mathbb{R}^1$ is just the set of all real numbers, which is often visualized by the number line, which is 1-dimensional.

[Figure: a number line with ticks at −1, 0, 1, 2, and 3]

Example 3 R2 is the set of all pairs of real numbers, like (2, 5) or (1.54, π). This can be visualized by the coordinate plane, which is 2-dimensional.

[Figure: the point (1, 2) plotted in the coordinate plane]

Example 4. $\mathbb{R}^3$ is the set of all triples of real numbers. It can be visualized as 3-dimensional space, with three coordinate axes.

Question 5. If $(3, 2, e, 1.4) \in \mathbb{R}^n$, what is $n$?

Hint: $n$-dimensional space consists of ordered $n$-tuples of numbers. How many coordinates does $(3, 2, e, 1.4)$ have?

Warning 6. Be careful to distinguish between commas and periods.

Solution: $n = 4$.

Question 7. Which of these points is farthest away from the point $(0, 0)$?

(a) $(0, 1)$
(b) $(1, 1)$
(c) $(-2, 3)$
(d) $(1, 4)$

Solution: The answer is (d); $(1, 4)$ is the farthest from the origin.

It becomes quite difficult to visualize high dimensional spaces. You can sometimes visualize a higher dimensional object by having 3 spatial dimensions and one color dimension, or 3 spatial dimensions and one time dimension to get a movie. Sometimes you can project a higher dimensional object into a lower dimensional space. If you have the time, you should watch the excellent film Dimensions1 which will get you to visualize some higher dimensional objects. Although we may often be working with high-dimensional objects, we will generally not try to visualize objects in dimensions above 3 in this course. Nevertheless, we hope the video is enlightening!

1 http://www.dimensions-math.org/

2 Vector spaces

Vector spaces are where vectors live. It will be convenient for us to equip $\mathbb{R}^n$ with two algebraic operations: "vector addition" and "scalar multiplication" (to be defined soon). This additional structure will transform $\mathbb{R}^n$ from a mere set into a "vector space." To distinguish between $\mathbb{R}^n$ as a set and $\mathbb{R}^n$ as a vector space, we think of elements of $\mathbb{R}^n$ as a set as being ordered lists, such as $\mathbf{p} = (x_1, x_2, x_3, \ldots, x_n)$, but elements of $\mathbb{R}^n$ the vector space will be written typographically as vertically oriented lists flanked with square brackets, like this
$$\vec{v} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}.$$
We will try to stick to the convention that bold letters like $\mathbf{p}$ represent points, while letters with little arrows above them (like $\vec{v}$) represent vectors. Unfortunately (like practically everybody else in the world), we use the same symbol $\mathbb{R}^n$ to refer to both the vector space $\mathbb{R}^n$ and the underlying set of points $\mathbb{R}^n$. Vector addition is defined as follows:
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$

Warning 1. You cannot add vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$ unless $n = m$.

An element of $\mathbb{R}$ is a number, but it is also called a "scalar" in this context, and vectors can be multiplied by scalars as follows:
$$c\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}$$

Warning 2. We have not yet defined a notion of multiplication for vectors. You might think it is reasonable to define
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 \\ x_2 y_2 \\ \vdots \\ x_n y_n \end{bmatrix},$$
but actually this operation is not especially useful, and will never be utilized in this course. We will have a notion of "vector multiplication" called the dot product, but that is not the (faulty) definition above.

Question 3. What is $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix}$?

Hint: $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 + 3 \\ 2 - 2 \\ 3 + 4 \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \\ 7 \end{bmatrix}$

Question 4. What is $3\begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix}$?

Hint: $3\begin{bmatrix} 3 \\ -2 \\ 4 \end{bmatrix} = \begin{bmatrix} 3(3) \\ 3(-2) \\ 3(4) \end{bmatrix} = \begin{bmatrix} 9 \\ -6 \\ 12 \end{bmatrix}$
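If you want to check computations like these numerically, the NumPy library (an assumption on our part; it is not part of the course materials) implements exactly these componentwise operations:

Python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([3, -2, 4])

print(v + w)   # vector addition: [4 0 7]
print(3 * w)   # scalar multiplication: [ 9 -6 12]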

Question 5. If $\vec{v}_1 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, can you find $a, b \in \mathbb{R}$ so that $a\vec{v}_1 + b\vec{v}_2 = \vec{v}_3$?

Hint:
$$a\begin{bmatrix} 3 \\ -2 \end{bmatrix} + b\begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\quad\Rightarrow\quad
\begin{bmatrix} 3a \\ -2a \end{bmatrix} + \begin{bmatrix} b \\ 5b \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\quad\Rightarrow\quad
\begin{bmatrix} 3a + b \\ -2a + 5b \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Can you turn this into a system of two equations?

Hint:
$$\begin{cases} 3a + b = 1 \\ -2a + 5b = 1 \end{cases}
\quad\Rightarrow\quad
\begin{cases} 15a + 5b = 5 \\ -2a + 5b = 1 \end{cases}
\quad\Rightarrow\quad
\begin{cases} 17a = 4 \\ -2a + 5b = 1 \end{cases}
\quad\Rightarrow\quad
\begin{cases} a = 4/17 \\ b = 5/17 \end{cases}$$

Solution: $a = 4/17$ and $b = 5/17$.
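The same system can also be handed to a computer. Here is a minimal sketch using NumPy (again an assumption, not course code): the columns of the matrix are $\vec{v}_1$ and $\vec{v}_2$, and np.linalg.solve finds the coefficients.

Python
import numpy as np

# columns of M are v1 and v2; we solve M [a, b]^T = v3
M = np.array([[3.0, 1.0],
              [-2.0, 5.0]])
v3 = np.array([1.0, 1.0])

a, b = np.linalg.solve(M, v3)
print(a, b)   # roughly 0.2353 and 0.2941, i.e. 4/17 and 5/17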


3 Geometry

Vectors can be viewed geometrically.

Graphically, we depict a vector $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ in $\mathbb{R}^n$ as an arrow whose base is at the origin and whose head is at the point $(x_1, x_2, \ldots, x_n)$. For example, in $\mathbb{R}^2$ we would depict the vector $\vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ as follows:

[Figure: the vector $\vec{v}$ drawn as an arrow from the origin to the point $(3, 4)$]

Question 1. What is the vector $\vec{w}$ pictured below?

[Figure: the vector $\vec{w}$ drawn as an arrow in the coordinate plane]

Hint: Consider whether the $x$ and $y$ coordinates are positive or negative.

(a) $\begin{bmatrix} -4 \\ 2 \end{bmatrix}$
(b) $\begin{bmatrix} 3 \\ -3 \end{bmatrix}$
(c) $\begin{bmatrix} -4 \\ -2 \end{bmatrix}$
(d) $\begin{bmatrix} 4 \\ 2 \end{bmatrix}$

Question 2. On a sheet of paper, draw the vector $\vec{v} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$. Click the hint to see if you got it right.

Hint: [Figure: the vector $\vec{v}$ drawn as an arrow from the origin to $(3, 1)$]

Question 3. $\vec{v}_1$ and $\vec{v}_2$ are drawn below. Redraw them on a sheet of paper, and also draw their sum $\vec{v}_1 + \vec{v}_2$. Click the hint to see if you got it right.

[Figure: two vectors $\vec{v}_1$ and $\vec{v}_2$ drawn from the origin]

Hint: [Figure: $\vec{v}_1$, $\vec{v}_2$, and their sum $\vec{v}_1 + \vec{v}_2$, which is the diagonal of the parallelogram they span]

Question 4. $\vec{v}$ is drawn below. Redraw it on a sheet of paper, and also draw $3\vec{v}$. Click the hint to see if you got it right.

[Figure: the vector $\vec{v}$ drawn from the origin]

Hint: [Figure: $\vec{v}$ and $3\vec{v}$ drawn from the origin, pointing in the same direction, with $3\vec{v}$ three times as long]

You may have noticed that you can sum vectors graphically by forming a parallelogram. You also may have noticed that multiplying a vector by a positive scalar leaves the vector pointing in the same direction but "scales" its length. That is the reason we call real numbers "scalars" when they are coefficients of vectors: it is to remind us that they act geometrically by scaling the vector.


4 Span

Vectors can be combined; all those combinations form the "span."

Definition 1. We say that a vector $\vec{w}$ is a linear combination of the vectors $\vec{v}_1, \vec{v}_2, \vec{v}_3, \ldots, \vec{v}_k$ if there are scalars $a_1, a_2, \ldots, a_k$ so that $\vec{w} = a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k$.

Definition 2. The span of a set of vectors $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k \in \mathbb{R}^n$ is the set of all linear combinations of the vectors. Symbolically,
$$\operatorname{span}(\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k) = \{a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k : a_1, a_2, \ldots, a_k \in \mathbb{R}\}.$$

Example 3. The span of $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ is all vectors of the form $\begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$ for some $x, y \in \mathbb{R}$.

Example 4. $\begin{bmatrix} 8 \\ 13 \end{bmatrix}$ is in the span of $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 4 \\ 7 \end{bmatrix}$ because $2\begin{bmatrix} 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 4 \\ 7 \end{bmatrix} = \begin{bmatrix} 8 \\ 13 \end{bmatrix}$.

Question 5. Is $\begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix}$ in the span of $\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ -3 \\ 0 \end{bmatrix}$?

Hint: The linear combinations of $\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ -3 \\ 0 \end{bmatrix}$ are all the vectors of the form $a\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + b\begin{bmatrix} 3 \\ -3 \\ 0 \end{bmatrix}$ for scalars $a, b \in \mathbb{R}$. Could $\begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix}$ be written in such a form?

Hint: No, because the last coordinate of all of these vectors is $0$. In fact, graphically, the span of these two vectors is just the entire $xy$-plane, and $\begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix}$ lives off of that plane.

(a) Yes, it is in the span of those two vectors.
(b) No, it is not in the span of those two vectors. (correct)

Graphically, we should think of the span of one vector as the line which contains the vector (unless the vector is the zero vector, in which case its span is just the zero vector). The span of two vectors which are not in the same line is the plane containing the two vectors. The span of three vectors which are not in the same plane is the “3D-space” which contains those 3 vectors.
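One way to test span membership numerically is least squares: if the best possible combination of the spanning vectors still leaves a nonzero residual, the target is not in the span. A small sketch, assuming NumPy is available (it is not part of the course materials):

Python
import numpy as np

# Columns are the spanning vectors [1,2,0] and [3,-3,0] from Question 5.
A = np.array([[1.0, 3.0],
              [2.0, -3.0],
              [0.0, 0.0]])
target = np.array([3.0, 4.0, 2.0])

coeffs, residual, rank, _ = np.linalg.lstsq(A, target, rcond=None)
print(coeffs)     # best choice of a and b
print(residual)   # nonzero residual, so [3, 4, 2] is not in the span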


5 Functions

A function relates inputs and outputs.

Definition 1. A function $f$ from a set $A$ to a set $B$ is an assignment of exactly one element of $B$ to each element of $A$. If $a$ is an element of $A$, we write $f(a)$ for the element of $B$ which is assigned to $a$ by $f$. We call $A$ the domain of $f$, and $B$ the codomain of $f$. We will also commonly write $f : A \to B$, which we read out loud as "$f$ from $A$ to $B$" or "$f$ maps $A$ to $B$."

Example 2. Let $W = \{\text{yes}, \text{no}\}$ and $A = \{\text{Dog}, \text{Cat}, \text{Walrus}\}$. Let $f : A \to W$ be the function which assigns to each animal in $A$ the answer to the question "Is this animal commonly a pet?" Then $f(\text{Dog}) = \text{yes}$, $f(\text{Cat}) = \text{yes}$, and $f(\text{Walrus}) = \text{no}$. In this case, $A$ is the domain, and $W$ is the codomain.

In these activities, we mostly study functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.

Question 3. Let $g : \mathbb{R}^1 \to \mathbb{R}^2$ be defined by $g(\theta) = (\cos(\theta), \sin(\theta))$. What is $g(\frac{\pi}{6})$? Give your answer as a vertical column of numbers.

Warning 4. In everything that follows, $\cos$ and $\sin$ are in terms of radians.

Hint: $g(\frac{\pi}{6}) = (\cos(\frac{\pi}{6}), \sin(\frac{\pi}{6}))$.

Hint: If you remember your trig facts, this is $(\frac{\sqrt{3}}{2}, \frac{1}{2})$. Format this as $\begin{bmatrix} \sqrt{3}/2 \\ 1/2 \end{bmatrix}$ for this question.

Can you imagine what would happen to the point $g(\theta)$ as $\theta$ moved from $0$ to $2\pi$?

Question 5. Let $h : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $h(x, y) = (x, -y)$. What is $h(2, 1)$? Format your answer as a vertical column of numbers. Try to understand this function graphically. How does it transform the plane? The hint reveals the answer to this question.

Hint: Consider $h(2, 1) = (2, -1)$.

[Figure: the point $(2, 1)$ and its image $h(2, 1) = (2, -1)$ plotted in the plane]

Hint: $h$ takes any point $(x, y)$ to its reflection in the $x$-axis.

Hint: Format your answer as $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$.

Question 6. Let $f : \mathbb{R}^4 \to \mathbb{R}^2$ be defined by $f((x_1, x_2, x_3, x_4)) = (x_1 x_2 + x_3,\ x_4^2 + x_1)$. What is $f(3, 4, 1, 9)$? Format your answer as a vertical column of numbers.

Hint: $f(3, 4, 1, 9) = (3 \cdot 4 + 1,\ 9^2 + 3) = (13, 84)$.

Hint: Format this as $\begin{bmatrix} 13 \\ 84 \end{bmatrix}$.

Note that this function has too many inputs and outputs to visualize easily. That certainly does not stop it from being a useful and meaningful function; this is a “massively multivariable” course.
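Even though we cannot draw this function, nothing stops us from computing with it. A minimal Python sketch (plain tuples, nothing course-specific assumed):

Python
def f(x1, x2, x3, x4):
    # f : R^4 -> R^2, the map (x1, x2, x3, x4) -> (x1*x2 + x3, x4^2 + x1)
    return (x1 * x2 + x3, x4 ** 2 + x1)

print(f(3, 4, 1, 9))   # (13, 84)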


6 Composition

One way to build new functions is via "composition." Practically the most important thing you can do with functions is to compose them.

Definition 1. Let $f : A \to B$ and $g : B \to C$. Then there is another function $(g \circ f) : A \to C$ defined by $(g \circ f)(a) = g(f(a))$ for each $a \in A$. It is called the composition of $g$ with $f$.

Warning 2. The composition is only defined if the codomain of $f$ is the domain of $g$.

Question 3. Let $A = \{\text{cat}, \text{dog}\}$, $B = \{(2, 3), (5, 6), (7, 8)\}$, and $C = \mathbb{R}$. Let $f$ be defined by $f(\text{cat}) = (2, 3)$ and $f(\text{dog}) = (7, 8)$. Let $g$ be defined by the rule $g((x, y)) = x + y$. What is $(g \circ f)(\text{cat})$?

Hint: First, $(g \circ f)(\text{cat}) = g(f(\text{cat}))$.

Hint: Then note that $f(\text{cat}) = (2, 3)$.

Hint: So this is $g((2, 3)) = 2 + 3 = 5$.

Solution: $(g \circ f)(\text{cat}) = 5$.

Question 4. Let $h : \mathbb{R}^2 \to \mathbb{R}^3$ be defined by $h(x, y) = (x^2, xy, y)$, and let $\omega : \mathbb{R}^3 \to \mathbb{R}^2$ be defined by $\omega(x, y, z) = (xyz, z)$. What is $(\omega \circ h)(x, y)$? Format your answer as a vertical column of formulas.

Hint: $(\omega \circ h)(x, y) = \omega[h(x, y)] = \omega(x^2, xy, y) = ((x^2)(xy)(y),\ y) = (x^3 y^2,\ y)$.
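Composition is easy to experiment with in Python, since functions can be passed around like any other value. This is just a sketch; compose, h, and omega are our own names, not course code:

Python
def compose(g, f):
    # returns the function g o f, i.e. the map a -> g(f(a))
    def g_of_f(a):
        return g(f(a))
    return g_of_f

def h(point):
    x, y = point
    return (x**2, x * y, y)

def omega(point):
    x, y, z = point
    return (x * y * z, z)

omega_h = compose(omega, h)
print(omega_h((2, 3)))   # (72, 3), matching (x^3 y^2, y) at x = 2, y = 3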


7 Higher-order functions

Sometimes functions act on functions. Functions from $\mathbb{R}^n \to \mathbb{R}^m$ are not the only useful kind of function. While such functions are our primary object of study in this multivariable calculus class, it will often be helpful to think about "functions of functions." The next examples might seem a bit peculiar, but later on in the course these kinds of mappings will become very important.

Question 1. Let $C[0,1]$ be the set of all continuous functions from $[0, 1]$ to $\mathbb{R}$. Define $I : C[0,1] \to \mathbb{R}$ by
$$I(f) = \int_0^1 f(x)\,dx.$$
If $g(x) = x^2$, what is $I(g)$?

Hint:
$$I(g) = \int_0^1 g(x)\,dx = \int_0^1 x^2\,dx = \left[\tfrac{1}{3}x^3\right]_0^1 = \tfrac{1}{3}(1 - 0) = \tfrac{1}{3}.$$

Solution: If $g(x) = x^2$, then $I(g) = 1/3$.

Question 2. Let $C^\infty(\mathbb{R})$ be the set of all infinitely differentiable ("smooth") functions on $\mathbb{R}$. Define $Q : C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R})$ by
$$Q(f)(x) = f(0) + f'(0)x + \frac{f''(0)}{2}x^2.$$
If $f(x) = \cos(x)$, what is $Q(f)(x)$?

Question 3. What is $f(0)$?
Hint: $f(0) = \cos(0) = 1$.
Solution: $f(0) = 1$.

Question 4. What is $f'(0)$?
Hint: $f'(x) = -\sin(x)$, so $f'(0) = -\sin(0) = 0$.
Solution: $f'(0) = 0$.

Question 5. What is $f''(0)$?
Hint: $f''(x) = -\cos(x)$, so $f''(0) = -\cos(0) = -1$.
Solution: $f''(0) = -1$.

Hint: So $Q(f)(x) = 1 - \frac{x^2}{2}$.

Solution: If $f(x) = \cos(x)$, then $Q(f)(x) = 1 - x^2/2$.

This is an example of a function which eats a function and spits out another function. In particular, this takes a function and returns the second order MacLaurin polynomial of that function.
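Here is a rough numerical sketch of such a higher-order function. The name maclaurin2 is ours, and the derivatives at $0$ are only approximated by finite differences, so this illustrates the idea rather than reproducing the course's $Q$ exactly:

Python
import math

def maclaurin2(f, h=1e-5):
    # returns the degree-2 Maclaurin polynomial of f,
    # approximating f'(0) and f''(0) by central differences
    f0 = f(0.0)
    fp0 = (f(h) - f(-h)) / (2 * h)
    fpp0 = (f(h) - 2 * f0 + f(-h)) / h**2
    def p(x):
        return f0 + fp0 * x + fpp0 / 2 * x**2
    return p

p = maclaurin2(math.cos)
print(p(0.5))   # close to 1 - 0.5**2 / 2 = 0.875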

Question 6. Define $\operatorname{dot}_n : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ by $\operatorname{dot}_n((x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n)) = x_1 y_1 + x_2 y_2 + x_3 y_3 + \cdots + x_n y_n$. What is $\operatorname{dot}_3((2, 4, 5), (0, 1, 4))$?

Hint: $\operatorname{dot}_3((2, 4, 5), (0, 1, 4)) = 2(0) + 4(1) + 5(4) = 24$.

Solution: $\operatorname{dot}_3((2, 4, 5), (0, 1, 4)) = 24$.


8 Currying

Higher-order functions provide a different perspective on functions that take many inputs.

Definition 1. Let $A$ and $B$ be two sets. The product $A \times B$ of the two sets is the set of all ordered pairs $A \times B = \{(a, b) : a \in A \text{ and } b \in B\}$.

Example 2. If $A = \{1, 2, \text{Wolf}\}$ and $B = \{4, 5\}$, then $A \times B = \{(1, 4), (1, 5), (2, 4), (2, 5), (\text{Wolf}, 4), (\text{Wolf}, 5)\}$.

Example 3. We write $\mathbb{R}^2$ for pairs of real numbers, but we could have written $\mathbb{R} \times \mathbb{R}$ instead.

Question 4. Let $\operatorname{Func}(\mathbb{R}, \mathbb{R})$ be the set of all functions from $\mathbb{R}$ to $\mathbb{R}$. Define $\operatorname{Eval} : \mathbb{R} \times \operatorname{Func}(\mathbb{R}, \mathbb{R}) \to \mathbb{R}$ by $\operatorname{Eval}(x, f) = f(x)$. If $g(x) = |x|$, what is $\operatorname{Eval}(-3, g)$?

Hint: $\operatorname{Eval}(-3, g) = g(-3) = |-3| = 3$.

Solution: If $g(x) = |x|$, then $\operatorname{Eval}(-3, g) = 3$.

Question 5. Let $\operatorname{Func}(A, B)$ be the set of all functions from $A$ to $B$, for any two sets $A$ and $B$. Let $\operatorname{Curry} : \operatorname{Func}(\mathbb{R}^2, \mathbb{R}) \to \operatorname{Func}(\mathbb{R}, \operatorname{Func}(\mathbb{R}, \mathbb{R}))$ be defined by $\operatorname{Curry}(f)(x)(y) = f(x, y)$. Let $h : \mathbb{R}^2 \to \mathbb{R}$ be defined by $h(x, y) = x^2 + xy$, and let $G = \operatorname{Curry}(h)(2)$. What is $G(3)$?

Hint: $G(3) = \operatorname{Curry}(h)(2)(3) = h(2, 3) = 2^2 + 2(3) = 10$.

Solution: $G(3) = 10$.

This wacky way of thinking is helpful when thinking about the λ-calculus1 . It also helps a lot if you ever want to learn to program in Haskell—which is one of the languages that Ximera was written in.

1 http://en.wikipedia.org/wiki/Lambda_calculus
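Python can mimic Curry directly, because a function may return another function. A small sketch with our own helper name curry (not from the course code):

Python
def curry(f):
    # Curry(f)(x)(y) = f(x, y)
    def outer(x):
        def inner(y):
            return f(x, y)
        return inner
    return outer

def h(x, y):
    return x**2 + x * y

G = curry(h)(2)
print(G(3))   # h(2, 3) = 4 + 6 = 10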


9 Python

Python provides a playground for multivariable functions. We can use Python to experiment a bit with multivariable functions.

Question 1. Model the function $f(x) = x^2$ as a Python function.

Warning 2. Python does not use ^ for exponentiation; it denotes this by **.

Hint: Try using return x**2.

Python
def f(x):
    return # your code here

def validator():
    return (f(4) == 16) and (f(-5) == 25)

Question: Model the function $g(x) = \begin{cases} -1 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0 \end{cases}$ as a Python function.

Hint: Try using an if.

Python
def g(x):
    # your code here
    return # the value of g(x)

def validator():
    return (g(0) == -1) and (g(-17) == -1) and (g(25) == 1)

Question: Model the function $h(x, y) = \begin{cases} x/(1 + y) & \text{if } y \neq -1 \\ 0 & \text{if } y = -1 \end{cases}$ as a Python function.

Python
def h(x,y):
    # your code here
    return # the value of h(x,y)

def validator():
    return (h(6,2) == 2) and (h(17,-1) == 0) and (h(-24,5) == -4)


10 Higher-order python

One nice feature of Python is that we can play with functions which act on functions.

Question 1. Here is an example of a higher order function horizontal_shift. It takes a function f of one variable, and a horizontal shift H, and returns the function whose graph is the same as f, only shifted horizontally by H units. Find a function f so that horizontal_shift(f,2) is the squaring function.

Python
def horizontal_shift(f,H):
    # first we define a new function shifted_f which is the appropriate shift of f
    def shifted_f(x):
        return f(x-H)
    # then we return that function
    return shifted_f

def f(x):
    return # a function so that horizontal_shift(f,2) is the squaring function

def validator():
    return (f(1) == 9) and (f(0) == 4) and (f(-3) == 1)

Write a function forward_difference which takes a function $f : \mathbb{R} \to \mathbb{R}$ and returns another real-valued function defined by forward_difference(f)(x) = f(x + 1) − f(x).

Python
def forward_difference(f):
    # Your code here

def validator():
    def f(x):
        return x**2
    def g(x):
        return x**3
    return (forward_difference(f)(3) == 7) and (forward_difference(g)(4) == 61)


11 Calculus

We can do some calculus with Python, too. Let's try doing some single-variable calculus with a bit of Python. Let epsilon be a small, but positive number. Suppose $f : \mathbb{R} \to \mathbb{R}$ has been coded as a Python function f which takes a real number and returns a real number. Seeing as
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},$$
can you find a Python function which approximates $f'(x)$? Given a Python function f which takes a real number and returns a real number, we can approximate $f'(x)$ by using epsilon. Write a Python function derivative which takes a function f and returns an approximation to its derivative.

Hint: To approximate this, use (f(x+epsilon) - f(x))/epsilon.

Python
epsilon = 0.0001

def derivative(f):
    def df(x):
        return (f(blah blah) - f(blah blah)) / blah blah
    return df

def validator():
    df = derivative(lambda x: 1+x**2+x**3)
    if abs(df(2) - 16) > 0.01:
        return False
    df = derivative(lambda x: (1+x)**4)
    if abs(df(-2.642) - -17.708405152) > 0.01:
        return False
    return True

This is great! In the future, we'll review this activity, and then extend it to a multivariable setting.
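For reference, here is one way the blanks might be filled in, directly following the hint's forward-difference formula (an approximation, not the exact derivative):

Python
epsilon = 0.0001

def derivative(f):
    # forward-difference approximation of f'
    def df(x):
        return (f(x + epsilon) - f(x)) / epsilon
    return df

df = derivative(lambda x: x**3)
print(df(2))   # close to 12, the exact value of d/dx x^3 at x = 2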


12 Linear maps

Linear maps respect addition and scalar multiplication. We begin by defining linear maps.

Definition 1. A function $L : \mathbb{R}^n \to \mathbb{R}^m$ is called a linear map if it "respects addition and scalar multiplication." Symbolically, for a map to be linear, we must have $L(\vec{v} + \vec{w}) = L(\vec{v}) + L(\vec{w})$ for all $\vec{v}, \vec{w} \in \mathbb{R}^n$, and also $L(a\vec{v}) = aL(\vec{v})$ for all $a \in \mathbb{R}$ and $\vec{v} \in \mathbb{R}^n$.

Definition 2. Linear algebra is the branch of mathematics concerning vector spaces and linear mappings between such spaces.

Question 3. Which of the following functions is linear?

(a) $f : \mathbb{R}^2 \to \mathbb{R}^1$ defined by $f\begin{bmatrix} x \\ y \end{bmatrix} = x + 2y$

(b) $h : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $h\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 17 \\ x \end{bmatrix}$

Hint: For a function to be linear, it must respect scalar multiplication. Let's see how $f\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$ compares to $5f\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and also how $h\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$ compares to $5h\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Question 4. What is $f\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$?

Hint: Remember $f$ is defined by $f\begin{bmatrix} x \\ y \end{bmatrix} = x + 2y$, so $f\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = f\begin{bmatrix} 5 \\ 5 \end{bmatrix} = 5 + 2(5) = 15$.

What is $f\begin{bmatrix} 1 \\ 1 \end{bmatrix}$?

Hint: $f\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1 + 2(1) = 3$.

Is $f\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = 5f\begin{bmatrix} 1 \\ 1 \end{bmatrix}$? Yes: $15 = 5 \cdot 3$.

Great! So $f$ has a chance of being linear, since it is respecting scalar multiplication in this case. What about $h$?

What is $h\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$?

Hint: Remember $h$ is defined by $h\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 17 \\ x \end{bmatrix}$, so $h\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = h\begin{bmatrix} 5 \\ 5 \end{bmatrix} = \begin{bmatrix} 17 \\ 5 \end{bmatrix}$.

What is $h\begin{bmatrix} 1 \\ 1 \end{bmatrix}$?

Hint: $h\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 17 \\ 1 \end{bmatrix}$.

Is $h\left(5\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = 5h\begin{bmatrix} 1 \\ 1 \end{bmatrix}$? No: $\begin{bmatrix} 17 \\ 5 \end{bmatrix} \neq \begin{bmatrix} 85 \\ 5 \end{bmatrix}$.

Great! So $h$ is not linear: by looking at this particular example, we can see that $h$ does not always respect scalar multiplication. Since we know one of the two functions is linear, we can already answer the question: the answer is $f$. To be thorough, let's check that $f$ really is linear.

First we check that $f$ really does respect scalar multiplication. Let $a \in \mathbb{R}$ be an arbitrary scalar and $\begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^2$ be an arbitrary vector. Then
$$f\left(a\begin{bmatrix} x \\ y \end{bmatrix}\right) = f\begin{bmatrix} ax \\ ay \end{bmatrix} = ax + 2ay = a(x + 2y) = af\begin{bmatrix} x \\ y \end{bmatrix}.$$

Now we check that $f$ really does respect vector addition. Let $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$ be arbitrary vectors in $\mathbb{R}^2$. Then
$$f\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}\right) = f\begin{bmatrix} x_1 + x_2 \\ y_1 + y_2 \end{bmatrix} = (x_1 + x_2) + 2(y_1 + y_2) = (x_1 + 2y_1) + (x_2 + 2y_2) = f\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + f\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}.$$

This proves that $f$ is linear! The answer is (a).

What about these two functions? Which of them is a linear map?

(a) $g : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $g\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ xy \end{bmatrix}$

(b) $h : \mathbb{R} \to \mathbb{R}^4$ defined by $h(x) = \begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix}$

Hint: For a function to be linear, it must respect vector addition. Let's see how $h(5 + 2)$ compares to $h(5) + h(2)$, and also how $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right)$ compares to $g\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + g\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}$.

Question 5. What is $h(5 + 2)$?

Hint: Remember $h$ is defined by $h(x) = \begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix}$, so $h(5 + 2) = h(7) = \begin{bmatrix} 7 \\ 7 \\ 7 \\ 28 \end{bmatrix}$.

What is $h(5) + h(2)$?

Hint: $h(5) + h(2) = \begin{bmatrix} 5 \\ 5 \\ 5 \\ 20 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \\ 2 \\ 8 \end{bmatrix} = \begin{bmatrix} 7 \\ 7 \\ 7 \\ 28 \end{bmatrix}$.

Is $h(5 + 2) = h(5) + h(2)$? Yes.

Great! So $h$ has a chance of being linear, since it is respecting vector addition in this case. What about $g$?

What is $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right)$?

Hint: Remember $g$ is defined by $g\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ xy \end{bmatrix}$, so $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right) = g\begin{bmatrix} 3 \\ 7 \\ 6 \end{bmatrix} = \begin{bmatrix} 3 \\ 3(7) \end{bmatrix} = \begin{bmatrix} 3 \\ 21 \end{bmatrix}$.

What is $g\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + g\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}$?

Hint: $g\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + g\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 2 \\ 2(3) \end{bmatrix} + \begin{bmatrix} 1 \\ 1(4) \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 10 \end{bmatrix}$.

Is $g\left(\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\right) = g\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + g\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}$? No: $\begin{bmatrix} 3 \\ 21 \end{bmatrix} \neq \begin{bmatrix} 3 \\ 10 \end{bmatrix}$.

Great! So $g$ is not linear: by looking at this particular example, we can see that $g$ does not always respect vector addition. Since we know one of the two functions is linear, we can already answer the question: the answer is $h$. To be thorough, let's check that $h$ really is linear.

First we check that $h$ really does respect scalar multiplication. Let $a \in \mathbb{R}$ be an arbitrary scalar and $x \in \mathbb{R}$ be an arbitrary vector. Then
$$h(ax) = \begin{bmatrix} ax \\ ax \\ ax \\ 4ax \end{bmatrix} = a\begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix} = ah(x).$$

Now we check that $h$ really does respect vector addition. Let $x$ and $y$ be arbitrary vectors in $\mathbb{R}^1$. Then
$$h(x + y) = \begin{bmatrix} x + y \\ x + y \\ x + y \\ 4(x + y) \end{bmatrix} = \begin{bmatrix} x \\ x \\ x \\ 4x \end{bmatrix} + \begin{bmatrix} y \\ y \\ y \\ 4y \end{bmatrix} = h(x) + h(y).$$

This proves that $h$ is linear! The answer is (b).

And finally, which of the following functions is linear?

(a) $G : \mathbb{R}^4 \to \mathbb{R}^3$ defined by $G\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} = \begin{bmatrix} e^{x+y} \\ x + z \\ \sin(x + t) \end{bmatrix}$

(b) $A : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $A\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

Hint: For a function to be linear, it must respect scalar multiplication. Let's see how $A\left(2\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right)$ compares to $2A\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, and also how $G\left(2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right)$ compares to $2G\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$.

Question 6. What is $A\left(2\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right)$?

Hint: Remember $A$ is defined by $A\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, so $A\left(2\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right) = A\begin{bmatrix} 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

What is $2A\begin{bmatrix} 2 \\ 3 \end{bmatrix}$?

Hint: $2A\begin{bmatrix} 2 \\ 3 \end{bmatrix} = 2\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

Is $A\left(2\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right) = 2A\begin{bmatrix} 2 \\ 3 \end{bmatrix}$? Yes.

Great! So $A$ has a chance of being linear, since it is respecting scalar multiplication in this case. What about $G$?

What is $G\left(2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right)$?

Hint: Remember $G$ is defined by $G\begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} = \begin{bmatrix} e^{x+y} \\ x + z \\ \sin(x + t) \end{bmatrix}$, so
$$G\left(2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right) = G\begin{bmatrix} 2 \\ 4 \\ 6 \\ 8 \end{bmatrix} = \begin{bmatrix} e^{2+4} \\ 2 + 6 \\ \sin(2 + 8) \end{bmatrix} = \begin{bmatrix} e^{6} \\ 8 \\ \sin(10) \end{bmatrix}.$$

What is $2G\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$?

Hint:
$$2G\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = 2\begin{bmatrix} e^{1+2} \\ 1 + 3 \\ \sin(1 + 4) \end{bmatrix} = 2\begin{bmatrix} e^{3} \\ 4 \\ \sin(5) \end{bmatrix} = \begin{bmatrix} 2e^{3} \\ 8 \\ 2\sin(5) \end{bmatrix}.$$

Is $G\left(2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}\right) = 2G\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$? No.

Great! So $G$ is not linear: by looking at this particular example, we can see that $G$ does not always respect scalar multiplication. Since we know one of the two functions is linear, we can already answer the question: the answer is $A$. To be thorough, let's check that $A$ really is linear.

First we check that $A$ really does respect scalar multiplication. Let $c \in \mathbb{R}$ be an arbitrary scalar and $\begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^2$ be an arbitrary vector. Then
$$A\left(c\begin{bmatrix} x \\ y \end{bmatrix}\right) = A\begin{bmatrix} cx \\ cy \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = c\begin{bmatrix} 0 \\ 0 \end{bmatrix} = cA\begin{bmatrix} x \\ y \end{bmatrix}.$$

Now we check that $A$ really does respect vector addition. Let $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$ be arbitrary vectors in $\mathbb{R}^2$. Then
$$A\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}\right) = A\begin{bmatrix} x_1 + x_2 \\ y_1 + y_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} = A\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + A\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}.$$

This proves that $A$ is linear! The answer is (b).

Warning 7. Note that the function which sends every vector to the zero vector is linear.
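Before the next questions, note that you can also probe linearity numerically. The sketch below (our own helper looks_linear, assuming nothing beyond the standard library) tests the two conditions at random inputs; a failure proves a map is not linear, while passing is only evidence, never a proof:

Python
import random

def looks_linear(L, n, trials=100, tol=1e-9):
    # test L(v + w) == L(v) + L(w) and L(a v) == a L(v) at random points
    for _ in range(trials):
        v = [random.uniform(-10, 10) for _ in range(n)]
        w = [random.uniform(-10, 10) for _ in range(n)]
        a = random.uniform(-10, 10)
        lhs_add = L([vi + wi for vi, wi in zip(v, w)])
        rhs_add = [x + y for x, y in zip(L(v), L(w))]
        lhs_scale = L([a * vi for vi in v])
        rhs_scale = [a * x for x in L(v)]
        if any(abs(x - y) > tol for x, y in zip(lhs_add, rhs_add)):
            return False
        if any(abs(x - y) > tol for x, y in zip(lhs_scale, rhs_scale)):
            return False
    return True

print(looks_linear(lambda v: [v[0] + 2 * v[1]], 2))   # f from Question 3: True
print(looks_linear(lambda v: [17, v[0]], 2))          # h from Question 3: False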

Question 8. Let $L : \mathbb{R}^3 \to \mathbb{R}^2$ be a linear function. Suppose $L\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$, $L\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -2 \\ 0 \end{bmatrix}$, and $L\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Let $\vec{v} = L\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}$. What is $\vec{v}$?

Hint: The only thing we know about linear maps is that they respect scalar multiplication and vector addition. So we need to somehow rewrite the vector $\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}$ in terms of the vectors $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, scalar multiplication, and vector addition, to exploit what we know about $L$.

Question 9. Can you rewrite $\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}$ in the form $a\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$?

Hint: Observe that $\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix} = 4\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + (-1)\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$.

Hint: Consider the coefficient on $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$. In this case, $a = 4$. Moreover, $b = -1$. Finally, $c = 2$.

Solution: $a = 4$, $b = -1$, $c = 2$.

Now using the linearity of $L$, we can see that
$$L\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix} = L\left(4\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + (-1)\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = 4L\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + (-1)L\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + 2L\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

Can you finish off the computation?

Hint:
$$L\begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix} = 4\begin{bmatrix} 3 \\ 4 \end{bmatrix} + (-1)\begin{bmatrix} -2 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 12 \\ 16 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ -2 \end{bmatrix} = \begin{bmatrix} 16 \\ 14 \end{bmatrix}.$$

Can you generalize this?

Hint: We need to somehow rewrite the vector $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in terms of the vectors $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, scalar multiplication, and vector addition, to exploit what we know about $L$.

Question 10. Can you rewrite $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in the form $a\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$?

Hint: $\begin{bmatrix} x \\ y \\ z \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$

Solution: $a = x$, $b = y$, $c = z$.

Hint: Now using the linearity of $L$, we can see that
$$L\begin{bmatrix} x \\ y \\ z \end{bmatrix} = L\left(x\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = xL\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + yL\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + zL\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

Can you finish off the computation?

Hint:
$$L\begin{bmatrix} x \\ y \\ z \end{bmatrix} = x\begin{bmatrix} 3 \\ 4 \end{bmatrix} + y\begin{bmatrix} -2 \\ 0 \end{bmatrix} + z\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 3x \\ 4x \end{bmatrix} + \begin{bmatrix} -2y \\ 0 \end{bmatrix} + \begin{bmatrix} z \\ -z \end{bmatrix} = \begin{bmatrix} 3x - 2y + z \\ 4x - z \end{bmatrix}.$$

As you have already discovered, a linear map $L : \mathbb{R}^n \to \mathbb{R}^m$ is fully determined by its action on the "standard basis vectors" $\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$, $\vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$, and so on, until we reach $\vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$.

Argue convincingly that if $L : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map and you know $L(\vec{e}_i)$ for $i = 1, 2, 3, \ldots, n$, then you could figure out $L(\vec{v})$ for any $\vec{v} \in \mathbb{R}^n$.

I want to determine what $L$ does to any vector $\vec{v} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$. I can rewrite $\vec{v}$ as $x_1\vec{e}_1 + x_2\vec{e}_2 + x_3\vec{e}_3 + \cdots + x_n\vec{e}_n$. By the linearity of $L$, $L(\vec{v}) = x_1 L(\vec{e}_1) + x_2 L(\vec{e}_2) + x_3 L(\vec{e}_3) + \cdots + x_n L(\vec{e}_n)$. Since I already know the value of $L(\vec{e}_i)$ for all $i = 1, 2, 3, \ldots, n$, this allows me to compute $L(\vec{v})$. So $L$ is completely determined once I know what it does to each of the standard basis vectors.

1 YouTube link: http://www.youtube.com/watch?v=8BFsz1FCdxM


13 Matrices

Matrices are a way to represent linear maps. To make writing a linear map a little less cumbersome, we will develop a compact notation for linear maps using our previous observation that a linear map is determined by its action on the standard basis vectors.

Definition 1. An $m \times n$ matrix is an array of numbers which has $m$ rows and $n$ columns. The numbers in a matrix are called entries. When $A$ is a matrix, we write $A = (a_{ij})$, meaning that $a_{i,j}$ is the entry in the $i$th row and $j$th column of the matrix. Note: We start counting with 1, not 0. So the upper lefthand entry of the matrix is $a_{1,1}$.

Question 2. The matrix $A = \begin{bmatrix} 1 & -1 \\ 2 & 4 \\ 3 & -5 \end{bmatrix}$ is an $n \times m$ matrix. What are $n$ and $m$?

Hint: Note that this is $n \times m$ whereas the definition above used $m \times n$. Here $n$ is the number of rows and $m$ is the number of columns, so $n = 3$ and $m = 2$.

Solution: In this case, $n$ is 3, and $m$ is 2.

Remember, we write $a_{i,j}$ for the entry in the $i$th row and $j$th column of the matrix.

Hint: $a_{3,2}$ is the entry in the 3rd row and the 2nd column, so $a_{3,2} = -5$.

Solution: Therefore $a_{3,2}$ is $-5$.

Next, suppose the $3 \times 4$ matrix $B$ has $b_{i,j} = i + j$.

Question 3. What is $B$?

Hint: $b_{1,2} = 1 + 2 = 3$. According to this rule, $b_{1,2}$ is 3, so the entry in the first row and second column of this matrix should be 3.

Hint:
$$B = \begin{bmatrix} 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \\ 4 & 5 & 6 & 7 \end{bmatrix}$$

Definition 4. To each linear map $L : \mathbb{R}^n \to \mathbb{R}^m$ we associate an $m \times n$ matrix $A_L$, called the matrix of the linear map with respect to the standard coordinates. It is defined by setting $a_{i,j}$ to be the $i$th component of $L(e_j)$. In other words, the $j$th column of the matrix $A_L$ is the vector $L(e_j)$. Going the other way, we likewise associate to each $m \times n$ matrix $M$ a linear map $L_M : \mathbb{R}^n \to \mathbb{R}^m$ by requiring that $L_M(e_j)$ be the $j$th column of the matrix $M$.

Question 5. The linear map $L : \mathbb{R}^2 \to \mathbb{R}^3$ satisfies $L\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ -5 \\ 2 \end{bmatrix}$ and $L\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$. What is the matrix of $L$?

Hint: Remember that, by definition, the first column of this matrix should be $L\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and the second column should be $L\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

Hint: The matrix of $L$ is
$$\begin{bmatrix} 3 & -1 \\ -5 & 1 \\ 2 & 1 \end{bmatrix}$$

Let's do another example.

Question 6. Suppose $L$ is a linear map represented by the matrix $A = \begin{bmatrix} 1 & -1 \\ 2 & 4 \\ 3 & -5 \end{bmatrix}$.

Hint: $A$ should have one column for each basis vector of the domain.

Hint: $A$ has 2 columns, so the dimension of the domain is 2.

Solution: The dimension of the domain of $L$ is 2.

Hint: Each column of $A$ is the image of a basis vector under the action of $L$.

Hint: Since the columns are of length 3, that means $L$ is spitting out vectors of length 3.

Hint: The codomain of $L$ is $\mathbb{R}^3$, which is 3-dimensional.

Solution: The dimension of the codomain of $L$ is 3.

Suppose $\vec{v} = L\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. What is $\vec{v}$?

Hint: Remember that, by definition, the $i$th column of $A$ is $L(\vec{e}_i)$.

Hint: So, by definition, $L\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ is the second column of the matrix $A$.

Hint: So $L\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 4 \\ -5 \end{bmatrix}$.

Suppose $\vec{w} = L\begin{bmatrix} 4 \\ 5 \end{bmatrix}$. What is $\vec{w}$?

Hint: By definition of the matrix associated to a linear map, we know that $L\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $L\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 4 \\ -5 \end{bmatrix}$.

Hint: Can you rewrite $\begin{bmatrix} 4 \\ 5 \end{bmatrix}$ in terms of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ so that you can use the linearity of $L$ to compute $L\begin{bmatrix} 4 \\ 5 \end{bmatrix}$?

Hint: $L\begin{bmatrix} 4 \\ 5 \end{bmatrix} = L\left(4\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 5\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$

Hint:
$$L\begin{bmatrix} 4 \\ 5 \end{bmatrix} = 4L\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 5L\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 4\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + 5\begin{bmatrix} -1 \\ 4 \\ -5 \end{bmatrix} = \begin{bmatrix} 4 \\ 8 \\ 12 \end{bmatrix} + \begin{bmatrix} -5 \\ 20 \\ -25 \end{bmatrix} = \begin{bmatrix} -1 \\ 28 \\ -13 \end{bmatrix}.$$

What is $L\begin{bmatrix} x \\ y \end{bmatrix}$?

Hint: By definition of the matrix associated to a linear map, we know that $L\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $L\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 4 \\ -5 \end{bmatrix}$.

Hint: Can you rewrite $\begin{bmatrix} x \\ y \end{bmatrix}$ in terms of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ so that you can use the linearity of $L$ to compute $L\begin{bmatrix} x \\ y \end{bmatrix}$?

Hint: $L\begin{bmatrix} x \\ y \end{bmatrix} = L\left(x\begin{bmatrix} 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$

Hint:
$$L\begin{bmatrix} x \\ y \end{bmatrix} = xL\begin{bmatrix} 1 \\ 0 \end{bmatrix} + yL\begin{bmatrix} 0 \\ 1 \end{bmatrix} = x\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + y\begin{bmatrix} -1 \\ 4 \\ -5 \end{bmatrix} = \begin{bmatrix} x \\ 2x \\ 3x \end{bmatrix} + \begin{bmatrix} -y \\ 4y \\ -5y \end{bmatrix} = \begin{bmatrix} x - y \\ 2x + 4y \\ 3x - 5y \end{bmatrix}.$$

As an antidote to the abstraction, let's take a look at a simplistic "real world" example.

Question 7. In the local barter economy, there is an exchange where you can

- trade 1 spoon for 2 apples and 1 orange,
- trade 1 knife for 2 oranges, and
- trade 1 fork for 3 apples and 4 oranges.

Model this as a linear map $L : \mathbb{R}^3 \to \mathbb{R}^2$, where the coordinates on $\mathbb{R}^3$ are $\begin{bmatrix} \text{spoons} \\ \text{knives} \\ \text{forks} \end{bmatrix}$ and the coordinates on $\mathbb{R}^2$ are $\begin{bmatrix} \text{apples} \\ \text{oranges} \end{bmatrix}$. What is the matrix of the linear map $L$?

Hint: Remember that the matrix of a linear map is defined by the fact that the $k$th column of the matrix is the image of the $k$th standard basis vector.

Hint: $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ represents one spoon in the domain. Its image under this linear map is 2 apples and 1 orange, which is represented by the vector $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ in the codomain. So the first column of the matrix should be $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$.

Hint: The full matrix is
$$\begin{bmatrix} 2 & 0 & 3 \\ 1 & 2 & 4 \end{bmatrix}$$

Try to answer the next question both by applying the matrix to the vector, but also as a 5 year old would solve it. What is $L\begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix}$?

Hint:
$$L\begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix} = L\left(3\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 4\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = 3L\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 4L\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = 3\begin{bmatrix} 2 \\ 1 \end{bmatrix} + 4\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ 3 \end{bmatrix} + \begin{bmatrix} 12 \\ 16 \end{bmatrix} = \begin{bmatrix} 18 \\ 19 \end{bmatrix}$$

So you would be able to get 18 apples and 19 oranges.

Hint: Now the "5 year old" solution: if you have 3 spoons, 0 knives, and 4 forks, and you traded them all in for fruit, how many apples would you have?

Hint: 3 spoons would get you 6 apples, and 4 forks get you 12 apples, so you would have a total of 18 apples. The first ("apples") entry of $L\begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix}$ is 18.

13

Matrices

Prove the following statement: if S : Rn → Rm and T : Rn → Rm are both linear maps, then the map (S + T ) : Rn → Rm defined by (S + T )(~v ) = S(~v ) + T (~v ) is also linear. We need to check that (S + T ) respects both scalar multiplication and vector addition. Scalar multiplication: Choose and arbitrary scalar c ∈ R and an arbitrary vector ~v ∈ Rn . Then (S + T )(c~v ) = S(c~v ) + T (c~v ) by definition of (S + T ) = cS(~v ) + cT (~v ) by the linearity of S and T = c (S(~v ) + T (~v )) by the distributivity of scalar multiplication over addition in Rm = c(S + T )(~v ) by definition of (S + T ) Vector addition: Choose two arbitrary vectors ~v and w ~ in Rn . Then (S + T )(~v + w) ~ = S(~v + w) ~ + T (~v + w) ~ by definition of S + T = S(~v ) + S(w) ~ + T (~v ) + T (w) ~ by the linearity of S and T = S(~v ) + T (~v ) + S(w) ~ + T (w) ~ by the commutativity of vector addition in Rm = (S + T )(~v ) + (S + T )(w) ~ by the definition of S + T. Prove that if T : Rn → Rm is a linear map and c ∈ R is a scalar, then the map cT : Rn → Rm , defined by (cT )(~v ) = cT (~v ) is also a linear map. We need to check that cT respects both scalar multiplication and vector addition. Scalar multiplication: Choose and arbitrary scalar a ∈ R and an arbitrary vector ~v ∈ Rn . Then (cT )(a~v ) = cT (a~v ) = acT (~v ) = a(cT )(~v ) Vector addition: Choose two arbitrary vectors ~v and w ~ in Rn . Then (cT )(~v + w) ~ = cT (~v + w) ~ = c (T (~v ) + T (w)) ~ = cT (~v ) + cT (w) ~ = (cT )(~v ) + (cT )(w) ~ Observation 8 The last two exercises show that we have a nice way to both add linear maps and multiply linear maps by scalars. So linear maps themselves “feel” a bit like vectors. You do not have to worry about this now, but we will see that the linear maps from Rn → Rm form an “abstract vector space.” Much of the power of linear algebra is that we can apply linear algebra to spaces of linear maps!


14 Composition

The composition of linear maps can be computed with matrices. Prove that if S : Rn → Rm is a linear map, and T : Rm → Rk is a linear map, then the composite function T ◦ S : Rn → Rk is also linear. We need to show that T ◦ S respects scalar multiplication and vector addition: Scalar multiplication: For every scalar a ∈ R and every vector ~v ∈ Rn , we have: (T ◦ S)(a~v ) = T (S(a~v )) = T (aS(~v )) because S respects scalar multiplication = aT (S(~v )) because T respects scalar multiplication = a(T ◦ S)(~v ) Vector addition: For every two vectors ~v , w ~ ∈ Rn , we have: (T ◦ S)(~v + w) ~ = T (S(~v + w)) ~ = T (S(~v + S(w)))because ~ S respects vector addition = T (S(~v )) + T (S(w))because ~ T respects vector addition = (T ◦ S)(~v ) + (T ◦ S)(w) ~ 

2 Question 1 Suppose the matrix of S is MS = −1   −1 −1 2 . T is MT =  0 −1 1

 0 −1 and the matrix of 1 1

Solution Remember that the matrix for S ◦ T will have columns given by (S ◦ T )   0 and (S ◦ T ) 1 Hint:

Hint:

Question 2

  1 0

Solution

Hint:      1 1 (S ◦ T ) =S T 0 0      −1 1 is the first column of the matrix of T = S  0  because by definition, T 0 −1      1 0 = −1S 0 + −1S 0 by the linearity of S 0 1     2 −1 = −1 + −1 because ??? −1 1   −1 = 0

40

14

What is (S ◦ T ) Question 3

Composition

  1 ? 0

Solution

Hint:      0 0 =S T (S ◦ T ) 1 1      −1 0 is the second column of the matrix of T = S  2  because by definition, T 1 1         0 0 1 = −1S 0 + 2S 1 + S 0 by the linearity of S 1 0 0       2 0 −1 = −1 +2 + because ??? −1 1 1   −3 = 4   0 What is (S ◦ T ) ? 1  Hint:

The matrix of (S ◦ T ) is

−1 0

−3 4



What is the matrix of S ◦ T ? Solution   1 Hint: Remember that the matrix for T ◦S will have columns given by (T ◦S) 0, 0     0 0 (T ◦ S) 1 and (T ◦ S) 0 0 1 Hint:

Question 4

Solution

Hint:      1 1 (T ◦ S) 0 = T S 0 0 0      1 2 =T because by definition, S 0 is the first column of the matrix of S −1 0     1 0 = 2T + −1T by the linearity of T 0 1     −1 −1 = 2  0  + −1  2  because ??? −1 1   −1 = −2 −3

41

14

Composition

  1 What is (T ◦ S) 0? 0 Question 5

Solution

Hint:      0 0 (T ◦ S) 1 = T S 1 0 0     1 0 because by definition, S 0 is the first column of the matrix of S =T 1 0     −1 0 =  2  we got lucky: by definition T is the second column of the matrix of T 1 1

  0 What is (T ◦ S) 1? 0 Question 6

Solution

Hint:      0 0 (T ◦ S) 0 = T S 0 1 1      0 −1 =T because by definition, S 0 is the third column of the matrix of S 1 1     1 0 = −1T +T by the linearity of T 0 1     −1 −1 = −1  0  +  2  because ??? −1 1   0 =  2 2   0 What is (T ◦ S) 0? 1



Hint:

−1 The matrix of (T ◦ S) is −2 −3

What is the matrix of T ◦ S?

42

−1 2 1

 0 2 2

14

Composition

Definition 7. If $M$ is an $m \times n$ matrix and $N$ is a $k \times m$ matrix, then the product $NM$ of the matrices is defined as the matrix of the composition of the linear maps defined by $M$ and $N$. In other words, $NM$ is the matrix of $L_N \circ L_M$.

Warning 8. You may have seen another definition for matrix multiplication in the past. That definition could be seen as a shortcut for how to compute the product, but it is usually presented devoid of mathematical meaning. Hopefully our definition seems properly motivated: matrix multiplication is just what you do to compose linear maps. We suggest working out the problems here using our definition: you will develop your own efficient shortcuts in time.

You have already multiplied two matrices, even though you didn't know it, above. Take some time now to get a whole lot of practice. You do not need us to prompt you: invent your own matrices and try to multiply them, on paper. What condition is needed on the rows and columns of the two matrices for matrix multiplication to even make sense? You can check your work using a computer algebra system, like SAGE1, or you can use a free web-hosted app like Reshish2. Use our definition, and think through it each time. Try to get faster and more efficient. Eventually you should be able to do this quite rapidly.

Question 9. Suppose $B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Find a $2 \times 2$ matrix $A$ so that $AB \neq BA$. Play around! Can you find more than one?

Hint: There is no systematic way to answer this question: you just have to play around, and see what you discover!
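A numerical sanity check of Definition 7, assuming NumPy is available: composing the two maps agrees with applying the product matrix.

Python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
N = np.array([[0, 1],
              [1, 0]])
v = np.array([5, -2])

print(N @ (M @ v))   # apply M, then N: [7 1]
print((N @ M) @ v)   # apply the product matrix N M: [7 1]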

Question 10 Solution      1 2 1 0 1 0 Hint: = 3 4 0 0 3 0    1 2 1 0 What is ? 3 4 0 0 Question 11 Solution     1 0 1 2 1 Hint: = 0 0 3 4 0    1 0 1 2 What is ? 0 0 3 4

2 0



A matrix that doesn’t commute with B is 1 2

http://www.sagemath.org/ http://matrix.reshish.com/

43

14

Composition

Question 12 Hint:

Solution

Try some simple matrices. Maybe limit yourself to 2 × 2 matrices?

    y x . Applying this = 0 y twice to any vector would give you the zero vector. This linear map is great for cooking up counterexamples to all sorts of naive things you might think about matrices! See this Mathoverflow answer3 (you will understand more and more of these terms as the course progresses). Hint:

One simple linear map which would work is L



 0 1 0 0 What is the matrix of the example linear map L?

Question 13

Hint:

The matrix of L is

Find A 6= 0 with AA = 0. (Note: such a matrix is called “nilpotent”)

 2 If A = 3

Question 14

 8 , find v 6= 0 with Av = ~0. 12

Solution Hint:

Let ~v =

  x , and solve a system of equations y

Hint:



A(~v ) = ~0     2 8 x 0 = 3 12 y 0     2x + 8y 0 = 3x + 12y 0

Hint: Both of these conditions (2x + 8y = 0 and 3x + 12y = 0) are saying the same thing: x = −4y.  Hint:

So

Question 15

 −4 works, for example. 1

 1 If A = 2

   3 0 , find ~v with A~v = . 4 8

Solution 3 http://mathoverflow.net/questions/16829/what-are-your-favorite-instructional-counterexamples/ 16841#16841

44

14

Hint:

Let ~v =

Composition

  x and solve a system of equations. y

Hint:   0 A~v = 8      1 3 x 0 = 2 4 y 8     x + 3y 0 = 2x + 4y 8

Hint: (

x + 3y = 0 2x + 4y = 8

(

x + 3y = 0 x + 2y = 4

(

x + 3y = 0 y = −4

(

x = 12 y = −4

In the last two exercises, you found that solving matrix equations is equivalent to solving systems of linear equations.   (   x 4x + 7y + z = 3 3   Question 16 Rewrite as A y = . 2 −x + 8y − z = 2 z Solution  Hint:

A=

4 −1

7 8

1 −1




15 Python

Build up some linear algebra in python.   1 Exercise 1 We will store a vector as a list. So the vector 2 will be stored as 3 [1,2,3]. Let’s try to write some Python code for working with lists as if they were vectors. Solution Hint:

This was discussed on http: // stackoverflow. com/ questions/ 14050824/ add-sum-of-values-of-two-lists

Write a “vector add” function. Your function may assume that the two vectors have the same number of entries. 1 2 3 4 5

Python # write a function vector_sum(v,w) which takes two vectors v and w, # and returns the sum v + w. # # For example, vector_sum([1,2], [4,1]) equals [5,3] #

6 7 8 9

def vector_sum(v,w): # your code here return # the sum v+w

10 11 12 13 14 15 16 17

def validator(): # It would be better to try more cases if vector_sum([-5,23],[10,2])[0] != 5: return False if vector_sum([1,5,6],[2,3,6])[1] != 8: return False return True

18

Solution Hint:

Try a Python “list comprehension”

Hint:

For example, return [alpha * x for x in v]

Next, write a scalar multiplication function. 1 2 3 4

Python # write a function scale_vector(alpha, v) which takes a number alpha and a vector v # and returns alpha * v # # For example, scale_vector(5,[1,2,3]) equals [5,10,15]

5 6 7 8

def scale_vector(alpha, v): # your code here return # the scaled vector alpha * v

46

15

Python

9 10 11 12 13 14 15 16

def validator(): # It would be better to try more cases if scale_vector(-3,[2,3,10])[1] != -9: return False if scale_vector(10,[4,3,2,1])[2] != 20: return False return True

17

Let’s write a dot product function. Solution 1 2 3 4

Python # Write a function dot_product(v,w) which takes two vectors v and w, # and returns the dot product of v and w. # # For example, dot_product([1,2],[0,3]) is 6.

5 6 7 8

def dot_product(v,w): # your code here return # the dot product "v dot w"

9 10 11 12 13 14 15

def validator(): if dot_product([1,2],[-3,5]) != 7: return False if dot_product([0,4,2],[2,3,-7]) != -2: return False return True

And we will store a matrix as a list of lists. For example, the list [[1,3,5],[2,4,6]] will represent the matrix
$$\begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}.$$
Note that there are two different conventions that we could have chosen: the innermost lists could be the rows, or the columns. There are good reasons to have chosen the opposite convention: after all, when thinking of a matrix as a linear map, we should be paying attention to the columns, since the $i$th column tells us what the corresponding linear map does when applied to $\vec{e}_i$. Nevertheless, the innermost lists are rows in our chosen representation. This way, to talk about the entry $m_{ij}$, we write m[i][j]. Had we made the other choice, the $m_{ij}$ entry would have been accessed by writing j and i in the other order. This is also the same convention used by the computer algebra system, Sage.

Exercise 2.

Write a “matrix multiplication” function.

Solution

47

15

1 2

Python

Python # write a function multiply(A,B) which takes two matrices A and B stored in the above format, # and returns the matrix of their product

3 4 5 6

def multiply(A,B): # your code here return # the product AB

7 8 9 10 11 12 13 14 15 16 17 18 19

def validator(): # It would be better to try more cases a = [[-2, 0], [-2, -3], [-1, 3]] b = [[-3, 2, -1, -2], [3, 2, 1, 3]] result = multiply(a,b) if (len(result) != 3): return False if (len(result[0]) != 4): return False if (result[2][1] != 4): return False return True

Fantastic! Next, let’s think more about how matrices and linear maps are related. Solution Hint: Warning 3 Hint:

This is a function whose output is a function.

Try using lambda.

Write a function matrix_to_function which takes a matrix ML representing the linear map L, and returns a Python function. The returned Python function should take a vector ~v and send it to L(~v ). 1

Python # For example, if M = [[1,2],[3,4]], then matrix_to_function(M)([0,1]) should be [2,4]

2 3 4 5

def matrix_to_function(M): #your code here return # the function which sends v to M(v)

6 7 8 9 10 11 12

def validator(): if matrix_to_function([[-3,2,4],[5,-7,2]])([5,3,2])[0] != -1: return False if matrix_to_function([[4,3],[2,-1],[-5,3]])([2,-4])[2] != -22: return False return True

Now you can go back and check—for some examples of A, B, and ~v —that the following is true: matrix_to_function(A)(matrix_to_function(B)(v)) is the same as matrix_to_function(multiply(A,B))(v). 48

15

Python

Solution Now let’s go the other way. Write a function function_to_matrix which takes a Python function f—assumed to be a linear map from R2 to R2 —and returns the 2 × 2 matrix representing that linear map. Python 1 2 3 4 5 6

# For example if you had defined # # def L(v): # return [2*v[0]+3*v[1], -4*v[0]] # # Then function_to_matrix(L) is

7 8 9

# You may assume that L takes [x,y] to another list with two entries # and you may assume that L is linear

10 11 12 13

def function_to_matrix(L): #your code here return # the matrix

14 15 16 17 18 19 20 21 22 23 24 25

def validator(): M = function_to_matrix( lambda v: [3*v[0]+5*v[1], -2*v[0] + 4*v[1]] ) if (M[0][0] != 3): return False M = function_to_matrix( lambda v: [2*v[0]-3*v[1], -7*v[0] - 5*v[1]] ) if (M[1][0] != -7): return False M = function_to_matrix( lambda v: [v[0]+7*v[1], 3*v[0] - 2*v[1]] ) if (M[1][1] != -2): return False return True

Great work! If you like, you can try to compute function_to_matrix(matrix_to_function(M)). You should get back M .


16 An inner product space

The dot product provides a way to compute lengths and angles. In order to do geometry in Rn , we will want to be able to compute the length of a vector, and the angle between two vectors. Miraculously, a single operation will allow us to compute both quantities.


17 Covectors

A covector eats vectors and provides numbers.

Definition 1. A covector on $\mathbb{R}^n$ is a linear map from $\mathbb{R}^n \to \mathbb{R}$. As a matrix, it is a single row of length $n$.

Example 2. $\begin{bmatrix} 2 & -1 & 3 \end{bmatrix}$ is the matrix of a covector on $\mathbb{R}^3$.

Question 3.

Solution     3 2 −1 3 5 = 2(3) + −1(5) + 3(7) = 22 7    3 3 5 =22 7

Hint:

 2

−1

Now we can do this a bit more abstractly. 

Hint:



x

y

x

y

   a z  b  = ax + by + cz c

   a z  b  = ax + by + cz c

There is a natural way to turn a vector into a covector, or a covector into a vector: just turn the matrix 90° one direction or the other!

Definition 4. We define the transpose of a vector $\vec{v} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ to be the covector $\vec{v}^{\top}$ with matrix $\begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$. Similarly, we define the transpose of a covector $\omega = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$ to be the vector $\omega^{\top}$ with matrix $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$.

Question 5. Suppose $\vec{v} = \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}$. What is $(\vec{v}^{\top})^{\top}$?

Question 5

  1 Suppose ~v = 4. What is (~v > )> ? 3

Solution (a)

  1 (~v > )> = 4 3

X

51

···

xn



to be

17

Covectors  (~v > )> = 1

(b)

4

3



> > > > Indeed, (~ v ) = ~v and(ω ) = ω for any vector ~v and covector ω. 5 2 Let v = 3 and w = −2 1 7

Solution  2 1 −2 = 5(2) + 3(−2) + 1(7) = 11 7 

Hint:

>



v (w) = 5

3



v > (w) = 11? Solution Hint: 

 2   w(v ) = −2 5 3 1 7   10 6 2 = −10 −6 −2 35 21 7 >

What is wv > ?

52

18 Dot product

The standard inner product is the dot product.

Definition 1. Given two vectors $\vec{v}, \vec{w} \in \mathbb{R}^n$, we define their standard inner product $\langle \vec{v}, \vec{w} \rangle$ by $\langle \vec{v}, \vec{w} \rangle = \vec{v}^{\top}(\vec{w}) \in \mathbb{R}$. We sometimes use the notation $\vec{v} \cdot \vec{w}$ for $\langle \vec{v}, \vec{w} \rangle$, and call the operation the dot product.

Warning 2. Note that $\vec{v}^{\top}(\vec{w}) \neq \vec{w}(\vec{v}^{\top})$: one is a number, while the other is an $n \times n$ matrix.

Question 3.

Make sure for yourself, by using the definition, that     y1 x1  x2   y2       ..  ·  ..  = x1 y1 + x2 y2 + x3 y3 + · · · + xn yn .  .   .  yn

xn

Prove the following facts about the dot product. ~u, ~v , w ~ ∈ Rn and a ∈ R (a)

~v · w ~ =w ~ · ~v (The dot product is commutative)

(b)

(~u + ~v ) · w ~ = ~u · w ~ + ~v · w ~ and (a~v ) · w ~ = a(~v · w) ~ (The dot product is linear in the first argument)

(c)

~u · (~v + w) ~ = ~u · ~v + ~u · w ~ and ~v · (aw) ~ = a(~v · w) ~ (The dot product is linear in the second argument)

(d)

~v · ~v ≥ 0 (We say that the dot product is “positive definite”)

(e)

if ~v · ~z = 0 for all ~z ∈ Rn , then ~v = ~0 (The dot product is nondegenerate)

1. ~v · w ~ = v1 w1 + v2 w2 + ... + vn wn = w1 v1 + w2 v2 + ... + wn vn = w · v, so the dot product is commutative. (skipping item 2 for now) 3. ~u · (v + w) ~ = ~u> (v + w) ~ by definition = ~u> (v) + ~u> (w) ~ since ~u> : Rn → R is linear = ~u · v + ~u · w ~ by definition and ~u · (aw) ~ = ~u> (aw) ~ by definition = a~u> (w) ~ since ~u> : Rn → R is linear = a~u · w ~ by definition 53

18

Dot product

2. follows from 3 and 1 4. ~v · ~v = v12 + v22 + v32 + ... + vn2 , and the square of a real number is nonnegative, so the sum of these squares is also nonnegative. 5. is perhaps the trickiest fact to prove. Observe that if ~v · ~z = 0 for every ~z ∈ Rn , then this formula is true in particular for z = ~ej . But ~v · ~ej = vj . Thus, by dotting with all of the standard basis vectors, we see that every coordinate of ~v must be 0. Thus ~v is the zero vector The fact that the dot product is linear in two separate vector variables means that it is an example of a “bilinear form”. We will make a careful study of bilinear forms later in this course: it will turn out that the second derivative of a multivariable function gives a bilinear form at each point. So far, the inner product feels like it belongs to the realm of pure algebra. In the next few exercises, we will start to see some hints of its geometric meaning.   5 Question 4 Let v = . 1 Solution Hint:

h~v , ~v i = 52 + 12 = 26

h~v , ~v i = 26

  x Let’s think about this a bit more abstractly. Set v = . y Solution Hint:

h~v , ~v i = x2 + y 2

h~v , ~v i = x2 + y 2

Notice that the length of the line segment from (0, 0) to (x, y) is the Pythagorean theorem.


p x2 + y 2 by

19

Length

The inner product provides a way to measure the length of a vector. You should have discovered that v · v is the square of the length of the vector v when viewed as an arrow based at the origin. So far, you have only shown this in the 2-dimensional case. See if you can do it in three dimensions. √ Show   that the length of the line segment from (0, 0, 0) to (x, y, z) is ~v · ~v , where x ~v = y . z Until now, you may not have seen a treatment of length in higher dimensions. Generalizing the results above, we define: √ Definition 1 The length of a vector ~v ∈ Rn is defined by |v| = v · v.

Question 2

Solution

Question 3

Solution

  6 2  The length of the vector   = sqrt(62 + 22 + 32 + 12 ) 3 1

Hint:

By the Pythagorean theorem, we can see that the distance is

Hint:

We could also view this as the length of the vector

p (5 − 2)2 + (9 − 3)2

  3 which “points” from 6

(2, 3) to (5, 9). The distance between the points (2, 3) and (5, 9) is sqrt(32 + 62 )

Definition 4 The distance between two points p and q in Rn is defined to be the length of the “displacement” vector p~ − ~q. Question 5

Solution

Hint:

    5−2 3 6 − 7 1    The displacement vector between these points is  =  9 − 3 6 8−1 7

Hint:

The length of the displacement vector is

p 32 + 1 2 + 6 2 + 7 2

The distance between the points (2, 7, 3, 1) and (5, 6, 9, 8) is sqrt(32 + 1 + 62 + 72 )

Question 6 Write an equation for the sphere centered at (0, 0, 0, 0) in R4 of radius r using the coordinates x, y, z, w on R4 . Solution

55

19

Length

Hint: For a point p = (x, y, z, w) to be on the sphere of radius r centered at (0, 0, 0, 0), the distance from p to the origin must be r p

x2 + y 2 + z 2 + w2

Hint:

r=

Hint:

x2 + y 2 + z 2 + w 2 = r 2

x2 + y 2 + z 2 + w 2 = r 2

Question 7 Write an inequality stating that the point (x, y, z, w) is more than 4 units away from the point (2, 3, 1, 9) Solution Hint:

The distance between the point (x, y, z, w) and (2, 3, 1, 9) is

Hint:

So we need

p

p (x − 2)2 + (y − 3)2 + (z − 1)2 + (w − 9)2 .

(x − 2)2 + (y − 3)2 + (z − 1)2 + (w − 9)2 > 4

sqrt((x − 2)2 + (y − 3)2 + (z − 1)2 + (w − 9)2 ) > 4

Prove that |a~v | = |a||~v | for every a ∈ R. Warning 8 These two uses of | · | are distinct: |a| means the absolute value of a, and |~v | is the length of ~v .

|a~v | =

p

ha~v , a~v i by definition

p

a2 h~v , ~v i by the linearity of the inner product in each slot √ p = a2 h~v , ~v i =

= |a||~v |


20 Angles

Dot products can be used to compute angles. Question 1 Give a vector of length 1 which points in the same direction as ~v =   1 (i.e. is a positive multiple of ~v ). 2 Solution Hint: Remember that you just argued that |a~v | = |a|~v for any a ∈ R. What positive a could you choose to make |a||~v | = 1?

1 |~v |

Hint:

We need to take a =

Hint:

The length of ~v is

Hint:

 1  √   The vector  25  points in the same direction as ~v , but has length 1. √ 5

p √ 12 + 2 2 = 5

Now that we understand the relationship between the inner product and length of vectors, we will attempt to establish a connection between the inner product and the angle between two vectors. Do you remember the law of cosines? It states the following: Theorem 2 If a triangle has side lengths a, b, and c, then c2 = a2 +b2 −2ab cos(θ), where θ is the angle opposite the side with length c. Prove the law of cosines. You may want to read the lovely proof at mathproofs1 . You can find a beautiful proof here2 . We can rephrase this in terms of vectors, since geometrically if ~v and w ~ are vectors, the third side of the triangle is the vector w ~ − ~v . Theorem 3 For any two vectors v, w ∈ Rn , |w − v|2 = |w|2 + |v|2 − 2|v||w| cos(θ), where θ is the angle between v and w. (For you sticklers, this is really being taken as the definition of the angle between two vectors in arbitrary dimension.) Rewrite the theorem above by using our definition of length in terms of the dot product. Performing some algebra you should obtain a nice expression for v · w in terms of |v|, |w|, and cos(θ). 1 2

http://mathproofs.blogspot.com/2006/06/law-of-cosines.html http://mathproofs.blogspot.com/2006/06/law-of-cosines.html

57

20

Angles

|w − v|2 = |v|2 + |w|2 − 2|v||w| cos(θ) hw − v, w − vi = |v|2 + |w|2 − 2|v||w| cos(θ)

hw, w − vi − hv, w − vi = |v|2 + |w|2 − 2|v||w| cos(θ) by the linearity of the inner product in the first slot

hw, wi − hw, v − hv, wi + hv, vi = |v|2 + |w|2 − 2|v||w| cos(θ) by the linearity of the inner product in the second s |w|2 − 2hv, wi + |v|2 = |v|2 + |w|2 − 2|v||w| cos(θ) hv, wi = |v||w| cos(θ) You should have discovered the following theorem: Theorem 4 For any two vectors v, w ∈ Rn , v · w = |v||w| cos(θ). In words, the dot product of two vectors is the product of the lengths of the two vectors, times the cosine of the angle between them. This gives an almost totally geometric picture of the dot product: Given two vectors ~v and w, ~ |~v cos(θ)| can be viewed as the length of the projection of ~v onto the line containing w. ~ So |~v ||w| ~ cos(θ) is the “length of the projection of ~v in the direction of w ~ times the length of w”. ~ As mentioned above, this theorem is really being used to define the angle bev·w tween two vectors. This is not quite rigorous: how do we even know that is |v||w| even between −1 and 1, so that it could be the cosine of an angle? This is clear from the “Euclidean Geometry” perspective, but not as clear from the “Cartesian Geometry” perspective. To make sure that everything is okay, we prove the “Cauchy-Schwarz” theorem which reconciles these two worlds.


21

Cauchy-Schwarz

The Cauchy-Schwarz inequality relates the inner product of two vectors to their lengths.

Theorem 1

|v · w| ≤ |v||w| for any two vectors v, w ∈ Rn

Proof If $\vec v$ or $\vec w$ is the zero vector, the result is trivial. So assume $\vec v \neq \vec 0$ and $\vec w \neq \vec 0$.

Start by noting that $\langle v - w, v - w\rangle \geq 0$. Expanding this out, we have:
$\langle v, v\rangle - 2\langle v, w\rangle + \langle w, w\rangle \geq 0$
$2\langle v, w\rangle \leq \langle v, v\rangle + \langle w, w\rangle$
Now, if $\vec v$ and $\vec w$ are unit vectors, this says that
$2\langle \vec v, \vec w\rangle \leq 2$
$\langle \vec v, \vec w\rangle \leq 1$
Now to prove the result for any pair of nonzero vectors, simply scale them to make them unit vectors:
$\left\langle \tfrac{1}{|\vec v|}\vec v, \tfrac{1}{|\vec w|}\vec w\right\rangle \leq 1$
$\langle v, w\rangle \leq |v||w|$

We are not quite done with the proof, because we have not proven that $v \cdot w \geq -|v||w|$. Following the same basic outline, try to prove the other half of this inequality below.

Start by noting that $\langle v + w, v + w\rangle \geq 0$. Expanding this out, we have:
$\langle v, v\rangle + 2\langle v, w\rangle + \langle w, w\rangle \geq 0$
$2\langle v, w\rangle \geq -\langle v, v\rangle - \langle w, w\rangle$
Now, if $\vec v$ and $\vec w$ are unit vectors, this says that
$2\langle \vec v, \vec w\rangle \geq -2$
$\langle \vec v, \vec w\rangle \geq -1$
Now to prove the result for any pair of nonzero vectors, simply scale them to make them unit vectors:
$\left\langle \tfrac{1}{|\vec v|}\vec v, \tfrac{1}{|\vec w|}\vec w\right\rangle \geq -1$
$\langle v, w\rangle \geq -|v||w|$

In the next question, we ask you to fill in the details of an alternative proof which, while a little harder than the one above, is at least as beautiful.


Question 2

Start by noting that $\langle v - w, v - w\rangle \geq 0$. Expanding this out, we have:
$\langle v, v\rangle - 2\langle v, w\rangle + \langle w, w\rangle \geq 0$
$2\langle v, w\rangle \leq \langle v, v\rangle + \langle w, w\rangle$

Now notice that the left hand side is unaffected by scaling v by a scalar λ and w by $\frac{1}{\lambda}$, but the right hand side is! This allows us to breathe new life into the inequality: we know that for every scalar λ ∈ (0, ∞),
$2\langle v, w\rangle \leq \lambda^2|v|^2 + \frac{1}{\lambda^2}|w|^2$
This is somewhat miraculous: we have a stronger inequality than the one we started with "for free." This new inequality is strongest when the right hand side (RHS) is minimized. As it stands, the RHS is just a function of one real variable λ.

Solution

Hint: We can minimize the right hand side using single variable calculus.

Hint: Let $f(\lambda) = \lambda^2|v|^2 + \frac{1}{\lambda^2}|w|^2$. Then $f'(\lambda) = 2\lambda|v|^2 - 2\frac{|w|^2}{\lambda^3}$.

Hint: The minimum must occur where $f'$ vanishes:
$f'(\lambda) = 0$
$2\lambda|v|^2 - 2\frac{|w|^2}{\lambda^3} = 0$
$\lambda^4|v|^2 = |w|^2$
$\lambda = \sqrt{\frac{|w|}{|v|}}$

Hint: You can type |w| by writing abs(w).

The value of λ which minimizes the right hand side is sqrt(abs(w)/abs(v))

Conclude that the Cauchy-Schwarz theorem is true! Substituting this λ makes the right hand side equal to $2|v||w|$, so $2\langle v, w\rangle \leq 2|v||w|$, which is the Cauchy-Schwarz inequality. Credit for this beautiful line of reasoning goes to Terry Tao at this blog post1.

1 https://terrytao.wordpress.com/2007/09/05/amplification-arbitrage-and-the-tensor-power-trick/

Question 3

Solution

Hint: We know that $\vec v \cdot \vec w = |\vec v||\vec w|\cos\theta$.

Hint: $\begin{bmatrix}2\\3\\1\end{bmatrix} \cdot \begin{bmatrix}1\\1\\1\end{bmatrix} = 2(1) + 3(1) + 1(1) = 6$

Hint: $|\vec v| = \sqrt{\vec v \cdot \vec v} = \sqrt{14}$

Hint: $|\vec w| = \sqrt{\vec w \cdot \vec w} = \sqrt{3}$

Hint: Thus, $6 = \sqrt{14}\sqrt{3}\cos(\theta)$. Therefore, $\theta = \arccos\left(\frac{6}{\sqrt{42}}\right)$.

The angle between the vectors $\vec v = \begin{bmatrix}2\\3\\1\end{bmatrix}$ and $\vec w = \begin{bmatrix}1\\1\\1\end{bmatrix}$ is arccos(6/(sqrt(14)*sqrt(3)))

Hint: This problem probably would have stumped you before you started this activity!

Question 4

Find a vector which is perpendicular to $\vec w = \begin{bmatrix}2\\3\\1\end{bmatrix}$.

Solution

Hint: For $\vec v$ to be perpendicular to $\vec w$, we would need the angle between $\vec v$ and $\vec w$ to be $\frac{\pi}{2}$ (or $-\frac{\pi}{2}$). In either case $\vec v \cdot \vec w = |\vec v||\vec w|\cos(\pm\frac{\pi}{2}) = 0$. So we need to find a vector for which $\vec v \cdot \vec w = 0$.

Hint: Let $\vec v = \begin{bmatrix}x\\y\\z\end{bmatrix}$. Then
$\vec v \cdot \vec w = 0$
$\begin{bmatrix}x\\y\\z\end{bmatrix} \cdot \begin{bmatrix}2\\3\\1\end{bmatrix} = 0$
$2x + 3y + z = 0$

Hint: There are a whole lot of choices for x, y, and z that fit these criteria. (In fact, there is an entire plane of vectors perpendicular to $\vec w$.)

Hint: $\begin{bmatrix}0\\1\\-3\end{bmatrix}$ works, for instance.


Question 5

Find a vector $\vec u$ which is perpendicular to both $\vec v = \begin{bmatrix}2\\3\\1\end{bmatrix}$ and $\vec w = \begin{bmatrix}5\\9\\2\end{bmatrix}$.

Solution

Hint: We need both $\vec u \cdot \vec v = 0$ and $\vec u \cdot \vec w = 0$.

Hint: Letting $\vec u = \begin{bmatrix}x\\y\\z\end{bmatrix}$, we have the conditions
$2x + 3y + z = 0$
$5x + 9y + 2z = 0$

Hint: Doubling the first equation gives
$4x + 6y + 2z = 0$
$5x + 9y + 2z = 0$
and subtracting the first of these from the second gives
$x + 3y = 0$
$5x + 9y + 2z = 0$

Hint: Picking whatever you like for x, you should be able to find the other values now. Try x = 3.

Hint: $\begin{bmatrix}3\\-1\\-3\end{bmatrix}$ works.

Prove the "Triangle inequality": For any two vectors $\vec v, \vec w \in \mathbb{R}^n$, $|\vec v + \vec w| \leq |\vec v| + |\vec w|$. Draw a picture. Why is this called the triangle inequality?

The inequality is equivalent to $|\vec v + \vec w|^2 \leq (|\vec v| + |\vec w|)^2$, which is easier to handle because it does not involve square roots.
$|\vec v + \vec w|^2 = \langle \vec v + \vec w, \vec v + \vec w\rangle = |v|^2 + 2\langle v, w\rangle + |w|^2 \leq |v|^2 + 2|v||w| + |w|^2 = (|v| + |w|)^2$ by the Cauchy-Schwarz inequality.
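Before moving on, here is a quick numerical sanity check (a plain Python sketch with our own helper names) for the two perpendicularity questions above: the proposed answers really do have zero dot product with the given vectors.

Python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

u = [3, -1, -3]   # candidate perpendicular to both v and w
v = [2, 3, 1]
w = [5, 9, 2]

# both of these should print 0
print(dot(u, v))
print(dot(u, w))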


22 Multiplying matrices using dot products

There is a quick way to multiply matrices using dot products.

Question 1

Let $M = \begin{bmatrix}2 & 3\\4 & 5\\1 & 2\end{bmatrix}$, and $\vec e_2 = \begin{bmatrix}0\\1\\0\end{bmatrix}$.

Solution

Hint:
$\vec e_2^{\,\top} M = \begin{bmatrix}0 & 1 & 0\end{bmatrix}\begin{bmatrix}2 & 3\\4 & 5\\1 & 2\end{bmatrix} = \begin{bmatrix}4 & 5\end{bmatrix}$

$\vec e_2^{\,\top} M = \begin{bmatrix}4 & 5\end{bmatrix}$

Did you notice how multiplying M on the left by $\vec e_2^{\,\top}$ selected the 2nd row of M?

Prove that if M is an m × n matrix and $\vec e_j \in \mathbb{R}^m$ is the j-th standard basis vector of $\mathbb{R}^m$, then $\vec e_j^{\,\top} M$ is the j-th row of M.

We know that $\vec w = \vec e_j^{\,\top} M$ is a covector (row) just by looking at dimensions. What is the i-th entry of this row? Well, we can only figure that out by applying the map to the basis vectors. $\vec e_j^{\,\top} M \vec e_i$ is the dot product of $\vec e_j$ with the i-th column of M. But that just selects the j-th element of that column. So the i-th element of $\vec w$ is the j-th element of the i-th column of M. This just says that $\vec w$ is the j-th row of M. (Whew.)

Now we can use this observation to great effect. If M is an m × n matrix, $\vec e_j$ is the standard basis of $\mathbb{R}^m$ and $\vec b_k$ is the standard basis of $\mathbb{R}^n$, then we can select $M_{j,k}$ by performing the operation $\vec e_j^{\,\top} M \vec b_k$. This is so important we will label it as a theorem:

Theorem 2 If M is an m × n matrix, $\vec e_j$ is the standard basis of $\mathbb{R}^m$ and $\vec b_k$ is the standard basis of $\mathbb{R}^n$, then $M_{j,k} = \vec e_j^{\,\top} M \vec b_k$.

Proof The proof is simply that $M\vec b_k$ is by definition the k-th column of the matrix, and by our observation above $\vec e_j^{\,\top} M \vec b_k$ must be the j-th entry of that column vector, which is the single number $M_{j,k}$.

Question 3

 4 1 Let M = 3 1

 −2 . 0

Solution Hint: of M

By the above theorem, it will be the entry in the 2nd row and the 1st column

63

22

Multiplying matrices using dot products

Hint:



0



0

  1 1 M 0 = 3 0 

  1 1 M 0 =3 0 

The philosophical import of this theorem is that we can probe the inner structure of any matrix with simple row and column vectors to find out every component of the matrix. What happens when we apply this insight to a product of matrices?   −1 1 Question 4 Let A =  2 2 and B =]. [Let C = AB. 3 0 Solution

Hint:

Hint:

Hint:

 By the theorem above, C2,3 = 0

 So C2,3 = 0

 But 0

1

1

 So 0

Hint:

 Thus C2,3 = 2

1

  0  0   0 AB   1 0

 0 A is the 2nd

Hint:

1

  0  0  0 C  1 0

  0 A= 2

2

  0 0 rd  row of A, and B  1 is the 3 column of B 0

  0    0  = 1 2 and B   1 9 0

   1 = 2(1) + 2(9) = 20 9

Without computing the whole matrix C, can you find C2,3 = 20

Wow! So it looks like we can find the entries of a product of two matrices just by looking at the dot product of rows of the first matrix with columns of the second matrix!


Theorem 5 Let A and B be composable matrices. Let C = AB. Then $C_{i,j}$ is the dot product of the i-th row of A with the j-th column of B.

Prove this theorem. We can prove this by combining the other two theorems in this section. $C_{i,j} = \vec e_i^{\,\top} C \vec e_j$ by the second theorem. But C = AB, so we have $C_{i,j} = \vec e_i^{\,\top} A B \vec e_j$. By the first theorem $\vec e_i^{\,\top} A$ is the i-th row of A, and by our definition of matrix multiplication, $B\vec e_j$ is the j-th column of B. So $C_{i,j}$ is the dot product of the i-th row of A with the j-th column of B.

Now try multiplying some matrices of your choosing using this method. This is likely the definition of matrix multiplication you learned in high school (or the same thing defined by some messy formula with a Σ). Do you prefer this method? Or do you prefer whatever method you came up with on your own earlier? Maybe they are the same!

Another note: it is interesting that we are feeding two vectors $e_i$ and $e_j$ into the matrix and getting out a number somehow. In week 4 we will learn that we are treading in deep water here: this is the very tip of the iceberg of bilinear forms, which are a kind of 2-tensor.
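Here is a minimal Python sketch of Theorem 5: it builds the product C = AB one entry at a time, taking the dot product of a row of A with a column of B. Matrices are stored as lists of rows; the helper names and the particular matrix B are our own small example (we only echo the third column of B used in the question above).

Python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def column(B, j):
    # the j-th column of B, as a list
    return [row[j] for row in B]

def multiply(A, B):
    # C[i][j] is the dot product of the i-th row of A with the j-th column of B
    return [[dot(row, column(B, j)) for j in range(len(B[0]))] for row in A]

A = [[-1, 1],
     [ 2, 2],
     [ 3, 0]]
B = [[1, 0, 1],
     [0, 1, 9]]
print(multiply(A, B))   # a 3x3 matrix; its (2,3) entry is 2*1 + 2*9 = 20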


23

Limits

Limits are the difference between analysis and algebra.

Limits are the backbone of calculus. Multivariable calculus is no different. In this section we will deal with limits on an intuitive level. We will postpone the rigorous ε-δ analysis to the next section.

Definition 1

Let f : Rn → Rm and let p ∈ Rn . We say that lim f (x) = L

x→p

for some L ∈ Rm if as x “gets arbitrarily close to ” p, the points f (x) “get arbitrarily close to L”. Definition 2 A function f : Rn → Rm is said to be continuous at a point p ∈ Rn if lim f (x) = f (p) x→p

Most functions defined by formulas are continuous where they are defined. For example, the function $f(x, y) = (\cos(xy + y^2), e^{\sin(x)+y} + y^2)$ is continuous because each component function is a string of composites of continuous functions. $f(x, y) = (xy, \cos(x)/(x+y))$ is continuous everywhere it is defined (it is not defined on the line y = −x, because the denominator of the second component function vanishes there). This is basically because all of the functions we have names for, like cos(x), sin(x), $e^x$, polynomials, and rational functions, are continuous, so if you can write down a function as a "single formula" it is probably continuous. The problematic points are basically just zeros of denominators, like our example above. Piecewise defined functions can also be problematic:

Argue intuitively that the function f : R² → R defined by
$f(x, y) = \begin{cases}0 & \text{if } x < y\\ 1 & \text{if } x \geq y\end{cases}$
is continuous at every point off the line y = x, and is discontinuous at every point on the line y = x.

For any point p which is not on the line y = x, there is a little neighborhood of p where f is a constant function, which is known to be continuous. So f is continuous at p. For any point p on the line y = x, we get a different limit approaching p through points where x ≥ y (we get 1) than through points where x < y (we get 0).

Question 3

Solution

Hint: Since $x\cos(\pi(x+y)) + \sin\left(\frac{\pi y}{4}\right)$ is continuous, we can just evaluate the function at (1, 2).

Hint: $\lim_{(x,y)\to(1,2)} x\cos(\pi(x+y)) + \sin\left(\frac{\pi y}{4}\right) = 1\cdot\cos(3\pi) + \sin\left(\frac{2\pi}{4}\right) = -1 + 1 = 0$

So $\lim_{(x,y)\to(1,2)} x\cos(\pi(x+y)) + \sin\left(\frac{\pi y}{4}\right) = 0$


If we are confronted with a limit like $\lim_{(x,y)\to(0,0)}\frac{x^2 + xy}{x + y}$, this is actually a little bit interesting. The function is not continuous at 0, because it is not even defined at 0. What is more, the numerator and denominator are both approaching 0, and each "pulls" the limit in a different direction. (Dividing by smaller and smaller numbers would tend to make the value larger and larger, while multiplying by smaller and smaller numbers has the opposite effect.) There are essentially two ways to work with this:

- show that it does not have a limit by finding two different ways of approaching (0, 0) which give different limiting values, or

- show that it does have a limit by rewriting the expression algebraically as a continuous function, and just plug in to get the value of the limit.

Question 4

Consider $\lim_{(x,y)\to(0,0)}\frac{x^2 + xy}{x + y}$.

Solution

Hint: This limit does exist, because it can be rewritten as a continuous function.

Do you think the limit exists?
(a) Yes X
(b) No

Solution

Hint: $\lim_{(x,y)\to(0,0)}\frac{x^2 + xy}{x + y} = \lim_{(x,y)\to(0,0)}\frac{x(x+y)}{x+y}$

Hint: $\lim_{(x,y)\to(0,0)}\frac{x(x+y)}{x+y} = \lim_{(x,y)\to(0,0)} x = 0$

$\lim_{(x,y)\to(0,0)}\frac{x^2 + xy}{x + y} = 0$

Question 5

Consider $\lim_{(x,y)\to(3,3)}\frac{x^2 - 9}{xy - 3y}$.

Solution

Hint: This limit does exist, because it can be rewritten as a continuous function.

Do you think the limit exists?
(a) Yes X
(b) No

Solution

Hint: $\lim_{(x,y)\to(3,3)}\frac{x^2 - 9}{xy - 3y} = \lim_{(x,y)\to(3,3)}\frac{(x-3)(x+3)}{y(x-3)}$

Hint: $\lim_{(x,y)\to(3,3)}\frac{(x-3)(x+3)}{y(x-3)} = \lim_{(x,y)\to(3,3)}\frac{x+3}{y} = \frac{3+3}{3} = 2$

$\lim_{(x,y)\to(3,3)}\frac{x^2 - 9}{xy - 3y} = 2$

Question 6

Let f : R² → R² be defined by $f(x, y) = \left(\frac{x^2 y - 4y}{x - 2},\ xy\right)$.

Solution

Hint: We can consider the limit component by component.

Hint: $\lim_{(x,y)\to(2,2)}\frac{x^2 y - 4y}{x - 2} = \lim_{(x,y)\to(2,2)}\frac{(x-2)(x+2)y}{x - 2} = \lim_{(x,y)\to(2,2)} y(x+2) = 2(2+2) = 8$

Hint: $\lim_{(x,y)\to(2,2)} xy = 2(2) = 4$, since xy is continuous.

Hint: Format your answer as $\begin{bmatrix}8\\4\end{bmatrix}$

Writing your answer as a vertical vector, what is $\lim_{(x,y)\to(2,2)} f(x, y)$?

Question 7

Consider $\lim_{(x,y)\to(0,0)}\frac{x}{y}$.

Solution

Hint: Think about approaching (0, 0) along the line x = 0 first, and then along the line y = x.

Hint: If we look at $\lim_{(0,y)\to(0,0)}\frac{0}{y}$, this is just the limit of the constant 0 function. So the function approaches the limit 0 along the line x = 0.

Hint: If we look at $\lim_{(t,t)\to(0,0)}\frac{t}{t}$, this is just the limit of the constant 1 function. So the function approaches the limit 1 along the line y = x.

Hint: So the limit does not exist.

Do you think the limit exists?
(a) Yes
(b) No X

The last example showcased how you could show that a limit does not exist by finding two different paths along which you approach different limiting values. Let's try another example of that form.

Question 8

Solution

Hint: On the line y = kx, we have $f(x, y) = f(x, kx) = \frac{x + kx + x^2}{x - kx} = \frac{1 + k + x}{1 - k}$.

Hint: So we have $\lim_{x\to 0}\frac{1 + k + x}{1 - k} = \frac{1 + k}{1 - k}$.

The limit of f : R² → R defined by $f(x, y) = \frac{x + y + x^2}{x - y}$ as (x, y) → (0, 0) along the line y = kx is (1 + k)/(1 − k)

The last two questions may have given you the idea that if a limit does not exist, it must be because you get a different value by approaching along two different lines. This is not always the case. Consider the function
$f(x, y) = \begin{cases}1 & \text{if } y = x^2\\ 0 & \text{if } y \neq x^2\end{cases}$
Through any line containing the origin, f approaches 0 as points get closer and closer to (0, 0), but as points approach (0, 0) along the parabola y = x², f approaches 1. So the limit $\lim_{(x,y)\to(0,0)} f(x, y)$ does not exist, even though the limit along each line does.

Here is a more "natural" example of such a phenomenon (defined by a single formula, not a piecewise defined function):
$f(x, y) = \frac{x^2 y}{x^4 + y^2}$
Along each line y = kx, we have $f(x, kx) = \frac{kx^3}{x^4 + k^2 x^2}$, so $\lim_{x\to 0}\frac{kx^3}{x^4 + k^2 x^2} = \lim_{x\to 0}\frac{kx}{x^2 + k^2} = 0$. On the other hand, along the parabola y = x² we have $f(x, x^2) = \frac{x^4}{2x^4} = \frac{1}{2}$, so the limit along the parabola is $\frac{1}{2}$. So even though the limit along all lines through the origin is 0, the limit does not exist.
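A quick numerical experiment (a sketch, not a proof) makes the last example vivid: sample $f(x, y) = \frac{x^2 y}{x^4 + y^2}$ along the line y = x and along the parabola y = x² as x shrinks, and watch the values settle on different numbers.

Python
def f(x, y):
    return (x**2 * y) / (x**4 + y**2)

for x in [0.1, 0.01, 0.001, 0.0001]:
    # along the line y = x the values go to 0,
    # along the parabola y = x**2 they stay at 1/2
    print(x, f(x, x), f(x, x**2))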


24

The formal definition of the limit

Limits are defined by formalizing the notion of closeness.

This optional section explores limits from a formal and rigorous point of view. The level of mathematical maturity required to get through this section is much higher than others. If you get through it and understand everything, you can consider yourself "hardcore."

Definition 1 Let U ⊂ Rⁿ. The closure of U, written $\overline{U}$, is defined to be the set of all p ∈ Rⁿ such that every solid ball centered at p contains at least one point of U. Symbolically,
$\overline{U} = \{p \in \mathbb{R}^n : \text{for all } r > 0 \text{ there exists } x \in U \text{ so that } |x - p| < r\}$

Prove that $U \subset \overline{U}$ for any subset U of Rⁿ. Let p ∈ U. Then for every r > 0, p is an element of U whose distance to p is less than r. In other words, since every solid ball centered at p must contain p, and p is in U, p must be in the closure of U. So $p \in \overline{U}$.

Prove that the closure of the open unit ball is the closed unit ball. That is, show that if U = {x : |x| < 1}, then $\overline{U}$ = {x : |x| ≤ 1}. Let B = {x : |x| ≤ 1}. We need to see that $\overline{U} = B$. It is easy to see that $B \subset \overline{U}$, since for each point p ∈ B, either p ∈ U (in which case it is in the closure), or |p| = 1. In this case, for every r > 0, the point $q = p - \frac{r}{2}p$ is in U and satisfies |p − q| < r. On the other hand, if |p| > 1, then a solid ball of radius $\frac{|p| - 1}{2}$ centered at p will not intersect U. So we are done.

Definition 2 Let f : U → V with U ⊂ Rⁿ, V ⊂ Rᵐ and $p \in \overline{U}$. We say that $\lim_{x\to p} f(x) = L$ if for every ε > 0 we can find a δ > 0 so that if 0 < |x − p| < δ and x ∈ U, then |f(x) − L| < ε.

Definition 3 Let f : U → V with U ⊂ Rⁿ and V ⊂ Rᵐ. We say that f is continuous at p ∈ U if $\lim_{x\to p} f(x) = f(p)$.

Prove, using the ε-δ definition of the limit, that f : R² → R defined by f(x, y) = xy is continuous everywhere.

Let p = (a, b). Let ε > 0 be given. Without loss of generality, assume a, b ≥ 0. We work "backwards":
$|xy - ab| < \varepsilon$
$\Leftarrow |[(x-a)+a][(y-b)+b] - ab| < \varepsilon$
$\Leftarrow |(x-a)(y-b) + a(y-b) + b(x-a)| < \varepsilon$
$\Leftarrow |x-a||y-b| + a|y-b| + b|x-a| < \varepsilon$ by the triangle inequality
$\Leftarrow |x-a||y-b| < \tfrac{\varepsilon}{3}$ and $a|y-b| < \tfrac{\varepsilon}{3}$ and $b|x-a| < \tfrac{\varepsilon}{3}$

Now it is easy to arrange that $a|y-b|$ and $b|x-a|$ are less than $\tfrac{\varepsilon}{3}$. If a = 0 or b = 0, you do not have to do anything to get that condition satisfied, but otherwise $|x-a| \leq \tfrac{\varepsilon}{3b}$ is implied by $|(x,y)-(a,b)| \leq \tfrac{\varepsilon}{3b\sqrt{2}}$, and $|y-b| \leq \tfrac{\varepsilon}{3a}$ is implied by $|(x,y)-(a,b)| \leq \tfrac{\varepsilon}{3a\sqrt{2}}$. Finally, $|x-a||y-b| < \tfrac{\varepsilon}{3}$ is implied by $|(x,y)-(a,b)| \leq \sqrt{\tfrac{\varepsilon}{3}}$. So if we let $\delta = \min\left(\tfrac{\varepsilon}{3b\sqrt{2}}, \tfrac{\varepsilon}{3a\sqrt{2}}, \sqrt{\tfrac{\varepsilon}{3}}\right)$ we are done.

Of course, this fact, that (x, y) ↦ xy is continuous, is something you probably believe intuitively. "Wiggling two numbers by a little bit doesn't affect their product by very much." Making that intuition precise obviously took some work in this activity.


25

Single variable derivative, redux

The derivative is the slope of the best linear approximation. Our goal is to define the derivative of a multivariable function, but first we will recast the derivative of a single variable function in a manner which is ripe for generalization. The derivative of a function f : R → R at a point x = a is the “instantaneous rate of change” of f (x) with respect to x. In other words, f (a + ∆x) ≈ f (a) + f 0 (a)∆x. This is really the essential thing to understand about the derivative. Question 1

Let f be a function with f (3) = 2, and f 0 (3) = 5.

Solution Hint:

f (3.01) ≈ f (3) + f 0 (3)(0.01)

Hint:

≈ 2 + 5(0.01)

Hint:

≈ 2.05

Then f (3.01) ≈ 2.05

Question 2

Let f be a function with f (4) = 2 and f (4.2) = 2.6.

Solution Hint:

f (4.2) ≈ f (4) + f 0 (4)(0.2)

Hint:

2.6 ≈ 2 + f 0 (4)(0.2)

Hint:

$f'(4) \approx \frac{2.6 - 2}{0.2}$

Hint: $f'(4) \approx 3$

Then f′(4) ≈ 3

We have not made precise what we mean by the approximate sign. After all, if ∆x is small enough and f is continuous, f(a + ∆x) will be close to f(a), but we do not want to say that the derivative is always zero. We will make the ≈ sign precise by asking that the difference between the actual value and the estimated value goes to zero faster than ∆x goes to zero.


Definition 3 Let f : R → R be a function, and let a ∈ R. f is said to be differentiable at x = a if there is a number m such that
$f(a + \Delta x) = f(a) + m\Delta x + \mathrm{Error}_a(\Delta x)$
with
$\lim_{\Delta x \to 0}\frac{|\mathrm{Error}_a(\Delta x)|}{|\Delta x|} = 0.$
If f is differentiable at a, there is only one such number m, which we call the derivative of f at a.

Verbally, m is the number which makes the error between the function value f(a + ∆x) and the linear approximation f(a) + m∆x go to zero "faster than ∆x" does. This definition looks more complicated than the usual definition (and it is!), but it has the advantage that it will generalize directly to the derivative of a multivariable function.

Confirm that for f(x) = x², we have f′(2) = 4 using our definition of the derivative.
$f(2 + \Delta x) = (2 + \Delta x)^2 = 2^2 + 2(2)\Delta x + (\Delta x)^2$
So we have $f(2 + \Delta x) = f(2) + 4\Delta x + \mathrm{Error}(\Delta x)$, where $\mathrm{Error}(\Delta x) = (\Delta x)^2$.
$\lim_{\Delta x\to 0}\frac{\mathrm{Error}(\Delta x)}{\Delta x} = \lim_{\Delta x\to 0}\frac{(\Delta x)^2}{\Delta x} = \lim_{\Delta x\to 0}\Delta x = 0$
Thus, f′(2) = 2(2) = 4, according to our new definition!

Show the equivalence of our definition of the derivative with the "usual" definition. That is, show that the number m in our definition satisfies $m = \lim_{\Delta x\to 0}\frac{f(a + \Delta x) - f(a)}{\Delta x}$. This also shows the uniqueness of m.

Let f be differentiable (in the sense above) at x = a, with derivative m. Then
$\lim_{\Delta x\to 0}\frac{|\mathrm{Error}_a(\Delta x)|}{|\Delta x|} = 0$
where $\mathrm{Error}_a(\Delta x)$ is defined by $f(a + \Delta x) = f(a) + m\Delta x + \mathrm{Error}_a(\Delta x)$, i.e. $\mathrm{Error}_a(\Delta x) = f(a + \Delta x) - f(a) - m\Delta x$. So
$\lim_{\Delta x\to 0}\frac{|f(a + \Delta x) - f(a) - m\Delta x|}{|\Delta x|} = 0$
$\lim_{\Delta x\to 0}\left(\frac{f(a + \Delta x) - f(a)}{\Delta x} - m\right) = 0$
But this implies that
$m = \lim_{\Delta x\to 0}\frac{f(a + \Delta x) - f(a)}{\Delta x}$
So our definition of the derivative agrees with the "usual" definition.
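Here is a small Python sketch illustrating the definition for f(x) = x² at a = 2: the error of the linear approximation with m = 4 shrinks faster than ∆x, so Error(∆x)/∆x tends to 0.

Python
def f(x):
    return x**2

a, m = 2, 4
for dx in [0.1, 0.01, 0.001, 0.0001]:
    error = f(a + dx) - (f(a) + m * dx)   # Error_a(dx)
    print(dx, error / dx)                 # this ratio shrinks toward 0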


26

Multivariable derivatives

We introduce the derivative. The derivative in multiple variables requires a bit more machinery. 1

1

YouTube link: http://www.youtube.com/watch?v=LuDlwFeAv-I


27

Intuitively

The derivative is the linear map which best approximates changes in a function near a point.

The single variable derivative allows us to find the best linear approximation to a function at a point. In several variables we will define the derivative to be a linear map which approximates the change in the values of a function. In this section we will explore what the multivariable derivative is from an intuitive point of view, without making anything too formal. We give the following wishy-washy "definition":

Definition 1 Let f : Rⁿ → Rᵐ be a function. Then the derivative of f at a point p ∈ Rⁿ is the linear map $D(f)\big|_p : \mathbb{R}^n \to \mathbb{R}^m$ with the following approximation property:
$f(p + \vec h) \approx f(p) + D(f)\big|_p(\vec h)$
We will make the sense in which this approximation holds precise in the next section. Note: we also call the matrix of the derivative the Jacobian matrix in honor of the mathematician Carl Gustav Jacob Jacobi1.

Question 2

Let f : R² → R³ be a function, and suppose f(2, 3) = (4, 8, 9). Suppose that the matrix of $D(f)\big|_{(2,3)}$ is $\begin{bmatrix}-1 & 3\\4 & 5\\2 & -3\end{bmatrix}$.

Solution

Hint: By the defining property of derivatives,
$f(2.01, 3.04) \approx f(2, 3) + D(f)\big|_{(2,3)}\begin{bmatrix}0.01\\0.04\end{bmatrix}$

Hint: $= \begin{bmatrix}4\\8\\9\end{bmatrix} + \begin{bmatrix}-1 & 3\\4 & 5\\2 & -3\end{bmatrix}\begin{bmatrix}0.01\\0.04\end{bmatrix}$

Hint: $= \begin{bmatrix}4\\8\\9\end{bmatrix} + \begin{bmatrix}0.01(-1) + 0.04(3)\\ 0.01(4) + 0.04(5)\\ 0.01(2) + 0.04(-3)\end{bmatrix}$

Hint: $= \begin{bmatrix}4\\8\\9\end{bmatrix} + \begin{bmatrix}0.11\\0.24\\-0.1\end{bmatrix}$

Hint: $= \begin{bmatrix}4.11\\8.24\\8.9\end{bmatrix}$

Approximate f(2.01, 3.04), giving your answer as a column matrix.

1 http://en.wikipedia.org/wiki/Carl_Gustav_Jacob_Jacobi
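If you like, you can let Python do the arithmetic in this question. This is just a sketch that multiplies the given Jacobian matrix by the displacement vector and adds the result to f(2, 3); the helper name is ours.

Python
def apply_matrix(M, v):
    # multiply the matrix M (a list of rows) by the vector v
    return [sum(m * x for m, x in zip(row, v)) for row in M]

f_p = [4, 8, 9]                    # f(2, 3)
J = [[-1, 3], [4, 5], [2, -3]]     # the matrix of D(f) at (2, 3)
h = [0.01, 0.04]                   # the displacement from (2, 3) to (2.01, 3.04)

approx = [a + b for a, b in zip(f_p, apply_matrix(J, h))]
print(approx)   # approximately [4.11, 8.24, 8.9]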

Question 3 Let f : R² → R be a function with f(1, 2) = 3, f(1.01, 2) = 3.04 and f(1, 2.002) = 3.002.

Solution

Hint: Since f : R² → R, $D(f)\big|_{(1,2)} : \mathbb{R}^2 \to \mathbb{R}$, so the matrix of the derivative is a row of length 2.

Hint: To find the matrix, we need to see how $D(f)\big|_{(1,2)}$ acts on $\begin{bmatrix}1\\0\end{bmatrix}$ and $\begin{bmatrix}0\\1\end{bmatrix}$.

Hint: $f(1.01, 2) \approx f(1, 2) + D(f)\big|_{(1,2)}\begin{bmatrix}0.01\\0\end{bmatrix}$ by the fundamental property of the derivative.

Hint:
$3.04 \approx 3 + D(f)\big|_{(1,2)}\begin{bmatrix}0.01\\0\end{bmatrix}$
$0.04 \approx 0.01\, D(f)\big|_{(1,2)}\begin{bmatrix}1\\0\end{bmatrix}$ by the linearity of the derivative
$D(f)\big|_{(1,2)}\begin{bmatrix}1\\0\end{bmatrix} \approx 4$

Hint:
$f(1, 2.002) \approx f(1, 2) + D(f)\big|_{(1,2)}\begin{bmatrix}0\\0.002\end{bmatrix}$
$3.002 \approx 3 + D(f)\big|_{(1,2)}\begin{bmatrix}0\\0.002\end{bmatrix}$
$0.002 \approx 0.002\, D(f)\big|_{(1,2)}\begin{bmatrix}0\\1\end{bmatrix}$ by the linearity of the derivative
$D(f)\big|_{(1,2)}\begin{bmatrix}0\\1\end{bmatrix} \approx 1$

Hint: Thus the matrix of $D(f)\big|_{(1,2)}$ is $\begin{bmatrix}4 & 1\end{bmatrix}$.

What is the Jacobian matrix of f at (1, 2)?

Solution


Hint:

$f(0.9, 2.03) \approx f(1, 2) + D(f)\big|_{(1,2)}\begin{bmatrix}-0.1\\0.03\end{bmatrix}$

Hint: $= 3 + \begin{bmatrix}4 & 1\end{bmatrix}\begin{bmatrix}-0.1\\0.03\end{bmatrix}$

Hint: $= 3 + 4(-0.1) + 1(0.03) = 2.63$

Using your approximation of the Jacobian matrix, f(0.9, 2.03) ≈ 2.63

This problem shows that if a function has a derivative, then only knowing how it changes in the coordinate directions lets you determine how it changes in any direction. This is so important it is worth driving it home: we only started with information about how f(1.01, 2) and f(1, 2.002) compared to f(1, 2), but because this function had a derivative, we could obtain the approximate value of the function at any nearby point by exploiting linearity. This is powerful.

Prepare yourself: the following two paragraphs are going to be very difficult to digest. So far we have only talked about the derivative of a function at a point. The derivative of a function is actually a function which assigns a linear map to each point in the domain of the original function. So the derivative is a function which takes (functions from Rⁿ → Rᵐ) and returns a function which takes (points in Rⁿ) and returns (linear maps from Rⁿ → Rᵐ). This level of abstraction is why we wanted you to get comfortable with "higher-order functions" earlier. We are not as crazy as we seem.

As an example, if f : R² → R² is the function defined by f(x, y) = (x²y, y + x), then it will turn out that at any point (a, b), the derivative $Df\big|_{(a,b)}$ will be the linear map from R² to R² given by the matrix $\begin{bmatrix}2ab & a^2\\1 & 1\end{bmatrix}$ (we do not know why yet, but this is true). So Df is really a function which takes a point (a, b) and spits out the linear map with matrix $\begin{bmatrix}2ab & a^2\\1 & 1\end{bmatrix}$. So what about just plain old D? D takes a function (f) and returns the function Df, which takes a point (a, b) and returns the linear map whose matrix is $\begin{bmatrix}2ab & a^2\\1 & 1\end{bmatrix}$. Letting L(A, B) stand for all the linear functions from A → B and Func(A, B) be the set of all functions from A → B, we could write D : Func(Rⁿ, Rᵐ) → Func(Rⁿ, L(Rⁿ, Rᵐ)).

Please do not give up on the course after the last two paragraphs! Everything is going to be okay. Hopefully you will be able to slowly digest these statements throughout the course. Not understanding them now will not hold you back.

Question 4 Let f be a function which satisfies $Df\big|_{(x,y)} = \begin{bmatrix}3x^2y^2 & 2x^3y\\ 2x & 2y\\ ye^{xy} & xe^{xy}\end{bmatrix}$.

Solution

Hint: $D(f)\big|_{(1,2)} = \begin{bmatrix}3(1)^2(2)^2 & 2(1)^3(2)\\ 2(1) & 2(2)\\ 2e^{(1)(2)} & 1e^{(1)(2)}\end{bmatrix} = \begin{bmatrix}12 & 4\\ 2 & 4\\ 2e^2 & e^2\end{bmatrix}$

Hint: $f(1.01, 1.98) \approx f(1, 2) + D(f)\big|_{(1,2)}\begin{bmatrix}0.01\\-0.02\end{bmatrix}$

Hint: $f(1.01, 1.98) \approx (2, 3, 1) + \begin{bmatrix}12(0.01) + 4(-0.02)\\ 2(0.01) + 4(-0.02)\\ 2e^2(0.01) + e^2(-0.02)\end{bmatrix}$

Hint: $f(1.01, 1.98) \approx (2.04, 2.94, 1)$

Hint: Format this as $\begin{bmatrix}2.04\\2.94\\1\end{bmatrix}$

Given that f(1, 2) = (2, 3, 1), approximate f(1.01, 1.98).


28

Rigorously

The derivative approximates the changes in a function to first order accuracy.

We are now ready to define the derivative rigorously. Mimicking our development of the single variable derivative, we define:

Definition 1 Let f : Rⁿ → Rᵐ be a function, and let p ∈ Rⁿ. f is said to be differentiable at p if there is a linear map M : Rⁿ → Rᵐ such that
$f(p + \vec h) = f(p) + M(\vec h) + \mathrm{Error}_p(\vec h)$
with
$\lim_{\vec h \to \vec 0}\frac{|\mathrm{Error}_p(\vec h)|}{|\vec h|} = 0.$
If f is differentiable at p, there is only one such linear map M, which we call the (total) derivative of f at p.

Verbally, M is the linear map which makes the error between the function value $f(p + \vec h)$ and the affine approximation $f(p) + M(\vec h)$ go to zero "faster than $\vec h$" does.

This definition is great, but it doesn't tell us how to actually compute the derivative of a differentiable function! Let's dig a little deeper:

Example 2

Let f : R² → R² be defined by $f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}f_1(x, y)\\ f_2(x, y)\end{pmatrix}$. Assuming f is differentiable at the point (1, 2), let's try to compute the derivative there. Let M be the derivative of f at (1, 2). Then

      f ((1, 2) + h 1 ) − f ( 1 ) − M (h 1 ) 0 2 0   lim =0 1 h→0 h 0   f (1 + h, 2) − f (1, 2) − hM ( 1 ) 0 =0 lim h→0 h     f( 1 + h ) − f( 1 )   2 2 1 lim − M( ) = 0 0 h→0 h 80


so   f (1 + h, 2) − f (1, 2) 1 M( ) = lim 0 h→0 h   f1 (1 + h, 2) − f1 (1, 2)   1   h M( ) = lim  f (1 + h, 2) − f2 (1, 2)  0 2 h→0 h But each of the remaining quantities are derivatives of one variable functions! In particular, we   have that d   (f (x, 2)) 1 1  x=1  M( ) =  dx . We call these kinds of quantities partial deriva d 0 (f2 (x, 2)) x=1 dx tives because they are part of the derivative. We will learn more about partial derivatives in the next section. Without copying the work in the example above (if you can) try to find M (0, 1).     f ((1, 2) + h 0 ) − f (1, 2) − M (h 0 ) 1 1   lim =0 0 h→0 h 1   0 f (1, 2 + h) − f (1, 2) − hM ( ) 1 lim =0 h→0 h   f (1, 2 + h) − f (1, 2) 0 lim − M( ) =0 1 h→0 h so   f (1, 2 + h) − f (1, 2) 0 M( ) = lim 1 h→0 h   f1 (1, 2 + h) − f1 (1, 2)   0   h M( ) = lim  f (1, 2 + h) − f2 (1, 2)  1 2 h→0 h   d   (f (1, y)) y=2 0  dy 1  M( )= d  1 (f2 (1, y)) y=2 dy This question example show that the matrix of the derivative  and the previous  d d (f1 (x, 2)) x=1 (f1 (1, y)) y=2   dy of f at (1, 2) is  dx  d d (f2 (x, 2)) x=1 (f2 (1, y)) y=2 dx dy Question 3

Use the results of this  question  2 and  the previous example to find the x x + y2 matrix of the derivative of f = at the point (1, 2). y xy 81


Solution Hint:

In this case f1 (x, y) = x2 + y 2 and f2 (x, y) = xy

Hint:

d  dx (f1 (x, 2)) x=1 By the result of the last two exercises, the matrix of the derivative is  d (f2 (x, 2)) x=1 dx 

Hint: f1 (x, 2) = x2 + 22 f1 (1, y) = 12 + y 2 f2 (x, 2) = 2x f2 (1, y) = y

Hint: d (f1 (x, 2)) x=1 dx d (f2 (x, 2)) x=1 dx d (f1 (1, y)) y=2 dy d (f2 (1, y)) y=2 dy

 d x2 + 22 x=1 = 2x x=1 = 2 dx d = (2x) x=1 = 2 x=1 = 2 dx  d = 12 + y 2 y=2 = 2y y=2 = 4 dy d = (y) y=2 = 1 x=1 = 1 dy =

 Hint:


Thus the matrix of the derivative is

2 2

4 1



 d (f1 (1, y)) y=2  dy  d (f2 (1, y)) y=2 dx

29

Partial Derivatives

The entries in the Jacobian matrix are partial derivatives 1

There is a familiar looking formula for the derivative of a differentiable function:

Theorem 1 Let f : Rⁿ → Rᵐ be a differentiable function. Then
$D(f)\big|_p(\vec v) = \lim_{h\to 0}\frac{f(p + h\vec v) - f(p)}{h}$

We conclude that f (p + h~v ) − f (p) D(f ) p (~v ) = lim h→0 h Question 2

Let f : R2 → R2 be defined by f (x, y) = (x2 − y 2 , 2xy).

Solution Hint:

Df (3,4)



−1 2

 = lim

h→0

= lim

h→0

= lim

h→0

= lim

h→0

1

   −1 f (3, 4) + h − f (3, 4) 2 h f (3 − h, 4 + 2h) − f (3, 4) h     2 1 (3 − h) − (4 + 2h)2 −7 − 24 2(3 − h)(4 + 2h) h   2 2 1 (3 − h) − (4 + 2h) + 7 h 2(3 − h)(4 + 2h) − 24

YouTube link: http://www.youtube.com/watch?v=HCAb4uUZzjU


Hint:  −22h − 3h2   h = lim   4h − 4h2 h→0 h   −22 − 3h = lim 4 − 4h h→0   −22 = 4 

Using the theorem above, compute Df (3,4)



−1 2



Since the unit directions are especially important we define:

Definition 3 Let f : Rⁿ → R be a (not necessarily differentiable) function. We define its partial derivative with respect to xᵢ by
$\frac{\partial f}{\partial x_i}(p) = f_{x_i}(p) := \lim_{h\to 0}\frac{f(p + h\vec e_i) - f(p)}{h}$
In other words, $\frac{\partial f}{\partial x_i}(p)$ is the instantaneous rate of change in f by moving only in the $\vec e_i$ direction.

Example 4 There is really only a good visualization of the partial derivatives of a map f : R2 → R, because this is really the only type of higher dimensional function we can effectively graph. Computing partial derivatives is no harder than computing derivatives of single variable functions. You take a partial derivative of a function with respect to xi just by treating all other variables as constants, and taking the derivative with respect to xi . Question 5

Let f : R2 → R be defined by f (x, y) = x sin(y).

Solution

Hint: We are trying to compute $\frac{\partial}{\partial x}\big(x\sin(y)\big)\Big|_{(a,b)}$.

Hint: We just differentiate as if y were a constant, so $\frac{\partial}{\partial x}\big(x\sin(y)\big)\Big|_{(a,b)} = \sin(y)\Big|_{(a,b)}$.

Hint: $f_x(a, b) = \sin(b)$

$f_x(a, b) = \sin(b)$

Solution

Hint: We are trying to compute $\frac{\partial}{\partial y}\big(x\sin(y)\big)\Big|_{(a,b)}$.

Hint: We just differentiate as if x were a constant, so $\frac{\partial}{\partial y}\big(x\sin(y)\big)\Big|_{(a,b)} = x\cos(y)\Big|_{(a,b)}$.

Hint: $f_y(a, b) = a\cos(b)$

$f_y(a, b) = a\cos(b)$

We have already proven the following theorem in the special case n = m = 2 in the previous activity. Proving it in the general case requires no new ideas: only better notational bookkeeping.

Theorem 6 Let f : Rⁿ → Rᵐ be a function with component functions fᵢ : Rⁿ → R, for i = 1, 2, 3, ..., m. In other words, $f(p) = \begin{bmatrix}f_1(p)\\ f_2(p)\\ \vdots\\ f_m(p)\end{bmatrix}$. If f is differentiable at p, then its Jacobian matrix at p is
$\begin{bmatrix}\frac{\partial f_1}{\partial x_1}(p) & \frac{\partial f_1}{\partial x_2}(p) & \cdots & \frac{\partial f_1}{\partial x_n}(p)\\ \frac{\partial f_2}{\partial x_1}(p) & \frac{\partial f_2}{\partial x_2}(p) & \cdots & \frac{\partial f_2}{\partial x_n}(p)\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1}(p) & \frac{\partial f_m}{\partial x_2}(p) & \cdots & \frac{\partial f_m}{\partial x_n}(p)\end{bmatrix}$
More compactly, we might write
$\left[\frac{\partial f_i}{\partial x_j}(p)\right]$



Try to prove this theorem. Using the more compact notation will be helpful. Follow along the proof we developed together in the last section! By the definition of the derivative, we have |f (p + h~ ei ) − f (p) − M (h~ ei )| =0 h→0 |h~ ei | |f (p + h~ ei ) − f (p) − hM (~ ei )| lim =0 h→0 |h| f (p + h~ ei ) − f (p) − hM (~ ei ) lim =0 h→0 h f (p + h~ ei ) − f (p) lim − M (~ ei ) = 0 h→0 h lim

f (p + h~ ei ) − f (p) = M (~ei ). But for this to be true, the j th row of each h side must be equal, so So lim

h→0


fj (p + h~ ei ) − fj (p) = Mji h→0 h ∂fj But the quantity on the left hand side is ∂xi p lim

Question 7

Let f : R3 → R2 be defined by f (x, y, z) = (x2 + y + z 3 , xy + yz 2 ).

Solution

Hint: The Jacobian matrix is $\begin{bmatrix}\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} & \frac{\partial f_1}{\partial z}\\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} & \frac{\partial f_2}{\partial z}\end{bmatrix}$

Hint: As an example, $\frac{\partial f_2}{\partial z} = \frac{\partial}{\partial z}\left(xy + yz^2\right) = 2yz$. Remember that we just differentiate with respect to z, treating x and y as constants.

Hint:
$\frac{\partial f_1}{\partial x} = 2x$, $\frac{\partial f_1}{\partial y} = 1$, $\frac{\partial f_1}{\partial z} = 3z^2$, $\frac{\partial f_2}{\partial x} = y$, $\frac{\partial f_2}{\partial y} = x + z^2$, $\frac{\partial f_2}{\partial z} = 2yz$

Hint: The Jacobian matrix is $\begin{bmatrix}2x & 1 & 3z^2\\ y & x + z^2 & 2yz\end{bmatrix}$

What is the Jacobian Matrix of f? This should be a matrix valued function of x, y, z.
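If you happen to have SymPy installed, you can check a Jacobian like this one symbolically. This is an optional sketch, not part of the course materials.

Python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Matrix([x**2 + y + z**3, x*y + y*z**2])
# the Jacobian with respect to (x, y, z)
print(f.jacobian([x, y, z]))
# Matrix([[2*x, 1, 3*z**2], [y, x + z**2, 2*y*z]])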

The formula for the derivative f (p + h~v ) − f (p) h→0 h

Df (p)(~v ) = lim

looks a lot more familiar than our definition. You might be asking why we didn’t take this formula as our definition of the derivative. After all, we usually take something that looks like this as our definition in single variable calculus. In the following two optional exercises you will find out why. Find a function f : R2 → R such that at (0, 0), the limit M (~v ) = lim

h→0

exists for every vector ~v ∈ R2 , but M is not a linear map. 86

f ((0, 0) + h~v ) − f (0, 0) h


Hint:


Try showing that the function x3 f (x, y) = x2 + y 2  0  

if (x, y) 6= (0, 0) if (x, y) = (0, 0)

has the desired properties.

Let x3 f (x, y) = x2 + y 2  0   a Then for any vector ~v = , we have b  

if (x, y) 6= (0, 0) if (x, y) = (0, 0)

f ((0, 0) + h~v ) − f (0, 0) h f (ha, hb) = lim h→0 h h3 a3 = lim h→0 h(h2 a2 + h2 b2 ) a3 = lim 2 h→0 a + b2 a3 = 2 a + b2

M (~v ) = lim

h→0

  a3 a . This is certainly not a linear function from R2 → R So M = 2 b a + b2 So this formula cannot serve as a good definition of the derivative, because it does not have to produce linear functions. What if we require that the function is linear as well? Even then, it is no good: f ((0, 0) + h~v ) − f (0, 0) h→0 h exists for every vector ~v ∈ R2 and the function M defined this way is linear, but nevertheless, f is not differentiable at (0, 0).

Find a function f : R2 → R such that at (0, 0), M (~v ) = lim

Hint:

Try the function ( f (x, y) =

Let

1 0

if y = x2 and (x, y) 6= (0, 0) else

( 1 if y = x2 and (x, y) 6= (0, 0) f (x, y) = 0 else

  a Let ~v = . Then b 87


f ((0, 0) + h~v ) − f (0, 0) h f (ha, hb) = lim h→0 h

M (~v ) = lim

h→0

2 Now, the intersection of the line √ t 7→ (ta, tb) with the parabola y = x happens b when tb = t2 a2 , i.e. when t = ± . So as long as we choose h smaller than that, |a| we know that (ha, hb) is not on the parabola y = x2 . Hence f (ha, hb) = 0 for small enough h. Thus M (~v ) = 0 for all ~v ∈ R2 . This definitely is a linear function, but f is not differentiable at (0, 0) using our definition, since ~ f ((0, 0) + ~h) − f (0, 0) − M (~h) f (h) lim = lim ~ ~ h→0 h→0 |~ |~h| h| f (t, t2 ) 2 ~ = does not exist, since taking h on the parabola y = x yields the limit lim t→0 t 1 lim , which diverges to ∞. t→0 |t|


30

The gradient

The gradient is a vector version of the derivative. In this section, we will focus on gaining a more "geometric" understanding of derivatives of functions f : Rⁿ → R. If f is such a function, the derivative Df(p) : Rⁿ → R is a covector. So, by the definition of the dot product, we can reinterpret that derivative as the dot product with the fixed vector $Df(p)^\top$.

Definition 1 The gradient of a differentiable function f : Rⁿ → R is defined by $\nabla f(p) = Df(p)^\top$. Equivalently, ∇f is the (unique) vector which makes the following equation true for all $\vec v \in \mathbb{R}^n$:
$\nabla f(p) \cdot \vec v = Df(p)(\vec v)$

Question 2

Solution

Hint: $\nabla f(x, y, z) = \begin{bmatrix}\frac{\partial}{\partial x}\sin(xyz^2)\\ \frac{\partial}{\partial y}\sin(xyz^2)\\ \frac{\partial}{\partial z}\sin(xyz^2)\end{bmatrix}$

Hint: $\nabla f(x, y, z) = \begin{bmatrix}yz^2\cos(xyz^2)\\ xz^2\cos(xyz^2)\\ 2xyz\cos(xyz^2)\end{bmatrix}$

If f : R³ → R is defined by f(x, y, z) = sin(xyz²), what is ∇f(x, y, z)?

We can now use what we know about the geometry of the dot product to understand some interesting things about the derivative.

In a sentence, how does the vector $\frac{\vec v}{|\vec v|}$ relate to the vector $\vec v$? $\frac{\vec v}{|\vec v|}$ is the unit vector which points in the same direction as $\vec v$.

Theorem 3 Let f : Rⁿ → R, and p ∈ Rⁿ. Let $\vec\eta = \frac{\nabla f(p)}{|\nabla f(p)|}$. If $|\vec v| = 1$, then $Df(p)(\vec v) \leq Df(p)(\vec\eta)$.

More geometrically, this theorem says that ∇f(p) points in the direction of "greatest increase" for the function f. More poetically, ∇f always points "straight up the mountain".

Prove this theorem.
$Df(p)(\vec v) = \nabla f(p) \cdot \vec v \leq |\vec v||\nabla f(p)|$ by Cauchy-Schwarz
$= |\nabla f(p)|$ since $|\vec v| = 1$


On the other hand,
$Df\big|_p(\vec\eta) = \nabla f(p) \cdot \vec\eta = \nabla f(p) \cdot \frac{\nabla f(p)}{|\nabla f(p)|} = \frac{|\nabla f(p)|^2}{|\nabla f(p)|} = |\nabla f(p)|$
The inequality follows.

One of the ways that we learned to visualize functions was via contour plots. We will see that there is a very nice relationship between contours and gradient vectors. Let's start with the two dimensional case. Let f : R² → R, and consider the contour C = {(x, y) ∈ R² : f(x, y) = c} for some c ∈ R.

Question 4

Let ~v be a tangent vector to C at a point p ∈ C

Solution Hint: Since ~v is pointing in the direction of a level curve of f , f should not be changing as you move in the direction ~v Hint:

So Df (p)(~v ) = 0.

Df (p)(~v ) = 0 Note: You should be able to answer this from an intuitive point of view, but we will not develop the formal tool to prove this (the implicit function theorem1 ) in this course.

In general, if f : Rn → R, and the contour C = {p ∈ Rn : f (p) = c} for some c ∈ R, then for every tangent vector ~v to C, we will have Df p(~v ) = 0. Intuitively this is true because moving a small amount in the direction of ~v will not change the value of the function much, since you are staying as close as possible to the contour where the function is constant. Accepting this, we have the following: Theorem 5 If f : Rn → R, and the contour C = {p ∈ Rn : f (p) = c} for some c ∈ R, then for every tangent vector ~v to C, we will have ∇f (p) · v = 0. In other words, ∇f (p) is perpendicular to the contour. Question 6 Write an equation for the tangent plane to the surface x2 +xy+4z 2 = 1 at the point (1, 0, 0). Solution Hint: Our general strategy will be to find a vector which is perpendicular to the plane. Writing down what that means in terms of dot products should yield the equation. 1


http://en.wikipedia.org/wiki/Implicit_function_theorem


Hint:

This surface is a level surface of f(x, y, z) = x² + xy + 4z², namely f(x, y, z) = 1.

Hint: $\nabla f = \begin{bmatrix}\frac{\partial f}{\partial x}\\ \frac{\partial f}{\partial y}\\ \frac{\partial f}{\partial z}\end{bmatrix}$

Hint: $\nabla f = \begin{bmatrix}2x + y\\ x\\ 8z\end{bmatrix}$

Hint: $\nabla f(1, 0, 0) = \begin{bmatrix}2\\1\\0\end{bmatrix}$





Hint: For a point (x, y, z) to be in the tangent plane, we would need that (x, y, z) − (1, 0, 0) is perpendicular to ∇f (1, 0, 0). 

Hint:

   x−1 2 So we need  y  · 1 = 0 z 0

Hint:

This says that the equation of the plane is 2x − 2 + y = 0

2x + y − 2 = 0
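Here is a short numerical sketch that estimates ∇f(1, 0, 0) for f(x, y, z) = x² + xy + 4z² with finite differences; it should come out close to the vector (2, 1, 0) used above. The helper function is our own.

Python
def f(x, y, z):
    return x**2 + x*y + 4*z**2

def gradient(f, p, eps=1e-6):
    # approximate each partial derivative with a forward difference
    grad = []
    for i in range(len(p)):
        shifted = list(p)
        shifted[i] += eps
        grad.append((f(*shifted) - f(*p)) / eps)
    return grad

print(gradient(f, [1.0, 0.0, 0.0]))   # approximately [2, 1, 0]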


31

One forms

One forms are covector fields.

In this section we just want to introduce you to some new notation and terminology which will be helpful to keep in mind for the next course, which will cover multivariable integration theory.

As we observed in the last section, the derivative of a function f : Rⁿ → R assigns a covector to each point in Rⁿ. In particular, $Df\big|_p : \mathbb{R}^n \to \mathbb{R}$ is the covector whose matrix is the row $\begin{bmatrix}\frac{\partial f}{\partial x_1}\big|_p & \frac{\partial f}{\partial x_2}\big|_p & \cdots & \frac{\partial f}{\partial x_n}\big|_p\end{bmatrix}$.

Definition 1 A covector field, also known as a differential 1-form, is a function which takes points in Rⁿ and returns a covector on Rⁿ. In other words, it is a covector valued function. We can always write any covector field ω as $\omega(x) = \begin{bmatrix}f_1(x) & f_2(x) & \cdots & f_n(x)\end{bmatrix}$ for n functions fᵢ : Rⁿ → R.

The derivative of a function f : Rⁿ → R is the quintessential example of a 1-form on Rⁿ.

Question 2

Let f : R3 → R be the function f (x, y, z) = y.

Solution

Hint: The Jacobian of f is $\begin{bmatrix}0 & 1 & 0\end{bmatrix}$ everywhere.

What is the matrix for Df at the point (a, b, c)?

Generalizing the result of the previous question, we see that if πᵢ : Rⁿ → R is defined by πᵢ(x₁, x₂, ..., xₙ) = xᵢ, then D(πᵢ) will be the row $\begin{bmatrix}0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0\end{bmatrix}$, where the 1 appears in the i-th slot. We introduce the notation dxᵢ for the covector field D(πᵢ). So we can rewrite any covector field $\omega(x) = \begin{bmatrix}f_1(x) & f_2(x) & \cdots & f_n(x)\end{bmatrix}$ for n functions fᵢ : Rⁿ → R in the form
$\omega(x) = f_1(x)\,dx_1 + f_2(x)\,dx_2 + \cdots + f_n(x)\,dx_n.$
It turns out that 1-forms, not functions, are the appropriate objects to integrate along curves in Rⁿ. The sequel to this course will focus on the integration of differential forms: we will not touch on it in this course.
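To make the notation concrete, here is a tiny sketch of how a 1-form on R² can be stored in Python, in the same spirit as the omega function used in the next section: a function of a point that returns a 1 × n row matrix. In fact, this particular ω = (2x − y) dx + (−x + 1) dy is the one that appears in the validator below.

Python
# the 1-form omega = (2x - y) dx + (-x + 1) dy, as a row-matrix-valued function
def omega(p):
    x, y = p
    return [[2*x - y, -x + 1]]

print(omega([3, 2]))   # [[4, -2]]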

32

Numerical integration

Integrate a covector field.

Exercise 1 Suppose we have a one-form expressed as a Python function, e.g., omega (which we will often write as ω), which takes a point (expressed as a list) and returns a 1 × n matrix. For example, perhaps we have that omega([7,2,5]) is [[5,3,2]]. Let's only consider the case n = 2, and suppose that ω is the derivative of some mystery function f : R² → R. If we have access to omega and we know that f(3, 2) = 5, can we approximate f(4, 3)? How might we go about this? We can take a path from (3, 2) to (4, 3), and break it up into small pieces; on each piece, we can use the derivative to approximate how a small change to the input will affect the output. And repeat. Do this in Python.

Solution

Python
# suppose the derivative of f is omega, and f(3,2) = 5.
# so omega([3,2]) is (perhaps) [[-4,3]].
#
# integrate(omega) is an approximation to the value of f at (4,3).
#
def integrate(omega):
    return # the value

def validator():
    return abs(integrate( lambda p: [[2*p[0] - p[1], -p[0] + 1]] ) - 7.0) < 0.05

How did you move from (3, 2) to (4, 3)? Did it matter which path you walked along?
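One possible way to fill in integrate is sketched below: walk along the straight segment from (3, 2) to (4, 3) in many small steps, and on each step add ω applied to the small displacement. This is only a sketch of one path and one step size; the exercise invites you to experiment with others.

Python
# a sketch: walk the straight path from (3,2) to (4,3) in small steps,
# using omega to approximate the change of f on each step.
def integrate(omega):
    steps = 1000
    p = [3.0, 2.0]
    value = 5.0                      # f(3, 2) = 5
    dx = (4.0 - 3.0) / steps
    dy = (3.0 - 2.0) / steps
    for _ in range(steps):
        row = omega(p)[0]            # the covector [df/dx, df/dy] at p
        value += row[0] * dx + row[1] * dy
        p = [p[0] + dx, p[1] + dy]
    return value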


33

Python

We approximate derivatives in Python.

There are two different perspectives on the derivative available to us. Suppose f : Rⁿ → Rᵐ is a differentiable function, and we have a point p ∈ Rⁿ. The first perspective on the derivative is the total derivative, which is the linear map Df(p) which sends the vector $\vec v$ to $Df(p)(\vec v)$, recording how much an infinitesimal change in the $\vec v$ direction in Rⁿ will affect the output of f in Rᵐ. The second perspective on the derivative is the Jacobian matrix, which is the matrix of partials given by
$\begin{bmatrix}\frac{\partial f_1}{\partial x_1}(p) & \frac{\partial f_1}{\partial x_2}(p) & \cdots & \frac{\partial f_1}{\partial x_n}(p)\\ \frac{\partial f_2}{\partial x_1}(p) & \frac{\partial f_2}{\partial x_2}(p) & \cdots & \frac{\partial f_2}{\partial x_n}(p)\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1}(p) & \frac{\partial f_m}{\partial x_2}(p) & \cdots & \frac{\partial f_m}{\partial x_n}(p)\end{bmatrix}$

Observation 1 The Jacobian matrix is the matrix representing the linear map Df(p).

This observation can be "seen" with some Python code.

34

Derivative

Code the total derivative.

Exercise 1 Let epsilon be a small, but positive number. Suppose f : R → R has been coded as a Python function f which takes a real number and returns a real number. Seeing as
$f'(x) = \lim_{h\to 0}\frac{f(x + h) - f(x)}{h},$

can you find a Python function which approximates f 0 (x)? Given a Python function f which takes a real number and returns a real number, we can approximate f 0 (x) by using epsilon. Write a Python function derivative which takes a function f and returns an approximation to its derivative. Solution Hint:

1 2 3 4

To approximate this, use (f(x+epsilon) - f(x))/epsilon.

Python epsilon = 0.0001 def derivative(f): def df(x): return (f(blah blah) - f(blah blah)) / blah blah return df

5 6 7 8 9 10 11 12 13 14

def validator(): df = derivative(lambda x: 1+x**2+x**3) if abs(df(2) - 16) > 0.01: return False df = derivative(lambda x: (1+x)**4) if abs(df(-2.642) - -17.708405152) > 0.01: return False return True

Great work! Now let’s do this in a multivariable setting. A function f : Rn → Rm should be stored as a Python function which takes a list with n entries and returns a list with m entries. Solution Hint:

Implement f (x, y, z) = (xy, x + z) as a Python function. You can get away with

def f(v): return [v[0]*v[1],v[0] + v[2]] Python 1 2 3 4 5

def f(v): x = v[0] y = v[1] z = v[2] return # such and such


6 7 8 9 10 11 12

def validator(): if f([3,2,7])[0] != 6: return False if f([3,2,7])[1] != 10: return False return True

Now we provide you with a function add vector which takes two vectors ~v and w ~ and returns ~v + w, ~ and a function scale vector which takes a scalar c and a vector ~v and returns the vector c~v . Finally, vector length(v) computes the length of the vector ~v . Given all of this preamble, write a function D which takes a Python function f : Rn → Rm and returns the function Df : Rn → Lin(Rn , Rm ) which takes a point p ∈ Rn and returns (an approximation to) the linear map Df (p) : Rn → Rm . Solution Hint: def D(ff): def Df(p): f = ff # band-aid over a Python interpreter bug def L(v): return scale_vector( 1/(epsilon), add_vector( f(add_vector(p, scale_vector(epsilon, v))), scale_vector(-1, f(p)) ) ) return L return Df

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Python epsilon = 0.0001 n = 3 m = 2 def add_vector(v,w): return [sum(v) for v in zip(v,w)] def scale_vector(c,v): return [c*x for x in v] def vector_length(v): return sum([x**2 for x in v])**0.5 def D(f): def Df(p): f = f # band-aid over a Python interpreter bug def L(v): # Try "f(p + blah blah) - f(p)" and so on... return L(v) where L = Df(p) return L return Df

18 19 20 21 22 23 24

def validator(): # f(x,y,z) = (3*x^2 + 2*x*y*z, x*y^3*z^2) Df = D(lambda v: [3*v[0]*v[0] + 2*v[0]*v[1]*v[2], v[0]*(v[1]**3)*(v[2]**2)]) Dfp = Df([2,3,4]) Dfpv = Dfp([3,2,1])


if abs(Dfpv[0] - 152) > 1.0: return False if abs(Dfpv[1] - 3456) > 10.0: return False return True

Note that Df(p) is a linear map, so we can represent that linear map as a matrix. We do so in the next activity.


35

Jacobian matrix

Code the Jacobian matrix. In the previous activity, we wrote some code to compute D. Armed with this, we can take a function f : Rn → Rm and a point p ∈ Rn and compute Df (p), the linear map which describes, infinitesimally, how wiggling the input to f will affect its output. Assuming f is differentiable, we have that Df (p) is a linear map, so we can write down a matrix for it. Let’s do so now. Exercise 1 To get started, we begin by computing partial derivatives in Python. To make things easy, let’s differentiate functions like def fi(v): return v[0] * (v[1]**2) In other words, our functions will send an n-tuple to a single real number. In this case where fi (x, y) = xy 2 , we should have that partial(fi,1)([2,3]) is close to 12, since  ∂ xy 2 = 2xy, ∂y and so at the point (2, 3), the derivative is 2 · 2 · 3 = 12. Solution Hint: def partial(fi,j): def derivative(p): p_shifted = p[:] p_shifted[j] += epsilon return (fi(p_shifted) - fi(p))/epsilon return derivative

1 2 3 4 5 6 7 8 9 10 11

Python epsilon = 0.0001 n = 2 # # fi is a function from R^n to R def partial(fi,j): def derivative(p): return # the partial derivative of fi in the j-th coordinate at p return derivative # # this should be close to 12 print partial(lambda v: v[0] * v[1]**2, 1)([2,3])

12 13 14

def validator(): return abs(partial(lambda v: v[0]**2 * v[1]**3, 0)([7,2]) - 112) < 0.01

If we have a function f : Rn → Rm , we’ll encode it as a Python function which takes a list with n entries, and returns a list with m-entries. Let’s write a Python helper for pulling out just the ith component of the output. 98


Solution 1 2 3 4 5 6

Python # if f is a function from R^n to R^m, # then component(f,i) is the function R^n to R, # which just looks at the i-th entry of the output # def component(f,i): return lambda p: # the i-th component of the output

7 8 9

def validator(): return component(lambda v: [v[0],v[1]],1)([1,17]) == 17

Now we put it all together. For a function f : Rn → Rm , the Jacobian matrix is given by   ∂f1 ∂f1 ∂f1  ∂x1 (p) ∂x2 (p) · · · ∂xn (p)    ∂f ∂f2 ∂f2   2 (p) (p) · · · (p)     ∂x1 ∂x2 ∂xn   .. .. .. ..   . . . .     ∂fm ∂fm ∂fm (p) (p) · · · (p) ∂x1 ∂x2 ∂xn Solution Implement the function jacobian which takes a function f : Rn → Rm and n a point p ∈ R , and returns the Jacobian matrix of f at the point p. Hint:

You can write this matrix as

[[partial(component(f,i),j)(p) for j in range(n)] for i in range(m)]

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Python epsilon = 0.0001 n = 3 # the dimension of the domain m = 2 # the dimension of the codomain def component(f,i): return lambda p: f(p)[i] def partial(fi,j): def derivative(p): p_shifted = p[:] p_shifted[j] += epsilon return (fi(p_shifted) - fi(p))/epsilon return derivative # # f is a function from R^n to R^m # jacobian(f,p) is its Jacobian matrix at the point p def jacobian(f,p): return # the Jacobian matrix

17 18 19 20

def validator(): m = jacobian(lambda v: [v[0]**2, (v[1]**3)*(v[0]**2)], [3,7]) return abs(m[1][0] - 2058) < 0.1


36

Relationship

Relate the Jacobian matrix and the total derivative. In the previous activities, we wrote some code to compute Df (p) and the Jacobian matrix of f at the point p. In this activity, we observe that these are related. Exercise 1

Try running the code below.

Solution 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

Python epsilon = 0.0001 n = 3 # the dimension of the domain m = 2 # the dimension of the codomain def component(f,i): return lambda p: f(p)[i] def partial(fi,j): def derivative(p): p_shifted = p[:] p_shifted[j] += epsilon return (fi(p_shifted) - fi(p))/epsilon return derivative def jacobian(f,p): return [[partial(component(f,i),j)(p) for j in range(n)] for i in range(m)] def add_vector(v,w): return [sum(v) for v in zip(v,w)] def scale_vector(c,v): return [c*x for x in v] def vector_length(v): return sum([x**2 for x in v])**0.5 def D(ff): def Df(p): f = ff def L(v): return scale_vector( 1/(epsilon), add_vector( f(add_vector(p, scale_vector(epsilon, v))), scale_vector(-1, f(p)) ) ) return L return Df # def dot_product(v,w): return sum([x[0] * x[1] for x in zip(v,w)]) def apply_matrix(m,v): return [dot_product(row,v) for row in m] # f = lambda p: [p[0] + p[0]*p[1], p[1] * p[2]**2] p = [1,2,0] v = [2,2,1] print apply_matrix(jacobian(f,p),v) print D(f)(p)(v)

40 41

def validator():


# I just want them to try running this return True

In this case, we set f (x, y, z) = (x + xy, yz 2 ), and we computed Df (p)(~v ) in two different ways. The two different methods were close—but not exactly the same. Why are they not exactly the same? The D function is computing Df (p) by comparing f (p + ~v ) to f (p). In contrast, the jacobian function computes Df (p) by computing partial derivatives, so we are not actually computing f (p+~v ) in that case, but rather, f (p+~ei ) for various i’s. That the way f changes when wiggling each component separately has anything to do with what happens to f when we wiggle the inputs together boils down to the assumption that f be differentiable. This relationship is true infinitesimally, but here we’re working with  = 0.0001, so it is not surprising that it is not true on the nose.


37

The Chain Rule

Differentiating a composition of functions is the same as composing the derivatives of the functions.

The chain rule of single variable calculus tells you how the derivative of a composition of functions relates to the derivatives of each of the original functions. The chain rule of multivariable calculus will work analogously.

Let f : R² → R³ and g : R³ → R. The only things you know about f are that f(2, 3) = (3, 0, 0) and Df(2, 3) has matrix $\begin{bmatrix}4 & -1\\ 2 & 3\\ 0 & 1\end{bmatrix}$. The only thing you know about g is that g(3, 0, 0) = 4 and Dg(3, 0, 0) has matrix $\begin{bmatrix}4 & 5 & 6\end{bmatrix}$.

Question 1

Solution    a g(f (2 + a, 3 + b)) ≈ g f (2, 3) + Df (2, 3) using the linear approximab tion to f at (2, 3) Hint:

Hint:       4 −1   a a  3 g f (2, 3) + Df (2, 3) = g (3, 0, 0) + 2 b b 0 1    4a − b = g (3, 0, 0) + 2a + 3b b   4a − b ≈ g(3, 0, 0) + Dg(3, 0, 0) 2a + 3b using the linear approximation to g at (3, 0, 0) b     4a − b ≈ 4 + 4 5 6 2a + 3b b 

= 4 + (16a − 4b) + (10a + 15b) + (6b) = 4 + 26a + 17b Assuming a, b ∈ R2 are small, (g ◦ f )(2 + a, 3 + b) ≈ 4 + 26a + 17b So the matrix of D(g ◦ f )(2, 3) is

Solution Hint:

Dg f (2,3) ◦ Df (2,3) has matrix



4

5

  4 6 2 0

 −1  3  = 4(4) + 5(2) + 6(0) 1   = 26 17


4(−1) + 5(3) + 6(1)




Notice that D(g ◦ f )(2, 3) is the same as Dg f (2,3) ◦ Df (2,3) ! You should really check this to make sure. Look at the hint to see how to do the composition if you need help.

The heuristic approximations in the last question lead us to expect the following theorem: Theorem 2 Let f : Rn → Rm and g : Rm → Rk be differentiable functions, and p ∈ Rn . Then D(g ◦ f )(p) = Dg(f (p)) ◦ Df (p) In other words, the derivative of a composition of functions is the composition of the derivatives of the functions. The trickiest part of the above theorem is remembering that you need to apply Dg at the point f (p). The proof of this theorem is a little bit beyond what we want to require you to think about in this course, but the essential idea of the proof is just that (g ◦ f )(p + ~h) ≈ g(f (p) + Df (p)(~h)) ≈ g(f (p)) + Dg(f (p))(Df (p)(~h)) You should understand this essential idea, even if you do not understand the full proof. We cover the full proof in an optional section after this one. Question 3 Let f : R2 → R2 be defined by f (r, t) = (r cos(t), r sin(t)). Let g : R2 → R be defined by g(x, y) = x2 + y 2 . Don’t let the choice of variable names scare you. Solution Hint:



∂  ∂r r cos(t)  ∂ r sin(t) ∂r

Hint:

 cos(t) sin(t)

 ∂ r cos(t) ∂t  ∂ r sin(t) ∂t

−r sin(t) r cos(t)



What is the Jacobian of f at (r, t)? Solution Hint:



∂ 2 (x + y 2 ) ∂x

∂ 2 (x + y 2 ) ∂y



Hint: 

2x

2y



What is the Jacobian of g at (x, y)?


Solution Hint: above.

Just plug (r cos(t), r sin(t)) into the formula for the Jacobian of g you obtained

Hint:

The answer is  2r cos(t)

2r sin(t)



What is the matrix of Dg(f (r, t))? Solution Hint: 

2r cos(t)

  cos(t) 2r sin(t) sin(t)

  −r sin(t) = 2r cos2 (t) + 2r sin2 (t) r cos(t)   = 2r 0

−2r2 cos(t) sin(t) + 2r2 sin(t) cos(t)

What is the matrix of Dg(f (r, t)) ◦ Df (r, t)? Solution Hint: (g ◦ f )(r, t) = g(r cos(t), r sin(t)) = (r cos(t))2 + (r sin(t))2 = r2 (cos2 (t) + sin2 (t)) = r2 Compute the composite directly: (g ◦ f )(r, t) =r2 Solution

Compute D(g ◦ f )(r, t) directly from the formula for (g ◦ f ).

This example has demonstrated the chain rule in action! Computing the derivative of the composite was the same as composing the derivatives.
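You can also watch the chain rule happen numerically. The sketch below reuses the finite-difference Jacobian idea from the earlier Python activities (rewritten here so it is self-contained; the helper names are ours) to compare the Jacobian of g ∘ f with the matrix product of the Jacobians for the polar-coordinates example above.

Python
import math

epsilon = 1e-6

def jacobian(f, p, m):
    # finite-difference Jacobian of f : R^n -> R^m at the point p
    J = []
    for i in range(m):
        row = []
        for j in range(len(p)):
            shifted = list(p)
            shifted[j] += epsilon
            row.append((f(shifted)[i] - f(p)[i]) / epsilon)
        J.append(row)
    return J

def f(p):                      # f(r, t) = (r cos t, r sin t)
    r, t = p
    return [r * math.cos(t), r * math.sin(t)]

def g(p):                      # g(x, y) = x^2 + y^2
    x, y = p
    return [x**2 + y**2]

def g_of_f(p):
    return g(f(p))

p = [2.0, 0.7]
print(jacobian(g_of_f, p, 1))   # close to [[2r, 0]] = [[4, 0]]
# compare with the matrix product Dg(f(p)) Df(p):
A = jacobian(g, f(p), 1)
B = jacobian(f, p, 2)
print([[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(1)])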

The product rule from single variable calculus can be proved by invoking the chain rule for multivariable functions. We supply a basic outline of the proof below: can you complete this outline to give a full proof? Let f, g : R → R be two differentiable functions. Let p : R → R be defined by p(t) = f (t)g(t). Let Both : R → R2 be defined by Both(t) = (f (t), g(t)). Finally let Multiply : R2 → R be defined by Multiply(x, y) = xy. Then we can differentiate Multiply(Both(t)) = p(t) using the multivariable chain rule. This should result in the product rule.  0  f (t) D (Both) has the matrix 0 at the point t ∈ R. g (t)   D (Multiply) has the matrix y x at the point (x, y) ∈ R2 So p0 (t) = D (Multiply) Both(t) ◦ D (Both) t by the multivariable chain rule. 104




So p0 (t) = D (Multiply) Both(t) ◦ D (Both) t = D (Multiply) (f (t),g(t)) ◦ D (Both) t     f 0 (t) = g(t) f (t) g 0 (t) = g(t)f 0 (t) + f (t)g 0 (t) This is the product rule from single variable differential calculus.


38

Proof of the chain rule

This section is optional This section is optional Before we beginning the proof of the chain rule, we need to introduce a new piece of machinery: Let L : Rn → Rm be a linear map. Let S n−1 = {~v ∈ Rn : |~v | = 1} be the unit sphere in Rn . Then there is a function F : S n−1 → Rm defined on this sphere which takes a vector and returns the length of its image under L, F (~v ) = |L(~v )|. Definition 1 The maximum value of the function F is called the operator norm of L, and is written |L|op . The fact that the operator norm of a linear transformation exists is a kind of deep (for this course) piece of analysis. It follows from the fact that the sphere is compact1 , and continuous functions on compact spaces must achieve a maximum value. For example, every continuous function on the closed interval [0, 1] ⊂ R has a maximum, although not every continuous function on the open interval (0, 1) does. The essential difference between these two intervals is that the first one is compact, while the second one is not. The essential property of the operator norm is that |L(~v )| ≤ |L|op |~v | for each vector ~v ~v is a unit ~v ∈ Rn . This is true just because we have L( ) ≤ |L|op because |~v | |~v | vector, and by the definition of the operator norm. This fact will be essential to us as we prove the chain rule. Let f : Rn → Rm and g : Rm → Rk be differentiable functions. We want to show that D(g ◦ f )(p) = Dg(f (p)) ◦ Df (p). All we know at the beginning of the day is that f (p + ~h) = f (p) + Df (p)(~h) + fError(~h) with lim

Let f : Rⁿ → Rᵐ and g : Rᵐ → Rᵏ be differentiable functions. We want to show that D(g ◦ f)(p) = Dg(f(p)) ◦ Df(p). All we know at the beginning of the day is that

f(p + ~h) = f(p) + Df(p)(~h) + fError(~h)   with   lim_(~h→0) fError(~h) / |~h| = 0

and

g(f(p) + ~u) = g(f(p)) + Dg(f(p))(~u) + gError(~u)   with   lim_(~u→0) |gError(~u)| / |~u| = 0.

¹ http://en.wikipedia.org/wiki/Compact_space


We will simply compose these two formulas for f and g, and try to get some control on the complicated error term which results.

g(f(p + ~h)) = g( f(p) + Df(p)(~h) + fError_p(~h) )
             = g(f(p)) + Dg(f(p))( Df(p)(~h) + fError_p(~h) ) + gError( Df(p)(~h) + fError_p(~h) )
             = (g ◦ f)(p) + Dg(f(p)) ◦ Df(p)(~h)
               + Dg(f(p))( fError_p(~h) ) + gError( Df(p)(~h) + fError_p(~h) )

This looks pretty horrible, but at least we can see the error term (the last two summands) that we have to get some control over. In particular we will have proven the chain rule if we can show that

lim_(~h→0) [ Dg(f(p))( fError_p(~h) ) + gError( Df(p)(~h) + fError_p(~h) ) ] / |~h| = 0

Since

0 ≤ | Dg(f(p))( fError_p(~h) ) + gError( Df(p)(~h) + fError_p(~h) ) |
  ≤ | Dg(f(p))( fError_p(~h) ) | + | gError( Df(p)(~h) + fError_p(~h) ) |

by the triangle inequality, we will have the result if we can prove separately that

lim_(~h→0) | Dg(f(p))( fError_p(~h) ) | / |~h| = 0

and

lim_(~h→0) | gError( Df(p)(~h) + fError_p(~h) ) | / |~h| = 0

Let's prove the first limit. This is where operator norms enter the picture.

| Dg(f(p))( fError_p(~h) ) | / |~h| ≤ |Dg|_op |fError_p(~h)| / |~h|

Since |fError_p(~h)| / |~h| → 0 as ~h → 0, we see that | Dg(f(p))( fError_p(~h) ) | / |~h| must go to 0 as well, since it is bounded above by a constant multiple of something which goes to 0 (and bounded below by 0).

For the other part of the error,

| gError( Df(p)(~h) + fError_p(~h) ) | / |~h|
   = ( | gError( Df(p)(~h) + fError_p(~h) ) | / | Df(p)(~h) + fError_p(~h) | ) · ( | Df(p)(~h) + fError_p(~h) | / |~h| )

The first factor in this expression goes to 0 as ~h → 0 because g is differentiable. So all we need to do is make sure that the second factor is bounded.

| Df(p)(~h) + fError_p(~h) | / |~h| ≤ |Df(p)(~h)| / |~h| + |fError_p(~h)| / |~h|     by the triangle inequality
                                    ≤ |Df(p)|_op |~h| / |~h| + |fError_p(~h)| / |~h|
                                    = |Df(p)|_op + |fError_p(~h)| / |~h|

Now the second term in this expression goes to 0 as ~h → 0 since f is differentiable. So the whole expression is bounded by, say, |Df(p)|_op + 1/2 if ~h is small enough.

Now we are done! We have successfully shown that the nasty error term Error satisfies

lim_(~h→0) Error(~h) / |~h| = 0

Thus g ◦ f is differentiable at p, and its derivative is given by Dg(f(p)) ◦ Df(p). QED


39

End of Week Practice

Practice doing computations.

This section just contains practice problems on the material we have learned this week. These problems do not have detailed hints: clicking on the hint will immediately reveal the answer. Use them to test your knowledge. If you are confused about how to do any of these problems, ask your peers in one of the forums.¹

¹ YouTube link: http://www.youtube.com/watch?v=4_Yuldynomc

40

Jacobian practice

Practice computing the Jacobian.

Question 1   Compute the Jacobian of f : R² → R³ defined by f(x, y) = (sin(xy), x²y³, x³).

Solution Hint:

J = [ y cos(xy)   x cos(xy) ]
    [ 2xy³        3x²y²     ]
    [ 3x²         0         ]

Question 2   Compute the Jacobian of f : R⁴ − {y = 0} → R¹ defined by f(x, y, z, t) = x²yz³t⁴ + x/y.

Solution Hint:

J = [ 2xyz³t⁴ + 1/y   x²z³t⁴ − x/y²   3x²yz²t⁴   4x²yz³t³ ]

Question 3   Compute the Jacobian of f : R² − {(0, 0)} → R² defined by f(x, y) = ( x/(x² + y²), y/(x² + y²) ).

Solution Hint:

J = [ (y² − x²)/(x² + y²)²    −2xy/(x² + y²)²       ]
    [ −2xy/(x² + y²)²         (x² − y²)/(x² + y²)²  ]

Question 4   Compute the Jacobian of f : R → R⁴ defined by f(t) = (cos(t), sin(t), t, t²).

Solution Hint:

J = [ −sin(t) ]
    [  cos(t) ]
    [  1      ]
    [  2t     ]
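These answers are easy to double-check with a computer algebra system. A minimal sketch with sympy (not required for the course, just a way to verify your hand computations):

Python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')

# Question 1: f : R² → R³
print(sp.Matrix([sp.sin(x*y), x**2*y**3, x**3]).jacobian([x, y]))

# Question 2: f : R⁴ − {y = 0} → R¹
print(sp.Matrix([x**2*y*z**3*t**4 + x/y]).jacobian([x, y, z, t]))

# Question 4: f : R → R⁴
print(sp.Matrix([sp.cos(t), sp.sin(t), t, t**2]).jacobian([t]))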


41

Gradient practice

Practice computing gradient vectors.

Question 1   Compute ∇f where f : R³ → R is defined by f(x, y, z) = x² + xyz.

Solution Hint:

∇f = [ 2x + yz ]
     [ xz      ]
     [ xy      ]

Question 2   Compute ∇f where f : R⁴ → R is defined by f(x, y, z, t) = cos(xy) sin(zt).

Solution Hint:

∇f = [ −y sin(xy) sin(zt) ]
     [ −x sin(xy) sin(zt) ]
     [  t cos(xy) cos(zt) ]
     [  z cos(xy) cos(zt) ]

Question 3   Compute ∇f where f : R² − {(0, 0)} → R is defined by f(x, y) = x/y.

Solution Hint:

∇f = [ 1/y   ]
     [ −x/y² ]

42

Linear approximation practice

Practice doing some linear approximations.

Question 1   Let f : R² → R² be given by f(x, y) = (x² − y², 2xy). Use the linear approximation to f at (3, 2) to approximate f(3.1, 2.3). Give your answer as a column vector.

Solution Hint:

[ 4.4  ]
[ 14.2 ]

Question 2   Let f : R¹ → R³ be given by f(t) = (t, t², t³). Use the linear approximation to f at 1 to approximate f(1.1). Give your answer as a column vector.

Solution Hint:

[ 1.1 ]
[ 1.2 ]
[ 1.3 ]

Question 3   Let f : R³ → R² be given by f(x, y, z) = (xy, yz). Use the linear approximation to f at (1, 2, 3) to approximate f(1.1, 1.9, 3.2). Give your answer as a column vector.

Solution Hint:

[ 2.1 ]
[ 6.1 ]
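The pattern in all three problems is f(p + h) ≈ f(p) + Df(p) h. Here is a brief numpy sketch of Question 3; the matrix Df(p) is typed in by hand from the Jacobian [ y  x  0 ; 0  z  y ] evaluated at p.

Python
import numpy as np

def f(x, y, z):
    return np.array([x * y, y * z])

p = np.array([1.0, 2.0, 3.0])
h = np.array([0.1, -0.1, 0.2])                 # (1.1, 1.9, 3.2) = p + h

Df_p = np.array([[2.0, 1.0, 0.0],              # Jacobian [y, x, 0; 0, z, y] at p
                 [0.0, 3.0, 2.0]])

print(f(*p) + Df_p @ h)    # linear approximation: [2.1, 6.1]
print(f(*(p + h)))         # exact value: [2.09, 6.08]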


43

Increasing and decreasing

Consider whether a function is increasing or decreasing.

Question 1   Let f : R² → R be given by f(x, y) = (x + 2y)² + y³. At the point (−1, 5), is f increasing or decreasing in the direction [ 1 ; −1 ]?

(a) Increasing
(b) Decreasing   X

Solution Hint: By computing the directional derivative in that direction, we see that f is decreasing.

Question 2   Let f : R³ → R be given by f(x, y, z) = (x + y)/z. At the point (3, 2, 6), is f increasing or decreasing in the direction [ 1 ; −1 ; 2 ]?

(a) Increasing
(b) Decreasing   X

Solution Hint: By computing the directional derivative in that direction, we see that f is decreasing. You could also see that f is decreasing by noting that the numerator is left unchanged when x and y are increased equally in opposite directions, but the denominator is increasing.

Question 3   Let f : R⁴ → R be given by f(x, y, z, t) = (xy² + z²)/t. At the point (0, 1, 5, −1), is f increasing or decreasing in the direction [ 1 ; 1 ; −2 ; 3 ]?

(a) Increasing
(b) Decreasing   X

Solution Hint: By computing the directional derivative in that direction, we see that f is decreasing.
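A quick numerical check of Question 1, assuming numpy: the sign of ∇f(p) · v tells you increasing versus decreasing. This sketch uses the gradient (2(x + 2y), 4(x + 2y) + 3y²) computed by hand.

Python
import numpy as np

def grad_f(x, y):
    # gradient of f(x, y) = (x + 2y)² + y³, computed by hand
    return np.array([2 * (x + 2 * y), 4 * (x + 2 * y) + 3 * y**2])

p = (-1.0, 5.0)
v = np.array([1.0, -1.0])

print(grad_f(*p) @ v)   # negative, so f is decreasing in the direction v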


44

Stationary points

Find stationary points.

Question 1   Let f : R² → R² be defined by f(x, y) = (x² − x − y², 2xy − y). There is one stationary point of f (one place where the derivative vanishes). What is this stationary point? Give your answer as a column vector.

Question 2   Let f : R³ → R be defined by f(x, y, z) = x² + xy + xz + z². There is one stationary point of f (one place where the derivative vanishes). What is this stationary point? Give your answer as a column vector.

Question 3   Let f : R² → R³ be given by f(x, y) = (x², y − x, √(x² − 4x + 4 + y²)).

Question 4   f is differentiable everywhere except for one point. What is that point? Give your answer as a column vector.
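For Question 2 the stationary point can be found by setting the gradient to zero. A short sympy sketch, again only for checking your hand computation:

Python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + x*y + x*z + z**2

grad = [sp.diff(f, v) for v in (x, y, z)]
print(sp.solve(grad, (x, y, z)))   # {x: 0, y: 0, z: 0}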


45

Hypersurfaces

Find tangent planes and lines.

Question 1   Let f : R² → R be defined by f(x, y) = x² − y².

Question 2   The equation of the tangent line to the curve f(x, y) = 5 at the point (3, 2) is 0 = 6(x − 3) − 4(y − 2).

Hint: 6(x − 3) − 4(y − 2) = 0

Question 3   Let f : R³ → R be defined by f(x, y, z) = (x − z)² − (y + z)².

Question 4   The equation of the tangent plane to the surface f(x, y, z) = 3 at the point (3, 0, 1) is 0 = 4(x − 3) − 2y − 6(z − 1).

Hint: 4(x − 3) − 2y − 6(z − 1) = 0


46

Use the chain rule

Compute using the chain rule.

Question 1   Let f : R² → R² be defined by f(x, y) = (x²y + x, x/y). Let g : R² → R³ be defined by g(x, y) = (x + y, x − y, xy). Use the chain rule to find the Jacobian of (g ◦ f) at the point (a, b).

Solution Hint:

[ 2ab + 1 + 1/b    a² − a/b² ]
[ 2ab + 1 − 1/b    a² + a/b² ]
[ 3a² + 2a/b       −a²/b²    ]

Question 2   Let f : R² → R be defined by f(x, y) = x³ + y³. Let g : R → R² be defined by g(x) = (x, sin(x)). Use the chain rule to find the Jacobian of (g ◦ f) at the point (a, b).

Solution Hint:

[ 3a²                  3b²                ]
[ 3a² cos(a³ + b³)     3b² cos(a³ + b³)   ]


47

Abstraction

Not every vector space consists of lists of numbers. So far, we’ve been thinking of all of our vector spaces as being Rn . We now relax this condition by providing a definition of a vector space in general.


48

Vector spaces

Vector spaces are sets with a notion of addition and scaling. Until now, we have only dealt with the spaces Rn . We now begin the journey of understanding more general spaces. The crucial structure we need to talk about linear maps on Rn are addition and scalar multiplication. Addition is a function which takes a pair of vectors ~v , w ~ ∈ Rn n and returns a new vector ~v + w ~ ∈ R . Scalar multiplication is function which takes a vector ~v ∈ Rn and a scalar c ∈ R, and returns a new vector cv. Definition 1 A vector space is a set V equipped with a notion of addition, which is a function that takes a pair of vectors ~v , w ~ ∈ V and returns a new vector ~v + w, ~ and a notion of scalar multiplication, which is a function that takes a scalar c ∈ R and a vector ~v ∈ V and returns a new vector c~v ∈ V . These operations are subject to the following requirements: For each ~v , w ~ ∈ V , ~v + w ~ =w ~ + ~v

Commutativity Associativity

For each ~v , w, ~ ~u ∈ V , ~v + (w ~ + ~u) = (~v + w) ~ + ~u

Additive identity Additive inverse

There is a vector called ~0 ∈ V with ~v + ~0 = ~v for each ~v ∈ V . For each ~v ∈ V there is a vector w ~ ∈ V with v + w = ~0

Multiplicative identity For each ~v ∈ V , 1~v = ~v (here 1 is really the real number 1 ∈ R, and 1~v is the scalar product of 1 with v) Distributivity of scalar multiplication over vector addition V and c ∈ R, c(~v + w) ~ = c~v + cw ~ Distributivity of vector multiplication under scalar addition R and ~v ∈ V , (a + b)~v = a~v + b~v

For each ~v , w ~∈ For each a, b ∈

Let’s list off some nice examples of vector spaces. Example 2 Our old friend Rn is a vector space with the notions of vector addition and scalar multiplication we introduced in Week 1. Example 3 Let Poly2 be the set of all polynomials of degree at most 2 in one variable. For example 1 + 2x ∈ Poly2 and 3 + x2 ∈ Poly2 . Then the usual way of adding polynomials and multiplying them by constants turns Poly2 into a vector space. For example 2 (1 + 2x) + (3 + x2 ) = 5 + 4x + x2 . The next example will probably be the most important example for us. Thinking of matrices as points in a vector space will be useful for us when we start thinking about the second derivative. Example 4 Let Matnm be the collection of all n × m matrixes. Then Matnm is a vector space with the usual notion of matrix addition and scalar multiplication. The following is an important, but very small, first step into the world of functional analysis. 118

48

Vector spaces

Example 5 Let C([0, 1]) be the set of all continuous functions from [0, 1] to R. Then C([0, 1]) is a vector space with addition and scalar multiplication defined pointwise (f + g is the function whose value at x is f (x) + g(x), and cf is the function whose value at x is cf (x)). Realizing that solution sets of certain differential equations form vector spaces is important. Let V be the set of all smooth functions f : R → R which satisfy the differential d2 f df equation +3 + 4f (x) = 0. Addition is the usual addition of functions, 2 dx dx meaning that f + g denotes the function which sends x to f (x) + g(x). Scalar multiplication cf means the function that sends x to c f (x). With this, we can show that V is a vector space. df d2 f +3 + 4f (x) = 0? Is this What if we change the differential equation to dx2 dx still a vector space? Why or why not? We already know that function addition and scalar multiplication of functions satisfy all of the axioms of a vector space: what we do not know is whether function addition and scalar multiplication are well defined for solutions to this differential equation. We need to check that if f and g are solutions, and c ∈ R, then f + g and cf are as well. d2 d d2 f d2 g df df (f + g) + 3 (f + g) + 4(f + g)(x) = ‘+ 2 +3 +3 + 4f (x) + 4g(x) 2 2 dx dx dx dx dx dx d2 f df d2 g dg = + 3 + 4f (x) + +3 + 4g(x) 2 2 dx dx dx dx =0 So f + g is a solution to the DE if f and g are. d d2 f df d2 (cf (x)) + 3 (cf (x)) + 4cf (x) = c( 2 + 3 + 4f (x)) = 0 2 dx dx dx dx So cf is a solution to the DE if f is, and c ∈ R So addition and scalar multiplication are well defined on V , giving V the structure of a vector space. We are dealing with a level of abstraction here that you may not have met before. It is worthwhile taking some time to prove certain “obvious” (though they are not so obvious) statements formally from the axioms: Prove that in any vector space, there is a unique additive identity. In other words, if V is a vector space, and there are two elements ~0 ∈ V and ~00 ∈ V so that for each ~v ∈ V , ~v + ~0 = ~v and ~v + ~00 = ~v , then ~0 = ~00 . Every line of your proof should be justified with a vector space axiom! We will start with ~0, and use our vector space axioms to construct a string of equalities ending in ~00 ~0 = ~0 + ~00 because ~00 is an additive identity = ~00 + ~0 by the commutativity of vector addition = ~00 because ~0 is an additive identity 119

48

Vector spaces

So ~0 = ~00 . Prove that each element of a vector space has a unique (only one) additive inverse. Let ~v ∈ V . Assume that both w ~ 1 and w ~ 2 are both additive inverses of ~v . We will show that w ~1 = w ~2 w ~1 = w ~ 1 + ~0 by the definition of the additive identity =w ~ 1 + (~v + w ~ 2 ) because ~v and w ~ 2 are additive inverses = (w ~ + ~v ) + w ~ 2 by associativity of vector addition = ~0 + w ~ 2 because ~v and w ~ 1 are additive inverses =w ~ 2 by the definition of the additive identity Let V be a vector space. Prove that 0~v = ~0 for every ~v ∈ V . (Note: 0 means different things on the different sides of the equation! On the left hand side, it is the scalar 0 ∈ R, whereas on the right hand side it is the zero vector ~0 ∈ V ) 0~v = (0 + 0)~v nothing funny here: 0 + 0 = 0 = 0~v + 0~v by the distributivity of vector multiplication under scalar addition

So 0~v = 0~v + 0~v . Now let w ~ be the additive inverse of 0~v , and add it to both sides of the equation: 0~v + w = (0~v + 0~v ) + w ~ 0~v + w = 0~v + (0~v + w) ~ by the associativity of vector addition ~0 = 0~v + ~0 by the definition of additive inverses ~0 = 0~v by the definition of the additive identity QED Let V be a vector space. Prove that a~0 = ~0 for every a ∈ R a~0 = a(~0 + ~0) by definition of the additive identity = a~0 + a~0 by the distributivity of scalar multiplication over vector adddition So a~0 = a~0 + a~0 Let w ~ be the additive inverse of a~0. Adding w ~ to both sides we have   a~0 + w ~ = a~0 + a~0 + w ~   a~0 + w ~ = a~0 + ~0 + w ~ by associativity of vector addition ~0 = a~0 + ~0by definition of additive inverses ~0 = a~0 by definition of the additive identity 120

48

Vector spaces

QED Let V be a vector space. Prove that (−1)~v is the additive inverse of ~v for every ~v ∈ R. The proof of this uses the “rat poison principle”: if you want to show that something is rat poison, try feeding it to a rat! In this case we want to see if (−1)~v is the additive inverse of ~v , so we should try adding it to ~v . (−1)~v + ~v = (−1)~v + 1~v Multiplicative identity property = (−1 + 1)~v distributivity of vector multiplication over scalar addition = ~0 by one of the theorems

= 0~v So, indeed, (−1)~v is an additive inverse of ~v . We already proved uniqueness of additive inverses above, so we are done. We will often simply write −~v for the additive inverse of ~v in the future.


49

Linear maps, redux

Linear maps respect scalar multiplication and vector addition. Definition 1 linear map if

Let V and W be two vector spaces. A function L : V → W is a

Respects vector addition

For all ~v1 , ~v2 ∈ V , L(~v1 + ~v2 ) = L(~v1 ) + L(~v2 )

Respects scalar multiplication

For all c ∈ R and ~v ∈ V , L(c~v ) = cL(~v )

If the domain and codomain of a linear map are both V , then we may call it a linear operator to emphasize this fact. For instance, you might hear someone say “L : V → V is a linear operator.” Let V be the space of all polynomials in one variable x, and W = R. For each real number a ∈ R, define the function Evala : V → W defined by Evala (p) = p(a). Show that Evalc is a linear map. Let p1 , p2 ∈ V . Then Evala (p1 + p2 ) = p1 (a) + p2 (a) = Evala (p1 ) + Evala (p2 ). Also if c ∈ R, then Evala (cp) = cp(a) = cEvala (p). So Evala is a linear map. To make sure linear maps work the way we expect them to in this new context, and to flex our brains a little bit, let’s prove some facts about linear functions: Let L : V → W be a linear map. Show that L(~0) = ~0. (Note: ~0 means different things on either side of the equation. On the LHS it means the additive identity of V , while on the RHS it means the additive identity of W ). L(~0) = L(0~0) we proved in the last section that 0~v = ~0 for any ~v = 0L(~0) because L respects scalar multiplication = ~0 by the same reasoning quoted above Another way to do this would be by starting with L(~0) = L(~0 + ~0) and using the fact that L respects vector addition. Try this proof out too! Let V and W be vector spaces, and define a function Zero : V → W by Zero(~v ) = ~0 for all ~v ∈ V . Show that Zero is a linear function. Let ~v1 , ~v2 ∈ V . Then Zero(~v1 + ~v2 ) = ~0 = ~0 + ~0 = Zero(~v1 ) + Zero(~v2 ) So Zero respects vector addition Let ~v ∈ V and c ∈ R. Zero(c~v ) = ~0 = c~0 = cZero(~v ) So Zero respects scalar multiplication. 122

50

Python

Certain Python functions form a vector space?

Let P be the collection of all Python functions f with the properties that

• f accepts a single numeric parameter,
• f returns a single numeric parameter, and
• no matter what number x is, the function call f(x) successfully returns a number.

We'll say that two Python functions are "equal" if they produce the same outputs for the same inputs. Now the collection P (arguably) forms a vector space. I say "arguably" because "numbers" in Python aren't real numbers, but let's just play along and pretend that they are.

Question 1   What function plays the role of ~0 in P?

Solution

Python
def zero(x):
    # one possible implementation: the function that returns 0 for every input
    return 0

def validator():
    return (zero(17) == 0)

Suppose we have two functions f and g. What is their sum?

Solution

Python
def vector_sum(f, g):
    # return a new Python function which is the sum of f and g
    return lambda x: f(x) + g(x)

def validator():
    return (vector_sum(lambda x: x**2, lambda x: x**3)(3) == 36)

Now suppose we have a function f and a scalar c. What is c times f?

Solution

Python
def scalar_multiple(c, f):
    # return a new Python function which is c*f
    return lambda x: c * f(x)

def validator():
    return (scalar_multiple(17, lambda x: x**2)(2) == 68)

Now suppose we have a function f and a point a. The evaluation map sends f ∈ P to the value f(a).

Solution

Python
def scalar_multiple(c, f):
    return (lambda x: c * f(x))

def vector_sum(f, g):
    return (lambda x: f(x) + g(x))

def evaluation_map(a, f):
    # return the value of f at the point a
    return f(a)

# Now note that evaluation_map(a, vector_sum(f, g)) = evaluation_map(a, f) + evaluation_map(a, g)
f = lambda x: x**2
g = lambda x: x**3
a = 3
print(evaluation_map(a, vector_sum(f, g)))
print(evaluation_map(a, f) + evaluation_map(a, g))

def validator():
    return (evaluation_map(17, (lambda x: x**2)) == 289)

This is an example of the fact that “evaluation at a” is a linear map from P to the underlying number system. Finally, some food for thought: in a little while, we’ll be thinking about “dimension.” Keep in mind the following question: what is the dimension of the vector space P? You may want to think about this in the ideal world where “Python functions” take honest real numbers as their input and outputs.


51

Bases

Basis vectors span the space without redundancy. In our study of the vector spaces Rn , we have  relied quite heavily on the “standard     0   0 1 0 0   1 0  ..  1             basis vectors” ~e1 = 0 , ~e2 = 0 , ~e3 = 0 , . . . , ~en =  . . We’ll write E\ for    ..   ..  0  ..  . . . 1 0 0 0 the collection of vectors (~e1 , ~e2 , . . . , ~en ). A great feature of these vectors is that they span all of Rn : every vector ~v ∈ Rn can be written in the form ~v = x1~e1 + x2~e2 + · · · + xn~en . What is even better is that this representation is unique: if I also have that ~v = y1~e1 + y2~e2 + · · · + yn~en , then x1 = y1 , x2 = y2 , . . . , xn = yn . Our goal in this section will be to find similarly nice sets of vectors in an abstract vector space. Question 1 the form

A linear combination of two vectors ~v and w ~ is an expression of α~v + β w ~

for some numbers α, β ∈ R.

    1 3 ~ = 5? Which of the following vectors is a linear combination of ~v = 2 and w 1 1

Solution   1 (a) −8 −1   −1 (b)  8 1

X

So not every vector in R3 is a linear combination of ~v and w. ~ Vectors which are a linear combination of ~v and w ~ are said to be in the “span” of ~v and w. ~ Let’s make this more general for more than just two vectors.

Definition 2 The span of an ordered list of vectors (~v1 , ~v2 , . . . , ~vn ) is the set of all linear combinations of the ~vi . Span(~v1 , ~v2 , . . . , ~vn ) = {a1~v1 + a2~v2 + · · · + an~vn : ai ∈ R} 125

51

Bases

Question 3 Solution (a) Yes. (b) No.

Do

    1 1 and together span the vector space R2 ? 1 0

X

  1 Indeed, every vector in R can be written as a linear combination of and 1   1 . Prove it! 0       1 0 1 Since we already know that and span R2 , it is enough to show that 0 1 1     1 0 and span these two vectors, i.e. we need only show that is in the span of 0 1 these two  vectors.      0 1 1 But = + −1 , so we are done. 1 1 0 To be a bit more explicit, we can write any vector       x 1 0 =x +y y 0 1       1 1 1 =x +y + −1 0 1 0     1 1 = (x − y) +y 0 1     1 1 So we have expressed any vector as a linear combination of and . 1 0 2

Question 4

Can every polynomial of degree at most 2 be written in the form α(1) + β(x − 1) + γ(x − 1)2 ?

Solution (a) Yes. (b) No.

X

In other words: the polynomials 1, x − 1, and (x − 1)2 span the vector space of polynomials of degree at most 2. Prove it! Let p(x) = a0 + a1 x + a2 x2 . Then p(x) = a0 + a1 [(x − 1) + 1] + a2 [(x − 1) + 1]2 = a0 + a1 (x − 1) + a1 + a2 [(x − 1)2 + 2(x − 1) + 1] = (a0 + a1 + a2 )1 + (a1 + 2a2 )(x − 1) + a2 (x − 1)2 so we have expressed every polynomial of degree at most 2 as a linear combination of 1, (x − 1) and (x − 1)2 . You could also solve this problem by appealing to Taylor’s theorem in one variable calculus. Can you see how?


52

Dimension

Basis vectors span the space without redundancy. Definition 1 A vector space is called finite dimensional if it has a finite list of spanning vectors. A space which is not finite dimensional is called infinite dimensional. Question 2

The space P of all polynomials in one variable x is:

Solution (a)

Infinite dimensional.

(b)

Finite dimensional.

X

Can you prove it? Suppose that P were finite dimensional, and then deduce a contradiction to show that it is impossible. Let p1 , p2 , . . . , pn be a finite list of vectors. Since this list of polynomials is finite they must be bounded in degree, i.e. the degree of pi must be less than some k for each i. But a linear combination of polynomials of degree at most k is also of degree at most k. So the polynomial xk+1 6∈ Span(p1 , p2 , . . . , pn ). Thus no finite list of polynomials spans all of P . So P is infinite dimensional.

Definition 3 Let V be a vector space. An ordered list of vectors (v1 , v2 , . . . , vn ) where all the vi ∈ V is linearly independent if a1 v1 + a2 v2 + · · · + an vn = b1 v1 + b2 v2 + · · · + bn vn implies that a1 = b1 , a2 = b2 , . . . , an = bn . In other words, every vector in the span of (v1 , v2 , . . . , vn ) can be expressed as a linear combination of the vi in only one way. If the set of vectors is not linearly independent it is linearly dependent. Show that the following alternative definition for linear independence is equivalent to our definition: Definition 4 Let V be a vector space. An ordered list of vectors (v1 , v2 , . . . , vn ) where all the vi ∈ V is called linearly independent if a1 v1 + a2 v2 + · · · + an vn = ~0 implies that ai = 0 for all i = 1, 2, 3, . . . , n. Let us say that our original definition is of being linearly independent in the first sense, while this second definition is being linearly independent in the second sense. If a list of vectors (v1 , v2 , . . . , vn ) is linearly independent in the first sense, then if a1 v1 + a2 v2 + · · · + an vn = ~0 we have a1 v1 + a2 v2 + · · · + an vn = 0v1 + 0v2 + · · · + 0vn , so by the definition of linear independence in the first sense, we have a1 = a2 = · · · = an = 0. On the other hand, if (v1 , v2 , . . . , vn ) are linearly independent in the second sense, then if a1 v1 + a2 v2 + · · · + an vn = b1 v1 + b2 v2 + · · · + bn vn we have (a1 − b1 )v1 + (a2 − b2 )v2 + · · · + (an − bn )vn = ~0, so ai − bi = 0 for each i. Thus ai = bi for each i, proving that the list was linearly independent in the first sense. Often this definition is easier to check, although it does not capture the “meaning” of linear independence as well as the first definition. 127

52

Dimension

Prove that any ordered list of vectors containing the zero vector is linearly dependent. We can see immediately from the second definition that since 1~0 = ~0, but 1 6= 0, that the list cannot be linearly independent Prove that an ordered list of length 2 (i.e. (v1 , v2 )) is linearly dependent if and only if one vector is a scalar multiple of the other. For v1 and v2 to be linearly dependent there must be two scalars a, b ∈ V with av1 + bv2 = 0 with at least one of a or b nonzero. Let us assume (without loss of generality) that a 6= 0. Then −b v2 . Thus one vector is a scalar multiple of the other. av1 = −bv2 , so v1 = a Theorem 5 If (v~1 , v~2 , v~3 , . . . , v~n ) is linearly dependent in V and v~1 6= ~0, then one of the vectors vj is in the span of v~1 , v~2 , . . . , vj−1 ~ Prove this theorem. Since (v~1 , v~2 , v~3 , . . . , v~n ) is linearly dependent, by definition there are scalars ai ∈ R with a1 v~1 + a2 v~2 + · · · + an v~n = 0, and not all of the aj = 0. Let j be the largest element of 2, 3, . . . , n so that aj is not equal to 0. Then we have a1 a2 a3 aj−1 vj = − v~1 − v~2 − v~3 − · · · − vj−1 ~ . So vj is in the span of v~1 , v~2 , . . . , vj−1 ~ . aj aj aj aj−1 If 2 vectors v~1 , v~2 span V , is it possible that the three vectors w~1 , w~2 , w~3 are linearly independent? Warning 6

This is harder to prove than you might think!

No! Assume to the contrary that w~1 , w~2 , w~3 are linearly independent. Since the list (v1 , v2 ) spans V , the list (w1 , v1 , v2 ) is linearly dependant. Thus by the previous theorem, either v1 is in the span of w1 , or v2 is in the span of (w1 , v1 ). In either case we get that (w1 , v) spans V , where v is either v1 or v2 . Now apply the same trick: (w2 , w1 , v) must span V . So by the previous theorem, either w1 is in the span of w2 or v is in the span of w2 , w1 . w2 cannot be in the span of w1 because the w’s are linearly independent. So v is in the span of w2 , w1 . So (w2 , w1 ) spans V . But then w3 is in the span of (w2 , w1 ), contradicting the fact that it is linearly independent from those two vectors. We have arrived at our contradiction. Therefore, w1 , w2 , w3 cannot be linearly independent. This problem generalizes: Theorem 7 The length of a linearly independent list of spanning vectors is less than the length of any spanning list of vectors. Prove this theorem We will follow the same procedure that we did above. Assume (v1 , v2 , . . . , vn ) is a list of vectors which spans V , and (w1 , w2 , . . . , wm ) is a linearly independent list of vectors. We must show that m < n. (w1 , v1 , v2 , . . . , vn ) is linearly dependent since w1 is in the span of the vi . By the theorem above, we can remove on of the vi and still have a spanning list of length n. Repeating this, we can always add one w vector to the beginning of the list, while deleting a v vector from the end of the list. This maintains a list of length n which spans all of V . We know that it must be a v which gets deleted, because the ws are all linearly independent. If m > n, then at the nth stage of this process we 128

52

Dimension

obtain that (w1 , w2 , . . . , wn ) spans all of V , which contradicts the fact that wn+1 is supposed to be linearly independent from the rest of the w. Definition 8 An ordered list of vectors B = (v~1 , v~2 , v~3 , . . . , v~n ) is called a basis of the vector space V if B is both spans V and is linearly independent. Let V be a finite dimensional vector space. Show that V has a basis. Let (v1 , v2 , . . . , vn ) be a spanning list of vectors (which exists and is finite since V is finite dimensional). If this list is linearly dependent we can go through the following process: For each i if vi ∈ Span(v1 , v2 , . . . , vi−1 ), delete vi from the list. Note that this also covers the 1st case: if v1 = 0 delete it from the list. At the end of this process, we have a list of vectors which span V , and also no vector is the span of all the previos vectors. By the theorem above, the list is linearly independent. So this new list is a basis for V . Note: Let V be a finite dimensional vector space. Then every basis of V has the same length. In other words, if v~1 , v~2 , . . . , v~n is a basis and w~1 , w~2 , . . . , w~m is a basis, then n = m. This follows because we have already proven that n ≤ m and m≤n Definition 9 We say that a finite dimensional vector space has dimension n if it has a basis of length n. Let p1 , p2 , . . . , pn , pn+1 be polynomials in the space Pn of all polynomials of degree at most n. Assume pi (3) = 0 for i = 1, 2, . . . , n. Is it possible that p1 , p2 , . . . , pn , pn+1 are all linearly independent? Why or why not? No. If p1 , p2 , . . . , pn , pn+1 were all linearly independent then they would form a basis of Pn , since Pn has dimension n + 1. But every polynomial in the span of the pi must evaluate to 0 at x = 3, while some polynomials in Pn do not evaluate to 0 at x = 3, (for example, the polynomial x).


53

Matrix of a linear map

Matrices record where basis vectors go. In our first brush with linear algebra, we only dealt with linear maps between the spaces spaces Rn for varying n. For those maps and those spaces, the convenient standard basis allowed us to record linear maps using the finite data of a matrix. In this section we will see that a similar story plays out for maps between finite dimensional vector spaces: they too can be described by a matrix, but only after making a choice of “basis” on the domain and codomain. Question 1 Let V be a vector space with basis (~v1 , ~v2 , ~v3 ) and W be a vector space with basis (w ~ 1, w ~ 2 ). Suppose there is a linear map L : V → W for which L(~v1 ) = 3w ~ 1 + 2w ~ 2, L(~v2 ) = 3w ~ 1 − 2w ~ 2 , and L(~v3 ) = w ~1 + w ~ 2. In light of all this, compute L(2~v1 ). But how will we write down our answer? Where does L(2~v1 ) live? Solution (a)

In W .

(b)

In V .

X

And since we know that L(2~v1 ) ∈ W , we’ll write our answer as αw ~ 1 + βw ~ 2 for some numbers α and β. So say L(2~v1 ) = αw ~ 1 + βw ~ 2. Solution

In this case, α = 6.

Solution

And β = 4.

Next compute L(2~v1 + 3~v2 − 4~v3 ) = αw ~ 1 + βw ~ 2. Solution Hint:

Use the fact that L(2~v1 + 3~v2 − 4~v3 ) = L(2~v1 ) + L(3~v2 ) − L(4~v3 ).

Hint:

Further use the fact that L(2~v1 )+ L(3~v2 ) −L(4~v3 ) = 2 L(~v1 )+3 L(~v2 )−4 L(~v3 ).

In this case, α = 11 but β = −6.

What we are seeing is an instance of the following observation. Observation 2 If L : V → W is a linear map, and you know the value of L(vi ) for each vector in the basis (~v1 , ~v2 , . . . , ~vn ) of V , then you can compute L(~v ) for any ~v ∈ V . And by “compute,” I mean you can write down L(~v ) in terms of a basis of W . 130

53

Matrix of a linear map

Definition 3 Let L : V → W be a linear map between finite dimensional vector spaces, let BV = (~v1 , ~v2 , . . . , ~vn ) be a basis for V , and let BW = (w ~ 1, w ~ 2, . . . , w ~ m) be a basis for W . Then L(~ vi ) = ai,1 w ~ 1 + ai,2 w ~ 2 + · · · + ai,m w ~ m. Then the matrix with respect to the bases BV and BW is the matrix M whose entry in the ith column and j th row is ai,j            ! 1 0 0 1 1 Question 4 Let B1 = , and B2 = 0 , 0 , 1  be 1 E 0 E 2 2 0 E 1 E 0 E 3 3 3 2 3 bases for R and R , respectively. Solution Hint:

The first column of the matrix will be

  1 but written with respect to the 1 E 2

basis B2 . Remember the order of vectors in the basis matters. Hint:

    ! 2 1 L = 1 1 E 2 0 E 3       1 0 0 = 2 0 + 0 0 + 1 1 0 E 1 E 0 E 3

Hint:

3

3

  2 So the first column of the matrix is 0 . 1 B 2

Hint:

  1 Similarly, the second column is 0 . 1 B 2

Hint:

 2 So the matrix of this linear map is 0 1

 1 0 1

    ! x+y x What is the matrix for the linear map L =  x  with respect to the y E 2 0 E3 bases B1 and B2 ?

131

53

Matrix of a linear map

Question 5 Let P2 be the space of polynomials of degree at most 2. Let B0 = (1, x, x2 ) and B1 = (1, (x − 1), (x − 1)2 ). Consider the map L : P2 → R given by L(p) = p(1). Solution Hint: L(1) = 1 L(x) = 1 L(x2 ) = 12 = 1  The matrix of L with respect to B0 is 1

Hint:

1

1



What is the matrix of this linear map with respect to the basis B0 ? Solution Hint: L(1) = 1 L(x − 1) = 1 − 1 = 0 L((x − 1)2 ) = (1 − 1)2 = 0  The matrix of L with respect to B1 is 1

Hint:

0

0



What is the matrix of this linear map with respect to the basis B1 ?

Question 6   Let P₃ be the space of polynomials of degree at most 3. Let B = (1, x, x², x³). Consider the map L : P₃ → P₃ given by L(p(x)) = d/dx p(x). This map is linear (why?). What is the matrix for L with respect to the basis B?

Solution Hint: L is linear because the derivative of a sum of two functions is the sum of the derivatives of the two functions, and because the derivative of a constant times a function is the constant times the derivative of the function.

Hint: L(1) = 0, L(x) = 1, L(x²) = 2x, L(x³) = 3x².

Hint: The matrix of L is

[ 0  1  0  0 ]
[ 0  0  2  0 ]
[ 0  0  0  3 ]
[ 0  0  0  0 ]
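One way to see this matrix concretely is to differentiate each basis polynomial and read off its coordinates; column i holds the coordinates of L applied to the i-th basis vector. A small sympy sketch, purely illustrative:

Python
import sympy as sp

x = sp.symbols('x')
basis = [sp.Integer(1), x, x**2, x**3]          # the basis B of P3

# Column i of the matrix holds the coordinates of d/dx(basis[i]) in the basis B.
columns = []
for p in basis:
    coeffs = sp.Poly(sp.diff(p, x), x).all_coeffs()[::-1]   # coefficients of 1, x, x², x³
    coeffs += [0] * (4 - len(coeffs))
    columns.append(coeffs)

M = sp.Matrix(columns).T
print(M)   # Matrix([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]])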


54

Subspaces

A subspace is a subset of a vector space which is also a vector space. Definition 1 A subset U of a vector space V is a subspace of V if U is a vector space with respect to the scalar multiplication and the vector addition inherited from V . Question 2

Which of the following is a subspace of R2 ?

Solution Hint:

The vectors

        0 1 0 1 is not + are both on the line `, but the sum and 1 2 1 2

on the line `. Hint:

So ` is not a subspace.

Hint:

The set P consists of a single vector.

Hint:

  0 But the vector in P is not the origin . 0

Hint:

So

Hint:

So P is not a subspace.

Hint:

By process of elimination, the x-axis must be a subspace. Is it really?

Hint:

Yes, if I multiply any vector

Hint:

And if I add together two vectors of the form

    1 1 ∈ P but 10 · 6∈ P . 2 2

  x by a scalar, it is still on the x-axis. 0

  x , the result is still on the 0

x-axis. (a) (b) (c)

The x-axis.

X   1 The set P = . 2    x The line ` = ∈ R2 : y = x + 1 . y

Let’s look at some more examples! Which of the following is a subspace of R2 ? Solution


    1 1 is not in A, so a ∈ A, but −1 · 0 0 scalar multiple of something in A need not be in A.

Hint:

The set A is not a subspace because

Hint:

The set C is not a subspace because even though it is closed   under scalar 1 1 are and multiplication (check this!) it is not closed under vector addition, since 2 −2   2 is not (draw a picture of this example!). both in C, but their sum 0

Hint:

As the only choice left, B must be a subspace.   2 The reason is that it is just the span of the vector , and as such, is closed under 1 scalar multiplication and vector addition.   x : x > 0 and y > 0} y   x The set B = : x = 2y X y   x The set C = { : |y| < |x|} y The set A = {

(a) (b) (c)

Solution Hint:

Question 3

What about the line y = 3? Does it form a subspace of R2 ?

Solution (a)

Yes.

(b)

No.

X

  0 That’s right; the tip of the vector is on that line, but the scalar multiple of that 3     0 0 vector, like 2 · = , is not on the line. 3 6 So when do the points on a line in R2 form a subspace? (a)

When the line passes through the point (0, 0).

(b)

When the line is parallel to the x-axis.

X

This is an important observation. Observation 4 vector” is in U .

134

Suppose U is a subspace of a vector space V . Then the “zero

55

Kernel

A kernel is everything sent to zero. There are some special subspaces that we will want to pay attention to. Theorem 1 defined by

If L : V → W is a linear transformation, then the kernel of L, ker(L) = {~v ∈ V : L(~v ) = ~0}

is a subspace of V . You may also hear this referred to as the null space of L. Prove this theorem that ker L is a subspace. We only need to show that ker(L) is closed under scalar multiplication and vector addition. For any ~v ∈ ker(L) and c ∈ R, L(c~v ) = cL(~v ) = c~0 = ~0 so c~v ∈ ker(L). If ~v , w ~ ∈ ker(L), then L(~v + w) ~ = L(~v ) + L(w) ~ = ~0 + ~0 = ~0 so ~v + w ~ ∈ ker(L) Thus ker(L) is a subspace! Question 2

Let L : R³ → R² be the linear map whose matrix is

[ 2   3    1 ]
[ 1   0   −1 ]

Which of the following vectors is in the kernel of L?

(a) (1, −1, 1)   X
(b) (3, 2, 0)
(c) (0, 0, 2)

Solution Hint: Just by evaluating all three, we see the only one which gets sent to (0, 0) by L is (1, −1, 1).
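Checking kernel membership is just a matrix–vector multiplication. A quick numpy sketch for this problem:

Python
import numpy as np

L = np.array([[2, 3, 1],
              [1, 0, -1]])

for v in ([1, -1, 1], [3, 2, 0], [0, 0, 2]):
    print(v, L @ np.array(v))   # only (1, -1, 1) is sent to (0, 0)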

Theorem 3

A linear map L : V → W is injective if and only if ker(L) = {~0}.

Definition 4 The word “injective” is an adjective meaning the same thing as “one to one.” In other words, a function f : A → B is injective if f (a1 ) = f (a2 ) implies a1 = a2 . Prove this theorem. Let L be injective. Then L(~v ) = ~0 implies L(~v ) = L(~0). Since L is injective, this implies ~v = ~0. Thus the only element of the kernel is ~0. On the other hand, if ker(L) = {~0}, then if L(v~1 ) = L(v~2 ), then L(v~1 − v~2 ) = ~0, so v~1 − v~2 is in the null space, and hence must be equal to ~0. But then we can conclude that v~1 = v~2 Definition 5

The dimension of the kernel of L is the nullity of L.

Be careful to observe that ker L is a subspace, while dim ker L is a number, so the nullity of L is just a number.


56

Image

The “image” is every actual output. Definition 1

If L : V → W is a linear transformation, then the image of L is Imag(L) = {w ~ ∈ W : ∃~v ∈ V, L(~v ) = w} ~

. Remember to read ∃ as “there exists.” Warning 2 Some people may call this the “range.” Some other people use the word “range” for what we’ve been calling the codomain. The result is that, in my opinion, the word “range” is now overused, so we give up and never use the word. Question 3

Suppose L : R2 → R3 , and suppose that     3 1 L = 2 , and 0 1     1 0 L = 1 . 1 1

  2  What is a vector ~v ∈ R so that L(~v ) = 1? 0 2

Solution Hint:

      2 3 1 Use the fact that 1 = 2 − 1. 0 1 1

Hint:

  2 In other words, 1 = L(~e1 ) − L(~e2 ). 0

Hint:

By linearity of L, we have L(~e1 ) − L(~e2 ) = L(~e1 − ~e2 ).

Hint:

    2 1   And so a vector in the domain which is sent to 1 is the vector . −1 0

This is a special case of a general fact: if we have two vectors in the image, then their sum is in the image, too. Theorem 4

The image of a linear map is a subspace of the codomain. 137


Prove this. If w ∈ Imag(L), then there is a v ∈ V with L(v) = w. L(cv) = cL(v) = cw, so cw ∈ Imag(L) for any c ∈ R. Thus Imag(L) is closed under scalar multiplication. If w1 , w2 ∈ Imag(L), then there are v1 , v2 ∈ V with L(v1 ) = w1 and L(v2 ) = w2 . L(v1 + v2 ) = L(v1 ) + L(v2 ) = w1 + w2 , so w1 + w2 ∈ Imag(L). Thus Imag(L) is closed under vector addition. We finish with some terminology. Definition 5

The dimension of the image of L is the rank of L.

Be careful to observe that the image of L is a subspace, while the dimension of the image of L is a number, so the rank of L is just a number. Consider the linear map L : R2 → R3 given by the matrix   2 1 4 2 . 6 3

Question 6

Solution Not every vector in R3 is the image of L.

Hint: Hint:

Let’s think about which vectors are in the image of L.   2 Question 7 Is 4 in the image of L? 6 Solution (a)

Yes.

(b)

No.

X

    2 1 In fact, 4 = L . 0 6   1 Is 1 in the image of L? 1 Solution (a)

Yes.

(b)

No.

X 

 x But how can we tell? The only things in the image of L are vectors of the form 2x 3x    1 for some x ∈ R. This is the span of 2. 3 So what is the dimension of the vector space spanned by this single vector?


Solution (a)

0

(b)

1

(c)

2

X

And so the rank is one.

The rank of L is 1.


57

Rank nullity theorem

Rank plus nullity is the dimension of the domain. Theorem 1 (Rank-Nullity) If L : V → W is a linear transformation, then the sum of the dimension of the kernel of L and the dimension of the image of L is the dimension of V . The dimension of the kernel is sometimes called the “nullity” of L, and the dimension of the image is sometimes called the “rank” of L. Hence the name “rank-nullity” theorem. Prove this theorem Warning 2

This is hard!

Let v1, v2, ..., vn be a basis of ker(L). We can extend this to a basis of V: v1, v2, ..., vn, u1, u2, ..., uk. We will be done if we can show that L(u1), L(u2), ..., L(uk) form a basis of Im(L).

Let w ∈ Im(L). Then w = L(v) for some v ∈ V. Since v1, v2, ..., vn, u1, u2, ..., uk is a basis of V, we can write

w = L(a1 v1 + a2 v2 + ... + an vn + b1 u1 + ... + bk uk)
  = a1 L(v1) + ... + an L(vn) + b1 L(u1) + ... + bk L(uk)
  = b1 L(u1) + ... + bk L(uk)     (since each vi lies in ker(L), so L(vi) = 0)

So Im(L) is spanned by the L(ui). Now we need to see that the L(ui) are linearly independent. Assume b1 L(u1) + b2 L(u2) + ... + bk L(uk) = 0. Then L(b1 u1 + ... + bk uk) = 0. Then b1 u1 + ... + bk uk would be in the null space of L. But the ui were chosen specifically to be linearly independent of all of the vectors in the null space. So b1 = b2 = ... = bk = 0. Thus the L(ui) are linearly independent and we are done.
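For concrete matrices you can watch the theorem hold numerically. A sketch assuming numpy and scipy are available, using the 2 × 3 matrix from the kernel section: rank 2, nullity 1, and 2 + 1 = 3, the dimension of the domain.

Python
import numpy as np
from scipy.linalg import null_space

L = np.array([[2, 3, 1],
              [1, 0, -1]])

rank = np.linalg.matrix_rank(L)          # dimension of the image
nullity = null_space(L).shape[1]         # dimension of the kernel
print(rank, nullity, rank + nullity)     # 2 1 3, and 3 = dim of the domain R³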


58

Eigenvectors

Eigenvectors are mapped to multiples of themselves. Definition 1 Let L : V → V be a linear map. A vector ~v ∈ V is called an eigenvector of L if L(~v ) = λ~v for some λ ∈ R. A constant λ ∈ R is called an eigenvalue of L if there is a nonzero eigenvector ~v with L(~v ) = λ~v . Geometrically, eigenvectors of L are those vectors whose direction is not changed (or at worst, negated!) when they are transformed by L. Let’s try some examples. Suppose L : R2 → R2 is the linear map represented by the matrix   3 2 . 4 1

Question 2

Which of these vectors is an eigenvector of L? Solution  Hint:

Question 3

What is L

1 −1

 ?

Solution  Hint:

We want to compute

Hint:

In this case,



Is

3 4

2 1



3 4

2 1



 1 . −1

   1 1 = . −1 3

    1 1 a multiple of ? 3 −1

Solution (a)

No.

X

(b)

Yes.

X 

Consequently,

(a) (b)

   1 1 is not an eigenvector. The eigenvector must be . −1 1

  1 X 1   1 −1

        1 5 1 1 That’s right! Note that L is = 5· , and so is an eigenvector. 1 5 1 1 141

58

Eigenvectors

Solution  Hint:

Try computing L

Hint:

In this case, L



1 −2

 .

   1 −1 = . −2 2

   −1 1 . =λ· Find λ ∈ R so that 2 −2 

Hint:

Question 4

Solution Hint:

The sign is opposite on both sides of the equation.

Hint:

So try λ = −1.

λ = −1 

 1 And so −1 is an eigenvalue, with eigenvector . −2 Which of the following is another eigenvector?   1 (a) X −2   2 (b) −1

Rock on! We check that       1 −1 1 L = = −1 · −2 2 −2  and so

142

 1 is an eigenvector with eigenvalue −1. −2

59

Eigenvalues

Eigenvalues measure how much eigenvectors are scaled. Definition 1 Let L : V → V be a linear operator (NB: linear maps with the same domain and codomain are called linear operators). The set of all eigenvalues of L is the spectrum of L. Let’s try finding the spectrum.   1 2 Question 2 Let L : R → R be the linear map whose matrix is with 2 1 respect to the standard basis. L has two different eigenvalues. What are they?  λ1 Give your answer in the form of a matrix , where λ1 ≤ λ2 . λ2 2

2

Solution 

Hint:

1 For lambda to be an eigenvalue we need 2

Hint:

( x + 2y = λx This is the same as 2x + y = λy

2 1

    x x =λ y y

or ( (1 − λ)x + 2y = 0 2x + (1 − λ)y = 0

Hint: These are two lines passing through the origin. To have more than just the origin as a solution, we need that the slope of the two lines is the same. So 2 1−λ = 2 1−λ Hint: 1−λ 2 = 2 1−λ (1 − λ)2 = 4 1 − λ = ±2 lambda = −1 or 3 Hint:

Let us now check that these really are eigenvalues: 

If we let λ = −1, we have the equation 2x + 2y = 0. Check that

 1 is an eigenvector −1

with eigenvalue −1 If we let λ = 3, we have the equation 2x − 2y = 0. Check that with eigenvalue 3

143

  1 is an eigenvector 1

59

Eigenvalues

Question 3 Let’s try another example. Suppose F : R2 → R2 is the linear map represented by the matrix   0 −1 . 1 0 Which of these numbers is an eigenvalue of F ? Solution   x is an eigenvector. y

Hint:

Let’s suppose that

Hint:

Then there is some λ ∈ R so that

Hint:

But





−1 0

0 1

0 1

−1 0

    x x . =λ y y

    x −y = . y x



Hint:

   −y x And so =λ . x y

Hint:

This means that −y = λx and x = λy.

Hint:

Putting this together, −y = λ2 y and x = −λ2 x.

Hint: Since we are looking for a nonzero eigenvector (in order to have an eigenvalue), we must have that either x 6= 0 or y 6= 0. Hint:

Consequently, λ2 = −1.

Hint: But there is no real number λ ∈ R so that λ2 = −1, since the square of any real number is nonnegative. Hint: (a) (b) (c) (d)

Therefore, there is no real eigenvalue. There is no real eigenvalue. −1 √ 2 1

X

Perhaps surprisingly, not every linear operator from Rn to Rn has any real eigenvalues. Geometrically, what is this linear map F doing? Solution (a) Rotation by 90◦ counterclockwise. (b) Rotation by 90◦ clockwise. (c) Rotation by 180◦ .

X

This geometric fact also explains why there is no eigenvalue: what would be the corresponding eigenvector whose direction is unchanged by applying F ? Every vector is moved by a rotation! The additional fact that there are imaginary solutions to λ2 = −1 is hinting that i should have something to do with rotation, too.
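Numerically this contrast is easy to see: numpy reports real eigenvalues for the symmetric matrix of the previous question and complex ones for the rotation. A quick sketch, not part of the course software, just an illustration:

Python
import numpy as np

symmetric = np.array([[1.0, 2.0],
                      [2.0, 1.0]])
rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])

print(np.linalg.eigvals(symmetric))   # real eigenvalues: 3 and -1 (in some order)
print(np.linalg.eigvals(rotation))    # [0.+1.j 0.-1.j] -- no real eigenvalues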


60

Eigenspace

An eigenspace collects together all the eigenvectors for a given eigenvalue. Theorem 1 Let λ be an eigenvalue of a linear operator L : V → V . Then the set Eλ (L) = {v ∈ V : L(v) = λv} of all (including zero) eigenvectors with eigenvalue λ forms a subspace of V . This subspace is the eigenspace associated to the eigenvalue λ. Prove this theorem. We need to check that Eλ (L) is closed under scalar multiplication and vector addition If v ∈ Eλ (L), and c ∈ R, then L(cv) = cL(v) = cλv = λ(cv), so cv is also an eigenvector of L. If v1 , v2 ∈ Eλ (L), then L(v1 + v2 ) = λv1 + λv2 = λ(v1 + v2 ), so v1 + v2 is also an eigenvector of L. The kernel of L is the eigenspace of the eigenvalue 0.


61

Eigenbasis

An eigenbasis is a basis of eigenvectors. Observation 1 If (v1 , v2 , ..., vn ) is a basis of eigenvectors of a linear operator L, then the matrix of L with respect to that basis is diagonal, with the eigenvalues of L appearing along the diagonal. Theorem 2 Let L : V → W be a linear map. If v1 , v2 , ..., vn are nonzero eigenvectors of L with distinct eigenvalues λ1 , λ2 , ...λn , then (v1 , v2 , ..., vn ) are linearly independent. Prove this theorem. Assume to the contrary that the list is linearly dependent. Let vk be the first vector in the list which is in the span of the preceding vectors, so that the vectors (v1 , v2 , ..., vk−1 ) are linearly independent. Let a1 v1 + a2 v2 + ... + ak−1 vk−1 = vk . Then applying L to both sides of this equation we have a1 λ1 v1 + a2 λ2 v2 + ... + ak−1 λk−1 vk−1 = λk vk . If we multiply the first equation by λk we also have a1 λ1 v1 + a2 λ1 v2 + ... + ak−1 λ1 vk−1 = λ1 vk . Subtracting these two equations we have a1 (λk − λ1 )v1 + a2 (λk − λ2 )v2 + ... + a3 (λk − λk−1 )vk = 0. Since the vectors (v1 , v2 , ..., vk−1 ) are linearly independent, we must have that ai (λk − λi ) = 0. But λk 6= λi , so ai = 0 for each i. Looking back at where the ai came from, we see that this implies that vk = 0. This contradicts the assumption that the vj were all nonzero. So our assumption that the list was linearly dependent was absurd, hence the list is linearly independent. A corollary of this theorem is that if V is n dimensional and L : V → V has n distinct eigenvalues, then the eigenvectors of L form a basis of V . The matrix of the operator with respect to this basis is diagonal.


62

Python

We can find eigenvectors in Python. Let’s suppose I have an n × n example, suppose  6 M = 4 3

matrix M , expressed in Python as a list of lists. For 4 5 2

 3 2 = [[6,4,3],[4,5,2],[3,2,7]]. 7

Further suppose that the matrix M = (mij ) is symmetric, meaning that mij = mji . I’d like to compute an eigenvector of M quickly. Question 1

Here’s a procedure that I’d like you to code in Python:

(a)

Start with some vector ~v .

(b)

Replace ~v with the result of applying the linear map LM to the vector ~v .

(c)

Normalize ~v so that it has unit length.

(d)

Repeat many times.

You can try print eigenvector([[6,4,3],[4,5,2],[3,2,7]]) to see what happens in the case of the matrix above. Solution Python 1 2 3 4 5 6 7

def eigenvector(M):
    # start with a vector of all ones
    v = [1.0] * len(M[0])
    # for many, many times: replace v with Mv, then normalize v
    for _ in range(1000):
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in M]
        length = sum(x * x for x in v) ** 0.5
        v = [x / length for x in v]
    return v

def validator(): v = eigenvector([[6, 5, 5], [5, 2, 3],[5, 3, 8]]) if abs((v[1] / v[0]) - 0.6514182851) > 0.01: return False if abs((v[2] / v[0]) - 1.0603152077) > 0.01: return False return True

Can you use your program to find, numerically, an eigenvector of the matrix M?


63

Cayley-Hamilton theorem

Sometimes eigen-information reveals quite a bit about linear operators. We will not be proving—or even stating!—the Cayley-Hamilton theorem1 , but there is one very special case which provides a nice activity. This activity will force us to think about bases and about eigenvectors and eigenvalues. Here’s the setup: suppose L : R2 → R2 is a linear map, and it has an eigenvector ~u (with eigenvalue 2) and an eigenvector w ~ (with eigenvalue 3). Question 1 Now suppose ~v ∈ R2 is some arbitrary vector. How does L(L(~v )) compare to −6~v + 5 · L(~v )? Solution (a)

L(L(~v )) = −6~v + 5 · L(~v )

(b)

L(L(~v )) 6= −6~v + 5 · L(~v )

(c)

It cannot be determined from the information given.

X

Why is this the case? Solution The vectors ~ u and w ~ together form a basis for R2 .

Hint:

Can we write ~v as α~ u + β w? ~ (a)

Yes.

(b)

No.

X

What is L(~v ) in terms of α, β, ~u, and w? ~ Solution (a)

αL(~ u) + βL(w) ~

(b)

αL(w) ~ + βL(~ u)

X

But what is L(~u)? Solution Hint: Question 2 Solution 2. Consequently L(~ u) = 2~ u.

(a)

2~ u

(b)

3~ u

(c)

2w ~

(d)

3w ~

Remember that ~ u is an eigenvector with eigenvalue

X

And what is L(w)? ~ 1

http://en.wikipedia.org/wiki/CayleyHamilton_theorem


Solution Hint: Question 3 Solution 3. Consequently L(w) ~ = 3w. ~

(a)

2~ u

(b)

3~ u

(c)

2w ~

(d)

3w ~

Remember that w ~ is an eigenvector with eigenvalue

X

Using these facts, what is L(~v ) in terms of α, β, ~u, and w? ~ Solution (a)

2α ~ u + 3β w ~

(b)

3α ~ u + 2β w ~

(c)

2α w ~ + 3β ~ u

(d)

3α w ~ + 2β ~ u

X

Solution Hint:

Question 4

Solution

(a)

2α L(~ u) + 3β L(w) ~

(b)

3α L(~ u) + 2β L(w) ~

(c)

2α L(w) ~ + 3β L(~ u)

(d)

3α L(w) ~ + 2β L(~ u)

Hint:

Question 5

(a)

L(~ u) = 2~ u

(b)

L(~ u) = 3~ u

(c)

L(~ u) = 2w ~

(d)

L(~ u) = 3w ~

Hint:

L(w) ~ = 3w ~

(b)

L(w) ~ = 2w ~

(c)

L(w) ~ = 2~ u

(d)

L(w) ~ = 3~ u

X

Solution

But what is L(~ u)?

Solution

And what is L(w)? ~

X

Question 6

(a)

Using linearity of L, what is L(L(~v ))?

X


Hint: Try substituting the facts that L(~ u) = 2~ u and L(w) ~ = 3w ~ into 2α L(~ u) + 3β L(w). ~ What is L(L(~v ))? (a)

4α ~ u + 9β w ~

(b)

4α w ~ + 9β ~ u

(c)

9α ~ u + 4β w ~

(d)

9α w ~ + 4β ~ u

X

What is −6~v + 5 · L(~v ) in terms of α, β, ~u, and w? ~ Solution Hint:

Earlier we wrote ~v = α~ u + β w. ~

Hint:

Since L is a linear map, we have L(~v ) = αL(~ u) + βL(w). ~

(a)

−6 (α~ u + β w) ~ + 5αL(~ u) + 5βL(w) ~

(b)

−6 (α~ u + β w) ~ + 5βL(~ u) + 5αL(w) ~

(c)

−6 (α~ u + β w) ~ + 3αL(~ u) + 3βL(w) ~

(d)

−6 (α~ u + β w) ~ + 3βL(~ u) + 3αL(w) ~

X

Solution Hint:

Question 7

(a)

L(~ u) = 2~ u

(b)

L(~ u) = 3~ u

(c)

L(~ u) = 2w ~

(d)

L(~ u) = 3w ~

Hint:

L(w) ~ = 3w ~

(b)

L(w) ~ = 2w ~

(c)

L(w) ~ = 2~ u

(d)

L(w) ~ = 3~ u

But what is L(~ u)?

Solution

And what is L(w))? ~

X

Question 8

(a)

Solution

X

Hint: Try substituting the facts that L(~ u) = 2~ u and L(w) ~ = 3w ~ into −6 (α~ u + β w) ~ + 5αL(~ u) + 5βL(w). ~ Hint:

Then we get −6α~ u − 6β w ~ + (5 · 2)α~ u + (5 · 3)β w. ~

Hint:

But −6 + 10 = 4 and −6 + 15 = 9.

150

63

Hint:

Cayley-Hamilton theorem

Consequently, this simplifies to 4α ~ u + 9β w. ~

Now write −6~v + 5 L(~v ) but without referring to L. (a)

4α ~ u + 9β w ~

(b)

4α w ~ + 9β ~ u

(c)

9α ~ u + 4β w ~

(d)

9α w ~ + 4β ~ u

X

And so, after all this, we see that L(L(~v )) = −6~v + 5 L(~v ). What happens if you try this in higher dimensions? Suppose you have a map L : R3 → R3 and it has three eigenvectors with three different eigenvalues. Can you rewrite L(L(L(~v ))) in terms of ~v and L(~v ) and L(L(~v )) in that case?
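This little special case is easy to test numerically. A sketch with numpy, using a 2 × 2 matrix that happens to have eigenvalues 2 and 3 (the matrix is my own sample choice, not one from the text):

Python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])      # eigenvalues 2 and 3

v = np.array([5.0, -7.0])       # an arbitrary vector

print(A @ (A @ v))              # L(L(v))
print(-6 * v + 5 * (A @ v))     # -6v + 5 L(v): the same vector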


64

Bilinear maps

Bilinear maps are linear in two vector variables separately. Definition 1 Let V, W and U be vector spaces. A bilinear map B : V ×W → U is a function of two vector variables which is linear in each variable separately. That is Additivity in the first slot For all ~v1 , ~v2 ∈ V and all w ~ ∈ W , we have B(~v1 + ~v2 , w) ~ = B(~v1 , w) ~ + B(~v2 , w) ~ Additivity in the second slot For all ~v ∈ V and all w ~ 1, w ~ 2 ∈ W , we have B(~v , w ~1 + w ~ 2 ) = B(~v , w ~ 1 ) + B(~v , w ~ 2 ). Scaling in each slot For all c ∈ R and all ~v inV and all w ~ ∈ W , we have B(c~v , w) ~ = B(~v , cw) ~ = cB(~v , w). ~ A bilinear map from V × V → R is called a bilinear form on V . We will mostly be focusing on bilinear forms on Rn , but we will sometimes need to work with more general bilinear maps. Example 2 The map B : Rn × Rn → R given by B(~v , w) ~ = ~v · w ~ is a bilinear form, since we confirmed that the dot product has these properties immediately after defining the dot product. Question 3 Rn × Rm can be identified with Rn+m . Is a bilinear map Rn × Rm → Rk linear when viewed as a map from Rn+m → Rk ? Solution (a)

No.

(b)

Yes.

X

You are correct: a bilinear map Rn × Rm → Rk is not necessarily a linear map when we identify Rn × Rm with Rn+m . Why? What  is anexample? For example, x z 2 2 the dot product dot : R × R → R defined by B( , ) = xz + yt is bilinear, y t but it is certainly not a linear map from R4 → R. Question 4 Let B : R2 × R3 → R be a bilinear mapping, and you know the following values of B:      1 1 • B , 0 = 2 0 0      0 1   • B , 1 =1 0 0      0 1   • B , 0 = −3 0 1 152

64

Bilinear maps

     1 0 • B , 0 = 2 1 0      0 0    , 1 =5 • B 1 0      0 0 • B , 0 = 4 1 1      4 3 What is B  , 2? 2 1 Solution Hint: We need to use the linearity in each slot to break this down into a computation involving only the basis vectors. Hint: 

       4     4 3 3 0 B , 2 = B  + , 2 2 0 2 1 1         4   4 3 0 =B , 2 + B  , 2 0 2 1 1

Hint:         4   4 1 0 = 3B  , 2 + 2B  , 2 0 1 1 1         4   4 1   0   = 3B  , 2 + 2B  , 2 0 1 1 1             1   0   0 1   1   1   = 3 4B  , 0 + 2B  , 1 +B , 0 0 0 0 0 0 1             1   0   0 0 0 0 + 2 4B  , 0 + 2B  , 1 + B  , 0 1 1 1 0 0 1 Hint: = 3 (4(2) + 2(1) + 1(−3)) + 2 (4(2) + 2(5) + 1(4)) = 21 + 44 = 65

153

64

Bilinear maps

     4 3 B , 2 = 65 2 1
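Because B is bilinear, it is completely determined by the 2 × 3 matrix of values B(~ei, ~fj) from the bulleted list above, and evaluating B( (3, 2), (4, 2, 1) ) is then just a matrix computation. A small numpy sketch checking the arithmetic above:

Python
import numpy as np

# entry (i, j) is B(e_i, f_j), read off from the bulleted values above
M = np.array([[2, 1, -3],
              [2, 5,  4]])

v = np.array([3, 2])        # the vector in R²
w = np.array([4, 2, 1])     # the vector in R³

print(v @ M @ w)            # 65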

Question 5

Hint:

If we set L(x) = B(x, 3), then L should be a linear map R → R.

Hint: But a linear map R → R is just multiplication, so B(x, 3) = αx for some number α.

Hint: β.

But a bilinear map is linear in both variables, so B(17, y) = βy for some number

Hint: So one way to get a bilinear map would be to set B(x, y) = 10xy. You can enter this as 10 * x * y.

Hint:

Can you think of other examples?

Hint: Sure! Another way to get a bilinear map would be to set B(x, y) = 13xy. You can enter this as 13 * x * y. Hint: In general, if B : R × R → R is bilinear, then it must be B(x, y) = λxy for some λ ∈ R.

Write a nonzero bilinear map B : R × R → R. Solution


B(x, y) =

65

Tensor products

Bilinear forms comprise a vector space of tensors. The set of all bilinear maps from V × W → R has the structure of a vector space: we can add such maps, and multiply them by scalars. Definition 1 We define V ∗ ⊗ W ∗ to be the vector space of all bilinear maps from V × W → R. This is the tensor product of the dual spaces V ∗ and W ∗ . Hopefully the reason for the duality involved in the definition above will become clear shortly. Given covectors S : V → R and T : W → R, their tensor product is the map S ⊗ T : V × W → R given by the rule S ⊗ T (~v , w) ~ = S(~v )T (w) ~ Warning 2

This formula involves the product of S(~v ) and T (w) ~ as real numbers.

Question 3

S ⊗ T is a function from V × W to R. Is it bilinear?

Solution (a)

Yes.

(b)

No.

X

Let’s prove it! First lets check additivity in the first slot: (S ⊗ T )(v~1 + v~2 , w) ~ = S(~v1 + ~v2 )T (w) ~ = (S(~v1 ) + S(~v2 )) T (w) ~ = S(~v1 )T (w) ~ + S(~v2 )T (w) ~ = (S ⊗ T )(~v1 , w) ~ + (S ⊗ T )(~v2 , w) ~ Proving additivity in the second slot is similar. Lets check scaling in the first slot: (S ⊗ T )(cv~1 , w) ~ = S(c~v )T (w) ~ = cS(~v )T (w) ~ = c(S ⊗ T )(~v , w) ~ Proving scaling in the other slot is similar. So S ⊗ T really is bilinear!


66

Some nice covectors

Bilinear forms can be written in terms of particularly nice covectors. Let us recall some notation from the section on derivatives. n Let ~ei be the standard basis for Rn . The covector ~e> i : R → R is ~e> v ) = h~ei , ~v i. i (~ Question 1 We can build more complicated examples, too. Suppose L : R5 → R is the covector given by L = 4 ~e> e> e> 3 − 2~ 4 +~ 5. Solution

Hint:

  1 4    Set ~v =  2. We are considering L(~v ). 3 5

Hint:

Then L(~v ) = (4 ~e> e> e> v ). 3 − 2~ 4 +~ 5 )(~

Hint:

So L(~v ) = 4 ~e> v ) − 2 ~e> v ) + ~e> v ). 3 (~ 4 (~ 5 (~

Hint:

Replacing ~e> v ) by h~v , ~ei i yields L(~v ) = 4 h~v , ~e3 i − 2 h~v , ~e4 i + h~v , ~e5 i. i (~

Hint:

In this case, h~v , ~e3 i = 2.

Hint:

And h~v , ~e4 i = 3.

Hint:

And h~v , ~e5 i = 5.

Hint:

We conclude L(~v ) = 4 · 2 − 2 · 3 + 5 = 8 − 6 + 5 = 7.

  1 4     Then L  2 = 7. 3 5

This is a special case of something quite general. Theorem 2 Any covector Rn → R can be written as a linear combination of the covectors ~e> i . Why is this? Think of a covector as Solution

(a) a row vector [a1 a2 · · · an]. X

(b) a column vector with entries a1, a2, . . . , an.

Then we can write that row vector as    a1 1 0 · · · 0 + a2 0 1 0

···

  0 + · · · + an 0

···

0

 1 .

But those row vectors are just the duals to the standard basis, so we can write the covector as a1~e> e> e> 1 + a2~ 2 + · · · + an ~ n. How is this related to derivatives? Define the coordinate functions to be πi : Rn → R given by πi (x1 , x2 , x3 , . . . , xn ) = xi . What is the derivative of πi at any point p in Rn ? Solution Hint:

This is a special case of a general theorem.

Theorem 3

The derivative of a linear map (at any point) is the same linear map.

Hint:

In this case, πi is a linear map.

Hint:

So Dπi (p) = πi .

Hint:

But another way to write πi is ~e> i .

(a)

Dπi (p) = ~e> i .

(b)

Dπi (p) = ~ei . Dπi (p) = ~0.

(c)

X

As a result of this, we will often write dxi as a more suggestive notation for the covector with the rather more cumbersome name ~e> i . Let’s do some calculations with this new notation. Solution Hint:

dx2 will select the second entry of any vector

Hint:

  dx2 ( 3, 6, 4 ) = 6

  dx2 ( 3, 6, 4 ) = 6

We can also consider the tensor product of these covectors. Solution


Hint:

         7 2 7 2 dx1 ⊗ dx2 5 , 9 = dx1 5 dx2 9 4 3 4 3

Hint: = 2(9) = 18     2 7 dx1 ⊗ dx2 (5 , 9) =18 3 4
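These covectors are easy to play with in Python. The sketch below is not part of the course code (the helpers dx and tensor are our own names), but it mirrors the definitions above: dx(i) selects the i-th coordinate, and (S ⊗ T)(v, w) = S(v)T(w).

Python

def dx(i):
    # the covector dx_i : R^n -> R, which selects the i-th coordinate (1-indexed)
    return lambda v: v[i - 1]

def tensor(S, T):
    # the tensor product of two covectors: (S ⊗ T)(v, w) = S(v) * T(w)
    return lambda v, w: S(v) * T(w)

print(dx(2)([3, 6, 4]))                            # 6
print(tensor(dx(1), dx(2))([2, 5, 3], [7, 9, 4]))  # 2 * 9 = 18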

Prove that the set of bilinear forms {dx_i ⊗ dy_j : 1 ≤ i ≤ n and 1 ≤ j ≤ m} forms a basis for the space (Rn)∗ ⊗ (Rm)∗.

Warning 4 One of your greatest challenges here will be dealing with all of the indexes!

Let B : Rn × Rm

   x1 y1  x2   y2      → R be a bilinear map. Let ~x =  .  and ~y =  . . Then .  .   ..  xn ym

we can write 

   x1 y1  x2   y2      B(~x, ~y ) = B  .  ,  .   ..   ..  xn

ym   y1 j=n   y2  X    = xj B ej ,  .    ..  

j=1

ym =

j=n X i=m X

xj yi B (ej , ei )

j=1 i=1

=

j=n X i=m X

B (ej , ei ) dxi ⊗ dyj (~x, ~y )

j=1 i=1

So B = Σ_{j=1..n} Σ_{i=1..m} B(e_j, e_i) dx_i ⊗ dy_j. This shows that the dx_i ⊗ dy_j span all of (Rn)∗ ⊗ (Rm)∗. To see that the dx_i ⊗ dy_j are linearly independent, simply observe that if Σ_{j=1..n} Σ_{i=1..m} a_{i,j} dx_i ⊗ dy_j = 0, then in particular evaluating this bilinear form at (e_i, e_j) gives a_{i,j} = 0 for all i, j.


Example 5


The dot product on R2 is given by the expression dx1 ⊗dy1 +dx2 ⊗dy2

Example 6 Let R4 have coordinates (t, x, y, z). The bilinear form η = −dt ⊗ dt + dx ⊗ dx + dy ⊗ dy + dz ⊗ dz on R4 is incredibly important to physics. It is called the Minkowski inner product1 , and is one of the basic structures underlying the local geometry of our universe.

1

http://en.wikipedia.org/wiki/Minkowski_space


67

A basis for forms

A basis for the space of bilinear forms consists of tensors of coordinate functions. Prove that the set of bilinear forms {dxi ⊗ dyj : 1 ≤ i ≤ n and 1 ≤ j ≤ m} forms a ∗ ∗ basis for the space (Rn ) ⊗ (Rm ) . Warning 1 the indexes.

One of your greatest challenges here will be dealing with the all of    y1 x1  y2   x2      → R be a bilinear map. Let ~x =  .  and ~y =  . . Then  ..   ..  ym xn 

Let B : Rn × Rm we can write

   y1 x1  x2   y2      B(~x, ~y ) = B  .  ,  .   ..   ..  

ym   y1 j=n   y2  X    = xj B ej ,  .    ..  xn



j=1

ym =

j=n X X i=m

xj yi B (ej , ei )

j=1 i=1

=

j=n X X i=m

B (ej , ei ) dxi ⊗ dyj (~x, ~y )

j=1 i=1 j=n X i=m X

So B =

B (ej , ei ) dxi ⊗ dyj . This shows that the dxi ⊗ dyj span all of

j=1 i=1 m ∗



(Rn ) ⊗ (R ) . To see that the dxi ⊗ dyj are linearly independent, simply observe that if j=n j=n X i=m X X i=m X ai,J dxi ⊗ dyj = 0, then in particular ai,J dxi ⊗ dyj (ei , ej ) = 0, which j=1 i=1

j=1 i=1

implies that ai,j = 0 for all i, j. Example 2

The dot product on R2 is given by the expression dx1 ⊗dy1 +dx2 ⊗dy2

Question 3 Can you write dx1 ⊗ dy1 + dx2 ⊗ dy2 as α ⊗ β for some covectors α, β : R2 → R? Solution (a)

No.

(b)

Yes.

X


Why not? Suppose this were possible. By the rank-nullity theorem, there must be some nonzero vector ~v in the kernel of α; that is, there is some nonzero vector ~v ∈ R2 with α(~v) = 0. But ⟨~v, ~v⟩ ≠ 0, so (dx1 ⊗ dy1 + dx2 ⊗ dy2)(~v, ~v) ≠ 0. On the other hand, (α ⊗ β)(~v, ~v) = α(~v) · β(~v) = 0, which is a contradiction. Definition 4

Bilinear forms which can be written as α ⊗ β are pure tensors.

So what we have shown here is that not all bilinear forms are pure tensors.

Example 5 Let R4 have coordinates (t, x, y, z). The bilinear form η = −dt ⊗ dt + dx ⊗ dx + dy ⊗ dy + dz ⊗ dz on R4 is the Minkowski inner product. The Minkowski inner product1 is one of the basic structures underlying the local geometry of our universe.

1

http://en.wikipedia.org/wiki/Minkowski_space


68

Python

Build some bilinear maps in Python. Question 1 Suppose ~v and w ~ are both vectors in R4 , represented in Python as two lists of four real numbers called v and w. Build a Python function B which represents some bilinear form B : R4 × R4 → R. Solution Hint:

For example, you could try returning 17 * v[0] * w[3]. Python

1 2

def B(v,w): return # the real number B(v,w)

3 4 5 6

def validator(): if B([4,2,3,4],[6,5,4,3]) + B([6,2,3,4],[6,5,4,3]) != B([10,4,6,8],[6,5,4,3]): return False

7 8 9

if B([1,2,3,4],[6,5,4,3]) + B([1,2,3,4],[6,3,4,3]) != B([1,2,3,4],[12,8,8,6]): return False

10 11 12

if 2*B([1,2,3,4],[6,5,4,3]) != B([2,4,6,8],[6,5,4,3]): return False

13 14 15

if 2*B([1,2,3,4],[6,5,4,3]) != B([1,2,3,4],[12,10,8,6]): return False

16 17

return True

Now let’s write a Python function tensor which takes two covectors α and β, and returns their tensor product α ⊗ β. Solution Hint: The returned function should take two parameters (say v and w) and output α(~v ) · β(w). ~ Hint:

1 2

Specifically, you could try return lambda v,w:

alpha(v) * beta(w)

Python def tensor(alpha,beta): return # the bilinear form alpha tensor beta

3 4 5

def validator(): return tensor(lambda x: 4*x[0] + 5*x[1], lambda y: 2*y[0] - 3*y[1])([1,3],[4,5]) == -133


69

Linear maps and bilinear forms

Associated to a bilinear form is a linear map. It turns out that we will be able to use the inner product on Rn to rewrite any bilinear form on Rn in a special form. Given a bilinear map B : V × W → R, we obtain a new map B(·, w) ~ :V →R for each vector w ~ ∈ W . B(·, w) ~ is linear, since by definition of bilinearity it is linear in the first slot for a fixed vector w ~ in the second slot. Thus we have a map Curry(B) : W → V ∗ defined by Curry(B)(w) ~ = B(·, w). ~ If V and W are Euclidean spaces, then we have that every bilinear map Rn × Rm → R gives rise to a map Rm → (Rn )∗ . But every element of ω ∈ (Rn )∗ is just a row vector, and so can be represented as the dot product against the vector ω > . Thus we obtain a map LB : Rm → Rn defined by LB (w) ~ = B(·, w) ~ > . This is called the linear map associated to the bilinear form. We also call the matrix of LB the matrix of B. Computing some examples will make these definitions more concrete in our minds. Question 1 Let B : R2 × R3 → R be a bilinear mapping, and suppose we have the following values of B.      1 1 • B , 0 = 2 0 0      0 1   • B =1 , 1 0 0      0 1   • B = −3 , 0 0 1      1 0   , 0 =3 • B 1 0      0 0   • B , 1 =5 1 0      0 0 • B , 0 = 4 1 1 Solution Hint:

LB : R3 → R2 .

Hint:

LB (e1 ) = B(·, e1 )> .


Hint: vectors.

To find the matrix of B(·, ~e1 ) : R2 → R, we need to see its effect on basis B(~e1 , ~e1 ) = 2 B(~e2 , ~e1 ) = 3

 so the matrix of B(·, ~e1 ) is 2

3



  2 3

Hint:

Thus LB (~e1 ) = B(·, ~e1 )> =

Hint:

    −3 1 and LB (~e3 ) = Similarly, LB (~e2 ) = 4 5

Hint:

Thus the matrix of LB is



2 3

1 5

−3 4



What is the matrix of LB ?

Question  2 3 −2 1 3 2

2 If B : R3 × R3 → R is a bilinear map, and the matrix of B is 1 5 1

Solution Hint:

        1 0 1 0 By definition, B 2 , 0 = 2 · LB (0) 0 1 0 1

Hint:

        1 0 1 1 Thus B 2 , 0 = 2 · 5 = 11 0 1 0 1

    1 0 Then B 2 , 0 = 11. 0 1

X Show that the matrix of the bilinear form ai,j dxi ⊗ dxj is the matrix (ai,j ). Let M be the matrix of LB . Following the same line of reasoning as in a previous activity1 , we know that Mi,j = ~e> ej ). But by definition, this is B(~ei , ~ej ), which plainly evaluates to i LB (~ ai,j . The claim is proven. To every linear map L : Rm → Rn we also obtain a bilinear map BL : Rn ×Rm → R defined by BL (v, w) = v > L(w). 1 http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week1/ inner-product/multiply-dot/


To summarize, we have a really nice story about bilinear maps B : Rn ×Rm → R: Every single one of them can be written as B(v, w) = v > L(w) for some unique linear map L : Rm → Rn . Also every linear map Rm → Rn gives rise to a bilinear form by defining B(v, w) = v > L(w). On the level of matrices, we just have that B(v, w) = v > M w where M is the matrix of the linear map LB . We will sometimes say talk about “using a matrix as a bilinear form:” this is what we mean by that. This will be very important to us when we start talking about the second derivative. In this activity we have shown that for bilinear maps Rn × Rm → R, there is a useful notion of a linear map Rm → Rn associated to it. If the codomain of the original bilinear map had been anything other than R we would not have such luck: our work depended crucially on the ability to turn covectors into vectors using the inner product on Rn .
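Here is a small Python sketch of "using a matrix as a bilinear form." The matrix below is the matrix of LB found in Question 1 above, so B(~ei, ~ej) should reproduce the values we started with. This is only an illustration, not part of the course exercises.

Python

def bilinear_from_matrix(M):
    # B(v, w) = v^T M w, where M is given as a list of rows
    return lambda v, w: sum(v[i] * M[i][j] * w[j]
                            for i in range(len(M))
                            for j in range(len(M[0])))

M = [[2, 1, -3],
     [3, 5, 4]]              # the matrix of L_B from Question 1
B = bilinear_from_matrix(M)
print(B([1, 0], [0, 0, 1]))  # -3, matching B(e1, e3) above
print(B([0, 1], [1, 0, 0]))  # 3, matching B(e2, e1) above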


70

Python

Use Python to find the linear maps associated to bilinear forms. Question 1 Suppose alpha is a covector α : Rn → R. Write a Python function for converting such a covector in (Rn )∗ into a vector in ~v ∈ Rn . More specifically, covector to vector should take as input a Python function alpha, and return a list of n real numbers. This list of n real numbers, when regarded as a vector ~v ∈ Rn , should have the property that α(w) ~ = h~v , wi. ~ Solution

1 2 3

Hint:

You can determine what vi must be by consider α(~ei ).

Hint:

Define e(i) by [0] * i + [1] + [0] * (n-i-1).

Hint:

In other words, e = lambda i:

Hint:

Then α(~ei ) is alpha(e(i)).

Hint:

So to form a vector, we need only use [alpha(e(i)) for i in range(0,n)].

[0] * i + [1] + [0] * (n-i-1).

Python n = 4 def covector_to_vector(alpha): return # a vector v so that alpha(w) = v dot w

4 5 6 7 8 9 10 11 12

def validator(): if covector_to_vector(lambda x: 3*x[0] + 2*x[1] - 17*x[2] + 30*x[3])[1] != 2: return False if covector_to_vector(lambda x: 3*x[0] + 2*x[1] - 17*x[2] + 30*x[3])[2] != -17: return False if covector_to_vector(lambda x: 3*x[0] + 2*x[1] - 17*x[2] + 30*x[3])[3] != 30: return False return True

Suppose B is a bilinear form B : Rn × Rm → R. Write a Python function for taking such a bilinear form, and producing the corresponding linear map LB . Recall that we encode a linear map Rm → Rn in Python as regular Python function which takes as input a list of m real numbers, and outputs a list of n real numbers. Solution Hint: You may want to make use of covector to vector; copy the code from above and paste it into the box below. Hint:

We defined LB (w) ~ = B(·, w) ~ >.


Hint: In other words, LB sends a vector w ~ to the vector corresponding to the covector ~x 7→ B(~ x, w). ~

Hint: So we should return a linear map sending w ~ to the covector to vector applied to ~ x 7→ B(~ x, w). ~

Hint:

1 2 3 4

So we should return lambda w:

covector to vector(lambda x:

B(x,w)).

Python n = 4 m = 3 def bilinear_to_linear(B): return # the associated linear map L_B

5 6 7 8 9 10 11

def validator(): if bilinear_to_linear(lambda x,y: 7 * x[0] * y[1] + 3*x[1]*y[2])([3,5,7])[1] != 21: return False if bilinear_to_linear(lambda x,y: 7 * x[0] * y[1] + 3*x[1]*y[2])([3,5,7])[0] != 35: return False return True

These are again examples of “higher-order functions.” Keeping track of dual spaces and thinking about operations which transform bilinear maps into linear maps are two examples of where such “higher-order thinking” comes in handy.


71

Adjoints

Adjoints formalize the transpose. Let L : Rm → Rn be a linear map. Then there is an associated bilinear form BL : Rn × Rm → R given by BL (~v , w) ~ = h~v , L(w)i. ~ One thing we can do to a bilinear map is swap the two inputs, namely, we can build BL∗ : Rm × Rn → R by the rule BL∗ (w, ~ ~v ) = BL (~v , w). ~ And with this “swapped” bilinear map, we can go back and recover an associated linear map LBL∗ : Rn → Rm . Question 1

The domain of LBL∗ is the same as

Solution (a)

the codomain of L.

(b)

the domain of L.

X

Right! The minor surprise is that L went from Rm to Rn , but LBL∗ went “the other way” from Rn to Rm . Definition 2 If L : Rm → Rn is a linear map, the adjoint of L is the map LBL∗ : Rn → Rm . We usually write L∗ for the adjoint of L. Let L : Rm → Rn be a linear map. Show that h~v , L(w)i ~ = hL∗ (~v ), wi ~ for every n m v ∈ R and w ∈ R h~v , L(w)i ~ = BL (~v , w) ~ = BL∗ (w, ~ ~v ) = hw, ~ L∗ (~v )i = hL∗ (~v ), wi ~ Let’s work through an example. Let L : R3 → R2 be the linear map   3 2 1 represented by the matrix with respect to the standard basis. −4 2 9

Question 3

Solution 

 3 −4

Hint:

L(~e1 ) =

Hint:

h~e2 , L(~e1 )i = h~e2 ,

Hint:

h~e2 ,





 3 i −4

 3 i = −4. −4

h~e2 , L(~e1 )i = −4


Recall that h~v , L(w)i ~ = hL∗ (~v ), wi. ~ Consequently, setting ~v = ~e2 and w ~ = ~e1 , ∗ we have hL (~e2 ), ~e1 i is also −4. Let’s write (`ij ) for the entries of the matrix for L, and (`∗ij ) for the entries of the matrix for L∗ . The fact that h~e2 , L(~e1 )i = −4 amounts to saying `2,1 = −4, and then since hL∗ (~e2 ), ~e1 i = −4, we have that `∗1,2 = −4. Solution Hint: The matrix of the adjoint of a linear map is the transpose of the matrix of that linear map. 

3 The matrix of L is 2 1 ∗

Hint:

 −2 2 9

What is the matrix of L∗ ?

What do you notice about these entries? Solution (a)

`ij = `∗ji

(b)

`ij = `∗ij

X

Let’s summarize this fact as a theorem. Theorem 4 Let L : Rn → Rm be a linear map. If L has matrix M with respect to the standard basis, then L∗ has matrix M > . > Recall that M > is the transpose of M , meaning that Mij = Mji . It is your turn to prove this theorem. Let’s use the fundamental insight from this activity1 . Let the matrix of L∗ be called M ∗ for now. To find the entry in the ith row and th j column of M ∗ , we just compute ∗ Mi,j = e> i M (ej ) ∗ = e> i L (ej )

= hei , L∗ (ej )i = hL(ei ), ej i = e> j L(ei ) = e> j M (ei ) = Mj,i So the entry in the ith row and j th column of M ∗ is the entry in the j th row and ith column of M . Thus M ∗ = M > . Here, finally, is a question for you to ponder: why are we bothering about adjoints of linear maps if we have transposes of matrices?

1 http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week1/ inner-product/multiply-dot/
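As a sanity check, here is a small numerical verification of Theorem 4 (a sketch only, assuming NumPy is available); the vectors v and w are arbitrary choices, not from the text.

Python

import numpy as np

M = np.array([[3., 2., 1.],
              [-4., 2., 9.]])   # the matrix of L : R^3 -> R^2 from Question 3
v = np.array([1., 2.])          # an arbitrary vector in R^2
w = np.array([3., -1., 5.])     # an arbitrary vector in R^3

# <v, L(w)> should equal <L*(v), w>, where L* has matrix M^T
print(np.dot(v, M @ w))      # 74.0
print(np.dot(M.T @ v, w))    # 74.0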


72

Spectrum of the adjoint

Taking adjoints doesn’t affect the spectrum. The set of eigenvalues of a linear operator—what we call the spectrum of the linear operator—is of fundamental importance. Taking adjoints is one way to build a new linear operator from an old linear operator. Fusing these two ideas together results in a question: how does the spectrum of L relate to the spectrum of its adjoint, L∗ ? Surprisingly, the spectrum of L is the same as the spectrum of L∗ . Let’s get started: show that if ~v is a nonzero eigenvector of L : Rn → Rn , with eigenvalue λ then there is a nonzero eigenvector ~u of L∗ with eigenvalue λ. Warning 1

This is a very hard problem.

Hint: Consider the map S : Rn → Rn given by S(~v) = L∗(~v) − λ~v, or in other words S = L∗ − λI. Showing that this map has a nontrivial kernel is the same as showing that L∗ has λ as an eigenvalue.

Hint:

Notice that L − λI is adjoint to S = L∗ − λI

Hint:

For all w ~ ∈ Rn , we have hS(w), ~ ~v i = hw, ~ L(~v ) − λ~v i = 0

Hint: Thus S(w) ~ is in the subspace of vectors perpendicular to the eigenvector ~v , which we denote ~v ⊥ .

Hint:

Thus we have that Im(S) ⊂ ~v ⊥ . This implies that dim(Im(S)) ≤ n − 1

Hint:

By the rank nullity theorem, we have that dim(ker(S)) ≥ 1

Hint: λ.

So S has a nontrivial kernel, so L∗ has a nonzero eigenvector ~ u with eigenvalue

Consider the map S : Rn → Rn given by S(~v) = L∗(~v) − λ~v. Showing that this map has a nontrivial kernel is the same as showing that L∗ has λ as an eigenvalue. Notice that L − λI is adjoint to S = L∗ − λI. For all ~w ∈ Rn, we have ⟨S(~w), ~v⟩ = ⟨~w, L(~v) − λ~v⟩ = 0. Thus S(~w) is in the subspace of vectors perpendicular to the eigenvector ~v, which we denote ~v⊥. Thus we have that Im(S) ⊂ ~v⊥. This implies that dim(Im(S)) ≤ n − 1. By the rank-nullity theorem, we have that dim(ker(S)) ≥ 1. So S has a nontrivial kernel, and therefore L∗ has a nonzero eigenvector ~u with eigenvalue λ.
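Here is a quick numerical illustration of this fact (a sketch, assuming NumPy is available): the eigenvalues of a matrix and of its transpose agree.

Python

import numpy as np

M = np.array([[2., 1.],
              [3., 4.]])        # an arbitrary (not self-adjoint) operator on R^2

print(np.linalg.eigvals(M))     # eigenvalues of L:  1 and 5, in some order
print(np.linalg.eigvals(M.T))   # eigenvalues of L*: the same two numbers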


73

Self-adjoint maps

Linear maps which equal their own adjoint are important Definition 1

A linear operator L : Rn → Rn is self-adjoint if L = L∗ .

These ideas also pop up when considering bilinear forms. Definition 2 A bilinear form on Rn is symmetric if for all ~v , w ~ ∈ Rn we have B(~v , w) ~ = B(w, ~ ~v ). Question 3 Consider the bilinear form B : Rn × Rn → R given by B(~v , w) ~ = h~v , wi. ~ In other words, B is just the inner product. Is B symmetric? Solution (a)

Yes.

(b)

No.

X

That’s right! Example 4 The dot product on Rn is a symmetric bilinear form, since we have already shown that v · w = w · v. Which of the following bilinear forms on R2 are symmetric? Solution     = x y + 3x2 y1 is not symmetric since, for example, Hint: B x , x , y1 , y 2     1 2    1 2 1 0 0 1 B , = 1, but B , = 3. 0 1 1 0     Hint: B x1 , x2 , y1 , y2 = x1 y1 +5x2 y2 +x1 y2 is not symmetric since, for example,         0 1 B 1, 0 , 0, 1 = 1 but B , =0 1 0

        Hint: B x1 , x2 , y1 ,y2 = 2x1 y1 +4x2 y2 is symmetric, since B x1 , x2 , y1 , y2 = 2x1 y1 + 4x2 y2 = B y1 , y2 , x1 , x2 (a) (b) (c)

    x1 , x2 , y1 , y2 = x1 y2 + 3x2 y1     B x1 , x2 , y1 , y2 = 2x1 y1 + 4x2 y2 X     B x1 , x2 , y1 , y2 = x1 y1 + 5x2 y2 + x1 y2

B


74

Symmetric matrices

The matrix of a self-adjoint linear map is symmetric. The matrix of a self-adjoint operator equals its own transpose. In other words, it is symmetric about the main diagonal. Definition 1

A matrix which equal its transpose is a symmetric matrix. 

   x1 y , 1 = x2 y2 2x1 y1 + 4x2 y2 + x1 y2 + x2 y1 . What is the matrix of B? What do you notice about this matrix?

Question 2

Let B be the symmetric bilinear form on R2 defined by B

Solution Hint:

Remember that the entry Mi,j = B(ei , ej )

Hint:     1 1 , )=2 0 0     1 0 M1,2 = B( , )=1 0 1     0 1 M2,1 = B( , )=1 1 0     0 0 M2,2 = M ( , )=4 1 1 M1,1 = B(

Hint:

The matrix of B is

 2 1

1 4



Notice that this matrix is a symmetric matrix!
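In Python, we can recover this matrix entry-by-entry from the formula M_{i,j} = B(~ei, ~ej). This is only a sketch; the function B below encodes the bilinear form from Question 2.

Python

def B(x, y):
    # the symmetric bilinear form from Question 2
    return 2*x[0]*y[0] + 4*x[1]*y[1] + x[0]*y[1] + x[1]*y[0]

e = [[1, 0], [0, 1]]            # the standard basis of R^2
M = [[B(e[i], e[j]) for j in range(2)] for i in range(2)]
print(M)   # [[2, 1], [1, 4]] -- a symmetric matrix, as expected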

Show that L is self-adjoint if and only if the bilinear form associated to it is symmetric. If L is self-adjoint, then BL (v, w) = v > L(w) = hv, L(w)i = hL(v), wi = w> L(v) = BL (w, v) So the bilinear form associated to L is symmetric. 172


On the other hand, if B is a symmetric, then hLB (v), wi = hB(v, ·)t op, wi = B(v, ·)(w) = B(v, w) = B(w, v) = B(w, ·)(v) = hB(w, ·)> , vi

= hLB (w), vi

So the linear map associated with B is self-adjoint


75

Python

Build some self-adjoint operators in Python. Question 1 We will represent a linear operator in Python as a function which takes as input a list of n real numbers, and outputs a list of n real numbers. Write down a linear operator L which is self-adjoint. Solution Hint:

In this problem, n = 4.

Hint: v[3].

So the input v will be a list of four numbers, namely v[0], v[1], v[2], and

Hint:

The output should also be a list of four numbers.

Hint: We must make sure that the resulting operator is self-adjoint, which we can achieve if the corresponding matrix is symmetric.

Hint: Since we just need to write down one example, we could even get away with return v, namely, the identity operator. But let’s try to be fancier!

Hint:

Hint: v[3]].

1 2 3

 2 3  Let’s make L into the linear operator represented by the matrix  0 0

6 7 8 9 10 11

0 0 1 0

 0 0 . 0 1

We can achieve this with return [2*v[0] + 3*v[1], 3*v[0] + 4*v[1], v[2],

Python n = 4 def L(v): return # the vector L(v), but make sure that L is self-adjoint

4 5

3 4 0 0

def validator(): e = lambda i: [0] * i + [1] + [0] * (n-i-1) for i in range(0,4): for j in range(i,4): if L(e(i))[j] != L(e(j))[i]: return False return True

Fantastic!


76

Definiteness and the spectral theorem

“Definiteness” describes what we can say about the sign of the output of a bilinear form. Definition 1

A bilinear form B : Rn × Rn → R is

Positive definite if B(~v, ~v) > 0 for all ~v ≠ ~0,

Positive semidefinite if B(~v, ~v) ≥ 0 for all ~v,

Negative definite if B(~v, ~v) < 0 for all ~v ≠ ~0,

Negative semidefinite if B(~v, ~v) ≤ 0 for all ~v, and

Indefinite if there are ~v and ~w with B(~v, ~v) > 0 and B(~w, ~w) < 0.

Let M be a diagonal matrix. In a sentence, can you relate the sign of the entries Mi,i to the definiteness of the associated bilinear form? Given a diagonal n  × n matrix M with Mi,i = λi , we see that B(~x, ~x) = x1 h i  x2    . 2 2 2 .  ~v = λ1 x1 + λ2 x2 + · · · + λn xn x1 x2 .. xn M   ..  xn If the λi are all positive, this expression is always positive whenever the xi are not all 0. So the bilinear form is positive definite. If the λi are all nonnegative, this expression is always nonnegative whenever the xi are not all 0. So the bilinear form is positive semidefinite. If the λi are all negative, this expression is always negative whenever the xi are not all 0. So the bilinear form is negative definite. If the λi are all nonpositive, this expression is always nonpositive whenever the xi are not all 0. So the bilinear form is negative semidefinite. If the λi > 0 and λj < 0 for some i, j ≤ n, then B(ei , ei ) = λi > 0 and B(ej , ej ) = λj < 0, so the bilinear form is indefinite. Our goal will now be to reduce the study of general symmetric bilinear forms to those whose associated matrix is diagonal. Let L : Rn → Rn be a self adjoint linear operator. Prove that if ~v1 and ~v2 are eigenvectors with distinct eigenvalues λ1 and λ2 , then ~v1 ⊥ ~v2 . hL(~v1 ), ~v2 i = h~v1 , L(~v2 )i hλ1~v1 , ~v2 i = h~v1 , λ2~v2 i (λ1 − λ2 )h~v1 , ~v2 i = 0 h~v1 , ~v2 i = 0 Let L : Rn → Rn be a self adjoint linear operator. Let ~v be an eigenvector of L. Prove that L restricts to a self adjoint linear operator on the space of vectors perpendicular to ~v , ~v ⊥ . All we need to show is that w ⊥ v implies L(w) ⊥ v. 175


hL(w), vi = hw, L(v)i = hw, λvi = λhw, vi =0 so we are done! Theorem 2 eigenvector.

If L : Rn → Rn is a self adjoint linear operator, then L has an

Proof If L is the identically 0 map, then every nonzero vector is an eigenvector (with eigenvalue 0), and we are done. So assume L ≠ 0. Since the unit sphere in Rn is compact1, the function ~v ↦ |L(~v)| achieves its maximum M. So there is a unit vector ~u so that |L(~u)| = M, and |L(~v)| ≤ M for all other unit vectors ~v. Also M > 0 because L ≠ 0. Now let ~w = L(~u)/M. This is another unit vector. Note that ⟨~w, L(~u)⟩ = M, so we also have ⟨L(~w), ~u⟩ = M, since L is self adjoint. By Cauchy-Schwarz, ⟨L(~w), ~u⟩ ≤ |L(~w)||~u| with equality if and only if L(~w) ∈ span(~u). But |L(~w)||~u| = |L(~w)| because ~u is a unit vector, and |L(~w)| ≤ M since M is the maximum value of |L(~v)| over all unit vectors ~v. So equality holds throughout, and we can conclude that L(~w) = M~u. Now either ~u + ~w ≠ 0 or ~u − ~w ≠ 0. Since L(~u + ~w) = M~w + M~u = M(~u + ~w) and L(~u − ~w) = M~w − M~u = −M(~u − ~w), whichever of ~u ± ~w is nonzero is an eigenvector of L (with eigenvalue M or −M). Credit for this beautiful line of reasoning goes to Marcos Cossarini2. Most proofs of this theorem use either Lagrange multipliers (which we will learn about soon), or complex analysis. Here we use only linear algebra along with the one analytic fact that a continuous function on a compact set achieves its maximum value. □

A self adjoint operator L : Rn → Rn has an orthonormal basis of

Proof L has an eigenvector ~v1 . L ~v⊥ : ~v1⊥ → ~v1⊥ is another self adjoint linear 1 operator and so it has an eigenvector ~v2 . Continue in this way until you have constructed all n eigenvectors. Because of how they are constructed, we have that each one is perpendicular to all of the eigenvectors which came before it.  This, in some sense, completely answers the question of how to characterize the definiteness of a symmetric bilinear form. Look at its associated linear operator, which must be self adjoint. By the Spectral Theorem, it has a orthonormal basis of eigenvectors. Then 1 2


http://en.wikipedia.org/wiki/Compact_space http://mathoverflow.net/a/118759/1106




B positive definite ⇐⇒ LB has all positive eigenvalues



B positive semidefinite ⇐⇒ LB has all nonnegative eigenvalues



B negative definite ⇐⇒ LB has all negative eigenvalues



B negative semidefinite ⇐⇒ LB has all nonpositive eigenvalues



B indefinite ⇐⇒ LB has both positive and negative eigenvalues

This will be crucially important when we get to the second derivative test: it will turn out that the second derivative is a symmetric bilinear form, and the definiteness of this bilinear form is analogous to the concavity of a function of one variable. Identifying local maxima and minima with the “second derivative test” will require analysis of the eigenvalues of the associated linear map.
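Here is a NumPy sketch of this classification (the helper classify is our own name, and the matrices are arbitrary examples).

Python

import numpy as np

def classify(M):
    # classify the symmetric bilinear form with matrix M by the signs of its eigenvalues
    eigs = np.linalg.eigvalsh(M)        # eigvalsh: eigenvalues of a symmetric matrix
    if np.all(eigs > 0):
        return "positive definite"
    if np.all(eigs < 0):
        return "negative definite"
    if np.all(eigs >= 0):
        return "positive semidefinite"
    if np.all(eigs <= 0):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2., 1.], [1., 4.]])))    # positive definite
print(classify(np.array([[1., 0.], [0., -1.]])))   # indefinite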


77

Second order partial derivatives

Second order partial derivatives are partial derivatives of partial derivatives ∂f : Rn → R is another function, ∂xi so we can take its partial derivative with respect to xj . We define the second order ∂2f ∂ ∂f partial derivative = . In the special case that i = j we will write ∂xj ∂xi ∂xj ∂xi ∂2f (even though this notation is horrible, it is standard, so we will follow it). ∂x2i

Definition 1

Question 2

If f : Rn → R is a function, then

Let f (x, y) = x2 y 3

Solution Hint:   ∂2f ∂ ∂ 2 3 = (x y ) ∂x2 ∂x ∂x ∂ = (2xy 3 ) ∂x = 2y 3 ∂2f = 2y 3 ∂x2 Solution Hint:   ∂2f ∂ ∂ 2 3 = (x y ) ∂x∂y ∂x ∂y ∂ = (3x2 y 2 ) ∂x = 6xy 2 ∂2f = 6xy 2 ∂x∂y Solution Hint:   ∂2f ∂ ∂ 2 3 = (x y ) ∂y∂x ∂y ∂x ∂ = (2xy 3 ) ∂y = 6xy 2 ∂2f = 6xy 2 ∂y∂x Solution


Hint:   ∂2f ∂ 2 3 ∂ (x y ) = ∂y 2 ∂y ∂y ∂ (3x2 y 2 ) = ∂y = 6x2 y ∂2f = 6x2 y ∂y 2 Solution

Did you notice how

∂2f ∂2f = ? Doesn’t that fill you with a sense of ∂x∂y ∂y∂x

wonder and mystery? (a)

Yes!

(b)

No :(

Question 3

X

Let f (x, y) = sin(xy 2 )

Solution Hint:   ∂2f ∂ ∂ 2 = sin(xy ) ∂x2 ∂x ∂x ∂ 2 = (y sin(xy 2 )) ∂x = −y 4 sin(xy 2 ) ∂2f = −y 4 sin(xy 2 ) ∂x2 Solution Hint:   ∂2f ∂ ∂ 2 = (sin(xy )) ∂x∂y ∂x ∂y ∂ = (2xy cos(xy 2 )) ∂x = 2y cos(xy 2 ) − 2xy(y 2 ) sin(xy 2 ) = 2y cos(xy 2 ) − 2xy 3 sin(xy 2 ) ∂2f = 2ycos(xy 2 ) − 2xy 3 sin(xy 2 ) ∂x∂y Solution Hint:   ∂2f ∂ ∂ = (sin(xy 2 )) ∂y∂x ∂y ∂x ∂ 2 = (y cos(xy 2 )) ∂y = 2y cos(xy 2 ) − y 2 (2xy) sin(xy 2 ) = 2y cos(xy 2 ) − 2xy 3 sin(xy 2 )


∂2f = 2ycos(xy 2 ) − 2xy 3 sin(xy 2 ) ∂y∂x Solution Hint:   ∂2f ∂ ∂ 2 (sin(xy )) = ∂y 2 ∂y ∂y ∂ (2xy cos(xy 2 )) = ∂y = 2x cos(xy 2 ) − 2xy(2xy) sin(xy 2 ) = 2x cos(xy 2 ) − 4x2 y 2 sin(xy 2 ) ∂2f = 2xcos(xy 2 ) − 4x2 y 2 sin(xy 2 ) ∂y 2

This lends even more evidence to the startling claim that ∂²f/∂x∂y = ∂²f/∂y∂x.

Question 4

Let f : R3 → R be the function f (x, y, z) = x2 yz 3

Solution Hint: ∂2f ∂ ∂ 2 3 = x yz ∂x∂z ∂x ∂z ∂ 3x2 yz 2 = ∂x = 6xyz 2 ∂2f =6xyz 2 ∂x∂z Solution Hint: ∂2f ∂ ∂ 2 3 = x yz ∂z∂x ∂z ∂x ∂ = 2xyz 3 ∂x = 6xyz 2 ∂2f =6xyz 2 ∂z∂x

Okay, it really really looks like

∂²f/∂xi∂xj = ∂²f/∂xj∂xi.
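If you want to check computations like these, a computer algebra system makes it painless. Here is a SymPy sketch (assuming SymPy is available) for f(x, y) = sin(xy²).

Python

import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x * y**2)

print(sp.diff(f, x, x))                                  # -y**4*sin(x*y**2)
print(sp.diff(f, x, y))                                  # first in x, then in y
print(sp.diff(f, y, x))                                  # first in y, then in x
print(sp.simplify(sp.diff(f, x, y) - sp.diff(f, y, x)))  # 0 -- they agree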

78

Mixed partials commute

Order of partial differentiation doesn’t matter In the last section on partial derivatives we made the interesting observation that ∂2f ∂2f = for all of the functions we considered. We will now prove this, ∂xi ∂xj ∂xj ∂xi modulo some technical assumptions. Theorem 1 Let f : Rn → R be a differentiable function. Assume that the partial derivatives fxi : Rn → R are all differentiable, and the second partial derivatives fxi ,xj are continuous. Then fxi ,xj = fxj ,xi . First, let’s develop some intuition about why this result is true. This informal discussion will also suggest how we should proceed with the formal proof. Let’s restrict our attention, for the moment, to functions g : R2 → R. Observe g(a + h, b) − g(a, b) for small values of h. Analogously, gy (a, b) ≈ that gx (a, b) ≈ h g(a, b + k) − g(a, b) . k Now applying this idea twice, we have 1 (fy (a + h, b) − fy (a, b)) h  1 f (a + h, b + k) − f (a + h, b) f (a, b + k) − f (a, b) ≈ − h k k f (a + h, b + k) − f (a + h, b) − f (a, b + k) + f (a, b) = hk

fxy (a, b) ≈

Going through the same process with fyx leads to exactly the same approximation! So our strategy of proof will be to show that we can express both of these partial derivatives as the two variable limit:

fxy (a, b) = lim

h,k→0

f (a + h, b + k) − f (a + h, b) − f (a, b + k) + f (a, b) = fyx (a, b) hk

Proof Let HODQ(h, k) = f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b). (Here HODQ stands for "higher order difference quotient".) Let Q(s) = f(s, b + k) − f(s, b). Then HODQ(h, k) = Q(a + h) − Q(a). By the mean value theorem for derivatives, there is an ε1 with 0 < ε1 < h such that Q(a + h) − Q(a) = hQ′(a + ε1). So HODQ(h, k) = h(fx(a + ε1, b + k) − fx(a + ε1, b)). By the mean value theorem again, we have HODQ(h, k) = hk fyx(a + ε1, b + ε2) for some 0 < ε2 < k. Applying exactly the same reasoning with the roles of the variables swapped will give HODQ(h, k) = hk fxy(a + ξ2, b + ξ1) for some 0 < ξ1 < k and 0 < ξ2 < h. Let R(s) = f(a + h, s) − f(a, s).


Then HODQ(h, k) = R(b + k) − R(b). By the mean value theorem for derivatives, there is a ξ1 with 0 < ξ1 < k such that R(b + k) − R(b) = kR′(b + ξ1). So HODQ(h, k) = k(fy(a + h, b + ξ1) − fy(a, b + ξ1)). By the mean value theorem again, we have HODQ(h, k) = hk fxy(a + ξ2, b + ξ1) for some 0 < ξ2 < h. So we have

lim_{h,k→0} HODQ(h, k)/(hk) = lim_{h,k→0} [f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b)]/(hk) = lim_{h,k→0} fyx(a + ε1, b + ε2).

But since 0 < ε1 < h and 0 < ε2 < k, then as h, k → 0, a + ε1 → a and b + ε2 → b. By the continuity of fyx, this limit equals fyx(a, b).

Applying the same reasoning to the other decomposition,

lim_{h,k→0} HODQ(h, k)/(hk) = lim_{h,k→0} fxy(a + ξ2, b + ξ1).

But since 0 < ξ1 < k and 0 < ξ2 < h , then as h, k → 0, a + ξ2 → a and b + ξ1 → b. By the continuity of fxy , we have that the limit equals fxy (a, b). So we can conclude that fxy (a, b) = fyx (a, b), because they are the common f (a + h, b + k) − f (a + h, b) − f (a, b + k) + f (a, b) value of the limit lim .  h,k→0 hk We close with a cautionary example. This result is not always true if the second partial derivatives are not continuous. Remember that we define g(a + h, b) − g(a, b) g(a, b + k) − g(a, b) gx (a, b) = lim , and similarly gy (a, b) = lim h→0 k→0 h k  2 2 xy x − y if (x, y) 6= (0, 0) Question 2 Define f (x, y) = x2 + y 2  0 if (x,y)=(0,0) Solution Hint:

Question 3

Solution

Hint: fy (x, 0) = lim

k→0

f (x, k) − f (x, 0) k 2

= lim

2

xk xx2 −k −0 +k2

k x2 − k2 = lim x 2 k→0 x + k 2 =x k→0



fy (x, 0) =x Solution Hint: fx (0, y) = lim

h→0

f (h, y) − f (0, y) h 2

= lim

2

hy hh2 −y −0 +y 2

h h2 − y 2 = lim y 2 h→0 h + y 2 = −y h→0

fx (0, y) =−y

Hint: fy (h, 0) − fy (0, 0) h h−0 = lim h→0 h =1

fxy (0, 0) = lim

h→0

Hint: fx (0, k) − fx (0, 0) k −k − 0 = lim h→0 k = −1

fyx (0, 0) = lim

k→0

fxy (0, 0) =1 Solution

fyx (0, 0) =−1


79

Second derivative

The second derivative records how small changes affect the derivative. The derivative allowed us to find the best linear approximation to a function at a point. But how do these linear approximations change as we move from point to nearby point? That is exactly what the second derivative is aiming for.


80

Intuitively

The second derivative is a bilinear form. From our perspective, the second derivative of a function f : Rn → R at a point will be a bilinear form on Rn . Let us take some time to understand, intuitively, why that should be the case. Let f : R2 → R be defined by f (x, y) = x2 y.   D(f ) (x,y) is the linear map given by the matrix 2xy x2 . That is to say,   ∆x D(f ) (x,y) ( ) = 2xy∆x + x2 ∆y ≈ f (x + ∆x, y + ∆y) − f (x, y). ∆y The second derivative should now tell you how much the derivative changes from point to point. If we increment (x, y) by a little  bit to (x + ∆x, y) then we ∂ ∂ should expect the derivative to increase by about ∆x (2xy) ∆x (x2 ) = ∂x ∂x   2y∆x 2x∆x . Similarly, when we increase y by ∆y, the derivative should change     ∂ ∂ by about ∆y (2xy) ∆y (x2 ) = 2x∆y 0∆y . ∂y ∂y By linearity, if we change from (x, y) to (x+∆x, y+∆y), we expect the derivative to change by       2y 2x 2y∆x + 2x∆y 2x∆x + 0 = ∆x ∆y 2x 0 This gives a matrix which is the approximate change in the derivative. You can then apply this to another vector if you so wish. Summing it up, if you wanted to see approximately how  much  the derivative ∆x 2 changes from p = (x, y) to (x+∆x2 , y +∆y2 ) = p+ h~2 (h~2 = ) when both are ∆y2   ∆x1 evaluated in the same direction h~1 = , you would perform the computation: ∆y1      2x 2x ∆x1 ~ ~ Dfp+h~2 (h1 ) − Dfp (h1 ) ≈ ∆x2 ∆y2 2x 0 ∆y1   2x 2x This is exactly using the matrix as a bilinear form applied to the two 2x 0     ∆x1 ∆x2 vectors h~1 = and h~2 = . ∆y1 ∆y2 With all of this as motivation, we make the following wishy washy ”definition” Definition 1 The second derivative of a function f : Rn → R at a point p ∈ Rn 2 is a bilinear form D f p : Rn × Rn → R enjoying the following approximation property: Df p+h~1 (h~2 ) ≈ Df p (h~2 ) + D2 f p (~h1 , ~h2 ) We will make the sense in which this approximation holds precise in another section, but for now this is good enough. 185


 Question 2 If f : R2 → R is a function, and Df (1,2) = −1     0 4 −0.2 . Df (1.2,1.1) ( ). 4 1 0.3

 1 and Hf (1,2) =

Solution By the fundamental approximating property, we have Df

Hint:

 (1,2)+

0.3) ≈ Df

 (1,2)

     −0.2 0.2 −0.2 , + D2 f (1,2) 0.3 0.1 0.3

Thus Df

Hint:

 (−0.2

0.2 0.1

 (−0.2 0.2 0.1        −0.2 0.2 0 4 −0.2 1 + 0.3 0.1 4 1 0.3 

(1,2)+

 0.3) ≈ −1

Hint:    −0.2 Df (1.2,1.1) ( ) ≈ −1 0.3

1

     −0.2 0.2 0 + 0.3 0.1 4

4 1



 −0.2 0.3

= −1(−0.2) + 1(0.3) + +(0.2)(0)(−0.2) + (0.2)(4)(0.3) + (0.1)(4)(−0.2) + (0.1)(1)(0.3) = 0.2 + 0.3 + 0 + 0.12 − 0.08 + 0.03 = 0.57  −0.2 ) ≈ 0.57 Df (1.2,1.1) ( 0.3 

Note that the computation really splits into a first order change (Df|p(~h)) and a second order change (D²f(~h1, ~h2)). In this case the first order change was 0.5, and the second order change was 0.07. This should be a better approximation to the real value than if we had used the first derivative alone.

Question 3 Let f : R2 → R be a function with Df|p = [3 4] and Hf|p = [[1, 3], [3, −2]]. Approximate Df|_{p + (0.01, 0.02)}.

Solution

Hint: By the fundamental approximation property, Df|_{p+(0.01,0.02)}(~h) ≈ Df|_p(~h) + [0.01 0.02] Hf|_p ~h. So Df|_{p+(0.01,0.02)} ≈ Df|_p + [0.01 0.02] Hf|_p as linear maps from R2 → R.

Hint: [0.01 0.02] Hf|_p = [0.01 0.02] [[1, 3], [3, −2]] = [0.01(1) + 0.02(3)   0.01(3) + 0.02(−2)] = [0.07  −0.01]

Hint: So Df|_{p+(0.01,0.02)} ≈ [3 4] + [0.07 −0.01] = [3.07 3.99]

Following the development at the beginning of this activity, we can anticipate how to compute the second derivative as a bilinear form: Warning 4

This is an intuitive development, not a rigorous proof n

Let f : R → R. Since  Df p = fx1 (p) fx2 (p) ...

 fxn (p)

It is reasonable to think that h Df p+~h1 ≈ Df p + Dfx1 (p)(~h1 ) Dfx2 (p)(~h1 ) ...

i Dfxn (p)(~h1 )

but   Dfxi (h~1 ) = fx1 x1 (p) fx2 x1 (p) ... fxn x1 (p) h~1   fx1 x1 (p)   >  fx2 x1 (p)  We can rewrite this as h~1  , so we obtain the rather pleasing formula ..   .

Df p+~h1

fxn x1 (p)  fx1 x1 (p)  f >  x2 x1 (p) ≈ Df p + h~1  ..  .

fx1 x2 (p) fx2 x2 (p)

... ...

fxn x1 (p) fxn x2 (p) ...

 fx1 xn (p) fx2 xn (p)     fxn xn (p)

So 

fx1 x1 (p)  f >  x2 x1 (p) Df p+~h (~h2 ) ≈ Df p (~h2 ) + h~1  .. 1  .

fx1 x2 (p) fx2 x2 (p)

... ...

fxn x1 (p) fxn x2 (p) ...

 fx1 xn (p) fx2 xn (p)  ~  h2  fxn xn (p)

So it looks like we have: Theorem 5 is

If f : Rn → R, the matrix of the bilinear form D2 f p : Rn × Rn → R   fx1 x1 (p) fx1 x2 (p) ... fx1 xn (p)  fx2 x1 (p) fx2 x2 (p) ... fx2 xn (p)      ..   . fxn x1 (p) fxn x2 (p) ...

fxn xn (p)

This matrix is also called the Hessian matrix of f . We could also express this in the following convenient notation: n X D f p= 2

∂2f dxi ⊗ dxj ∂xi ∂xj i,j=1 187


By the equality of mixed partial derivatives, this bilinear form is actually symmetric! So all of the theory we developed about self adjoint linear operators and symmetric bilinear forms can (and will) be brought to bear on the study of the second derivative. x Question 6 Let f : R2 → R be defined by f (x, y) = . y Solution Hint:

Question 7

Solution

Hint: ∂ ∂ x ∂x ∂x y ∂ 1 = ∂x y =0

fxx =

fxx =0 Solution Hint: ∂ ∂ x ∂x ∂y y ∂ −x = ∂x y 2 −1 = 2 y

fxy =

fxy =−1/y 2 Solution Hint: fyx = fxy by equality of mixed partials −1 = 2 y fyx =−1/y 2 Solution Hint: ∂ ∂y ∂ = ∂y 2x = 3 y

fyy =

fyy =2x/y 3

Hint: Assembling these entries, the Hessian is

H = [ 0       −1/y²
      −1/y²   2x/y³ ]

What is the Hessian matrix of f at the point (x, y)?

Question 8

Let f : R3 → R be defined by f (x, y, z) = xy + yz.

Solution Hint:

The only second partials which are not zero are fxy = fyx and fyz = fzy

Hint: fxy = fyx = 1 and fyz = fzy = 1 

Hint:

0 H = 1 0

1 0 1

 0 1 0

What is the Hessian matrix of f at the point (x, y, z)?

Notice that the second derivative of this function is the same at every point because f was a quadratic function. Any other polynomial of degree 2 in n variables would also have a constant second derivative. For example f(x, y, z, t) = 1 + 2x + 3z + 4z² + zx + xt + t² + yx would also have constant second derivative.
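A SymPy sketch (assuming SymPy is available) reproduces the Hessians computed in this section.

Python

import sympy as sp

x, y, z = sp.symbols('x y z')

print(sp.hessian(x / y, (x, y)))
# Matrix([[0, -1/y**2], [-1/y**2, 2*x/y**3]])

print(sp.hessian(x*y + y*z, (x, y, z)))
# Matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]])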


81

Rigorously

The second derivative allows approximations to the derivative. Definition 1 Let f : Rn → R be a differentiable function, and p ∈ Rn . We say that f is twice differentiable at p if there is a bilinear form B : Rn × Rn → R with Df (p + h~1 )(h~2 ) = Df (p)(h~2 ) + B(h~1 , h~2 ) + Error(h~1 , h~2 ) With

lim

Error(h~1 , h~2 )

h~1 ,h~2 →0

|h~1 ||h~2 |

=0

In this case we call B the second derivative of f at p and write B = D2 f (p). Theorem 2 Let f : Rn → R be a function which is twice differentiable everywhere. Then the second derivative of f at p has the matrix   fx1 x1 (p) fx1 x2 (p) ... fx1 xn (p)  fx2 x1 (p) fx2 x2 (p) ... fx2 xn (p)    H(p) =   ..   . fxn x1 (p) fxn x2 (p) ...

fxn xn (p)

Prove this theorem! Hint:

Apply the definition to B(h~ei , k~ej )

We want to show that D2 f (p)(~ei , ~ej ) = fxi ,xj (p). By definition, we have that Df (p + h~ei )(k e~j ) − Df (p)(k e~j ) − D2 f (h~ei , k~ej ) =0 lim h,k→0 |h~ei ||k~ej | So by the linearity of the derivative, and the bilinearity of the second derivative, kDf (p + h~ei )(e~j ) − kDf (p)(e~j ) − hkD2 f (~ei , ~ej ) = 0. lim h,k→0 |hk| So we have Df (p + h~ei )(e~j ) − Df (p)(e~j ) lim − D2 f (~ei , ~ej ) = 0 h,k→0 h Which implies Df (p + h~ei )(e~j ) − Df (p)(e~j ) h→0 h

D2 f (p)(e~1 , e~2 ) = lim

But Df (x)(e~j ) = fxj (x) for any x ∈ Rn , so this is D2 f (p)(e~1 , e~2 ) = lim

h→0

fxj (p + h~ ei ) − fxj (p) h


But by definition of the directional derivative, this implies that D2 f (p)(e~1 , e~2 ) = fxi ,xj (p)


82

Taylor series

The second derivative allows us to approximate functions better than just the first derivative As it stands, the second derivative lets us get approximations of the first derivative. The first derivative allows us to get approximations of the original function. In the following extended question, we will see how we can use the second derivative to get more information about the first derivative, which then lets us get more information about the original function. This will lead to approximations with second order accuracy, rather than just first order accuracy. This is the essence of the second order Taylor’s theorem.


83

An example

A specific example sheds light on Taylor series. Let’s work through an example. Question 1 Let f : R2 → R be a function. All we know about f at the point (1, 2) is the following: •

f (1, 2) = 6



  Df (1, 2) = 4 5   1 −2 D2 f (1, 2) = −2 3



Suppose that we want to approximate f (1.1, 1.9) as accurately as we can given this information. We can simply use the linear approximation to f at (1, 2): Solution Hint:  f (1.1, 1.9 ≈ 6 + 4

5





0.1 −0.1



= 6 + 0.4 − 0.5 = 5.9 Using the linear approximation to f at (1, 2), we find f (1.1, 1.9) ≈ 5.9.

This approximation ignores the second order data provided by the second derivative: we have essentially assumed that the first derivative is constant along the line from (1, 2) to (1.1, 2.2). Since we know the second derivative at the point (1, 2) we can estimate how the derivative is changing along this line assuming the second derivative was constant, and get a better approximation. For example, we could use the following three step process: •

Use the linear approximation to f at (1, 2) to approximate f (1.05, 1.95)



Use the second derivative to approximate Df (1.05, 1.95)



Use the linear approximation to f at (1.05, 1.95) to approximate f (1.1, 1.9)

Solution Hint:  f (1.05, 1.95) ≈ 6 + 4

5





0.05 −0.05

= 6 + 0.2 − 0.25 = 5.95 Let’s try that here: f (1.05, 1.95) ≈ 5.95





Solution Hint: 





1 −2

Df (1.05, 1.95) ≈ Df (1, 2) + 0.05 −0.05    = 4 5 + 0.05(1) + −0.05(−2)     = 4 5 + 0.15 −0.25   = 4.15 4.75

−2 3



 0.05(−2) + −0.05(3)

Using the second derivative, Df (1.05, 1.95) is approximately: Solution Hint:  f (1.1, 1.9) ≈ 5.95 + 4.15

 4.75



 0.05 −0.05

= 5.95 + 4.15(0.05) + 4.75(−0.05) = 5.92 Using the linear approximation to f at (1.05, 1.95), f (1.1, 1.9) ≈ 5.92

So this method allowed us to get  a slightly better approximation of f (1.1, 2.2) 1 using the fact that the Dfp ( ) is increasing as p moves from (1, 2) in the −1   1 direction . We really got a slightly higher estimate from f (1.9, 2.1) using −1 this two step compared to using the linear approximation because  approximation    1 1 2 D f (1, 2) , = 8 is positive. −1 −1 We do not have to limit ourselves to only using a two step approximation: we could get better and better approximations of f (1.1, 1.9) by using more and more partitions of the line segment from (1, 2) to (1.1, 1.9). For example, we could use ten partitions: •

Use the linear approximation to f at (1, 2) to approximate f (1.01, 1.99)



Use the second derivative to approximate Df (1.01, 1.99)



Use the linear approximation to f at (1.01, 1.99) to approximate f (1.02, 1.98)



Use the second derivative to approximate Df (1.02, 1.98)



Use the linear approximation to f at (1.02, 1.98) to approximate f (1.03, 1.97)



.. .



Use the linear approximation to f at (1.09, 1.91) to approximate f (1.1, 1.9)

This kind of process, where we are summing more and more of smaller and smaller values to approximate something, furiously demands to be phrased as an integral. 194


Solution Hint: 

Notice that 1  h 1   i 1 0.1 0.1 2  n , n  = 0.1 1 , −0.1 1 D f 1 1 n n −2 −0.1 −0.1 n n

 1   −2  0.1 n  1 3 −0.1 n

= (0.1(0.1)(1) + 0.1(−0.1)(−2) + (0.1)(−0.1)(−2) + (−0.1)(−0.1)(3)) = 0.08

Hint: sum

1 n2

1 n2

By partitioning [0, 1] into n little pieces of equal width, the contribution to the



   1 1 1 over [0, ] is Df (1, 2) 0.1 − 0.1 = 4 n n n



1 2 over [ , ] is n n

1  n  = −0.1 1 5  1 n −0.1 n 



0.1

   1  1  1   1  0.1 0.1 0.1 1 1  0.1 n  2 n  + D f ()  n , n  Df (1 + 0.1 , 2 + (−0.1) ) ≈ Df (1, 2)  1 1 1 1 n n (−0.1) (−0.1) (−0.1) (−0.1) n n n n 1 1 = −0.1 + 0.08 2 n n •

2 3 over [ , ] is n n

   1  1  1 0.1 1 1  0.1 n  1 1  0.1 n  2  n Df (1 + 2(0.1 ), 2 + 2((−0.1) )) ≈ Df (1 + 0.1 , 2 + (−0.1) ) + D f 1 1 1 n n n n (−0.1) (−0.1) (−0.1) n n n 1 1 1 ≈ −0.1 + 0.08 2 + 0.08 2 n n n 21 1 = 1.4 + 0.08 n nn •

.. .



over [

k+1 k+2 , ] is n n

  1  1  1 1  0.1 n  1 1  0.1 n  Df (1 + (k + 1)(0.1 ), 2 + (k + 1)((−0.1) )) ≈ Df (1 + (k)0.1 , 2 + (k)(−0.1) ) + 1 1 n n n n (−0.1) (−0.1) n n 1 1 1 ≈ −0.1 + (k − 1)0.08 2 + 0.08 2 n n n k1 1 = 1.4 + 0.08 n nn

Hint:

So f (1.1, 2.2) ≈ 6 +

n X k=0

−0.1

1 k1 + 0.08 n nn


Hint:

By definition of the integral we have lim

n→∞

n X k=0

−0.1

k1 1 + 0.08 = n nn

1

Z

(−0.1 + 0

0.08t)dt 1

Z In this case, we get that f (1.1, 1.9) ≈ 6 +

f (t)dt, where f (t) =−0.1 + 0.08t 0

Solution Hint: Z

1

f (1.1, 1.9) ≈ 6 +

(−0.1 + 0.08t)dt 0  1 1 = 6 + −0.1t + (0.08)t2 0 2 = 6 − 0.1 + 0.04 = 5.94

Evaluating this integral we have f (1.1, 1.9) ≈ 5.94. This is the best approximation we can really expect to get given only this information about f at (1, 2).

  0.1 Notice that this approximation of f is really just f (1, 2) + Df (1, 2) + −0.1     1 2 0.1 0.1 D f (1, 2) , . −0.1 −0.1 2 The first two terms are just the regular linear approximation    to fat (1, 2), but 0.1 0.1 2 the next term arose from integrating the function D f ( , )t from t = 0 −0.1 −0.1     1 0.1 0.1 to t = 1. This is exactly D2 f ( , ). −0.1 −0.1 2 Generalizing, we might expect in general that   1 Theorem 2 f (p + ~h) ≈ f (p) + Df (p)(~h) + D2 f (p) ~h, ~h 2 This is the second order taylor approximation of f at p. Notice how similar it looks to the second order taylor approximation of a single variable function! If we had not taken the time to develop an understanding of D2 f (p) as bilinear map, it would be quite messy to even state this theorem, and it would only get worse for the higher order Taylor’s theorem we will be learning about next week. Hopefully this (admittedly long) discussion has helped you to understand where this approximation comes from! We will give a rigorous statement and proof of the theorem in the next section.
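The multi-step scheme described above is easy to carry out in Python. The sketch below (with hypothetical helper names) walks from (1, 2) toward (1.1, 1.9) in n equal steps, updating the derivative with the (assumed constant) second derivative along the way; as n grows, the estimate approaches 5.94.

Python

def taylor_walk(n):
    # start with the data given at p = (1, 2)
    f_val = 6.0
    Df = [4.0, 5.0]                      # the derivative, as a row vector
    H = [[1.0, -2.0], [-2.0, 3.0]]       # the second derivative, assumed constant
    step = [0.1 / n, -0.1 / n]           # one n-th of the displacement (0.1, -0.1)
    for _ in range(n):
        # first-order update of f, then update Df using the second derivative
        f_val += Df[0]*step[0] + Df[1]*step[1]
        Df = [Df[0] + H[0][0]*step[0] + H[1][0]*step[1],
              Df[1] + H[0][1]*step[0] + H[1][1]*step[1]]
    return f_val

print(taylor_walk(1))      # 5.9   -- the plain linear approximation
print(taylor_walk(2))      # 5.92  -- the two-step estimate from above
print(taylor_walk(1000))   # about 5.93996, approaching 5.94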


84

Rigorously

The second derivative enables quadratic approximation You should know the statement of the following theorem for this course: Theorem 1 (Second order Taylor’s theorem) differentiable function, p ∈ Rn then we have

If f : Rn → R is a twice

1 ˜ ~h, ~h) f (p + ~h) = f (p) + Df (p)~h + D2 f (b + ξ h)( 2 for some ξ ∈ [0, 1]. It follows (after a lot of work!) that 1 f (p + ~h) = f (p) + Df (p)~h + D2 f (p)(~h, ~h) + Error(~h) 2 |Error(~h)| with lim =0 ~ h→~ 0 |~h|2 This approximation is also sometimes phrased as f (x) ≈ f (p) + Df (p)(x − p) + D2 f (p)(x − p, x − p) In the future, we will prove the above theorem. To prepare yourself, you should at a minimum make sure you have already worked through the other two optional sections on the formal definition of a limit1 and also the proof of the chain rule2 where operator norms are introduced. Proof



1 http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week2/limits/ formal-limit/ 2 http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week2/chain-rule/ proof/


85

Taylor’s theorem examples

Let’s see how Taylor’s theorem gives us better approximations Warning 1 quite long. Question 2

Just due to the sheer number of calculations, these questions are Consider f : R2 → R defined by f (x, y) = x cos(y) + xy.

Solution Hint:

Question 3

Solution

Hint:

∂ x cos(y) + xy = cos(y) + y ∂x

So fx (0, 0) = 1 fx (0, 0) =1 Solution ∂ x cos(y) + xy = −x sin(y) + x ∂y

Hint:

So fy (0, 0) = 0 fy (0, 0) =0

Hint:   x f (x, y) ≈ f (0, 0) + Df (0, 0)( ) y  = 0 cos(0) + 0(0) + fx (0, 0)     x = 1 0 y =x The linear approximation to f at (0, 0) is f (x, y) ≈ x Solution Hint:

Question 4

Hint:

fxx = 0

Solution

fxx (0, 0) =0 Solution Hint:

fxy = − sin(y) + 1

fxy (0, 0) =1


   x fy (0, 0) y


Solution Hint:

fyx = − sin(y) + 1

fyx (0, 0) =1 Solution Hint:

fyy = 0

fyy (0, 0) =0



0 1

1 0



Hint:

So H(0, 0) =

Hint:

      1 x x x By Taylors theorem, f (x, y) ≈ f (0, 0) + Df (0, 0)( ) + D2 f ( , ) y y y 2

Hint:

So 1 x f (x, y) ≈ 0 + x + 2 1 = x + (2xy) 2 = x + xy

  0 y 1

1 0

  x y

The second order approximation to f at (0, 0) is f (x, y) ≈ x + xy

It is kind of cool that we could also read this off from the following magic: y4 y2 + − ...) + xy 2! 4! xy 2 xy 4 = x + xy − + − ... 2! 4!

x cos(y) + xy = x(1 −

So it looks like the second order approximation is x + xy Solution

Using the first order approximation f (0.1, 0.2) ≈ 0.1

Solution

Using the second order approximation f (0.1, 0.2) ≈ 0.12

A calculator tells me f (0.1, 0.2) ≈ 0.11800665778. So clearly, the second order approximation is better. Notice that the second order approximation is slightly high, and this is apparent from our magical calculation, since the next term should 0.1(0.2)2 be − = −0.002, which gets us even closer to the exact answer. We will 2 make the magic more precise when we deal with the full multivariable taylors theorem later.
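Here is a quick Python check of these approximations against the exact value (a sketch, not part of the course exercises).

Python

import math

def f(x, y):
    return x * math.cos(y) + x * y

print(f(0.1, 0.2))        # 0.11800665... -- the exact value
print(0.1)                # the first order approximation, f(x, y) ≈ x
print(0.1 + 0.1 * 0.2)    # 0.12 -- the second order approximation, f(x, y) ≈ x + xy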

Question 5

Consider f : R3 → R defined by f (x, y, z) = xez+y + z 2 199


Solution Hint:

     x x x 1 2 f (x, y, z) ≈ f (0, 0, 1) + Df (0, 0, 1)( y ) + D f  y  ,  y  2 z−1 z−1 z−1

Hint:

Question 6



Solution

Hint:  Df (0, 0, 1) =

∂f ∂x

 = ez+y  = e 0

∂f ∂y

 ∂f ∂z (0,0,1)

xez+y  2

 xez+y + 2z (0,0,1)

The matrix of Df (0, 0, 1) is Solution Hint: ∂2f  ∂x∂x  2  ∂ f H(0, 0, 1) =   ∂y∂x  ∂2f ∂z∂x  0 = ez+y ez+y  0 e = e 0 e 0 

∂2f ∂x∂y ∂2f ∂y∂y ∂2f ∂z∂y ez+y xez+y xez+y  e 0 2

 ∂2f ∂x∂z   2 ∂ f   ∂y∂z  2 ∂ f  ∂z∂z (0,0,1)  ez+y xez+y  xez+y + 2 (0,0,1)

The hessian matrix of f at (0, 0, 1) is

Hint: 

 f (x, y, z) ≈ 1 + e

0

 x 1 2  y + x 2 z−1 

y

  0 z − 1 e e

e 0 0

  e x 0  y  2 z−1

= 1 + ex + 2(z − 1) + exy + ex(z − 1) + (z − 1)2 The second order taylor expansion of f about the point (0, 0, 1) is f (x, y, z) ≈1 + ex + 2(z − 1) + exy + ex(z − 1) + (z − 1)2

Question 7 • 200

Let f : R4 → R be a function with

f (0, 0, 0, 0) = 2






 Df (0, 0, 0, 0) = 1 −1 0  0 0 0  0 2 0 D2 f (0, 0, 0, 0) =  0 0 0 0 3 0


 0  0 3  0 0

Solution

Hint:

      x x x y  1 2 y  y           f (x, y, z, t) ≈ f (0, 0, 0, 0) + Df (0, 0, 0, 0)   + D f   ,   z z  z 2 t t t

Hint:

 f (x, y, z, t) ≈ 2 + 1

−1

0

  x  y  1   0  z  + 2 x t

y

z

 0  0 t  0 0

0 2 0 3

0 0 0 0

  0 x y  3   0  z  0 t = 2 + x − y + y 2 + 3yt

The second order approximation to f at (0, 0, 0, 0) is f (x, y, z, t) ≈ 2 + x − y + y 2 + 3yt


86

Optimization

Optimization means finding a biggest (or smallest) value. Suppose A is a subset of Rn , meaning that each element of A is a vector in Rn . Maybe A contains all vectors in Rn , maybe not. Further suppose f : A → R is a function. Question 1 Is there a vector ~v ∈ A so that f (~v ) is at least as large as any other output of f ? Solution (a)

This is always the case.

(b)

This is not necessarily the case.

X

It really does depend on A and on f . For example, suppose f (~v ) = h~v , ~v i, meaning f sends ~v to the square of the length of ~v . Further suppose that A = {~v ∈ Rn : |v| ≤ 1}. Then for all ~v ∈ A, it is the case that f (~v ) ≤ 1. And yet, there is not a single vector ~v ∈ A so that f (~v ) is at least as large as all outputs of f . If you claim that you have found a vector ~v so that f (~v ) is as large as any output of f , then you should consider the input w ~=

1 + |~v | · ~v , 2

and note that f (w) ~ > f (~v ).

Question 2

Let’s consider an example. Let g : R2 → R be the function given by   x g = 10 − (x + 1)2 − (y − 2)2 . y

Solution Hint:

No matter what x is, (x + 1)2 ≥ 0.

Hint:

No matter what y is, (y − 2)2 ≥ 0.

Hint:

No matter what x and y are, (x + 1)2 + (y − 2)2 ≥ 0.

Hint:

No matter what x and y are, −(x + 1)2 − (y − 2)2 ≤ 0.

Hint:

No matter what x and y are, 10 − (x + 1)2 − (y − 2)2 ≤ 10.

Hint:

If (x, y) = (−1, 2), then 10 − (x + 1)2 − (y − 2)2 = 10.



Hint:


Consequently, the largest possible output of g is 10.

The largest possible output of g is 10. Solution

This largest possible output occurs when x is −1.

Solution

This largest possible output occurs when y is 2.

In this case, we were able to think through the situation by considering some algebra—namely the fact that when we square a real number, the result is nonnegative. Here is the key idea that motivates everything we are about to do: using the second derivative, we can approximate complicated functions by “quadratic” functions, and quadratic functions we can analyze just as we analyzed this example.


87

Definitions

“Local” means “after restricting to a small neighborhood.” Definition 1 Let X ⊂ Rn and f : X → R. To say that the maximum value of f occurs at the point p ∈ X is to say that, for all q ∈ X, we have f (p) ≥ f (q). Conversely, to say that the minimum value of f occurs at the point p ∈ X is to say that, for all q ∈ X, we have f (p) ≤ f (q). Sometimes people use the term “extremum value” to speak of both maximum values and minimum values. Sometimes people say “maxima” instead of maximums and “minima” instead of minimums. A function need not achieve a maximum or a minimum value. Our goal will be to use calculus to search for maximums and minimums, but that raises a problem. The derivative at a point is only describing what is happening around that point, so if we use calculus to search for extreme values, then we will only see “local” extremes. Definition 2 Let X ⊂ Rn and f : X → R. To say that a local maximum of f occurs at the point p ∈ X is to say that there is an  > 0 so that for all q ∈ X within  of p, we have f (p) ≥ f (q). Conversely, to say that a local minimum of f occurs at the point p ∈ X is to say that there is an  > 0 so that for all q ∈ X within  of p, we have f (p) ≤ f (q). Here’s an example of how this works out in practice. Let g : R2 → R be the function given by g(x, y) = x2 + y 2 + y 3 Question 3

Does this function g achieve a minimum value?

Solution (a)

No.

(b)

Yes.

X

That’s correct: there is no “global” minimum. No matter how negative you want the output to g to be, you can achieve it by looking at g(0, y) where y is a very negative number. On the other hand, if we restrict our attention near the point (0, 0), then g is nonnegative there. Solution

Whenever (x, y) is within of (0, 0), then g(x, y) ≥ g(0, 0) = 0.

As a result, g achieves a local minimum at (0, 0), in spite of the fact that there is no global maximum.

204

88

Critical points and extrema

Extremes happen where the derivative vanishes Definition 1 Let f : Rn → R be a function. A point p ∈ U is called a critical point of f if f is not differentiable at p, or if Df (p) is the zero map. Question 2 Consider f : R2 → R defined by f (x, y) = ex has only one critical point

2

+y 2

. The function f

Solution Hint:

Question 3

Solution

Hint: 

∂f ∂f Df (x, y) = ∂x ∂y h 2 2 = 2xex +y



2yex

2

+y 2

i

What is Df (x, y)?

Hint:

h 2 2 So we need 2xex +y

Hint:

This only occurs when x = 0 and y = 0

Hint:

Enter this as

2yex

2

+y 2

i

 = 0

0



  0 0

What is this critical point? Give you answer as a vertical vector.

Question 4

Consider f : R2 → R defined by f (x, y) = x3 + y 3 − 3xy.

Solution Hint:

Question 5

Solution

Hint: 

∂f ∂f Df (x, y) = ∂x ∂y  2 = 3x − 3y

  3y 2 − 3x

What is Df (x, y)?

Hint:

 So we need 3x2 − 3y

  3y 2 − 3x = 0

205

0



88

Critical points and extrema

Hint:

Hint:

(

3x2 − 3y = 0 3y 2 − 3x = 0

(

y = x2 x = y2

(

y = y4 x = y2

(

y(y − 1)(y 2 + y + 1) = 0 x = y2

The only two points that work are (0, 0) and (1, 1)

f has two critical points. One of them is (0, 0). What is the other?

You already had some practice with this concept in week 21 . Definition 6 A function f : Rn → R has a local maximum at the point p ∈ Rn if there is an  > 0 so that for all x ∈ Rn with |x − p| ≤ , f (x) ≤ f (p). Warning 7 The fact that the inequalities in this definition are not strict means that, for example, for the function f (x, y) = 1 every point is both a local maximum and a local minimum. Write a good definition for the local minimum of a function A function f : Rn → R has a local minimum at the point p ∈ Rn if there is an  > 0 so that for all x ∈ Rn with |x − p| ≤ , f (x) ≥ f (p). We call points which are either local maxima or local minima local extrema. Theorem 8 If f : Rn → R is a differentiable function, and p a local extremum. Then p is a critical point of f . Prove this theorem Let p be a local maximum. We want to show that Df (p)(~v ) = 0 for all ~v ∈ Rn . Recall that one formula for the derivative is Df (p)(~v ) = lim

t→0

f (p + t~v ) − f (p) t

f (p + t~v ) − f (p) Since f is differentiable, this limit must exist. As t → 0+ , we have ≤ t 0, since the numerator is less than or equal to zero by definition of a local maximum, and the denominator is greater than 0. So the limit must be less than or equal to 0 On the other hand, as t → 0− , the numerator is still less than 0, but the denominator is now negative, so the limit must be greater than or equal to 0. Therefore f (p + t~v ) − f (p) Df (p)(~v ) = lim =0 t→0 t 1 http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week2/practice/ stationary-points/

206

88

Critical points and extrema

Since we did this with an arbitrary vector ~v ∈ Rn , we see that Df (p) is the zero map. We leave the nearly identical case of a local minima to you. This theorem tells us that if we want to identify local extrema, a good place to start is by looking for all the critical points. It is worthwhile to note that just because a point is a critical point does not mean it is a local extrema: Example 9 Let f : R2 → R be defined by f (x, y) = x2 − y 2 . Then (0, 0) is a critical point of f (check this!), but (0, 0) is not a local extremum. In fact we can see that along the line y = 0, (0, 0) is a local maximum, while along the line x = 0 it is a local minimum. The graph of f looks like a saddle. Definition 10

A critical point which is not a local extremum is a saddle point.

Warning 11 A saddle point does not need to be a local minimum in some directions and a local maximum in others. For example, according to our definition 0 is a saddle point of f (x, y) = x3 In the next section we will learn how to determine when a critical point is a local maximum, minimum, or saddle by using the second derivative.

207

89

Second derivative test

Definiteness of the second derivative determines extreme behavior at critical points. In this section, we apply the second derivative to extreme value problems. Theorem 1 (Second derivative test) p ∈ Rn be a critical point of f . Then

Let f : Rn → R be a C 2 function. Let



If D2 f (p) is positive definite, p is a local minimum



If D2 f (p) is negative definite, p is a local maximum



If D2 f (p) is indefinite, then p is a saddle point



If D2 f (p) is only positive semidefinite or negative semidefinite, we get no information

Proof

By the second order taylor’s theorem, we have f (p + ~h) = f (p) + Df (p)(~h) + D2 f (p + ξ~h)(~h, ~h) for some ξ ∈ [0, 1]

Since p is a critical point, f (p + ~h) = f (p) + D2 f (p + ξ~h)(~h, ~h) for some ξ ∈ [0, 1] If D2 f (p) is positive definite, then because f is C 2 , D2 f (p + ξ~h) is also positive semidefinite for small enough ~h (this just uses continuity of the second derivative). Thus D2 f (p + ξ~h)(~h, ~h) > 0 for all small enough values of ~h, say |~h| ≤ . But this just says that f (p + ~h) > f (p), so p is a local minimum. If D2 f (p) is negative definite, a completely analogous proof works. If D2 f (p) is indefinite, then there are directions ~h1 where D2 f (p)(~h1 , ~h1 ) > 0, and ~h2 where D2 f (p)(~h2 , ~h2 ) < 0. By continuity again, for |h~i | <  , we have D2 f (p + ξ~h1 )(~h1 , ~h1 ) > 0 and D2 f (p + ξ~h2 )(~h2 , ~h2 ) < 0. So we have f (p + ξ1 h~1 ) > f (p) and f (p + ξ2 h~2 ) < f (p). Thus p is neither a local maximum nor a minimum, and so is a saddle point. The method of proof show why the semidefinite cases might break down. Without strict positivity or negativity, continuity does not guarantee that D2 f is still positive definite or negative definite for nearby points: you might slip into an indefinite case, for instance. 2 For a concrete counterexample in the semidefinite  case,  consider f (x, y) = x + 2 0 y 3 . (0, 0) is a critical point, but and the Hessian is positive semidefinite, 0 0 but (0, 0) is not a local minimum, since f (0, −) = −3 is always less  thanf (0, 0). 2 0 The trouble here is really that the Hessian at nearby points (x, y) is , which 0 6y is indefinite for y < 0.  We have already proven in the section on bilinear forms that the definiteness of a bilinear form is completely determined by the sign of the eigenvalues of the associated linear operator. 208

89

Second derivative test

For the following exercises, use whatever means necessary to compute the eigenvalues of the Hessian, use that information to determine the definiteness of the second derivative, and use this to draw extremum information about f . I recommend using a computer algebra system like Sage1 to compute the eigenvalues, but you can also use a free online app like this one2 . Question 2

Let f (x, y) = x3 + e3y − 3xey

Solution Hint:

 2 3x − 3ey

  Df (x, y) = 0 0    3e3y − 3xey = 0 0 ( 3x2 − 3ey = 0 3e3y − 3xey = 0 ( x2 = ey e3y = xey ( x2 = ey x = e2y ( x4 = x x = e2y

To satisfy the first solution, either x = 0 or x = 1. If x = 0, then x = e2y has no solutions, so we must have x = 1. Thus (1, 0) is the only critical point. What is the critical point of f ? Give your answer as a vertical vector. Solution  Hint:

H(x, y) =

Hint:

H(0, 0) =



6x −3ey

6 −3

−3ey 9ey −3 6





What is the Hessian matrix of f at (1, 0)? Solution Hint: By using a computer algebra system, we see that the eigenvalues of the Hessian are 3 and 9. Hint:

So D2 f (1, 0) is positive definite.

Hint:

Thus (1, 0) is a local minimum.

1 2

http://www.sagemath.org/ http://www.bluebit.gr/matrix-calculator/

209

89

Second derivative test

(a)

(1, 0) is a local maximum

(b)

(1, 0) is a local minimum

(c)

(1, 0) is a saddle point

(d)

The second derivative gives no information in this case

X

Observe that even though (1, 0) is the only local extremum, and this is a local minimum, (1, 0) is not a global minimum because f (1, 0) = −1 but f (−10, 0) = −1000 + 1 − 3(−10) = −969. Contemplate this carefully.

Question 3 Let f : R3 → R be defined by f (x, y, z) = ex+y+z −x−y−z+4z 2 +xy. f has a critical point at (0, 0, 0). Solution Hint:

Question 4   1 2 1 2 1 1  1 1 9

Hint:

Solution

The Hessian matrix of f at (0, 0, 0) is

Hint: According to computer algebra software, the eigenvalues of this matrix are approximately 9.31662, −1 and 2.68338, so the second derivative is indefinite

Hint:

Thus f has a saddle point at (0, 0, 0)

(a)

(0, 0, 0) is a local maximum

(b)

(0, 0, 0) is a local minimum

(c)

(0, 0, 0) is a saddle point

(d)

The second derivative gives no information in this case

Question 5

X

Let f (x, y) = cos(x + y 2 ) − x2 . (0, 0) is a critical point of f .

Solution 

−1 Hint: The hessian matrix of f at (0, 0) is 0 −1 and −2. Hint:

Thus D2 f (0, 0) is negative definite

Hint:

Thus f has a local maximum at (0, 0)

210

 0 which plainly has eigenvalues −2

89

Second derivative test

Hint: You can actually see that this is a global maximum, since the largest cos could ever be is 1, and the largest −x2 could ever be is 0, so 1 is the largest value that f ever attains, and this is attained at (0, 0). In fact this value is attained at infinitely many points on the line x = 0. (a)

(0, 0) is a local maximum

(b)

(0, 0) is a local minimum

X

(c)

(0, 0) is a saddle point

(d)

The second derivative gives no information in this case

211

90

Lagrange multipliers

Lagrange multipliers enable constrained optimization In the previous section we considered unconstrained optimization problems. Sometimes we want to find extreme values of a function f : Rn → R subject to some constraints. Example 1 Say we want to maximize f (x, y) = x2 y subject to the contraint that x2 +y 2 = 1. In words, we want to know among all points on the unit circle, which of them has the greatest product of the square of the first coordinate with the second coordinate. One way we can do this is to reduce it to a single variable calculus problem: we can parameterize the unit circle by γ(t) = (cos(t), sin(t)), and try to maximize f (γ(t)) : [0, 2π] → R. In other words, we are maximizing cos2 (t) sin(t) on the interval [0, 2π]. Definition 2 Let f : Rn → R be a function, and g : Rn → Rm be another function. A point p is called a local maximum of f with constraint g(x) = ~0 if there is an  > 0 such that if |x − p| <  and g(x) = 0, then f (x) < f (p) Give a definition of a constrained local minimum: Let f : Rn → R be a function, and g : Rn → Rm be another function. A point p is called a local minimum of f with constraint g(x) = ~0 if there is an  > 0 such that if |x − p| <  and g(x) = 0, then f (x) > f (p) The method we outlined above is a great way to find constrained local extrema: If you can parameterize g −1 ({0}), by some function M : U ⊂ Rk → Rn , then you can just try to find the unconstrained extrema of f ◦ M . The problem is that, although finding a parameterization of the circle was easy, more general sets might be harder to parameterize. Some of them might not be parameterizable by just one open set: you might have to use several patches. We will not pursue this method further. Instead we will develop an alternative method, the method of lagrange multipliers. Theorem 3 Let f : Rn → R and g : Rn → Rm . Assume g has the m component functions g1 , g2 , ..., gn : Rn → R. If p is a constrained extrema of f with the constraint g(x) = 0, and Rank(Dg(p)) = m, then there exist λ1 , λ2 , ...λm with Df (p) = λ1 Dg1 (p) + λ2 Dg2 (p) + ... + λm Dgm (p). The scalars λi are called Lagrange Multipliers. Proof The full proof of this theorem would require the implicit function theorem1 . We will make one intuitive assumption to get around this. A vector ~v should be tangent to the set g −1 ({~0}) at the point p if and only if Dg(p)(~v ) = 0. This is just because moving “infinitesmally” in the direction of a tangent vector to g −1 ({~0}) should not change the value of g to first order. Since p is a constrained maximum, moving in one of these tangent directions should not effect the value of f to first order either. We can summarize these intuitive statements as Null(Dg(p)) ⊂ Null(Df (p)). This is the assumption whose formal proof would require the implicit function theorem. 1

http://en.wikipedia.org/wiki/Implicit_function_theorem

212

90

Lagrange multipliers

Given this assumption, the result follows essentially formally from our work with linear algebra: Null(Dg(p)) ⊂ Null(Df (p)) Null(Df (p))⊥ ⊂ Null(Dg(p))⊥ Image(Df (p)> ) ⊂ Image(Dg(p)> )

This last line is exactly what we are trying to prove!



Now let’s get some practice actually using this theorem as a practical tool. First some questions with only one constraint equation: g : Rn → R Question 4

Solution

Hint: At a point constrained maximum point (x, y) we would need Df (x, y) = λDg(x, y) for some λ ∈ R. Hint: Solution

Hint:

Question 5

Solution

What is Df (x, y)?

What is Dg(x, y)?

 So we must have 2xy

  x2 = λ 2x

2y



Hint: (

2xy = 2λx x2 = 2λy

( y = λ x 6= 0 for if it were then y = 0 by the second equation x2 = 2λy ( y=λ x2 = 2λ2 ±1 But x2 + y 2 = 1, so we have 3λ2 = 1, or λ = √ 3 r Hint:

This results in only 4 possible extrema at (

2 1 ± ,±√ ) 3 3

s

2 The values of f at these points are ± . So the maximum value of 3 ∗ sqrt(3) s s 2 2 f on the unit circle is and the minimum value is − . 3 ∗ sqrt(3) 3 ∗ sqrt(3) Hint:

213

90

Lagrange multipliers

The maximum value of f (x, y) = x2 y subject to the constraint x2 + y 2 = 1 is 2/3 ∗ sqrt(3)

Question 6 Let f : R3 → R be defined by f (x, y, z) = x2 + y 2 + z 2 subject to the constraint that g(x, y, z) = xyz − 1 = 0. Solution Hint: At a point constrained maximum point (x, y) we would need Df (x, y) = λDg(x, y) for some λ ∈ R. Hint:

Question 7

Solution

Solution

What is Df (x, y)?

What is Dg(x, y)?

Hint:

 So we must have 2x

Hint:

  2x 2y   2z

2y

  2z = λ λyz

λxz

λxy



= λyz = λxz = λxy

Hint: Multiplying all of these equations together, we have 8xyz = λ3 (xyz)2 . Since xyz = 1, we have λ3 = 8, so λ = 2

Hint:   x y   z

= yz = xz = xy

  x y   z

= xz 2 = yx2 = zy 2

Hint: So x = ±1, y = ±1, z = ±1. So the only possible location of constrained extrema are (1, 1, 1), (1, −1, −1), (−1, 1, −1), (−1, −1, 1). At each of these points f (x, y, z) = 1. These are all local minima. f has no local or global maxima. The minimum value of f subject to this constraint is 1

Here is a question with two constraint equations: g : Rn → R2  x2 + y 2 + z 2 − 1 Here g(x, y, z) = x+y+z−1 

Question 8 Hint:

214

Hint:

 We need 1

−1

  0 = λ1 2x

2y

  2z + λ2 1

1

1



90

Lagrange multipliers

  2λ1 x + λ2 = 1 Hint: 2λ1 y + λ2 = −1   2λ1 z + λ2 = 0 Adding these all together and using that (x + y + z) = 1, we have 2λ1 + 3λ2 = 0. So the last equation becomes −3λ2 z + λ2 = 0, or λ2 (1 − 3z) = 0. λ2 6= 0, for otherwise λ1 = 0, which leads to a contradiction. So We know that z =

Hint:

1 3

There are only two points satisfying both constraints and with z =

1 3

Hint: Solving the system of 2 equations  2 x + y 2 + ( 1 )2 = 1 3 x + y + 1 = 0 3

√ √ √ √ 1− 2 1+ 2 1 1+ 2 1− 2 1 we obtain that the only two points which work are ( , , ) and ( , , ). 3 3 3 3 3 3

Hint:

So the maximum value of x − y is

√ 2 2 3

The maximum value of f (x, y, z) = x − y subject to the two constraints that x2 + y 2 + z 2 = 1 and x + y + z = 1 is 2sqrt(2)/3

215

91

Hill climbing in Python

Hill climbing is a computational technique for finding a local maximum. Let’s try hill climbing to—at least numerically—attempt to find a local maximum. The idea is the following: the gradient points up hill so if I want to find a local maximum, I should start somewhere and follow the gradient up. Hopefully I’ll find a point where the gradient vanishes (i.e., a critical point). Question 1

Here’s the procedure that I’d like you to code in Python:

(a)

Start with some point p.

(b)

Replace p with p plus a small multiple of ∇f (p).

(c)

If ∇f (p) is very small, stop!

(d)

Otherwise, repeat.

Solution Hint:

You might want to use some code to add and scale vectors, like

def add_vector(v,w): return [sum(v) for v in zip(v,w)] def scale_vector(c,v): return [c*x for x in v] def vector_length(v): return sum([x**2 for x in v])**0.5 Hint:

You may also want some code to compute the gradient numerically.

epsilon = 0.01 def gradient(f, p): n = len(p) ei = lambda i: [0]*i + [epsilon] + [0]*(n-i-1) return [ (f(add_vector(p, ei(i))) - f(p)) / epsilon for i in range(n) ]

Hint:

To do the hill climbing, we can put together these pieces.

def climb_hill(f, starting_point): p = starting_point nabla = gradient(f,p) while vector_length(nabla) > epsilon: p = add_vector(p, scale_vector(epsilon,nabla)) nabla = gradient(f,p) return p

Hint: Incidentally, be careful with your choice of  in this problem; if it is too small, the Python code might take too long to run!

216

91

1 2 3 4 5 6 7 8 9 10

Hill climbing in Python

Python def climb_hill(f, starting_point): p = starting_point # while gradient of f is pretty big at p # p = p + multiple of gradient f # return p # # here’s an example to try p = [3,6,2] f = lambda x: 10 - (x[0] + x[1])**2 - (x[0] - 3)**2 - (x[2] - 4)**2 print(climb_hill(f, p))

11 12 13 14 15

def validator(): f = lambda x: 10 - (x[0] - 2)**4 - (x[1] - 3)**2 - (x[2] - 4)**2 p = climb_hill(f, [3,6,2]) return abs(p[0] - 2) < 0.5 and abs(p[1] - 3) < 0.5 and abs(p[2] - 4) < 0.5

So you can use your program to find the maximum value of the function f : R3 → R given by f (x, y, z) = 10 − (x + y)2 − (x − 3)2 − (z − 4)2 . Solution

In this case, x is 3.

Solution

And y is −3.

Solution

And z is 4.

Fantastic!

217

92

Multilinear forms

Multilinear forms are separately linear in multiple vector variables. Definition 1 Let X be a set. We introduce the notation X k to stand for the set of all ordered k-tuples of elements of X. In other words, X k = X × X × · · · × X, with k “factors” of X. For example, if X = {cat, dog}, then X 3 is a set with eight elements, consisting of 3-tuples of either cat or dog. For example, (cat, cat, dog) ∈ X 3 . Definition 2 A k-linear form on a vector space V is a function T : V k → R which is linear in each vector variable. In other words, given (k − 1) vectors v1 , v2 , . . . , vi−1 , vi+1 , . . . , vk , the map Ti : V → R defined by Ti (v) = T (v1 , v2 , . . . , vi−1 , v, vi+1 , . . . , vk ) is linear. The k-linear forms on V form a vector space. Question 3 Let T : R2 × R2 × R2 → R be a trilinear form on R2 . Suppose we know that       1 1 1 • T , , =1 0 0 0       1 1 0 • T , , =2 0 0 1       1 0 1 • T , , =3 0 1 0       1 0 0 • T , , =4 0 1 1       0 1 1 , =5 • T , 1 0 0       0 1 0 • T , , =6 1 0 1       0 0 1 • T , , =7 1 1 0       0 0 0 • T , , =8 1 1 1 Solution Hint:                   1 1 1 1 1 1 0 1 1 T , , =T , , +T , , 1 2 0 0 2 0 1 2 0                         1 1 1 1 0 1 0 1 1 0 0 1 =T , , + 2T , , +T , , + 2T , , 0 0 0 0 1 0 1 0 0 1 1 0 = 1 + 2(3) + 5 + 2(7) = 26

218

92

T

Multilinear forms

      1 1 1 =26 , , 0 2 1

From the last example—and by analogy with the bilinear case—it is clear that if you know the value of a k−linear form on all k-tuples of basis vectors of V (there are (dimV )k of such), then you can find the value of T on any k-tuple of vectors. Definition 4 Let T : V k1 → R and S : V k2 → R be multilinear forms. Then we define their tensor product by T ⊗ S : V k1 +k2 → R by multiplication: (T ⊗ S)(v1 , v2 , . . . , vk1 +k2 ) = T (v1 , v2 , . . . , vk1 )S(vk1 +1 , vk1 +2 , . . . , vk1 +k2 ). Theorem 5 The k-linear forms dxi1 ⊗ dxi2 ⊗ · · · ⊗ dxik where 1 < ij < n form a basis for the space of all multilinear k-forms on Rn . In fact, X T (ei1 , ei2 , . . . , eik )dxi1 ⊗ dxi2 ⊗ · · · ⊗ dxik , T = where the sum ranges of all nk k-tuples of basis vectors. The proof is as straightforward as the corresponding proof for bilinear forms, but the notation is something awful. Question 6 Hint:

Solution   1 dx1 = 1. 2 

Hint:

dx2

Hint:

dx2

 −2 = 4. 4

  5 ) = 6. 6

Hint:

So putting this all together, we have       1 −2 5 (dx1 ⊗ dx2 ⊗ dx2 ) , , = 1 · 4 · 6 = 24 2 4 6       1 −2 5 (dx1 ⊗ dx2 ⊗ dx2 ) , , = 24. 2 4 6

Question 7  Let  T = dx1 ⊗ dx1 ⊗ dx1 + 4dx2 ⊗ dx2 ⊗ dx1 be a trilinear form on x 2 R . Let ~v = y Solution Hint: T (~v , ~v , ~v ) = dx1 ⊗ dx1 ⊗ dx1 (

            x x x x x x , , ) + 4dx2 ⊗ dx2 ⊗ dx1 ( , , ) y y y y y y

= x · x · x + 4y · y · x = x3 + 4y 2 x

219

92

Multilinear forms

As a function of x and y, T (~v , ~v , ~v ) = x3 + 4 ∗ y 2 ∗ x

As this example shows, applying a trilinear form to the same vector three times gives a polynomial. Solution Hint:

The monomial x3 has degree three.

Hint:

The monomial 4x2 x also has degree three.

Hint:

So the total degree of each monomial is three.

The total degree of each monomial is 3.

What we are seeing is a special case of the following result. Theorem 8 Applying a k-linear form to the same vector k-times gives a homogeneous polynomial of degree k.

220

93

Symmetry

Various sorts of symmetry are possible for multilinear forms. Definition 1

A k-linear form F is a symmetric if F (~v1 , ~v2 , . . . , ~vk ) = F (~vi1 , ~vi2 , . . . , ~vik ),

whenever (i1 , i2 , . . . , ik ) is a rearrangement of (1, 2, . . . , k). Let B : R2 × R2 → R be the bilinear form

Question 2

B = dx1 ⊗ dx2 + dx2 ⊗ dx1 . Solution   1 = 1. 2

Hint:

dx1

Hint:

  1 dx2 = 2. 2

Hint:

dx1

  3 = 3. 4

Hint:

dx2

  3 = 4. 4

Hint:

(dx1 ⊗ dx2 )

    1 3 , = 1 · 4. 2 4

Hint:

(dx2 ⊗ dx1 )

    1 3 , = 2 · 3. 2 4

Hint:

B

B

    1 3 , = 1 · 4 + 2 · 3 = 4 + 6 = 10. 2 4

    1 3 , = 10. 2 4

Solution     1 3 , = 3 · 2 + 4 · 1 = 6 + 4 = 10. 2 4     3 1 B , = 10. 4 2

Hint:

B

Is the bilinear form B symmetric? 221

93

Symmetry

Solution (a)

Yes.

(b)

No.

X

Now let’s consider trilinear forms. Let T : R2 × R2 × R2 → R be the trilinear form T = dx1 ⊗ dx1 ⊗ dx2 + dx1 ⊗ dx2 ⊗ dx1 Is the triilinear form T symmetric? Solution (a)

Yes.

(b)

No.

X

For example, compare T



1

  0 , 0

  1 , 1

 0

T



0

  0 , 1

  0 , 1

 0 .

to the value of Can you cook up some examples of symmetric trilinear forms? Sure! Here is an example: T = dx1 ⊗ dx1 ⊗ dx2 + dx1 ⊗ dx2 ⊗ dx1 + dx2 ⊗ dx1 ⊗ dx1 .

222

94

Higher order derivatives

Higher derivatives of a function are multilinear maps. The (k + 1)st order derivative of a function f : Rn → R at a point p is a (k + 1)linear form Dk+1 f (p), which allows us to approximate changes in the k th order derivative. This approximation works as follows. Dk (p+vk+1 ~ )(v~1 , ~v2 , . . . , ~vk ) = Dk (p)(v~1 , ~v2 , . . . , ~vk )+Dk+1 (p)(v~1 , ~v2 , . . . , ~vk , ~vk+1 )+ Error(p)(v~1 , ~v2 , . . . , ~vk , ~vk+1 ) = Error(p)(v~1 , ~v2 , . . . , ~vk , ~vk+1 ) where lim |v~1 ||v~2 | · · · |vk+1 ~ | ~ v1 ,~ v2 ,...,~ vk+1 →~ 0 0 X ∂kf Theorem 2 Dk f (p) = dxi1 ⊗ dxi2 ⊗ · · · ⊗ dxik , where the ∂xi1 ∂xi2 · · · ∂xik sum ranges over all k-tuples of basis covectors.

Definition 1

Question 3

f : R2 → R is defined by f (x, y) = x2 y.

Solution Hint: The only terms which are not zero are the terms involving 2 partial derivatives with respect to x and 1 partial derivative with respect to y. ∂3f ∂3f ∂3f dx ⊗ dx ⊗ dy + dx ⊗ dy ⊗ dx + dy ⊗ dx ⊗ dx ∂x∂x∂y ∂x∂y∂x ∂y∂x∂y

Hint:

So D3 f =

Hint:

So D3 f (0, 0, 0) = 2dx ⊗ dx ⊗ dy + 2dx ⊗ dy ⊗ dx + 2dy ⊗ dx ⊗ dx

Hint:

So D3 f (0, 0, 0)(

      1 3 0 , , ) = 2(2 · 3 · 1) + 2(1 · 4 · 0) + 2(2 · 3 · 0) = 12 2 4 1

      1 3 0 D3 f (0, 0)( , , ) =12 2 4 1 2 Assume  D f (p) = 3dx1 ⊗ dx2 + 3dx2 ⊗ dx1 . In other words, the 0 3 matrix of Df (p) is . Assume D3 f (p) = dx ⊗ dx ⊗ dx. 3 0

Question 4

Solution Hint:

Bythe definition of higher order derivatives,  we  have 0.3 0.3 D2 f (p + 0.2)(~v1 , v~2 ) ≈ D2 f (p)(v~1 , v~2 ) + D3 f (0.2 , v~1 , v~2 ) 0.3 0.2

Hint:

So     0.3 0.3 D2 f (p + 0.2)(~v1 , v~2 ) ≈ 3dx1 ⊗ dx2 (~v1 , ~v2 ) + 3dx2 ⊗ dx1 (~v1 , v~2 ) + dx ⊗ dx ⊗ dx(0.2 , v~1 , v~2 ) 0.3 0.3 = 3dx1 ⊗ dx2 (~v1 , ~v2 ) + 3dx2 ⊗ dx1 (~v1 , v~2 ) + 0.3dx ⊗ dx(v~1 , v~2 )

223

94

Higher order derivatives 

Hint:

The matrix of this bilinear form is

0.3 3

 3 0

 0.3 The matrix of the bilinear form D f (p + 0.2) is approximately 0.3 

2

224

95

Symmetry

In many nice situations, higher-order derivatives are symmetric. Recall that we once saw the following theorem. Theorem 1 Let f : Rn → R be a differentiable function. Assume that the partial derivatives fxi : Rn → R are all differentiable, and the second partial derivatives fxi ,xj are continuous. Then fxi ,xj = fxj ,xi . After interpreting the “second derivative” as a bilinear form, we were then able to say something nicer (though the hypothesis is stronger, so this is a weaker theorem). Theorem 2 Let f : Rn → R be a continuously twice differentiable function; then the bilinear form representing the second derivative is symmetric. And finally, we are in a position to formulate the higher-order version of this theorem. Theorem 3 Let f : Rn → R be a continuously k-times differentiable function; then the k-linear form representing the k-th order derivative is a symmetric form.

225

96

Taylor’s theorem

Higher order derivatives give rise to higher order polynomial approximations. Here is the statement of a statement of Taylor’s theorem for many variables. Theorem 1

Let f : Rn → R be a (k + 1)-times differentiable function. Then

1 1 1 1 f (p+~h) = f (p)+Df (p)(~h)+ D2 f (p)(~h, ~h)+ D3 f (p)(~h, ~h, ~h)+· · ·+ Dk f (p)(~hk )+ Dk+1 (p+ξ~h)(~hk+1 2! 3! k! (k + 1)! for some ξ ∈ [0, 1], where we have abbreviated the ordered tuple of i ~h0 s as ~hi Let’s apply this to a specific function. Question 2

Let f : R2 → R be defined by f (x, y) = ex+y .

Solution Hint:

The second order taylor approximation is 1 + (x + y) +

(x + y)2 2

Hint: Every partial derivative of this function is ex+y , so all of the third partial derivatives are 1

Hint:

So the third derivative is the sum of all of the following terms



dx ⊗ dx ⊗ dx



dx ⊗ dx ⊗ dy



dx ⊗ dy ⊗ dx



dx ⊗ dy ⊗ dy



dy ⊗ dx ⊗ dx



dy ⊗ dx ⊗ dy



dy ⊗ dy ⊗ dx



dy ⊗ dy ⊗ dy

      x x x Hint: Applying this tensor to ( , , ) we get xxx + xxy + xyx + xyy + yxx + y y y 3 yxy + yyx + yyy = (x + y)

Hint:

So the third order taylor expansion is 1 + (x + y) +

(x + y)3 (x + y)2 + 2 6

The third order taylor series of f about the point (0, 0) is 1+(x+y)+(x+y)2 /2+(x+y)3 /6

226

97

Python

There are numerical examples of higher-order Taylor series. In this exercise, given a function f, we compute a higher-order Taylor series for f numerically. Question 1

Suppose f is a Python function with two real inputs, perhaps

def f(x,y): return 2.71828182845904**(x+y) We have a couple of “numerical differentiation” functions epsilon = 0.001 def partial_x(f): return lambda x,y: (f(x+epsilon,y) - f(x,y))/epsilon def partial_y(f): return lambda x,y: (f(x,y+epsilon) - f(x,y))/epsilon We can build a linear approximation function. epsilon = 0.001 def linear_approximation(f): return lambda x,y: f(0,0) + x*partial_x(f)(0,0) + y*partial_y(f)(0,0) It is now your task to build a “quadratic approximation” function. Solution Hint:

We need only write down the second order Taylor series.

def quadratic_approximation(f): return lambda x, y: f(0,0) + x*partial_x(f)(0,0) + y*partial_y(f)(0,0) + x*x*partial_x(partial_x(f))(0

1 2 3 4 5 6 7 8 9 10 11

Python epsilon = 0.001 def partial_x(f): return lambda x,y: (f(x+epsilon,y) - f(x,y))/epsilon def partial_y(f): return lambda x,y: (f(x,y+epsilon) - f(x,y))/epsilon def quadratic_approximation(f): return lambda x,y: # the second order Taylor series approximation # # here’s an example to try f = lambda x,y: 2.71828182845904**(x+y) print(quadratic_approximation(f)(0.1,0.2)) # should be about exp(0.3), which is about 1.35

12 13 14 15 16 17 18

def validator(): f = lambda x,y: x*y if abs(quadratic_approximation(f)(2,3) - 6) > 0.1: return False f = lambda x,y: x*x if abs(quadratic_approximation(f)(2,3) - 4) > 0.1:

227

97

19 20 21 22 23 24 25 26 27 28 29

Python

return False f = lambda x,y: y*y if abs(quadratic_approximation(f)(2,3) - 9) > 0.1: return False f = lambda x,y: x if abs(quadratic_approximation(f)(2,3) - 2) > 0.1: return False f = lambda x,y: y if abs(quadratic_approximation(f)(2,3) - 3) > 0.1: return False return True

If you like this, you could build a version of this that produces a third order approximation.

228

98

Denouement

Farewell! That’s it! You have reached the end of this course. If you want a high level overview of essentially everything we did in this course, we recommend that you read this set of lecture notes1 . It has been a joy working with all of you and talking to you on the forums. I am grateful for the many people who submitted pull requests to fix errors in these notes. Keep an eye on this space: we will hopefully be improving this course by adding videos, questions, a peer review system for free response answers, and some interactive 3-D graphics! We also are planning a follow-up course on multivariable integral calculus using differential forms2 .

1 2

http://math.caltech.edu/~ma108a/notesderivatice.pdf http://en.wikipedia.org/wiki/Differential_form

229

View more...

Comments

Copyright ©2017 KUPDF Inc.
SUPPORT KUPDF