Linear Algebra - Jin Ho Kwak & Sungpyo Hong

January 6, 2018 | Author: Diyanko Bhowmik | Category: Matrix (Mathematics), System Of Linear Equations, Equations, Eigenvalues And Eigenvectors, Algebra
Share Embed Donate


Short Description

Book on Linear Algebra....

Description

JinHo Kwak Sungpyo Hong

Linear Algebra Second Edition

Springer Science+Business Medi~ LLC

JinHo Kwak Department of Matbematics Pohang University of Science and Technology Pohang, Kyungbuk 790-784 South Korea

Sungpyo Hong Department of Mathematics Pohang University of Science and Technology Pohang, Kyungbuk 790-784 SouthKorea

Library of Cougress Cataloging-in-PubHeation Data Kwak, lin Ho, 1948Linear algebra I lin Ho Kwak, Sungpyo Hong.-2nd ed. p.cm. Includes bibliographical references and index. ISBN 978-0-8176-4294-5 ISBN 978-0-8176-8194-4 (eBook) DOI 10.1007/978-0-8176-8194-4 1. Algebras, Linear. I. Hong, Sungpyo, 1948- ß. Title.

QAI84.2.K932004 512' .5-dc22

2004043751 CIP

AMS Subject Classifications: 15-01 ISBN 978-0-8176-4294-5

Printed on acid-free paper.

@2004 Springer Science+Business Media New York Originally published by Birkhlluser Boston in 2004 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrievaI, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to property rights.

987654321

SPIN 10979327

www.birkhasuer-science.com

Preface tothe Second Edition

This second edition is based on many valuable comments and suggestions from readers of the first edition. In this edition, the last two chapters are interchanged and also several new sections have been added. The following diagram illustrates the dependencies of the chapters.

Chapter 1 Linear Equations and Matrices

Chapter 2 Determinants

Chapter 4 Linear Transformations

Chapter 6 Diagonalization

Chapter 5 Inner Product Spaces

Chapter? Complex Vector Spaces

Chapter 8 Jordan Canonical Forms

vi

Preface to the Second Edition

The major changes from the first edition are the following. (1) In Chapter 2, Section 2.5.1 "Miscellaneous examples for determinants" is added as an application . (2) In Chapter 4, "A homogeneous coordinate system" is introduced for an application in computer graphics. (3) In Chapter 5, Section 5.7 "Relations of fundamental subspaces" and Section 5.8 "Orthogonal matrices and isometries" are interchanged. "Least squares solutions," "Polynomial approximations" and "Orthogonal projection matrices" are collected together in Section 5.9-Applications. (4) Chapter 6 is entitled "Diagonalization" instead of "Eigenvectors and Eigenvalues." In Chapters 6 and 8, "Recurrence relations," "Linear difference equations" and "Linear differential equations" are described in more detail as applications of diagonalizations and the Jordan canonical forms of matrices . (5) In Chapter 8, Section 8.5 "The minimal polynomial of a matrix" has been added to introduce more easily accessible computational methods for Anand e A , with complete solutions of linear difference equations and linear differential equations . (6) Chapter 8 "Jordan Canonical Forms" and Chapter 9 "Quadratic Forms" are interchanged for a smooth continuation of the diagonalization problem of matrices. Chapter 9 "Quadratic Forms" is extended to a complex case and includes many new figures. (7) The errors and typos found to date in the first edition have been corrected . (8) Problems are refined to supplement the worked-out illustrative examples and to enable the reader to check his or her understanding of new definitions or theorems. Additional problems are added in the last exercise section of each chapter. More answers, sometimes with brief hints, are added, including some corrections. (9) In most examples , we begin with a brief explanatory phrase to enhance the reader's understanding. This textbook can be used for a one- or two-semester course in linear algebra. A theory oriented one-semester course may cover Chapter 1, Sections 1.1-1.4, 1.6-1.7; Chapter 2 Sections 2.1-2.3; Chapter 3 Sections 3.1-3.6; Chapter 4 Sections 4.1-4.6; Chapter 5 Sections 5.1-5.4; Chapter 6 Sections 6.1-6.2; Chapter 7 Sections 7.1-7.4 with possible addition from Sections 1.8, 2.4 or 9.1-9.4. Selected applications are included in each chapter as appropriate. For a beginning applied algebra course, an instructor might include some ofthem in the syllabus at his or her discretion depending on which area is to be emphasized or considered more interesting to the students. In definitions , we use bold face for the word being defined, and sometimes an italic or shadowbox to emphasize a sentence or undefined or post-defined terminology.

Preface to the Second Edition

vii

Acknowledgement: The authors would like to express our sincere appreciation for the many opinions and suggestions from the readers of the first edition including many of our colleagues at POSTECH. The authors are also indebted to Ki Hang Kim and Fred Roush at Alabama State University and Christoph Dalitz at Hochschule Niederrhein for improving the manuscript and selecting the newly added subjects in this edition . Our thanks again go to Mrs . Kathleen Roush for grammatical corrections in the final manuscript, and also to the editing staff of Birkhauser for gladly accepting the second edition for publication.

JinHo Kwak Sungpyo Hong E-mail: [email protected] [email protected]

January 2004, Pohang, South Korea

Preface to the First Edition

Linear algebra is one of the most important subjects in the study of science and engineering because of its widespread applications in social or natural science, computer science, physics , or economics . As one of the most useful courses in undergraduate mathematics , it has provided essential tools for industrial scientists. The basic concepts of linear algebra are vector spaces, linear transformations, matrices and determinants, and they serve as an abstract language for stating ideas and solving problems . This book is based on lectures delivered over several years in a sophomore-level linear algebra course designed for science and engineering students. The primary purpose of this book is to give a careful presentation of the basic concepts of linear algebra as a coherent part of mathematics, and to illustrate its power and utility through applications to other disciplines . We have tried to emphasize computational skills along with mathematical abstractions , which have an integrity and beauty of their own. The book includes a variety of interesting applications with many examples not only to help students understand new concepts but also to practice wide applications ofthe subject to such areas as differential equations, statistics, geometry, and physics. Some of those applications may not be central to the mathematical development and may be omitted or selected in a syllabus at the discretion of the instructor. Most basic concepts and introductory motivations begin with examples in Euclidean space or solving a system of linear equations, and are gradually examined from different points of view to derive general principles . For students who have finished a year of calculus, linear algebra may be the first course in which the subject is developed in an abstract way, and we often find that many students struggle with the abstractions and miss the applications . Our experience is that, to understand the material, students should practice with many problems, which are sometimes omitted . To encourage repeated practice, we placed in the middle of the text not only many examples but also some carefully selected problems, with answers or helpful hints . We have tried to make this book as easily accessible and clear as possible , but certainly there may be some awkward expressions in several ways. Any criticism or comment from the readers will be appreciated .

x

Preface to the First Edition

We are very grateful to many colleagues in Korea, especially to the faculty members in the mathematics department at Pohang University of Science and Technology (POSTECH), who helped us over the years with various aspects of this book. For their valuable suggestions and comments, we would like to thank the students at POSTECH , who have used photocopied versions of the text over the past several years. We would also like to acknowledge the invaluable assistance we have received from the teaching assistants who have checked and added some answers or hints for the problems and exercises in this book. Our thanks also go to Mrs. Kathleen Roush who made this book much more readable with grammatical corrections in the final manuscript. Our thanks finally go to the editing staff of Birkhauser for gladly accepting our book for publication. Jin Ho Kwak Sungpyo Hong

April 1997, Pohang, South Korea

Contents

Preface to the Second Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

v

Preface to the First Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

ix

Linear Equations and Matrices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Gaussian elimination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Sums and scalar multiplications of matrices. . . . . . . . . . . . . . . . . . . . . 1.4 Products of matrices .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5 Block matrices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6 Inverse matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.7 Elementary matrices and finding A-I . . . . . . . . . . . . . . . . . . . . . . . . . . 1.8 LDU factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 1.9 Applications..... . .. . . . .. . . .. . .. .... . . . .. .. . . . . ....... . ... . 1.9.1 Cryptography.. . . .. . . . . . . . .. . .. . .. .. .. . .. . .. . . .. .. .. 1.9.2 Electrical network 1.9.3 Leontief model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.10 Exercises

1 4 11 15 19 21 23 29 34 34 36 37 40

Determinants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 2.1 Basic properties of the determinant. . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Existence and uniqueness of the determinant. . . . . . . . . . . . . . . . . . .. 2.3 Cofactor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 2.4 Cramer's rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 2.5 Applications .... . ....... ... . . . .. . ... ............. ... ... .... 2.5.1 Miscellaneous examples for determinants. . . . . . . . . . . . . . .. 2.5.2 Area and volume. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 2.6 Exercises ............. ..... . . .. . . .. . ...... .... .. .. . .. .. . . .

45 45 50 56 61 64 64 67 72

1

2

1

xii

Contents

3

Vector Spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3.1 The n-space jRn and vector spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Subspaces. . . . .. . . . ... . . .. . . . .. . . . . . ... . ... ... . . . . ... .... .. 3.3 Bases. . . .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Dimensions . ... . ... .. .. ... .. .. . .. .. . . . . .... . .. ... . . . ... . .. 3.5 Rowand column spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3.6 Rank and nullity 3.7 Bases for subspaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3.8 Invertibility.... .. ...... ... .. ... . .. . . ... . . . . . . . . . . . . . . . . . . . 3.9 Applications 3.9.1 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.9.2 The Wronskian 3.10 Exercises

75 75 79 82 88 91 96 100 106 108 108 110 112

4

Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Basic propertiesof linear transformations 4.2 Invertiblelinear transformations ..................... 4.3 Matrices of linear transformations. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 4.4 Vector spaces of linear transformations . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Change of bases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6 Similarity. . . . .. .. . . . . . . . . . .. . . . . . .. .... .. . . . ... .. . . . ... . .. 4.7. Applications 4.7.1 Dual spaces and adjoint " 4.7.2 Computer graphics 4.8 Exercises ... . . . . . .. . ... . . . .. . .. . . . . . . . . . . . .. . .. . . . . . . . . . ..

117 117 122 126 131 134 138 143 143 148 152

5

Inner Product Spaces 5.1 Dot products and inner products 5.2 The lengths and angles of vectors.. . . . .. . .. .. . . .. . . .. .. . . .. . .. 5.3 Matrix representations of inner products 5.4 Gram-Schmidt orthogonalization 5.5 Projections. .... . .. . . ... .. . . ... .... .... ................. . . . 5.6 Orthogonal projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.7 Relations of fundamental subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.8 Orthogonal matrices and isometries 5.9 Applications 5.9.1 Least squares solutions 5.9.2 Polynomial approximations 5.9.3 Orthogonalprojectionmatrices. . . . . . . . . . . . . . . . . . . . . . . . . 5.10 Exercises

157 157 160 163 164 168 170 175 177 181 181 186 190 196

Contents

xiii

6

Diagonalization 6.1 Eigenvalues and eigenvectors 6.2 Diagonalization of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Applications 6.3.1 Linear recurrence relations . . . . . . . . . .. 6.3.2 Linear difference equations 6.3.3 Linear differential equations I . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 Exponential matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 6.5 Applications continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 6.5.1 Linear differential equations II 6.6 Diagonalization of linear transformations . . . . . . . . . . . . . . . . . . . . . . . 6.7 Exercises . .. .. . .. .. .... . ... .... .... .. ....... . .. .. . ....... .

201 201 207 212 212 221 226 232 235 235 240 242

7

Complex Vector Spaces and complex vector spaces " 7.1 The n-space 7.2 Hermitian and unitary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 7.3 Unitarily diagonalizable matrices 7.4 Normal matrices 7.5 Application . ...... .. .... . ........ ..... .............. .. .. .. . 7.5.1 The spectral theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 7.6 Exercises.. ...... .. .... . .. ..... . ... .... .... . . ........ ... . .

247 247 254 258 262 265 265 269

8

Jordan Canonical Forms 8.1 Basic properties of Jordan canonical forms . . . . . . . . . . . . . . .. 8.2 Generalized eigenvectors " 8.3 The power A k and the exponential eA .• •. •• • • • •. ••• •• •••. •.•. •. 8.4 Cayley-Hamilton theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 8.5 The minimal polynomial of a matrix " 8.6 Applications. .... .. ... ...... .... ...... .. ..... .. .. .. .. ...... 8.6.1 The power matrix A k again 8.6.2 The exponential matrix eA again 8.6.3 Linear difference equations again . . . . . . . . . . . . . . . . . . . . . .. 8.6.4 Linear differential equations again. . . . . . . . . . . . . . . . . . . . . . 8.7 Exercises

273 273 281 289 294 299 302 302 306 309 310 315

9

Quadratic Forms 9.1 Basic properties of quadratic forms " 9.2 Diagonalization of quadratic forms 9.3 A classification of level surfaces 9.4 Characterizations of definite forms " 9.5 Congruence relation 9.6 Bilinear and Hermitian forms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.7 Diagonalization of bilinear or Hermitian forms 9.8 Applications 9.8.1 Extrema of real-valued functions on jRn • • .• • • • • • •• . • • • . •

319 319 324 327 332 335 339 342 348 348

en

Contents

xiv

9.8.2 Constrained quadratic optimization 353 9.9 Exercises.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

SelectedAnswersand mots

361

Bibliography

383

Index

385

Linear Algebra

1 Linear Equations and Matrices

1.1 Systems of linear equations One of the central motivations for linear algebra is solving a system oflinear equations. We begin with the problem of finding the solutions of a system of m linear equations in n unknowns of the following form :

I

a2lXI

+ +

amlXI

+

allXI

+ +

+ +

alnXn

a22x2

a2nXn

= =

b2

am2X2

+ . ., +

amnXn

=

b m,

a12 X2

bl

where Xl , X2, . . . , Xn are the unknowns and aij's and b.'s denote constant (real or complex) numbers. A sequence of numbers (Sl, S2, .. . ,sn) is called a solution of the system if Xl = sj , X2 = S2, , X n = Sn satisfy each equation in the system simultaneously. When bl = b2 = = b m = 0, we say that the system is homogeneous . The central topic of this chapter is to examine whether or not a given system has a solution , and to find the solution if it has one. For instance, every homogeneous system always has at least one solution XI = X2 = ... = Xn = 0, called the trivial solution . Naturally, one may ask whether such a homogeneous system has a nontrivial solution or not. If so, we would like to have a systematic method of finding all the solutions. A system of linear equations is said to be consistent if it has at least one solution, and inconsistent if it has no solution . For example, suppose that the system has only one linear equation

If ai = 0 for i = I, .. . , n, then the equation becomes 0 = b. Thus it has no solution if b i: 0 (nonhomogeneous), or has infinitely many solutions (any n numbers Xi'S can be a solution) if b 0 (homogeneous). In any case, if all the coefficients of an equation in a system are zero, the equation is vacuously trivial. In this book, when we speak of a system of linear equation, we

=

J H Kwak et al., Linear Algebra © Birkhauser Boston 2004

2

Chapter 1. Linear Equations and Matrices

always assume that not all the coefficients in each equation of the system are zero unless otherwise specified. Example 1.1 The system of one equation in two unknowns x and y is ax +by = c,

in which at least one of a and b is nonzero. Geometrically this equation represents a straight line in the xy-plane. Therefore, a point P = (x, y) (actually, the coordinates x and y) is a solution if and only if the point P lies on the line . Thus there are infinitely many solutions which are all the points on the line. Example 1.2 The system of two equations in two unknowns x and y is a\x { a2 X

+ +

b\y

=

c\

b2Y

=

C2·

Solution: (I) Geometric method. Since the equations represent two straight lines in the xy-plane, only three types are possible as shown in Figure 1.1. Case (1)

{2xx-y -2y

=

Case (2) -1 -2

y

{x- y x-y

= =

Case (3)

{x+ y x-y

-1 0

y

x

=

1 0

y

x

Figure 1.1. Three types of solution sets

Since a solution is a point lying on both lines simultaneously, by looking at the graphs in Figure 1.1, one can see that only the following three types of solution sets are possible: (1) the straight line itself if they coincide, (2) the empty set if the lines are parallel and distinct, (3) only one point if they cross at a point. (II) Algebraic method. Case (1) (two lines coincide): Let the two equations represent the same straight line, that is, one equation is a nonzero constant multiple of the other. This condition is equivalent to

1.1. Systemsof linear equations

3

In this case, if a point (s, t) satisfies one equation, then it automatically satisfies the other too. Thus, there are infinitely many solutions which are all the points on the line. Case (2) (two lines are parallel but distinct) : In this case, a2 = Aa" bi = Ab" but C2 : /= ACl for A : /= O. (Note that the first two equalities are equivalent to a,b2 - a2b, = 0). Then no point (s, t) can satisfy both equations simultaneously, so that there are no solutions . Case (3) (two lines cross at a point) : Let the two lines have distinct slopes, which means aib: - a2b, ::/= O. In this case, they cross at a point (the only solution), which can be found by the elementary method of elimination and substitution. The following computation shows how to do this:

a,

Without loss of generality, one may assume : /= 0 by interchanging the two equations if necessary. (If both a, and a2 are zero, the system reduces to a system of one variable.) (1) Elimination : The variable x can be eliminated from the second equation by adding - a2 times the first equation to the second, to get

a,

(2) Since a, b2 - a2b, : /= 0, y can be found by multiplying the second equation by a nonzero number to get a,b2 - a2b,

a,

c, a,c2 -a2c, a,b2 - a2b,

(3) Substitution: Now, x is solved by substituting the value of y into the first equation, and we obtain the solution to the problem: b2Cl - b,C2

aib: - a2b,

Note that the condition a, b2 - a2b, : /= 0 is necessary for the system to have only one solution. D In Example 1.2, the original system of equations has been transformed into a simpler one through certain operations, called elimination and substitution, which is

4

Chapter 1. Linear Equations and Matrices

just the solution of the given system . That is, if (x, y) satisfies the original system of equations, then it also satisfies the simpler system in (3), and vice-versa. As in Example 1.2, we will see later that any system of linear equations may have either no solution, exactly one solution, or infinitely many solutions. (See Theorem 1.6.) Note that an equation ax + by + cz = d, (a, b, c) i- (0,0,0) , in three unknowns represents a plane in the 3-space ]R3. The solution set includes

+ by = + cz = {(O, y, z) I by + cz =

{(x, y, 0) I ax

d}

in the xy-plane,

{(x, 0, z) I ax

d}

in the xz-plane,

d}

in the yz-plane.

One can also examine the various possible types of the solution set of a system of three equations in three unknown. Figure 1.2 illustrates three possible cases.

Infinitly many solutions

Only one solution

No solutions

Figure 1.2. Three planes in ]R3

Problem 1.1 For a system of three linear equations in three unknowns allx a21x

1

a31 x

+ + +

anY a22Y a32Y

+ + +

al3Z a23Z a33Z

=

= =

bl b2 b3,

describe all the possible types of the solution set in the 3-space ]R3 .

1.2 Gaussian elimination A basic idea for solving a system of linear equations is to transform the given system into a simpler one, keeping the solution set unchanged, and Example 1.2 shows an idea of how to do it. In fact, the basic operations used in Example 1.2 are essentially only the following three operations , called elementary operations:

1.2. Gaussian elimination

5

(1) multiply a nonzero constant throughout an equation, (2) interchange two equations, (3) add a constant multiple of an equation to another equation. It is not hard to see that none of these operations alters the solutions . That is, if satisfy the original equations , then they also satisfy those equations altered by the three operations, and vice-versa. Moreover, each ofthe three elementary operations has its inverse operation which is also an elementary operation:

Xi ' S

(I') multiply the equation with the reciprocal of the same nonzero constant , (2') interchange two equations again, (3') add the negative of the same constant multiple of the equation to the other.

Therefore, by applying a finite sequence of the elementary operations to the given original system, one obtains another new system, and by applying these inverse operations in reverse order to the new system, one can recover the original system. Since none of the three elementary operations alters the solutions, the two systems have the same set of solutions. In fact, a system may be solved by transforming it into a simpler system using the three elementary operations finitely many times. These arguments can be formalized in mathematical language. Observe that in performing any of these three elementary operations , only the coefficients of the variables are involved in the operations, while the variables Xl, X2 , • • •, X n and the equal sign U= " are simply repeated . Thus, keeping the places of the variables and U= " in mind, we just pick up the coefficients only from the given system of equations and make a rectangular array of numbers as follows:

This matrix is called the augmented matrix of the system. The tenn matrix means just any rectangular array of numbers, and the numbers in this array are called the entries of the matrix. In the following sections, we shall discuss matrices in general. For the moment, we restrict our attention to the augmented matrix of a system. Within an augmented matrix, the horizontal and vertical subarrays

[ail a i2 . .. ain

bj)

and

are called the i-th ro w (matrix), which represents the i-th equation , and the j-th column (matrix), which are the coefficients of j -th variable X i - of the augmented

6

Chapter 1. Linear Equations and Matrices

matrix, respectively. The matrix consisting of the first n columns of the augmented matrix

is called the coefficient matrix of the system. One can easily see that there is a one-to-one correspondence between the columns of the coefficient matrix and variables of the system. Note also that the last column [bl b2 . . . bmf of the augmented matrix represents homogeneity of the system and so no variable corresponds to it. Since each row of the augmented matrix contains all the information of the corresponding equation of the system, we may deal with this augmented matrix instead of handling the whole system of linear equations, and the elementary operations may be applied to an augmented matrix just like they are applied to a system of equations . But in this case, the elementary operations are rephrased as the elementary row operations for the augmented matrix: (1st kind) multiply a nonzero constant throughout a row, (2Dd kind) interchange two rows, (3rd kind) add a constant multiple of a row to another row.

The inverse row operations which are also elementary row operations are (1st kind) multiply the row by the reciprocal of the same constant, (2Dd kind) interchange two rows again, (3rd kind) add the negative of the same constant multiple of the row to the other.

Definition 1.1 Two augmented matrices (or systems of linear equations) are said to be row-equivalent if one can be transformed to the other by a finite sequence of elementary row operations . Note that, if a matrix B can be obtained from a matrix A by these elementary row operations, then one can obviously recover A from B by applying the inverse elementary row operations to B in reverse order. Therefore, the two systems have the same solutions: Theorem 1.1 If two systems of linear equations are row-equivalent, then they have the same set ofsolutions. The general procedure for finding the solutions will be illustrated in the following example :

1.2. Gaussian elimination

7

Example 1.3 Solve the system of linear equations :

I

2y 2y

+ +

x 3x

4Y

+ + +

4z = 2z = 6z =

2 3 -1.

Solution: One could work with the augmented matrix only. However, to compare the operations on the system of linear equations with those on the augmented matrix, we work on the system and the augmented matrix in parallel. Note that the associated augmented matrix for the system is

[~ ; ~

~].

3 4 6 -1

(1) Since the coefficient of x in the first equation is zero while that in the second equation is not zero, we interchange these two equations :

I I

x

+

3x

+

2y 2y 4Y

+ + +

2z = 4z = 6z =

[b;; ;].

3 2 -1

3 4 6 -1

(2) Add -3 times the first equation to the third equation:

+

x

2y 2y 2y

-

+ +

2z = 4z = =

3 2 -10

[

1

2

o

-2

o

2

;o

;].

-10

Thus , the first variable x is eliminated from the second and the third equations. In this process , the coefficient 1 of the first unknown x in the first equation (row) is called the first pivot. Consequently, the second and the third equations have only the two unknowns y and z. Leave the first equat ion (row) alone, and the same elimination procedure can be applied to the second and the third equations (rows): The pivot to eliminate y from the last equation is the coefficient 2 of y in the second equation (row). (3) Add 1 times the second equation (row) to the third equation (row):

I

x

+

2y 2y

+ +

2z = 4z = 4z =

3 2 -8

;] . [b;; o 0 4 -8

The elimination process (i.e., (1) : row interchange , (2): elimination of x from the last two equations (rows), and then (3): elimination of y from the last equation (row)) done so far to obtain this result is called forward elimination. After this forward elimination, the leftmost nonzero entries in the nonzero rows are called the pivots . Thus the pivots of the second and third rows are 2 and 4, respectively. (4) Normalize nonzero rows by dividing them by their pivots. Then the pivots are replaced by 1:

8

r

2y y

{X

+2;

Chapter 1. Linear Equations and Matrices

+

+ +

= = =

2z 2z z

3 I

-2

[ o~ i ~ iJ . 0 I -2

The resulting matrix on the right-hand side is called a row-echelon form of the augmented matrix, and the I's at the pivotal positions are called the leading I's. The process so far is called Gaussian elimination. The last equation gives z = -2. Substituting z = -2 into the second equation gives y = 5. Now, putting these two values into the first equation, we get x = -3. This process is called back substitution.The computation is shown below: i.e., eliminating numbers above the leading I's, (5) Add -2 times the third row to the second and the first rows:

z

= = =

7 5

-2

2 0

U

~l

I 0 I -2

o

(6) Add -2 times the second row to the first row: { x

y

z

= = =

-3 5

-2

0 0 I 0 -35 o I -2

U

J

.

This resulting matrix is called the reduced row-echelon form of the augmented matrix, which is row-equivalent to the original augmented matrix and gives the solution to the system . The whole process to obtain the reduced row-echelon form is 0 called Gauss-Jordan elimination. In summary, by applying a finite sequence of elementary row operations , the augmented matrix for a system of linear equations can be transformed into its reduced row-echelon form, which is row equivalent to the original one. Hence the two corresponding systems have the same solutions. From the reduced row-echelon form, one can easily decide whether the system has a solution or not, and find the solution of the given system if it is consistent. Definition 1.2 A row-echelon form of an augmented matrix is of the following form: (1) The zero rows, if they exist, come last in the order of rows. (2) The first nonzero entries in the nonzero rows are I, called leading I's. (3) Below each leading I is a column of zeros. Thus, in any two consecutive nonzero rows, the leading I in the lower row appears farther to the right than the leading I in the upper row. The reduced row-echelon form of an augmented matrix is of the form: (4) Above each leading I is a column of zeros, in addition to a row-echelon form .

1.2. Gaussian elimination

9

Example 1.4 The first three augmented matrices below are in reduced row-echelon form, and the last one is just in row-echelon form.

[ ~ ~ ~] , [~~ ~ i ~] , [~~ ~ ~], [~i ~ ~]. 000

000 0 0

000 1

001 3

Recall that in an augmented matrix [A b], the last column b does not correspond to any variable. Thus , if the reduced row-echelon form of an augmented matrix for a nonhomogeneous system has a row of the form [ 0 0 ... 0 b ] with b f:. 0, then the associated equation is OXj + OX2 + . .. + OXn = b with b f:. 0, which means the system is inconsistent. If b = 0, then it has a row containing only O's, which can be neglected and deleted . In this example, the third matrix shows the former case, and the first two matrices show the latter case. 0 In the following example, we use Gauss-Jordan elimination again to solve a system which has infinitely many solutions . Example 1.5 Solve the following system of linear equations by Gauss-Jordan elimination. Xj + 3X2 2X3 = 3 2xj + 6X2 2X3 + 4X4 = 18 X2 + X3 + 3X4 = 10.

I

Solution: The augmented matrix for the system is

[

13-2 0 3]

2 6 -2 4 18 o 1 1 3 10

.

The Gaussian elimination begins with: (1) Adding -2 times the first row to the second produces

[

13-2 0 3]

o o

0 1

2 4 12 1 3 10

.

(2) Note that the coefficient of X2 in the second equation is zero and that in the third equation is not. Thus, interchanging the second and the third rows produces

[

13-2 0 3]

011310 o 0 2 4 12

.

(3) Dividing the third row by the pivot 2 produces a row-echelon form

10

Chapter 1. Linear Equations and Matrices

We now continue the back-substitution : (4) Adding -1 times the third row to the second, and 2 times the third row to the first produces

1 3 0 4 15] 1 0 1 4 . [ 0 1 2 6

o o

(5) Finally, adding -3 times the second row to the first produces the reduced row-echelon form:

13]

1 0 0 010 14 [ 001 2 6

.

The corresponding system of equations is Xl

1

X2 x3

+ + +

X4 X4

2X4

= =

=

3 4

6.

This system can be rewritten as follows:

1

~~

X3

: =

~

6

~:

2X4 .

Since there is no other condition on X4, one can see that all the other variables x}, X2, and X3 may be uniquely determined if an arbitrary real value t e lRis assigned to X4 (R denotes the set of real numbers): thus the solutions can be written as (Xl, X2, X3, x4)=(3-t , 4 - t , 6-2t, t), teR

o

Note that if we look at the reduced row-echelon form in Example 1.5, the variables and X3 correspond to the columns containing leading 1's, while the column corresponding to X4 contains no leading 1. An augmented matrix of a system of linear equations may have more than one row-echelon form, but it has only one reduced row-echelon form (see Remark (2) on page 97 for a concrete proof) . Thus the number of leading 1's in a system does not depend on the Gaussian elimination.

Xl , x2,

Definition 1.3 Among the variables in a system, the ones corresponding to the columns containing leading l's are called the basic variables, and the ones corresponding to the columns without leading 1's, if there are any, are called the free variables. Clearly the sum of the number of basic variables and that of free variables is equal to the total number of unknowns: the number of columns . In Example 1.4, the first and the last augmented matrices have only basic variables but no free variables, while the second one has two basic variables Xl and X3, and two

1.3. Sums and scalar multiplications of matrices

11

free variables X2 and X4. The third one has two basic variables XI and X2, and only one free variable X3. In general, as we have seen in Example 1.5, a consistent system has infinitely many solutions if it has at least one free variable, and has a unique solution if it has no free variable. In fact, if a consistent system has a free variable (which always happens when the number of equations is less than that of unknowns), then by assigning arbitrary value to the free variable, one always obtains infinitely many solutions. Theorem 1.2 If a homogeneous system has more unknowns than equations, then it

has infinitely many solutions. Problem 1.2 Suppose that the augmented matrices for some systems of linear equations have been reduced to reduced row-echelon forms as below by elementary row operations. Solve the systems:

(1)

[

1 0 0 I

o

o o o

0

5] -2 , 4

[bo ~ ~ ~

(2)

0 I 3

-I ]

6

.

2

Problem 1.3 Solve the following systems of equations by Gaussian elimination. What are the pivots?

(1)1

(3)\

-x + y + 3x + 4y + 2x + 5y + w + x -3w 17x 4w 17x 5x

2z

z

3z

=

0 0 O.

+ y + y + 2z 5z + 8y 2y + z

(2)

14X

=

3

3x

2y lOy

3y

z + 3z

I 5 6.

I I l.

Problem 1.4 Determine the condition on bi so that the following system has a solution. (1)

1~

3x

+ 2y + 6z 3y

2z y + 4z

=

bl b2 b3·

(2) {

~ 4x

+ 3y

2z

+ 3z + 2y + Z Y

bl b2 b3·

1.3 Sums and scalar multiplications of matrices Rectangular arrays of real numbers arise in many real-world problems. Historically, it was an English mathematician J.1. Sylvester who first introduced the word "matrix " in the year 1848. It was the Latin word for womb, as a name for an array of numbers. Definition 1.4 An m by n (written mxn) matrix is a rectangular array of numbers arranged into m (horizontal) rows and n (vertical) columns . The size of a matrix is specified by the number m of the rows and the number n of the columns .

12

Chapter 1. Linear Equations and Matrices

In general, a matrix is written in the following form:

or just A = [aij] if the size of the matrix is clear from the context. The number aij is called the (i, j)-entry of the matrix A , and is written as aij = [A]ij. An m x 1 matrix is called a column (matrix) or sometimes a column vector, and a 1 xn matrix is called a row (matrix), or a row vector. In general, we use capital letters like A, B, C for matrices and small boldface letters like x, y, z for column or row vectors. Definition 1.5 Let A = [aij] be an mxn matrix. The transpose of A is the nxm matrix, denoted by AT, whose j -th column is taken from the j -th row of A: that is, [AT]ij = [Alji. Example 1.6 (1) If A =

[135]

,then AT =

2 4 6

[12] ;:

.

(2) The transpose of a column vector is a row vector and vice-versa:

X

-- [

X~:'n~

{:=:}

]

xT =

[XI X2 • ••

xnl.

o

Definition 1.6 A matrix A = [aij] is called a square matrix of order n if the number of rows and the number of columns are both equal to n. Definition 1.7 Let A be a square matrix of order n. (1) The entries au, a22, . . . , ann are called the diagonal entries of A. (2) A is called a diagonal matrix if all the entries except for the diagonal entries are zero. (3) A is called an upper (lower) triangular matrix if all the entries below (above, respectively) the diagonal are zero. The following matrices U and L are the general forms of the upper triangular and lower triangular matrices, respectively:

~~: a~2

ln

. .. :

a2n a

. a~n

]

,

L= [

a~1

a n2

Jl

1.3. Sums and scalar multiplications of matrices

13

Note that a matrix which is both upper and lower triangular must be a diagonal matrix, and the transpose of an upper (lower, respectively) triangular matrix is lower (upper, respectively) triangular. Definition 1.8 Two matrices A and B are said to be equal, written A = B , if their sizes are the same and their corresponding entries are equal : i.e., [A]ij = [B]ij for all i and j.

=

This definition allows us to write a matrix equation. A simple example is (ATl A by definition . Let Mmxn{lR) denote the set of all m x n matrices with entries of real numbers. Among the elements of Mmxn{lR), one can define two operations, called the scalar multiplication and the sum of matrices, as follows:

Definition 1.9 (I)(Scalarmultiplication) Foranm xn matrix A = [aij] E Mmxn{lR) and a scalar k E lR (which is simply a real number), the scalar multiplication of k and A is defined to be the matrix kA such that [kA]ij k[A]ij for all i and j: i.e., in an expanded form:

=

. ..

a ll

.

k [

.

.

.

amI

(2)(Sum ofmatrices) For two matrices A = [aij] and B = [bij] in Mmxn(R), the sum of A and B is defined to be the matrix A + B such that [A + B]ij [A]ij + [B]ij for all i and j : i.e., in an expanded form:

=

aln]

:.

amn

+

[b

11

:.

al

.. .

'. '

bml

n

7

bIn ].

amn +bmn

The resulting matrices kA and A + B from these two operations also belong to Mmxn(lR) . In this sense, we say Mm xn(lR) is closed under the two operations. Note that matrices of different sizes cannot be added ; for example, a sum

124]+[ad e fc] [3 b

cannot be defined. If B is any matrix , then - B is by definition the multiplication (-1) B. Moreover, if A and B are two matrices of the same size, then the subtraction A - B is by definition the sum A + (-1) B. A matrix whose entries are all zeros is called a zero matrix, denoted by the symbol 0 (or Omxn when the size is emphasized). Clearly, matrix sum has the same properties as the sum of real numbers. The real numbers in the context here are traditionally called scalars even though "numbers" is a perfectly good name and "scalar" sounds more technical. The following theorem lists the basic arithmetic properties of the sum and scalar multiplication of matrices.

14

Chapter 1. Linear Equations and Matrices

Theorem 1.3 Suppose that the sizes of A, Band C are the same. Then the following arithmetic rules ofmatrices are valid: (1) (2) (3) (4) (5) (6) (7)

(A + B) + C = A + (B + C) , (written as A A+ 0 0+ A A, A+(-A)=(-A)+A=O, A + B = B + A, k (A + B) = kA + k.B, (k + £)A = kA + lA, (kl)A = k (fA).

=

=

+

B

+

C)

(Associativity),

(Commutativity),

Proof: We prove only the equality (S) and the remaining ones are left for exercises. For any (i, j), ij

= k[A +

= k([A]ij +

B]ij

[B]ij)

= [kA] ij +

[kB]ij

= (A +

+

= [kA +kB]ij .

In particular, A + A = 2A , A + (A + A) = 3A nA = (n - l)A + A for any positive integer n,

A)

o

A, and inductively

Definition 1.10 A square matrix A is said to be symmetric if AT = A , or skewsymmetric if AT = - A . For example , the matrices A =

[~ ~

b c

b ] ,

B =

sc

[-~ ~; ] -2 -3 0

,

are symmetric and skew-symmetric, respectively. Notice that all the diagonal entries of a skew-symmetric matrix must be zero, since aii = -aii . By a direct computation, one can easily verify the following properties of the transpose of matrices: Theorem 1.4 Let A and B be m x n matrices. Then (kA)T = kAT and (A

+

B)T = AT

+

B T.

Problem 1.5 Prove the remaining parts of Theorem 1.3. Problem 1.6 Find a matrix B such that A A

=

+

[

Problem 1.7 Find a , b , c and d such that

BT

= (A -

2-3 0]

4 -1 3 -1 0 1

3]

B)T , where

.

a+9] b . [ ac db] = 2[a2 a+c + [2+b c+d

1.4. Products of matrices

15

1.4 Products of matrices The sum and the scalar multiplic ation of matrices were introduced in Section 1.3. In this section , we introduce the product of matrices. Unlike the sum of two matrices , the product of matrices is a little bit more complicated, in the sense that it can be defined for two matrice s of different sizes. The product of matrices will be defined in three steps: Step ( 1) Product of vectors: For a 1 x n row vector a = [al a2 . . . an] and an n x 1 column vector x = [XI X2 ... Xnf , the product ax is a 1 x 1 matrix (i.e., just a number) defined by the rule

ax

~

[al a2 ... a.] [

~~ ~ ]

[a,x, + a2X2 + .. .+ a.x.l

~

[t.aiXi] .

Note that the number of entries of the first row vector is equal to the number of entries of the second column vector, so that an entry-wise multiplication is possible. Step (2) Product of a matrix and a vector: For an m x n matrix

with the row vectors a i'S and for an n x 1 column vector x = [XI'" xn]T, the product Ax is by definition an m x 1 matrix , whose m rows are computed according to Step (1 ):

Ax =

[:~: :~~ ami a m2

n al oi« ] [ XI] X2

a~n



=

[alX] a2X [I:?=l I:?=I aloux.ix i ] a~x = I:?=I:amiXi .

Therefore, for a system of m linear equations in n unknowns, by writing the n unknowns as an n x 1 column matrix x and the coefficients as an m x n matrix A, the system may be expressed as a matrix equation Ax = b . Step (3) Product of matrices: Let A be an m x n matrix and B an n x r matrix with columns bj , b2, . . . , b.. written as B = [bl b2 .. . b.]. The product AB is defined to be an m x r matrix whose r columns are the products of A and the r columns of B, each computed according to Step (2) in corresponding order. That is,

AB = [

Abl

16

Chapter 1. Linear Equations and Matrices

which is an m x r matrix. Therefore , the (i , j)-entry [AB] ij of AB is: [A Blij

= a ib j = ailbl j +

ai2b 2j

+ . .. +

ainbnj

=

Lk=1 aikbkj .

This can be easily memorized as the sum of entry-wise multiplications of the boxed vectors in Figure 1.3.

AB

:

[ bll

~ amI

amn

am2

...

blj b2j

bl r b2r

bnj

bnr

J

Figure 1.3. The entry [AB]ij

Example 1.7 Consider the matrices

A=[~ ~l

B=

1 0 . [51 - 20]

The columns of AB are the product of A and each column of B :

[~ ~] [;]

[~ ~] [-i] =

[~ ~][~]

2 · 1 + 3 . 5 ] _ [ 17 ] [ 4 ·1+0 ·5 4 ' 2 ·2 + 3 . (-1) ] _ [ 1 ] [ 4 ·2+0·(-1) 8 ' 2 .0 + 3.0 ] _ [ 0 ] [ 4 ·0+0 ·0 0 .

Therefore , AB is

2 3] [1 2 0] = [17 1 0] [ 4 0 5 -1 0 4 8 0 . Since A is a 2 x2 matrix and B is a 2x3 matrix, the product AB is a 2x3 matrix. If we concentrate, for example, on the (2, I)-entry of AB, we single out the second row from A and the first column from B, and then we multiply corresponding entries together and add them up, i.e., 4 . 1 + 0 . 5 = 4. 0 Note that the product AB of A and B is not defined if the number of columns of A and the number of rows of B are not equal.

1.4. Products of matrices

17

Remark: In step (2), instead of defining a product of a matrix and a vector, one can define alternatively the product of a 1 x n row matrix a and an n x r matrix Busing the same rule defined in step (1) to have a 1 x r row matrix aB. Accordingly, in step (3) an appropriate modification produces the same definition of the product of two matrices . We suggest that readers complete the details. (See Example 1.10.) The identity matrix of order n, denoted by In (or I if the order is clear from the context) , is a diagonal matrix whose diagonal entries are all 1, i.e.,

By a direct computation, one can easily see that AIn = A

= InA for any n x n matrix

A.

The operations of scalar multiplication, sum and product of matrices satisfy many, but not all, of the same arithmetic rules that real or complex numbers have. The matrix Om xn plays the role of the number 0, and In plays that of the number 1 in the set of usual numbers. The rule that does not hold for matrices in general is commutativity AB = BA of the product, while commutativity of the matrix sum A + B = B + A always holds . The follow ing example illustrates noncommutativity of the product of matrices. Example 1.8 (Noncommutativity of the matrix product) Let A =

[~ _ ~]

l [_~ ~ l

and B =

AB =

[~ ~

Then,

BA =

[~ -~] 0

which shows AB =1= BA .

The following theorem lists the basic arithmetic rules that hold in the matrix product. Theorem 1.5 Let A, B, C be arbitrary matrices for which the matrix operations below can be defined, and let k be an arbitrary scalar. Then (1) A(BC) = (AB)C, (written as ABC) (2) A(B + C) = AB + AC, and (A + B)C = AC (3) I A = A = AI, (4) k(BC) = (kB)C = B(kC), (5) (AB)T B T AT.

=

+

BC,

(Associativity), (Distributivity),

18

Chapter 1. Linear Equations and Matrices

Proof: Each equality can be shown by direct computation of each entry on both sides of the equalities. We illustrate this by proving (I) only, and leave the others to the reader. Assume that A = [aij] is an mxn matrix, B = [bkl] is an nxp matrix, and C [cstl is a pxr matrix. We now compute the (i, i)-entry of each side of the equation. Note that BC is an n xr matrix whose (i , i)-entry is [BC]ij = L:f=l bnc:..l : Thus

=

n

[A(BC)]ij

n

= :~::>i/l [ B C]/lj = L /l=1

n

p

ai/l I)/l}.C}.j

/l=1

p

=L

Lai/lb/l}.C}.j. /l=1 }.=I

}.=I

Similarly, AB is an m x p matrix with the (i, i)-entry [A B]ij = L:~=I ai/lb/lj, and p

[(AB)C]ij

p

= L[AB]i}.C}.j = L }.=l

n

L

n

ai/lb/l}.c}.j

=L

L ai/lb/l}.c}.j' /l=1 }.=I

}.=I/l=1

This clearly shows that [A(BC)]ij = A(BC) (AB)C as desired.

=

p

[(AB)C]ij for all i,

i.

and consequently

0

Problem 1.8 Give an exampleof matrices A and B such that (AB)T f= AT BT . Problem 1.9 Proveor disprove: If A is not a zeromatrixand AB is it true or not that AB = 0 implies A = 0 or B = O?

= AC,thenB = C.Similarly,

Problem 1.10 Showthatany triangularmatrix A satisfying A AT

= AT A is a diagonalmatrix.

Problem 1.11 For a square matrix A , showthat

(1) AA T and A + AT are symmetric, (2) A - AT is skew-symmetric, and (3) A can be expressedas the sum of symmetric part B

part C

= ~(A -

AT) , so that A

=B+

= ~ (A +

AT) and skew-symmetric

C.

As an application of our results on matrix operations, one can prove the following important theorem: Theorem 1.6 Any system of linear equations has either no solution, exactly one solution, or infinitely many solutions. Proof: We have seen that a system of linear equations may be written in matrix form as Ax = b. This system may have either no solution or a solution. If it has only one solution, then there is nothing to prove. Suppose that the system has more than one solution and let Xl and X2 be two different solutions so that AXI = band AX2 = b. Let Xo = XI - X2· Then xo f= 0, and Axo = A(XI - X2) = O.Thus

1.5. Block matrices

19

+ kxo) == AXI + kAxo = b. This means that Xl + kxo is also a solution of Ax = b for any k. Since there are A(XI

infinitely many choices for k, Ax = b has infinitely many solutions.

0

Problem 1.12 For which values of a does each of the following systems have no solution,

exactly one solution, or infinitely many solutions? (1)

I I

+

2y

+

y

x -

y 3y ay

x

3x

4x

(2)

x

2x

+ +

y

3z Sz

+ + + + +

(a 2 - 14)z

z

=

az

3z

4 2 a

+

2.

1 2 3.

1.5 Block matrices In this section we introduce some techniques that may be helpful in manipulations of matrices. A submatrix of a matrix A is a matrix obtained from A by deleting certain rows and/or columns of A. Using some horizontal and vertical lines, one can partition a matrix A into submatrices, called blocks, of A as follows: Consider a matrix

divided up into four blocks by the dotted lines shown. Now, if we write All

= [all a21

a12 a 13], A12 a22 a23

= [ a14 a24

] ,

then A can be written as

called a block matrix. The product of matrices partitioned into blocks also follows the matrix product formula, as if the blocks Aij were numbers: If A = [All

A12] A2l An

and

B = [Bll B12] B2l B22

are block matrices and the number of columns in Aik is equal to the number of rows in Bkj, then

20

Chapter 1. Linear Equations and Matrices _ -

AB

[AllBll A21Bll

+ +

Al2 B21 A22B21

AII Bl2 A2I Bl2

+ +

Al2 B22 ] A22 B22 .

This will be true only if the columns of A are partitioned in the same way as the rows of B. It is not hard to see that the matrix product by blocks is correct. Suppose, for example, that we have a 3x3 matrix A and partition it as

I a 13] I a23 I a33

a ll al2 a21 a22

A =

[ a31 an

[ A All

=

21

and a 3 x 2 matrix B which we partition as

[:~:E3i1J3i :~~] = [ :~: ] .

B=

Then the entries of C = [Cij] = A B are Cij = (ailb lj

+

ai2b2j)

+

ai3b3j.

The quantity ailblj + ai2b2j is simply the (i, j)-entry of AllBll if i .:::: 2, and is the (i , j)-entry of A21 Bll if i = 3. Similarly, ai3b3j is the (i, j)-entry of A l2B21 if i .:::: 2, and of A22B21 if i = 3. Thus AB can be written as

=[

AB

=[

Cll ] e21

AllBll A21 Bll

+ +

AI2 B21 ] . A22 B21

Example 1.9 If an m x n matrix A is partitioned into blocks of column vectors : i.e., A = [CI C2 cn] , where each block Cj is the j-th column, then the product Ax with x = [XI xnf is the sum of the block matrices (or column vectors) with coefficients X /s:

Ax

~ ICI

'2

where Xj Cj = xj[alj a2j but the vector equation XI CI

',] [

+

~~ ~ ]

x1'd X2'2 +.

+ x""

anjf . Hence, a matrix equation Ax XnC n = b.

X2C2

+ .. . +

= b is nothing 0

Example 1.10 Let A be an m x n matrix partitioned into the row vectors ar , a2, ... , an as its blocks, and let B be an n x r matrix so that their product AB is well defined . By considering the matrix B as a block , the product AB can be written at a2 .

AB =

[

..

am

J =[ B

alB a2B

..

.

amB

J=

l [alb a2bl .

alb2 a2b2

ambl

amb2

..

1.6. Inverse matrices

21

where b., b2, . .. , b, denote the columns of B . Hence, the row vectors of AB are the products of the row vectors of A and the column vectors of B. 0

Problem 1.13 Compute AB using block multiplication, where

A=

0]

1211

[

-3 ~~ ~

o 01 2 -1

1

,

B=

[

~

3

012]

~~

.

-2 I 1

1.6 Inverse matrices As shown in Section 1.4, a system of linear equations can be written as Ax = b in matrix form. This form resembles one of the simplest linear equations in one variable ax = b whose solution is simply x = a-I b when a f= O. Thus it is tempting to write the solution of the system as x = A -I b. However, in the case of matrices we first have to assign a meaning to A -I . To discuss this we begin with the following definition. Definition 1.11 For an m x n matrix A, an n x m matrix B is called a left inverse of A if BA = In, and an n x m matrix C is called a right inverse of A if AC = 1m • Example 1.11 (One -sided inverse) From a direct calculation for two matrices A

we have AB =

=

h , and

[1 2-1] andB=[-~ -2 2 0

BA =

1

[-~

-; - : ]

12 -4

f=

-3 ] 5

,

7

13 .

9

Thus, the matrix B is a right inverse but not a left inverse of A , while A is a left inverse but not a right inverse of B. 0 A matrix A has a right inverse if and only if AT has a left inverse, since (AB)T = B T AT and IT = I. In general, a matrix with a left (right) inverse need not have a right (left , respectively) inverse. However, the following lemma shows that if a matrix has both a left inverse and a right inverse, then they must be equal: Lemma 1.7 If an n x n square matrix A has a left inverse B and a right inverse C, then Band C are equal, i.e., B = C. Proof: A direct calculation shows that B = BIn = B(AC) = (BA)C = InC = C .

o

22

Chapter 1. Linear Equations and Matrices

By Lemma 1.7, one can say that if a matrix A has both left and right inverses, then any two left inverses must be both equal to a right inverse C, and hence to each other. By the same reason, any two right inverses must be both equal to a left inverse B, and hence to each other. So there exists only one left and only one right inverse, which must be equal. We will show later (Theorem 1.9) that if A is a square matrix and has a left inverse, then it has also a right inverse, and vice-versa. Moreover, Lemma 1.7 says that the left inverse and the right inverse must be equal. However, we shall also show in Chapter 3 that any non-square matrix A cannot have both a right inverse and a left inverse: that is, a non-square matrix may have only a one-sided inverse. The following example shows that such a matrix may have infinitely many one-sided inverses.

[~

!]

B=

[~ ~ ~] is a left inverse of A.

Example 1.12 (Infinitely many one-sided inverses) A non-square matrix A =

can have more than one left inverse, In fact, for any x, Y E Ill. the matrix 0

Definition 1.12 An n x n square matrix A is said to be invertible (or nonsingular) if there exists a square matrix B of the same size such that

AB = In = BA. Such a matrix B is called the inverse of A, and is denoted by A-I. A matrix A is said to be singular if it is not invertible. Lemma 1.7 implies that the inverse matrix of a square matrix is unique. That is why we call B 'the' inverse of A. For instance, consider a2x2 matrix A = If ad - be

:/= 0, then

it is easy to verify that

d

A-I _ 1 - ad -be

-b

[d-c -b] _[ ad-c- be a -

ad - be

ad - be

ad -be

a

since AA -1 = l: = A -1 A . Note that any zero matrix is singular.

[~ ~].

l

Problem 1.14 Let A be an invertible matrix and k any nonzero scalar. Show that

=

(1) A-I is invertible and (A-I)-I A; (2) the matrix kA is invertible and (kA)-I = tA-I; (3) AT is invertible and (AT)-I = (A-I)T.

Theorem 1.8 The product of invertible matrices is also invertible, whose inverse is the product of the individual inverses in reversed order:

1.7. Elementary matrices and finding A-I

23

Proof: Suppose that A and B are invertible matrices of the same size. Then (A B)(B - I A- I ) = A(BB-I)A- I = AlA-I = AA- I = I , and similarly (B - 1A-I) (AB) = I. Thus , AB has the inverse B- 1A-I .

o

The inverse of A is written as ' A to the power -1 ' , so one can give the meaning of Ak for any integer k : Let A be a square matrix. Define A0 = I. Then, for any positive integer k, we define the power A k of A inductively as

Moreover, if A is invertible, then the negative integer power is defined as

It is easy to check that AkH = AkA l whenever the right-hand side is defined. (If A is not invertible, A3+(-I) is defined but A-I is not.) Problem 1.15 Prove:

(1) If A has a zero row, so does AB . (2) If B has a zero column, so does AB. (3) Any matrix with a zero row or a zero column cannot be invertible. Problem 1.16 Let A be an invertible matrix. Is it true that (Ak) T Justify your answer.

= (A T) k

for any integer k?

1.7 Elementary matrices and finding A-I We now return to the system of linear equations Ax = b. If A has a right inverse B so that AB = 1m , then x = Bb is a solution of the system since Ax

= A (Bb) = (AB)b = b .

(Compare with Problem 1.23). In particular, if A is an invertible square matrix, then it has only one inverse A- I, and x = A- Ib is the only solution of the system. In this section, we discuss how to compute A-I when A is invertible. Recall that Gaussian elimination is a process in which the augmented matrix is transformed into its row-echelon form by a finite number of elementary row operations. In the following, one can see that each elementary row operation can be expressed as a nonsingular matrix, called an elementary matrix, so that the process of Gaussian elimination is the same as multiplying a finite number of corresponding elementary matrices to the augmented matrix. Definition 1.13 An elementary matrix is a matrix obtained from the identity matrix In by executing only one elementary row operation.

24

Chapter 1. Linear Equations and Matrices

For example, the following matrices are three elementary matrices corresponding to each type of the three elementary row operations:

(1St kind) (2nd kind)

[~ _~] : the second row of h is multiplied by -5;

[~o ~0 ~1 0~]: theares~cond and the fourth rows of 14 interchanged; o

(3 rd kind)

I 0 0

[bo ~ ~]: 0 I

3 times the third row is added to the first row of h-

It is an interesting fact that, if E is an elementary matrix obtained by executing a certain elementary row operation on the identity matrix 1m , then for any m x n matrix A, the product EA is exactly the matrix that is obtained when the same elementary row operation in E is executed on A. The following example illustrates this argument. (Note that AE is not what we want. For this, see Problem 1.18.) Example 1.13 (Elementary operation by an elementary matrix) Let b = [bl bz b3]T be a 3 x 1 column matrix. Suppose that we want to execute a third kind of elementary operation 'adding (-2) x the first row to the second row' on the matrix b. First, we execute this operation on the identity matrix h to get an elementary matrix E:

E =

[

100] .

-2 1 0 001

Multiplying this elementary matrix E to b on the left produces the desired result:

Similarly, the second kind of elementary operation 'interchanging the first and the third rows' on the matrix b can be achieved by multiplying an elementary matrix P obtained from h by interchanging the two rows, to b on the left:

o Recall that each elementary row operation has an inverse operation, which is also an elementary operation, that brings the elementary matrix back to the original identity matrix. In other words, if E denotes an elementary matrix and if E' denotes the elementary matrix corresponding to the 'inverse' elementary row operation of E, then E' E = 1, because

1.7. Elementary matrices and finding A -I

25

(1) if E multiplies a row by c =f. 0, then E' multiplies the same row by ~; (2) if E interchanges two rows, then E' interchanges them again; (3) if E adds a multiple of one row to another, then E' subtracts it from the same row.

= =

I E E' : every elementary matrix is invertible Furthermore, one can say that E' E and inverse matrix E- 1 = E' is also an elementary matrix.

Example 1.14 (Inverse ofan elementary matrix) If EI =

0 ] , E2 = [ 1 01 0c O 0 0 10 0 ] , E 3 = [01 [ 001 301 0

then

o Definition 1.14 A permutation matrix is a square matrix obtained from the identity matrix by permuting the rows. In Example 1.14, E3 is a permutation matrix, but E2 is not. Problem 1.17 Prove: (1) A permutation matrix is the product of a finite number of elementary matrices each of which corresponds to the 'row-interchanging' elementary row operation. (2) Every permutation matrix P is invertible and p - 1 pT . (3) The product of any two permutation matrices is a permutation matrix. (4) The transpose of a permutation matrix is also a permutation matrix.

=

Problem 1.18 Define the elementary column operations for a matrix by just replacing 'row ' by 'column' in the definition of the elementary row operations . Show that if A is an m x n matrix and if E is a matrix obtained by executing an elementary column operation on In, then AE is exactly the matrix that is obtained from A when the same column operation is executed on A. In particular, if D is an n x n diagonal matrix with diagonal entries d I, d2, . . . , dn, then AD is obtained by multiplication by dl, d2, ... , dn of the columns of A, while DA is obtained by multiplication by dl' d2' . . . , dn of the rows of A .

The next theorem establishes some fundamental relations between n xn square matrices and systems of n linear equations in n unknowns. Theorem 1.9 Let A be an n x n matrix. The following are equivalent: (1) A has a left inverse; (2) Ax = 0 has only the trivial solution x = 0;

26 (3) (4) (5) (6)

Chapter 1. Linear Equations and Matrices A A A A

is row-equivalent to In ; is a product of elementary matrices; is invertible; has a right inverse.

Proof: (1) =} (2): Let x be a solution of the homogeneous system Ax = 0, and let B be a left inverse of A. Then

x = Inx

= (BA)x = B(Ax) = BO = O.

(2) =} (3): Suppose that the homogeneous system Ax = 0 has only the trivial solution x 0:

=

I

XI

X2 Xn

= =

0 0

=

O.

This means that the augmented matrix [A 0] of the system Ax = 0 is reduced to the system [In 0] by Gauss-Jordan elimination. Hence, A is row-equivalent to In. (3) =} (4): Assume A is row-equivalent to In, so that A can be reduced to In by a finite sequence of elementary row operations . Thus, one can find elementary matrices EI, E2, "" e, such that Ek .. . E2EIA = In. By multiplying successively both sides of this equation by E 1, E)I on

s;', ..., 2

the left, we obtain A = E I- I E2- I

'"

E-II k

n=

E-IE- I I

2'"

E- I

k '

which expresses A as the product of elementary matrices . (4) =} (5) is trivial, because any elementary matrix is invertible. In fact, A-I = Ek " · E2E l . (5) =} (1) and (5) =} (6) are trivial. (6) =} (5): If B is a right inverse of A , then A is a left inverse of B and one can apply (1) =} (2) =} (3) =} (4) =} (5) to B and conclude that B is invertible, with A as its unique inverse, by Lemma 1.7. That is, B is the inverse of A and so A is invertible.

o

If a triangular matrix A has a zero diagonal entry, then the system Ax = 0 has at least one free variable, so that it has infinitely many solutions . Hence, one can have the following corollary. Corollary 1.10 A triangular matrix is invertible if and only entry.

if it has no zero diagonal

From Theorem 1.9, one can see that a square matrix is invertible if it has a onesided inverse. In particular, if a square matrix A is invertible, then x = A -I b is a unique solution to the system Ax = b.

1.7. Elementary matrices and finding A -I

27

Problem 1.19 Find the inverse of the product

[~

o

~ ~] [ ~ ~ ~] [-~ ~ ~1]'

-c 1

-b 0 1

0 0

As an application of Theorem 1.9, one can find a practical method for finding the inverse A-I of an invertible nxn matrix A. If A is invertible, then A is row equivalent to In and so there are elementary matrices EI, Ez, .. . , Ek such that Ek'" EzEIA = In. Hence, A-I

= Ek '"

EzEI

= Ek'"

EzElln .

This means that A-I can be obtained by performing on In the same sequence ofthe elementary row operations that reduces A to In . Practically, one first constructs an n x 2n augmented matrix [A I In] and then performs a Gaussian-Jordan elimination that reduces A to In on [A I In] to get [In I A -I]: that is, [A I In] -+ [Ee'" EIA I Ee'" Ell] = [U I K] -+

[Fk'" FlU I Fk'" FIK] = [I I A-I],

where Ee .. . EI represents a Gaussian elimination that reduces A to a row-echelon form U and Fk .. . FI represents the back substitution. The following example illustrates the computation of an inverse matrix.

Example 1.15 (Computing A -I by Gauss-Jordan elimination) Find the inverse of A=

[I 23] 235 102

.

Solution: Apply Gauss-Jordan elimination to [A I I]

=

2 3 3 5 0 2

U

U U U

2 3 -1 -1 -2 -1

-+

-+

-+

100]

I I 0 1 0 I a a 1

2 3 1 1 -2 -1

2 3 1 1 a 1

1 0 2 -1 3 -2

n n

I a -2 1 -1 a

1 0 2 -1 -1 a

(-2)row 1 + row 2 (-1 )row I + row 3

n

(-l)row 2

(2)row 2 + row 3

28

Chapter 1. Linear Equations and Matrices

This is [U I K] obtained by Gaussian elimination. Now continue the back substitution to reduce [U I K] to [I I A -I] .

[U

1 00]

2 3 1 1 0 1

U U U

I K] =

-+

-+

2 0 1 0 0 1

-8

0 0 1 0 0 1

-6

6

(-2)row 2 + row 1

-3 ]

-1 1 - 1 3 -2 1

4

-1 1 3 -2

Thus, we get A-I

(-l)row 3 + row 2 (-3)row 3 + row 1

2 -1 0 3 -2 1

=

-i-1 ] =

[IIA-

I].

[-6 4-1 ] -1 1 -1 3 -2 1

(The reader should verify that AA- I = I

.

= A-I A .)

0

Note that if A is not invertible, then , at some step in Gaussian elimination, a ]For example, the matrix zero r[ow ~il~ ShO~ ]u p on the left-hand Sid[e ~n [U61

Kl

A=

2 4 -1 -1 2 5 and is not invertible.

is row-equivalent to

0 -8 -9 0 0 0

, which has a zero row

By Theorem 1.9, a square matrix A is invertible if and only if Ax = 0 has only the trivial solution. That is, a square matrix A is noninvertible if and only if Ax 0 has a nontrivial solution, say XC. Now, for any column vector b [bl '" bn]T , if XI is a solution of Ax = b for a noninvertible matrix A, so is kxo + xi for any k, since

=

A(kxo

+ Xl) = k(Axo) +

AXI

=

= kO + b = b.

This argument strengthens Theorem 1.6 as follows when A is a square matrix:

Theorem 1.11 If A is an invertible n x n matrix. then for any column vector b = [bl .. . bnf. the system Ax = b has exactly one solution x = A-lb. If A is not invertible, then the system has either no solution or infinitely many solutions according to the consistency of the system. Problem 1.20 Express A-I as a product of elementary matrices for A given in Example 1.15.

1.8. LDU factorization

Problem 1.21 When is a diagonal matrix D

=

[

dO l ...

29

0 ] . nonsingular, and what is dn

Problem 1.22 Write the system of linear equations

I

x + 2x 4x -

2y 2y 3y

+ + +

2z 3z 5z

= =

10 1 4

in matrix form Ax = b and solve it by finding A-I b.

Problem 1.23 True or false: If the matrix A has a left inverse C so that C A is a solution of the system Ax

= b . Justify your answer.

= In, then x = Cb

1.8 WU factorization In this section, we show that the forward elimination for solving a system of linear equations Ax = b can be expressed by some invertible lower triangular matrix, so that the matrix A can be factored as a product of two or more triangular matrices. We first assume that no permutations of rows (2nd kind of operation) are necessary throughout the whole process of forward elimination on the augmented matrix [A b]. Then the forward elimination is just multiplications of the augmented matrix [A b] by finitely many elementary matrices Ek, .. . , E 1: that is,

where each E, is a lower triangular elementary matrix whose diagonal entries are all l's and [U y] is a row-echelon form of [A b] without divisions of the rows by the pivots. (Note that if A is a square matrix, then U must be an upper triangular matrix). Therefore, if we set L = (Ek' " E1)-1 = Ell . . . Ei: 1, then we have A = LU, where L is a lowertriangularmatrix whosediagonalentriesareall l's. (In fact, each l is also a lower triangular matrix, and a product of lower triangular matrices is also lower triangular (see Problem 1.25). Such factorization A = LU is called an LU factorization or an LU decomposition of A. For example,

Ei

where di 'S are the pivots. Now, let A LU be an LU factorization. Then the system Ax as LUx = b. Let Ux = y. Thus, the system

=

= b can be written

30

Chapter 1. Linear Equations and Matrices

Ax= LUx=b can be solved by the following two steps: Step 1 Solve Ly = b for y. Step 2 Solve Ux = y by back substitution. The following example illustrates the convenience of an LU factorization of a matrix A for solving the system Ax = b. Example 1.16 Solve the system oflinear equations

Ax =

2 1 1 4 1 0

[ -2 2 1

by using an LU factorization of A. Solution: The elementary matrices for the forward elimination on the augmented matrix [A b] are easily found to be

0]

o1 0 , 3 1 so that .:

o

.,

-4

0 41 ]

= U.

Thus, if we set

L = Ell E:;1 E3"1

=[ ;

~ ~],

-1 -3 1

which is a lower triangular matrix with 1 's on the diagonal, then

A = LU = [

; -1

10] -2 1 . ~ ~] [~ -~ -3 1 0 0 -4 4

Now, the system

Ly =b :

Y2 3Y2

+

= = Y3 =

1

-2 7

can be easily solved inductively to get y = (1, -4, - 4) and the system

1.8. WU factorization

= =

Ux=y:

=

31

1

-4 -4

also can be solved by back substitution to get

o

for t E JR, which is the solution for the original system.

As shown in Example 1.16, it is a simple computation to solve the systems Ly = b and Ux y, because the matrix L is lower triangular and the matrix U is the matrix obtained from A after forward elimination so that most entries of the lower-left side are zero.

=

Remark: For a system Ax = b, the Gaussian elimination may be described as an LU factorization of a matrix A. Let us assume that one needs to solve several systems of linear equations Ax = bi for i = 1, 2, . . . , l with the same coefficient matrix A. Instead ofperforming the Gaussian elimination process l times to solve these systems, one can use an LU factorization of A and can solve first Ly = b, for i = 1,2, ... , l to get solutions Yi and then the solutions of Ax = bi are just those of Ux = Yi . From an algorithmic point of view, the suggested method based on the LU factorization of A is much efficient than doing the Gaussian elimination repeatedly, in particular, when l is large . Problem 1.24 Determine an LU factorization of the matrix A from which solve Ax

=

[

1 -1

0 ]

o

2

-1

2 -1

-1

,

= b for (1) b = [1 1 If and (2) b = [2 0

- l]T .

Problem 1.25 Let A and B be two lower triangular matrices. Prove that (1) their product AB is also a lower triangular matrix; (2) if A is invertible , then its inverse is also a lower triangular matrix ; (3) if the diagonal entries of A and B are all 1 's, then the same holds for their product AB and their inverses. Note that the same holds for upper triangular matrices, and for the product of more than two matrices .

The matrix U in the decomposition A = LU of A can further be factored as the product U = DO, where D is a diagonal matrix whose diagonal entries are the pivots

32

Chapter 1. Linear Equations and Matrices

of U or zeros and U is a row-echelon form of A with leading 1's, so that A = LDU. For example,

u n[~l * *] u n[~l n[~ 0 0 1 0 * 1

A

* * *

0 dz * * * 0 0 0 d3 * 0 0 0 o 0

* *

=

0 0 1 0 * 1

0 dz 0 0

* *

=

0 0 d3 0

= LU

* * * dI dI 0 1 .!. * d2 * d2 0 0 0 1 * d3

* .]

0

0 0

0 0

LDU.

For notational convention, we replace U again by U and write A = LDU. This decomposition of A is called an LDU factorization or an LDU decomposition of

A.

For example, the matrix A in Example 1.16 was factored as

A

=

1 ° 0] [2 1-210] 1 = LU . [ -12 -31 01 0-1 0 0 -4 4

It can be further factored as A

= LDU by taking

1/2 1/2 °] 2 1 10] [200][1 ° ° [ o -1 -2 1 00-44

=

0 -1 0 0 -4

0 0

1

2

-1

1

-1

= DU.

The LDU factorization of a matrix A is always possible when no row interchange is needed in the forward elimination process . In general, if a permutation matrix for a row interchange is necessary in the forward elimination process , then an LDU factorization may not be possible. Example 1.17 (The LDU factorization cannot exist)

[~

Consider a matrix A

=

b] ' For forward elimination, it is necessary to interchange the first row with

the second row. Without this interchange, A has no LU or LDU factorization . In fact, one can show that it cannot be expressed as a product of any lower triangular matrix L and any upper triangular matrix U. 0 Suppose now that a row interchange is necessary during the forward elimination on the augmented matrix [A b]. In this case, one can first do all the row interchanges before doing any other type of elementary row operations, since the interchange of rows can be done at any time, before or after the other elementary operations, with the same effect on the solution . Those 'row-interchanging' elementary matrices altogether form a permutation matrix P so that no more row interchanges are needed during the forward elimination on PA. Now, the matrix PA can have an LDU factorization.

e

1.8. LDU factorization

E[Xrr t]18;:, :a:~:::::::~::o: )n::'::~::t:~:: 102

:::::ro:

with the third row, that is, we need to multiply A by the permutation matrix P

[~ ~

33

=

~] so that

100

PA

= [~

~ ~] = [~011 ~ ~] [~0~0~]2 0 [~0~ ~1] = LDU.

012

Note that U is a row-echelon form of the matrix A.

0

Of course, if we choose a different permutation p i, then the L DU factorization of p iA may be different from that of PA, even if there is another permutation matrix P" that changes p i A to P A. Moreo ver, as the following example shows, even if a permutation matrix is not necessary in the Gaussian elimination , the LDU factorization of A need not be unique.

Example1.19 (Infinitely many LDU factorizations) The matrix

B=

110] 1 3 0

[ 000

has the LDU factorization B=

[

1 0 1 1

o

~ ] [~ ~ ~] [~000 ~ ~] =

0 1

OO x

LDU

for any value x . It shows that a singular matrix B has infinitely many LDU factorizations . 0 However, if the matrix A is invertible and if the permutation matrix P is fixed when it is necessary, then the matrix PA has a unique LDU factorization.

Theorem 1.12 Let A be an invertible matrix. Thenfor a fixed suitable permutation matrix P the matrix PA has a unique LDU factorization. Proof: Suppose that PA = LjD j U , = L zDzUz , wheretheL 's are lower triangular, the U's are upper triangular whose diagonals are all l's, and the D's are diagonal matrices with no zeros on the diagonal. One needs to show L , = Lz, D; = Dz , and U, = Uz for the uniqueness . Note that the inverse of a lower triangular matrix is also lower triangular, and the inverse of an upper triangular matrix is also upper triangular. And the inverse of a

34

Chapter 1. Linear Equations and Matrices

diagonal matrix is also diagonal. Therefore, by multiplying (LIDI)-I = D I I L I I on the left and V:;I on the right, the equation LI DI VI = L2D2V2 becomes VIV:;I = DIILIIL2D2 '

The left-hand side is upper triangular, while the right-hand side is lower triangular. Hence, both sides must be diagonal. However, since the diagonal entries of the upper triangular matrix VI V:;I are all 1's, it must be the identity matrix I (see Problem 1.25). Thus VI V:; 1 = I, i.e., VI = V2. Similarly, L I I L2 = DID:;I implies that LI = L2 and DI = D2. 0 In particular, if an invertiblematrix A is symmetric (i.e., A = A T) , and if it can be factored into A = LDV without row interchanges,then we have

and thus, by the uniqueness of factorizations, we have V Problem 1.26 FUuI the factors L , D , .nd U for A What is the solution to Ax

= b for b = [1 0

=[

-!

= L T and A = L DL T •

-1 0 ] 2 -1 .

-1

2

- If?

probl[em/2; ~o]r all possible permutation matrices P, find the LDU factorization of PA for A=

2 4 2 111

.

1.9 Applications 1.9.1 Cryptography

Cryptographyis the study of sendingmessagesin disguisedform (secretcodes) so that only the intended recipients can remove the disguise and read the message; modem cryptography uses advanced mathematics. As an application of invertible matrices, we introduce a simple coding. Suppose we associate a prescribed number with every letter in the alphabet; for example, ABC D

t

°

t

t

t

1 2

3

x t

Y

Z Blank

?

t

t

t

23 24 25

t

26

t

27 28.

Suppose that we wantto send the message"GOOD LUCK." Replacethis message by 6, 14, 14, 3, 26, 11, 20, 2, 10

1.9.1. Application: Cryptography

35

according to the preceding substitution scheme. To use a matrix technique, we first break the message into three vectors in ]R3 each with three components, by adding extra blanks if necessary:

Next, choose a nonsingular 3 x 3 matrix A, say

A=

I 00] [2 I 0 , I

I

I

which is supposed to be known to both sender and receiver. Then, as a matrix multiplication, A translates our message into

By putting the components of the resulting vectors consecutively, we transmit

6, 26, 34, 3, 32, 40, 20, 42, 32. To decode this message, the receiver may follow the following process . Suppose that we received the following reply from our correspondent:

19, 45, 26, 13, 36, 41. To decode it, first break the message into two vectors in ]R3 as before:

We want to find two vectors vectors : i.e., AX!

un ~ un

XI , X2

~

such that

AXi

is the i-th vector of the above two

Ax,

Since A is invertible, the vectors XI , X2 can be found by multiplying the inverse of A to the two vectors given in the message. By an easy computation, one can find A-I =

I 00] [ -2 I 0 . 1 -I

Therefore,

I

36

Chapter 1. Linear Equations and Matrices

XI

=[

19 ] -21 01 00] [ 45 26 1 -1 1

= [197 ]

°

,

The numbers one obtains are

19, 7, 0, 13, 10, 18. Using our correspondence between letters and numbers, the message we have received is "THANKS."

Problem 1.28 Encode ''TAKEUFO" using the same matrix A used in the above example.

1.9.2 Electrical network In an electrical network, a simple current flow may be illustrated by a diagram like the one below. Such a network involves only voltage sources , like batteries, and resistors, like bulbs , motors, or refrigerators. The voltage is measured in volts, the resistance in ohms, and the current flow in amperes (amps, in short). For such an electrical network, current flow is governed by the following three laws: Ohm's Law: The voltage drop V across a resistor is the product of the current I and the resistance R: V = I R. Kirchhoff's Current Law (KCL): The current flow into a node equals the current flow out of the node. Kirchhoff's Voltage Law (KVL): The algebraic sum of the voltage drops around a closed loop equals the total voltage sources in the loop . Example 1.20 Determine the currents in the network given in Figure 104.

2 ohms

P

2 ohms

18 volts

Figure 1.4. A circuit network

1.9.3. Application: Leontief model 40 ohms

20 volts

37

1 ohm

30 ohms

5 volts

40 ohms

4 volts

40 volts

(2)

(1)

Figure 1.5. Two circuit networks

Solution: By applying KCL to nodes P and Q, we get equations

h + h =

at P,

12

at

12 h + ls

=

Q.

Observe that both equations are the same, and one of them is redundant. By applying KVL to each of the loops in the network clockwise direction, we get

+ 212 212 + 3h

6h

=

0 from the left loop,

=

18 from the right loop.

Collecting all the equations, we get a system of linear equations:

j

h 6h

212

h +

+ 212

h =

+ st,

=

O 0 18.

By solving it, the currents are h = -1 amp, 12 = 3 amps and /3 = 4 amps. The negative sign for h means that the current h flows in the direction opposite to that shown in the figure. 0

Problem 1.29 Determine the currents in the networks given in Figure 1.5.

1.9.3 Leontiefmodel Another significant application of linear algebra is to a mathematical model in economics. In most nations, an economic society may be divided into many sectors that produce goods or services, such as the automobile industry, oil industry, steelindustry,

38

Chapter 1. Linear Equations and Matrices

communication industry, and so on. Then a fundamental problem in economics is to find the equilibrium of the supply and the demand in the economy. There are two kind of demands for the goods: the intermediate demand from the industries themselves (or the sectors) that are needed as inputs for their own production, and the extra demand from the consumer, the governmental use, surplus production, or exports. Practically, the interrelation between the sectors is very complicated, and the connection between the extra demand and the production is unclear. A natural question is whether there is a production level such that the total amounts produced (or supply) will exactly balance the total demandfor the production, so that the equality {Total output} =

=

{Total demand}

{Intermediate demand} + {Extra demand}

holds. This problem can be described by a system of linear equations, which is called the LeontiefInput-Output Model . To illustrate this, we show a simple example. Suppose that a nation's economy consists of three sectors: It = automobile industry, h == steel industry, and h = oil industry. Let x [Xl Xzx3f denote the production vector (or production level) in ]R3 , where each entry Xi denotes the total amount (in a common unit such as 'dollars' rather than quantities such as 'tons' or 'gallons') of the output that the industry Ii produces per year. The intermediate demand may be explained as follows. Suppose that, for the total output Xz units of the steel industry h 20% is contributed by the output of It. 40% by that of hand 20% by that of h Then we can write this as a column vector, called a unit consumption vector of h :

=

Cz =

0.2 ] 0.4 . [ 0.2

For example , if h decides to produce 100 units per year, then it will order (or demand) 20 units from It , 40 units from hand 20 units from h: i.e., the consumption vector of h for the production Xz = 100 units can be written as a column vector: 100cz = [2040 20]T. From the concept of the consumption vector, it is clear that the sum of decimal fractions in the column cz must be ::: 1. In our example, suppose that the demands (inputs) of the outputs are given by the following matrix , called an input-output matrix:

It

A = input

t, h

h

[0.3 0.1 0.3

t c\

output h 0.2 0.4 0.2

t cz

h

03 ]

0.1 0.3

t C3

.

1.9.3. Application: Leontiefmodel

39

In this matrix, an industry looks down a column to see how much it needs from where to produce its total output, and it looks across a row to see how much of its output goes to where. For example, the second row says that, out of the total output Xz units of the steel industry lz. as the intermediate demand, the automobile industry h demands 10% of the output Xl, the steel industry /z demands 40% of the output Xz and the oil industry h demands 10% of the output X3. Therefore , it is now easy to see that the intermediate demand of the economy can be written as Ax =

[ 0.3Xl + 0.2xz + 0.3X3 ] 0.3 0.2 0.3] [ Xl] 0.1 0.4 0.1 Xz = O.lXl + O.4xz + 0.lX3 . [ 0.3 0.2 0.3 X3 0 3Xl + 0.2xz + 0.3X3

=

Suppose that the extra demand in our example is given by d = [dl, di, d3f [30,20, ioi" , Then the problem for this economy is to find the production vector x satisfying the following equation:

x = Ax+d. Another form of the equation is (l - A)x = d, where the matrix I - A is called the Leontief matrix. If I - A is not invertible, then the equation may have no solution or infinitely many solutions depending on what d is. If I - A is invertible, then the equation has the unique solution x = (l - A)-ld. Now, our example can be written as

[]

;~ ] = [~:i ~:~ ~:i] ;~X3 0.3 0.2 0.3

[ X3

+[

;~10 ] .

In this example, it turns out that the matrix I - A is invertible and (l - A)-l =

2.0 1.0 1.0] 0.5 2.0 0.5 . [ 1.0 1.0 2.0

Therefore,

x

~

(l - AJ-1d

= [

~

l

which gives the total amount of product Xi of the industry I i for one year to meet the required demand . Remark: (1) Under the usual circumstances, the sum ofthe entries in a column of the consumption matrix A is less than one because a sector should require less than one units worth of inputs to produce one unit of output. This actually implies that I - A is invertible and the production vector x is feasible in the sense that the entries in x are all nonnegative as the following argument shows. (2) In general, by using induction one can easily verify that for any k = 1, 2, . . . ,

Chapter 1. Linear Equations and Matrices

40

If the sums of column entries of A are all strictly less than one, then limk-+oo A k = 0 (see Section 6.4 for the limit of a sequence of matrices). Thus, we get (l - A)(l + A + ... + A k + ...) = I, that is, (l - A)-I = I

+ A + ...+

Ak

+ ... .

This also shows a practicalway of computing (l - A)-I sinceby takingk sufficiently large the right-handside may be made veryclose to (l- A)-I. In Chapter6, an easier method of computing A k will be shown. In summary, if A and d have nonnegative entries and if the sum of the entries of each column of A is less than one, then I - A is invertibleand the inverseis given as the above formula. Moreover, as the formula shows the entries of the inverse are all nonnegative, and so are those of the production vectorx = (l - A)-Id. Problem 1.30 Determine the total demand for industries It, 12 and 13 for the input-output matrix A and the extra demand vector d given below:

A=

0.1 0.7 0.2] 0.5 0.1 0.6 with d [ 0.4 0.2 0.2

= O.

= services, h = manufacturing industries, and h agriculture . For each unit of output, II demands no services from 1], 0.4 units from lz. and 0.5 units from 13. For each unit of output. lz requires 0.1 units from sector II of services. 0.7 units from other parts in sector lz. and no product from sector 13. For each unit of output. 13 demands 0.8 units of services 110 0.1 units of manufacturing products from lz. and 0.1 units of its own output from lJ. Determine the production level to balance the economy when 90 units of services, 10 units of manufacturing. and 30 units of agriculture are required as the extra demand. Problem 1.31 Suppose that an economy is divided into three sectors : II

=

1.10 Exercises 1.1. Which of the following matrices are in row-echelon form or in reduced row-echelon form?

A~U

0 0 0 -3 ] 0 1 0 4 • 2 0 0 1

c~u E=[l

0 0 0 0

0 1 0 0

0 2 1 0

1 0 0 0 1 0 1 0 -2

h[l n-

-n D=[l nF=[l

0 O 0 0

0 1 0 0

0 0 1 0

1 0 0 0 1 1 0 0 1

0 1 0 0

0 0 0 0

0 0 1 0

~l

-H

1.10. Exercises

41

1.2. Find a row-echelon form of each matrix.

[I -3 212] 3 -9 10 2 9 2 -6 4 2 4 2 -6 8 1 7

(1)

(2)

'

[~

2 3 4 3 4 5 4 5 1 5 1 2 1 2 3

1.3. Find the reduced row-echelon form of the matrices in Exercise 1.2.

H

1.4. Solve the systems of equations by Gauss-Jordan elimination. (1)

3XI Xl

I~;

(2)

:

12;~ ~ ;~ + +

2X2 X2

;~

2x

;~ +;:

X3 3X3

+ + +

-~ I

X4 3X4

-8 .

1~

z

1.

4z

What are the pivots in each of 3rd kind elementary operations?

1.5. Which of the following systems has a nontrivial solution?

I

X

(1)

+

2y 2y 2y

+ + +

3z 2z 3z

=

0 12x + 0 (2) x O. 3x +

= =

y 2y y -

z

3z 2z

= =

0 0 O.

x + 1.6. Determine all values of the b, that make the following system consistent:

I

x +

y - z = bl 2y + z = b2 Y-Z=b3·

1.7. Determine the condition on b; so that the following system has no solution :

1

+ -

Y 2y

2x -

y

2x 6x

+ + +

7z l lz 3z

bl b2

b3.

1.8. Let A and B be matrices of the same size. (I) Show that, if Ax (2) Show that, if Ax

= 0 for all x, then A is the zero matrix. = Bx for all x, then A = B.

1.9. Compute ABC and CAB for

A~[i -~ :j. B~[jl C~[I 1.10. Prove that if A is a 3 x 3 matrix such that AB A

= clJ for some constant c.

1.11. Let A

= [~

= BA for every 3 x 3 matrix B , then

~ ~]. Find Ak for all integers k.

001

-I]

42

Chapter 1. Linear Equations and Matrices

1.12. Compute (2A - B)C and CC T for

A

= [~

~ ~] ,

B

I 0 I

= [-;

~ ~] ,

C

0 0 1

=[

~ ~ ~] .

-2 2 I

=

1.13. Let f(x) anxn + an_IX n- 1 + ...+ alx + ao be a polynomial. For any square matrix A, a matrix polynomial f (A) is defined as

n

f(A)

For f(x)

=

(I)A~[

3x 3

+

-i ~

= anAn + an_lAn-I + ...+ + 3, find f(A) for

alA

+ aol.

x 2 - 2x

(2)A~U -~-n

1.14. Find the symmetric part and the skew-symmetric part of each of the following matrices . (I) A

~] ,

=[ ; ;

-1 3 2

=

(2) A

1.15. Find AA T and A T A for the matrix A

[~

;

0 0

= [; 2

1.16. Let A -I

= [~

~ ~] .

421

3

8 4 0

U:J.

(I) Flod. matrix B such that AB

~

(2) Find a matrix C such that AC

= A2 +

A.

1.17. Find all possible choices of a, band c so that A that A-I

-1]. ~ ~ i].

= [; ~] has an inverse matrix such

= A.

1.18. Decide whether or not each of the following matrices is invertible. Find the inverses for invertible ones.

A=[~o ~ ~ :], 0 0 4

; -;].

B= [ 011I] 2 3 , 5 5 I

2

I

1.19. Find the inverse of each of the following matrices:

A=[=~ -~ ;]'B=[~ ~ ~ ~],c=[l ~ ~ ~](k=l=O)' 6411

1248

OOlk

1.20. Suppose A is a 2 x I matrix and B is a I x 2 matrix. Prove that the product AB is not invertible. 1.21. Find three matrices which are row equivalent to A

~ [~

-

i

3

4 ]

2

-I

-3

4

.

1.10. Exercises 1.22. Write the following systems of equations as matrix equations Ax computing A -I b:

+

I

=

= b and solve them by

2 XI X2 + X3 5 (2) XI + X2 X3 2x1 + X2 - 2X3 7, 4xI 3X2 + 2x3 1.23. Find the LDU factorization for each of the following matrices: X2 X2 -

2x1 -

(1)

1

(1) A =

[~ ~

3X3 4X3

= =

J.

(2) A =

U~ H

[~ ~

43

= =

5

-1 -3.

l

1.24. Find the LDL T factorization of the following symmetric matrices : (1) A

~

1.25. Solve Ax

L

(2) A

~

n

[:

= b with A = LU , where Land U are given as

= [ .: o

~ ~], I

-1

U

= [~

-~ -~] ,

n b~Ul

b

= [ -; ] .

0 0 1 4 Forward elimination is the same as Lc = b, and back-substitution is Ux = c.

1~ l 3. If so, that reader may omit reading its continued proof below and rather concentrate on understanding the explicit formula of det A in Theorem 2.6. For matrices of higher order n > 3: Again , we repeat the same procedure as for the 3 x 3 matrices to get an explicit formula for det A of any square matrix A = [aij] of order n. (Step 1) Just as the case for n = 3, use the multilinearity in each row of A to get det A as the sum of the determinants of nn matrices. Notice that each one of n" matrices has exactly n entries from A, one in each of n rows . However, if any two of the n entries from A are in the same column, then it must have a zero column, so that its determinant is zero and it can be neglected in the summation. Now, in each remaining matrix, the n entries from A must be in different columns: that is, no two of the n entries from A are in the same row or in the same column . (Step 2) Now, we aim to estimate how many of them remain. From the observation in Step 1, in each of the remaining matrices, those n entries from A are of the form

with some column indices i , i, k, .. . , l. Since no two of these n entries are in the same column, the column indices i , l, k, .. . , £ are just a rearrangement of I, 2, .. . , n without repetition or omissions. It is not hard to see that there are just n! ways of such rearrangements, so that n! matrices remain for further consideration. (Here n! = n(n - 1) .. ·2· I, called n factorial.)

2.2. Existence and uniquenessof the determinant

53

Remark: In fact, the n! remaining matrices can be constructed from the matrix A = [aij] as follows: First , choose anyone entry from the first row of A, say ali, in the i -th column. Then all the other n - I entries a2j, a3k. .. . , ani should be taken from the columns different from the i -th column. That is, they should be chosen from the submatrix of A obtained by deleting the row and the column containing ali . If the second entry a2j is taken from the second row, then the third entry a3k should be taken from the submatrix of A obtained by deleting the two rows and the two columns containing ali and a 2j, and so on. Finally, if the first n - 1 entries

are chosen, then there is no alternative choice for the last one an i since it is the one left after deleting n - 1 rows and n - I columns from A.

(Step 3) We now compute the determinant of each of the n! remaining matrices. Since each of those matrices has just n entries from A so that no two of them are in the same row or in the same column, one can convert it into a diagonal matrix by ' suitable ' column interchanges. Then the determinant will be just the product of the n entries from A with '±' sign , which will be determined by the number (actually the parity) of the column interchanges. To determine the sign , let us once again look back to the case of n = 3. Example 2.3 (Convert intoa diagonal matrixby column interchanges) Suppose that one of the six matrices is of the form:

0 [

o

0

a22

a31

a 13]

0

0

.

0

Then , one can convert this matrix into a diagonal matrix by interchanging the first and the third columns. That is, det

0 [

0

a 22

0

a13 ]

a31

0

0

0

= - det

[ a13

0

0

Note that a column interchange is the same as an interchange of the corresponding column indices. Moreover, in each diagonal entry of a matrix, the row index must be the same as its column index . Hence, to convert such a matrix into a diagonal matrix, one has to convert the given arrangement of the column indices (in the example we have 3, 2, 1) to the standard order 1, 2, 3 to be matched with the arrangement 1,2,3 of the row indices. In this case, there may be several ways of column interchanges to convert the given matrix to a diagonal matrix. For example, to convert the given arrangement 3, 2, 1 of the column indices to the standard order 1, 2, 3, one can take either just one interchanging of 3 and 1, or three interchanges: 3 and 2, 3 and 1, and then 2 and 1. In either case, the parity is odd so that the "-" sign in the computation of the determinant came from (-1 ) I = (_ 1)3, where the exponents mean the numbers of interchanges of the column indices. 0

54

Chapter 2. Determinants

To formalize our discussions, we introduce a mathematical terminology for a rearrangement of n objects: Definition 2.3 A permutation of n objects is a one-to-one function from the set of

n objects onto itself.

=

In most cases, we use the set of integers Nn {I, 2, .. . , n} for a set of n objects. A permutation a of Nn assigns a numbera(i) in Nn to each number i in Nn , and this permutation a is usually denoted by a = (a(I), a(2), .. . , a(n») =

(a~l) a~2)

:: :

a~n») '

Here , the first row is the usual lay-out of Nn as the domain set, and the second row is the image set showing an arrangement of the numbers in Nn in a certain order without repetitions or omissions. If Sn denotes the set of all permutations of Nn, then, as mentioned previously, one can see that Sn has exactly n! permutations. For example, S2 has 2 = 2!, S3 has 6 = 3!, and S4 has 24 = 4! permutations. Definition 2.4 A permutation a = (iI, h, .. . , jn) is said to have an inversion if > jt for s < t (i.e., a larger number precedes a smaller number).

i,

For example, the permutation a = (3,1,5,4,2) has five inversions, since 3 precedes 1 and 2; 5 precedes 4 and 2; and 4 precedes 2. Note that the identity (1, 2, . .. , n) is the only one without inversions . Definition 2.5 A permutation is said to be even if it has an even number of inversions, and it is said to be odd if it has an odd number of inversions. For a permutation a in Sn, the sign of a is defined as _ { 1 if a is an even permutation } _ (_I)k sgn (a ) -1 if a is an odd permutation , where k is the number of inversions of a. For example, when n = 3, the permutations (1, 2, 3), (2, 3, 1) and (3, 1, 2) are even, while the permutations (1, 3, 2), (2, 1, 3) and (3, 2, 1) are odd. (a(1), a(2), . . . , a(n») in Sn into In general, one can convert a permutation a the identity permutation (1, 2, .. . , n) by transposing each inversion of a . However, the number of necessary transpositions to convert the given permutation into the identity permutation need not be unique as shown in Example 2.3. An interesting fact is that, even though the number of necessary transpositions is not unique, the parity (even or odd) is always the same as that of the number of inversions. (This may not be clear and the readers are suggested to convince themselves with a couple of examples.)

=

We now go back to Step 3 to compute the determinants of those remaining n! matrices. Each of them has n entries of the form al 17 ( I ) , a2u(2), . . . , an17 (n)

2.2. Existence and uniqueness of the determinant

55

for a permutation 0' E Sn. Moreover, this can be converted into a diagonal matrix by column interchanges corresponding to the inversions in the permutation 0' = (O'(l), 0'(2), . . . , O'(n)}. Hence, its determinant is equal to sgn(0')alu(1)a2a(2) . . . anu(n)' This is called a signed elementary product of A. Our discussions can be summarized as follows to get an explicit formula for det A:

Theorem 2.6 For an n x n matrix A, det A =

L

sgn(0')al u(1)a2a(2) . . . anu(n) '

UES n

That is, det A is the sum ofall signed elementary products of A. This shows that the determinant must be unique if it exists. On the other hand, one can show that the explicit formula for det A in Theorem 2.6 satisfies the three rules (Rl)-(R3). Therefore, we have both existence and uniqueness for the determinant function of square matrices of any order n 2: 1, which proves Theorem 2.4. As the last part of this section , we add an example to demonstrate that any permutation 0' can be converted into the identity permutation by the same number of transpositions as the number of inversions in 0'.

Example 2.4 (Convert into the identity permutation by transpositions) Consider a permutation 0' = (3,1 ,5,4, 2) in Ss. It has five inversions, and it can be converted to the identity permutation by composing five transpositions successively: 0'

=

{3, 1,5,4, 2} -+ {I, 3, 5, 4, 2} -+ {I, 3, 5, 2, 4} -+ {I, 3, 2, 5, 4}

-+

{I, 2, 3, 5, 4} -+ {I, 2, 3, 4 , 5}.

It was done by moving the number 1 to the first position, and then 2 to the second position by transpositions, and so on. In fact, the bold faced numbers are interchanged in each one of five steps, and the five transpositions used to convert 0' into the identity permutation are shown below : 0'{2, 1,3,4, 5){1, 2, 3,5, 4}{1,2, 4, 3, 5}{1,3, 2, 4, 5}{1, 2, 3, 5, 4}= (I, 2, 3,4,5).

Here, two permutations are composed from right to left for notational convention : i.e ., if we denote T (2, 1,3,4,5), then O'T 0' 0 T. For example, O'T(2) 0'(1) 3. Also, note that 0' can be converted to the identity permutation by composing the following three transpositions successively:

=

=

0' (2,1,3,4, 5}{1 , 5,3,4, 2}{1,2, 5,4, 3)={1, 2, 3, 4, 5).

=

=

o

56

Chapter 2. Determinants

It is not hard to see that the number of even permutations is equal to that of odd In the case n = 3, one can notice that there are three terms permutations, so it is with + sign and three terms with - sign in det A.

if.

Problem 2.7 Show that the number of even permutations and the number of odd permutations in S« are equal.

=

Problem 2.8 Let A [CI cn] be an n x n matrix with the column vectors c/s. Show that cn] = (-1)j -1 detlc] ... Cj .. . cn]. Note that the same kind of det[c j CI .. . Cj_1 Cj+1 equality holds when A is written in row vectors.

2.3 Cofactor expansion Even ifone has found an explicit formula for the determinant as shown in Theorem 2.6, it is not much help in computing because one has to sum up n! terms, which becomes a very large number as n gets large. Thus, we reformulate the formula by rewriting it in an inductive way, by which the summing time can be reduced. The first factor ala (I) in each of the n! terms is one of al1, a12, ... ,al n in the first row of A. Hence, one can divide the n! terms of the expansion of det A into n groups according to the value of a(1): Say, det A

=

L

sgn(a)ala(l)a2a(2) ... ana(n)

«es, al1AII

+

al2 AI2

+ .. . +

alnAln,

where, for j = 1,2, ... , n, Al j is defined as Alj

=

sgn(a)a2a(2) . .. ana(n) '

L aESn,a(I)=j

This number A Ij will tum out to be the determinant, up to a ± sign, ofthe submatrix of A obtained by deleting the l-st row and j -th column. This submatrix will be denoted by Mlj and called the minor of the entry al j.

Remark: If we replace the entries all, a 12, . ..

, al n in the first row of A as unknown variables XI, X2 , • • . , X n , then det A is a polynomial of the variables Xi , and the number A I j becomes the coefficient of the variable X j in this polynomial.

We now aim to compute Alj for j Al1

=

L aESn,a(l)=1

= 1,2,

sgn(a)a2a(2)" ·ana(n)

.. . , n . Clearly, when j

= 1,

= Lsgn('r)a2T(2)"

. anT(n) ,

T

summing over all permutations r of the numbers 2, 3, . .. , n. Note that each term in A II contains no entries from the first row or from the first column of A. Hence, all

2.3. Cofactor expansion

57

the (n - I)! terms in the sum of All are just the signed elementary products of the submatrix Mll of A obtained by deleting the first row and the first column of A, so that All = det Mll . To compute the number Alj for j > 1, let A = [CI . . . cn] with the column vectors cj's and let B = [Cj CI . . . Cj-l Cj+1 ... cn] be the matrix obtained from A by interchanging the j-th column with its preceding j - 1 columns one by one up to the first. Then, det A = (-1 )j-I det B (see Problem 2.8). Write det B

= bllBll +

b12B\2 + . . . + bi« BIn

as the expansion of det B. Then, alj = bll and the number BII is the coefficient of the entry bll in the formula of det B. By noting A Ij is the coefficient of the entry alj in the formula of det A, one can have Alj = (-I)j-1 BII. Moreover, the minor Mlj of the entry ai] is the same as the minor NIl of the entry bll. Now, by applying the previous conclusion All = det Mll to the matrix B, one can obtain Bll = det Nll and then . I . I . I Alj = (-I)J- Bll = (-I)J- detNll = (-I)J - detMlj' In summary, one can get an expansion of det A with respect to the first row: detA =

=

allAll+a\2AI2+·· ·+alnAln all det Mll - a\2 det MI2 + .. . + (_l) I+na ln det MIn .

This is called the cofactor expansion of det A along the first row. There is a similar expansion with respect to any other row, say the i-th row. To show this, first construct a new matrix C from A by moving the i-th row of A up to the first row by interchanges with its preceding i - I rows one by one. Then det A = (_1)i-1 det C as before . Now, the expansion of det C with respect to the first row [ail .. . ain] is detC = ailCII +ai2C\2 + . .. +ainCln, where Clj =:. (-I)j-1 det Mlj and Mlj denotes the minor of Clj in the matrix C. Noting that Mlj = Mij as minors, we have Aij = (-I)i- IClj =

(-I/+ j detMij

as before and then det A =

ailAil + ai2Ai2 + .. . + ainAin.

The ~u~matrix Mij is called the minor of the entry aij and the number Aij ( _1)1 + J det Mij is called the cofactor of the entry aij. Also, one can do the same with the column vectors because det AT = det A. This gives the following theorem:

58

Chapter 2. Determinants

Theorem 2.7 Let A be an n x n matrix and let Aij be the cofactor of the entry aij' Then, (1) for each 1

s i s n, det A = ail Ail

+ aizAiz + . . . + ainAin,

called the cofactor expansion of det A along the i -th row. j n,

(2) For each 1

s s

det A = aljAlj

+ a2jAzj + . . . + anjAnj,

called the cofactor expansion of det A along the j -tb column.

This cofactor expansion gives an alternative way of defining the determinant inductively. Remark: The sign (-1 )i+ j of the cofactor can be explained as follows:

+ + (_l)n+1

(_l)l+n (_l)2+n (_1)3+n

+

+

+

(_l)n+Z (_l)n+3

(_l)n+n

Therefore, the determinant of an n x n matrix A is the sum of the products of the entries in any given row (or, any given column) with their cofactors. Example 2.5 (Computing det A by a cofactor expansion) Let

23] 5 6

.

8 9 Then the cofactors of all , al2 and al3 are

All = (_1)1+1 det [ ;

~]

5 ·9-8 ·6=-3,

AI2 = (-l)I+Zdet[

~ ~]

=

AI3 = (_1)1+3det[

~ ~]

= 4·8-7 ·5=-3,

(-1)(4·9-7 .6)=6,

respectively. Hence the expansion of det A along the first row is detA = allAII

+ alzAI2 + al3AI3

= 1 . (-3)

+ 2·6 + 3· (-3) =

O.

0

2.3. Cofactor expansion

59

The cofactor expansion formula of det A suggests that the evaluation of Aij can be avoided whenever aij = 0, because the product aij Aij is zero regardless of the value of Aij. Therefore, the computation of the determinant will be simplified by making the cofactor expansion along a row or a column that contains as many zero entries as possible. Moreover, by using the elementary row (or column) operations which do not alter the determinant, a matrix A may be simplified into another one having more zero entries in a row or in a column whose computation of the determinant may be simpler. For example, a forward elimination to a square matrix A will produce an upper triangular matrix U, and so the determinant of A will be just the product of the diagonal entries of U up to the sign caused by possible row interchanges. The next examples illustrate this method for an evaluation of the determinant.

Example 2.6 (Computing det A by aforward elimination and a cofactor expansion) Evaluate the determinant of

I

I -1

-3

A --

2

4

-1]

1 -1

2 -5 -3 -2 6-4

8

.

I

Solution: Apply the elementary operations: 3 x row 1 (-2) x row 1

2 x row 1

+

row 2,

+

row 3,

+

row 4

to A; then

det A = det

1 -1 1 0 -3

o [

o

4

2-1]

_~ ~~ o -1

=

det

[1 7-4] -3 -7 10 4 0-1

.

Now apply the operation : 1 x row 1 + row 2, to the matrix on the right-hand side, and take the cofactor expansion along the second column to get

7

-7

o

-4 ] 10

=

-1

=

det

[-~4 ~0

-1

(_1)1+2.7. det [

-7(2 - 24) = Thus, det A = 154.

-:]

-~

154.

o

60

Chapter 2. Determinants

Example 2.7 Show that det A = (x - y)(x - z)(x - w)(y - z)(y - w)(z - w) for the Vandermonde matrix of order 4:

Solution: Use Gaussian elimination. To begin with, add (-1) x row 1 to rows 2, 3, and 4 of A :

3 detA =

=

det

[ 01 y -x x

o o

z-x w-x

J

_ x' y3 - x 3 ] [ y-x y' det Z - x Z2 _ x 2 Z3 _ x 3 w-x

=

y2 x' _ x 2 y3 x_ x3 Z2 - x 2 Z3 _ x 3 w 2 _ x 2 w 3 _x 3

w 2 _x 2 w 3 _x 3

(y - x)(z - x)(w - x) det [ :

y+x z+x w+x

Z2 +xz +x 2 w 2 +xw +x 2

~

y+x z-y w-y

(z - y)(z + y + x) (w - y)(w + y + x)

(x - y)(x - Z)(w - x)det [

y' +xy +x' ]

y' +xy +x'

]

(x - y)(x - z)(w _ x) det [ Z - Y (z - y)(z + y + x) ] w-y (w-y)(w+y+x)

~

=

(x - y)(x - z)(x - w)(y - z)(w - y) det [

=

(x - y)(x - z)(x - w)(y - z)(y - w)(Z - w).

z+y+x] w+y+x

0

Problem 2.9 Use cofactor expansions along a row or a column to evaluate the determinants of the following matrices:

(1) A

=

[~ ~ ~ ~] 2 2 0 1

2 220

Problem 2.10 Evaluate the determinant of

'

2.4. Cramer's rule

(2) B

==

[

a 0 -a 0 -b-d -c

-e

be] c

d

o 1 -I 0

61

.

2.4 Cramer's rule In Chapter 1, we have studied two ways for solving a system of linear equations Ax = b: (i) by Gauss-Jordan elimination (or by LDU factorization) or (ii) by using A -I if A is invertible. In this section, we introduce another method for solving the system Ax = b for an invertible matrix A . The cofactor expansion of the determinant gives a method for computing the inverse of an invertible matrix A. For i f. j, let A * be the matrix A with the j-th row replaced by the i-th row. Then the determinant of A* must be zero, because the i-th and j-th rows are the same. Moreover, with respect to the j-th row, the cofactors of A* are the same as those of A : that is, Ajk = Ajk for all k = 1, . .. , n. Therefore, we have O=detA* = ailAjl+aizAjZ+ .. ·+ainAjn =

ailAjl

+

«ns n

+ ... +

ainAjn .

This proves the following lemma. Lemma 2.8

ifi=j ifif.j . Definition 2.6 Let A be an n x n matrix and let A ij denote the cofactor of aij. Then the new matrix

[A~I A~z

~~: ~~~ ::: ~~] .'::



is called the matrix of cofactors of A. Its transpose is called the adjugate of A and is denoted by adj A. It follows from Lemma 2.8 that detA

0 detA

~

~

o

A · adj A =

[

If A is invertible, then det A

~

] = (detA)/.

detA

f. 0 and we may write A (de~ A adj A) =

I , Thus

62

Chapter 2. Determinants

and

A = (det A) adj (A -I)

by replacing A with A -I. Example2.8 (Computing A-I with adj A) For a matrix A

[ -cd-b] a

, and if det A = ad - be

= [~ ~], adj A

=

f:. 0, then

A-I _ 1 [ d -ab ] . - ad -bc -c Problem 2.11 Compute adj A and A -I for A

=[ ;

i ~] .

2 -2 1

Problem 2.12 Show that A is invertible if and only if adj A is invertible, and that if A is invertible, then (adj A)-I

= -A- = adj(A- 1 ) . detA

Problem 2.13 Let A be an n x n invertible matrix with n > 1. Show that (1) det(adj A) (2) adj(adj A)

= (detA)n-l ; = (det A)n-2 A.

Problem 2.14 For invertible matrices A and B, show that

=

(1) adj AB adj B . adj A; (2) adj QAQ-I Q(adj A)Q-l for any invertible matrix Q; (3) if AB BA , then (adj A)B B(adj A) .

=

=

=

In fact, these three properties are satisfied for any (invertible or not) two square matrices A and B. (See Exercise 6.5.)

The next theorem establishes a formula for the solution of a system of n equations in n unknowns . It may not be useful as a practical method but can be used to study properties of the solution without solving the system. Theorem 2.9 (Cramer's rule) Let Ax = b be a system ofn linear equations in n unknowns such that det A f:. O. Then the system has the unique solution given by detCj detA '

x'--]-

j = 1,2, ... , n,

where C j is the matrix obtainedfrom A by replacing the j -th column with the column matrixb = [bl b2 .. . bnf.

2.4. Crame r 's rule

63

=

Proof: If det A =1= 0, then A is invertible and x A-I b is the unique solution of Ax = b. Since x = A-l b = _ I_ (adj A)b, detA it follows that det Cj detA

o

Exam ple 2.9 Use Cramer 's rule to solve

I

+ + +

Xl 2Xl Xl

Solution:

+ + +

2X2 2X2 2 X2

X3 = X3 = 3X3 =

A~ [ ; 22I] 2 3 I

n

2 [ 50 60 2 90 2

Cl =

C2~

Therefore,

det CI - = 10 det A '

[i

,

50 1] 60 I , 90 3

det C2 detA

Xl = -

50 60 90.

X2 = - - = 10,

C3 ~

[i

detC3 det A

2 50 ] 2 60 . 2 90

X3 = - - = 20.

o

Cramer's rule provides a convenient method for writing down the solution of a system of n linear equations in n unknowns in terms of determina nts. To find the solution, however, one must evaluate n + I determinants of order n, Evaluating even two of these determi nants generally involves more computations than solving the system by using Gauss-Jordan elimination. Problem 2.15 Use Cramer 's rule to solve the following systems. (I)

(2)

I

3Xl - 2x 1 2 x 4 x

4x2 4X2 5X2

+ +

3

+ + 5

- + y z 7 2 + -y + z 2

-

y

-

I

-

z

-2

3X3 5X3 2X3

6

1. 3 0 2.

Problem 2.16 Let A be the matrix obtained from the identity matrix In with i-th column replaced by the column vector x = [Xl .. . xn f. Compute det A.

64

Chapter 2. Determinants

2.5 Applications 2.5.1 Miscellaneous examples for determinants Example 2.10 Let A be the Vandermonde matrix of order n:

A=

1 Xl Xf ... X~_l] 1 X2 X22 . . . X2n-l . . . . '

..

.. . . "

[

1

Xn

x nn -

x n2

l

Its determinant can be computed by the same method as in Example 2.7 as follows:

TI

detA =

(Xj -Xi) .

l~i 1, then 0 Xl X2 Xn xl au al2 al n n n a2n = - LLAijXiXj . det x2 a21 a22 i=l j=l x n anI an2

ann

Solution: First take the cofactor expansion along the first row, and then compute the cofactor expansion along the first column of each n x n submatrix. 0 Example 2.12 Let f I. For the matrix

12 , ... , fl(X) f{(x)

[

fn be n real-valued differentiable functions on JR.

h(x) f 2(x )

f?-:I) (x) j 2(n-

l)( )

x

fn(x)

f~(x)

.. ·

]

f~n-:l\X)

,

its determinant is called the Wronskian for (fl (x), h(x) , . . . , fn(x)} , For example , det

[

X 1

sin X C?S X

o - sm x

cos X ] sin x - cos x -

= - x,

but det

sin X X + sin X ] 1 + C?SX 0 - sm x - sm x

[ X 1

C?SX

= o.

In general, in fl , 12, ..., fn , if one of them is a constant multiple of another or a sum of such multiples, then the Wronskian must be zero.

2.5.1. Application: Miscellaneousexamples for determinants

65

Example 2.13 An n x n matrix A is called a circulant matrix if the i-th row of A is obtained from the first row of A by a cyclic shift of the i - I steps, i.e., the general form of the circulant matrix is

A=

al a2 a3 an al a2 an-I an al

an an-I an-2

a2 a3 a4

al

where w = e2rri /3 is the primitive root of unity. In general, for n > 1,

n

n-I

detA =

(al +a2Wj +a3w7 +... +anwrl) ,

j =O

where W j = e2rrij In, j = 0, 1, ... , n - 1, are the roots of unity. (See Example 8.18 for the proof.)

Example 2.14 A tridiagonal matrix is a square matrix of the form

    Tn = [ a1  b1   0   ...   0     0     0   ]
         [ c1  a2   b2  ...   0     0     0   ]
         [ 0   c2   a3  ...   0     0     0   ]
         [ .    .    .         .     .     .  ]
         [ 0   0    0   ...  an−2  bn−2   0   ]
         [ 0   0    0   ...  cn−2  an−1  bn−1 ]
         [ 0   0    0   ...   0    cn−1   an  ].

The determinant of this matrix can be computed by a recurrence relation: Set D0 = 1 and Dk = det Tk for k ≥ 1. By expanding with respect to the k-th row, one can have a recurrence relation

    Dk = ak Dk−1 − bk−1 ck−1 Dk−2.

The following two special cases are interesting.

Case (1) Let all ai, bj, ck be the same, say ai = bj = ck = b > 0. Then, D1 = b, D2 = 0 and

    Dn = b Dn−1 − b² Dn−2    for n ≥ 3.

Successively, one can find D3 = −b³, D4 = −b⁴, D5 = 0, .... In general, the n-th term Dn of the recurrence relation is given by

    Dn = b^n [ cos(nπ/3) + (1/√3) sin(nπ/3) ].

Later in Section 6.3.1, it will be discussed how to find the n-th term of a given recurrence relation. (See Exercise 8.6.)
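A short script makes it easy to compare the recurrence of Case (1) with the closed form for Dn; the sketch below is our own illustration with an arbitrary value of b.

```python
import math

b = 2.0
D = [1.0, b]                  # D0 = 1, D1 = b
for n in range(2, 11):
    D.append(b * D[-1] - b**2 * D[-2])

for n, Dn in enumerate(D):
    closed = b**n * (math.cos(n * math.pi / 3) + math.sin(n * math.pi / 3) / math.sqrt(3))
    print(n, Dn, round(closed, 10))   # the recurrence and the closed form agree
```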

Case (2) Let all bj = 1 and all ck = −1, and let us write

    (a1 a2 ... an) = det [ a1   1    0   ...   0     0  ]
                         [ −1   a2   1   ...   0     0  ]
                         [  0  −1    a3  ...   0     0  ]
                         [  .   .    .          .     . ]
                         [  0   0    0   ...  an−1   1  ]
                         [  0   0    0   ...  −1     an ].

Then,

    (a1 a2 ... an) / (a2 a3 ... an) = a1 + 1/(a2 + 1/(a3 + ... + 1/an)).

Proof: Let us prove it by induction on n. Clearly, a1 + 1/a2 = (a1 a2)/(a2). It remains to show that

    a1 + 1 / [ (a2 a3 ... an)/(a3 a4 ... an) ] = (a1 a2 ... an) / (a2 a3 ... an),

i.e., a1 (a2 ... an) + (a3 ... an) = (a1 a2 ... an). But this identity follows from the previous recurrence relation, since (a1 a2 ... an) = (an ... a2 a1). □
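The continued-fraction identity of Case (2) can also be checked numerically; in the sketch below (our illustration with sample values), bracket(a) computes the determinant (a1 a2 ... an) and is compared with the continued fraction.

```python
import numpy as np

def bracket(a):
    """(a_1 a_2 ... a_n): determinant of the tridiagonal matrix with diagonal a,
    1's just above the diagonal and -1's just below it."""
    n = len(a)
    M = np.diag(a) + np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
    return np.linalg.det(M)

def continued_fraction(a):
    value = a[-1]
    for ai in reversed(a[:-1]):
        value = ai + 1.0 / value
    return value

a = [2.0, 3.0, 1.0, 4.0]
print(bracket(a) / bracket(a[1:]))   # ratio of determinants
print(continued_fraction(a))         # same value
```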

Example 2.15 (Binet-Cauchy formula) Let A and B be matrices of size n × m and m × n, respectively, and n ≤ m. Then

    det(AB) = ∑_{1≤k1<k2<...<kn≤m} A_{k1...kn} B_{k1...kn},

where A_{k1...kn} is the minor obtained from the columns of A whose numbers are k1, ..., kn and B_{k1...kn} is the minor obtained from the rows of B whose numbers are k1, ..., kn. In other words, det(AB) is the sum of products of the corresponding majors of A and B, where a major of a matrix is, by definition, a determinant of a maximal order minor in the matrix.

Proof: Let C = AB, cij = ∑_{k=1}^{m} aik bkj. Then

    det C = ∑_{k1,...,kn=1}^{m} a_{1k1} ... a_{nkn} ...

(1) ⇒ (4): Let e1, e2, ..., em be the standard basis for ℝ^m. Then for each ei, one can find an xi ∈ ℝ^n such that Axi = ei by the hypothesis (1). If B is the n × m matrix whose columns are these xi's, i.e., B = [x1 x2 ... xm], then, by matrix multiplication, AB = [Ax1 Ax2 ... Axm] = [e1 e2 ... em] = Im.

(4) ⇒ (1): If B is a right inverse of A, then for any b ∈ ℝ^m, x = Bb is a solution of Ax = b. □

Condition (2) means that A has m linearly independent column vectors, and condition (3) implies that there exist m linearly independent row vectors of A, since rank A = m = dim R(A). Note that if C(A) ≠ ℝ^m, then Ax = b has no solution for b ∉ C(A).

Theorem 3.25 (Uniqueness) Let A be an m × n matrix. Then the following statements are equivalent.
(1) For each b ∈ ℝ^m, Ax = b has at most one solution x in ℝ^n.
(2) The column vectors of A are linearly independent.
(3) dim C(A) = rank A = n (hence n ≤ m).
(4) R(A) = ℝ^n.
(5) N(A) = {0}.
(6) A has a left inverse (i.e., C such that CA = In).

Proof: (1) ⇒ (2): Note that the column vectors of A are linearly independent if and only if the homogeneous equation Ax = 0 has only the trivial solution. However, Ax = 0 always has the trivial solution x = 0, and the statement (1) implies that it is the only one.
(2) ⟺ (3): Clear, because all the column vectors are linearly independent if and only if they form a basis for C(A), or dim C(A) = n ≤ m.
(3) ⟺ (4): Clear, because dim R(A) = rank A = dim C(A) = n if and only if R(A) = ℝ^n (see Problem 3.10).
(4) ⟺ (5): Clear, since dim R(A) + dim N(A) = n.
(2) ⇒ (6): Suppose that the columns of A are linearly independent so that rank A = n. Extend these column vectors of A to a basis for ℝ^m by adding m − n additional independent vectors to them. Construct an m × m matrix S with those basis vectors in its columns. Then the matrix S has rank m, and hence it is invertible. Let C be the n × m matrix obtained from S⁻¹ by throwing away the last m − n rows. Since the first n columns of S constitute the matrix A, we have CA = In.
(6) ⇒ (1): Let C be a left inverse of A. If Ax = b has no solution, then we are done. Suppose that Ax = b has two solutions, say x1 and x2. Then

    x1 = CAx1 = Cb = CAx2 = x2.

Hence, the system can have at most one solution. □

Remark: (1) We have proved that an m × n matrix A has a right inverse if and only if rank A = m, while A has a left inverse if and only if rank A = n. Therefore, if m ≠ n, A cannot have both left and right inverses.
(2) For a practical way of finding a right or a left inverse of an m × n matrix A, we will show later (see Remark (1) below Theorem 5.26) that if rank A = m, then (AA^T)⁻¹ exists and A^T(AA^T)⁻¹ is a right inverse of A, and if rank A = n, then (A^T A)⁻¹ exists and (A^T A)⁻¹A^T is a left inverse of A (see Theorem 5.26).
(3) Note that if m = n so that A is a square matrix, then A has a right inverse (and a left inverse) if and only if rank A = m = n. Moreover, in this case the inverses are the same (see Theorem 1.9). Therefore, a square matrix A has rank n if and only if A is invertible. This means that for a square matrix "Existence ⟺ Uniqueness," and the ten statements listed in Theorems 3.24-3.25 are all equivalent. In particular, for the invertibility of a square matrix it is enough to show the existence of a one-sided inverse.
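The formulas in Remark (2) are easy to try out numerically. The following sketch is our own illustration with a small sample matrix; it computes a right inverse A^T(AA^T)⁻¹ and a left inverse (A^T A)⁻¹A^T with NumPy.

```python
import numpy as np

# A 2 x 3 matrix of rank 2: it has a right inverse but no left inverse
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])

R = A.T @ np.linalg.inv(A @ A.T)      # right inverse: A R = I_2
print(np.round(A @ R, 10))

# Its transpose is 3 x 2 of rank 2: it has a left inverse but no right inverse
B = A.T
L = np.linalg.inv(B.T @ B) @ B.T      # left inverse: L B = I_2
print(np.round(L @ B, 10))
```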

Problem 3.24 For each of the following matrices, find all vectors b such that the system of linear equations Ax = b has at least one solution . Also, discuss the uniqueness of the solution.

(1) A

(3) A

~

[J

~

[l

3 -2 5 4 1 3 7 - 3 6 13

~l

(2) A

-3]

2 -3 -2 0 -4 3 -2 8 -7 -2 _~; 1 -9 -10

, (4) A

~[ =

;

-;

-6

[

l ;l

1

1 1 ~ 5

2 -2

Summarizing all the results obtained so far about solvability of a system, one can obtain several characterizations of the invertibility of a square matrix. The following theorem is a collection of the results proved in Theorems 1.9,3.24, and 3.25.

Theorem 3.26 For a square matrix A of order n, the following statements are equivalent.
(1) A is invertible.
(2) det A ≠ 0.
(3) A is row equivalent to In.
(4) A is a product of elementary matrices.
(5) Elimination can be completed: PA = LDU, with all di ≠ 0.
(6) Ax = b has a solution for every b ∈ ℝ^n.
(7) Ax = 0 has only a trivial solution, i.e., N(A) = {0}.
(8) The columns of A are linearly independent.
(9) The columns of A span ℝ^n, i.e., C(A) = ℝ^n.
(10) A has a left inverse.
(11) rank A = n.
(12) The rows of A are linearly independent.
(13) The rows of A span ℝ^n, i.e., R(A) = ℝ^n.
(14) A has a right inverse.
(15)* The linear transformation A : ℝ^n → ℝ^n via A(x) = Ax is injective.
(16)* The linear transformation A : ℝ^n → ℝ^n is surjective.
(17)* Zero is not an eigenvalue of A.

Proof: Exercise; where have we proved which claim? Prove any not covered. The numbers with asterisks will be explained in the following places: (15) and (16) in the Remark on page 133 and (17) in Theorem 6.1. □

3.9 Applications

3.9.1 Interpolation

In many scientific experiments, a scientist wants to find the precise functional relationship between input data and output data. That is, in his experiment, he puts various input values into his experimental device and obtains output values corresponding to those input values. After his experiment, what he has is a table of inputs and outputs. The precise functional relationship might be very complicated, and sometimes it might be very hard or almost impossible to find the precise function. In this case, one thing he can do is to find a polynomial whose graph passes through each of the data points and comes very close to the function he wanted to find. That is, he is looking for a polynomial that approximates the precise function. Such a polynomial is called an interpolating polynomial. This problem is closely related to systems of linear equations.

Let us begin with a set of given data: Suppose that for n + 1 distinct experimental input values x0, x1, ..., xn, we obtained n + 1 output values y0 = f(x0), y1 = f(x1), ..., yn = f(xn). The output values are supposed to be related to the inputs by a certain (unknown) function f. We wish to construct a polynomial p(x) of degree less than or equal to n which interpolates f(x) at x0, x1, ..., xn: i.e., p(xi) = yi = f(xi) for i = 0, 1, ..., n. Note that if there is such a polynomial, it must be unique. Indeed, if q(x) is another such polynomial, then h(x) = p(x) − q(x) is also a polynomial of degree less than or equal to n vanishing at n + 1 distinct points x0, x1, ..., xn. Hence h(x) must be the identically zero polynomial so that p(x) = q(x) for all x ∈ ℝ. To find such a polynomial p(x), let

    p(x) = a0 + a1 x + a2 x² + ... + an x^n

with n + 1 unknowns ai's. Then,

    p(xi) = a0 + a1 xi + ... + an xi^n = yi = f(xi),

for i = 0, 1, ..., n. In matrix notation,

    [ 1  x0  ...  x0^n ] [ a0 ]     [ y0 ]
    [ 1  x1  ...  x1^n ] [ a1 ]  =  [ y1 ]
    [ .   .         .  ] [  . ]     [  . ]
    [ 1  xn  ...  xn^n ] [ an ]     [ yn ].

The coefficient matrix A is a square matrix of order n + 1, known as Vandermonde's matrix (see Example 2.10), whose determinant is

    det A = ∏_{0≤i<j≤n} (xj − xi).
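In practice the interpolating polynomial can be found by solving this Vandermonde system directly; the sketch below is our own illustration, with sample data points lying on y = x² + 1.

```python
import numpy as np

# Data points (x_i, y_i); the x_i must be distinct
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 5.0, 10.0])

V = np.vander(xs, increasing=True)   # rows (1, x_i, x_i^2, x_i^3)
a = np.linalg.solve(V, ys)           # coefficients a_0, ..., a_n of p(x)

print(a)          # [1, 0, 1, 0]: the interpolating polynomial is p(x) = 1 + x^2
print(V @ a)      # reproduces ys
```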

It will be shown that the mapping Φ : V → V** defined by Φ(x) = x̂ is an isomorphism and it is not dependent on the choice of basis for V.

Theorem 4.20 The mapping Φ : V → V** defined by Φ(x) = x̂ is an isomorphism from V to V**.

Proof: To show the linearity of Φ, let x, y ∈ V and k a scalar. Then, for any f ∈ V*,

    Φ(x + ky)(f) = f(x + ky) = f(x) + kf(y) = x̂(f) + kŷ(f) = (Φ(x) + kΦ(y))(f).

Hence, Φ(x + ky) = Φ(x) + kΦ(y). To show that Φ is injective, suppose x ∈ Ker(Φ). Then Φ(x) = x̂ = 0 in V**, i.e., x̂(f) = 0 for all f ∈ V*. It implies that x = 0: In fact, if x ≠ 0, one can choose a basis α = {v1, v2, ..., vn} for V such that v1 = x. Let α* = {v1*, v2*, ..., vn*} be the dual basis of α. Then

    0 = x̂(v1*) = v1*(x) = v1*(v1) = 1,

which is a contradiction. Thus, x = 0 and Ker(Φ) = {0}. Since dim V = dim V**, Φ is an isomorphism. □

Problem 4.21 Let V = ℝ³ and define fi ∈ V* as follows:

    f1(x, y, z) = x − 2y,    f2(x, y, z) = x + y + z,    f3(x, y, z) = y − 3z.

Prove that {f1, f2, f3} is a basis for V*, and then find a basis for V for which it is the dual.

We now consider the matrix representation of the transpose S* : W* → V* of a linear transformation S : V → W. Let α = {v1, v2, ..., vn} and β = {w1, w2, ..., wm} be bases for V and W with their dual bases α* = {v1*, v2*, ..., vn*} and β* = {w1*, w2*, ..., wm*}, respectively.

Theorem 4.21 The matrix representation of the transpose S* : W* → V* is the transpose of the matrix representation of S : V → W, that is, [S*]^{α*}_{β*} = ([S]^{β}_{α})^T.

Proof: Let S(vi) = ∑_{k=1}^{m} a_{ki} w_k, so that [S]^{β}_{α} = [a_{ki}]. Then

    [S*]^{α*}_{β*} = [ [S*(w1*)]_{α*}  ...  [S*(wm*)]_{α*} ].

Note that, for 1 ≤ j ≤ m,

    S*(wj*) = ∑_{i=1}^{n} S*(wj*)(vi) vi* = ∑_{i=1}^{n} a_{ji} vi*,

since

    S*(wj*)(vi) = (wj* ∘ S)(vi) = wj*(S(vi)) = wj*( ∑_{k=1}^{m} a_{ki} w_k ) = ∑_{k=1}^{m} a_{ki} wj*(w_k) = a_{ji}.

Hence, we get [S*]^{α*}_{β*} = ([S]^{β}_{α})^T. □

Example 4.27 (The transpose A^T is the adjoint transformation of A) Let A : ℝ^n → ℝ^m be a linear transformation defined by an m × n matrix A. Let α and β be the standard bases for ℝ^n and ℝ^m, respectively. Then [A]^{β}_{α} = A. By Theorem 4.21, we have [A*]^{α*}_{β*} = ([A]^{β}_{α})^T. Thus, with the identification (ℝ^k)* = ℝ^k via α* = α and β* = β as in Example 4.25, we have [A*]^{α*}_{β*} = A* and A* = A^T. In this sense, we see that the transpose A^T is the adjoint transformation of A. □

As the final part of the section, we consider the dual space of a subspace. Let V be a vector space of dimension n, and let U be a subspace of V of dimension k. Then U* = {T : U → ℝ : T is linear on U} is not a subspace of V*. However, one can extend each T ∈ U* to a linear functional on V as follows. Choose a basis α = {u1, u2, ..., uk} for U. Then by definition α* = {u1*, u2*, ..., uk*} is its dual basis for U*. Now extend α to a basis β = {u1, u2, ..., uk, uk+1, ..., un} for V. For each T ∈ U*, let T̄ : V → ℝ be the linear functional on V defined by

    T̄(ui) = T(ui)  if i ≤ k,    T̄(ui) = 0  if k + 1 ≤ i ≤ n.

Then clearly T̄ ∈ V* and the restriction T̄|U of T̄ on U is simply T: i.e., T̄|U = T ∈ U*. It is easy to see that the extension of T + kS is T̄ + kS̄. In particular, it is also easy to see that {ū1*, ū2*, ..., ūk*} is linearly independent in V* and ūi*|U = ui* ∈ U*, i = 1, 2, ..., k. Therefore, one obtains a one-to-one linear transformation

    φ : U* → V*

given by φ(T) = T̄ for all T ∈ U*. The image φ(U*) is now a subspace of V*. By identifying U* with the image φ(U*), one can say U* is a subspace of V*.

Problem 4.22 Let U and W be subspaces of a vector space V. Show that U ⊆ W if and only if W* ⊆ U*.


Let S be an arbitrary subset of V, and let ⟨S⟩ denote the subspace of V spanned by the vectors in S. Let S⊥ = {f ∈ V* : f(x) = 0 for any x ∈ S}. Then it is easy to show that S⊥ is a subspace of V*, S⊥ = ⟨S⟩⊥, and dim⟨S⟩ + dim S⊥ = n. Let R be a subset of V*. Then R⊥ = {x ∈ V : f(x) = 0 for any f ∈ R} is again a subspace of V such that R⊥ = ⟨R⟩⊥ and dim R⊥ + dim⟨R⟩ = n.

Problem 4.23 For subspaces U and W of a vector space V, show that (1) (U + W)⊥ = U⊥ ∩ W⊥, (2) (U ∩ W)⊥ = U⊥ + W⊥.

4.7.2 Computer graphics

One of the simple applications of a linear transformation is to animation or graphical display of pictures on a computer screen. For a simple display of the idea, let us consider a picture in the 2-plane ℝ². Note that a picture or an image on a screen usually consists of a number of points, lines or curves connecting some of them, and information about how to fill the regions bounded by the lines and curves. Assuming that the computer has information about how to connect the points and curves, a figure can be defined by a list of points. For example, consider the capital letters 'LA' as in Figure 4.8.

Figure 4.8. Letter L.A. on a screen

They can be represented by a matrix with coordinates of the vertices. For example, the coordinates of the 6 vertices of 'L' form a matrix:

    vertices          1    2    3    4    5    6
    x-coordinate  [ 0.0  0.0  0.5  0.5  2.0  2.0 ]  = A.
    y-coordinate  [ 0.0  2.0  2.0  0.5  0.5  0.0 ]

Of course, we assume that the computer knows which vertices are connected to which by lines via some algorithm. We know that line segments are transformed to other line segments by a matrix, considered as a linear transformation. Thus, by multiplying A by a matrix, the vertices are transformed to the other set of vertices, and the line segments connecting the vertices are preserved. For example, the matrix

    B = [ 1  0.25 ]
        [ 0  1    ]

transforms the matrix A to the following form, which represents new coordinates of the vertices:

    vertices       1    2    3      4      5    6
    BA = [ 0.0  0.5  1.0  0.625  2.125  2.0 ]
         [ 0.0  2.0  2.0  0.5    0.5    0.0 ].

Now, the computer connects these vertices properly by lines according to the given algorithm and displays on the screen the changed figure as the left-hand side of Figure 4.9. The multiplication of the matrix

    C = [ 0.5  0 ]
        [ 0    1 ]

to BA shrinks the width of BA by half, producing the right-hand side of Figure 4.9.

Figure 4.9. Tilting and Shrinking

Thus, changes in the shape of a figure may be obtained by compositions of appropriate linear transformations. Now, it is suggested that the readers try to find various matrices such as reflections, rotations, or any other linear transformations, and multiply A by them to see how the shape of the figure changes.
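The effect of B and C on the letter 'L' can be reproduced with a few lines of code; the sketch below is our own illustration and simply multiplies the vertex matrix A by B and by CB with NumPy.

```python
import numpy as np

# Columns are the 6 vertices of the letter 'L'
A = np.array([[0.0, 0.0, 0.5, 0.5, 2.0, 2.0],
              [0.0, 2.0, 2.0, 0.5, 0.5, 0.0]])

B = np.array([[1.0, 0.25],     # tilting (shear)
              [0.0, 1.0 ]])
C = np.array([[0.5, 0.0],      # shrink the width by half
              [0.0, 1.0]])

print(B @ A)         # tilted vertices, as listed in the text
print(C @ (B @ A))   # tilted and then shrunk: (CB)A
```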

Problem 4.24 For the given matrices A and B = [[1, 0.25], [0, 1]] above, by the matrix B^T A instead of BA, what kind of figure can you have?

Remark: Incidentally, one can see that the composition of a rotation by π followed by a reflection about the x-axis is the same as the composition of the reflection followed by the rotation (see Figure 4.10). In general, a rotation and a reflection are not commutative, neither are a reflection and another reflection.

Figure 4.10. Commutativity of a rotation and a reflection

The above argument generally applies to a figure in any dimension. For instance, a 3 × 3 matrix may be used to convert a figure in ℝ³ since each point has three components.

Example 4.28 (Classifying all rotations in ℝ³) It is easy to see that the matrices

    R(x,α) = [ 1     0        0     ]
             [ 0   cos α   −sin α   ],
             [ 0   sin α    cos α   ]

    R(y,β) = [  cos β   0   sin β ]
             [    0     1     0   ],
             [ −sin β   0   cos β ]

    R(z,γ) = [ cos γ   −sin γ   0 ]
             [ sin γ    cos γ   0 ]
             [   0        0     1 ]

are the rotations about the x-, y-, z-axes by the angles α, β and γ, respectively.

In general, the matrix that rotates ℝ³ with respect to a given axis appears frequently in many applications. One can easily express such a general rotation as a composition of basic rotations such as R(x,α), R(y,β) and R(z,γ): First, note that by choosing a unit vector u in ℝ³, one can determine an axis for a rotation by taking a line passing through u and the origin O. (In fact, vectors u and −u determine the same line.) Let u = (cos α cos β, cos α sin β, sin α), −π/2 ≤ α ≤ π/2, 0 ≤ β ≤ 2π, in the spherical coordinates. To find the matrix R(u,θ) of the rotation about the u-axis by θ, we first rotate the u-axis about the z-axis into the xz-plane by R(z,−β) and then into the x-axis by a rotation about the y-axis. Then the rotation about the u-axis is the same as the rotation about the x-axis followed by the inverses of the above rotations, i.e., take the rotation R(x,θ) about the x-axis, and then get back to the rotation about the u-axis by undoing the two preliminary rotations. In summary, R(u,θ) is the composition of these five basic rotations.

Figure 4.11. A rotation about the u-axis
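The composition of basic rotations described above can be tested numerically. The sketch below is our own illustration; note that sign conventions for the rotation about the y-axis vary between texts, and with the convention used here the inner and outer y-rotations carry the angles +α and −α. The check confirms that the composed matrix fixes the axis u and is an orthogonal matrix of determinant 1.

```python
import numpy as np

def Rx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

alpha, beta, theta = 0.4, 1.1, np.pi / 3       # spherical angles of u and the rotation angle
u = np.array([np.cos(alpha) * np.cos(beta),
              np.cos(alpha) * np.sin(beta),
              np.sin(alpha)])

# Bring u to the x-axis, rotate about the x-axis, then undo the preliminary rotations
R = Rz(beta) @ Ry(-alpha) @ Rx(theta) @ Ry(alpha) @ Rz(-beta)

print(np.round(R @ u - u, 10))      # the axis u is fixed: all zeros
print(np.round(R @ R.T, 10))        # R is orthogonal
print(round(np.linalg.det(R), 10))  # det R = 1
```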

Problem 4.25 Find the matrix R(u, π/4) for the rotation about the line determined by u = (1, 1, 1)^T by π/4.

So far, we have seen rotations, reflections, tilting (or, say, shear) or scaling (shrinking or enlargement) or their compositions as linear transformations on the plane ℝ² or the space ℝ³ for computer graphics. However, another indispensable transformation for computer graphics is a translation: A translation is by definition a transformation T : ℝⁿ → ℝⁿ defined by T(x) = x + x0 for any x ∈ ℝⁿ, where x0 is a fixed vector in ℝⁿ. Unfortunately, a translation is not linear if x0 ≠ 0 and hence it cannot be represented by a matrix. To escape this disadvantage, we introduce a new coordinate system, called a homogeneous coordinate. For brevity, we will consider only the 3-space ℝ³. A point x = (x, y, z) ∈ ℝ³ in the rectangular coordinate can be viewed as the set of vectors (hx, hy, hz, h), h ≠ 0, in the 4-space ℝ⁴ in a homogeneous coordinate. Most of the time, we use (x, y, z, 1) as a representative of this set. Conversely, a point (hx, hy, hz, h), h ≠ 0, in the 4-space ℝ⁴ in a homogeneous coordinate corresponds to the point (x/h, y/h, z/h) ∈ ℝ³ in the rectangular coordinate. Now, it is possible to represent all of our transformations including translations as 4 × 4 matrices by using the homogeneous coordinate, and it will be shown case by case as follows.

(1) Translations: A translation T : ℝ³ → ℝ³ defined by T(x) = x + x0, where x0 = (x0, y0, z0), can be represented by a matrix multiplication in the homogeneous coordinate as

    [ 1 0 0 x0 ] [ x ]     [ x + x0 ]
    [ 0 1 0 y0 ] [ y ]  =  [ y + y0 ]
    [ 0 0 1 z0 ] [ z ]     [ z + z0 ]
    [ 0 0 0 1  ] [ 1 ]     [   1    ].

(2) Rotations: With the notations in Example 4.28,

    R(x,α) = [ 1     0        0     0 ]
             [ 0   cos α   −sin α   0 ],
             [ 0   sin α    cos α   0 ]
             [ 0     0        0     1 ]

    R(y,β) = [  cos β   0   sin β   0 ]
             [    0     1     0     0 ],
             [ −sin β   0   cos β   0 ]
             [    0     0     0     1 ]

    R(z,γ) = [ cos γ   −sin γ   0   0 ]
             [ sin γ    cos γ   0   0 ]
             [   0        0     1   0 ]
             [   0        0     0   1 ]

are the rotations about the x-, y-, z-axes by the angles α, β and γ in the homogeneous coordinate, respectively.

(3) Reflections: An xy-reflection (the reflection in the xy-plane) is represented by a matrix multiplication in the homogeneous coordinate as

    [ 1  0   0  0 ]
    [ 0  1   0  0 ]
    [ 0  0  −1  0 ]
    [ 0  0   0  1 ].

Similarly, one can have an xz-reflection and a yz-reflection.

(4) Shear: An xy-shear can likewise be represented by a matrix multiplication in the homogeneous coordinate; similarly, one can have an xz-shear and a yz-shear.

(5) Scaling: A scaling by factors sx, sy, sz along the coordinate axes is represented by a matrix multiplication in the homogeneous coordinate as

    [ sx  0   0   0 ]
    [ 0   sy  0   0 ]
    [ 0   0   sz  0 ]
    [ 0   0   0   1 ].

In summary, all of the transformations can be represented by matrix multiplications in the homogeneous coordinate and also their compositions can be done by their corresponding matrix multiplications.
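As a small illustration of the homogeneous coordinate (our own sketch, not from the text), the code below builds 4 × 4 translation, rotation and scaling matrices and composes them by a single matrix multiplication.

```python
import numpy as np

def translation(x0, y0, z0):
    T = np.eye(4)
    T[:3, 3] = [x0, y0, z0]
    return T

def rotation_z(gamma):
    c, s = np.cos(gamma), np.sin(gamma)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

p = np.array([1.0, 2.0, 3.0, 1.0])    # the point (1, 2, 3) in homogeneous coordinates

# Scale, then rotate about the z-axis, then translate, all in one matrix product
M = translation(5, 0, -1) @ rotation_z(np.pi / 2) @ scaling(2, 2, 2)
print(np.round(M @ p, 10))            # [1, 2, 5, 1], i.e., the point (1, 2, 5)
```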

4.8 Exercises 4.1. Which of the following functions T are linear transformations?

i,

(1) T(x, y) (x 2 _ x 2 + y2) . (2) T(x, y, z) = (x + y, 0, 2x + 4z). (sin x , y) . (3) T(x , y) (4) T(x, y) (x + I, 2y, x + y ) . (5) T(x, y, z) (]»], 0) .

=

= = =

4.2. Let T : P2(lR) ~ P3(lR) be a linear transformation such that T(l) = 1, T(x) = x 2 and T(x 2 ) = x 3 + x . Find T(ax 2 + bx + c) . 4.3. Find SoT and/or T

0

S whenever it is defined.

4.8. Exercises (1) T(x , y,z) (2) T(x, y )

4.4. Let S : C(R)

= (x -

= (x ,

--+

y+ z, x +z), S(x, y )

= (x,

3y +x, 2x - 4y , y) , S(x, y, z)

153

x - y , y) ;

= (2x,

y ).

C(]R) be the function on the vector space C(]R)defined by, for f E C(]R), S(f )(x ) = f(x) _

~x uf (u)du .

Show that S is a linear transformation on the vector space C(]R).

4.5. Let T be a linear transformation on a vector space V such that T 2 = id and T U = (v E V : T(v) = v} and W = (v E V : T(v ) = -v} . Show that (1) at least one of U and W is a nonzero subspace of V; (2) U

(3) V

nW

f= id. Let

= {O};

=U +

W. 4.6. If T : ]R3 --+ ]R3 is defined by T(x , y, z) = (2x - z, 3x - 2y , x - 2y (1) determine the null space N (T) of T,

+

z),

(2) determine whether T is one-to-one, (3) find a basis for N(T) . 4.7. Show that each of the following linear transformations T on ]R3 is invertible, and find a formula for T- 1: (1) T (x,y ,z) = (3x , x- y, 2x+y + z). (2) T(x, y, z)

= (2x , 4x -

y, 2x

+

3y - z) .

4.8. Let S, T : V --+ V be linear transformations on a vector space V. (1) Show that if T 0 S is one-to-one , then T is an isomorphism. (2) Show that if T

0

S is onto, then T is an isomorphism.

(3) Show that if T k is an isomorphism for some positive k, then T is an isomorphism. 4.9. Let T be a linear transformation from ]R3 to ]R2, and let S be a linear transformation from ]R2 to ]R3 . Prove that the compo sition SoT is not invertible. 4.10. Let T be a linear transformation on a vector space V satisfying T - T 2 = id . Show that T is invertible.

4.11. Let A be an n x n matrix, which is a linear transformat ion on the n-space R" by the matrix multiplication Ax for any x E R", Suppose that rj , r2, . . . , r n are linearly independent vectors in ]Rn constituting a parallelepiped (see Remark (2) on page 70). Then A transforms this parallelepiped into another parallelepiped determined by Arl, Ar2, . . . , Arn . Suppose that we denote the n x n matrix whose j -th column is r j by B , and the n x n matrix whose j -th column is Ar j by C . Prove that vol(P(C»

= I det

AI vol(P(B» .

(This means that, for a square matrix A considered as a linear transformation, the absolute value of the determinant of A is the ratio between the volumes of a parallelepiped P(B) and its image parallelepiped P(C) under the transformation by A.1f det A = 0, then the image P(C) is a parallelepiped in a subspace of dimension less than n). 4.12 . Let T : ]R3 --+ ]R3 be the linear transformation given by T (x, y ,z)

= (x +

y,y + z ,x + z ).

Let C denote the unit cube in ]R3 determined by the standard basis ej , e2, e3. Find the volume of the image parallelepiped T (C) of C under T.

154

Chapter 4. Linear Transformations

4.13. With respect to the ordered basis a = {I, x, x 2 } for the vector space P2(lR), find the coordinate vector of the following polynomials: (1) f(x)

=x2 - x +

I,

= x 2 + 4x -

(2) f(x)

I,

(3) f(x)

= 2x + 5.

4.14. Let T : P3 (R) ~ P3 (R) be the linear transformation defined by Tf(x)

= f"(x) -

4f'(x)

+

f(x).

= {x, 1 + x, x + x 2 ,

x 3 }. 2 4.15. Let T be the linear transformation on lR defined by T(x, y) (-y, x). Find the matrix [Tl a for the basis a

=

(1) What is the matrix of T with respect to an ordered basis a (1, 2), V2 = (1, -I)?

= {VI, V2}, where vI =

(2) Show that for every real number c the linear transformation T - c id is invertible.

4.16. Find the matrix representation of each of the following linear transformations T on lR2 with respect to the standard basis {el, e2}. (1) T(x , y) (2y, 3x - y) . (2) T(x, y)

4.17. Let M

= = (3x -

= [~

4y, x

i ~l

+

5y).

(1) Find the unique linear transformation T : lR3 ~ lR2 so that M is the associated matrix of T with respect to the bases

"1

~

!Ul [il UJ I·

(2) Find T(x, y, z).

"2

~

([

n[: ]J

4.18. Find the matrix representation of each of the following linear transformations T on P2 (lR) with respect to the basis {I, x , x 2 }. (1) T: p(x)

~

p(x

+

1).

(2) T: p(x) ~ p' (x) .

(3) T: p(x)

~

p(O)x .

(4) T : p(x) ~ p(x) - p(O).

x

= {el , e2, ej] the standard basis and I , 1), U2 = (1, 1, 0), U3 = (1,0, O)}.

4.19. Consider the following ordered bases of lR3: a

fJ

= {UI = (I,

(1) Find the basis-change matrix P from a to fJ . (2) Find the basis-change matrix Q from (3) Verify that Q

= P- l .

fJ to a.

(4) Show that [vlfJ = P[vl er for any vector V E lR3. (5) Show that [T]fJ = Q-I [Tl a Q for the linear transformation T defined by T(x,y,z)=(2y+x, x-4y, 3x). 4.20 . There are no matrices A and B in Mnxn(lR) such that AB - BA

= In.

4.8. Exercises

155

4.21. Let T : ]R3 -+ ]RZ be the linear transformation defined by

T (x , y, z)

= (3x + 2y -

4z , x - 5y + 3z),

=

and let a {(l , I , 1), (1,1, D), (I, D, D) } and d and ]RZ, respect ively. (l) Find the associated matrix (2) Verify [T]~[v]a

= {(2, 3) ,

3), (2, 5)} bebasesfor]R3

[T]~ for T .

= [T (v)]/l for any v E ]R3.

4.22 . Find the basis-change matrix (l) a

= {(l,

[id]~ from a to .8, when

(D , I )}, .8 = {(6, 4) , (4,8)};

(2) a = {(5, 1), (I ,2) }, .8 = {(l , D), (D, I )};

= {(l, 1, 1), (l, I , D), (I, D, D)} , .8 = {(2, D, 3), (- 1, 4, I) , (3, 2, 5)}; (4) a={t , 1, t Z}, .8 = {3 +2t + t Z , t Z-4 , 2 +t}.

(3) a

4.23. Show that all matri ces of the form Ae = [ 4.24. Show that the matrix A =

4.25. Are the matrices

[~

c~s BB Sin

sin BB ] are similar. - cos

[~ ~] cannot be similar to a diagonal matrix.

i ~]

~ ~ ~]

and [ similar? I DID D 3 4.26. For a linear transformation T on a vector space V, show that T is one-to -one if and only if its tran spose T * is one-to-on e.

4.27. Let T : ]R3

-+ ]R3 be the linear transformation defined by

T (x, y , z)= (2y+ z , -x+4y+ z , x +z) . Compute [Tl a and [T*la* for the standard basis a = [e j , eZ, ej} , 4.28. Let T be the linea r transformation from ]R3 into ]Rz defined by T (XI, X2 , X3) = (XI +x2, 2X3 - XI ) ·

(l) For the standard ordered bases a and .8 for ]R3 and ]Rz respectivel y, find the associated matrix for T with respect to the bases a and .8.

(2) Let a = {XI, XZ , x3} and .8 = {YI , yz}, where XI = (l, D, -I), xz = (1, 1, 1), X3 = (1 , D, D), and YI = (D, I), yz = (I , D). Find the associated matrices [T]~ and

[T * l~: .

4.29 . Let T be the linear transformation from ]R3 to ]R4 defined by

T (x , y , z )= (2x+ y+4z , x +y +2z. y + 2z, x +y + 3z) . Find the image and the kernel of T . What is the dimen sion of Im(T)? Find [Tl~ and [T*l~: , where

= {(1 , D, D), (D, 1, D) , (D, D, I)}, fJ = {(l ,D,D,D), (I , I , D, D), (1 , I , I ,D),

a

4.30. Let T be the linear transforma tion on V respect to the standard ordered basis is

(1, 1, 1, I)}.

= ]R3, for which the

associated matrix with

A= [ -I~i3 4~ ] . Find the bases for the kernel and the image of the transpose T* on V* .

156

Chapter 4. Linear Transformations

= P2(lR) by I f3(p) = Jo- p(x)dx.

4.31. Define three linear functionals on the vector space V

II (p) = Jd

p(x)dx,

Show that (II,

12,

h(p)

= J0

2

p(x)dx ,

13j is a basis for V* by finding its dual basis for V .

4.32. Determine whether or not the following statements are true in general, and justify your answers . (1) For a linear transformation T : jRn --+ jRm, Ker(T)

= {OJ if m >

n.

(2) For a linear transformation T : jRn --+ jRm, Ker(T) =1= {OJ if m < n . (3) A linear transformation T : jRn --+ jRm is one-to-one if and only if the nullspace of

[T]g is {OJ for any basis ex for jRn and any basis f3 for jRm. (4) For any linear transformation T on jRn, the dimension of the image of T is equal to that of the row space of [T]a for any basis ex for jRn . (5) For any two linear transformations T : V --+ W and S : W --+ Z, if Ker(S then Ker(T) = O.

0

T)

= 0,

(6) Any polynomial p(x) is linear if and only if the degree of p(x) is less than or equal

to 1. (7) Let T : jR3 --+ jR2 be a function given as T(x) = (TI (x) , T2(X» for any x E jR3. Then T is linear if and only if their coordinate functions Ti , i = 1, 2, are linear. (8) For a linear transformation T : jRn --+ jRn , if [T]g = In for some bases ex and R" , then T must be the identity transformation.

f3 of

(9) If a linear transformation T : jRn --+ jRn is one-to-one , then any matrix representation of T is nonsingular. (10) Any m x n matrix A can be a matrix representation of a linear transformation T : jRn --+ jRm.

(11) Every basis-change matrix is invertible. (12) A matrix similar to a basis-change matrix is also a basis-change matrix. (13) det : Mn xn(jR) --+ jR is a linear functional. (14) Every translation in jRn is a linear transformation.

5

Inner Product Spaces

5.1 Dot products and inner products

To study a geometry of a vector space, we go back to the case of the 3-space ℝ³. The dot (or Euclidean inner) product of two vectors x = (x1, x2, x3) and y = (y1, y2, y3) in ℝ³ is a number defined by the formula

    x · y = x1y1 + x2y2 + x3y3 = [x1 x2 x3] [ y1 ]
                                            [ y2 ]  = x^T y,
                                            [ y3 ]

where x^T y is the matrix product of x^T and y, which is also a number identified with the 1 × 1 matrix x^T y. Using the dot product, the length (or magnitude) of a vector x = (x1, x2, x3) is defined by

    ‖x‖ = (x · x)^{1/2} = √(x1² + x2² + x3²),

and the Euclidean distance between two vectors x and y in ℝ³ is defined by d(x, y) = ‖x − y‖.

In this way, the dot product can be considered to be a ruler for measuring the length of a line segment in the 3-space ℝ³. Furthermore, it can also be used to measure the angle between two nonzero vectors: in fact, the angle θ between two vectors x and y in ℝ³ is measured by the formula involving the dot product

    cos θ = (x · y) / (‖x‖ ‖y‖),    0 ≤ θ ≤ π,

since the dot product satisfies the formula

    x · y = ‖x‖ ‖y‖ cos θ.

In particular, two vectors x and y are orthogonal (i.e., they form a right angle π/2) if and only if the Pythagorean theorem holds:

    ‖x + y‖² = ‖x‖² + ‖y‖².


By rewriting this formula in terms of the dot product, we obtain another equivalent condition: x · y = x1y1 + x2y2 + x3y3 = 0. In fact, this dot product is one of the most important structures with which ℝ³ is equipped. Euclidean geometry begins with the vector space ℝ³ together with the dot product, because the Euclidean distance can be defined by the dot product. The dot product has a direct extension to the n-space ℝⁿ of any dimension n: for any two vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) in ℝⁿ, their dot product, also called the Euclidean inner product, and the length (or magnitude) of a vector are defined similarly as

    x · y = x1y1 + x2y2 + ... + xnyn = x^T y,    ‖x‖ = (x · x)^{1/2} = √(x1² + x2² + ... + xn²).

To extend this notion of the dot product to a (real) vector space, we extract the most essential properties that the dot product in ℝⁿ satisfies and take these properties as axioms for an inner product on a vector space V.

Definition 5.1 An inner product on a real vector space V is a function that associates a real number ⟨x, y⟩ to each pair of vectors x and y in V in such a way that the following rules are satisfied: For any vectors x, y and z in V and any scalar k in ℝ,
(1) ⟨x, y⟩ = ⟨y, x⟩                          (symmetry),
(2) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩             (additivity),
(3) ⟨kx, y⟩ = k⟨x, y⟩                        (homogeneity),
(4) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 ⟺ x = 0       (positive definiteness).

(2') (x, Y+ z) = (x, y) (3') (x, ky) = k(x, y}.

+

(x, z),

It is easy to show that (0, y) = 0(0, y) = 0 and also (x, O) = O. Remark: In Definition 5.1, the rules (2) and (3) mean that the inner product is linear for the first variable; and the rules (2') and (3') above mean that the inner product is also linear for the second variable. In this sense, the inner product is called bilinear. Example 5.1 (Non-Euclidean inner product on JRz) For any two vectors x = (XI, xz) and y = (YI , yz) in JR z , define

    ⟨x, y⟩ = a x1y1 + c(x1y2 + x2y1) + b x2y2 = [x1 x2] [ a  c ] [ y1 ]
                                                        [ c  b ] [ y2 ]  = x^T A y,

where a, b and c are arbitrary real numbers. Then, this function ⟨ , ⟩ clearly satisfies the first three rules of the inner product, i.e., ⟨ , ⟩ is symmetric and bilinear. Moreover, if a > 0 and det A = ab − c² > 0 hold, then it also satisfies rule (4), the positive definiteness of the inner product. (Hint: ⟨x, x⟩ = ax1² + 2c x1x2 + bx2² ≥ 0 if and only if either x2 = 0 or the discriminant of ⟨x, x⟩/x2² is nonpositive.) In the case of c = 0, this reduces to ⟨x, y⟩ = a x1y1 + b x2y2. Notice also that a = ⟨e1, e1⟩, b = ⟨e2, e2⟩ and c = ⟨e1, e2⟩ = ⟨e2, e1⟩. □

xi

Problem 5.1 In Example 5.1, the conver se is also true : Prove that if (x, y) product in ]R2, then a > 0 and ab - c2 > O.

= x T Ay is an inner

Example 5.2 (Case ofx f. 0 f. Y but {x, y} = 0) Let V = C [0, 1] be the vector space of all real-valued continuous functions on [0, 1]. For any two functions f and g in V , define {f, g} =

1\

f (x) g(x)dx .

Then { , } is an inner product on V (verify this). Let

f(x) = Then

(

° s x ::: i,

I - 2x if . I If 2

°

f i= 0 i=

sxs

I,

g(x) =

and



if 2x - I if

°s

x

s i,

i s x s I.

o

g, but (f, g ) = O.

By a subspace W of an inner product space V , we mean a subspace of the vector space V together with the inner product that is the restriction of the inner product on Vto W. Example 5.3 (A subspace as an inner product space) The set W = D 1[0, 1] of all real-valued differentiable functions on [0, 1] is a subspace of V = qo, 1]. The restriction to W of the inner product on V defined in Example 5.2 makes W an inner product subspace of V. However, one can define another inner product on W by the following formula: For any two functions f(x) and g(x) in W, {{f, g}} =

1

1

f(x)g(x )dx

+

1

1

f '( x)g'(x)dx.

Then {{, }} is also an inner product on W , which is different from the restriction to W of the inner product of V, and hence W with this new inner product is not a subspace of the inner product space V. 0

160

Chapter 5. Inner Product Spaces

Remark: From vector calculus, most readers might be already familiar with the dot product (or the inner product) and the cross product (or the outer product) in the 3-space jR3. The concept of this dot product is extended to a higher dimensional Euclidean space jRn in this section. However, it is known in advanced mathematics that the cross product in the 3-space jR3 cannot be extended to a higher dimensional Euclidean space jRn. In fact, it is known that if there is a bilinear function I : jRn X jRn -+ jRn; I(x, y) = x x y satisfying the property: x x y is perpendicular to x and y, and IIx x yll2 IIxll 211yll2 - (x . y)2, then n 3 or 7. Hence, the cross product or the outer product will not be introduced in linear algebra.

=

=

5.2 The lengths and angles of vectors In this section, we study a geometry of an inner product space by introducing a length , an angle or a distance between two vectors. The following inequality will enable us to define an angle between two vectors in an inner product space V. Theorem 5.1 (Cauchy-Schwarz inequality)

If x and yare vectors in an inner

product space V , then (x, y)2 :5 (x, x){y, y). Proof: If x

= 0, it is clear. Assume x i= O. For any scalar t , we have 0:5 (tx

+

y, tx + y) = (x, x)t 2 + 2{x, y)t

+

(y, y).

This inequality implies that the polynomial (x, x)t 2 + 2{x, y)t + (y, y) in t has either no real roots or a repeated real root. Therefore, its discriminant must be nonpositive: (x, y)2 - (x, x){y, y) :5 0,

o

which implies the inequality.

Problem 5.2 Prove that the equalityin the Cauchy-Schwarz inequality holdsif and onlyif the vectorsx and yare linearlydependent.

The lengths of vectors and angles between two vectors in an inner product space are defined in a similar way to the case of the Euclidean n-space. Definition 5.2 Let V be an inner product space. (1) The magnitude [x] (or the length) of a vector x is defined by

[x] = J{x, x).

5.2. The lengths and angles of vectors

161

(2) The distance d(x , y) between two vectors x and Yis defined by d(x, y) = [x - YII.

(3) From the Cauchy-Schwarz inequality, we have -1 :s lI~illr;1I :s 1 for any two nonzero vectors x and y. Hence, there is a unique number () E [0, rr] such that

cos e

(x, y}

= IIxlillYIl

or (x,y}

= IIxIiIlYllcos(}.

Such a number () is called the angle between x and y. For example, the dot product in the Euclidean 3-space ]R3 defines the Euclidean distance in ]R3. However, one can define infinitely many non-Euclidean distances in ]R3 as shown in the following example. Example 5.4 (Infinitely many different inner products on ]R2 or ]R3) (1) In ]R2 equipped with an inner product (x, y} = 2X1Yl + 3X2Y2, the angle between x = (1, 1) and Y = (1,0) is computed as (x, y} 2 cos () = - - = - - = 0.6324 .. ..

J5:2

IIxllllYl1

Thus, () = cos"! (k) . Notice that in the Euclidean 2-space]R2 with the dot product, the angle between x = (1,1) and Y = (1,0) is clearly ~ and cos ~ = ~ = 0 .7071· . '. It shows that the angle between two vectors depends actually on an inner

[d

product on a vector space. (2) For any diagonal matrix A =

1

0

o

0 0]

da

0

0

d3

with all d, > 0,

defines an inner product on]R3. Thus, there are infinitely many different inner products on ]R3 . Moreover, an inner product in the 3-space]R3 may play the roles of a ruler and a protractor in our physical world ]R3. 0 Problem 5.3 In Example 5.4(2), show that x T Ay cannot be an inner product if A has a negative diagonal entry d; < O. Problem 5.4 Prove the following properties of length in an inner product space V : For any vectors x, y

E

V,

(1) [x] ~ 0, (2) [x] = 0 if and only if x (3) IIkxll = Iklllxll, (4) IIx+YII::::: [x] + IIYII

= 0, (triangle inequality).

162

Chapter 5. Inner Product Spaces

Problem 5.5 Let V be an inner product space. Show that for any vectors x, y and z in V , (I) d(x, y) 2: 0,

(2) d(x, y) = 0 if and only if x (3) d(x, y) = dey, x), (4) d(x, y) ::: d(x, z) + d(z , y)

= y, (triangle inequality).

Definition 5.3 Twovectorsx and Yin aninnerproductspacearesaid to beorthogonal (or perpendicular) if (x, y) = O. Note that for nonzero vectors x and y, (x, y)

= 0 if and only if 0 = IT /2.

Lemma 5.2 Let V be an inner product space and let x E V . Then, the vector x is orthogonal to every vector y in V (i.e., (x, y) 0 for all y in V) ifand only ifx O.

=

=

Proof: Ifx = 0, clearly (x, y) = 0 for all y in V . Suppose that (x, y) in V. Then (x, x) = 0, implying x = 0 by positive definiteness.

= 0 for all Y 0

Corollary 5.3 Let V be an inner product space, and let a = {Vj, . . . , vn } be a basis for V . Then, a vector x in V is orthogonal to every basis vector Vi in a ifand only if

x=o.

Proof: If (x, Vi) = 0 for i y = L:?=j YiV i E V.

= 1, 2,

. . . , n, then (x, y)

= L:?=j Yi (x, Vi) = 0 for any 0

Example 5.5 (Pythagorean theorem) Let V be an inner product space, and let x and y be any two nonzero vectorsin V with the angle e. Then, (x, y) = IIxlillYIl cosO gives the equality IIx + yf = IIxli z + lIyllZ + 211xllllYil cos e . Moreover, it deduces the Pythagorean theorem: IIx + yliZ = IIxli z + lIyllZ for any orthogonal vectors x and y. 0 Theorem 5.4 [fxj, Xz , .. . , Xk are nonzero mutually orthogonal vectors in an inner product space V (i.e., each vector is orthogonal to every other vector), then they are linearly independent. Proof: Suppose CjXj

o

+

CZXz

=

qXk

= O. Then for each i = 1,2, ... , k,

+ . ..+ qXk , Xi) Cj (Xj,Xi) + . .. + Ci(Xi , Xi} + . .. + =

(0, Xi)

=

+ . .. + (CjXj

q(Xk,

Xi}

2

cillxiU ,

because xi , Xz, . .. , Xk are mutually orthogonal.Since each Xi is not the zero vector, IIxdl ;6 0; so ci = 0 for i = 1, 2, .. . , k. 0

5.3. Matrix representations of inner products

r

163

Problem 5.6 Let f(x) and g(x) be continuous real-valued functions on [0, 1]. Prove (1) (2)

[fJ f(x)g(x)dx s [fJ f2(X)dx] [fJ g2(x)dx]. [fJ (f(x) + g(x»2dx] ~ ~ [fJ f 2(x)dx ] ~ + [fJ g2(x)dx] ~ . 1

I

I

Problem 5.7 Let V = C [0, 1] be the inner product space of all real-valued continuous functions on [0, 1] equipped with the inner product (f, g)

= 10

For the following two functions numbers k, t, (1) f(x) (2) f(x) (3) f(x)

f

1

f(x)g(x) dx for any f and gin V.

and g in V, compute the angle between them: For any natural

= kx and g(x) = lx, = sin 211'kx andg(x) = sin 211'lx, = cos 211'kx and g(x) = cos 211'lx.

5.3 Matrix representations of inner products Let A be an n x n diagonal matrix with positive diagonal entries. Then one can show that (x, y) = x T Ay defines an inner product on the n-space IRn as shown in Example 5.4(2). The converse is also true: every inner product on a vector space can be expressed in such a matrix product form. Let (V, ( ,)) be an inner product space, and let a = {VI , V2, ... , vn } be a fixed ordered basis for V. Then for any x = L:?=I xiv, and Y = LJ=l YjV j in V, n

(x, y) =

n

L :~~>i Yj (Vi,

V j)

i=1 j=1

holds. If we set aij = (Vi, V j) for i, j = I, 2, . . . , n, then these numbers constitute a symmetric matrix A = [aij], since (Vi, V j) = (v j, Vi) ' Thus, in matrix notation, the inner product may be written as n

n

(x,y) = LLxiYjaij = [xlrA[yla. i=l j=l

The matrix A is called the matrix representation of the inner product ( , ) with respect to the basis a.

Example 5.6 (Matrix representation of an inner product) (1) With respect to the standard basis {el,~, . .. , en} for the Euclidean nspace IRn , the matrix representation of the dot product is the identity matrix, since

164

Chapter 5. Inner Product Spaces

ei . ej = Dij . Thus, for x the matrix product x T y:

= Li Xiei and Y = Lj Yjej

ERn, the dot product is just

(2) On V = Pz([O, 1]), we define an inner product on Vas (f, g) =

i

l

f(x)g(x)dx.

Then for a basis a = (f1(X) = 1, fz(x) = x, h(x) find its matrix representation A = [aijl: For instance, aZ3

= (fz, 13) =

i

l

o

fz(x)h(x)dx

=

= xZ} for V,

11 0

x . xZdx

one can easily

1 = -. 4

D

The expression of the dot product as a matrix product is very useful in stating or proving theorems in the Euclidean space. For any symmetric matrix A and for a fixed basis a, the formula (x, y) = [xlr A[Yla seems to give rise to an inner product on V . In fact, the formula clearly is symmetric and bilinear, but does not necessarily satisfy the fourth rule, positive definiteness. The following theorem gives a necessary condition for a symmetric matrix A to give rise to an inner product. Some necessary and sufficient conditions will be discussed in Chapter 8. Theorem 5.5 The matrix representation A of an inner product (with respect to any basis) on a vector space V is invertible. That is, det A =P O. Proof: Let (x, y) = [xlr A[Yla be an inner product in a vector space V with respect to a basis a, and let A[Yla = 0 as a homogeneous system of linear equations. Then

(Y, y) = [YlrA[Yla = O. It implies that A[Yla = 0 has only the trivial solution Y = 0, or equivalently A is invertible by Theorem 1.9. D

Recall that the conditions a > 0 and det A = ab - c Z > 0 in Example 5.1 are sufficient for A to give rise to an inner product on R Z•

5.4 Gram-Schmidt orthogonalization The standard basis for the Euclidean n-space Rn has a special property: The basis vectors are mutually orthogonal and are of length 1. In this sense, it is called the

5.4. Gram-Schmidt orthogonalization

165

rectangular coordinate system for jRn. In an inner product space, a vector with length I is called a unit vector. If x is a nonzero vector in an inner product space V, the vector

II~II x is a unit vector. The process

of obtaining a unit vector from a

nonzero vector by multiplying the reciprocal of its length is called a normalization. Thus, if there is a set of mutually orthogonal vectors (or a basis) in an inner product space, then the vectors can be converted to unit vectors by normalizing them without losing their mutual orthogonality. Definition 5.4 A set of vectors xr, X2, . .. , Xk in an inner product space V is said to be orthonormal if (orthogonality) , (normality) .

A set [xj , X2, . . . , xn } of vectors is called an orthonormal basis for V if it is a basis and orthonormal. Problem 5.8 Determine whether each of the following sets of vectors in ]R2 is orthogonal, orthonormal, or neitherwith respectto the Euclidean inner product. (I) {[

~ ] , [ ~ ]}

(3) {[

~

l [-~ ]}

(4)

I/J2 ] [-I/J2]} {[ 1/J2 ' 1/J2

It will be shown later in Theorem 5.6 that every inner product space has an orthonormal basis , just like the standard basis for the Euclidean n-space jRn. The following example illustrates how to construct such an orthonormal basis. Example 5.7 (How to construct an orthonormal basis ?) For a matrix

find an orthonormal basis for the column space C(A) of A. Solution: Let Cl , C2 and C3 be the column vectors of A in the order from left to right. It is easily verified that they are linearly independent, so they form a basis for the column space C(A) of dimension 3 in jR4. For notational convention, we denote by Spanlx}, .. . ,xd the subspace spanned by {Xl, . . . , xd . (1) First normalize cl to get Cl

ul = ~ =

Cl

(I 1I I)

'2 = 2' 2' 2' 2 '

166

Chapter S. Inner Product Spaces

which is a unit vector. Then Spanluj] =Span{cI}, because one is a scalar multiple of the other. (2) Noting that the vector Cz - (UI, CZ}UI = Cz - 2uI = (0, 1, -1, 0) is a nonzero vector orthogonal to UI , we set

Then, {UI, uz} is orthonormal and Spanlu}, uz} = Spanjc}, cz}. because each u, is a linear combination of CI and C2, and the converse is also true. (3) Finally, note that C3 - (UI, C3}UI - (U2, C3}UZ = C3 - 4uI + J2uz = (0, 1, 1, -2) is also a nonzero vector orthogonal to both UI and uz. In fact, (UI, C3-4uI+Y'2uZ) =

(UI, c3}-4(UI,UI}+Y'2(UI, uz}=O,

(U2, C3-4uI+Y'2uZ) =

(uz, c3}-4(uz, UI}+Y'2(UZ,UZ} =0.

By the normalization , the vector

is a unit vector, and one can also show that Spanlu}, Uz, U3) = Spanlcj , Cz , C3} = C(A) . Consequently, {UI , Uz, U3} is an orthonormal basis for C(A).

0

The orthonormalization process in Example 5.7 indicates how to prove the following general case, called the Gram-Schmidt orthogonaIization.

Theorem 5.6 Every inner product space has an orthonormal basis. Proof: [Gram-Schmidt orthogonalization process] Let {XI,Xz, .. . , xnl be a basis for an n-dimensional inner product space V . Let XI UI = IIxIII'

Of course, Xz - (xz, UI}UI =f. 0, because {XI , xz} is linearly independent. Generally, one can define by induction on k = 1,2, . . . , n,

Then, as Example 5.7 shows, the vectors UI, Uz, ... , Un are orthonormal in the n-dimensional vector space V . Since every orthonormal set is linearly independent, 0 it is an orthonormal basis for V .
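The Gram-Schmidt process is straightforward to implement. The sketch below is our own illustration; the three column vectors are the ones consistent with the computations quoted in Example 5.7 (reconstructed from the inner products shown there, so they are an assumption).

```python
import numpy as np

def gram_schmidt(columns):
    """Orthonormalize a list of linearly independent vectors (Theorem 5.6)."""
    basis = []
    for x in columns:
        v = x - sum(np.dot(u, x) * u for u in basis)   # subtract projections on earlier u_i
        basis.append(v / np.linalg.norm(v))
    return basis

c1 = np.array([1.0, 1.0, 1.0, 1.0])
c2 = np.array([1.0, 2.0, 0.0, 1.0])
c3 = np.array([2.0, 2.0, 4.0, 0.0])

for u in gram_schmidt([c1, c2, c3]):
    print(np.round(u, 4))
# u1 = (1/2)(1,1,1,1), u2 = (0,1,-1,0)/sqrt(2), u3 = (0,1,1,-2)/sqrt(6)
```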

5.4. Gram-Schmidt orthogonalization

167

Problem 5.9 Use the Gram-Schmidt orthogonalization on the Euclidean space R" to transform the basis

{(O, 1, 1, 0), (-I, 1, 0, 0), (1, 2, 0, -1), (-1, 0, 0, -I)} into an orthonormal basis.

Problem5.10 Find an orthonormal basis for the subspace W of the Euclidean space x + 2y - z = O.

]R3 given

by

Problem 5.11 Let V = C[O, 1] with the inner product (f,g)

= 10

1

f(x)g(x)dx for any fandgin V.

Find an orthonormal basis for the subspace spanned by 1, x and x 2 .

The next theorem shows that an orthornormal basis acts just like the standard basis for the Euclidean n-space jRn. Theorem 5.7 Let {u I, U2, ... , Uk} be an orthonormal basis for a subspace V in an inner product space V. Then, for any vector x in V ,

Proof: For any vector x E V, one can write x = XIUI + X2U2 + I, linear combination of the basis vectors. However, for each i

=

(Uj,x)

=

(u., XI

= because

XIUI+ " '+XkUk)

(u. , uj )

+ . ..+

Xj(Uj, Uj)

+ ...+

xt{Uj,

+

XkUk,

as a

, n,

Uk)

xi ,

{UI, U2, . , . , Uk}

o

is orthonormal.

In particular, if a = {v], V2, ... , vn } is an orthonormal basis for V, then any vector x in V can be written uniquely as x = (VI, X)VI

+

(V2, X)V2

+ .. .+

(v n , x)v n .

Moreover, one can identify an n-dimensional inner product space V with the Euclidean n-space jRn. Let a = {VI, V2, . • . , vn } be an orthonormal basis for the space V. With this orthonormal basis a, the natural isomorphism : V....-+ jRn given by (Vj) = [v;]a = ej, i = I, 2, . . . , n preserves the inner product on vectors: For a vector x = I:7=1 xiv, in V with x; = (x, Vj), the coordinate vector ofx with respect to a is a column matrix

168

Chapter 5. Inner Product Spaces

Moreover, for another vector y =

2:7=1 YiVj in V,

The right-hand side of this equation is just the dot product of vectors in the Euclidean space JRn. That is, (x, y) = [x]~[yltx = (x) · (y) for any x, y E V. Hence, the natural isomorphism preserves the inner product, and we have the following theorem (compare with Corollary 4.8(1». Theorem 5.8 Any n-dimension inner product space V with an inner product (, ) is isomorphic to the Euclidean n-space JRn with the dot product ., which means that an

isomorphismpreserves the inner product. In this sense, someone likes to restrict the study of an inner product space to the case of the Euclidean n-space JRn with the dot product. A special kind of linear transformation that preserves the inner product such as the natural isomorphism from V to JRn plays an important role in linear algebra, and it will be studied in detail in Section 5.8.

5.5 Projections Let U be a subspace of a vector space V . Then, by Corollary 3.13 there is another subspace W of V such that V = U EB W, so that any x E V has a unique expression as x = u + w for u E U and w E W. As an easy exercise, one can show that a function T : V ~ V defined by T(x) = T(u + w) = u is a linear transformation, whose T(V) is the subspace U and kernel Ker(T) is the subspace W. image Im(T)

=

Definition 5.5 Let U and W be subspaces of a vector space V . A linear transformation T : V ~ V is called the projection of V onto the subspace U along W if V = U EB W and T(x) = u for x = u+w E U EB W. Example 5.8 (Infinitely many differentprojectionsofJR2 onto the x -axis) Let X, Y and Z be the l-dimensional subspaces of the Euclidean 2-space JR2 spanned by the vectors ej , e2, and v = el + e2 = (1, 1), respectively: X =

{reI : r

E

JR} = x-axis,

Y =

{re2 : r

E

1R} = y-axis ,

Z =

{reel

+

e2) : r E JR}.

5.5. Projections

169

Since the pairs leI, ez} and [ej , v} are linearly independent, the space ]R2 can be expressed as the direct sum in two ways : ]R2 = X $ Y = X $ Z .

y

Ty(x)

= (0, 1)

z _____ x

TX(x)

= (2, 1)

Sz(x)

= (1, 1)

x

x

= (2,0)

Figure 5.1. Two decompositions of]R2

Thus, a vector x = (2, 1) E ]R2 may be written in two ways:

x = (2, 1) = { 2(1,0) + (0, 1)

E X $ Y = ]R2 i (1,0) + (1,1) E X $ Z = ]R.

Let

or

Tx and Sx denote the projections of]R2 onto X along Y and Z, respectively. Then Tx(x) = 2(1,0) E X, Sx(x) =

(1,0)

E

X,

Ty(x) = (0, 1) E Y, and

Sz(x) = (1, 1) E Z.

This shows that a projection of ]R2 onto the subspace X depends on a choice of complementary subspace of X. For example, by choosing Zn = {r(n, I) : r E R] for any integer n as a complementary subspace, one can construct infinitely many different projections of ]R2 onto the x -axis. 0 Note that for a given subspace U of V, a projection T of V onto U depends on the choice of a complementary subspace W of U as shown in Example 5.8. However, by definition, T(u) = u for any u E U and for any choice of W. That is, ToT = T for every projection T of V . The following theorem shows an algebraic characterization of a linear transformation to be a projection.

Theorem 5.9 A linear transformation T : V T = T 2 (= ToT by definition).

~

=

V is a projection

if and only if

Proof: The necessity is clear, because ToT T for any projection T. For the sufficiency, suppose T 2 = T . It suffices to show that V = Im(T)$Ker(T) and T(u + w) = u for any u + w E Im(T) $ Ker(T) . First, one needs to prove Im(T) n Ker(T) = (OJ and V = Im(T) + Ker(T). Indeed, if y E Im(T) n Ker(T) , then there exists x E V such that T(x) = yand T(y) = O. It implies

170

Chapter 5. Inner Product Spaces

y = T(x)

= T 2(x) = T(T(x» = T(y) = O.

The hypothesis T 2 = T also shows that T(v) E Im(T) and v - T(v) E Ker(T) for any v E V . It implies V = Im(T) + Ker(T). Finally, note that T(o + w) = T(o) + T(w) = T(o) = 0 for any 0 + WE Im(T) EEl Ker(T). 0 Let T : V -+ V be a projection, so that V = Im(T) EEl Ker(T). It is not difficult to show that Im(idv - T) = Ker(T) and Ker(idv - T) = Im(T) for the identity transformation idv on V. Corollary 5.10 A linear transformation T : V -+ V is a projection if and only if idv - T is a projection. Moreover, if T is the projection of V onto a subspace U along W, then idv - T is the projection of V onto W along U. 0 Proof: It is enough to show that (idv - T) (idv - T)

0

0

(idv - T) = idv - T. But

(idv - T) = (idv - T) - (T - T 2 ) = idv - T.

0

Problem 5.12 For V = U E9 W, let Tv denote the projection of V onto U along W, and let Tw denote the projection of V onto W along U. Prove the following.

(1) (2) (3) (4)

For any x E V , X = TV (x) + Tw(x) . Tv 0 (idv - Tv) = O. Tv 0 Tw = Tw 0 TV = O. For any projection T : V ~ V, Im(idv - T)

= Ker(T) and Ker(idv -

T)

= Im(T).

5.6 Orthogonal projections Let U be a subspace of a vector space V. As shown in Example 5.8, we learn that there are infinitely many projections of V onto U which depend on a choice of complementary subspace W of U. However, if V is an inner product space, there is a particular choice of complementary subspace W, called the orthogonal complement of U, along which the projection onto U is called the orthogonal projection, defined below. To show this, we first extend the orthogonality of two vectors to an orthogonality of two subspaces. Definition 5.6 Let U and W be subspaces of an inner product space V.

=

(1) Two subspaces U and W are said to be orthogonal, written U .1 W, if (0, w) 0 for each 0 E U and w E W. (2) The set of all vectors in V that are orthogonal to every vector in U is called the orthogonal complement of U, denoted by u-, i.e.,

u- = {v E V : (v,o) = 0 for all 0

E U}.

5.6. Orthogonal projections

171

One can easily show that U.i is a subspace of V , and v E u» if and only if (v, u) = 0 for every U E f3, where f3 is a basis for U. Moreover, W ..L U if and only if W ~

o-.

Problem 5.13 Let U and W be subspaces of an inner product space V. Show that

(1) If U .1 W,

un W = {OJ.

w-

(2) U ~ W if and only if

~

u-.

Theorem 5.11 Let U be a subspace ofan inner product space V . Then (1) (U.i).i = U. (2) V = U EEl U.i : that is, for each x E V, there exist unique vectors Xu E U and Xu.l E U.isuchthatx = xU+XuL This is called the orthogonal decomposition of V (or of X) by U.

Proof: Let dim U = k. To show (U.i).i = U, take an orthonormal basis for U, say a = {VI,V2, ... , vd, by the Gram-Schmidt orthogonalization, and then extend it to an orthonormal basis for V, say f3 = {VI,V2,"" Vb Vk+I, ... , Vn }, which is always possible. Then, clearly y = {Vk+I , " " vn } forms an (orthonormal) basis for which means that (U.i).i = U and V = U EEl 0

o-,

o-.

Definition 5.7 Let U be a subspace of an inner product space V, and let {UI, U2 , . .. , urn} be an orthonormal basis for U. The orthogonal projection Proju from V onto the subspace U is defined by Proju(x) = (UI, X)UI + (U2 , X)U2

+ .. . +

(urn, x)u m

for any x E V. Clearly, Proju is linear and a projection, because Proju 0 Proju over, Proju(x) E U and x - Proju(x) E u-, because (x - Proju(x), u.)

= (x, OJ)

-

(Proju(x), u.) = (x, OJ)

-

= Proju. More-

(Uj,x) = 0

for every basis vector u. . Hence, by Theorem 5.7, we have Corollary 5.12 The orthogonal projection Proju is the projection of V onto a subspace U along its orthogonal complement u-. Therefore, in Definition 5.7, the projection Proju(x) is independent of the choice of an orthonormal basis for the subspace U. In this sense, it is called the orthogonal projection from the inner product space V onto the subspace U . Almost all projections used in linear algebra are orthogonal projections.

172

Chapter 5. Inner Product Spaces

Example 5.9 (The orthogonal projection from ℝ³ onto the xy-plane) In the Euclidean 3-space ℝ³, let U be the xy-plane with the orthonormal basis α = {e_1, e_2}. Then the orthogonal projection Proj_U(x) = ⟨e_1, x⟩e_1 + ⟨e_2, x⟩e_2 is the orthogonal projection onto the xy-plane in the usual geometric sense, and x − Proj_U(x) ∈ U⊥, which is the z-axis. It means that Proj_U(x_1, x_2, x_3) = (x_1, x_2, 0) for any x = (x_1, x_2, x_3) ∈ ℝ³. □

Example 5.10 (The orthogonal projection from ℝ² onto the x-axis) As in Example 5.8, let X, Y and Z be the 1-dimensional subspaces of the Euclidean 2-space ℝ² spanned by the vectors e_1, e_2, and v = e_1 + e_2 = (1, 1), respectively. Then clearly Y = X⊥ and Z ≠ X⊥. Moreover, for the projections T_X and S_X of ℝ² given in Example 5.8, T_X is the orthogonal projection but S_X is not, so that T_X = Proj_X and S_X ≠ Proj_X. □

Theorem 5.13 Let U be a subspace of an inner product space V, and let x ∈ V. Then the orthogonal projection Proj_U(x) of x satisfies ‖x − Proj_U(x)‖ ≤ ‖x − y‖ for all y ∈ U. The equality holds if and only if y = Proj_U(x).

Proof: First, note that for any vector x ∈ V, we have Proj_U(x) ∈ U and x − Proj_U(x) ∈ U⊥. Thus, for all y ∈ U,

    ‖x − y‖² = ‖(x − Proj_U(x)) + (Proj_U(x) − y)‖²
             = ‖x − Proj_U(x)‖² + ‖Proj_U(x) − y‖²
             ≥ ‖x − Proj_U(x)‖²,

where the second equality comes from the Pythagorean theorem applied to the orthogonality (x − Proj_U(x)) ⊥ (Proj_U(x) − y). (See Figure 5.2.) □

It follows from Theorem 5.13 that the orthogonal projection Proj_U(x) of x is the unique vector in U that is closest to x, in the sense that it minimizes the distance from x to the vectors in U. It also shows that in Definition 5.7 the vector Proj_U(x) is independent of the choice of an orthonormal basis for the subspace U. Geometrically, Figure 5.2 depicts the vector Proj_U(x).

Problem 5.14 Find the point on the plane x − y − z = 0 that is closest to p = (1, 2, 0).

Problem 5.15 Let U ⊂ ℝ⁴ be the subspace of the Euclidean 4-space ℝ⁴ spanned by (1, 1, 0, 0) and (1, 0, 1, 0), and let W ⊂ ℝ⁴ be the subspace spanned by (0, 1, 0, 1) and (0, 0, 1, 1). Find a basis for and the dimension of each of the following subspaces:
(1) U + W,  (2) U⊥,  (3) U⊥ + W⊥,  (4) U ∩ W.

[Figure 5.2. Orthogonal projection Proj_U: the vector x, its projection Proj_U(x) in U, and the residual x − Proj_U(x) ∈ U⊥.]

Problem 5.16 Let U and W be subspaces of an inner product space V. Show that
(1) (U + W)⊥ = U⊥ ∩ W⊥.
(2) (U ∩ W)⊥ = U⊥ + W⊥.

As a particular case, let V = ℝⁿ be the Euclidean n-space with the dot product, and let U = {ru : r ∈ ℝ} be a 1-dimensional subspace determined by a unit vector u. Then for a vector x in ℝⁿ, the orthogonal projection of x onto U is

    Proj_U(x) = (u · x)u = (uᵀx)u = u(uᵀx) = (uuᵀ)x.

(Here, the last two equalities come from the facts that u · x = uᵀx is a scalar and that the matrix product uuᵀx is associative, respectively.) This equation shows that the matrix representation of the orthogonal projection Proj_U with respect to the standard basis α is [Proj_U]_α = uuᵀ.

If U is an m-dimensional subspace of ℝⁿ with an orthonormal basis {u_1, u_2, ..., u_m}, then for any x ∈ ℝⁿ,

    Proj_U(x) = (u_1 · x)u_1 + (u_2 · x)u_2 + ... + (u_m · x)u_m
              = u_1(u_1ᵀx) + u_2(u_2ᵀx) + ... + u_m(u_mᵀx)
              = (u_1u_1ᵀ + u_2u_2ᵀ + ... + u_mu_mᵀ)x.

Thus, the matrix representation of the orthogonal projection Proj_U with respect to the standard basis α is

    [Proj_U]_α = u_1u_1ᵀ + u_2u_2ᵀ + ... + u_mu_mᵀ.

Definition 5.8 The matrix representation [Proj_U]_α of the orthogonal projection Proj_U : ℝⁿ → ℝⁿ of ℝⁿ onto a subspace U with respect to the standard basis α is called the (orthogonal) projection matrix on U. Further discussion of the orthogonal projection matrices will be continued in Section 5.9.3.
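As a hedged numerical illustration (not taken from the book), the projection matrix of Definition 5.8 can be assembled in NumPy as the sum of the outer products u_iu_iᵀ over an orthonormal basis; the resulting matrix P is symmetric and satisfies P² = P.

```python
import numpy as np

def projection_matrix(orthonormal_vectors):
    """P = u1 u1^T + ... + um um^T for orthonormal vectors u_i."""
    n = orthonormal_vectors[0].size
    P = np.zeros((n, n))
    for u in orthonormal_vectors:
        P += np.outer(u, u)
    return P

# 1-dimensional subspace spanned by the unit vector u = (1, 1)/sqrt(2).
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
P = projection_matrix([u])

print(P)                                           # [[0.5 0.5], [0.5 0.5]]
print(np.allclose(P, P.T), np.allclose(P @ P, P))  # True True: symmetric and idempotent
print(P @ np.array([1.0, 0.0]))                    # projection of e1 onto the line y = x
```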


Example 5.11 (Distance from a point to a line) Let ax + by + c = 0 be a line L in the plane ℝ². (Note that the line L cannot be a subspace of ℝ² if c ≠ 0.) For any two points Q = (x_1, y_1) and R = (x_2, y_2) on the line, the equality a(x_2 − x_1) + b(y_2 − y_1) = 0 implies that the nonzero vector n = (a, b) is perpendicular to the line L, that is, QR ⊥ n.

Let P = (x_0, y_0) be any point in the plane ℝ². Then the distance d between the point P and the line L is simply the length of the orthogonal projection of QP onto n, for any point Q = (x_1, y_1) on the line. Thus,

    d = ‖Proj_n(QP)‖
      = |QP · (n/‖n‖)|                                (the dot product)
      = |a(x_0 − x_1) + b(y_0 − y_1)| / √(a² + b²)
      = |ax_0 + by_0 + c| / √(a² + b²).

[Figure 5.3. Distance from a point P = (x_0, y_0) to the line L]

Note that the last equality is due to the fact that the point Q is on the line (i.e., ax_1 + by_1 + c = 0). To find the orthogonal projection matrix, let u = n/‖n‖ = (a, b)/√(a² + b²). Then the orthogonal projection matrix onto U = {ru : r ∈ ℝ} is

    uuᵀ = 1/(a² + b²) [ a ] [ a  b ] = 1/(a² + b²) [ a²  ab ]
                      [ b ]                         [ ab  b² ].

Thus, if x = (1, 1) ∈ ℝ², then

    Proj_U(x) = (uuᵀ)x = 1/(a² + b²) [ a²  ab ] [ 1 ] = 1/(a² + b²) [ a² + ab ]
                                     [ ab  b² ] [ 1 ]               [ ab + b² ].   □
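The distance formula of Example 5.11 is easy to check numerically. The sketch below is an added illustration (the helper name and the sample line are ours): it evaluates |ax_0 + by_0 + c|/√(a² + b²) and compares it with the length of the projection of QP onto the normal direction n = (a, b).

```python
import numpy as np

def point_line_distance(a, b, c, p):
    """Distance from the point p = (x0, y0) to the line ax + by + c = 0."""
    return abs(a * p[0] + b * p[1] + c) / np.hypot(a, b)

a, b, c = 1.0, -2.0, 3.0            # the line x - 2y + 3 = 0
p = np.array([4.0, 1.0])

# Direct formula.
d = point_line_distance(a, b, c, p)

# The same distance via the orthogonal projection of QP onto n = (a, b),
# where Q is any point on the line (here Q = (-3, 0)).
q = np.array([-3.0, 0.0])
n = np.array([a, b])
d_proj = abs(np.dot(p - q, n)) / np.linalg.norm(n)

print(d, d_proj)                    # both equal sqrt(5), as expected
```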


Problem 5.17 Let V = P_3(ℝ) be the vector space of polynomials of degree ≤ 3 equipped with the inner product

    ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx   for any f and g in V.

Let W be the subspace of V spanned by {1, x}, and define f(x) = x². Find the orthogonal projection Proj_W(f) of f onto W.

5.7 Relations of fundamental subspaces

=

We now go back to the study of a system Ax b of linear equations with an m x n matrix A. One of the most important applications of the orthogonal projection of vectors onto a subspace is the decompositions of the domain space and the image space of A by the four fundamental subspacesN(A), 'R(A) in JRn and C(A), N(A T) in JRm (see Theorem 5.16). From these decompositions, one can completely determine the solution set of a consistent system Ax = b.

Lemma 5.14 For an m x n matrix A, the null spaceN(A) and the row space'R(A) are orthogonal: i.e. , N(A) 1- R(A) in ]Rn. Similarly, N(A T) 1- C(A) in ]Rm. Proof: Note that WE N(A) if and only if Aw = 0, i.e., for every row vector r in A, r . W = O. For the second statement, do the same with AT. D From Lemma 5.14 , it is clear that N(A)

S; 'R(A) .L

(or'R(A) S; N(A).L), and

N(A T)

S; C(A).L

(or C(A) S; N(AT).L).

Moreover, by comparing the dimensions of these subspaces and by using Theorem 5.11 and Rank Theorem 3.17, we have dim R(A) + dimN(A)

n

dimC(A)

m =

+

dimN(A T) =

+ dim C(A) +

= dim'R(A)

dim 'R(A).L, dim C(A).L .

This means that the inclusions are actually equalities.

Lemma 5.15 (1) N(A) = R(A)⊥ (or R(A) = N(A)⊥).
(2) N(Aᵀ) = C(A)⊥ (or C(A) = N(Aᵀ)⊥).

This shows that the row space R(A) is the orthogonal complement of the null space N(A) in ℝⁿ, and vice versa. Similarly, the same holds for the column space C(A) and the null space N(Aᵀ) of Aᵀ in ℝᵐ. Hence, by Theorem 5.11, we have the following orthogonal decomposition.


Theorem 5.16 For any m × n matrix A,
(1) N(A) ⊕ R(A) = ℝⁿ,
(2) N(Aᵀ) ⊕ C(A) = ℝᵐ.

Note that if rank A = r, so that dim R(A) = r = dim C(A), then dim N(A) = n − r and dim N(Aᵀ) = m − r. Considering the matrix A as a linear transformation A : ℝⁿ → ℝᵐ, Figure 5.4 depicts Theorem 5.16.

[Figure 5.4. Relations of the four fundamental subspaces: A maps ℝⁿ = R(A) ⊕ N(A), with N(A) ≅ ℝⁿ⁻ʳ, onto C(A) ⊆ ℝᵐ.]
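A quick numerical illustration of Theorem 5.16 (added here as a sketch, not taken from the text, and assuming SciPy's null-space helper is available): for a concrete m × n matrix, the dimensions obtained from the rank r satisfy dim R(A) + dim N(A) = n and dim C(A) + dim N(Aᵀ) = m, and the null-space vectors are orthogonal to every row.

```python
import numpy as np
from scipy.linalg import null_space   # assumed available (SciPy >= 1.1)

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])       # m = 2, n = 3, rank 1

r = np.linalg.matrix_rank(A)
N_A  = null_space(A)                  # basis of N(A),  shape (n, n - r)
N_AT = null_space(A.T)                # basis of N(A^T), shape (m, m - r)

print(r, N_A.shape[1], N_AT.shape[1])   # 1, 2, 1:  r + (n - r) = n,  r + (m - r) = m
print(np.allclose(A @ N_A, 0))          # N(A) is orthogonal to the rows of A
print(np.allclose(A.T @ N_AT, 0))       # N(A^T) is orthogonal to the columns of A
```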

Corollary 5.17 The set of solutions of a consistent system Ax = b is precisely x_0 + N(A), where x_0 is any solution of Ax = b.

Proof: Let x_0 ∈ ℝⁿ be a solution of a system Ax = b. Now consider the set x_0 + N(A), which is just a translation of N(A) by x_0. (1) Any vector x_0 + n in x_0 + N(A) is also a solution, because A(x_0 + n) = Ax_0 = b. (2) If x is another solution, then clearly x − x_0 is in the null space N(A), so that x = x_0 + n for some n ∈ N(A), i.e., x ∈ x_0 + N(A). □

In particular, if rank A = m (so that m ≤ n), then C(A) = ℝᵐ. Thus, for any b ∈ ℝᵐ, the system Ax = b has a solution in ℝⁿ. (This is the case of the existence Theorem 3.24.) On the other hand, if rank A = n (so that n ≤ m), then N(A) = {0} and R(A) = ℝⁿ. Therefore, the system Ax = b has at most one solution; that is, it has a unique solution x in R(A) if b ∈ C(A), and has no solution if b ∉ C(A). (This is the case of the uniqueness Theorem 3.25.) The latter case may occur when m > r = rank A; that is, N(Aᵀ) is a nontrivial subspace of ℝᵐ, and it will be discussed later in Section 5.9.1.


Problem 5.18 Prove the following statements.
(1) If Ax = b and Aᵀy = 0, then yᵀb = 0, i.e., y ⊥ b.
(2) If Ax = 0 and Aᵀy = c, then xᵀc = 0, i.e., x ⊥ c.

Problem 5.19 Given two vectors (1 , 2, I, 2) and (0, - I , -1 , 1) in lR4 , find all vectors in ]R4 that are perpendicular to them. Problem 5.20 Find a basis for the orthogonal complement of the row space of A :

(1) A

=

1 28] [ 23 0-1 61 ,

(2) A

=

001] [0 0 I . 111

5.8 Orthogonal matrices and isometries

In Chapter 4, we saw that a linear transformation can be associated with a matrix, and vice versa. In this section, we are mainly interested in those linear transformations (or matrices) that preserve the length of a vector in an inner product space. Let A = [c_1 ... c_n] be an n × n square matrix with columns c_1, ..., c_n. Then, a simple computation shows that

    AᵀA = [ c_1ᵀ ]                 [ c_1ᵀc_1  ...  c_1ᵀc_n ]
          [  ⋮   ] [c_1 ... c_n] = [   ⋮              ⋮    ]
          [ c_nᵀ ]                 [ c_nᵀc_1  ...  c_nᵀc_n ].

Hence, if the column vectors are orthonormal, c_iᵀc_j = δ_ij, then AᵀA = I_n, that is, Aᵀ is a left inverse of A, and vice versa. Since A is a square matrix, this left inverse must also be the right inverse of A, i.e., AAᵀ = I_n. Equivalently, the row vectors of A are also orthonormal. This argument can be summarized as follows.

Lemma 5.18 For an n × n matrix A, the following are equivalent.
(1) The column vectors of A are orthonormal.
(2) AᵀA = I_n.
(3) Aᵀ = A⁻¹.
(4) AAᵀ = I_n.
(5) The row vectors of A are orthonormal.

Definition 5.9 A square matrix A is called an orthogonal matrix if A satisfies one (and hence all) of the statements in Lemma 5.18. Clearly, A is orthogonal if and only if AT is orthogonal.
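Definition 5.9 is straightforward to test numerically. The following sketch (an added illustration, not the authors' code) checks AᵀA = I and AAᵀ = I for a rotation matrix and confirms that its inverse is its transpose.

```python
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation through the angle theta

I = np.eye(2)
print(np.allclose(A.T @ A, I))              # the columns of A are orthonormal
print(np.allclose(A @ A.T, I))              # the rows of A are orthonormal
print(np.allclose(np.linalg.inv(A), A.T))   # A^{-1} = A^T, so A is orthogonal
```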


Example 5.12 (Rotations and reflections are orthogonal) The matrices

    A = [ cos θ  −sin θ ]        B = [ cos θ   sin θ ]
        [ sin θ   cos θ ],           [ sin θ  −cos θ ]

are orthogonal, and satisfy

    A⁻¹ = Aᵀ = [  cos θ  sin θ ]        B⁻¹ = Bᵀ = [ cos θ   sin θ ]
               [ −sin θ  cos θ ],                  [ sin θ  −cos θ ].

Note that the linear transformation T : ℝ² → ℝ² defined by T(x) = Ax is a rotation through the angle θ, while S : ℝ² → ℝ² defined by S(x) = Bx is the reflection about the line passing through the origin that forms an angle θ/2 with the positive x-axis. □

Example 5.13 (All 2 × 2 orthogonal matrices) Show that every 2 × 2 orthogonal matrix must be one of the forms

    [ cos θ  −sin θ ]        or        [ cos θ   sin θ ]
    [ sin θ   cos θ ]                  [ sin θ  −cos θ ].

Solution: Suppose that

    A = [ a  b ]
        [ c  d ]

is an orthogonal matrix, so that AAᵀ = I_2 = AᵀA. The first equality gives a² + b² = 1, ac + bd = 0, and c² + d² = 1. The second equality gives a² + c² = 1, ab + cd = 0, and b² + d² = 1. Thus, b = ±c. If b = −c, then we get a = d. If b = c, then we get a = −d. Now, choose θ so that a = cos θ and b = sin θ. □

Problem 5.21 Find the inverse of each of the following matrices . (1)

1

[

0

o -

0

0 ]

sinO sin 0 cos (}

cosO

,

1/./2 -1/./2 0]

(2)

[ -1/~ -1/~ ~

.

What are they as linear transformations on ]R3 : rotations, reflections , or other?

Problem 5.22 Find eight 2 × 2 orthogonal matrices which transform the square −1 ≤ x, y ≤ 1 onto itself.

As shown in Examples 5.12 and 5.13, all rotations and reflections on the Euclidean 2-space ]R2 are orthogonal and preserve intuitively both the lengths of vectors and the angle between two vectors. In fact, every orthogonal matrix A preserves the lengths of vectors :

    ‖Ax‖² = Ax · Ax = (Ax)ᵀ(Ax) = xᵀAᵀAx = xᵀx = ‖x‖².


Definition 5.10 Let V and W be two inner product spaces. A linear transformation T : V → W is called an isometry, or an orthogonal transformation, if it preserves the lengths of vectors, that is, ‖T(x)‖ = ‖x‖ for every vector x ∈ V.

Clearly, any orthogonal matrix is an isometry as a linear transformation. If T : V → W is an isometry, then T is one-to-one, since the kernel of T is trivial: T(x) = 0 implies ‖x‖ = ‖T(x)‖ = 0. Thus, if dim V = dim W, then an isometry is also an isomorphism. The following theorem gives an interesting characterization of an isometry.

Theorem 5.19 Let T : V → W be a linear transformation from an inner product space V to another W. Then, T is an isometry if and only if T preserves inner products, that is, ⟨T(x), T(y)⟩ = ⟨x, y⟩ for any vectors x, y in V.

Proof: Let T be an isometry. Then ‖T(x)‖² = ‖x‖² for any x ∈ V. Hence,

    ⟨T(x + y), T(x + y)⟩ = ‖T(x + y)‖² = ‖x + y‖² = ⟨x + y, x + y⟩

for any x, y ∈ V. On the other hand,

    ⟨T(x + y), T(x + y)⟩ = ⟨T(x), T(x)⟩ + 2⟨T(x), T(y)⟩ + ⟨T(y), T(y)⟩,
    ⟨x + y, x + y⟩ = ⟨x, x⟩ + 2⟨x, y⟩ + ⟨y, y⟩,

from which we get ⟨T(x), T(y)⟩ = ⟨x, y⟩. The converse is quite clear by choosing y = x. □

Theorem 5.20 Let A be an n × n matrix. Then, A is an orthogonal matrix if and only if A : ℝⁿ → ℝⁿ, as a linear transformation, preserves the dot product. That is, for any vectors x, y ∈ ℝⁿ, Ax · Ay = x · y.

Proof: The necessity is clear. For the sufficiency, suppose that A preserves the dot product. Then for any vectors x, y ∈ ℝⁿ,

    Ax · Ay = xᵀAᵀAy = xᵀy = x · y.

Take x = e_i and y = e_j. Then this equation is just [AᵀA]_ij = δ_ij. □

Since d(x , y) = IIx - YII for any x and y in V , one can easily derive the following corollary.


Corollary 5.21 A linear transformation T : V → W is an isometry if and only if

    d(T(x), T(y)) = d(x, y)

for any x and y in V.

Recall that if θ is the angle between two nonzero vectors x and y in an inner product space V, then for any isometry T : V → V,

    cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖) = ⟨Tx, Ty⟩ / (‖Tx‖ ‖Ty‖).

Hence, we have

Corollary 5.22 An isometry preserves the angle.

The converse of Corollary 5.22 is not true in general. The linear transformation T(x) = 2x on the Euclidean space ℝⁿ preserves the angle but not the lengths of vectors (i.e., it is not an isometry). Such a linear transformation is called a dilation. We have seen that any orthogonal matrix is an isometry as the linear transformation T(x) = Ax. The following theorem says that the converse is also true, that is, the matrix representation of an isometry with respect to an orthonormal basis is an orthogonal matrix.

Theorem 5.23 Let T : V → W be an isometry from an inner product space V to another W of the same dimension. Let α = {v_1, ..., v_n} and β = {w_1, ..., w_n} be orthonormal bases for V and W, respectively. Then, the matrix [T]_α^β for T with respect to the bases α and β is an orthogonal matrix.

Proof: Note that the k-th column vector of the matrix [T]_α^β is just [T(v_k)]_β. Since T preserves inner products and α, β are orthonormal, we get

    [T(v_k)]_β · [T(v_l)]_β = ⟨T(v_k), T(v_l)⟩ = ⟨v_k, v_l⟩ = δ_kl,

which shows that the column vectors of [T]_α^β are orthonormal. □

Remark: In summary, for a linear transformation T : V → W, the following are

(1) T is an isometry: that is, T preserves the lengths of vectors. (2) T preserves the inner product. (3) T preserves the distance. (4) [T]~ with respect to orthonormal bases ex and fJ is an orthogonal matrix.

Anyone (hence all) of these conditions implies that T preserves the angle, but the converse is not true.

5.9.1. Application: Least squares solutions

Problem 5.23 Find values r > 0, (1) Q =

S

> 0, a > 0, band c such that matrix Q is orthogonal.

° -s2ssa] b , c r

[

181

r

-s a] 3s b

.

-15 c

Problem 5.24 (Bessel's Inequality) Let V be an inner product space, and let {VI, . . . , vm } be a set of orthonormal vectors in V (not necessarily a basis for V) . Prove that for any x in V, IIxll 2 2: L:i"=ll(x, Vi)!2 .

Problem 5.25 Determine whether the following linear transformations on the Euclidean space are orthogonal.

]R3

(1) T(x, y, z)

(2) T(x, y, z)

= (z, :!,fx + iY, ! - :!,fy). 11 12 5 ) = (U5 x + UZ' UY - UZ' x .

5.9 Applications

5.9.1 Least squares solutions

In the previous section, we completely determined the solution set of a system Ax = b when b ∈ C(A). In this section, we discuss what we can do when the system Ax = b is inconsistent, that is, when b ∉ C(A) ⊆ ℝᵐ. Certainly, there exists no solution in this case, but one can find a 'pseudo'-solution in the following sense. Note that for any vector x in ℝⁿ, Ax ∈ C(A). Hence, the best we can do is to find a vector x_0 ∈ ℝⁿ so that Ax_0 is the closest to the given vector b ∈ ℝᵐ: i.e., ‖Ax_0 − b‖ is as small as possible. Such a vector x_0 gives the best approximation Ax to b over all vectors x in ℝⁿ, and it is called a least squares solution of Ax = b.

To find a least squares solution, we first need to find the vector in C(A) that is closest to b. From the orthogonal decomposition ℝᵐ = C(A) ⊕ N(Aᵀ), any b ∈ ℝᵐ has the unique orthogonal decomposition

    b = b_c + b_n ∈ C(A) ⊕ N(Aᵀ) = ℝᵐ,

where b_c = Proj_{C(A)}(b) ∈ C(A) and b_n = b − b_c ∈ N(Aᵀ). Here, the vector b_c = Proj_{C(A)}(b) ∈ C(A) has two basic properties:

(1) there always exists a solution x_0 ∈ ℝⁿ of Ax = b_c, since b_c ∈ C(A);
(2) b_c is the closest vector to b among the vectors in C(A) (see Theorem 5.13).

Therefore, a least squares solution x_0 ∈ ℝⁿ of Ax = b is just a solution of Ax = b_c. Furthermore, if x_0 ∈ ℝⁿ is a least squares solution, then the set of all least squares solutions is x_0 + N(A) by Corollary 5.17. In particular, if b ∈ C(A), then b = b_c, so that the least squares solutions are just the 'true' solutions of Ax = b. The second property of b_c means that a least squares


solution Xo E jRn of Ax = b gives the best approximation Axo = b, to b : i.e., for any vector x in jRn, IIAxo - b] :::: II Ax - b]. In summary, to have a least squares solution of Ax = b, the first step is to find the orthogonal projection b c = PrOjc(A) (b) E C(A) ofb, and then solve Ax = b c as usual . One can find b c from b E jRm by using the orthogonal projection if we have an orthonormal basis for C(A). But, such a computation of b c could be uncomfortable, because the only way we know so far to find an orthonormal basis for C(A) is the Gram-Schmidt orthogonalization (whose computation may be cumbersome). However, there is a bypass to avoid the Gram-Schmidt orthogonalization. For this, let us examine a least squares solution once again . If XO E jRn is a least squares solution of Ax = b, then

holds since AXO = bc. Thus, AT (Axo-b) = AT (-bn ) = A Tb, that is, XO is a solution of the equation ATAx=ATb.

oor equivalently AT Axo =

I

This equation is very interesting because it is also a sufficient condition for a least squares solution as the next theorem shows, and it is called the normal equation of Ax=b. Theorem 5.24 Let A be an m x n matrix, and let b e jRm be any vector. Then, a vector XO E jRn is a least squares solution of Ax = b ifand only ifxo is a solution of the normal equation AT Ax = ATb. Proof: We only need to show the sufficiency. Let Xo be a solution of the normal equation AT Ax = ATb. Then, AT (Axo - b) = 0, so AXO - b E N(A T). Say, AXO - b = n inN(A T), and let b = b c + b n E C(A) EBN(A T) . Then, Axo - b c·= n-j-b; E N(A T). Since Axo- b c is also contained in C(A) andN(AT)nC(A) = {OJ, AXO = b, = PrOjc(A) (b) , i.e., xo is a least squares solution of Ax = b . 0 Example 5.14 (The best approximated solution of an inconsistent system Ax = b) Find all the least squares solutions of Ax = b, and then determine the orthogonal projection b c ofb into the column space C(A), where

1 -2 2 -3 A = -1 1 [ 3 -5

5.9.1. Application: Least squares solutions

183

Solution: (The reader may check that Ax = b has no solutions).

-1 3] [ ~

2

1 -5

-3

2

-1

and ATb =

0

-1

3

[ 1 2-1 3] [~]~ [ 0] -2 -3 1 -1

1 -5 2 0

From the normal equation, a least squares solution of Ax ATb, i.e.,

[

-1 3

=

.

= b is a solution of A T Ax =

-~~ -;~ -~] [;~] = [ -~ ] . -3

3

6

3

X3

By sol ving this system of equations (left for an exercise), one can obtain all the least squares solutions, which are of the form:

for any number t E JR. Moreover,

Note that the set of least squares solutions is Xo 5/3 O]T andN(A) = {t[5 3 I f : t E JR}.

+

N(A ), where

=

[

10 2]

o

-1

-1

2 1

2

2

-1

0

'

b

= [-8/3 0

Problem 5.26 Find all least squares solutions x in ]R3 of Ax

A

xo

=

= b, where

[3] -3

O

'

-3

Note that the normal equation is always consistent by the construction, and, as Example 5.14 shows, a least squares solution can be found by the Gauss-Jordan elimination even though AT A is not invertible.lfb E C(A) or, even better, if the rows of A are linearly independent (thus, rank A = m andC(A) = JRm), then be = b so that

184

Chapter 5. Inner Product Spaces

the system Ax = b is always consistent and the least squares solutions coincide with the true solutions.Therefore,for any givensystem Ax = b, consistentor inconsistent, by solving the normalequation AT Ax = ATb, one can obtaineither the true solutions or the least squares solutions. If the square matrix AT A is invertible,then the normal equation A T Ax = ATb of the system Ax = b resolves to x = (AT A)-I ATb, which is a least squares solution. In particular, if AT A = In , or equivalently the columns of A are orthonormal (see Lemma5.18), then the normal equationreduces to the leastsquaressolution x = ATb. The following theorem gives a condition for A T A to be invertible. Theorem 5.25 Forany m x n matrix A, AT A is a symmetric n x n square matrix and rank (A T A) = rank A. Proof: Clearly, AT A is square and symmetric. Since the number of columns of A and AT A are both n, we have

rank A + dimN(A) = n = rank (AT A)

+ dimN(A T A).

Hence, it sufficesto showthat.V(A) = N(A T A) sothatdimN(A) = dimN(A T A). It is trivial to see that N (A) S; N (A T A), since Ax = 0 implies AT Ax = O. Conversely, suppose that AT Ax = O. Then Ax - Ax

= (Ax)T (Ax) = x T (AT Ax) = xTO = O.

Hence Ax = 0, and x E N(A) .

o

It follows from Theorem 5.25 that AT A is invertible if and only if rank A = n: that is, the columns of A are linearly independent. In this case, N(A) = {OJ and so the system Ax = b has a unique least squares solution XQ in 'R(A) = JRn , which is

This can be summarized in the following theorem: Theorem 5.26 Let A be an m x n matrix. If rank A = n, or equivalently the columns of A are linearlyindependent, then

(1) AT A is invertible so that (AT A)-I AT is a left inverse of A, (2) the vector xo = (AT A)-I ATb is the unique least squares solution of a system Ax = b, and (3) AXO = A(A T A)-IATb = be = ProjC(A)(b), that is, the orthogonal projection ofJRm onto C(A) is PrOjC(A) = A(A T A)-lAT. Remark: (1) For an m x n matrix A, by applying Theorem 5.26 to AT, one can say that rank A = m if and only if AA T is invertible. In this case AT (AAT)-I is a right inverse of A (cf. Remark after Theorem 3.25). Moreover, AA T is invertible if and only if the rows of A are linearly independent by Theorem 5.25.
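As a hedged numerical companion to Theorem 5.26 (added here; it is not part of the original text), the sketch below solves the normal equation AᵀAx = Aᵀb for a small matrix with independent columns and compares the result with NumPy's built-in least squares routine; both give the same x_0, and Ax_0 equals the projection of b onto C(A).

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])          # rank 2: the columns are linearly independent
b = np.array([6.0, 0.0, 0.0])

# Least squares solution from the normal equation A^T A x = A^T b.
x0 = np.linalg.solve(A.T @ A, A.T @ b)

# The same solution from NumPy's least squares solver.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x0, x_ls))        # True
print(A @ x0)                       # = Proj_{C(A)}(b), the closest vector to b in C(A)
print(A.T @ (b - A @ x0))           # ~ 0: the residual is orthogonal to C(A)
```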

5.9.1. Application: Least squares solutions

185

(2) If the matrix A is orthogonal, then the columns UI , . . . , Un of A form an orthonormal basis for the column space C(A ) , so that for any be lRm , be = (UI . bju,

+ . ..+

(Un'

+ . ..+

b)u n = (uluf

unu~)b

and the projection matrix is T . Pr oJC(A) = UIUI

+ ... +

T

unun·

In fact, this result coincides with Theorem 5.26: If A is orthogonal so that AT A = In, then

and the least squares solution is UF - - ]

: u~ --

_ [ UI.· b ]

b-:

Un'

,

b

which is the coordinate expression ofAXo = be = PrOk(A)(b) with respect to the orthonormal basis {UI , ... , un} forC(A). In general, the columns of A need not be orthonormal, in which case the above formula is not possible. In Section 5.9.3, we will discuss more about this general case. (3) If rank A = r < n, one can reduce the columns of A to a basis for the column space and work with this reduced matrix A (thus, AT A is invertible) to find the orthogonal projection PrOk(A) = A (A T A)-I AT of R" onto the column space C(A) . However, the least squares solutions of Ax = b should be found from the original normal equation directly, since the least squares solution Xo = (AT A)-I ,4Tb of Ax = b has only r components so that it cannot be a solution of Ax = be. Example 5.15 (Solving an inconsistent system Ax = b by thenormalequation) Find the least squares solutions of the system:

Determine also the orthogonal projection be of b in the column space C(A) . Solution: Clearly, the two columns of A are linearly independent and C(A) is the xy-plane. Thus , b ¢ C(A ). Note that

186

Chapter 5. Inner Product Spaces

which is invertible . By a simple computation one can obtain

Hence,

is a least squares solution, which is unique sinceN (A) = O. The orthogonal projection ofb in C(A) is

2] [

1 be = Ax = [ ~ ~

14/3] [ 4] -1/3 = ~ .

o

Problem 5.27 Find all the least squares solutions of the following inconsistent system of linear equations:

5.9.2 Polynomial approximations In this section, one can find a reason for the name of the "least squares" solutions, and the following example illustrates an application of the least squares solution to the determination of the spring constants in physics . Example 5.16 Hooke's law for springs in physics says that for a uniform spring, the length stretched or compressed is a linear function of the force applied, that is, the force F applied to the spring is related to the length x stretched or compressed by the equation F=a+kx , where a and k are some constants determined by the spring . Suppose now that, given a spring of length 6.1 inches, we want to determine the constants a and k under the experimental data: The lengths are measured to be 7.6, 8.7 and lOA inches when forces of 2, 4 and 6 kilograms , respectively, are applied to the spring . However, by plotting these data (x, F) = (6.1, 0), (7.6, 2), (8.7, 4), (lOA , 6),

5.9.2. Application: Polynomial approximations

187

in the x F -plane, one can easily recognize that they are not on a straight line of the form F a + kx in the x F -plane, which may be caused by experimental errors. This means that the system of linear equations

=

r F2 F3 F4

= = = =

a + 6.lk a +7.6k a + 8.7k a + lOAk

= = = =

0 2 4

6

is inconsistent (i.e., has no solutions so the second equality in each equation may not be a true equality). It means that if we put b = (0,2,4,6) and F = (FI, F2, F3, F4) as vectors in R 4 representing the data and the points on the line at Xi'S, respectively, then II b - F I is not zero. Thus, the best thing one can do is to determine the straight line a + kx = F that 'fits' the data best: that is, to minimize the sum of the squares of the vertical distances from the line to the data (Xi, Yi) for i = I, 2, 3, 4 (See Figure 5.5) (this is the reason why we say least squares): (0 - FI)2

+

(2 - F2)2 + (4 - F3)2 + (6 - F4)2 = lib - F1I 2 .

Thus, for the original inconsistent system

Ax

=

6.1] [~] = [ 0] I~ ~:~ ~ =b ~

[I

lOA

C(A),

6

we are looking for F E C(A), which is the projection of b onto the column space C(A) of A, and the least squares solution xo, which satisfies Axo = F. F 10 8 6 4 2

Figure 5.5. Least squares fitting

It is now easily computed as (by solving the normal equation A T Ax = A Tb)

[ ~ ] = x = (AT A)-I ATb = [ -~:~ ] . It gives F = -8.6 + lAx .

o

188

Chapter 5. Inner Product Spaces

In general, a common problem in experimental work is to obtain a polynomial in two variables x and Y that best 'fits' the data of various values of Y determined experimentally for inputs x, say Y

= f(x)

plotted in the xy-plane. Some possible fitting polynomials are (1) by a straight line: y = a + bx, (2) by a quadratic polynomial : y = a + bx + cx 2 , or (3) by a polynomial of degree k: y = ao + alx + ...+ akxk, etc.
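To make the general setup concrete, here is a small sketch (added for illustration, using the data of Example 5.17 below): it builds the matrix whose columns are 1 and x for a straight-line fit y = a + bx and solves the resulting least squares problem through the normal equation; a degree-k fit works the same way with extra columns x², ..., xᵏ.

```python
import numpy as np

# Data points (x_i, y_i) to be fitted by a straight line y = a + bx.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 3.0, 4.0, 4.0])

# One column of ones for the constant term a, one column of x's for the slope b.
A = np.column_stack([np.ones_like(x), x])

# Least squares solution of A [a, b]^T = y via the normal equation.
a, b = np.linalg.solve(A.T @ A, A.T @ y)
print(a, b)        # a = -0.5, b = 1.3, so the fitted line is y = -0.5 + 1.3x
```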

=

As a general case, suppose that we are looking for a polynomial y f(x) = ao + alX + a2x2 + . . . + akx k of degree k that passes through the given data. Then we obtain a system of linear equations,

I

f(XI) = ao + alxl + a2x~ + f(X2) ao + alx2 + a2x 2 +

+ +

f(x n) = ao + alXn + a2x; +"

. .+

=

or, in matrix form, the system may be written as Ax

[

x~

t] [

11 Xl X2 x 2

X x

1 Xn x;

x~

···

2

...

=

YI Y2

akx~ =

Yn,

akxt akx2

=

= b:

ao al ]

[ Y2 YI ]

,'' -

...

ak

'

Yn

The left-hand side Ax represents the values of the polynomial atx, 's and the right-hand side represents the data obtained from the inputs Xi'S in the experiment. If n ~ k + 1, then the cases have already been discussed in Section 3.9.1. If n > k + 1, this kind of system may be inconsistent. Therefore, the best thing one can do is to find the polynomial f(x) that minimizes the sum of the squares of the vertical distances between the graph of the polynomial and the data. But, it is equivalent to find the least squares solution of the system Ax = b, because for any C E C(A) of the form

xt ] [ [jX

n

x2

x; ....." xi

o] a al

a~

[ ao + alxl + .. . + akxt ] ao+alx2+ · ··+akx2

=

ao + alx

n

1.. .+ akX~

we have

lib - cII 2 =

(YI - ao - a\x\ - .. " - akxf)2

+ ...

+(Yn - ao - alXn - ". . - akx~)2"

= c,

5.9.2. Application: Polynomialapproximations

189

The previous theory says that the orthogonal projection be ofb into the column space of A minimizes this quantity and shows how to find be and a least squares solution

Xo· Example 5.17 Find a straight line y = a + bx that fits best the given experimental data, (1,0), (2,3), (3,4) and (4,4). Solution: We are looking for a line y = a + bx that minimizes the sum of squares of the vertical distances IYi - a - bx, I's from the line Y = a + bx to the data (Xi, Yi). By adapting matrix notation

we have Ax = b and want to find a least squares solution of Ax = b. But the columns of A are linearly independent, and the least squares solution is x = (AT A)-l ATb. Now,

10]

30

'

(ATA)_I=[

~ -~] ATb=[ll] 1

l'

--

-

2

34

.

5

Hence, we have

o Problem 5.28 From Newton's second law of motion, a body near the surface of the earth falls vertically downward according to the equation

set) = so + vot

+

1

zgt 2 ,

where set) is the distance that the body travelled in time t, andso, Vo are the initial displacement and velocity, respectively, of the body, and g is the gravitational acceleration at the earth's surface. Suppose a weight is released, and the distances that the body has fallen from some reference point were measured to be s -0.18, 0.31, 1.03, 2.48,3.73 feet at times t 0.1. 0.2, 0.3, 0.4. 0.5 seconds. respect ively. Determine approximate values of so. vo, g using these data.

=

=

190

Chapter 5. Inner Product Spaces

5.9.3 Orthogonal projection matrices In Section 5.9.1, we have seen that the orthogonal projection ProjC(A) of the Euclidean space lRm on the column space C(A) of an m x n matrix A plays an important role in finding a least squares solution of an inconsistent system Ax b. Also, the orthogonal projection ProjC(A) is the main tool in the Gram-Schmidt orthogonalization. In general , for a given subspace U of lRm , the computation of the orthogonal projection Proju oflRm onto U appears quite often in applied science and engineering problems. The least squares solution method can also be used to find the orthogonal projection: Indeed, by taking first a basis for U and then making an m x n matrix A with these basis vectors as columns , one clearly gets U = C(A) , and so by Theorem 5.26

=

I

Proju = ProjC(A) = A(A T A)-lAT .

In fact, this projection itself is the orthogonal projection matrix, that is, the matrix representation of Proju with respect to the standard basis ex,

Note that this projection matrix Proju is independent of the choice of a basis for U due to the uniqueness of the matrix representation of a linear transformation with respect to a fixed basis. Example 5.18 Find the projection matrix P on the plane 2x - y - 3z = 0 in the space lR3 and calculate Pb for b = (1, 0, 1). Solution: Choose any basis for the plane 2x - y - 3z VI

Let A

~[

_:

= (0, 3, -1)

and

V2

~] be the matrix with vr end (A TA )- I = [10 6

6 5

The orthogonal projection matrix P

P

= =

=

]-1

=

= 0, say,

= (1 , 2, 0).

V2 as columns,Thon

..!.- [ 14

5 -6] - 6 10 .

= ProjC(A) is

A(A T A)-IA T

[[ 14

[H -i ~ 0

2 -1 [102 13 14 6 -3

5 -6] [0 3

-6

-H

10

1 2

-~ ]

5.9.3. Application: Orthogonal projection matrices and

Pb

= .!..

14

[1~6 -31~ _~] [~] 5 1

= .!.. [ 14

2~11 ].

191

o

If an orthonormal basis for U is known, then the computation of Proju = A(A T A)-I AT is easy, as shown in Remark (2) on page 185: For an orthonormal basis f3 = {uI, U2, . .. , un} for U, the orthogonal projection matrix onto the subspace U is given as

=

=

=

Example 5.19 If A [CI c2l, where CI (1,0,0), C2 (0,1,0), then the column vectors of A are orthonormal, C(A) is the x y-plane, and the projection of b = (x, y, z) E 1R3 onto C(A) is be = (x , y, 0). In fact,

which is equal to CIcj

+

0

C2C{.

Note that, if we denote by Proju; the orthogonal projection of R" on the subspace spanned by the basis vector u, for each i, then its projection matrix is uiuT, and so

and

Problem5029 Let u U

ifi=/=j if i = j .

= ()z ,)z) be a vector in lR2 which determines l-dimensional subspace

= {au = (:72' :72)

:a

E R] . Show that the matrix

considered as a linear transformation on lR2 , is an orthogonal projection onto the subspace U.

Problem5.30 Show that if {VI , v2, .00. v m} is an orthonormal basis for lRm, then VIv[ +

T T_[m· V2 v2+" ,+vmv m-

Chapter 5. Inner ProductSpaces

192

In general, if a basis {CI, C2, . . . , cn} for U is given, but not orthonormal,then one has to directly compute Proju = A(A T A)-I AT, where A = [CI c2 . . . cn] is an m x n matrix whose columns are the given basis vectors c.'s. Sometimesit is necessaryto computean orthonormalbasisfromthe givenbasis for U by the Gram-Schmidt orthogonalization. Its computationgivesus a decomposition of the matrix A into an orthogonal part and an upper triangular part, from which the computation of the projection matrix might be easier. QR decomposition method: Let {CI, C2, ... ,cn} be an arbitrary basis for a subspace U. The Gram-Schmidt orthogonalization process to this basis may be written as the following steps: (1) From the basis {CI, C2, . . . , cn} , find an orthogonal basis {ql' q2, ... , qn} for Uby

CI

q2

= =

C2 -

(ql, C2) ql (ql,qJ)

qn

=

Cn -

(qn-I,Cn) (ql, cn) qn-I - . . . ql · (ql, ql) (qn-I , qn-I)

ql

(2) By normalization of these vectors: u, = qi/llqj II, one obtains an orthonormal basis {UI , . . . , un} for U. (3) By rewriting those equations in (I), one gets

CI = C2 =

h aij = were

ql al2ql

+

(qi,Cj ) ~ . (qi ,qi) lor I

bij

for i

=

q2

=

.

< ], au =

bllUI

bl2 uI

+

b22 U2

1 d ,an

(q j, Cj)

= aij IIqjII = -(--) IIqill = (u., qj,qj

Cj),

: s i. which is just the component of Cj in u, direction.

(4) Let A = [CI C2 . .. cn]' Then, the equations in (3) can be written in matrix

notation as bll

o

A = [CI C2 . . . cn] = [UI U2 . . . un]

~

[

bvi

bin] b2n

b22

o

bnn

=QR,

5.9.3. Application: Orthogonal projection matrices

193

where Q = [UI U2 .. . un] is the m x n matrix whose orthonormal columns are obtained from Cj 's by the Gram-Schmidt orthogonalization, and R is the n x n upper triangular matrix. (5) Note that rank A = rank Q = n ::: m , and C(Q) = V = C(A) which is of dimension n in IR m • Moreover, the matrix R is an invertible n x n matrix, since each b jj = (uj , C j) is equal to IIcj - ProjUj_l (Cj) II , where Vj_1 is the subspace of IR m spanned by {CI, C2, . .. , cj-d (or equivalently, by (UI , U2, . • •, uj-d), and so bjj ;6 0 for all j because Cj ~ Vj-I. One of the byproducts of this computation is the following theorem. Theorem 5.27· Any m x n matrix ofrank n can be factored into a product Q R, where Q is an m x n matrix with orthonormal columns and R is an n x n invertible upper triangular matrix. Definition 5.11 The decomposition A = QR is called the QR factorization or the QR decomposition of an m x n matrix A, (rank A = n), where the matrix Q [UI U2 .. . un] is called the orthogonal part of A, and the matrix R [bij] is called the upper triangular part of A .

=

=

Remark: In the QR factorization of A = QR, the orthononnality of the column vectors of Q means QT Q = In, and the j-th column of the matrix R is simply the coordinate vector of Cj with respect to the orthonormal basis f3 = {u I, U2 , • .. , un} for V : i.e., and so

o

o

With this QR decomposition of A, the projection matrix and the least squares solution of Ax = b for b E IR m can be calculated easily as

P

Xo

= =

A(A T A)-I AT (AT A)-I ATb

= QR(R T QT QR)-I R T QT = QQT, = (R T QT QR)-I R T QTb = R- 1 QTb.

Corollary 5.28 Let A be an m x n matrix of rank n and let A = Q R be its Q R factorization. Then, (1) the projection matrix on the column space of A is [Prok(A)]a = Q QT . (2) The least squares solution of the system Ax = b is given by XQ = R - I QTb, which can be solved by using back substitution to the system Rx = QTb.
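Corollary 5.28 can be tried out with NumPy's built-in QR routine. The sketch below is an added illustration (it uses `numpy.linalg.qr` instead of the hand Gram-Schmidt computation above, and the right-hand side b is arbitrary): it forms Q and R, computes the projection matrix QQᵀ, and solves Rx = Qᵀb, which amounts to back substitution since R is triangular.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])       # the matrix of Example 5.20 (rank 3)
b = np.array([1.0, 2.0, 3.0, 4.0])    # an arbitrary right-hand side

Q, R = np.linalg.qr(A, mode='reduced')   # Q: 4x3 with orthonormal columns, R: 3x3 upper triangular
# (NumPy may choose signs so that some diagonal entries of R are negative;
#  QR still equals A, whereas the text's convention makes the diagonal positive.)

P  = Q @ Q.T                             # projection matrix onto C(A)
x0 = np.linalg.solve(R, Q.T @ b)         # least squares solution from R x = Q^T b

print(np.allclose(P @ b, A @ x0))           # both equal Proj_{C(A)}(b)
print(np.allclose(A.T @ (b - A @ x0), 0))   # the residual is orthogonal to C(A)
```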

194

Chapter 5. Inner Product Spaces

Example 5.20 (QR decomposition of A) Find the QR factorization A = QR and the orthogonal projection matrix P = [PrOk(A)]a for

A=

[CI C2 C3]

=

[

1 1 0] 101 0 1 1 . 001

Solution: We first find the decomposition of A into Q and R, the orthogonal part and the upper triangular part. Use the Gram-Schmidt orthogonalization to get the column vectors of Q:

ql = q2 q3 and IIqlll

CI

= (I, I, 0, 0)

= (~ , -~,

=

C2 - C2' ql ql ql . ql

=

C3 - C3 . q2 q2 _ C3 . ql ql = q2 . q2 ql . ql

2

1,

2

0) (-~ , ~ , ~, 1) , 3 3

3

= -/2, IIq211 = .J'J72, IIq311 = ../173. Hence, UI U2

=

.s, _

=

~_

U3 =

IIqJII IIq211 -

(_1 _1 00) -/2' -/2'

,

0)

(_1 _ _ 1 -/2 ../6 ' ../6'-.13'

q3 ( IIq311 = -

2

2

-.13)

2

.J2T' .J2T' .J2T' .;7 .

Then, CI = -/2uI, C2 = JzUI + .j[U2' C3 = JzUI + ~U2 + ftU3 ' In fact, these equations can also be derived from A = QR with an upper triangular matrix R . (It gives that the (i, j)-entry of R is bij = (Uj , Cj).) Therefore,

A =

[H :] o

= [UI U2 U3]

0 1

[~lj~ .;7/-.13 ~j~ ] 0

-2/.J2T] [ 2/.J2T o -/2/-.13 2/.J2T o 0 -.13/.;7

1/-/2 1/../6 1/-/2 -1/../6

= [

0

-/2 0 0

1/-/2

-.13/-/2 0

1/-/2] 1/../6

.;7/-.13

= QR,

and

6/7 p=QQT=

[

_~~~ -2/7

1/7

1/7

2/7

2/7

6/7 -1/7 -1/7 6/7

- 2/ 7 ] 2/7 2/7 .

3/7

o

5.9.3. Application: Orthogonal projection matrices Problem 5.31 Find the 2 x 2 matrix P that projects the xy -plane onto the line y

195

=x .

[i ~ l

Problem 5.32 Find the projection matrix P of the Euclidean 3-space]R3 onto the column space

C(A)focA~

Problem 5.33 Find the projection matrix P on the XJ, X2, X4 coordinate subspace of the Euclidean 4-space ]R4. Problem 5.34 Find the QR factorization of the matrix [ sin 88

cos

cos 0

8

]

.

As the last part of this section , we introduce a characterization of the orthogonal projection matrices. Theorem 5.29 A square matrix P is an orthogonal projection matrix if andonly if it is symmetric and idempotent, i.e., pT = P and p 2 = P. Proof: Let P be an orthogonal projection matrix . Then , the matrix P can be written as P = A(A T A)-J AT for a matrix A whose column vectors form a basis for the column space of P. It gives

pT

=

(A(A T A)-JATf

p2

=

PP

= A(A T A)-J T AT = A(A T A)-JA T = P,

= (A(A T A)-JA T )

(A(A T A)-JA T )

= A(A T A)-lA T = P.

In fact, this second equation was already shown in Theorem 5.9. Conversely, by Theorem 5.16, one has the orthogonal decomposition jRm = C(P)EB N(p T). But, N(p T) = N(P) since p T = P. Thus , for any u + 0 E C(P) EB N(p T) = jRm, P(u+o) = Pu+ Po = Pu = u, because p 2 = P implies Pu = u for u E C(P) . It shows that P is an orthogonal projection matrix. (Alternatively, one can use directly Theorem 5.9). 0 From Corollary 5.10, if P is a projection matrix on C(P), then 1- P is a projection matrix on the null space N(p) (= C(l - P)), which is orthogonal to C(P) (=
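Theorem 5.29 gives a two-line numerical test for whether a square matrix is an orthogonal projection matrix. The following sketch (added here for illustration) builds P = A(AᵀA)⁻¹Aᵀ from a matrix with independent columns and confirms Pᵀ = P and P² = P.

```python
import numpy as np

A = np.array([[ 0.0, 1.0],
              [ 3.0, 2.0],
              [-1.0, 0.0]])           # columns: the basis of the plane in Example 5.18

P = A @ np.linalg.inv(A.T @ A) @ A.T  # orthogonal projection matrix onto C(A)

print(np.allclose(P, P.T))            # symmetric
print(np.allclose(P @ P, P))          # idempotent
print(P @ np.array([1.0, 0.0, 1.0]))  # Pb for b = (1, 0, 1), as in Example 5.18
```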

N(l - P)). Example 5.21 Let Pi : jRm --+ jRm be defined by

Pi(Xl,··· ,Xm)=(O, . . . ,O, Xi, 0, . .. ,0), for i = 1, 2, ... , m. Then, each Pi is the projection of jRm onto the i -th axis, whose matrix form looks like

Chapter5. Inner Product Spaces

196

Pi

=

o

o

o

o

I - Pi =

o

o

o

When we restrict the image to JR, Pi is an element in the dual space JRn*, and usually 0 denoted by Xi as the i-th coordinate function (see Example 4.25). Problem 5.35 Show that any square matrix P that satisfies p T P

= P is a projection matrix.

5.10 Exercises 5.1. Decide which of the following functions on lR.2 are inner products and which are not. For x = (XltX2), Y = (Yl, Y2) in]R2

= XlY l X2Y2. = 4Xl Yl + 4X2Y2 - Xl Y2 - X2Yl, (3) (x, y) = Xl Y2 - X2Yl, (4) (x, y) = XlYI + 3X2Y2, (5) (x, y) = XlYI - Xl Y2 - X2Yl + 3X2Y2· Show that the function (A, B) = treAT B) for A, B E Mnxn(]R) defines an inner product (1)

(x, y)

(2) (x, y)

5.2.

on Mnxn(lR) . 5.3. Find the angle between the vectors (4, 7, 9, 1, 3) and (2, 1, 1, 6, 8) in ]R5. 5.4. Determine the values of k so that the given vectors are orthogonal with respect to the Euclidean inner product in ]R4.

(l)IUlun (2)IUH -nl

5.5. Consider the space qo, 1] with the inner product defined by

ir. g) =

f

f(x)g(x)dx .

Compute the length of each vector and the cosine of the angle between each pair of vectors in each of the following :

= 1, g(x) = X; = x m, g(x) = x n , where m, n are positive integers; f( x) = sin,.,rrx, g(x) = cosrerx, where m, n are positive integers.

(1) f(x) (2) f(x) (3)

5.6. Prove that

(al for any real numbers al, a2,

+

+

an)2 ::: near

+ ... +

a;)

, an. When does equality hold?

5.10. Exercises

197

5.7. Let V = P2([0, 1]) be the vector space of polynomials of degree g 2 on [0, 1] equipped with the inner product

(f, g) (1) Compute (f, g) and

1If11

= 10

1

f(t)g(t)dt.

for f(x) = x + 2 and g(x) = x 2 - 2x - 3.

(2) Find the orthogonal complement of the subspace of scalar polynomials. 5.8. Find an orthonormal basis for the Euclidean 3-space ]R3 by applying the Gram-Schmidt orthogonalization to the three vectors x = (1, 0, 1), X2 = (1, 0, - 1), x3 = (0, 3, 4). 5.9. Let W be the subspace of the Euclidean space]R3 spanned by the vectors VI = (1, 1, 2) andv2=(I , 1, -1). Find Projg-Ib) forb = (1, 3, -2) . 5.10. Show that if u is orthogonal to v, then every scalar multiple of u is also orthogonal to v, Find a unit vector orthogonal to VI = (1, 1, 2) and v2 = (0, 1, 3) in the Euclidean 3-space ]R3. 5.11. Determine the orthogonal projection of VI onto v2 for the following vectors in the n-space ]Rn with the Euclidean inner product. (1) VI = (1, 2, 3), v2 = (1, 1, 2), (2) VI = (1,2, 1), V2 = (2, 1, -1),

(3) VI = (1, 0, 1, 0) , v2 = (0, 2, 2, 0). 5.12. Let S = {v;}, where Vi'S are given below. For each S, find a basis for Sol with respect to the Euclidean inner product on ]Rn. (1) VI

= (0, 1, 0),

v2

= (0, 0,

1),

(2) VI = (1, 1, 0), V2 = (1, 1, 1), (3) VI = (1, 0, 1,2), v2 = (1, 1, 1, 1), vs = (2, 2, 0, 1). 5.13. Which of the following matrices are orthogonal? 1/2 -1/3 ] 1/3' (1) [ -1/2

°

1/./2 (3)

°

-1/./2 [ -1/./2 1/./2

4/5 -3/5 ] 4/5' (2) [ -3/5 -1/./2] 1/./2 ,

°

(4)

1/ ./2 1/./3 -1/..;'6] 1/./2 -1/./3 1/..;'6 . [ 2/..;'6 1/./3

°

5.14. Let W be the subspace of the Euclidean 4-space ]R4 consisting of all vectors that are orthogonal to both x = (1, 0, -1 , 1) and Y = (2, 3, -1, 2). Find a basis for the subspace W. 5.15. Let V be an inner product space. For vectors x and Yin V , establish the following identities : (1) (x, y) = (2) (x,y) =

! IIx + yll2 - ! IIx -

yll2

1

(lIx+YII2 -lIxll 2 -IIYI12)

(3) IIx+ YII 2 + IIx - YII 2 = 2(lIx1l 2 + IIYII 2)

(polarization identity) , (polarization identity), (parallelogram equality).

5.16. Show that x + Yis perpendicular to x - Yif and only if [x] = IIYII .

198

Chapter 5. Inner Product Spaces

o Figure 5.6. n-dimensional parallelepiped P(A)

5.17. Let A be the m x

n matrix whose columns are Cl, C2, ... , Cn in the Euclidean m-space IRm . Prove that the volume of the n-dimensional parallelepiped P(A) determined by those vectors Cj 's in IRm is given by

vol (A)

=

J

det(AT A) .

(Note that the volume of the n-dimensional parallelepiped determined by the vectors c j , c2, ... , Cn in IRm is by definition the multiplication of the volume of the (n - 1)dimensional parallelepiped (base) determined by C2, . . . , Cn and the height of cl from the plane W which is spanned by C2, .. . , Cn' Here, the height is the length of the vector C = q - Proj W(q ), which is orthogonalto W . (See Figure 5.6.)lfthe vectors are linearly dependent, then the parallelepiped is degenerate, i.e., it is contained in a subspace of dimension less than n .)

5.18. Find the volume of the three-dimensional tetrahedron in the Euclidean 4-space

jR4

whose

vertices are at (0,0,0,0), (1,0,0,0), (0, 1,2,2) and (0,0, 1,2).

5.19. For an orthogonal matrix A, show that det A matrix A for which det A = -1.

= ±1. Give an example of an orthogonal

5.20. Find orthonormal bases for the row space and the null space of each of the following matrices.

(I)

243] 1 1 I ,

[2

0 I

(2)

[

1 40]

-2 -3 1 002

,

5.21. Let A be an m x n matrix of rank r. Find a relation among m, nand r so that Ax = b has infinitely many solutions for every b E jRm . 5.22. Find the equation of the straight line that fits best the data of the four points (0, 1), (1, 3), (2, 4), and (3, 4). 5.23. Find the cubic polynomial that fits best the data of the five points (-1 , -14), (0, -5), (1, -4), (2, 1), and (3, 22). 5.24. Let W be the subspace of the Euclidean 4-space IR4 spanned by the vectors Xj 'S given in each of the following problems. Find the projection matrix P for the subspace W and the null space N(P) of P. Compute Pb for b given in each problem.

5.10. Exercises

199

(1) xI = (1, I , 1, 1),X2=(1 , -I , 1, -1),X 3= (-I , 1, 1, O),and b = (1, 2, I, 1). (2) xI = (0, -2, 2, I ), X2 = (2, 0, -1, 2), and b = (1, 1, I , 1). (3) XI = (2, 0, 3, -6),X2 = (- 3, 6, 8, O),andb= (-I , 2, -I , 1).

5.25. Find the matrix for orthogonal projection from the Euclidean 3-space jR3 to the plane spanned by the vectors (1, 1, 1) and (1, 0, 2). 5.26. Find the projection matrix for the row space and the null space of each of the following

[{s -f ],

matrices: (1)

../5

../5

(2)

I] 1 1 I ' [ 24

(3)

1 4 0] 0 0 2 . [ 2 3 -1

5,27. Consider the space C[ -1, 1] with the inner product defined by

U, g) =

11

-I

f(x)g(x)dx.

A function f E C[-I, 1] is even if f(-x) = f(x), or odd if f(-x) = - f(x) . Let U and V be the sets of all even functions and odd functions in C[ -1, 1], respectively. (1) Prove that U and V are subspace s and C[-I , 1] = U

+

V.

(2) Prove that U .1 V. (3) Prove that for any f E C[-l , IJ.11fll 2 = IIh1l 2 + IIgll 2 where f = h+g E U Etl V.

5.28. Determine whether the following statements are true or false, in general, and justify your answers. (1) An inner product can be defined on any vector space. (2) Two nonzero vectors X and Y in an inner product space are linearly independent if and only if the angle between x and Yis not zero. (3) If V is perpendicular to W , then V .L is perpendicular to W .L. (4) Let V be an inner product space . Then IIx - YII ~ [x] - IIYII for any vectors Xand Y in V.

(5) Every permutation matrix is an orthogonal matrix . (6) For any n x n symmetric matrix A, x T Ay defines an inner product on R", (7) A square matrix A is a projection matrix if and only if A 2 = J. (8) For a linear transformation T : jRn --+ jRn, T is an orthogonal projection if and only if idlR.n - T is an orthogonal projection. (9) For any m x n matrix A , the row space R(A) and the column space C(A) are orthogonal. (10) A linear transformation T is an isomorphism if and only if it is an isometry . (11) For any m x n matrix A and bE jRm , AT Ax = ATb always has a solution. (12) The least squares solution of Ax = b is unique for any matrix A . (13) The least squares solution of Ax = b is the orthogonal projection of b on the column space of A.

6 Diagonalization

6.1 Eigenvalues and eigenvectors

Gaussian elimination plays a fundamental role in solving a system Ax = b of linear equations. In general, instead of solving the given system, one could try to solve the normal equation AᵀAx = Aᵀb, whose solutions are the true solutions or the least squares solutions depending on whether or not the given system is consistent. Note that the matrix AᵀA is a symmetric square matrix, and so one may assume that the matrix in the system is a square matrix. For this kind of reason, we focus on a square matrix, or a linear transformation from a vector space to itself, throughout this chapter. Recall that a square matrix A, as a linear transformation on ℝⁿ, may have various matrix representations depending on the choice of the bases, and they are all similar to one another. In particular, A itself is the matrix representation with respect to the standard basis. One may now ask whether or not there exists a basis β with respect to which the matrix representation [A]_β of A is diagonal. But then A and the diagonal matrix D = [A]_β are similar: i.e., there is an invertible matrix Q such that D = Q⁻¹AQ. In this chapter, we will see which matrices can have diagonal matrix representations and how one can find such representations. For this we introduce eigenvalues and eigenvectors, which play important roles in their own right in mathematics and have far-reaching applications not only in mathematics, but also in other fields of science and engineering. Some specific applications of the diagonalization of a square matrix A are to

(1) solve a system Ax = b of linear equations,
(2) check the invertibility of A or estimate det A,
(3) calculate a power Aⁿ or the limit of a matrix series Σ_{n=1}^∞ Aⁿ,
(4) solve systems of linear differential equations or difference equations,
(5) find a simple form of the matrix representation of a linear transformation, etc.

One might notice that some of these problems are easy if A is diagonal.

Definition 6.1 Let A be an n × n square matrix. A nonzero vector x in the n-space ℝⁿ is called an eigenvector (or characteristic vector) of A if there is a scalar λ in ℝ such that Ax = λx. The scalar λ is called an eigenvalue (or characteristic value) of A, and we say x belongs to λ.

Geometrically, an eigenvector of a matrix A is a nonzero vector x in the n-space ℝⁿ such that the vectors x and Ax are parallel. In other words, the subspace W spanned by x is invariant under the linear transformation A : ℝⁿ → ℝⁿ in the sense that A(W) ⊆ W. Algebraically, an eigenvector x is a nontrivial solution of the homogeneous system (λI − A)x = 0 of linear equations, that is, an eigenvector x is a nonzero vector in the null space N(λI − A). There are two unknowns in the system (λI − A)x = 0: an eigenvalue λ and an eigenvector x. To find those unknowns, we first determine an eigenvalue λ by using the fact that the equation (λI − A)x = 0 has a nontrivial solution x if and only if λ satisfies the equation det(λI − A) = 0, called the characteristic equation of A. Note that det(λI − A) is a polynomial of degree n in λ, called the characteristic polynomial of A. Thus, the eigenvalues are just the roots of the characteristic equation det(λI − A) = 0. Next, the eigenvectors of A can be determined by solving the homogeneous system (λI − A)x = 0 for each eigenvalue λ. In summary, by referring to Theorem 3.26 we have the following theorem.

=

Theorem 6.1 For any square matrix A, the following are equivalent:
(1) λ is an eigenvalue of A;
(2) det(λI − A) = 0 (or det(A − λI) = 0);
(3) λI − A is singular;
(4) the homogeneous system (λI − A)x = 0 has a nontrivial solution.

=

Recall that the eigenvectors of A belonging to an eigenvalue λ are just the nonzero vectors x in the null space N(λI − A). This null space is called the eigenspace of A belonging to λ, and is denoted by E(λ).

Example 6.1 (Matrix having distinct eigenvalues) Find the eigenvalues and eigenvectors of

    A = [ 2   √2 ]
        [ √2   1 ].

Solution: The characteristic polynomial is det(A./ - A) = det [ A - r,;2 -.J2] = A2 - 3A = A(A- 3) . -"\12 A- I Thus the eigenvalues are Al = 0 and A2 = 3. To determine the eigenvectors belonging to Ai 'S, we should solve the homogeneous system of equations (AiI - A)x O. Let us take Al = 0 first; then the system of equations (All - A)x = 0 becomes

=

6.1. Eigenvalues and eigenvectors {

-2 x I

= =

-/2X2

--/2 x I

x2

0,

0,

or

X2

=

203

-h XI .

Hence, XI = (XI , X2) = (-1, -/2) is an eigenvector belonging to AI = 0, and E(O) = {txI : t E R}. (Here, one can take any nonzero solution (XI, X2) as an eigenvector XI belonging to AI 0.) For A2 = 3, the system of equations (A2l - A)x = 0 becomes

=

XI { --/2 XI

-

= =

-/2x2 2 X2

+

0, 0,

or

XI

=

h

X2 .

Thus, by a similar calculation, X2 = (-/2, I) is one of the eigenvectors belonging to A2 = 3 and E (3) = {tx2 : t E R}. Note that the eigenvectors XI and X2 belonging to the eigenvalues AI and A2 respectively are linearly independent. 0 Example 6.2 (Matrix having a repeated eigenvalue butfull eigenvectors) Find a basis for the eigenspaces of A=

3-2 0]

[o -2

3 0 0 5

.

Solution: The characteristic polynomial of A is (A- l )(A- 5)2, so thattheeigenvalues of A are AI = 1 and A2 = 5 with multiplicity 2. Thus, there are two eigenspaces of A . By definition, X = (XI, X2, X3) is an eigenvector of A belonging to A if and only if Xis a nontrivial solution of the homogeneous system (Al - A)x = 0:

[

A-2 3 A -23 00] [ o

0

A- 5

XI] X2 X3

=

[

0] 0 . 0

If Al = I, then the system becomes

Solving this system yields XI = t, X2 = t , X3 = 0 for t E R. Thus , the eigenvectors belonging to AI = 1 are nonzero vectors of the form

so that (1, 1, 0) is a basis for the eigenspace E(AI) belonging to Al = 1. If A2 = 5, then the system becomes

204

Chapter 6. Diagonalization

Solving this system yields Xl = -s, X2 = s, X3 = t for s, t E R Thus, the eigenvectors of A belonging to )..2 = 5 are nonzero vectors of the form

for s, t E R Since (-I , 1, 0) and (0, 0, 1) are linearly independent, they form a basis for the eigenspace E()..2) belonging to)..2 = 5. 0 For each eigenvalue), of A in Examples 6.1 and 6.2, one can see that the dimension of the eigenspace E()") is equal to the multiplicity of ).. as a root of the equation det(H - A) = O. But, in general it is not true as the next example shows. Example 6.3 (Matrix having a repeated eigenvalue with insufficient eigenvectors) Consider the matrix

A~[H i~ n

A simple computation shows that the characteristic polynomial of the matrix A is ().. - 2)5 so that the eigenvalue X = 2 is of multiplicity 5. However, there is only one linearly independent eigenvector ej = (I , 0, 0, 0, 0) belonging to ).. = 2 because rank(2/ - A) = 4, which shows that dim E()") = dimN(21 - A) = 1 is less than the multiplicity of ).. . This kind of matrix will be discussed later in Chapter 8. 0 Note that the equation det()..] - A) = 0 may have complex roots, which are called complex eigenvalues. However, the complex numbers are not scalars of the real vector space. In many cases, it is necessary to deal with those complex numbers, that is, we need to expand the set of scalars to the set of complex numbers. This expansion of the set of scalars to the set of complex numbers leads us to work with complex vector spaces, which will be treated in Chapter 7. In this chapter, we restrict our discussion to the case of real eigenvalues, even though the entire discussion in this chapter applies in the same way to the complex vector spaces. Example 6.4 (Matrix having complex eigenvalues) The characteristic polynomial of the matrix A = [ c~s 0 - sin 0 ] smO cosO is )..2 - 2 cos 0)" + (cos 20 + sin 2 0) . Thus, the eigenvalues are X = cosO ± i sinO, which are complex numbers, so this matrix as a rotation of]R2 has no real eigenvalues 0 unless 0 = nn, n = 0, ±l, ±2, . . . .

6.1. Eigenvalues and eigenvectors

205

Problem 6.1 Let x be an eigenvalue of A and let x be an eigenvector belonging to )... Use mathematical induction to show that xm is an eigenvalue of Am and x is an eigenvector of Am belonging to ).. m for each m = 1, 2, . . . .

In the following, we derive some basic properties of the eigenvalues and eigenvectors. Lemma 6.2 (1) If A is a triangular matrix, then the diagonal entries are exactly the eigenvalues of A. (2) If A and B are square matrices similar to each other; then they have the same characteristic polynomial.

Proof: (1) The characteristic equation of an upper triangular matrix is

=

(A - all) . .. (A - ann) = O.

(2) Since there exists a nonsingular matrix Q such that B det(AI - B) =

= Q-I AQ ,

det (Q-I(Al)Q - Q-I AQ) det (Q-IO.. I - A)Q)

=

det Q-I det (AI - A) det Q

=

det(H - A).

0

Lemma 6.2(2) says that similar matrices have the same eigenvalues, i.e., the eigenvalues are invariant under the similarity. However, their eigenvectors might be different: in fact, x is an eigenvector of B belonging to A if and only if Qx is an eigenvector of A belonging to A, since A Q = Q B, and A Qx = QBx = AQx .

Theorem 6.3 Let an n x n matrix A have n eigenvalues AI, A2. . .. , An possibly with repetition. Then, (1) det A = Al A2 .. . An, (the product of the n eigenvalues), (2) tr(A) = Al + A2 + .. . + An, (the sum ofthe n eigenvalues).

Proof: Since the eigenvalues λ1, λ2, ..., λn are the zeros of the characteristic polynomial of A, we have
(λ − λ1)(λ − λ2) ⋯ (λ − λn) = det(λI − A) = det [ λ − a11    −a12    ⋯     −a1n
                                                    −a21    λ − a22   ⋯     −a2n
                                                      ⋮         ⋮              ⋮
                                                    −an1      −an2    ⋯   λ − ann ],
which is a polynomial of the form p(λ) = λ^n + c_{n−1}λ^{n−1} + ⋯ + c1λ + c0 in λ.
(1) If we take λ = 0 on both sides, then we get (−λ1)(−λ2) ⋯ (−λn) = det(−A) = (−1)^n det A, that is, det A = λ1 λ2 ⋯ λn.
(2) On the other hand, one can compute the coefficient c_{n−1} of λ^{n−1} in two ways by expanding both sides, and get λ1 + λ2 + ⋯ + λn = a11 + a22 + ⋯ + ann = tr(A). □
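Theorem 6.3 is easy to test numerically. The following minimal sketch (NumPy assumed; the 3 × 3 matrix is an arbitrary example) compares the product and the sum of the computed eigenvalues with det A and tr(A):

    import numpy as np

    A = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 2.0],
                  [0.0, 2.0, 1.0]])
    eig = np.linalg.eigvals(A)
    print(np.prod(eig), np.linalg.det(A))   # product of eigenvalues = det A
    print(np.sum(eig),  np.trace(A))        # sum of eigenvalues = tr(A)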

Problem 6.2 Show that
(1) for any 2 × 2 matrix A, det(λI − A) = λ^2 − tr(A)λ + det A;
(2) for any 3 × 3 matrix A, det(λI − A) = λ^3 − tr(A)λ^2 + (1/2) Σ_{i≠j} (a_{ii}a_{jj} − a_{ij}a_{ji}) λ − det A.

In Theorem 6.3, we assume that the matrix A has n (real) eigenvalues counting multiplicities. But, by allowing the scalars to be complex numbers, which will be done in the next chapter, every n x n matrix has n eigenvalues counting multiplicities, so that Theorem 6.3 remains true for any square matrix.

Corollary 6.4 The determinant and the trace of A are invariant under similarity.
Recall that any square matrix A is singular if and only if det A = 0. However, det A is the product of its n eigenvalues. Thus a square matrix A is singular if and only if zero is an eigenvalue of A, or equivalently, A is invertible if and only if zero is not an eigenvalue of A. The following corollaries are easy consequences of this fact.

Corollary 6.5 For any n × n matrices A and B, the following are equivalent.
(1) Zero is an eigenvalue of AB. (2) A or B is singular. (3) Zero is an eigenvalue of BA.


Corollary 6.6 For any n × n matrices A and B, the matrices AB and BA have the same eigenvalues.
Proof: By Corollary 6.5, zero is an eigenvalue of AB if and only if it is an eigenvalue of BA. Let λ be a nonzero eigenvalue of AB with (AB)x = λx for a nonzero vector x. Then the vector Bx is not zero, since λ ≠ 0, and (BA)(Bx) = B(ABx) = λ(Bx).
This means that Bx is an eigenvector of BA belonging to the eigenvalue λ, and λ is an eigenvalue of BA. Similarly, any nonzero eigenvalue of BA is also an eigenvalue of AB. □
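Corollary 6.6 can also be observed numerically; here is a brief sketch (NumPy assumed, with randomly generated matrices) comparing the spectra of AB and BA:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    ev_AB = np.sort_complex(np.linalg.eigvals(A @ B))
    ev_BA = np.sort_complex(np.linalg.eigvals(B @ A))
    print(np.allclose(ev_AB, ev_BA))        # True: AB and BA have the same eigenvalues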

Problem 6.3 Find matrices A and B such that det A = det B and tr(A) = tr(B), but A is not similar to B.
Problem 6.4 Show that A and A^T have the same eigenvalues. Do they necessarily have the same eigenvectors?
Problem 6.5 Let λ1, λ2, ..., λn be the eigenvalues of an n × n matrix A. Show that
(1) A is invertible if and only if λi ≠ 0 for all i = 1, 2, ..., n;
(2) if A is invertible, then the inverse A⁻¹ has eigenvalues 1/λ1, 1/λ2, ..., 1/λn.
Problem 6.6 For any n × n matrices A and B, show that AB and BA are similar if A or B is nonsingular. Is it true for two singular matrices A and B?

6.2 Diagonalization of matrices
In this section, we are going to show what kinds of square matrices are similar to diagonal matrices. That is, given a square matrix A, we want to know whether there exists an invertible matrix Q such that Q⁻¹AQ is a diagonal matrix, and if so, how one can find such a matrix Q.
Definition 6.2 A square matrix A is said to be diagonalizable if there exists an invertible matrix Q such that Q⁻¹AQ is a diagonal matrix (i.e., A is similar to a diagonal matrix).
If a square matrix A is diagonalizable, then the similarity D = Q⁻¹AQ gives an easy way to solve some problems related to the matrix A, like (1)-(5) on page 201. For instance, let Ax = b be a system of linear equations with a square matrix A, and suppose that there is an invertible matrix Q such that Q⁻¹AQ is a diagonal matrix D. Then the system Ax = b can be written as QDQ⁻¹x = b, or equivalently DQ⁻¹x = Q⁻¹b. Hence, for c = Q⁻¹b, the solution y of Dy = c yields the solution x = Qy of the system Ax = b. Note that Dy = c can be solved easily.
The next theorem characterizes a diagonalizable matrix, and the proof shows a practical way of diagonalizing a matrix.
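Before turning to that theorem, the Dy = c idea just described can be illustrated with a short computation (NumPy assumed; the 2 × 2 system is an arbitrary invertible example, not one taken from the text):

    import numpy as np

    A = np.array([[4.0, 1.0], [2.0, 3.0]])   # diagonalizable, eigenvalues 5 and 2
    b = np.array([1.0, 0.0])
    evals, Q = np.linalg.eig(A)              # A = Q D Q^{-1}, columns of Q are eigenvectors
    c = np.linalg.solve(Q, b)                # c = Q^{-1} b
    y = c / evals                            # solve the diagonal system D y = c
    x = Q @ y                                # x = Q y solves A x = b
    print(np.allclose(A @ x, b))             # True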


Theorem 6.7 Let A be an n × n matrix. Then A is diagonalizable if and only if A has n linearly independent eigenvectors.

Proof: (⇒) Suppose A is diagonalizable. Then there is an invertible matrix Q such that Q⁻¹AQ is a diagonal matrix D, say
Q⁻¹AQ = D = [ λ1  0  ⋯  0
               0  λ2 ⋯  0
               ⋮       ⋱
               0   0  ⋯ λn ],
or, equivalently, AQ = QD. Let x1, ..., xn denote the column vectors of Q. Since
AQ = [Ax1 Ax2 ⋯ Axn]  and  QD = [λ1x1 λ2x2 ⋯ λnxn],
the matrix equation AQ = QD implies Axi = λixi for i = 1, ..., n. Moreover, since Q is invertible, its column vectors are nonzero and linearly independent; that is, the xi's are n linearly independent eigenvectors of A.
(⇐) Conversely, if A has n linearly independent eigenvectors x1, ..., xn belonging to the eigenvalues λ1, ..., λn, then the matrix Q = [x1 x2 ⋯ xn] is invertible and AQ = QD for the diagonal matrix D with diagonal entries λ1, ..., λn, so Q⁻¹AQ = D. □
On the other hand, a matrix may have an eigenvalue of multiplicity greater than 1, so that the number of distinct eigenvalues is strictly less than n. In this case, if such a matrix still has n linearly independent eigenvectors, then it is also diagonalizable, because for a diagonalization all we need is n linearly independent eigenvectors (see Example 6.2, or try with the identity matrix In). In some cases, such a matrix does not have n linearly independent eigenvectors (see Example 6.3), so a diagonalization is impossible. This case will be discussed in Chapter 8.
The next example shows a simple application of the diagonalization to the computation of the power A^n of a matrix A.

Example 6.6 Compute A^100 for
A = [ 1 4
      3 2 ].
Solution: Its eigenvalues are 5 and −2 with associated eigenvectors (1, 1) and (−4, 3), respectively. Hence Q = [ 1 −4; 1 3 ] diagonalizes A, i.e., Q⁻¹AQ = D = diag(5, −2).


Therefore,
A^100 = QD^100Q⁻¹ = [ 1 −4; 1 3 ] · diag(5^100, (−2)^100) · (1/7)[ 3 4; −1 1 ]
      = (1/7) [ 3·5^100 + 4·2^100    4·5^100 − 4·2^100
                3·5^100 − 3·2^100    4·5^100 + 3·2^100 ]. □
Problem 6.9 For the matrix
A = [  5  −4   4
      12 −11  12
       4  −4   5 ],
(1) diagonalize the matrix A; and (2) find the eigenvalues of A^10 + A^7 + 5A.
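The power computation of Example 6.6 can be checked numerically. The sketch below (NumPy assumed; the matrix entries follow Example 6.6 above) compares QD^nQ⁻¹ with a direct matrix power, using a smaller exponent to stay well within floating-point range:

    import numpy as np

    A = np.array([[1.0, 4.0], [3.0, 2.0]])
    Q = np.array([[1.0, -4.0], [1.0, 3.0]])  # columns are the eigenvectors (1,1) and (-4,3)
    D = np.diag([5.0, -2.0])
    A10_diag   = Q @ np.linalg.matrix_power(D, 10) @ np.linalg.inv(Q)
    A10_direct = np.linalg.matrix_power(A, 10)
    print(np.allclose(A10_diag, A10_direct))   # True: A^10 = Q D^10 Q^{-1}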

6.3 Applications
6.3.1 Linear recurrence relations
Early in the thirteenth century, Fibonacci posed the following problem: "Suppose that a newly born pair of rabbits produces no offspring during the first month of their lives, but each pair gives birth to a new pair once a month from the second month onward. Starting with one (= x1) newly born pair in the first month, how many pairs of rabbits can be bred in a given time, assuming no rabbit dies?"
Initially, there is one pair. After one month there is still one pair, but two months later it gives birth, so there are two pairs. If at the end of n months there are xn pairs, then after n + 1 months the number will be the xn pairs plus the number of offspring of the x_{n−1} pairs who were alive at n − 1 months. Therefore, we have for n ≥ 2,
x_{n+1} = x_n + x_{n−1}.
Here, if we assume x0 = 0 and x1 = 1, then the first several terms of the sequence become
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ....
This sequence is called the Fibonacci sequence and each term is called a Fibonacci number.
Example 6.7 Find the 2000th Fibonacci number.
Solution: A standard trick is to consider a trivial extra equation x_n = x_n together with the given equation:
{ x_{n+1} = x_n + x_{n−1}
{ x_n     = x_n.
Equivalently, in matrix notation,


which is of the form
x_n = A x_{n−1} = A^n x_0,   n = 1, 2, ...,
where x_n = (x_{n+1}, x_n)^T, x_0 = (x_1, x_0)^T = (1, 0)^T, and A = [ 1 1; 1 0 ].
Thus, the problem is reduced to computing A^n. However, a simple computation gives the eigenvalues λ1 = (1 + √5)/2 and λ2 = (1 − √5)/2 of A and their associated eigenvectors v1 = (λ1, 1) and v2 = (λ2, 1), respectively. Moreover, the basis-change matrix and its inverse are found to be
Q = [ (1+√5)/2  (1−√5)/2; 1  1 ],   Q⁻¹ = (1/√5) [ 1  −(1−√5)/2; −1  (1+√5)/2 ].
With D = diag( (1+√5)/2, (1−√5)/2 ), we have A^n = QD^nQ⁻¹ for n = 1, 2, ....
For instance, if n = 2000, then the vector x_2000 = (x_2001, x_2000)^T equals A^2000 x_0 = QD^2000Q⁻¹ x_0. It gives
x_2000 = (1/√5) [ ((1+√5)/2)^2000 − ((1−√5)/2)^2000 ].
In general, the Fibonacci numbers satisfy
x_n = (1/√5) [ ((1+√5)/2)^n − ((1−√5)/2)^n ]   for n ≥ 0.
Note that since x_2000 must be an integer, we look for the nearest integer to the huge number (1/√5)((1+√5)/2)^2000, because |(1−√5)/2|^k is actually very small for large k.
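The closed form just obtained is easy to check against the recurrence itself; the following sketch (Python with NumPy assumed) diagonalizes the companion matrix A = [1 1; 1 0] numerically and compares the result with direct iteration:

    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, 0.0]])   # companion matrix of x_{n+1} = x_n + x_{n-1}
    x0 = np.array([1.0, 0.0])                # (x_1, x_0)
    evals, Q = np.linalg.eig(A)              # (1 + sqrt 5)/2 and (1 - sqrt 5)/2
    c = np.linalg.solve(Q, x0)

    def fib(n):
        # the last entry of A^n x_0 = Q D^n Q^{-1} x_0 is x_n
        return (Q @ (evals**n * c))[-1]

    a, b = 0, 1
    ok = True
    for n in range(30):
        ok = ok and round(fib(n)) == a       # compare with direct iteration of the recurrence
        a, b = b, a + b
    print(ok)                                # True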


Historically, the number (1 + √5)/2, which is very close to the ratio x_{n+1}/x_n for large n, is called the golden mean. □
Remark: The golden mean is one of the mysterious naturally occurring numbers, like e = 2.71828182··· or π = 3.14159265···, and it is denoted by φ. Its decimal representation is φ = 1.61803398···. It is also described as φ = 1/s for the number 0 < s < 1 satisfying s = 1/(1 + s).

Definition 6.3 A sequence {x_n : n ≥ 0} of numbers is said to satisfy a linear recurrence relation of order k if there exist k constants ai, i = 1, ..., k, with a1 and ak nonzero, such that
x_n = a1 x_{n−1} + a2 x_{n−2} + ⋯ + ak x_{n−k}   for all n ≥ k.
For example, the relation x_n = a x_{n−1} of order 1 for n = 1, 2, ... gives a geometric sequence x_n = a^n x_0, and the relation x_{n+1} = x_n + x_{n−1} of order 2 with x_0 = 0, x_1 = 1 for n = 1, 2, ... gives the Fibonacci sequence. A solution to the recurrence relation is any sequence {x_n : n ≥ 0} of numbers that satisfies the equation. Of course, a solution can be found by simply writing out enough terms of the sequence if the k beginning values x_0, x_1, ..., x_{k−1}, called the initial values, are given. As in the case of the Fibonacci sequence, one can write the recurrence relation

x_n = a1 x_{n−1} + a2 x_{n−2} + ⋯ + ak x_{n−k}   for all n ≥ k,
or equivalently,
x_{n+k−1} = a1 x_{n+k−2} + a2 x_{n+k−3} + ⋯ + ak x_{n−1}   for all n ≥ 1.

al

az

a3

ak-l

ak

Xn+k-Z

Xn+k-Z

1 0

0 1

0 0

0 0

0 0

Xn+k-3

0

0

0

1

0

Xn-l

=

Xn =

Xn

Xn+l Xn

for n ~ 1, or simply Xn = a recurrence relation Xn =

AXn-I oThe

matrix

A=

A

is called the companion matrix of

AXn-l,

we first compute the characteristic

AXn-l.

To solve a recurrence relation Xn = polynomial of a companion matrix A. Lemma 6.10

= AXn-1

For a companion matrix

al

az

a3

ak-l

ak

1 0

0 1

0 0

0 0

0 0

0

0

0

0

with al and ak nonzero,

6.3.1. Application: Recurrence relations

215

(1) the characteristic polynomial of A is Ak - a\A k-\ - ... - ak-IA - ak. (2) All eigenvalues of A are nonzero and for any eigenvalue A of A, X n = An is a solution of the recurrence relation Xn = AXn-i.

Proof: (1) Use induction on k. Clearly true for k = 1. Assume the equality for k = n - 1. Let k = n, By taking the cofactor expansion of det(Al - A) along the last column, the induction hypothesis gives det(Al - A)

= =

+

A(An- 1 - aIA n-2 - . .. - an-I) (-1)2n-I an nI An - aiA - . . . - an-IA - an,

(2) Clearly all eigenvalues are not zero, because ak f= O. It follows from (I) that for any eigenvalue A of A, X n = An satisfies the recurrence relation

o Remark: (1) By Lemma 6.10(1), every monic polynomial, a polynomial whose coefficient of the highest degree is 1, can be expressed as the characteristic polynomial of some matrix A. This matrix A is also called the companion matrix of the monic polynomial peA) = Ak - aiA k- I - . . . - ak-IA - ak. (2) From Lemma 6.10(1), one can see that if a recurrence relation

is given, then the characteristic equation of the associated companion matrix A can be obtained from the recurrence relation by replacing Xi with Ai and dividing the resulting equation by An-k . This relation between the recurrence relation and the characteristic equation of the matrix A can be a reason why {An : n :::: O} is a solution for each eigenvalue A. Lemma 6.11 IfAo is an eigenvalue ofthe companion matrix A ofa lineardijference equation of order k, then the eigenspace E(AO) is a I-dimensional subspace and contains [A~-I .". . AO If. Proof: An entry-wise comparison of Ax = AOX,

shows that Xi = AOXi-I = A~-I Xl for i = 2, ... ,k.

o

The recurrence relation Xn = AXn-I can be solved if the companion matrix A is diagonalizable.

216

Chapter 6. Diagonalization

Example 6.8 (Recurrence relation recurrence relation Xn

= 6Xn-1

with initial values Xo = 0,

Xl

-

= 1,

Xn

=

1 IXn-2 X2

AXn-1

+

with diagonalizable A) Solve the

6X n-3

for n

~

3

= -1.

Solution: In matrix form, it is

The characteristic polynomial of A is det(H - A) = .1. 3 - 6.1.. 2 + 11.1.. - 6 = (A - 1)(.1.. - 2)(.1.. - 3), by Lemma 6.10. Hence, the eigenvalues are Al = I, .1..2 = 2, .1..3 = 3 and their associated eigenvectors are

respectively, by Lemma 6.11. Moreover, the basis-change matrix and its inverse can be found to be Q

= [VI V2 V3] =

With

D~U one can get

0 2 0

[

149] 1 2 3 , III

Q-I =

6] 1 -5 -2 8 -6 . 2 [ 1 -3 2

~

n Xn

=

It implies that the solution is X n = -3 + 5 x 2n - 2 x 3 n.

o

6.3.1. Application: Recurrence relations

217

As a generalization of a recurrence relation Xn = AXn-1 with a companion matrix A, let us consider a sequence {xn} of vectors in JRk defined by a matrix equation Xn = AXn-1 with arbitrary k x k square matrix A (not necessarily a companion matrix). Such an equation Xn = AXn-1 is called a linear difference equation. A solution to the linear difference equation Xn = AXn-1 is any sequence {x n E Rk : n ~ O} of vectors that satisfies the equation. In fact, a solution of a linear difference equation is reduced to a simple computation of A n if the starting vector XO, called the initial value, is given. We first examine the set of its solutions. Theorem 6.12 For any k x k matrix A , the set ofall solutions ofthe linear difference equation Xn = AXn-1 is a k-dimensional vector space. In particular, the set ofsolutions of the recurrence relation of order k,

with nonzero al and at. is a k-dimensional vector space.

Proof: Since the proofs are similar, we prove this only for the recurrence relation. Let W be the set ofsolutions {x n } of the recurrence relation. Clearly, a sum oftwo solutions and any scalar multiplication of a solution are also solutions. Hence, the solutions form a vector space as shown in Example 3.1(4) . One can show that the function f : W ~ JRk defined by f({ xnD = (Xk-I, . . . , XI, xo) is a linear transformation. Clearly, itis bijective, because any given initial k values of Xo, XI, ... , Xk -I generate recursively a unique sequence {x n : n ~ O} of numbers that satisfies the equation. Hence , dim W = dim JRk = k . (For the linear difference equation Xn = AXn-I, see Problem 6.10) . 0 Problem 6.10 Let Xn = AXn_1 be any lineardifference equationwith a k x k matrix A. For eachbasisvector ej in [ej , . ... ek} in]Rk. thereis a uniquesolutionofxn = AXn-1 withinitial value ej. Showthat such k solutions form a basis for the solutionspace of Xn = AXn_l.

Definition 6.4 A basis for the space of solutions of a linear difference equation or a recurrence relation is called a fundamental set of solutions, and a general solution is described as its linear combination. If the initial value is specified , the solution is uniquely determined and it is called a particular solution. By Theorem 6.12, it is enough to find k linearly independent solutions in order to solve a given linear difference equation or a recurrence relation of order k, and its general solution is just a linear combination of those linearly independent solutions. First, we assume that the square matrix A is diagonalizable with k linearly independent eigenvectors VI , V2 , ... , Vk belonging to the eigenvalues AI, ).,2, . . . , ).,t.

218

Chapter 6. Diagonalization

respectively. Since {VI, vz , ... , vd is a basis for ]Rk, any initial vector Xo can be written as Xo = CIVI + CZV2 + .. . + CkVk · Since AVj = AjVj, we have

and, in general for all n = 1,2, .. . ,

In particular, if the companion matrix A ofthe recurrence relation Xn = al Xn-I + a2xn-2 +- . +akxn-k of order k has k distinct eigenvalues AI, ... , Ab then its solution X n is (as the (k, I)-entry ofthe vector x.) a linear combination of AI' A2, . .. , At. In fact, for each 1 ~ j ~ k, X n = A'J is a solution of the recurrence relation, and these k solutions are linearly independent, so that it forms a fundamental set of solutions by Theorem 6.12. (One can also directly show that X n = A'J satisfies the recurrence relation). Note that all these solutions are geometric sequences. We can summarize as follows. Theorem 6.13 Let A be a k x k diagonalizable matrix with k linearly independent eigenvectors VI, V2 , ... , Vk belonging to the eigenvalues AI, A2, . . . , Ak, respectively. Then, a general solution of a linear difference equation Xn AXn_1 can be written as X n = CIAIvI + C2A2v2 + . .. + CkAtVk

=

for some constants CI, C2, . .. , Ck. In particular; for the recurrence relation Xn = alXn-1

+

a2xn-2

+ .. . +

akXn-b

n 2: k

with nonzero al and ab if the associated companion matrix A has k distinct eigenvalues AI, A2, ... , Ab then its general solution is Xn = CIAI

+

C2A2

+ ... +

CkAt with constants c.'s.

Example 6.9 (Recurrence relation with distinct eigenvalues) Solve the recurrence relation Xn = Xn-I + 7Xn-2 - X n- 3 - 6Xn- 4 for n 2: 4, and also find its particular solution satisfying the initial conditions Xo = 0, XI = I, X2 = -I, X3 = 2. Solution: By Lemma 6.10, the characteristic polynomial of the companion matrix A associated with the given recurrence relation is

6.3.1. Application: Recurrence relations det(AI - A) = ). 4 - ).3 - 7).2

+ ). +

6 = (). - 1)()'

+

I)().

+

219

2)()' - 3),

so that A has four distinct eigenvalues). I = I, ).2 = -1, ).3 = -2, ).4 = 3. Hence, the geometric sequences {1 n}, {(_I)n}, {(_2)n}, {3n} are linearly independent and a general solution is a linear combination of them by Theorem 6.13: Xn

= CJ In + C2(-l)n +

c3(-2t

+

C43n

with constants CI, C2, C3, C4 . And the initial values give if if if if

n n n n

°

= =1 =2 =3

CI

CI CI CI

+ + + +

C2

(_l)l c2 (-1)2 c2

(-1)3 c2

+ + + +

C3

(-2)l c3 (-2)2 c3 (-2)3 c3

+ + + +

= = = =

C4 31c4

32C4 33C4

0, I, -I,

2,

which is a system oflinear equations with a 4 x 4 Vandermonde matrix as its coefficient . H I 'It to h ave CI = 12 5 ' C2 = -g' I C3 = -13' 4 C4 = -40 I matrix. ence, one can so ve and then its particular solution is X

n

5 1 n 4 n In = -1 n - -(-I) - -(-2) - -3 . 12 8 15 40

o

Problem 6.11 Let{an} be a sequence withao = I, al = 2, a2 = 0, and the recurrence relation an = 2an- 1 + an-2 - 2an-3 for n 2: 3. Find the n-th term an,

The next example illustrates how to solve a recurrence relation when its associated companion matrix A has a repeated eigenvalue. Example 6.10 (Recurrence relation with a repeated eigenvalue) Solve the recurrence relation Xn = -2Xn- 1 - Xn-2 for n :::: 2, and also find its particular solution satisfying the initial conditions Xo = 1, XI = 2. Solution: Its characteristic polynomial is

=

and), -1 is an eigenvalue of multiplicity 2. Hence, the geometric sequence {x n } = {(_1)n} is a solution of the recurrence relation . Since its solution space is of dimension 2 by Theorem 6.12, we should find one more solution which is independent of {x n } = {(_I)n}. But, in this case {xn} = {n( _I)n} is also a solution of the recurrence relation . In fact, for n :::: 2, -2Xn_ 1 - Xn-2

= -2(n -

1)(-l)n-1 - (n - 2)(_l)n-2

= n(_l)n = Xn.

220

Chapter 6. Diagonalization

Clearly, two solutions (_l )n} and (n ( _l)n} are linearly independent, and so X n = CI (_I)n +C2n( _I)n is a general solution. The initial condition gives CJ = 1, C2 = - 3 and X n = (_I)n - 3n( _I)n is the particular solution of the recurrence relation . 0 In Example 6.10, we show that A = -1 is an eigenvalue of multiplicity 2, and two geometric sequences (-l)n} and (n(-l)n} are linearly independent solutions of the recurrence relation. As a general case, let us consider a recurrence relation

with nonzero al and ak, and let the associated companion matrix A have s distinct eigenvalues Al, A2, ... , As with multiplicity ml, m2, ... , m s , respectively. For each eigenvalue Ai with multiplicity m i > 1, we have their derivatives f(Ai) . . . = f(m ;-I)(Ai) = 0, where f(A) = Ak - alA k- 1 - a2Ak-2 - . . . - ak_IA - ak is the characteristic polynomial of A. Hence, for a new function FI (A) defined by

= ro» =

FI (A)

= =

An-k f(A) An -aIA n- 1 _ . . . -ak_IA n-k+1 -ak An- k ,

one can see that the derivative F{ (A) = A = Ai. That is, n).n - al (n - 1)),

n-I -

... -

°

at A = Ai and then F2(A) = AF{ (A) = Oat

ak-I (n - k

+ 1)),

n-k+1 - ak(n - k))' n-k

= O.

°

It shows that X n = nAi is also a solution of the recurrence relation. Inductively, Fj(A) = AFj_I(A) = at A = Ai shows that X n = nj-IAi is also a solution for

j = 1, . . . , mi. Therefore, one can conclude that X n = Ai, nAi, .. . , nm;-I Ai are m, linearly independent solutions of the recurrence relation. Getting together such m ; linearly independent solutions for each eigenvalue Ai, one can get a fundamental set of solutions of the recurrence relation. In summary, we have the following theorem. Theorem 6.14 For any given recurrence relation

with nonzero al and ak, let the associated companion matrix A have s distinct eigen values AI, A2, . . . , As with multiplicity ml , m2, ... , m., respectively. Then

{{An, {nAi}, .. . , {nm;-I An I i

= 1,2, ...

-l

forms a fundamental set ofsolutions, and a general solution is a linear combination of them.

6.3.2. Application: Difference equations

221

Problem 6.12 Prove that if ).. = q is an eigenvalueof a recurrence relation with multiplicity m, then the m solutions {qn} , {nqn}, ... , {nm-Iqn} are linearly independent. Problem 6.13 Solve the recurrence relation Xn = 3Xn_1 - 4xn-3 for n XQ

= 1, xI = X2 = I?

~

3. What is it if

6.3.2 Linear difference equations A linear difference equation Xn = AX n_1 represents sometimes mathematical models of dynamic processes that change over time and are widely used in such areas as economics, electrical engineering, and ecology. In this case, vectors Xn give information about a dynamic process when time n passes. In this concept, a linear difference equation Xn

= AXn-l, n = 1,2, . . .

with a square matrix A is also called a discrete dynamical system. If the matrix A is a companion matrix , then it is nothing but a recurrence relation . If the matrix A is diagonal with diagonal entries AI, ... , At. then, by Theorem 6.13, a general solution of Xn AXn-1 is

=

with some constants CI , C2, .•. , ct . Throughout this section, we are concerned with only a linear difference equation X n = AXn-I, n = 1, 2, ... for a diagonalizable matrix A, because if A is not diagonalizable, it is not easy in general to solve it. However, it can be done after reducing A to a simpler form called the Jordan canonical form and this case will be discussed again in Chapter 8. Let A be a k x k diagonalizable matrix with k linearly independent eigenvectors VI, ... , Vk belonging to the eigenvalues AI, .. . , Ak , respectively. Then, by Theorem 6.13 again, a general solution of X n = AXn_1 is

with some constants CI, C2, • •• , Ck. Hence, if IAi I < 1 for all i, then the vector Xn must approach the zero vector as n increases . On the other hand, if there exists an eigenvalue Ai with IAi I > I , this vector Xn may grow exponentially in magnitude. Therefore, we have three possible cases for a dynamic process given by Xn = AXn-I , n = 1, 2, . .. . The process is said to be (1) unstable if A has an eigenvalue Awith IAI > I, (2) stable if IAI < 1 for all eigenvalues of A, (3) neutrally stable if the maximum value of the eigenvalues of A is 1.

222

Chapter 6. Diagonalization

To determine the stability of a dynamic process, it is often necessary to estimate the (upper) bound for the absolute values of the eigenvalues of a square matrix A. To do this, for any square matrix A = [aij] of order k, let k

1:::

i

s k},

laijl 1:::

j

s k},

R(A) =

max{Ri(A) = L laijl : j=1

c(A) =

max{cj(A) = L

k

s, =

Ri(A)

-Iaul.

i=l

Theorem 6.15 (Gerschgorin's Theorem) For any square matrix A oforder k, every eigenvalue ); of A satisfies I).. - au I ::: se for some 1 :::: l ::: k. Proof: Let ).. be an eigenvalue with eigenvector x = [Xl Xz . .. xkf. Then L'=l aijX j = sx; for i = 1, . .. , k. Take a coordinate xe of x with the largest absolute value. Then clearly xe =1= 0, and

I).. - aullxel = IAxe -

auxel

=

Since

Ixel

> 0,

I).. -

laejllxel =

LaejXj ::: L

Ne

selxel ·

Ne

o

aul :::: sr.

Corollary 6.16 For any square matrix A oforder k, every eigenvalue); of A satisfies 1)..1 ::: min{R(A), c(A)}. Proof: Note that 1)..1::: I).. - aul + since Xis also an eigenvalue of AT,

laul ::: Se + laul = Re(A) 1)..1 :::: R(A T) = c(A) .

:::: R(A) . Moreover, 0

Example 6.11 (Stable or unstable) Solve a discrete dynamical system xn = AXn-l, n = 1,2, . . . , where ( 1) A _ [0.8 0.0] 0.0 0.5 '

( 2) A _ [1.2 0.0]

-

0.0 0.6

'

(3) A

1[3 1]

="2

1 3

.

Solution: (1) Clearly, the eigenvalues of A are 0.8 and 0.5 with eigenvectors VI [

~]

and Vz = [

~]

, respectively. Hence, its general solution is

It concludes that the system x, = AXn-1 is stable. (See Figure 6.1.)

=

6.3.2. Application: Difference equations

223

10

Figure 6.1. A stable dynamical system

(2) Similarly, one can show that Xn

=

CI

(1.2)n [

in which the system is unstable if Ci

~ ] + c2(O.6)n [ ~ ]

,

i= O. (See Figure 6.2.) 10

5

-5

-10

Figure 6.2. An unstable dynamical system

(3) The eigenvalues of A are I and 2 with eigenvectors VI = [ [

~

] , respectively. Hence , a general solution of Xn =

AXn_1

is

~

] and V2

=

224

Chapter 6. Diagonalization

It is unstable if C2 =1= 0: For example, if CI Xo

= [ ~ ] , XI = [

i],

X2

= -1 , C2 = 1, then

= [ ; ] , x3 = [ ~ ] , X4 = [

~~ ] , ... .

D

The following example is a special type of a discrete dynamical system, called a Markov process. Example 6.12 (Markov process with distinct eigenvalues) Suppose that the population of a certain metropolitan area starts with Xo people outside a big city and YO people inside the city. Suppose that each year 20% of the people outside the city move in, and 10% of the people inside move out. What is the 'eventual' distribution of the population?

Solution: At the end of the first year, the distribution of the population will be XI { YI

= =

0.8 Xo + 0.1 YO

0.2 Xo + 0.9 YO .

Or, in matrix form, XI

= [ ;~ ] = [~:~

~:~] [ ;~ ] = Axo.

Thus if Xn = (x n , Yn) denotes the distribution in the metropolitan area of the popuAnxo. lation after n years, we get Xn

=

In this formulation , the problem can be summarized as follows : (1) The entries of A are all nonnegative because the entries of each column of A represent the probabilities of residing in one of the two locations in the next year, (2) the entries of each column of A add up to 1 because the total population of the metropolitan area remains constant.

Now, to solve the problem, we first find the eigenvalues and eigenvectors of A. They are A.I = 1, A.2 = 0.7 and VI = (1, 2), V2 = (-1, 1), respectively, so that its general solution is n[ 1]

xn = CJ (1)

2

+

n [-1]

c2(0.7)

1

-c2(0.7)n = [CJ(1)n 2CJ (I)" + c2(0.7)n

But, the initial condition Xo = (xo, YO) gives CJ that

x

=~+

~ and

C2

-+

(~O + ~O) [;

l

as n -+

00.

.

= -to +

(~O + ~O) [; ] + (-~XO + ~o) (0.7)n [ -~

n

]

]

~, so

6.3.2. Application: Difference equations

225

Note that, since X n + Yn = a is fixed for all n, the process in time remains on the straight line x + Y = a. Thus , for a given initial total population a, the eventual ratio X n : Yn of the populations tends to I : 2 which is independent of the initial distribution. For initial populations of a = 3,4,5,6 million people, the processes are shown in Figure 6.3. 0

Figure 6.3. A Markov process AnXO

Recall that the matrix A in Example 6.12 satisfies the following two conditions: (1) all entries of A are nonnegative, (2) the entries of each column of A add up to 1. Such a matrix A is called a Markov matrix (or, a stochastic matrix). In general , a dynamical system Xn = AXn-l with a Markov matrix A is called a Markov process. The next theorem follows directly from Gerschgorin's Theorem 6.15 . Theorem 6.17 If Ais an eigenvalue ofany Markov matrix A, then

IAI

:s 1.

In fact, every Markov matrix has an eigenvalue λ = 1. To show this, let A be any Markov matrix. Then the entries of each column of A add up to 1. It means that the sum of each column of A − I is 0, or equivalently r1 + r2 + ⋯ + rn = 0 for the row vectors ri of A − I. This is a nontrivial linear combination of the row vectors, and so these row vectors are linearly dependent, so that det(A − I) = 0. Consequently, λ = 1 is an eigenvalue of A. If x is an eigenvector of A belonging to λ = 1, then Ax = x. This is called the equilibrium state.

Theorem 6.18

If A is a stochastic matrix, then

=

(1) A 1 is an eigenvalue of A, (2) there exists an equilibrium state x that remains fixed by the Markov process. Problem 6.14 Suppose that a land use in a city in 2000 is

Residential Commercial Industrial

XO = 30%, YO 20%,

Zo

= = 50%.

226

Chapter 6. Diagonalization

Denote by Xb Yb Zk the percentage of residential, commercial, and industrial, respectively, after k years, and assume that the stochastic matrix is given as follows:

[

Xk+l] Yk+l Zk+l

=

[ 0.8

0.1 0.0] [ Xk ] Yk . 0.1 0.7 0.1 Zk 0.1 0.2 0.9

Find the land use in the city after 50 years.
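Problems of this kind can also be explored numerically. The following sketch (NumPy assumed; it is an illustration, not the book's worked solution) iterates the stochastic matrix of Problem 6.14 and compares the result with the equilibrium state, i.e., the eigenvector belonging to the eigenvalue 1:

    import numpy as np

    A = np.array([[0.8, 0.1, 0.0],
                  [0.1, 0.7, 0.1],
                  [0.1, 0.2, 0.9]])            # columns sum to 1 (stochastic matrix)
    x0 = np.array([0.30, 0.20, 0.50])
    x50 = np.linalg.matrix_power(A, 50) @ x0    # land-use distribution after 50 years
    evals, evecs = np.linalg.eig(A)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    print(x50)
    print(v / v.sum())                          # equilibrium state; x50 is already close to it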

Problem 6.15 A car rental company has three branch offices in different cities. When a car is rented at one of the offices, it may be returned to any of three offices. This company started business with 900 cars, and initially an equal number of cars was distributed to each office. When the week-by-week distribution of cars is governed by a stochastic matrix

A

=

0.6 0.1 0.2] 0.2 0.2 0.2 , [ 0.2 0.7 0.6

determine the number of cars at each office in the k-th week. Also, find lim Ak. k-vco

6.3.3 Linear differential equations I A first-order differential equation is a relation between a real-valued differentiable function yet) of time t and its first derivative Y'(t) , and it can be written in the form

, Y (t)

df(t)

= dt = f(t, y(t».

As a special case, if it can be written as y' (t) = get) for an integrable function get) , then its solution is yet) J g(t)dt + c for a constant c. However, it is difficult to solve it in most other cases such as y'(t) = sin( ty 2). As another case.If y'(r) = 5y(t), then it has a general solution y = ce 5t , wherec is an arbitrary constant. Ifan additional condition yeO) = 3, called an initial condition, is given, then its solution is y = 3e5t , called a particular solution. The second case can be generalized to a system of n linear differential equations with constant coefficients, which is by definition of the form

=

I

y~ = Y2 =

a 2lYI

+ +

a12Y2 a22Y2

+ +

+ +

alnYn a2nYn

y~ =

anlYI

+

an2Y2

+

+

annYn,

allYl

where Yi = Ii (t) for i = 1,2, . . . , n are real-valued differentiable functions on an interval I = (a, b). In most cases, one may assume that the interval I contains 0, and some initial conditions are given as Ii (0) = di at 0 E f. Let y = [fl h . . . fnf denote the vector whose entries are the differentiable functions y, = J;'sdefinedonanintervalf = (a , b) : thus, for each t E f,y(t) = [fl (t) h (t) . . . fn (t) f is a vector in jRn. Its derivative is defined by

6.3.3. Application: DitTerentialequations I

f{ ]

y'

~ [ )~

f{(t) ]

f~(t)

, Y (t) =

or

227

:

[

.

f~(t)

If A denotes the coefficient matrix of the system of linear differential equations, the matrix form of the system can be written as Y'

= Ay,

or y'(t)

= Ay(t)

for all tEl.

An initial condition is given by Yo = y(O) = (dt, .. . , dn ) E jRn. A differentiable vector function y(t) is called a solution of the system y'(t) = Ay(t) if it satisfies the equation. In general, the entries of the coefficient matrix A could be functions. However, in this book, we restrict our attention to the systems with constant coefficients. Example 6.13 Consider the following three systems:

yi { y~

=

=

=

2Yt- 3Y2 {Yi 2Yl + Y2 ' y~

=

2Y2 tYt +t { Yi = 3 2Yl + t Y2' y~ =

3yi

2Yt sin yj + 5Y2

The first two systems are linear, but the coefficients of the second are functions of t. The third is not linear because of the terms and sin YI. 0

yi

Example 6.14 (Population model) Let p(t ) denote the population of a given species like bacteria at time t and let r (t , p) denote the difference between its birth rate and its death rate at time t. If r (t , p) is independent of time t, i.e., it is a constant, then d~~t) = rp(t) is the rate of change of the population and its general solution is p(t) = p(O)e r t • 0 Some basic facts about a system y' (t) = Ay(t) , A is any n x n matrix, oflinear differential equations defined on I = (a , b) are listed below. (I) (The fundamental theorem for a system of linear differential equations) The system y' (t) = Ay(t) always has a solution. In addition, if an initial condition Yo is given, then there is a unique solution y(t) on I which satisfies the initial condition. If y = [YI Y2 ... Ynf is a solution on I, then it draws a curve in jRn passing through the initial vector Yo = y(O) = (dl' ... , dn ) as t varies in the interval I . (II) (Linear independence ofsolutions) Let {YI, .. . , Yn} be a set of n solutions of the system y' = Ay on I. The linear independence of the solutions YI, . . . , Yn on I is defined as usual: if c lYI + ... + CnYn = 0 implies Cl = ... = Cn = O. Or, equivalently, they are linearly dependent if and only if one of them can be written as a linear combination of all the others. Define

Y (t)

= [Yt (t)

... Yn(t)]

=

ru(t)

Y12(t )

:

:

Y21(t ) Y22 (t)

[

Ynl (t ) Yn2(t )

::: ~~~~~ ] Ynn(t )

for i

«t.

228

Chapter 6. Diagonalization

If the n solutions are linearly dependent, then det Y (t) = 0 for all i e I, Or, equivalently, if det Y (t) f. 0 for at least one point tEl, then the solutions are linearly independent. However, the next lemma says that det Y (t)

f. 0 for all t

e I

if and only if det Y (t) f. 0 at one point t

e I,

The determinant of Y (t) is called the Wronskian ofthe solutions, denoted by W (t) = det Y (t) for tEl. Note that the Wronskian W (t) is a real-valued differentiable function on I .

o

Lemma6.19 W'(t) = tr(A)W(t). Proof: W'(t)

=

(det Y(t))'

=

L sgn(a)(Yla(l) ' " Yna{n»'

aeSn

=

Lsgn(a)Y~a{I)' " Yna{n)

=

i: i: I

=

=

Y;j Yij =

J

+ ... +

i: (t I

Lsgn(a)Yla(l) '" Y~a{n)

n

Y;j [adj Y]ji) =

L[Y' . adj flii

J

tr(Y'. adj Y) = tr(A · (Y . adj Y)

tr(det Y(t)A)

= tr(A)W(t),

where Yij (t) is the cofactor of Yij, and the equalities in the last two lines are due to the fact that Y' (t) yet) adj yet)

= =

[y~ (t) ...

det Y(t)In

= A[YI (t) = W(t)In·

y"(t)]

.. . Yn(t)]

= AY(t),

o

From Lemma 6.19, it is clear that the Wronskian W(t) is an exponential function of the form Wet) = cetr{A)t with an initial condition W(O) = c. It implies that the value of Wet) is zero for all t or never zero on I depending on whether or not c = O. Thus, we have the following lemma.

Lemma 6.20 Let {Yl, Y2, . . . , Yn} be a set ofn solutions ofthe system Y' = Ayon I , where A is any n x n matrix. Then the following are equivalent. (1) The vectors Yl, Y2 , . . . , Yn are linearly independent. (2) Wet) f. Ofor some t, that is, Yl(t), Y2(t), ... , Yn(t) are linearly independent in JRn for some t. (3) W (t) f. 0 for all t , that is, Yl(t), Y2(t) , . . . , Yn (t) are linearly independent in JRn for all t.

6.3.3. Application: Differential equations I

229

(III) (Dimension ofthe solution space) Clearly, the set of all solutions of Y' (t) = Ay(t) is a vector space. In fact, for any two solutions YI, Y2 of the system Y' (t) = Ay(t), we have (cIYI + c2yd = CIYl + C2Y2 = cIAYI + C2 AY2 = A(CIYI Thus, CIYI + C2Y2 is also a solution for any constants Cj's.

+

C2Y2) .

Let {el, e2, ... , en} be the standard basis for R". For each ei, there exists a unique solution Yi of Y'(t) = Ay(t) such that Yi (0) = ei , by (I). All such solutions YI, Y2, ... , Y« are linearly independent by Lemma 6.20. Moreover, they generate the vector space of solutions. To show this, let Y be any solution. Then the vector y(O) can be written as a linear combination of the standard basis vectors: say, Yo = CI el + C2e2 + ... + cnen. Then , by the uniqueness of the solution in (I), we have y(t) = cm (t) + C2Y2(t) + ... + cnYn(t). This proves the following theorem. Theorem 6.21 Forany n x n matrix A, the set ofsolutions ofa system Y' = Ay on

I is an n-dimensional vector space. Definition 6.5 A basis for the solution space is called a fundamental set of solutions. The solution expressed as a linear combination of a fundamental set is called a general solution of the system . The solution determined by a given initial condition is called a particular solution. By Theorem 6.21, it is enough to find n linearly independent solutions to solve a system Y' = Ay on I , and then its general solution is just a linear combination of those linearly independent solutions . This may be considered in three steps: (1) A is diagonal, (2) A is diagonalizable, and finally (3) A is any square matrix . (1 ) First suppose that A is a diagonal matrix D. Then y'(t) = Ay(t ) is

This system is just n simple linear differential equations of the first order:

y;(t) = AiYi(t),

i=I,2, ... , n,

and their solutions are trivial: Yi(t) = CieA;1 with a constant c; for i = 1 ,2 , . .. ,n. On the other hand, the diagonal matrix A has n linearly independent eigenvectors ej , .. . , en belonging to the eigenvalues AI, ... , An , respectively. One can see that Yi(t) = eAj1ei is a solution of the systemy'(t) = Ay(t) for i = 1,2 , . .. , n. Moreover, at t = 0, the solution set (yl(t) , . .. , Yn(t)} = {el , ... , en} is linearly independent. Hence , by Lemma 6.20, a general solution of the system y'(t) = Ay(t) is

y(t)

= ClYI (t) + . ..+

cnYn(t) = cleA\/ e l

+ . .. +

cneA./en

230

Chapter 6. Diagonalization

with constants Ci'S . Or in matrix notation,

yet) =

[

y'~t) ] :

A1t

=

[ e

;., ] [ ::] =

0

Yn(t)

,'D yo,

where et D is by definition,

o Example 6.15 (A predator-prey problemas a modelofa system ofdifferentialequations) One of the fundamental problems of mathematical ecology is the predator-prey problem. Let x(t) and yet) denote the populations at time t of two species in a specified region, one of which x preys upon the other y. For example, x(t) and yet) may be the number of sharks and small fishes, respectively, in a restricted region of the ocean. Without the small fishes (preys) the population of the sharks (predators) will decrease, and without the sharks the population of the fishes will increase. A mathematical model showing their interactions and whether an ecological balance exists can be written as the following system of differential equations: X ' (t) {

y'(t)

= =

a x(t) - b x(t)y(t) -c yet)

+ d x(t)y(t).

In this equation, the coefficients a and C are the birth rate of x and the death rate of y, respectively. The nonlinear x(t)y(t) terms in the two equations mean the interaction of the two species such as the number of contacts per unit time between predators and prey, so the coefficients b and d are the measures of the effect of the interaction between them . A study of this general system of differential equations leads to very interesting developments in the theory of dynamical systems and can be found in any book on ordinary differential equations . Here, we restrict our study to the case of x and y very small, i.e., near the origin in the plane. In this case, one can neglect the nonlinear terms in the equations, so the system is assumed to be given as follows: x'(t) ] = [ y' (t)

[a0

0] [

-c

x(t) ] .

yet)

Thus, the eigenvalues are Al = a and AZ = -c with their associated eigenvectors and ez, respectively. Therefore, its general solution is

[ ;~~~ ] = [ ~~::ct ] = [e~t e~ct] [ ~~ ] = c,eate, + cze-ctez.

e, 0

(2) We next assume that a matrix A in the system y'(t) = Ay(t) is diagonalizable, that is, it has n linearly independent eigenvectors VI, • •• , Vn belonging to the

6.3.3. Application: Differential equations I

231

= [VI

' " vn]

eigenvalues AI , .. ., An, respectivel y. Then the basis-change matrix Q diagonalizes A and

0]

A = QDQ-I = Q [AI...

o

Q_I .

An

Thus the system becomes Q-I y' = DQ-I y. If we take a change of variables by the new vector x = Q-I y (or y = Qx), then we obtain a new system

x' = Dx, with an initial condition Xo = Q-I yO= (cj , . . . , cn). Since D is diagonal , its general solution is x = elDxo CjeAllel + .. . + CneAnlen.

=

Now, a general solution of the original system y'

Y

=

Qx

=

QeIDQ-I yO

=

[v,

=

cleA11vI

= Ay is

-:

,:, ] [ ;: ]

+ c2eA21v2 + ... + cneAnlvn.

Remark: One can check directly that each vector function Yi (t ) = eAilvi is the particular solution ofthe system with the initial condition Yi (0) = Vi for i = I, , n. Since Yi (0) = Vi for i = 1, . . . , n are linearly independent, eA;1 Vi for i = 1, ,n form a fundamental set of solutions. Thus, we have obtained the following theorem :

Theorem 6.22 Let A be a diagonalizable n x n matrix with n linearly independent eigenvectors VI , V2 , .. . , Vn belonging to the eigenvalues AI, A2 , ... , An, respectively. Then, a general solution of the system of linear differential equations y'(t) = Ay(t) is

with constants CI, C2, . . . , Cn ' Note that a particular solution can be obtained from a general solution by determining the coefficients depending on the given initial condition.
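As a computational sketch of Theorem 6.22 (NumPy assumed; the coefficient matrix is the one of Problem 6.9, which reappears in the next example, while the initial condition is an arbitrary choice made only for this illustration):

    import numpy as np

    A = np.array([[ 5.0,  -4.0,  4.0],
                  [12.0, -11.0, 12.0],
                  [ 4.0,  -4.0,  5.0]])
    y0 = np.array([1.0, 0.0, 0.0])
    evals, Q = np.linalg.eig(A)                # eigenvalues 1, 1, -3 with independent eigenvectors
    c = np.linalg.solve(Q, y0)

    def y(t):
        # y(t) = c_1 e^{lambda_1 t} v_1 + ... + c_n e^{lambda_n t} v_n
        return np.real(Q @ (np.exp(evals * t) * c))

    h = 1e-6
    print(np.allclose((y(h) - y(0.0)) / h, A @ y(0.0), atol=1e-3))   # y'(0) is about A y(0)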

Example 6.16 (y' = Ay with a diagonalizable matrix A) Solve the system of linear differential equations

4Y2 IIY2 4Y2

+ + +

4Y3 12Y3 5Y3.

232

Chapter 6. Eigenvectors and Eigenvalues

Solution: In matrix form, the system may be written as y'

A=

[

5 -4 4]

12 -11 12 4 -4 5

= Ay with

.

The eigenvalues of A are Al = A2 = 1, and A3 = -3, and their associated eigenvectors are vj = (1, 1,0), V2 = (-1 ,0,1) and V3 = (1,3, 1), respectively, which are linearly independent (see Problem 6.9). Hence, by Theorem 6.22, its general solution is

o (3) A system y' = Ay of linear differential equations with a non-diagonalizable matrix A will be discussed in Section 6.5.

6.4 Exponential matrices Just like the Maclaurin series ofthe exponential function eX, we define the exponential ofa matrix. Definition 6.6 For any square matrix A, the exponential matrix of A is defined as the series 00 Ak A2 A3 eA = = I + A + - + - + ... . k=O k! 2! 3!

L-

That is, the exponential matrix e A is defined to be the (entry-wise) limit of the sequence:

[e A ] IJ. . =

lim [Lkm=O Ak:] for all i, j. • ij

m-->oo

Example 6.17 If

D

= [AI

0],

o

then D k

An

=

A1 [ o

Thus, the exponential matrix eD is

Ak

00

00

Ak

k=O

k!

eD=L-=

L k~

k=O

o

o

for any k

~

0.

6.4. Exponential matrices

233

0

which coincides with the definition given on page 230.

Practically, the computation of e A involves the computation of the powers A k for all k 2: 1, and hence it is not easy in general. Nevertheless, one can show that the limit e A exists for any square matrix A. Theorem 6.23 For any square matrix A, the matrix e A exists. In other words, each (i, j)-entry ofe A is convergent. Proof: Since A has only n 2 entries, there is a number M such that laij I =s M for all (i, j)-entries aij of A. Then one can easily show that [Ak]ij =s n k- t M k for all k and i, j . Thus 00 1 1 [e A ] . . < '"' _n k - t Mk = _e nM IJ -

so by the comparison test, each entry of eA

Example 6.18 If A =

= =

[6 ; l

00

0

2

2

1~3]+ ...

It is a good exercise to calculate the missing entry Problem 6.16 Let A

is absolutely convergent for

then

+A + ~ A +.. . [6 ~]+[~ ;]+~[~

I

Ak

= L k! n=O

any square matrix A.

eA

n'

~k' k=O •

=

[~~l

* directly from the definition.

0

= [~ ~] . Find k~~ A k if it exists. (Note that the matrix A is not

diagonalizable.)

The following theorem is sometimes helpful to compute eA. Theorem 6.24 Let At, A2, A3, .. . be a sequence of m x n matrices such that lim Ak = L. Then k-+oo

lim BAk = BL

k-+oo

and

lim AkC = LC

k-+oo

for any matrices B and C for which the products can be defined.

234

Chapter 6. Eigenvectors and Eigenvalues

Proof: By comparing the (i , j)-entries of both sides lim [BAk]ij = lim (t[B]il[Ak]£j)

k->oo

k->oo

= t[B]il lim [Akl£j

£=1

£=1

k-scx:

m

=~)B]il[L]£j =

[BL]ij,

£=1

we get lim BAk = BL. Similarly lim AkC = LC. k->oo

D

k-e-tx:

For example, if A is a diagonalizable matrix and Q -I A Q = D is diagonal for some invertible matrix Q, then, for each integer k ::: 0, A k = Q D k Q - I and lim

k->oo

)..1

lim A k = Q (lim D k ) Q-I = Q

k->oo

k->oo

Thus, lim A k exists if and only if lim k->oo

".

[

k-vtx:

o

)..f exists for i =

1,2, . .. , n.

Also, by Theorem 6.24,

whose computation is easy.

Example 6.19 (Computing e A for a diagonalizable matrix A) Let A

=!

Then its eigenvalues are 1 and 2 with associated eigenvectors UI = [ -

~

[~ l

respectively. Thus A = QDQ-I with D =

[i ;].

] and U2 =

[~ ~] and Q = [-~ ~

Therefore,

l

The following theorem shows some basic properties of the exponential matrices, whose proofs are easy, and are left for exercises.

Theorem 6.25 (1) eM

B

= eAe B provided that AB = BA.

6.5.1. Application: Differential equations II

235

(2) e A is invertible for any square matrix A, and (e A) -I = e" A. (3) e Q- 1AQ = Q-IeAQforany invertible matrix Q. (4) If AI, A2 , .. . , An are the eigenvalues ofa matrix A with their associated eigenvectors VI, V2, . •. , Vn , then eAi 's are the eigenvalues ofe A with the same associated eigenvectors Vi 'S for i = I, 2, . . . , n. Moreover, det e A = eAI ••. eA. = etr(A) f:. 0 for any square matrix A. Problem 6.17 Prove Theorem 6.25. Problem 6.18 Finish the computation of e A for the matrix A in Example 6.18. Problem 6.19 Prove that if A is skew-symmetric, then e A is orthogonal.
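These properties can be checked directly from the defining series. The sketch below (NumPy assumed; the matrix is an arbitrary triangular example) sums the series numerically and verifies two of the listed facts:

    import numpy as np
    from math import factorial

    def exp_series(A, terms=30):
        # partial sums of e^A = I + A + A^2/2! + A^3/3! + ...
        E = np.zeros_like(A)
        P = np.eye(A.shape[0])
        for k in range(terms):
            E = E + P / factorial(k)
            P = P @ A
        return E

    A = np.array([[1.0, 2.0], [0.0, 3.0]])
    E = exp_series(A)
    print(np.isclose(np.linalg.det(E), np.exp(np.trace(A))))   # det e^A = e^{tr(A)}
    print(np.allclose(np.linalg.inv(E), exp_series(-A)))       # (e^A)^{-1} = e^{-A}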

In general, the computation of eA is not easy at all if A is not diagonalizable, However, if A is a triangular matrix, it is relatively easy as shown in the following example. Example 6.20 (Computing e A for a triangular matrix of the form A = >..I For A

= [~ ~], compute eA.

Solution: Write A = 21

+

N with N =

[~ ~

l

+

N)

Since (2I)N = N(2I), by

Theorem 6.25(1), e A = e2l eN . From the direct computation of the series expansion, we get e 21 = e 2I . Moreover, since N k = 0 for k ~ 2, eN = 1+ N + ~~ + ... =

I

+

N =

[~ ~

l

Thus,

A 2 e = e (l

+

Problem6.20 Compute e A for A

2]. 2 N) = e [ 0I 31]=[eo2 3 2 ee

=

o

2 3 0] 0 2 3 . [ 002

6.5 Applications continued 6.5.1 Linear differential equations II One of the most prominent applications of exponential matrices is to the theory of linear differential equations. In this section, we show that a general solution of y' (t) = Ay(t) is of the form y(t) = elAyo .

236

Chapter 6. Diagonalization

Lemma 6.26 For any t

E

IR and any square matrix A, the exponential matrix

etA = 1+ tA

2

3

2!

3!

+ ~A2 + ~A3 + ...

d is a differentiable function oft, and dt etA = Ae tA.

Proof: By absolute convergence of the series expansion of etA one can use term by term differentiation, i.e.,

d tA -e = dt

=

d t2 t3 -(l+tA+-A 2+-A3+ ... ) dt 2! 3!

A +t A 2 +-t2A3 21

+ .. .

= A e tA .

o

=

As a direct consequence ofLemma 6.26, one can see that y(t) etAyo is a solution of the linear differential equation y' = Ay. In fact, by taking the initial vector Yo as the standard basis vector ei, 0 ::: i ::: n , the n columns of e/ A become solutions and they are clearly linearly independent, so they form a fundamental set of solutions. Hence, we have the following .

Theorem 6.27 For any n x n matrix A, the linear differential equation y' = Ay has a general solution y(t) = etAyo. In particular, if A is diagonalizable, say Q-I A Q = D is diagonal with a basischange matrix Q = [VI' • . vn ] consisting of n linearly independent eigenvectors of A belonging to the eigenvalues Ai'S, then a general solution of a system y' = Ay is

y(t)

= etAyo =

=

etQDQ-t Yo

Qe tD Q-I yO A1t

= =

[VI • . .

CJeAt/vl

vn ]

+

[

0 ] [

e

o C2eA2/v2

eA. /

+ ...+

CI ] Cn

cneA.tv n .

In fact, {Yi (t) = e Aj / Vi : i = I , ... , n} forms a fundamental set of solutions, and the constants (CJ, .•• , cn) = Q-l yO can be determined if an initial condition is given. Note that this just rephrases Theorem 6.22.

Example 6.21 (y' = Ay for a diagonalizable matrix A) Solve the system

[~l~~~ ] [~ b] [ ~~g~ J. =

with initial conditions YI (0) = I, Y2(0) =

o.

or

6.5.1. Application: Differential equations II

Solution: (l ) The eigen values of A = eigenvectors VI = [1

If and V2 =

are X]

=

I andA2 = -1 with associated

If, respectively.

[I -

[~ _~

l

Q- IAQ

=

1 1 ] [ e0 e-0] I Q-I Yo = CJe [ 1 -1

1 [

(2) By setting Q

= [VI V2] =

b]

[~

237

[b

_~] = D.

(3) A general solution yet) = elAyo is

elAyo =

eIQDQ-lyO = QeIDQ -I yO l

11 ]

+

C2 e_I

[

-11]

with constants CI , C2. The initial conditions YI(O) = 1, Y2(0 ) = 0 determine CI C2 = so that

1,

D

Problem 6.21 Solvethe system { Y}

Y2

Problem 6.22 Solve the system

I

y'

4YI

Y3

- 2YI -2YI

Y~

+

Y2

+

Y3

+

Y3,

and find the particular solution of the system satisfying the initial conditions YI (0) -I, Y2(0)

= I,

Y3(0)

= o.

If A is not diagonalizable, then it is not easy in general to compute etA directly. However, one can still reduce A to a simpler form called the Jordan canon ical form, which will be introduced in Chapter 8, and then the computation of el A is made relatively easy. The following example shows that the computation of eA is possible for some triangular matrices A even if they are not diagonalizable. A general case will be treated again in Chapter 8.

Example 6.22 (y' = Ay for a triangular matrix A = AI + N) Solve the system = Ay of linear differential equations with initial condition yeO) = Yo, where

y'

A=[~

.I

Yo = [ : ] .

Solution: First note that A has an eigenvalue Aof multiplicity 2 and is not diagonalizable. One can rewrite A as

Then, by the same argument as in Example 6.20,

238

Chapter 6. Diagonalization

[6

etA = et(A/+N ) = eAt etN = eAt

~] .

Therefore, the solution is

_

_ tA

At

y - e Yo - e

[1

t] [ a ] _ [ (a + bt)e b beAt

At

] _ At [ a ] - e b

0 1

In terms of components , Yl

= (a +

bt)e At, Y2

+

te

At [

b ]

O'

= be",

D

Example 6.23 (y' = Ay with A having complex eigenvalues)Find a general solution of the system y' = Ay, where a A= [ b

-b] a

'

Solution: Note that the eigenvalues of A are a ± ib , which are not real. However, one can compute etA directly without using diagonalization. We first write A as

-b] a a [1 0] + b [0 -1]

A = [ ab

0 1

=

1

0

= al

+

bJ.

Then clearly I J = J I and et A = eatI+bt J = eatebt J . Since J2 =

[-1 0] 0 -1

= -I,

01]

J3 = [ - 1 0

= -J,

one can deduce Jk = Jk+4 for all k = 1,2, .. . , and

bt J

(bt)2 J2

(bt)3J3

(bt)4 J4

1+-+--+--+--+ ..· I! 2! 3! 4! - (bt)+ -(bt)3 - -(bt)5 + ..· ] 3! 5! (bt)2

(bt)4

2!

4!

1---+--- .. · _ -

- sinbt ] cosbt

[cosbt sinbt

for any constant band t . Thus, a general solution of y' = Ay is Y=

i Ac =

b J = eat cosbt eatetc . b [ SIn

t

- sin bt ] [ Cl cosbt C2

] •

In terms of components,

Yl = { Y2 =

eat(Cl cos bt - C2 sin bt) eat(CJ sin bt + C2 cos bt).

D

6.5.1. Application: Differential equations II

Problem 6.23 Solvethe system y' for (1) A

~ [~ =;J.

YO

~

239

= Aywithinitialcondition yeO) = Yo by computing etAyo

[ :

1

(2) A

~

U~ j l

Yo

~

[:] •

Remark: Consider the n-th order homogeneous linear differential equation dny dt n

+

dn -ly al dt n- l

+

d n-2y a2 dt n-2

+ ... +

anY

= 0,

where a, are constants and yet) is a differentiable function on an intervall = (a, b). A fundamental theorem of differential equations says that such a differential equation has a unique solution yet) on I satisfying the given initial condition: For a point to in I and arbitrary constants Co , .. . , Cn-I , there is a unique solution y yet) of the equation such that y(to) = co, y'(to) = CI, . . . ,y(n-l)(to) = Cn-I . This can be confirmed as follows : Let

=

Yl =

y,

Y2 =

y' =

dr '

=

y" =

dr'

y(n-I)

=

Y3

Yn =

dYI

dY2

dYn-l dt

Then the original homogeneous linear differential equation is nothing but dYn

dr =

dny dtn = -alYn - a2Yn-l -

. . . - an-lY2 - anYl·

In matrix notation,

y(I)=

Y~ ] [

~

-an-l

o o

=

o

-an

0 0

[J. ]

= Ay(I),

o

0

which is just a system of linear differential equations with a companion matrix A. It is treated in Section 6.3.3 (see Theorem 6.22). Therefore, the solution of the original differential equation is just the solution of y'(t) = Ay(t), which is of the form: yet) = cle Att

+ . .. +

cneAnt

if A has distinct eigenvalues AI, . . . , An. In Chapter 8, we will discuss the case of eigenvalues with multiplicity.

240

Chapter 6. Eigenvectors and Eigenvalues

6.6 Diagonalization of linear transformations Recall that two matrices are similar if and only if they can be the matrix representations of the same linear transformation, and similar matrices have the same eigenvalues . In this section, we aim to find a basis ex so that the matrix representation of a linear transformation with respect to ex is a diagonal matrix. First, we start with the eigenvalues and the eigenvectors of a linear transformation. Definition 6.7 Let V be an n-dimensional vector space, and let T : V ~ V be a linear transformation on V . Then the eigenvalues and eigenvectors of T can be defined by the same equation, Tx = AX, with a nonzero vector X E V. Practically, the eigenvalues and eigenvectors of T can be computed as follows: Let ex = {VI , V2, ... , vn } be a basis for V. Then the natural isomorphism 4> : V ~ IRn identifies the associated matrix A = [Tl a : IRn ~ IRn with the linear transformation T : V ~ V via the following commutative diagram. T

V

~l~ IRn

·V

~l~ A = [Tl a

• IRn

Now, the eigenvalues of T are those of its matrix representation A = [Tl a because if [Tl a is similar to [Tlp for any other basis f3 for V, then their eigenvalues are the same by Theorem 6.3. For eigenvectors of T, note that X (Xl, X2, . . . ,Xn ) E IRn is an eigenvector of A belonging to A (Ax = AX) if and only if 4>-1(X) = V = XlVI + X2V2 + ... + XnV n E V is an eigenvector of T (T(v) = AV), because the commutativity of the diagram shows

=

[T(v)la

= [Tla[vl a = Ax = AX = [Avla .

Therefore, if XI. X2, . . . xk are linearly independent eigenvectors of A = [T]a, then 4>-I(XI), 4>-I(X2), ... , 4>-1(Xk) are linearly independent eigenvectors of T. Hence, the linear transformation T has a diagonal matrix representation if and only if it has n linearly independent eigenvectors, by Theorem 6.7. I

The following example illustrates how to find a diagonal matrix representation of a linear transformation on a vector space .

Example 6.24 Let T : P2(1R)

~

P2(1R) be the linear transformation defined by

(Tf)(x) = f(x) +xf'(x) + f'(x). Find a basis for P2(1R) with respect to which the matrix of T is diagonal.

6.6. Diagonalization of linear transformations

241

Solution: First of all, we find the eigenvalues and the eigenvectors of T. Take a basis for the vector space P2(lR), say ex {I, x, x 2}. Then the matrix of T with respect to ex is

=

[Tl a =

1 1 0] 0 2 2 , [ 003

which is upper triangular. Hence, the eigenvalues of T are AI = 1, A2 = 2 and A3 = 3. By a simple computation, one can verify that the vectors XI = (l, 0, 0), X2 (1, 1, 0) and X3 (1, 2, 1) are eigenvectors of [Tl a in ]R3 belonging to eigenvalues AI, A2, A3, respectively. Their associated eigenvectors of Tin P2(lR) are II(x) = 1, h(x) = 1 + x , f3(x) = 1 + 2x + x 2 , respectively . Since the eigenvalues AI, A2, A3 are all distinct, the eigenvectors {XI,X2, X3} of [Tl a are linearly independent and so are {3 = {ft, 12, f3} in P2(lR). Thus , each Ij is a basis for the eigenspace E(Aj) of T belonging to Ai for i = 1, 2, 3, and the basis-change matrix is

=

=

Q

= [idl p = [XI X2 x3l = [[fila [hla [f3la l =

1 1 1]

0 1 2 [ 001

.

Hence, by changing the basis ex to (3, the matrix representation of T is a diagonal matrix : [T],B

= [idl~[Tla[idlp =

Q-I[T]aQ

=

I 00] 0 2 0 = [ 003

D.

o

Note that, if T = A is an n x n square matrix written in column vectors, A = . .. cnl, then the linear transformation A : IRn ~ IRn is given by A(ej) Cj, i = 1, . .. , n, so that A itself is just the matrix representation with respect to the standard basis ex = [ej , . . . , en} for IRn , say A = [Ala . Now if there is a basis {3 = [xj , .. . ,xn } of n linearly independent eigenvectors of A, then the natural isomorphism : IRn ~ IRn defined by (X j) ej is simply a change of basis by the basis-change matrix Q = [idlp and the matrix representation of A with respect to (3 is a diagonal matrix :

=

[CI

=

Problem 6.24 Let T be the lineartransformation on lR 3 defined by T(x, y, z)

= (4x + z, 2x + 3y + Zz, x + 4z) .

Find all the eigenvalues and their eigenvectors of T and diagonalize T. Problem 6.25 Let M2x2(lR) be the vectorspace of all real 2 x 2 matrices and let T be the linear transformation on M2x2 (R) defined by

242

Chapter 6. Eigenvectors and Eigenvalues

-I:

b]=[a+b+d a+b+C] d b+c+d a+c+d .

C

Find the eigenvalues and basis for each of the eigenspaces of T , and diagonalize T .

=

Problem 6.26 Let T : P2(lR) ~ P2(lR) be the linear transformation defined by T(f(x)) lex) x/'(x). Find all the eigenvalues of T and find a basis ex for P2(lR) so that [T]a is a

+

diagonal matrix .

6.7 Exercises 6.1. Find the eigenvalues and eigenvectors for the given matrix, if they exist.

(1)

(3)

[_~ ~

l

(2)

[

311

1 - 33 ] ,

l

[! ~ ! ~], [~1 ~ ~ ~], [-! ~~ -! ~~], [i -1 ~ j]. (4)

1010

(5)

-i

III

(6)

-1

0 -1

2

0

0 2

1 .

n

6.2. Find the characteristic polynomial, eigenvalues and eigenvectors of the matrix

A = [ -2  0  3 ]
    [  2  4 -1 ]
    [  ...     ].

6.3. Show that a 2 × 2 matrix A = [ a b ]
                                   [ c d ] has
(1) two distinct real eigenvalues if (a − d)² + 4bc > 0,
(2) one eigenvalue if (a − d)² + 4bc = 0,
(3) no real eigenvalues if (a − d)² + 4bc < 0,
(4) only real eigenvalues if it is symmetric (i.e., b = c).

6.4. Suppose that a 3 × 3 matrix A has eigenvalues −1, 0, 1 with eigenvectors u, v, w, respectively. Describe the null space N(A) and the column space C(A).

6.5. For any two matrices A and B, show that
(1) adj(AB) = adj B · adj A;
(2) adj(QAQ⁻¹) = Q(adj A)Q⁻¹ for any invertible matrix Q;
(3) if AB = BA, then (adj A)B = B(adj A).


(Hint : It was mentioned for any two invertible matrices A and B in Problem 2.14)

6.6. If a 3 × 3 matrix A has eigenvalues 1, 2, 3, what are the eigenvectors of B = (A − I)(A − 2I)(A − 3I)?

6.7. Show that any 2 x 2 skew-symmetric nonzero matrix has no real eigenvalue .

6.8. Find a 3 × 3 matrix that has the eigenvalues λ_1 = 1, λ_2 = 2, λ_3 = 3 with the associated eigenvectors x_1 = (2, −1, 0), x_2 = (−1, 2, −1), x_3 = (0, −1, 2).

6.9. Let P be the projection matrix that projects ℝⁿ onto a subspace W. Find the eigenvalues and the eigenspaces of P.

6.10. Let u, v be n × 1 column vectors, and let A = uvᵀ. Show that u is an eigenvector of A, and find the eigenvalues and the eigenvectors of A.

6.11. Show that if λ is an eigenvalue of an idempotent n × n matrix A (i.e., A² = A), then λ must be either 0 or 1.

6.12. Prove that if A is an idempotent matrix, then tr(A) = rank A.

6.13. Let A = [a_ij] be an n × n matrix with eigenvalues λ_1, ..., λ_n. Show that

λ_j = a_jj + Σ_{i≠j} (a_ii − λ_i)   for j = 1, ..., n.

6.14. Prove that if two diagonalizable matrices A and B have the same eigenvectors (i.e., there exists an invertible matrix Q such that both Q⁻¹AQ and Q⁻¹BQ are diagonal; such matrices A and B are said to be simultaneously diagonalizable), then AB = BA. In fact, the converse is also true. (See Exercise 7.17.) Prove the converse with the assumption that the eigenvalues of A are all distinct.

6.15. Let D : P_3(ℝ) → P_3(ℝ) be the differentiation defined by Df(x) = f'(x) for f ∈ P_3(ℝ). Find all eigenvalues and eigenvectors of D and of D².

6.16. Let T : P_2(ℝ) → P_2(ℝ) be the linear transformation defined by

T(a_2 x² + a_1 x + a_0) = (a_0 + a_1)x² + (a_1 + a_2)x + (a_0 + a_2).

Find a basis for P_2(ℝ) with respect to which the matrix representation for T is diagonal.

6.17. Determine whether or not each of the following matrices is diagonalizable. (1) [

i b -;],

-I

2

(2)

3

[i ~ ~] , 0

I

~ ~ ~]

(3) [

2

.

-2 0 -I

6.18. Find an orthogonal matrix Q and a diagonal matrix D such that Q T A Q (1) A

= [-;

4

-~ ~ ] , (2) A = [; 2 -3

6.19. Calculate A lOx for A =

[bo ~

=;] ,

6 -2

;

~],

(3) A =

0 2 3 x= [

[b

= D for

~ ~] .

0 1 1

~] . 7

6.20. For n ≥ 1, let a_n denote the number of subsets of {1, 2, ..., n} that contain no consecutive integers. Find the number a_n for all n ≥ 1.

6.21. Find a general solution of each of the following recurrence relations.

(1) x_n = 6x_{n−1} − 11x_{n−2} + 6x_{n−3}, n ≥ 3,
(2) x_n = 3x_{n−1} − 4x_{n−2} + 2x_{n−3}, n ≥ 3,
(3) x_n = 4x_{n−1} − 6x_{n−2} + 4x_{n−3} − x_{n−4}, n ≥ 4.

6.22.

= LetA = [0~6

0;3] . Find a value X so that A has an eigenvalue A = 1. For X() = (1,1),

calculate lim Xb where Xk

k-..oo 6.23 . Compute e A for

(1) A

= [~ ~

l

(2) A

= AXk-l , k = 1, 2, . ...

=

[i

~

l

6.24 . In 2000, the initial status of the car owners in a city was reported as follows: 40% of the car owners drove large cars, 20% drove medium-sized cars, and 40% drove small cars. In 2005, 70% of the large-car owners in 2000 still owned large cars, but 30% had changed to a medium-sized car. Of those who owned medium-sized cars in 2000, 10% had changed to large cars , 70% continued to drive medium-sized cars, and 20% had changed to small cars . Finally, of those who owned the small cars in 2000, 10% had changed to medium-sized cars and 90% still owned small cars in 2005. Assuming that these trends continue, and that no car owners are born, die or otherwise add realism to the problem, determine the percentage of car owners who will own cars of each size in 2035. 6.25 . Let A

= [~ ;

J

(1) Compute eA directly from the expansion. (2) Compute eA by diagonalizing A. 6.26 . Let A(t) be a matrix whose entries are all differentiable functions in t and invertible for all t . Compute the following : (2)

(1) :t (A(t)3) ,

6.27. Solve y'

= Ay, where

(1)A=

-6 24 -1 8 [ 2 -12

(2) A

-:J

= [; -~]

I

and

y'

6.28. Solve the system

-

Y~:: Y3 =

with initial conditions Yl (0) 6.29 . Let f(A) (1) A

= det(Al -

= [; I

dt

m l

ODd y(l)

y(O) = [ Yl

~

Y2

3Yl

~

+ +

2Y3 4Y3

2Yl + Y2 = 0, Y2(0) = 2, Y3(0) = 1.

A) be the characteristic polynomial of A . Evaluate f(A) for

~ ~] , 1 3

~(A(t)-l).

(2) A

=[

~

-1

; 1

i]. 4

In fact, f(A) = 0 for any square matrix A and its characteristic polynomial f(λ). (This is the Cayley-Hamilton theorem.)

6.30. Determine whether the following statements are true or false, in general, and justify your answers.


(1) If B is obtained from A by interchanging two rows, then B is similar to A.
(2) If λ is an eigenvalue of A of multiplicity k, then there exist k linearly independent eigenvectors belonging to λ.
(3) If A and B are diagonalizable, so is AB.
(4) Every invertible matrix is diagonalizable.
(5) Every diagonalizable matrix is invertible.
(6) Interchanging the rows of a 2 × 2 matrix reverses the signs of its eigenvalues.
(7) A matrix A cannot be similar to A + I.
(8) Each eigenvalue of A + B is a sum of an eigenvalue of A and one of B.
(9) The total sum of the eigenvalues of A + B equals the sum of all the eigenvalues of A and of those of B.
(10) A sum of two eigenvectors of A is also an eigenvector of A.
(11) Any two similar matrices have the same eigenvectors.
(12) For any square matrix A, det e^A = e^{det A}.

7 Complex Vector Spaces

7.1 The n-space ℂⁿ and complex vector spaces

So far, we have been dealing with matrices having only real entries and vector spaces with real scalars. Also, in any system of linear (difference or differential) equations, we assumed that the coefficients of an equation are all real. However, for many applications of linear algebra, it is desirable to extend the scalars to complex numbers. For example, by allowing complex scalars, any polynomial of degree n (even with complex coefficients) has n complex roots counting multiplicity. (This is well known as the fundamental theorem of algebra.) By applying it to the characteristic polynomial of a matrix, one can say that every square matrix of order n has n eigenvalues counting multiplicity. For instance, the matrix

A = [ 1 -1 ]
    [ 1  1 ]

has no real eigenvalues, but it has two complex eigenvalues λ = 1 ± i. Thus, it is indispensable to work with complex numbers to find the full set of eigenvalues and eigenvectors. Therefore, it is natural to extend the concept of real vector spaces to that of complex vector spaces, and develop the basic properties of complex vector spaces.

The complex n-space ℂⁿ is the set of all ordered n-tuples (z_1, z_2, ..., z_n) of complex numbers:

ℂⁿ = {(z_1, z_2, ..., z_n) : z_i ∈ ℂ, i = 1, 2, ..., n},

and it is clearly a complex vector space with addition and scalar multiplication defined as follows:

(z_1, z_2, ..., z_n) + (z'_1, z'_2, ..., z'_n) = (z_1 + z'_1, z_2 + z'_2, ..., z_n + z'_n),
k(z_1, z_2, ..., z_n) = (k z_1, k z_2, ..., k z_n)   for k ∈ ℂ.

The standard basis for the space ℂⁿ is again {e_1, e_2, ..., e_n} as in the real case, but the scalars are now complex numbers, so that any vector z in ℂⁿ is of the form z = Σ_{k=1}^n z_k e_k with z_k = x_k + i y_k ∈ ℂ, i.e., z = x + iy with x, y ∈ ℝⁿ. In a complex vector space, linear combinations are defined in the same way as in the real case except that scalars are allowed to be complex numbers. Thus the same


is true for linear independence, spanning spaces, basis, dimension, and subspace. For complex matrices, whose entries are complex numbers, the matrix sum and product follow the same rules as real matrices. The same is true for the concept of a linear transformation T : V → W from a complex vector space V to a complex vector space W. The definitions of the kernel and the image of a linear transformation remain the same as those in the real case, as well as the facts about null spaces, column spaces, matrix representations of linear transformations, similarity, and so on.

However, if we are concerned with the inner product, there should be a modification from the real case. Note that the absolute value (or modulus) of a complex number z = x + iy is defined as the nonnegative real number |z| = (z z̄)^{1/2} = √(x² + y²), where z̄ is the complex conjugate of z. Accordingly, the length of a vector z = (z_1, z_2, ..., z_n) in the n-space ℂⁿ with z_k = x_k + i y_k ∈ ℂ has to be modified: if one would take an inner product in ℂⁿ as ||z||² = z_1² + ··· + z_n², then the nonzero vector (1, i) in ℂ² would have zero length: 1² + i² = 0. In any case, a modified definition should coincide with the old definition when the vectors and matrices are real. The following is the definition of the usual inner product on the n-space ℂⁿ.

Definition 7.1 For two vectors u = [u_1 u_2 ··· u_n]ᵀ and v = [v_1 v_2 ··· v_n]ᵀ in ℂⁿ, u_k, v_k ∈ ℂ, the dot (or Euclidean inner) product u · v of u and v is defined by

u · v = ū_1 v_1 + ū_2 v_2 + ··· + ū_n v_n = ūᵀ v,

where ū = [ū_1 ū_2 ··· ū_n]ᵀ is the conjugate of u. The Euclidean length (or magnitude) of a vector u in ℂⁿ is defined by

||u|| = (u · u)^{1/2} = √(|u_1|² + |u_2|² + ··· + |u_n|²),

where |u_k|² = ū_k u_k, and the distance between two vectors u and v in ℂⁿ is defined by

d(u, v) = ||u − v||.

In an (abstract) complex vector space, one can also define an inner product by adopting the basic properties of the Euclidean inner product on ℂⁿ as axioms.

en

Definition 7.2 A (complex) inner product (or Hermitian inner product) on a complex vector space V is a function that associates a complex number ⟨u, v⟩ with each pair of vectors u and v in V in such a way that the following rules are satisfied: For all vectors u, v and w in V and all scalars k in ℂ,

(1) ⟨u, v⟩ = conjugate of ⟨v, u⟩,
(2) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩   (additivity),
(3) ⟨ku, v⟩ = k̄ ⟨u, v⟩   (antilinear),
(4) ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 if and only if v = 0   (positive definiteness).

A complex vector space together with an inner product is called a complex inner product space or a unitary space. In particular, the n-space ℂⁿ with the dot product is called the Euclidean (complex) n-space.


The following properties are immediate from the definition of an inner product:
(5) ⟨0, v⟩ = ⟨v, 0⟩ = 0,
(6) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩,
(7) ⟨u, kv⟩ = k ⟨u, v⟩.
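As a quick numerical illustration (a Python/NumPy sketch of our own, not from the text), the conjugation in Definition 7.1 is exactly what keeps lengths positive: the unconjugated bilinear form z_1² + z_2² would assign the vector (1, i) zero length, while the Hermitian product does not.

```python
import numpy as np

u = np.array([1.0, 1j])              # the vector (1, i) in C^2
v = np.array([0.0, 1j])

# Hermitian dot product u . v = conj(u)^T v; np.vdot conjugates its first argument.
print(np.vdot(u, v))                 # (1+0j)
print(np.sqrt(np.vdot(u, u).real))   # ||u|| = sqrt(2), not 0

# The unconjugated bilinear form would (wrongly) give (1, i) zero "length":
print(np.sum(u * u))                 # 1 + i^2 = 0
```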

Remark: There is another way to define an inner product on a complex vector space.

If we redefine the dot product u · v on the n-space ℂⁿ by

u · v = u_1 v̄_1 + u_2 v̄_2 + ··· + u_n v̄_n,

then the third rule in Definition 7.2 should be modified to be (3') ⟨u, kv⟩ = k̄ ⟨u, v⟩, so that ⟨ku, v⟩ = k ⟨u, v⟩.

But these two different definitions do not induce any essential difference in a complex vector space. In a complex inner product space, as in the real case, the length (or magnitude) of a vector u and the distance between two vectors u and v are defined by

||u|| = ⟨u, u⟩^{1/2},   d(u, v) = ||u − v||,

respectively.

Example 7.1 (A complex inner product on a function space) Let C_ℂ[a, b] denote the set of all complex-valued continuous functions defined on [a, b]. Thus an element in C_ℂ[a, b] is of the form f(x) = f_1(x) + i f_2(x), where f_1(x) and f_2(x) are real-valued and continuous on [a, b]. Note that f is continuous if and only if each component function f_i is continuous. Clearly, the set C_ℂ[a, b] is a complex vector space under the sum and scalar multiplication of functions. For a vector f(x) = f_1(x) + i f_2(x) in C_ℂ[a, b], its integral is defined as follows:

∫_a^b f(x) dx = ∫_a^b [f_1(x) + i f_2(x)] dx = ∫_a^b f_1(x) dx + i ∫_a^b f_2(x) dx.

It is an elementary exercise to show that, for vectors f(x) = f_1(x) + i f_2(x) and g(x) = g_1(x) + i g_2(x) in the complex vector space C_ℂ[a, b], the following formula defines an inner product on C_ℂ[a, b]:

⟨f, g⟩ = ∫_a^b conj(f(x)) g(x) dx
       = ∫_a^b [f_1(x) − i f_2(x)][g_1(x) + i g_2(x)] dx
       = ∫_a^b [f_1(x) g_1(x) + f_2(x) g_2(x)] dx + i ∫_a^b [f_1(x) g_2(x) − f_2(x) g_1(x)] dx.   □


Problem 7.1 Show that the Euclidean inner product on en satisfies all the inner product axioms.

The definitions of such terms as orthogonal sets, orthogonal complements, orthonormal sets, and orthonormal bases remain the same in complex inner product spaces as in real inner product spaces. Moreover, the Gram-Schmidt orthogonalization is still valid in complex inner product spaces, and can be used to convert an arbitrary basis into an orthonormal basis. If V is an n-dimensional complex vector space, then by taking an orthonormal basis for V, there is a natural isometry from V to ℂⁿ that preserves the inner product as in the real case. Hence, without loss of generality, one may work only in ℂⁿ with the Euclidean inner product, and we use · and ⟨ , ⟩ interchangeably.

On the other hand, one may consider the set ℂⁿ as a real vector space by defining addition and scalar multiplication as

(z_1, z_2, ..., z_n) + (z'_1, z'_2, ..., z'_n) = (z_1 + z'_1, z_2 + z'_2, ..., z_n + z'_n),
r(z_1, z_2, ..., z_n) = (r z_1, r z_2, ..., r z_n)   for r ∈ ℝ.

Two vectors e_1 = (1, 0, ..., 0) and i e_1 = (i, 0, ..., 0) are linearly dependent when the space ℂⁿ is considered as a complex vector space. However, they are linearly independent if ℂⁿ is considered as a real vector space. In general, the set

{e_1, i e_1, e_2, i e_2, ..., e_n, i e_n}

forms a basis for ℂⁿ considered as a real vector space. In this way, ℂⁿ is naturally identified with the 2n-dimensional real vector space ℝ^{2n}. That is, dim ℂⁿ = n when ℂⁿ is considered as a complex vector space, but dim ℂⁿ = 2n when ℂⁿ is considered as a real vector space. Note that when ℂⁿ is considered as a 2n-dimensional real vector space, the space ℝⁿ = {(x_1, x_2, ..., x_n) : x_i ∈ ℝ} is a subspace of ℂⁿ, but not when ℂⁿ is considered as an n-dimensional complex vector space.

Example 7.2 (Gram-Schmidt orthogonalization on a complex vector space) Consider the complex vector space ℂ³ with the Euclidean inner product. Apply the Gram-Schmidt orthogonalization to convert the basis x_1 = (i, i, i), x_2 = (0, i, i), x_3 = (0, 0, i) into an orthonormal basis.

Solution: Step 1: Set

u_1 = x_1 / ||x_1|| = (i, i, i)/√3 = (i/√3, i/√3, i/√3).

Step 2: Let W_1 denote the subspace spanned by u_1. Then

x_2 − Proj_{W_1} x_2 = x_2 − ⟨u_1, x_2⟩ u_1
                     = (0, i, i) − (2/√3)(i/√3, i/√3, i/√3)
                     = (−2i/3, i/3, i/3).


Therefore,

u_2 = (x_2 − Proj_{W_1} x_2) / ||x_2 − Proj_{W_1} x_2|| = (3/√6)(−2i/3, i/3, i/3) = (−2i/√6, i/√6, i/√6).

Step 3: Let W_2 denote the subspace spanned by {u_1, u_2}. Then

x_3 − Proj_{W_2} x_3 = x_3 − ⟨u_1, x_3⟩ u_1 − ⟨u_2, x_3⟩ u_2
                     = (0, 0, i) − (1/√3)(i/√3, i/√3, i/√3) − (1/√6)(−2i/√6, i/√6, i/√6)
                     = (0, −i/2, i/2).

Therefore,

u_3 = (0, −i/√2, i/√2).

Thus,

u_1 = (i/√3, i/√3, i/√3),   u_2 = (−2i/√6, i/√6, i/√6),   u_3 = (0, −i/√2, i/√2)

form an orthonormal basis for ℂ³.   □
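The result of Example 7.2 is easy to verify numerically. The following is a small Python/NumPy sketch (the helper names are our own) that applies Gram-Schmidt with the Hermitian inner product conj(u)ᵀv and checks that the resulting vectors are orthonormal.

```python
import numpy as np

def gram_schmidt(vectors):
    """Gram-Schmidt with the Hermitian inner product <u, v> = conj(u)^T v."""
    basis = []
    for x in vectors:
        w = x - sum(np.vdot(u, x) * u for u in basis)   # subtract projections
        basis.append(w / np.linalg.norm(w))
    return basis

x1, x2, x3 = np.array([1j, 1j, 1j]), np.array([0, 1j, 1j]), np.array([0, 0, 1j])
u1, u2, u3 = gram_schmidt([x1, x2, x3])

U = np.column_stack([u1, u2, u3])
print(np.round(U.conj().T @ U, 10))   # identity matrix: an orthonormal basis
print(np.round(u3, 4))                # [0, -0.7071j, 0.7071j] = (0, -i/sqrt(2), i/sqrt(2))
```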

Example 7.3 (An orthonormal set in the complex-valued function space C_ℂ[0, 2π]) Let C_ℂ[0, 2π] be the complex vector space with the inner product given in Example 7.1, and let W be the set of vectors in C_ℂ[0, 2π] of the form e^{ikx} = cos kx + i sin kx, where k is an integer. The set W is orthogonal. In fact, if g_k(x) = e^{ikx} and g_l(x) = e^{ilx} are vectors in W, then

⟨g_k, g_l⟩ = ∫_0^{2π} conj(e^{ikx}) e^{ilx} dx = ∫_0^{2π} e^{-ikx} e^{ilx} dx = ∫_0^{2π} e^{i(l−k)x} dx
           = ∫_0^{2π} cos(l − k)x dx + i ∫_0^{2π} sin(l − k)x dx
           = [ (1/(l − k)) sin(l − k)x ]_0^{2π} + i [ −(1/(l − k)) cos(l − k)x ]_0^{2π} = 0   if k ≠ l,

and ⟨g_k, g_k⟩ = ∫_0^{2π} dx = 2π if k = l.


Thus, the vectors in W are orthogonal and each vector has length √(2π). By normalizing each vector in the orthogonal set W, one can get an orthonormal set. Therefore, the vectors

f_k(x) = (1/√(2π)) e^{ikx},   k = 0, ±1, ±2, ...,

form an orthonormal set in the complex vector space C_ℂ[0, 2π].   □
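A quick numerical check of the computation in Example 7.3 (a Python/NumPy sketch of our own, using a simple trapezoidal rule on a fine grid):

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 20001)

def inner(k, l):
    """<g_k, g_l> = integral over [0, 2*pi] of conj(e^{ikx}) e^{ilx} dx."""
    return np.trapz(np.exp(-1j * k * x) * np.exp(1j * l * x), x)

print(abs(inner(2, 5)))    # ~0 : orthogonal when k != l
print(inner(3, 3).real)    # ~2*pi : each vector has squared length 2*pi
```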

Problem 7.2 Prove that in a complex inner product space V,
(1) |⟨x, y⟩|² ≤ ⟨x, x⟩⟨y, y⟩   (Cauchy-Schwarz inequality),
(2) ||x + y|| ≤ ||x|| + ||y||   (triangle inequality),
(3) ||x + y||² = ||x||² + ||y||² if x and y are orthogonal   (Pythagorean theorem).

The definitions of eigenvalues and eigenvectors in a complex vector space are the same as in the real case, but the eigenvalues can now be complex numbers. Hence, for any n × n (real or complex) matrix A, the characteristic polynomial det(λI − A) always has n complex roots (i.e., eigenvalues) counting multiplicities. For example, consider a rotation matrix

A = [ cos θ  −sin θ ]
    [ sin θ   cos θ ]

with real entries. This matrix has two complex eigenvalues for any θ ∈ ℝ, but no real eigenvalues unless θ = kπ for an integer k. Therefore, all theorems and corollaries in Chapter 6 regarding eigenvalues and eigenvectors remain true without requiring the existence of n eigenvalues explicitly, and exactly the same proofs as in the real case are valid since the arguments in the proofs are not concerned with what the scalars are. For example, one can have a theorem like 'for an n × n matrix A, the eigenvectors belonging to distinct eigenvalues are linearly independent', and 'if the n eigenvalues of A are distinct, then the eigenvectors belonging to them form a basis for ℂⁿ, so that A is diagonalizable'.

An n × n real matrix A can be considered as a linear transformation on both ℝⁿ and ℂⁿ:

T : ℝⁿ → ℝⁿ   defined by   T(x) = Ax,
S : ℂⁿ → ℂⁿ   defined by   S(x) = Ax.

Since the entries are all real, the coefficients of the characteristic polynomial f(λ) = det(λI − A) of A are all real. Thus, if λ is a root of f(λ) = 0, then its conjugate λ̄ is also a root, because f(λ̄) = conj(f(λ)) = 0. In other words, if λ is an eigenvalue of a real matrix A, then λ̄ is also an eigenvalue. In particular, any n × n real matrix A has at least one real eigenvalue if n is odd. Moreover, if x is an eigenvector belonging to a complex eigenvalue λ, then the complex conjugate x̄ is an eigenvector belonging to λ̄. In fact, if Ax = λx with x ≠ 0, then

A x̄ = conj(Ax) = conj(λx) = λ̄ x̄,

where x̄ denotes the vector whose entries are the complex conjugates of the corresponding entries of x. Using this fact, the following example shows that any 2 × 2 matrix with no real eigenvalues can be written as a scalar multiple of a rotation.

Example 7.4 Show that if A is a 2 × 2 real matrix having no real eigenvalues, then A is similar to a matrix of the form

[  r cos θ   r sin θ ]
[ −r sin θ   r cos θ ].

Solution: Let A be a 2 × 2 real matrix having no real eigenvalues, and let λ = a + ib and λ̄ = a − ib with a, b ∈ ℝ and b ≠ 0 be the two complex eigenvalues of A with associated eigenvectors x = u + iv and x̄ = u − iv with u, v ∈ ℝ², respectively. It follows immediately that

u = (1/2)(x + x̄),   v = −(i/2)(x − x̄),   a = (1/2)(λ + λ̄),   b = −(i/2)(λ − λ̄).

Since λ ≠ λ̄, the eigenvectors x and x̄ are linearly independent in the complex vector space ℂ², as they are when ℂ² is considered as a real vector space. It implies that the vectors x + x̄ and x − x̄ are linearly independent in the real vector space ℂ² (see Problem 7.3 below), so that the real vectors u and v are linearly independent in the subspace ℝ² of the real vector space ℂ². Thus α = {u, v} is a basis for the real vector space ℝ², and

Au = (1/2)(Ax + A x̄) = (1/2)(λx + λ̄ x̄) = λ (u + iv)/2 + λ̄ (u − iv)/2 = a u − b v.

Similarly, one can get Av = b u + a v, implying that the matrix representation of the linear transformation A : ℝ² → ℝ² with respect to the basis α is

[A]_α = [  a   b ]
        [ −b   a ].

That is, any 2 × 2 matrix that has no real eigenvalues is similar to a matrix of such a form. Now, by setting r = √(a² + b²) > 0, one can get a = r cos θ and b = r sin θ for some θ ∈ ℝ, so

[A]_α = [  r cos θ   r sin θ ]
        [ −r sin θ   r cos θ ].   □
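As a numerical sanity check of Example 7.4 (our own illustration in Python/NumPy, with a sample matrix chosen by us), one can take a 2 × 2 real matrix with complex eigenvalues, build u and v from an eigenvector, and verify that the change of basis produces the scaled rotation.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [1.0,  3.0]])      # eigenvalues 2 +- i, so no real eigenvalues

lam, X = np.linalg.eig(A)
x = X[:, 0]                      # eigenvector for lambda = a + ib
a, b = lam[0].real, lam[0].imag
u, v = x.real, x.imag            # x = u + iv

P = np.column_stack([u, v])      # basis alpha = {u, v} of R^2
B = np.linalg.inv(P) @ A @ P     # representation of A with respect to alpha
print(np.round(B, 10))           # [[a, b], [-b, a]] for the chosen eigenvalue
print(a, b, np.hypot(a, b))      # a = r cos(theta), b = r sin(theta), r
```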

Problem 7.3 Let x and y be two vectors in a vector space V . Show that x and y are linearly independent if and only if x + y and x - y are linearly independent.


Problem 7.4 Find the eigenvalues and the eigenvectors of

2 01] , [ 0i O

[1 . . :

(2) - i 2 0 . 1 0 -i 1- i 0 1 Problem 7.5 Prove that an n x n complex matrix A is diagonalizable if and only if A has n linearly independent eigenvectors in the complex vector space en. (1)

7.2 Hermitian and unitary matrices

Recall that the dot product of real vectors x, y ∈ ℝⁿ is given by x · y = xᵀy in matrix form. For complex vectors u, v ∈ ℂⁿ, the Euclidean inner product is defined by u · v = ū_1 v_1 + ··· + ū_n v_n = ūᵀ v, which involves the conjugate transpose, not just the transpose.

Definition 7.3 For a complex matrix A, its complex conjugate transpose, A^H = Āᵀ, is called the adjoint of A.

Note that Ā is the matrix whose entries are the complex conjugates of the corresponding entries in A. Thus, [a_ij]^H = [ā_ji]. With this notation, the Euclidean inner product on ℂⁿ can be written as

u · v = ūᵀ v = u^H v.

Problem 7.6 Show that (AB)^H = B^H A^H when AB can be defined.

Problem 7.7 Prove that if A is invertible, so is A^H, and (A^H)⁻¹ = (A⁻¹)^H.

For complex matrices, the notions of symmetry and skew-symmetry of real matrices are replaced by Hermitian and skew-Hermitian matrices, respectively.

Definition 7.4 A complex square matrix A is said to be Hermitian (or self-adjoint) if A^H = A, or skew-Hermitian if A^H = −A.

For matrices A = [ 4

~ i 4 j i]

and

B= [

_/+ i

1': i

J.

one can see that A is Hermitian and B is skew-Hermitian. A Hermitian matrix with real entries is just a real symmetric matrix, and conversely, any real symmetric matrix is Hermitian. Like real matrices, any m × n (complex) matrix A can be considered as a linear transformation from ℂⁿ to ℂᵐ, and

(Ax) · y = (Ax)^H y = x^H A^H y = x · (A^H y)

for any x ∈ ℂⁿ and y ∈ ℂᵐ. The following theorem lists some important properties of Hermitian matrices.


Theorem 7.1 Let A be a Hermitian matrix.
(1) For any (complex) vector x ∈ ℂⁿ, x^H A x is real.
(2) All (complex) eigenvalues of A are real. In particular, an n × n real symmetric matrix has precisely n real eigenvalues.
(3) The eigenvectors of A belonging to distinct eigenvalues are mutually orthogonal.

Proof: (1) Since x^H A x is a 1 × 1 matrix, conj(x^H A x) = (x^H A x)^H = x^H A x.
(2) If Ax = λx, then x^H A x = x^H λx = λ x^H x = λ ||x||². The left-hand side is real and ||x||² is real and positive, because x ≠ 0. Therefore, λ must be real.
(3) Let x and y be eigenvectors of A belonging to eigenvalues λ and μ, respectively, with λ ≠ μ. Because A = A^H and λ is real, it follows that

λ(x · y) = (λx) · y = Ax · y = x · Ay = μ(x · y).

Since λ ≠ μ, it gives that x · y = x^H y = 0, i.e., x is orthogonal to y.   □

In particular, eigenvectors belonging to distinct eigenvalues of a real symmetric matrix are orthogonal.
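Theorem 7.1 is easy to see numerically. The sketch below (Python/NumPy, an illustration of our own) builds a random Hermitian matrix and checks that its eigenvalues are real and that its eigenvectors form an orthonormal set.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M + M.conj().T) / 2                 # A^H = A, i.e. A is Hermitian

w, V = np.linalg.eigh(A)                 # eigh is tailored to Hermitian matrices
print(w)                                 # real eigenvalues
print(np.round(V.conj().T @ V, 10))      # identity: eigenvectors are orthonormal
print(np.max(np.abs(A @ V - V * w)))     # ~1e-15: A v_i = w_i v_i
```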

Remark: Condition (1) in Theorem 7.1 (i.e., x^H A x is real for any complex vector x ∈ ℂⁿ) is equivalent to saying that the diagonal entries of A are real:

x^H A x = Σ_{i,j} a_ij x̄_i x_j = Σ_i a_ii |x_i|² + C + C̄,

where C = Σ_{i<j} a_ij x̄_i x_j.

⇒ (3): It is clear that the columns of the basis-change matrix U are n orthonormal eigenvectors of A. (3) ⇒ (1): Let U be the unitary matrix whose columns are the n orthonormal eigenvectors. Then AU = UD, or A = UDU^H, and

A A^H = U D U^H U D^H U^H = U D D^H U^H = U D^H D U^H = U D^H U^H U D U^H = A^H A.

That is, A is normal.   □

Note that there exist infinitely many non-normal complex matrices that are still diagonalizable, but of course not unitarily. One can find such examples among the triangular matrices having distinct diagonal entries.

Recall that any n × n real matrix A can be written as the sum S + T of a symmetric matrix S = (1/2)(A + Aᵀ) and a skew-symmetric matrix T = (1/2)(A − Aᵀ). (See Problem 1.11.) The same kind of expression is also possible for a complex matrix. A complex matrix A can be written as the sum A = H_1 + i H_2, where

H_1 = (1/2)(A + A^H),   H_2 = −(i/2)(A − A^H),   or   i H_2 = (1/2)(A − A^H).

Clearly both H_1 and H_2 are Hermitian, and i H_2 is skew-Hermitian.

Problem 7.20 Show that the matrix A = [~ ~], x ∈ ℝ, is not normal, so it cannot be unitarily diagonalizable. But it is diagonalizable.
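The decomposition A = H_1 + iH_2 is one line of code to verify; the sketch below (Python/NumPy, our own illustration with a random matrix) builds H_1 and H_2 for a complex matrix and checks that both are Hermitian and that together they recover A.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

H1 = (A + A.conj().T) / 2            # Hermitian part
H2 = -1j * (A - A.conj().T) / 2      # also Hermitian; i*H2 is skew-Hermitian

print(np.allclose(H1, H1.conj().T))  # True
print(np.allclose(H2, H2.conj().T))  # True
print(np.allclose(A, H1 + 1j * H2))  # True
```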


[Ii i]

Problem 7.21 Determine whether or not the matrix A =

iii iii

is unitarily diagonalizable. If it is, find a unitary matrix U that diagonalizes A.

Problem 7.22 Let H_1 and H_2 be two Hermitian matrices. Show that A = H_1 + i H_2 is normal if and only if H_1 H_2 = H_2 H_1.

Problem 7.23 For any unitarily diagonalizable matrix A, prove that
(1) A is Hermitian if and only if A has only real eigenvalues;
(2) A is skew-Hermitian if and only if A has only purely imaginary eigenvalues;
(3) A is unitary if and only if |λ| = 1 for any eigenvalue λ of A.

7.5 Application

7.5.1 The spectral theorem

As shown in the previous section, the normal matrices are the only matrices that can be unitarily diagonalized. That is, A is normal if and only if there exists a basis α for ℂⁿ consisting of orthonormal eigenvectors of A such that the matrix representation [A]_α of A with respect to α is diagonal.

Theorem 7.12 (Spectral theorem) Let A be a normal matrix, and let {u_1, u_2, ..., u_n} be a set of orthonormal eigenvectors belonging to the eigenvalues λ_1, λ_2, ..., λ_n of A, respectively. Then A can be written as

A = U D U^H = λ_1 u_1 u_1^H + λ_2 u_2 u_2^H + ··· + λ_n u_n u_n^H,

and u_i u_i^H is the orthogonal projection matrix onto the subspace spanned by the eigenvector u_i for i = 1, ..., n.

Proof: Note that U = [u_1 u_2 ··· u_n] is a unitary matrix that transforms A into a diagonal matrix D, i.e., U⁻¹AU = U^H A U = D. Then

A = U D U^H = λ_1 u_1 u_1^H + λ_2 u_2 u_2^H + ··· + λ_n u_n u_n^H = λ_1 P_1 + λ_2 P_2 + ··· + λ_n P_n,


where

P_i = u_i u_i^H = [ u_1i ]                        [ |u_1i|²    ···  u_1i ū_ni ]
                  [  ⋮   ] [ ū_1i  ···  ū_ni ]  = [    ⋮        ⋱      ⋮      ]
                  [ u_ni ]                        [ u_ni ū_1i  ···  |u_ni|²   ],

which is a Hermitian matrix. Now, for any x E en and i, j = 1, . . . , n,

P_i x = u_i u_i^H x = ⟨u_i, x⟩ u_i,

P_i P_j = u_i u_i^H u_j u_j^H = ⟨u_i, u_j⟩ u_i u_j^H = { P_i   if i = j,
                                                        { 0     if i ≠ j,

(P_1 + ··· + P_n) x = P_1 x + ··· + P_n x = Σ_{i=1}^{n} ⟨u_i, x⟩ u_i = x = id(x).

Therefore, each Pi is nothing but the orthogonal projection onto the subspace spanned by the eigenvector Ui. 0
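As a numerical illustration of the spectral theorem (our own Python/NumPy sketch, using a random Hermitian, hence normal, matrix), one can rebuild A from its spectral projectors P_i = u_i u_i^H and check the projection identities used in the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (M + M.conj().T) / 2                       # Hermitian, so normal

w, U = np.linalg.eigh(A)                       # orthonormal eigenvectors in columns
P = [np.outer(U[:, i], U[:, i].conj()) for i in range(3)]   # P_i = u_i u_i^H

print(np.allclose(sum(w[i] * P[i] for i in range(3)), A))   # A = sum lambda_i P_i
print(np.allclose(sum(P), np.eye(3)))                        # P_1 + ... + P_n = I
print(np.allclose(P[0] @ P[0], P[0]), np.allclose(P[0] @ P[1], 0))   # projections
```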

=

Note that the equation P_1 + ··· + P_n = id means that if one restricts the image of each P_i to the subspace spanned by u_i, which is isomorphic to ℂ, then (P_1, ..., P_n) defines another orthogonal coordinate system with respect to the orthonormal basis {u_1, ..., u_n}, just like (z_1, ..., z_n) of ℂⁿ (see Sections 5.6 and 5.9.3). Note that any x ∈ ℂⁿ has the unique expression x = Σ_{i=1}^{n} ... such that 1 ≤ dim E(λ) < m_λ,

so that the number of linearly independent eigenvectors belonging to λ must be less than m_λ. However, even if a matrix A is not diagonalizable, one may try to find a matrix similar to A which has as many zero entries as possible except the diagonal. Schur's lemma says that any square matrix is (unitarily) similar to an upper triangular matrix. But it is a fact that any square matrix A is similar to a matrix much "closer" to a diagonal matrix, called a Jordan canonical form. Its diagonal entries are the eigenvalues of A, the entry just above each diagonal entry is 0 or 1, and all other


entries are 0. In this case, the columns of a basis-change matrix Q are something like eigenvectors, but not the same in general. They are called generalized eigenvectors.

Theorem 8.1 For any square matrix A, A is similar to a matrix J of the following form, called the Jordan canonical form of A or a Jordan canonical matrix,

J = [ J_1              ]
    [      J_2         ]
    [           ⋱      ]
    [              J_s ],

in which
(1) s is the number of linearly independent eigenvectors of A, and
(2) each J_i is an upper triangular matrix of the form

J_i = [ λ_i  1                ]
      [      λ_i  1           ]
      [           ⋱    ⋱      ]
      [                λ_i  1 ]
      [                    λ_i ],

where λ_i is a single eigenvalue of J_i with only one linearly independent associated eigenvector. Such J_i is called a Jordan block belonging to the eigenvalue λ_i.

The proof of Theorem 8.1 may be beyond a beginning linear algebra course. Hence, we leave it to more advanced books, and in this book we are only concerned with how to find the Jordan canonical form J of A and a basis-change matrix Q. First, note that if an n × n matrix A has a full set of n linearly independent eigenvectors (that is, s = n), then there have to be n Jordan blocks, so that each Jordan block is just a 1 × 1 matrix and an eigenvalue λ appears as many times as its multiplicity. In this case, the Jordan canonical form J of A is just the diagonal matrix with the eigenvalues on the diagonal, and a basis-change matrix Q is obtained by taking the n linearly independent eigenvectors as its columns. Hence, a diagonal matrix is a particular case of the Jordan canonical form.

Remark: (1) For a given Jordan canonical form J of A, one can get another one by permuting the Jordan blocks of J, and this new one is also similar to A. It will be shown later that any two Jordan canonical forms of A cannot be similar except for this possibility. In other words, any (complex) square matrix A is similar to only one Jordan canonical matrix up to permutations of the Jordan blocks. In this sense, it is called the Jordan canonical form of a matrix A. (2) As an alternative way to define the Jordan canonical form of A, one can take the transpose Jᵀ of the Jordan canonical matrix J given in Theorem 8.1. In this case, each Jordan block becomes a lower triangular matrix with a single eigenvalue. But this alternative definition does not induce any essential difference from the original one.


If J is the Jordan canonical form ofa matrix A , then they have the same eigenvalues and the same number of linearly independent eigenvectors, but not the same set of them in general. (Note that x is an eigenvector of J = Q -I A Q if and only if Qx is an eigenvector of A). Actually, for a given matrix A, its Jordan canonical form J is completely determined by the number s of linearly independent eigenvectors of A and their ranks (which will be defined in Section 8.2): each eigenvector corresponds to a Jordan block in J, and the rank of an eigenvector determines the size of the corresponding block. The following example illustrates which matrices A have the Jordan canonical form J and how the s linearly independent eigenvectors of A (or J) correspond to the Jordan blocks in J. Example 8.1 (Each Jordan block to each eigenvector) Let J be .a Jordan canonical matrix of the form :

J = [ J_1             ]
    [      J_2        ]
    [           J_3   ],   where   J_1 = [ 6 1 ],   J_2 = [ 2 1 ],   J_3 = [ 2 ].
                                          [ 0 6 ]          [ 0 2 ]

Find the number of linearly independent eigenvectors of J and determine all matrices whose Jordan canonical forms are J.

Solution: Since J is a triangular matrix, the eigenvalues of J are the diagonal entries 6 and 2 with multiplicities 2 and 3, respectively. The eigenspace E(6) has a single linearly independent eigenvector because dim E(6) = dim N(J − 6I) = 5 − rank(J − 6I) = 1, by the Rank Theorem 3.17. In fact, e_1 = (1, 0, 0, 0, 0) is such a vector, and λ = 6 appears only in the single block J_1. Similarly, one can see that the eigenspace E(2) has two linearly independent eigenvectors e_3 and e_5 with dim E(2) = 5 − rank(J − 2I) = 2, and λ = 2 appears in the two blocks J_2 and J_3. Hence, one can conclude that if a matrix A is similar to J, then A is a 5 × 5 matrix whose eigenvalues are 6 and 2 with multiplicities 2 and 3, respectively, but there is only one linearly independent eigenvector belonging to 6 (i.e., dim E(6) = 1) and there are only two linearly independent eigenvectors belonging to 2 (i.e., dim E(2) = 2). Moreover, the converse is also true by Theorem 8.1. In general, one can say that if a matrix A is similar to a Jordan canonical matrix J, then both matrices have the same eigenvalues with the same multiplicities and dim N(λI − A) = dim N(λI − J) for each eigenvalue λ.   □
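The dimension count used in Example 8.1 is easy to automate. The sketch below (Python with SymPy; the 5 × 5 matrix is our own example, built to have the block structure of J rather than taken from the text) reads off, for each eigenvalue, the number of Jordan blocks as n − rank(A − λI), and confirms the answer with SymPy's own Jordan form.

```python
import sympy as sp

# A 5x5 matrix of our own with Jordan blocks [6 1; 0 6], [2 1; 0 2], [2]:
J = sp.Matrix([[6, 1, 0, 0, 0],
               [0, 6, 0, 0, 0],
               [0, 0, 2, 1, 0],
               [0, 0, 0, 2, 0],
               [0, 0, 0, 0, 2]])
Q = sp.Matrix(5, 5, lambda i, j: 1 if i <= j else 0)   # any invertible Q works
A = Q * J * Q.inv()                                    # A is similar to J

n = A.shape[0]
for lam in set(A.eigenvals()):
    blocks = n - (A - lam * sp.eye(n)).rank()          # number of Jordan blocks
    print(lam, blocks)                                  # 6 -> 1 block, 2 -> 2 blocks

P, JA = A.jordan_form()
print(JA)             # same block structure as J (blocks possibly permuted)
```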

=

=

As shown in Example 8.1, the standard basis vectors ej's associated with the first column vectors of the Jordan blocks Jj'S of J form linearly independent eigenvectors of the matrix J, and then the vectors Qe j 's form linearly independent eigenvectors of the matri x A, where Q-1AQ = J.

276

Chapter 8. Jordan Canonical Forms

Problem 8.1 Note that the matrix

A

=

[~o - i-~ j -~] 0 0

o

0 0

2 0

0 2

has the eigenvalues 6 and 2 with multiplicities 2 and 3, respectively. Moreover, there are two linearly independent eigenvectors uj (0, -1 , 1, 0, 0) and U2 (0, 1, 0, 0, 1) belonging to A = 2, and vI = (-1, 0, 0, 0, 0) is an eigenvector belonging to A = 6. Show that the Jordan canonical form of A is the matrix J given in Example 8.1 by showing Q-I A Q J with an invertible matrix

=

=

=

Q

=

[-~o 1-! ~! ]. o

0 0

0 -1 0 001

Problem 8.2 Show that for any Jordan block J, the order of J is the smallest positive integer k such that (J - )..J)k = 0 , where A is an eigenvalue of J .

An eigenvalue A. may appear in several blocks. In fact, the number of Jordan blocks belonging to an eigenvalue A. is equal to the number of linearly independent eigenvectors of A belonging to A., which is the dimension of the eigenspace E(A.) = N(A./ - A) . Moreover, the sum of the orders of all Jordan blocks belonging to an eigenvalue A. is equal to the multiplicity ms; of A.. Next, one might ask how to determine the Jordan blocks belonging to a given eigenvalue A.. The following example shows all possible cases of Jordan blocks belonging to an eigenvalue A. when its multiplicity rnA is fixed.

Example 8.2 (Classifying Jordan canonical matrices having a single eigenvalue) Classify all possible Jordan canonical forms of a 5 x 5 matrix A that has a single eigenvalue A. of multiplicity 5 (up to permutations of the Jordan blocks). Solution: There are seven possible Jordan canonical forms as follows . (I) Suppose A has only one linearly independent eigenvector belonging to A.. Then the Jordan canonical form of A is of the form

J(l)

A. o A.I OI O0 = Q-I AQ = 0 0 A. I [ o 0 0 A.

o

0O ] 0 , I 0 0 0 A.

which consists of only one Jordan block with eigenvalue A. on the diagonal. And, both A and J (l ) have only one linearly independent eigenvector belonging to A. . (Note that rank(J (l) - A./) = 4.)

8.1. Basic properties of Jordan canonical forms

277

(2) Suppose it has two linearly independent eigenvectors belonging to A. Then the Jordan canonical form of A is either one of the forms

each of which consists of two Jordan blocks belonging to the eigenvalue A. Note that J (2) has two linearly independent eigenvectors el and ej, while J (3) has el and e2. These two matrices J (2) and J (3) cannot be similar, because (1 (2) - A1)3 = 0, but (1(3 ) - A1)3 i: O. (One can justify it by a direct computation.) (3) Suppose it has three linearly independent eigenvectors belonging to A. Then the Jordan canonical form of A is either one ofthe forms

each of which consists of three Jordan blocks belonging to the eigenvalue A. Note that J(4 ) has three linearly independent eigenvectors ej , e2 and ea, while J(5) has el, e2 and e3. These two matrices J (4) and J (5) are not similar, because (1 (4) - A1)2 = 0, but (1 (5) - A1)2 f= O. (4) Suppose it has four linearly independent eigenvectors belonging to A. Then the Jordan canonical form of A is of the form

which consists of four Jordan blocks with eigenvalue A. (5) Suppose it has five linearly independent eigenvectors belonging to A. Then the Jordan canonical form of A is the diagonal matrix J(7 ) with diagonal entries A. Note that all of these seven possible Jordan canonical matrices have the same trace, determinant and the same characteristic polynomial , but any two of them are not similar to each other. D As shown in the case (2) (also in (3)) of Example 8.2, two Jordan canonical matrices J (2) and J (3) have the same eigenvalue of the same multiplicity and they also have the same number of linearly independent eigenvectors , but they are not similar. The problem of choosing one of the two possible Jordan canonical forms

278

Chapter 8. Jordan Canonical Forms

which is similar to the given matrix A depends on the sequence ofrank(A - ).,1)i for 1,2, . . . , n.

e=

The next example illustrates how to determine the orders of the Jordan blocks belonging to the same eigenvalue A.

Example 8.3 (Determine Jordan blocks belonging to the sameeigenvalue) Let J be a Jordan canonical matrix with a single eigenvalue ).,:

[).,] Then,

and (J - 'A./)3

= O.

0

[0]

Thus, one can get rank(J - H) = 4, rank(J - ).,1)2 = 1 and rank(J - H)3 = O. This sequence of ranks, rank(J - H)i for = 1,2,3, determines completely the orders of blocks of J . In fact, one can notice that

e

(i) the fact that (J - ).,1)3 = 0 but (J - H)2 f= 0 implies that the largest block has order 3, (li) rank( J - ).,1)2 1 is equal to the number of blocks of order 3, (iii) rank(J - H) = 4 is equal to twice the number of blocks of order 3 plus the number of blocks of order 2, so there are two of them, (iv) the number of blocks of order 1 is 8 - (2 x 2) - (3 xl) = 1.

=

Recall that two similar matrices have the same rank. Hence, if J is the Jordan canonical form of A, then for any eigenvalue X and for any positive integer e, we have rank(A - ).,1)i = rank (J - ).,1)i. Furthermore, in a sequence of matrices J - H, (J - ).,1)2, (J - ).,1)3, . .. , all Jordan blocks belonging to the eigenvalue x will terminate to a zero matrix but all other blocks (belonging to an eigenvalue different from X)remain as upper triangular matrices with nonzero diagonal entries. Hence, the sequence ofrank(J - ).,1)k must stop decreasing at n - ms, (Note: rank(J _ H)k = n - rnA, when k = rnA')

8.1. Basic properties of Jordan canonical forms Let

CA

279

= n - m A for simplicity. Then, the decreas ing sequence {rank(A - Al)k - CA : k

= 1, . . . , m A}

determines completely the orders of the blocks of J belonging to A as shown in Example 8.3:

(i) The order of the largest block belonging to A is the smallest positive integer k such that rank(A - Al)k - CA = O. And the number of such largest blocks is equal to rank(A - Al)k-I - CA, (say = ij) . (ii) The number of blocks of order k - 1 is equal to rank(A - Al)k-2 - CA 2£1, (say = i2). (iii) The number of blocks of order k - 2 is equal to rank(A - Al)k-3 - CA - 3i I 2£2, (say = i3), and so on. In general, if iI, i2, ... , i j are given, one can determine i j+ I inductively as follows : (iv) The number of blocks of order k - j is equal to rank(A - Al)k-(j+I) - CA (j + 1)il - ji2 - .. . - 2ij, (say = ij+l) with io = 0 for j = 0, . .. , k - 1. In summary, one can determine the Jordan canonical form J of an n x n matrix

A by the following procedure. Step 1 Find all distinct eigenvalues AI, ... , At of A . Let their multiplicities be mAl' ... , m A" respectively, so that mAl + .. . + mAt = n , and let c Aj = n - mA j ' Step 2 For each eigenvalue A, the Jordan blocks belonging to the eigenvalue A are determined by the following criteria: (i) The order of the largest block belonging to A is the smallest positive integer k such that rank(A - Al)k - c ).. = 0, and (ii) the number of blocks of order k - j is inductively determined as rank(A - Ul-(j+I) - CA - (j + l)il - ji2 - . . . - 2£j, (say = ij+l) with io = 0 for j = 0, .. . , k - 1. This is a general guide to determine the Jordan canonical form of a matrix. However, for a matrix oflarge order, the evaluation ofrank(A - Al)k might not be easy at all, while, for matrices of lower order or relatively simple matrices, the computations may be accessible. Example 8.4 (Jordan canonicalform ofa triangularmatrix) Find the Jordan canonical form J of the matrix

A=

2 1 4] 0 2 -1 .

[ 003

Solution: Since A is triangular, the eigenvalues of A are the diagonal entries AI = A2 = 2, A3 = 3. Hence, there are two possibilities of the Jordan canonical form of

A:

280

Chapter 8. Jordan Canonical Forms J{I)

=

200] [210] [0 0 3 0 0 3 0 2 0

or

J (2)

=

0 2 0

.

But, one can see that rank(A - 2l) = 2. It implies that rank(J - 2/) Jordan canonical form of A must be J(2).

= 2 and so the 0

Example 8.5 (Jordan canonicalform of a companion matrix)Find the Jordan canonical form J of the matrix

0]

0 1

A

=[

0 o 1 0 0 0 o 1 -1 4 -6 4

o

.

Solution: The characteristic polynomial of the matrix A is det(AI - A)

= >.. 4 -

4>..3

+

6>.. 2 - 4>" + 1 = (>" - 1)4.

The eigenvalue of A is >.. = 1 of multiplicity 4. Note that the rank of the matrix

A -I

=

[

-1o 1 00] 0 -1

-1 1 0 0 -1 1 4 -6 3

is 3; by noting that the first three rows are linearly independent, or the determinant of the 3 x 3 principal submatrix of the upper left part is not zero. Hence, the rank of J - I is also 3 and so J - I must have three 1's beyond diagonal entries. It means that 1 1 0 J

=[

011 0 0 1 000

Also, one can check the following equations:

(A _ 1)3 =

[

-1 3-3 1] -1 3 -3 1 - 1 3 -3 1 -1 3 -3 1

i- 0,

but

which says that the order of the largest Jordan block is 4.

(A - 1)4

= 0, o

Problem 8.3 Let A be a 5 x 5 matrix with two distinct eigenvalues: ).. of multiplicity 3 and J.L of multiplicity 2. Find all possible Jordan canonical forms of A up to permutations of the

Jordan blocks.

8.2. Generalized eigenvectors

n

281

Problem 8.4 Find the Jordan canonical form for each of the following matrices:

(2)

412] 0 4 2 , [ 004

(3)

U~ !

8.2 Generalized eigenvectors In Section 8.1, assuming Theorem 8.1, we have shown how to determine the Jordan canonical form J of a matrix A . In this section, we discuss how to determine a basischange matrix Q so that Q -I A Q = J is the Jordan canonical form of A. In fact, if J is given, then a basis-change matrix Q is a nonsingular solution of a matrix equation AQ QJ.

=

The following example illustrates how to determine a basis-change matrix Q when a matrix A and its Jordan canonical form J are given. Example 8.6 (Each Jordan block to each chain of generalized eigenvectors) Let A be a 5 x 5 matrix similar to a Jordan block of the form

Q-IAQ=J=

A o AI OOO] I 0 0 0 0 A 1 0

[o o

.

0 0 A 1 0 0 0 A

Determine a basis-change matrix Q = [XI X2 X3 X4 xs].

Solution: Clearly, two similar matrices A and J have the same eigenvalue A of multiplicity 5. Since rank(J - U) = 4, dimN(J - AI) = dim E(A) = 1. In fact, J has only one linearly independent eigenvector , which is el = (1, 0, 0, 0, 0). Thus Qel = Xl is a linearly independent eigenvector of A. Also, note that the smallest positive integer k such that (A - U)k = (J - ul = 0 is 5 which is the order of the block J. To see what the other columns of Q are, we expand A Q Q J as

=

[Ax] AX2 AX3 AX4 AxS] = [AXI Xl + AX2 X2

+

AX3 X3

+

AX4 X4

By comparing the column vectors, we have

Axs = X4 + AX4 = X3 + AX3 = X2 + AX2 = Xl + AXI = AXI,

AXs, AX4, AX3, AX2,

or or or or or

(A (A (A (A (A

-

=

= X4,

=

= X2,

=

= O.

AI)xs (A - AI)IXS U)X4 = (A - U)2 xS = U)X3 (A - U)3 xS U)X2 = (A - AI)4 xs = U)XI (A - U)SXs

X3, XI,

+

AXS].

282

Chapter 8. Jordan Canonical Forms

=

=

Note that the vectorxs satisfies (A -A/)5 x5 0 but (A -Al)4 X5 Xl f:. O. However, (A -A/)5 = (J _A/)5 = O. Hence, if one gets xj as a solution of (A -A/)4x f:. 0, then all other x, 's can be obtained by X4 = (A -Al)X5, X3 = (A -Al)X4, X2 = (A -Al)X3, etc. Such a vectorxs is called a generalized eigenvector of rank 5, and the (ordered) set {xj , .. . , X5} is called a chain ofgeneralized eigenvectors belonging to A. Therefore, the columns of the basis-change matrix Q form a chain of generalized eigenvectors.

o

In general, by expanding A Q = QJ, one can see that the columns of Q corresponding to the first columns of Jordan blocks of J form a maximal set of linearly independent eigenvectors of A, and remaining columns of Q are generalized eigenvectors .

Definition 8.1 A nonzero vector X is said to be a generalized eigenvector of A of rank k belong ing to an eigenvalue A if (A - A/)k x = 0

and

(A - Al)k-l x

f:. O.

Note that if k = 1, this is the usual definition of an eigenvector. For a generalized eigenvector X of rank k 2: 1 belonging to an eigenvalue A,define

= = Xk-2 =

Xk

Xk-l

X2 XI

= =

X, (A - A/)Xk (A - A/)Xk-I

= =

(A - A/)X, (A - A/)2 x,

(A - A/)X3 (A - A/)X2

= =

(A - Al)k-2 x , (A - Al)k-I X.

Thus, for each i, 1 < i :::: k, we have (A - A/)lxl = (A - Al)k x = 0 and (A - A/)l-I x l = XI ;;f: O. Note also that (A - Al)lXj = 0 for i 2: i. Hence, the vector Xl = (A - Al)k-l x is a generalized eigenvector of A of rank l. See Figure 8.1.

Definition 8.2 The set of vectors {xI , X2 , ... , Xk} is called a chain of generalized eigenvectors belonging to the eigenvalue A. The eigenvector XI is called the initial eigenvector of the chain. The following successive three theorems show that a basis-change matrix Q can be constructed from the chains of generalized eigenvectors initiated from s linearly independent eigenvectors of A, and also justify the invertibility of Q. (A reader may omit reading their proofs below if not interested in details .)

Theorem 8.2 A chain of generalized eigenvectors {XI, X2, ... , Xk} belonging to an eigenvalue A is linearly independent.

Let CIXI + C2X2 + .. . + CkXk = 0 with constants ci, i = 1, .. . , k. If we multiply (on the left) both sides of this equation by (A - Al)k-I , then for i = 1, ... ,k-l,

Proof:

8.2. Generalized eigenvectors

.



.

283

.

'-A.../ A - AI

A - >..I

A - >..I

Figure 8.1. A chain of generalized eigenvectors

(A - AI)k-l Xi

= (A -

).,l)k-(i+I)(A - AI)i Xi

= O.

Thus, Ck(A - AI)k-l Xk = 0, and, hence, Ck = O. Do the same to the equation ClXl + ... +Ck-lXk-1 = 0 with (A - AI)k-Z and get Ck-l O. Proceeding successively, Onecan show that ci 0 for all i 1, . . . , k. 0

=

=

=

Theorem 8.3 The union of chainsof generalized eigenvectorsofa square matrix A belongingto distinct eigenvalues is linearlyindependent. Proof: For brevity, we prove this theorem for only two distinct eigenvalues. Let {Xl,Xz, . . . , Xk} and {Yl,Yz, . . . , Yd be the chains of generalized eigenvectors of A belonging to the eigenvalues x and u, respectively, and let X ::f: u: In order to show that the set of vectors [xj , ... ,Xk, Yl, .. ., Yd is linearly independent, let elxl + . . . +

qXk

+ dlYl + . . . + deYe = 0

with COnstants ci 's and dj ' SoWe multiply both sides of the equation by (A - ).,l)k and note that (A - Al)k Xi = 0 for all i = 1, .. . , k. Thus we have (A - )..l)k(d1YJ + dzyz + . . . + deye) = O.

Again, multiply this equation by (A - f..L/)e-1 and note that (A - f..Ll)e-I(A - ).,l)k = (A - f..Ll)e-l Ye = (A - f..Ll)e-l Yi =

(A - ).,l)k(A - f..Ll)e-l,

Yl,

0

for i = 1, ... , f. - 1. Thus we obtain

Because (A - f..L/)Yl = 0 (or AYI = f..LYl), this reduces to de(f..L - )..lYl = 0,

284

Chapter 8. Jordan Canonical Forms

which implies that dl = 0 by the assumption A =1= /.l and YI =1= O. Proceeding successively, one can show that dj = 0, i = .e,.e - 1, .. . ,2,1, so we are left with CIXt

+ . .. +

qXk

= O. =

Since {XI, ... , xd is already linearly independent by Theorem 8.2, c, 0 for all i = 1, . . . , k. Thus the set of generalized eigenvectors {x I , . .. , Xb Yt , ... , Yel is linearly independent. 0 The next step to determine Q such that A Q = Q J is how to choose chains of generalized eigenvectors from a generalized eigenspace, which is defined below, so that the union of the chains is linearly independent.

Definition 8.3 Let A be an eigenvalue of A. The generalized eigenspace of A belonging to A, denoted by K)., is the set K). = {x

E

en : (A -

AI)P X

=0

for some positive integer

pl.

It turns out that dim K). is the multiplicity of A, and it contains the usual eigenspace N(A - AI) . The following theorem enables us to choose a basis for K)., but we omit the proof even though it can be proved by induction on the number of vectors in SU T.

Theorem 8.4 Let S =

{XI, X2, . . . , Xk} and T = {YI, Y2, .. . , Ytl be two chains of generalized eigenvectors ofA belonging to the same eigenvalue A.lfthe initial vectors XI and y ; are linearly independent. then the union SU T is also linearly independent.

Note that Theorem 8.4 extends easily to a finite number of chains of generalized eigenvectors of A belonging to an eigenvalue A, and the union of such chains will form a basis for K). so that the matrix Q may be constructed from these bases for each eigenvalue as usual.

Example 8.7 (Basis-change matrix for a triangular matrix) For a matrix A =

21 4] [ 00 02 3-1 ,

find a basis-change matrix Q so that Q-I A Q is the Jordan canonical matrix.

Solution: Method 1: In Example 8.4, the Jordan canonical form J of A is determined as J=

[

210] 0 2 0

.

003

By comparing the column vectors of AQ

= QJ with Q = [XI X2 X3], one can get

8.2. Generalized eigenvectors

285

Since XI and X3 are eigenvectors of A belonging to A = 2 and A = 3, one can take XI = (1, 0, 0) and X3 = (3, -1, 1) . Also, from the equation AX2 = 2X2 + XI, one can conclude X2 = (a , 1, 0) with any constant a, so that 3 ] 1 a 0 1 -1 . [ 001

Q=

One may check directly the equality A Q = QJ by a direct computation.

Method 2: This is a direct method to compute Q without using the Jordan canonical form J. Clearly, the eigenvalues of A are Al = A2 = 2, A3 = 3. Since rank(A All) = 2, the dimension of the eigenspace N(A - All) is 1. Thus there is only one linearly independent eigenvector belonging to Al = A2 = 2, and an eigenvector belonging to A3 = 3 is found to be X3 = (3, -1 , 1). We need to find a generalized eigenvector X2 of rank 2 belonging to A = 2, which is a solution of the following systems:

[~

(A - 2l)x =

(A - 2l)2 x

-:]. ~. -!]. ~.

1

0 0 0 0 0

~U

From the second equation , X2 has to be ofthe form (a , b, 0), and from the first equation we must have b :f; O. Let us take X2 = (0, 1, 0) as a generalized eigenvector of rank 2. Then, we have XI = (A - 2l)x2 = (1, 0, 0). Now, one can set

Q~['1'2'3]~

U! -:]

The reader may check by a direct computation Q-I AQ

where JI =

=

[~

210] [ o0 02 03 = [ 0

JI

; ] and

Jz

o

= [3].

Example 8.8 (Basis-change matrix for a companion matrix) Find a basis-change matrix Q so that Q-I A Q = J is the Jordan canonical form of the matrix 0 1

A=

0 0

o

0

o1

0]

0 01

[ -1 4 -6

4

'

286

Chapter 8. Jordan Canonical Forms

Solution: Method J: In Example 8.5, we computed

J=

1100] 1 1 0

o

[0 0

1 1

.

000 1

Now, one can find a basis-change matrix Q = [Xl X2 X3 X4] by comparing the column vectors of AQ = QJ : AXl=Xl , AX2=X2+Xl , AX3=X3+X2 , AX4=X4+X3. By computing an eigenvector of A belonging to >.. = I, one can get xj = (1, 1, 1, 1). The equation AX2 = X2 + Xl gives X2 = (a, a + 1, a + 2, a + 3) for any a. Take X2 = (0, 1, 2, 3). Similarly, from equations AX3 = X3 + X2 and AX4 = X4 + X3, one can get X3 = (b, b, b + I, b + 3) for any b and set X3 = (0, 0, 1, 3), and successively one can take X4 = (0, 0, 0, 1). We conclude that

Q~U ~!

n

One may check A Q = QJ by a direct matrix multiplication. Method 2: The characteristic polynomial of the matrix A is

det(A - >..I) = >..4 - 4>..3 + 6>..2 - 4>" + 1 = (>.. _1)4 . The only eigenvalue of A is x = 1 of multiplicity 4. Note that dimN(A - J) = 1 because the rank of the matrix A _ J _

-

[

-1 1 00] 0 -1 1 0 0 0 -1 1 -1 4 -6 3

is 3. Thus, a basis-change matrix Q = lxr X2 X3 X4] consists of a chain of generalized eigenvectors. First, we find a generalized eigenvector X4 of rank 4, which is a solution Xof the following equations:

(A _1)3 x =

[-1 -3 1]

(A _1)4 x =

O.

3 -1 3 -3 1 -1 3 -3 1 -1 3 -3 1

X

f:

0,

But, a direct computation shows that the matrix (A - 1)4 = O. Hence, one can take any vector that satisfies the first equation as a generalized eigenvector of rank 4: Take X4 = (-I, 0, 0, 0). Then ,

[-~ n[-~] [H

287

8.2. Generalized eigenvectors

X3

=

X2

=

Xl

=

1 0 -1 1 (A -/)'4 = 0 -1 -1 4 -6 (A -/)X3 = (-1, 0, 1, 2), (1, 1, 1, 1). (A -/)X2

=

Now, one can set -1

~ -~]

o

1 o 2 1

0

.

0

Then,

100] 1 0 = J

o1 o

1 1

. h Q-I _ [

Wit

-

0 1

1 00]

00 -1 1 0 1 -2 1 3 -3 1 -1 0

.

o

Remark: In Examples 8.7-8 .8, we use two different methods to determine a basischange matrix. In Method 1, we first find an initial eigenvector Xl in order to get a chain {Xl, X2, .. . ,Xk} of generalized eigenvectors belonging to A. After that, we find X2 as a solution of (A - H)X2 = xi, and X3 as a solution of (A - A/)X3 = X2, and so on. In this method, we don't need to compute the power matrix (A - H)2 and (A - A/)3, etc. But, this method may not work sometimes, when the matrix A - H is not invertible. See the next Example 8.9. In Method 2, we first find a generalized eigenvector Xk of rank k as a solution of (A - )../)k x = 0 but (A - )..l)k-I x =1= O. With this Xko one can get Xk-l as Xk-l = (A - A/)Xk, and Xk-2 = (A - )../)Xk-l, and so on. This method works always, but we need to compute a power (A _ )../)k . The next example shows that a chain of generalized eigenvectors may sometimes not be obtained from an initial eigenvector of the chain. Example 8.9 For a matrix

A=

5-3 -2] [ 8 -5 -4 -4 3 3

,

find a basis-change matrix Q so that Q-l A Q is the Jordan canonical matrix. Solution: Method 1: The eigenvalue of A is A = I of multiplicity 3, and the rank of the matrix A - 1=

[ 4-3 -2] 8 -6 -4 -4 3 2

288

Chapter 8. Jordan Canonical Forms

is 1. (Note that the second and the third rows are scalar multiples of the first row.) Hence, there are two linearly independent eigenvectors belonging to A. = 1, and the Jordan canonical form J of A is determined as

J=

° J.

1 1 010 [ 001

Now, one may find a basis-change matrix Q = [XI X2 X3] by comparing the column vectors of AQ = QJ:

By computing an eigenvector of A belonging to A. = 1, one may get two linearly independent eigenvectors: take UI (l, 0, 2) and U2 (0, 2, -3). If we take the eigenvector XI as UI or U2, then a generalized eigenvector X2 must be a solution of

=

=

But this system is inconsistent and one cannot get a generalized eigenvector X2 in this way. It means that we are supposed to take an eigenvector XI E £(1) carefully in order to get X2 as a solution of (A - I)x XI .

=

Method 2: First, note that A has an eigenvalue A. = 1 of multiplicity 3 and there are two linearly independent eigenvectors belonging to A. = 1. Hence, we need to find a generalized eigenvector of rank 2, which is a solution X of the following equations:

(A -I)x

=

(A _1)2 x

=

U=~ =n [n ~

0,

O.

But, a direct computation shows that the matrix (A - 1)2 = O. Hence, one can take any vector that satisfies the first equation as a generalized eigenvector of rank 2: take X2 = (0, 0, -1). Then,

x\

~

(A -I)x,

~ [j =~

=n UJ ~ Ul

Now, by taking another eigenvector X3 = (1, 0, 2), so that XI and X3 are linearly independent, one can get

8.3. The power A k and the exponential e A

-n

One may check by a direct computation

Q-'AQ=

1 1 0] 0 1 0 =J [ 001

289

o

Problem 8.5 (From Problem 8.4) Find a full set of generalized eigenvectors of the following

matrices:

(2)

[

412] 0 4 2 004



8.3 The power A k and the exponential eA In this section, we discuss how to compute the power Ak and the exponential matrix

e A , which are necessary for solving linear difference or differential equations as

mentioned in Sections 6.3 and 6.5. It can be dealt with in two ways. Firstly, it can be done by computing the power Jk and the exponential matrix e J for the Jordan canonical form J of A, as shown in this section. The second method is based on the Cayley-Hamilton theorem (or the minimal polynomial) and it will be discussed later in Sections 8.6.1-8.6.2. Let J be the Jordan canonical form of a square matrix A and let

with a basis-change matrix Q. Since

for k = 1,2, . .. • it is enough to compute Jk for a single Jordan block J. Now an n x n Jordan block J belonging to an eigenvalue). of A can be written as

).J

+

N.

290

Chapter 8. Jordan Canonical Forms

Since I is the identity matrix, clearly IN = N I and

e,

Note that N k = 0 for k ~ n. Thus, by assuming (~) = 0 if k <

=

o Next, to compute the exponential matrix e A , we first note that

Thus, it is enough to compute e J for a single Jordan block J. Let J = AI before. Then, N k = 0 for k ~ nand

1 0 eJ =eJ..leN = eA

n-l

Nk

k=O

k!

L-

I 21

-

1

I

1

(n - 2)!

(n - I)!

-

1

(n - 2)!

2!

= eA

1

1

-

2!

1 0

+

0

1

N, as

8.3. The power A k and the exponential e A

291

Example 8.10 (Computing Ak and eA by usingthe Jordan canonicalform) Compute the power Ak and the exponential matrix eA by using the Jordan canonical form of A for

(1)A=

2 2I - 4] I

[o 0

0

3

(2) A

=[

~ ~ ! ~].

-I 4 -6 4

Solution: (1) From Examples 8.4-8.7, one can see that

-i],

A=QJQ-l=[~0~0-i][~; ~][~ ~ 1003001 and

J^k = [ 2^k   k·2^{k-1}   0
        0     2^k         0
        0     0           3^k ].

Hence,

and

(2) Do the same process as (I). From Examples 8.5-8.8,

-~][~ i: ~][ ~ -l j ~1]' o 0 0 0 I

and

-1

3-3


o o

1

@ (;) @

0

1 (~)

(~)

m

000 Now. one can compute Af

e

A

=

=

[ '1 -10 0I ell 0

e

1

2 1

[-I

1 2

2

5

and e

J

=e [

1

o1 11 o0 o0

~1 ~] 1 1

2I 1

0

1

.

= QJkQ-1 and e/' = Qe JQ-l.Forexample,

J[ :

-~o ~

0 0 0 0

o

1 -2o

1 2I 1

l]U

1 0

0]

1 o1 0 -1 1 -2 1 3 -3 1

J

']

0

-3 2 -3 13 21 17 -'6 8 -2 T

Example 8.11 (Computing A^k and e^A by using the Jordan canonical form) Compute the power A^k and the exponential matrix e^A by using the Jordan canonical form of A for

A = [  1  1  1  1
       0  2  2  0
       0  0  2  0
      -1  1  0  3 ].

Solution: (1) The characteristic polynomial of A is det(λI − A) = λ⁴ − 8λ³ + 24λ² − 32λ + 16 = (λ − 2)⁴, and λ = 2 is an eigenvalue of multiplicity 4. Since

rank(A − 2I) = rank [ -1 1 1 1
                       0 0 2 0
                       0 0 0 0
                      -1 1 0 1 ] = 2

and

rank(A − 2I)² = rank [ 0 0 1 0
                       0 0 0 0
                       0 0 0 0
                       0 0 1 0 ] = 1,

the Jordan canonical form J of A must be of the form

J = [ 2 1 0 0
      0 2 1 0
      0 0 2 0
      0 0 0 2 ],

and a basis-change matrix Q such that Q⁻¹AQ = J is

101 0] 1 0 '

2 0

o o

[ 2 -~ ~ -~] .

-1-1 Q = 0 o 1 0 2 -1 0 -2

and then

0 -1

Therefore,

O]k

2 1

Ak _ QJk Q-I _ Q -

and

0 2 1 [ 0 0 2

[

o

~ ~]k [~ =

02

0

Hence,

Zk Ak =

QJk Q-I = Q [

[

0

o

2k _ k2 k- 1 kZk-l o 2k -

[

(~)2k-1 k 2

0

@2

k-2]

(~)2k-l

] 0

Q_I

2k

o

~2k-2 + 2k2 k- 1

k2 k- 1

0

0

2k

-kZk-l

k2 k- 1

k(k;I)Zk-2

2k k2 k- 1 0

O'

k2 k- 1 + Zk

(Z) With the same notation,

o ] Q-l

eh where

]

'


Hence, ell

= e21eN = e2 L'N' -, = e2 [ 01 k-O

-

Thus we have

e

A

~

gel g -l

~ ~ e'g [

0 0

;, ] I

.

I

e2 e2

2 I

I

I

~] g-l~ U'

1

I

k.

I

0

0 I

e2

0 0

j

J. e2 2

2 1 2 o 1 00 (1) A = 0 0 2 0 [ o 0 o 1

0 -3

(2) A

'

=

2 1 -11 2 1 -1 2 [ -2 -3 1 4

-2 -2

o

.

0

~e2 2e 2

Problem 8.6 Compute A k and e A by using the Jordan canonical form for

o

e'0 ]

2e 2 e2

j .

8.4 Cayley-Hamilton theorem As we saw in earlier chapters, the association of the characteristic polynomial with a matrix is very useful in studying matrices. In this section, using this association of polynomials with matrices, we prove one more useful theorem, called the CayleyHamilton theorem, which makes the calculation of matrix polynomials simple, and has many applications to real problems. Let f(x) = amx m + am_lX m-1 + . . . + alx an n x n square matrix. The matrix defined by

+ ao be a polynomial, and let A be

is called a matrix polynomial of A. For example, if f(x) = x 2 - 2x

f(A) =

+

2 and A = [ ;

~], then

A 2-2A+2h

= [~~]-2[;

~]+2[~ ~]=[~ ~] .


Problem 8.7 Let}. be an eigenvalue of a matrix A . For any polynomial f(x), show that f(}.) is an eigenvalue of the matrix polynomial f(A) .

Theorem 8.5 (Cayley-Hamilton) For any n × n matrix A, if f(λ) = det(λI − A) is the characteristic polynomial of A, then f(A) = 0.

Proof: If A is diagonal , its proof is an easy exercise. For an arbitrary square matrix A, let its Jordan canonical form be

sothatf(A)

= Qf(J)Q-I. Since and

f(J) = [f(JI>

o

".

0

],

f(Js)

it is sufficient to show that f(Jj) = 0 for each Jordan block Jj. Let Jj = Aol + N with eigenvalue Ao of multiplicity m, in which N'" = O. Since f(A) = det(Al - A) = det(Al - J) = (A - Ao)mg(A) for some g(A) , we have

f(Jj)

= (Jj -

Aol)mg(Jj)

= (Aol +

N - Aol)mg(Jj)

= Nmg(Jj) = 0 g(Jj) = O. o

Remark: For a Jordan block

J = [ λ0  1        0
      0   λ0  ⋱
               ⋱   1
      0        0   λ0 ]

with a single eigenvalue λ0 of multiplicity m, we have

J^k = [ λ0^k  (k choose 1)λ0^{k-1}  ...  (k choose m-1)λ0^{k-m+1}
        0     λ0^k                  ...
        ...                               ...
        0     0                     ...   λ0^k ].

Hence, for any polynomial p(λ),

p(J) = [ p(λ0)  ∂_λ p(λ0)  ...  ∂_λ^{(m-1)} p(λ0)/(m-1)!
         0      p(λ0)      ...
         ...                     ...
         0      0          ...   p(λ0) ],

where ∂_λ denotes the derivative with respect to λ. In particular, for the characteristic polynomial f(λ) = det(λI − J) = (λ − λ0)^m, we have f(λ0) = ∂_λ f(λ0) = ... = ∂_λ^{(m-1)} f(λ0) = 0 and hence f(J) = 0.

Example 8.12 The characteristic polynomial of

A =

is f().)

=

3 6 0 2 [ -3 -12

j]

6)" and

det(Al - A)

=).3 +).2 -

f(A) =

A 3+A2-6A

[27o 78 54]o + [-9 -42 -18] 8 -27 -102 -54

=

0 9

-6[ : 6 6] [0 2

0

-3 -12 -6

=

4 30

0

0 0

0 0

0 18

n

o

Problem 8.8 Let us prove the Cayley-Hamilton Theorem 8.5 as follows: by setting λ = A, f(A) = det(AI − A) = det 0 = 0. Is it correct or not? If not, what is the wrong step?

Problem 8.9 Prove the Cayley-Hamilton theorem for a diagonal matrix A by computing f(A) directly. By using this, do the same for a diagonalizable matrix A.

The Cayley-Hamilton theorem can be used to find the inverse of a nonsingular matrix. If f(λ) = λ^n + a_{n-1}λ^{n-1} + ... + a_1λ + a_0 is the characteristic polynomial of a matrix A, then

0 = f(A) = A^n + a_{n-1}A^{n-1} + ... + a_1A + a_0 I,

or

−a_0 I = (A^{n-1} + a_{n-1}A^{n-2} + ... + a_1 I)A.

Since a_0 = f(0) = det(0I − A) = det(−A) = (−1)^n det A, A is nonsingular if and only if a_0 = (−1)^n det A ≠ 0. Therefore, if A is nonsingular,

A⁻¹ = −(1/a_0)(A^{n-1} + a_{n-1}A^{n-2} + ... + a_1 I).
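As a quick numerical illustration of these two facts (that f(A) = 0 and that A⁻¹ is a polynomial in A), the following sketch uses NumPy; the 3 × 3 matrix is just a sample chosen for this illustration, not one from the text.

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [1., 0., 3.]])            # a sample nonsingular matrix
coeffs = np.poly(A)                      # characteristic polynomial, highest power first
n = A.shape[0]

# Evaluate f(A) by Horner's rule: it should be (numerically) the zero matrix.
fA = np.zeros_like(A)
for c in coeffs:
    fA = fA @ A + c * np.eye(n)
print(np.allclose(fA, 0))               # Cayley-Hamilton: f(A) = 0

# A^{-1} = -(A^{n-1} + a_{n-1}A^{n-2} + ... + a_1 I)/a_0, provided a_0 != 0.
a0 = coeffs[-1]
B = np.zeros_like(A)
for c in coeffs[:-1]:                    # Horner again, stopping before the constant term
    B = B @ A + c * np.eye(n)
A_inv = -B / a0
print(np.allclose(A_inv @ A, np.eye(n)))
```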


Example 8.13 (Compute A-I by the Cayley-Hamilton theorem) The characteristic polynomial of the matrix A is f()...) yields

=

det()...h - A)

=)...3 -

=

[

42-2]

-5 3 -2 4

2 I

8)...2 + 17A - 10, and the Cayley-Hamilton theorem

Hence

-2] +-17 [1 2 1

10

0 1

o Problem 8.10 Let A and B be square matrices , not necessarily of the same order, and let f()..) = det(A1 - A) be the characteristic polynomial of A. Show that f(B) is invertible if and only if A has no eigenvalue in common with B.

The Cayley-Hamilton theorem can also be used to simplify the calculation of matrix polynomials. Let p()...) be any polynomial and let f()...) be the characteristic polynomial of a square matrix A. A theorem of algebra tells us that there are polynomials q()...) and r()...) such that

p()...)

= q()...)f()...) + r()...) ,

where the degree of r()...) is less than the degree of f()...). Then

p(A) = q(A)f(A) + r(A).

= 0 and p(A) = r(A).

By the Cayley-Hamilton theorem, f(A)

Thus, the problem of evaluating a polynomial of an n x n matrix A or in particular a power A k can be reduced to the problem of evaluating a matrix polynomial of degree less than n. Example 8.14 The characteristic polynomial of the matrix A = [ ; 3. Let p()...) = f()...) gives that

)...2 _ 2)", -

)...4

_7A 3

- 3)...2

i]

is f()...) =

+ ).. + 4 be a polynomial. A division by

298

Chapter 8. Jordan Canonical Forms p(),)

= (),2 -

5), - lO)f(),) - 34), - 26.

Therefore,

p(A) = (A² − 5A − 10I)f(A) − 34A − 26I = −34A − 26I
     = −34 [ 1 2; 2 1 ] − 26 [ 1 0; 0 1 ] = [ -60 -68
                                              -68 -60 ].

o
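The division step above can be reproduced with a polynomial-division routine. In the sketch below (an illustration assuming NumPy; the 2 × 2 matrix is a sample with the same characteristic polynomial f(λ) = λ² − 2λ − 3 as in Example 8.14), p(A) and r(A) agree as the Cayley-Hamilton theorem predicts.

```python
import numpy as np

p = [1., -7., -3., 1., 4.]               # p(x) = x^4 - 7x^3 - 3x^2 + x + 4
f = [1., -2., -3.]                        # f(x) = x^2 - 2x - 3
q, r = np.polydiv(p, f)                   # p = q*f + r
print(q, r)                               # q = x^2 - 5x - 10,  r = -34x - 26

A = np.array([[1., 2.],
              [2., 1.]])                  # a sample matrix with char. polynomial f

def matpoly(coeffs, A):
    """Evaluate a polynomial (coefficients highest first) at the matrix A."""
    result = np.zeros_like(A)
    for c in coeffs:
        result = result @ A + c * np.eye(A.shape[0])
    return result

print(np.allclose(matpoly(p, A), matpoly(r, A)))   # p(A) = r(A) by Cayley-Hamilton
```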

Example 8.15 (Computing A^k by the Cayley-Hamilton theorem) Compute the power A^10 by using the Cayley-Hamilton theorem for

A = [  1  1  1  1
       0  2  2  0
       0  0  2  0
      -1  1  0  3 ].

Solution: The characteristic polynomial of A is f(λ) = det(λI − A) = λ⁴ − 8λ³ + 24λ² − 32λ + 16 = (λ − 2)⁴; see Example 8.11. A division by f(λ) gives

λ^10 = q(λ)f(λ) + r(λ)

with a quotient polynomial q(λ) = λ⁶ + 8λ⁵ + 40λ⁴ + 160λ³ + 560λ² + 1792λ + 5376 and a remainder polynomial r(λ) = 15360λ³ − 80640λ² + 143360λ − 86016. Hence,

A^10 = r(A) = 15360A³ − 80640A² + 143360A − 86016I
     = [ -4096   5120  16640  5120
             0   1024  10240     0
             0      0   1024     0
         -5120   5120  11520  6144 ].

(Compare this result with A^k given in Example 8.11.) One may notice that this computational method for A^n becomes increasingly complicated as n gets bigger. A simpler method will be shown later in Example 8.22. □

Problem 8.11 For the matrix A = [ 1 0 1; 0 2 0; 0 0 2 ], (1) evaluate the power matrix A^10 and the inverse A⁻¹; (2) evaluate the matrix polynomial A⁵ + 3A⁴ + A³ − A² + 4A + 6I.


8.5 The minimal polynomial of a matrix

Let A be a square matrix of order n and let

f(λ) = det(λI − A) = λ^n + a_{n-1}λ^{n-1} + ... + a_1λ + a_0 = Π_{i=1}^{t} (λ − λ_i)^{m_{λ_i}}

be the characteristic polynomial of A, where m_{λ_i} is the multiplicity of the eigenvalue λ_i. Then the Cayley-Hamilton theorem says that

A^n + a_{n-1}A^{n-1} + ... + a_1A + a_0 I = Π_{i=1}^{t} (A − λ_i I)^{m_{λ_i}} = 0.

The minimal polynomial of a matrix A is the monic polynomial m(λ) = Σ_{i=0}^{m} c_iλ^i of smallest degree m such that

m(A) = Σ_{i=0}^{m} c_i A^i = 0.

Clearly, the minimal polynomial m(λ) divides any polynomial p(λ) satisfying p(A) = 0. In fact, if p(λ) = q(λ)m(λ) + r(λ) as on page 297, where the degree of r(λ) is less than the degree of m(λ), then 0 = p(A) = q(A)m(A) + r(A) = r(A), and the minimality of m(λ) implies that r(λ) = 0. In particular, the minimal polynomial m(λ) divides the characteristic polynomial, so that

m(λ) = Π_{i=1}^{t} (λ − λ_i)^{k_i}

with k_i ≤ m_{λ_i}. For example, the characteristic polynomial of the n × n zero matrix is λ^n and its minimal polynomial is just λ. Clearly, any two similar matrices have the same minimal polynomial because

p(Q⁻¹AQ) = Q⁻¹ p(A) Q

for any invertible matrix Q. Example 8.16 For a diagonal matrix

A = [ 2 0 0 0 0
      0 2 0 0 0
      0 0 2 0 0
      0 0 0 5 0
      0 0 0 0 5 ],


its characteristic polynomial is f(λ) = (λ − 2)³(λ − 5)². However, for the polynomial m(λ) = (λ − 2)(λ − 5), the matrix m(A) is

m(A) = (A − 2I)(A − 5I) = diag(0, 0, 0, 3, 3) · diag(−3, −3, −3, 0, 0) = 0.

Hence, the minimal polynomial of A is m(λ) = (λ − 2)(λ − 5). □

Example 8.17 (The minimal polynomial ofa diagonal matrix) (1) Any diagonal matrix of the form AO/ has the minimal polynomial A - AO. In particular, the minimal polynomial of the zero matrix is the monomial A, and the minimal polynomial of the identity matrix is the monomial A - I. (2) If an n x n matrix A has n distinct eigenvalues AI , ... , An , then its minimal polynomial coincides with the characteristic polynomial f(A) = (A - Ai). In fact, for the diagonal matrix D having distinct diagonal entries AI, .. . , An successively and for any given j, the (j, j)-entry of Oi;=j (D - Ai l) is equal to Oi;=/Aj - Ai), which is not zero . 0

07=1

Example 8.18 (The minimal polynomial ofa Jordan canonical matrix J having one or two blocks)

[I ! ~ i ~],

(1) For any 5 x 5 matrix A similar to a Jordan block of the form

Q-I AQ = J =

OOOOAO

its minimal polynomial is equal to the characteristic polynomial f(A) = (A AO)5, because (J - AO/)4 :p 0 but (J - AO/)5 O. (2) For a matrix J having two Jordan blocks belonging to a single eigenvalue A, say

=

JI

J = [ 0

0]

h

=

AO [

o o

1 0]

AO 1 0 AO

the minimal polynomial of the smaller block Jz is a divisor of the minimal polynomial of the larger block JI . In general, if a matrix A or its Jordan canonical form J has a single eigenvalue AO, then its minimal polynomial is (A - AO)k, where k is the smallest positive integer l such that (A - A/)i = O. In fact, such number k is known as the order of the largest Jordan block belonging to A.


(3) For a matrix J having two Jordan blocks belonging to two different eigenvalues AO :/= AI respectively, say

[~

J-[iJ 0]_ 0 h -

I 0 0 AO I o 0 o 0 AO 1 0 0 0 AO I 0 0 o AO

]

[~

1 Al 0

n

the minimal polynomial of J is a product of the minimal polynomials of JI and h which is (A - Ao)5(A - AI)3. In general, for a Jordan canonical matrix J, let h denote the direct sum of Jordan blocks belonging to the eigenvalueA. Then, J = E9:=1 JA; , where t is the number of the distinct eigenvalues of J. In this case, the minimal polynomial of J is the product of the minimal polynomials of the summands hi's. 0 By a method similar to Example 8.18 (2) and (3) with Step 2(i) on page 279, one can have the following theorem. Theorem 8.6 For any n x n matrix A or its Jordan canonical matrix J. its minimal polynomial is n:=1 (A - Ai)ki , where AI, . . . , At are the distinct eigenvalues of A and k; is the order ofthe largest Jordan block in J belonging to the eigenvalue Ai. Or equivalently, k, is the smallest positive integer i such that rank(A - Ai /)l + m A; n, where mAl is the multiplicity of the eigenvalue Ai.

=
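Theorem 8.6 can be turned directly into a small computation: for each eigenvalue λ_i, increase ℓ until rank((A − λ_iI)^ℓ) + m_{λ_i} = n. The sketch below is only an illustration (it assumes SymPy is available, and the 4 × 4 matrix follows the reading of Examples 8.11 and 8.20 used in this copy).

```python
import sympy as sp

A = sp.Matrix([[ 1, 1, 1, 1],
               [ 0, 2, 2, 0],
               [ 0, 0, 2, 0],
               [-1, 1, 0, 3]])            # the matrix of Examples 8.11 and 8.20 (as read here)
n = A.shape[0]

for lam, m in A.eigenvals().items():      # {eigenvalue: algebraic multiplicity}
    k = 1
    N = A - lam * sp.eye(n)
    while (N**k).rank() + m != n:         # Theorem 8.6: smallest k with rank((A-λI)^k) + m = n
        k += 1
    print(f"eigenvalue {lam}: multiplicity {m}, exponent k = {k}")
# Output for this A: eigenvalue 2 with multiplicity 4 and k = 3, so m(λ) = (λ - 2)^3.
```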

Corollary 8.7 A matrix A is diagonalizable if and only if its minimal polynomial is equal to n:=1 (A - Ai), where AI, ... , At are the distinct eigenvalues of A. Example 8.19 (Computing the minimal polynomial) Compute the minimal polynomial of A for (I)A=

20 2I -I4]

[o

0

3

01 o 0 00] 1 0

[ -10

(2) A =

0

0 1

.

4 -6 4

Solution: (1) Since A is triangular, its eigenvalues are Al = A2 = 2, A3 = 3. But, rank(A - 2/) = 2 and so A is not diagonalizable. Hence, its minimal polynomial is m(A) = (A - 2)2(A - 3). (2) Recalling Example 8.5, we know that the eigenvalue of A is A = 1 of multiplicity 4, and rank(A - I) = 3. It implies that the Jordan canonical form of A is 1 1 o J = 0 I 1 0 o 0 1 1 ' [

0]

o 0 o

1


and the minimal polynomial of A is meA) = (A - 1)4, by Theorem 8.6.

0

Example 8.20 Compute the minimal polynomial of A for

o1 12 1 2 1 0 002 0 [ -1 1 0 3

A=

J

.

Solution: In Example 8.11, we show that the characteristic polynomial of A is f(A) = (A - 2)4 , rank(A - 2/) = 2, rank(A - 2/)2 = 1 and the Jordan canonical form J of A is

J=[H o

HJ.

0 0 2

So, the minimal polynomial of A is meA) = ()" - 2)3, by Theorem 8.6.

0

Problem 8.12 In Example8.2, we haveseen that there are seven possible(nonsimilar) Jordan canonicalmatricesof order 5 that have a single eigenvalue. Computethe minimalpolynomial of each of them.

8.6 Applications 8.6.1 The power matrix Ak again We already know how to compute a power matrix Ak in two ways: One is by using the Jordan canonical form J of A with a basis-change matrix Q as shown in Section 8.3; and the other is by using the Cayley-Hamilton theorem with the characteristic polynomial of A as shown in Section 8.4. In this section, we introduce the third method by using the minimal polynomial , as a possibly simpler method. The following example demonstrates how to compute a power A k and A-I for a diagonalizable matrix A by using the minimal polynomial instead ofthe characteristic polynomial and also without using the Jordan canonical form J of A.

Example 8.21 (Computing A k by the minimalpolynomial when A is diagonalizable) Compute the power A k and A-I by using the minimal polynomial for a symmetric matrix 4 0

0 4

A= [

~ -~0 J .

1 1 5 -1 1

o

5


Solution: Its characteristic polynomial is f (A) = (A - 3)2(A - 6)2. Since A is symmetric and so diagonalizable, its minimal polynomial is meA) = (A - 3)(A - 6), or equivalently meA) = A 2 - 9A + 181 = O. Hence , I I 1 A- =--(A-9l)=-18 18

[

-5 0 I-I] 0 -5 I I I 1 -4 0 -I 1 0-4

.

To compute Ak for any natural number k, first note that the power Ak with k ~ 2 can be written as a linear combination of I and A, because A 2 = 9A - 18/. (See page 297.) Hence, one can write

with unknown coefficients Xi'S. Now, by multiplying an eigenvector x of A belonging to each eigenvalue Ain this equation, that is, by computing Akx = (xol + XI A)x, we have a system of equations as A = 3, as A = 6, Its solution is Xo

= 2 . 3k -

3k = 6k =

6k and XI

Xo Xo

= 1(6k -

+ +

3XI ,

6xI.

3k). Hence , 6k

_

3k

6k

-

3k

3k

+

2. 6k

o Now, we discuss how to compute a power A k in two separate cases depending on the diagonalizability of A. (1) Firstly, suppose that A is diagonalizable. Then its minimal polynomial is equal to meA) = (A - Ai), where AI , . . . , AI are the distinct eigenvalues of A. Now, by using meA) = (A - Ail) = 0 as in Example 8.21, one can write

0:=1

0:=1

A k = xoI

+

x\A + ... + xl_ IA I - I

with unknowns Xi 'S. Now, by multiplying an eigenvector Xi of A belonging to each eigenvalue Ai in this equation , one can have a system of equations


Its coefficient matrix P.. {] is an invertible Vandermonde matrix of order k because the eigenvalues Ai'S are all distinct. Hence, the system is consistent and its unique solution Xi'S determines the power A k completely. (2) Secondly, let A be any n x n, may not be diagonalizable. By Theorem 8.6, the minimal polynomial of A is m(A) = (A - Ai )k; , where AI, ... , A, are the distinct eigenvalues of A and ki is the smallest positive integer l such that rank(A Ai l)l + mA; n, Let s L::=1 k, be the degree of the minimal polynomial m(A) and let A k = xoI + xlA + .. . + xs_lA s- l

m=1

=

=

with unknown Xi'S as before. Now, by multiplying an eigenvector x of A belonging to each eigenvalue Ai in this equation, we have

Af = Xo

+

AiXl + ... + Af-lxs_l.

For the eigenvalue Ai, let {Xl, X2, .. . , Xkj} be a chain of generalized eigenvectors belonging to Ai. Then these vectors satisfy

=

AjXl,

Ai X2 + Xl,

By using the first two equations repeatedly, one can get

AkX2 = Afx2 + kAf-lXl . On the other hand,

Akx2

=

(xoI+XlA+ "'+XS_lA S - l)X2

=

XOX2

=

(XO+AiXl+"'+Af-lXs-l)X2

+ Xl(AiX2 + +

Xl) + .

.. +

Xs-l (Af- lX2 + (s - 1)Af-2Xl)

(Xl + 2AiX2 + .

.. +

(s - 1)Ar2Xs_l) Xl.

In these two equations of Akx2, since the vectors Xl and X2 are linearly independent, their coefficients must be the same. It means that =

Xo

+

AiXl Xl

+ +

+

+

2AiX2 +

+

ArX2

, s-l "i Xs-l, (s - 1)Ar2xs_l.

Here, the second equation is the derivative of the first equation with respect to Ai. Similarly, one can write AkXk; as a linear combination oflinearly independent vectors xi, X2, . .. , Xkj in two different ways and then a comparison of their coefficients gives the following ki equations:


).} (~)A/-I (~)Ajk-2

= = =

+ (DXI + @X2 +

+ ... + + .. . + + ... +

AjXI

xo

(i)AiX2 @Ai X3

305

As-I i Xs-I, I)AS - 2 I i Xs-I , I S 3 2 )Ai - Xs-I,

ee-

Note that the last ki - 1 equations are equivalent to the consecutive derivatives of the first equationwith respect to Ai . For example, the two times of the third equation is just the second derivative of the first equation, and the (ki - I)! times of the last equationis the (ki - 1)-th derivative of the firstequation. Gettingtogetherall of such equations for each eigenvalue Ai, i = 1, .. . , s, one can get a system of s equations with s unknowns x j 's with an invertible coefficient matrix. Therefore,the unknowns xo, . . . , Xs-I can be uniquely determined and so can the power Ak , Example 8.22 (Computing A k by the minimal polynomial when A is not diagonalizable) (Example8.11 again) Computethe power A k and A -I by using the minimal polynomial for

A=

I 21I2 01]

o

[ -10

0 2 0 1 0 3

'

Solution: In Example 8.20, the minimal polynomial of A is determinedas m(A) (A - 2)3. Hence, one can write k

A = xoI

+

+

xIA

x2A2

with unknown coefficients Xi'S and

~ ~ ~8]'

2

A = [

:

-4 4 1

Now, at the eigenvalue A = 2, one can have as A = 2,

2k

a", aI,

k2k- 1

take take

k(k -

1)2k -

2

= = =

Xo

+

2xI XI

+ +

= k(k -

1)2k- 3; xI

= k(2 -

k)2 k _ I and XQ

2· 2x2, 2X2 ·

Its solutionis X2

22x2,

= 2k

(1 3+ ) "ik2 - "ik

1 ,

=


Thus, one can have (the same power as in Example 8.11)

+ k2 )

2k - 3(3k

k2 k 2k k(k - 1)2k- 3

To find A -1, first note that m(A) = A 3 - 6A 2

A

-1

1

= '8 (A

2

+

- 6A

Problem 8.13 (Problem 8.11 again)

12/)

1

= '8

[

+

12A - 8/ = O. Hence,

6 -2 -1 -2] 0 4 -4 0 0 0 4 0 . 2 -2

F., the matrix A ~

1

o

2

[~ ~ ~].

,"",.,1< the pow"

matrix A 10 and the inverse A - 1.

Problem 8.14 (Problem8.6 again)Compute Ak by using the minimal polynomial for

0]

2 1 (1) A

=

o

2 o 1 0

o

0

0 0 2 0 [

o

0 -3

'

1

(2) A

=

12]

1 - 1 2 1 -1 2 [ -2 -3 1 4 -2 -2

.
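Before turning to the exponential matrix, here is a numerical check of this recipe in the non-diagonalizable case (a sketch only, assuming NumPy; the matrix follows the reading of Example 8.22 used in this copy): the three equations obtained from λ^k and its first two derivatives at λ = 2 determine x0, x1, x2, and x0 I + x1 A + x2 A² reproduces A^k.

```python
import numpy as np

A = np.array([[ 1., 1., 1., 1.],
              [ 0., 2., 2., 0.],
              [ 0., 0., 2., 0.],
              [-1., 1., 0., 3.]])        # the matrix of Example 8.22 (as read here); m(λ) = (λ-2)^3

k, lam = 10, 2.0
# λ^k = x0 + x1·λ + x2·λ^2, together with its first and second derivatives at λ = 2:
M = np.array([[1., lam,  lam**2],
              [0., 1.,   2*lam ],
              [0., 0.,   2.    ]])
rhs = np.array([lam**k, k*lam**(k-1), k*(k-1)*lam**(k-2)])
x0, x1, x2 = np.linalg.solve(M, rhs)

Ak = x0 * np.eye(4) + x1 * A + x2 * (A @ A)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
```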

8.6.2 The exponential matrix eA again As a continuation of computing the exponential matrix eA in Section 8.3 in which we use the Jordan canonical form J of A and a basis-change matrix Q, we introduce another method with its minimal polynomial. This method is quite similar to computing a power A k discussed in Section 8.6.1 (see also Example 8.22). It means that we don't need the Jordan canonical form J of A and a basis-change matrix Q. Because of a similarity to computing A k , we state only the difference in computing the exponential matrix eA in this section . We also discuss how to compute eA in two cases depending on the diagonalizability of A. (1) First, let A be diagonalizable. Then its minimal polynomial is equal to m(A) n:=1 (A - Ai), where A1' .. . , At are the distinct eigenvalues of A. Now, as before, one can set

=


with unknowns Xj'S and by multiplying an eigenvector x of A belonging to each eigenvalue Aj in this equation, one can get

~ ~~

~l=~ ] [ ~~] [ :~: ] A;~l X,~, ,L .

[ ; A,

=

The coefficient matrix is invertible and the unique solution x i 's determines the matrix eA.

(2) Next, let A be any n x n, may not be diagonalizable. Then, the minimal polynomial of A is m(A) = m=1 (A - A;)k; , where AI, ... , At are the distinct eigenvalues of A and k; is the smallest positive integer f. such that rank(A - Aj /) l + mA; = n. Let s = L~=I k, be the degree of m(A) and let e

A

= xoI

+

xlA

+ .. . +

xs_IA

s- 1

with unknown Xj'S as before. For each eigenvalue Aj and a chain of generalized eigenvectors {XI , X2, . . . , Xk;} belonging to Aj, a parallel procedure in computing eAxk; to that for A kxk; gives the following kj equations :

eA; = teA; = "':'e A; =

AjXI

q)AjX 2 (2)A jX3

2!

+ + +

+ + +

,s-l

I\.j

s- l ) , s-2

( 1

I\.j

Xs-I, xs-I,

(S-2 l),I\.js-3 Xs-I,

Note that the last k; - 1 equations are equivalent to the consecutive derivatives of the first equation with respect to Aj. For example, the 2! times of the third equation is just the second derivative of the first equation, and the (kj - I)! times of the last equation is the (kj - 1)-th derivative of the first equation . Getting together all of such equations for each eigenvaluex. , i = 1, . .. , t, one can get a system of s equations with s unknowns x j 's with an invertible coefficient matrix. Therefore, the unknowns xo, .. . , Xs-I can be uniquely determined and so can the exponential eA.

Example 8.23 (Computing e A by the minimalpolynomialwhen A is diagonalizable) Compute the exponential matrix e A by using the minimal polynomial for A

=

[

5 -4 4]

12 -11 12 4 -4 5

.

Solution: First, recall that the matrix A is the coefficient matrix of the system oflinear differential equations given in Example 6.16:


I

yj

= =

5Yl -

y~ Y~ =

l2Yl

4Y2

-

llY2

4Yl -

4Y2

+ + +

4Y3 l2Y3 5Y3·

It was known that A is diagonalizable with the eigenvalues Al = A2 = 1, and A3 = -3. Hence, the minimal polynomial of A is m(A) = (A - l)(A + 3) , and one can write e A xoI + Xl A with unknowns Xi'S. Then

=

as A = 1, as A = -3,

e = Xo + Xl, e- 3 = Xo - 3Xl.

By solving it, one can get Xo = t(3e + e- 3 ) , Xl = t(e - e- 3 ) and

e

A

= xoI +

2e - e- 3 -e + e- 3 e - e- 3 ] xlA = 3e - 3e-3 -2e + 3e-3 3e - 3e-3 . [ e - e- 3 -e + e- 3 2e - e- 3

o
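The two scalar equations above determine e^A completely, which is easy to confirm numerically. The following sketch (an illustration assuming NumPy and SciPy) solves for x0 and x1 and compares x0 I + x1 A with a direct matrix exponential for the matrix of Example 8.23.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[ 5.,  -4.,  4.],
              [12., -11., 12.],
              [ 4.,  -4.,  5.]])         # the matrix of Example 8.23

# m(λ) = (λ - 1)(λ + 3), so e^A = x0·I + x1·A where
#   e^1  = x0 + 1·x1
#   e^-3 = x0 - 3·x1
x0, x1 = np.linalg.solve(np.array([[1., 1.], [1., -3.]]),
                         np.array([np.e, np.exp(-3)]))
eA = x0 * np.eye(3) + x1 * A
print(np.allclose(eA, expm(A)))          # True
```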

Example 8.24 (Example 8.11 again) (Computing eA by theminimalpolynomialwhen A is not diagonalizable) Compute the exponential matrix e A by using the minimal polynomial for

A=

o1 1 2 1 21 0] [

0 0 2 0

.

-1 1 0 3

Solution: In Example 8.20, the minimal polynomial of A is computed as m(A) = (A - 2)3. Hence, one can set as in Example 8.22, e A = xoI +xlA +x2A2 with unknown coefficients Xi 'S and 2

A Now, at the eigenvalue A

A

~ ~ : ~8]'

-4 4 1

= 2, one can have

as A = 2, take 0)", take (0),,)2, Its solution is X2

=[

e2 = Xo e2 = e2 =

+

2xl Xl

+

+

= !e2; Xl = _ e2 and Xo = e2. Hence,

e =xoI +xlA +x2A

2

= e2 (I -

A+

~A2) = [ ~

_e 2

as shown in Example 8.11.

e2

~e2

e2 2e2 o e2

e2 !e 2

o


Problem 8.15 (Problem 8.6 again) Compute eA by using the minimal polynomial for

(1) A

[~ ~ ~ ~]

= o

0 2 0

12]

0 -3 (2) A

'

-2 1 -1 2 -2 1 -1 2 [ -2 - 3 1 4

=

000 1

.

8.6.3 Linear difference equations again

=

A linear difference equation is a matrix equation Xn AXn-I with a k x k matrix A and if an initial vector Xo is given, then Xn = Anxo for all n. If the matrix A is diagonalizable with k linearly independent eigenvectors VI , V2, . •.• Vk belonging to the eigenvalues AI . A2, •.. , Ak, respectively, a general solution of a linear difference equation Xn AXn-I is known as

=

Xn = CIA'jVI

+

c2A2v2

+ .. . +

CkAkvk

with constants CI, C2, •.• • q . (See Theorem 6.13.) On the other hand, a linear difference equation Xn AXn-I can be solved for any square matrix A (not necessarily diagonalizable) by using the power An, whose computation was discussed in Sections 8.3 and 8.5.

=

Example 8.25 Solve a linear difference equation

o1

21 A= 0 0 [ - 1 1

Xn

= AXn-I, where

211] 0 2 0 . 0 3

Solution: In Example 8.11. it was shown that the matrix A has an eigenvalue 2 with multiplicity 4, and A is not diagonalizable. However, the solution of Xn = AXn-I is Xn Anxo , where

=

2n _ n2 n- I [

+

n2 )

0

~

0

n~ 2n

_n2n- 1

n2 n- 1

n(n - 1)2 n- 3

n

A =

n2 n- 1 2n- 3(3n

0

o

which is given in Examples 8.11 and 8.22. □
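Whether or not A is diagonalizable, the solution x_n = A^n x_0 can always be checked by direct iteration. A minimal sketch (assuming NumPy; the initial vector is an arbitrary sample, and the matrix follows the reading of Example 8.25 used in this copy):

```python
import numpy as np

A = np.array([[ 1., 1., 1., 1.],
              [ 0., 2., 2., 0.],
              [ 0., 0., 2., 0.],
              [-1., 1., 0., 3.]])        # the matrix of Example 8.25 (as read here)
x0 = np.array([1., 0., 0., 1.])          # a sample initial vector

# Solve x_n = A x_{n-1} by direct iteration and by x_n = A^n x_0.
x = x0.copy()
for _ in range(8):
    x = A @ x
print(np.allclose(x, np.linalg.matrix_power(A, 8) @ x0))   # True
```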

Problem 8.16 Solve the linear difference equation

21

(1) A _

-

0 2

[ o0 00

! ~], o

2

(2) A

=

Xn

= AXn-l for

[~ i ! ~], 0 0 0 2

(3) A

=

[~ i ~ ~2] ' 000


8.6.4 Linear differential equations again Now, we go back to a system of linear differential equations

y' = Ay

with initial condition

y(O) = Yo.

Its solution is known as y(t) = etAyo. (See Theorem 6.27.) In particular, if A is diagonal izable, a general solution of y' = Ay is known as

y(t)

= etAyo = cteA1tvl + c2eA2tv2 + ...+

cneA.tvn,

where VI , V2 , . .. , Vn aretheeigenvectorsbelongingtotheeigenvaluesAl, A2, . . . , An of A, respectively . For any square matrix A (not necessarily diagonalizable), the matrix etA can be J be the Jordan canonical computed in two different ways. Firstly, let Q-l AQ etAyo is form of A. Then, the solution y(t)

=

=

where Q-lyO = (cj , ... , cn) and the Uj'S are generalized eigenvectors of A. In particular, if Q-I A Q = J is a single Jordan block with corresponding generalized eigenvectors u, of order k, then the solution becomes

iAyo

=

eAt QetN Q-I yO = eAt [UI U2 . . . un] 1

o x

o

=

eAt

((~Ck+ O.
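Numerically, the solution y(t) = e^{tA} y0 can be checked against a standard initial-value solver. The sketch below is only an illustration (it assumes NumPy and SciPy are available; the coefficient matrix is the one of Example 8.23, and the initial condition and time are sample values).

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[ 5.,  -4.,  4.],
              [12., -11., 12.],
              [ 4.,  -4.,  5.]])         # the coefficient matrix of Examples 6.16 and 8.23
y0 = np.array([1., 2., 0.])              # a sample initial condition
t = 0.5

y_exact = expm(t * A) @ y0               # y(t) = e^{tA} y0
sol = solve_ivp(lambda s, y: A @ y, (0.0, t), y0, rtol=1e-10, atol=1e-12)
print(np.allclose(y_exact, sol.y[:, -1], atol=1e-6))   # True
```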

8.7. Solve the system of linear equations

+ +

{ (1- i)x (1 + i)x

(1 (1

+ +

i)y i)y

= 2-i = 1 + 3i .

8.8. Solve the system of three difference equations:

= = =

{ X.+l

Yn+1 Zn+1

for n

= 0 , 1, 2, .. ..

[i

+ +

= AYn-1

for A

=

8.10. Solve Yn

= AYn-1

for A

= [=~

~

:]

2 -12 -6

s' = Ay for A = [~ ~ ] with Yo = [ ~

8.12. Solve y'

= Ay for A = [

=~

2:

+ + +

5Yn Yn Yn

~] with Yo = [ ~

8.9. Solve Yn

8.11. Solve

3xn Xn 2xn

1

with Yo

= (2,

1, 0) .

1

: ] with y(1)

2 -12 -6

2zn Zn 3zn

= (2,

1, 0) .

8.13. Solve the initial value problem YI(O) n(O) Y3(O)

8.14. Consider a 2 x 2 matrix A

= = =

-2 0

- 1.

= [~ ~].

(1) Find a necessary and sufficient condition for A to be diagonalizable.

(2) The characteristic polynomial for A is f()..) that f(A) = O.

= )..2 -

(a

+

d))"

+

(ad - be). Show


8.15. For each of the following matrices, find its Jordan canonical form and the minimal polynomial .

2 -3 ]

(2) [ 7

-4

(3)

'

[~ ~ -~]. o

2 -1

8.16. Compute the minimal polynomial of each of the following matrices and from these results compute A -I , if it exists .

(1)

[~ ~J.

(2) [ ;

~

mU

J.

0 2 0

8.17. Compute A-I, An and e A for

(1)

[~ ~ J. (4)

(2)

["

4I 2]

o

4 2 004

1 0 0 2 0 0 o0 1 1 o0 0 2

(5)

,

(3)

n

[1 0 I] 0 2 1 001

[ 10 1 1 0 0 0] 0 0 1 1 0

,

.

-1 0 0 1

8.18 . An n x n matrix A is called a circulant matrix if the i -th row of A is obtained from the first row of A by a cyclic shift of the i - I steps, i.e., the general form of the circulant matrix is

A

=

a2 a3 an al a2 [ "I an al an-I: a2 a3 a4

(1) Show that any circulant matrix is normal.

(2) Find all eigenvalues of the n x n circulant matrix

(3) Find all eigenvalues of the circulant matrix A by showing that n

A

= "L..Jai W;-I . ;=1

(4) Compute det A. (Hint: It is the produc t of all eigenvalues.)

(5) Use your answer to find the eigenvalues of

8.19. Determine whether the following statements are true or false, in general, and justify your answers . (1) Any square matrix is similar to a triangular matrix. (2) If a matrix A has exactly k linearly independent eigenvectors, then the Jordan canonical form of A has k Jordan blocks . (3) If a matrix A has k distinct eigenvalues, then the Jordan canonical form of A has k Jordan blocks.

=

(4) If two square matrices A and B have the same characteristic polynomial det(AI- A) det(AI - B) and for each eigenvalue Athe dimensions of their eigenspacesN(AI - A) and N (AI - B) are the same, then A and B are similar.

(5) lfa4x4matrix A has eigenvalues 1 and2,eachofmultiplicity2,suchthatdim E(1) = 2 and dim E(2) I, then the Jordan canonical form of A has three Jordan blocks .

=

(6) If there is an eigenvalue Aof A with multiplicity mj., and dim E(Aj) not diagonalizable. (7) For any Jordan block J with eigenvalue A, det eJ

:F mj."

then A is

= ej.,.

(8) For any square matrix A , A and A T have the same Jordan canonical form. (9) If f(x) is a polynomial and A is a square matrix such that f(A) multiple of the characteristic polynomial of A.

= 0, then f(x) is a

(10) The minimal polynomial of a Jordan canonical matrix J is the product of the minimal polynomials of its Jordan blocks Jj . (11) If the degree of the minimal polynomial of A is equal to the number of the distinct eigenvalues of A , then A is diagonalizable.

9

Quadratic Forms

9.1 Basic properties of quadratic forms In the beginning of this book, we started with systems of linear equations , one of which can be written as

+ a2x2 + .. . + anXn = b. The left-hand side al X l +a2x2+ ... +anXn = aT x of the equation is a (homogeneous) alXl

polynomial of degree I in n real variables. In this chapter, we study a (homogeneous) polynomial of degree 2 in several variables, called a quadraticform, and show that matrices also play an important role in the study of a quadratic form. Quadratic forms arise in a variety of applications, including geometry , number theory, vibrations of mechanical systems, statistics, electrical engineering, etc. A more general type of a quadratic form is a bilinearform which will be described in Section 9.6. As a matter of fact, a quadratic form (or bilinear form) can be associated with a real symmetric matrix, and vice-versa. A quadratic equation in two variables X and y is an equat ion of the form ax

2

+

2bx y

+

cy2

+

dx

+

ey

+f

= 0,

in which the left-hand side consists of a constant term f , a linear form dx + ey , and a quadratic form ax 2 + 2bxy + cy2 . Note that this quadratic form may be written in matrix notation as ax

2

+

2bxy + cl

where x

=[~

= [x

]

y] [ :

and

A

~] [ ~ ] = xT Ax,

= [:

~

l

Note also that the matrix A is taken to be a (real) symmetric matrix. Geometrically, the solution set of a quadratic equation in x and y usually represents a conic section, such as an ellipse, a parabola or a hyperbola in the xy-plane. (See Figure 9.1.)

Definition 9.1 (1) A linear form on the Euclidean space jRn is a polynomial of degree 1 in n variables XI. X2, ...• Xn of the form n

b

T

X

=

L:>;x;. ;= 1

where x = [XI .. . xn]T and b = [bl ... bn]T in jRn. (2) A quadratic equation on jRn is an equation in n variables XI. X2 , .. . • Xn of the fonn n n n f(x) = L L aijx;xj + L bix; + c = O. ;=1 j=1

;=1

where aij , bj and c are real constants. In matrix form. it can be written as

= xT Ax + b T X + c = 0, . . , xn]T and b = [bl .. . bnf in lRn •

f(x)

where A = [aij]. x = [XI (3) A quadraticform on jRn is a (homogeneous) polynomial of degree 2 in n variables XI. X2 • . . . •Xn of the form

q(x)

= xT Ax = [XI X2

., . xn][aij]

X2 XI ] : [ .

n

n

= LLa;jX;Xj, ;=1 j=l

Xn

where x = [XI X2 .. . xnf E jRn and A = [aij] is a real n x n symmetric matrix. It is also possible to define a linear form and a quadratic form on the Euclidean complex n-space en instead of the Euclidean real n-space lRn • Definition 9.2 (1) A linear form on the complex n-space en is a polynomial of degree 1 in n complex variables XI. X2 • . . . •Xn of the form n

bH x = " L.Jb;x; '- , ;= 1

=

=

where x [XI .. . xn]T and b [bl . . . bnf in en. (2) A complex quadratic form on en is a polynomial of degree 2 in n complex variables Xl. X2 • . . . •Xn of the form

where x E en and A

= [aij] is an n x n Hermitian matrix .


The real quadratic form on jRn and the complex quadratic form on en can be denoted simultaneously as q(x) = (x, Ax) by using the dot product on the real n-space jRn or on the complex n-space en . Remark: (I)A quadratic equation f(x) is said to be consistent ifithas a solution, i.e., there is a vector x E jRn such that f(x) = O. Otherwise, it is said to be inconsistent. For instance, the equation 2x 2 + 3 y 2 = -1 in jR2 is inconsistent. In the following, we will consider only consistent equations. (2) A linear form is simply the dot product on the real n-space jRn or on the complex n-space en with a fixed vector b. (3) The matrix A in the definition of a real quadratic form can be any square matrix. In fact, a square matrix A can be expressed as the sum of a symmetric part B and a skew-symmetric part C, say

A = B+C,

1

where B = -(A

2

+

TIT

A ) and C = -(A - A ).

2

For the skew-symmetric matrix C, we have

Hence, as a real number, x T Cx = O. Therefore, q(x) = x T Ax = x T (B

+

C)x

= x T Bx.

This means that, without loss of generality, one may assume that the matrix A in the definition of a real quadratic form is a symmetric matrix. (4) For the definition of a complex quadratic form, let A be any n x n complex matrix. Then, for any x E en, the matrix product x H Ax is a complex number. But, for the matrix A, it is known that there are Hermitian matrices Band C such that A = B + iC. (See page 264.) Hence, x H Ax = x H (B

+

iC)x = x H Bx

+

ix H Cx ,

in which x^H Bx and x^H Cx are real numbers. Hence, for a complex quadratic form on ℂⁿ, we are only concerned with a Hermitian matrix A so that x^H Ax is a real number for any x ∈ ℂⁿ.
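The symmetrization argument in part (3) of the Remark is easy to test numerically. A minimal sketch (assuming NumPy; the matrix and vector are random samples):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # an arbitrary (non-symmetric) real matrix
B = (A + A.T) / 2                        # symmetric part
C = (A - A.T) / 2                        # skew-symmetric part

x = rng.standard_normal(3)
print(np.isclose(x @ C @ x, 0.0))        # x^T C x = 0 for skew-symmetric C
print(np.isclose(x @ A @ x, x @ B @ x))  # so the quadratic form only sees the symmetric part
```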

The solution set of a consistent quadratic equation f (x) = x T Ax + b T x + c = 0 is a level surface in jRn, that is, a curved surface that can be parameterized in n - 1 variables. In particular, if n = 2, the solution set of a quadratic equation is called a quadratic curve, or more commonly a conic section. When n = 3, it is called a quadratic surface, which is an ellipsoid, a paraboloid or a hyperboloid. Example 9.1 (The standard three types of conic sections) (1) (circle or ellipse)

~ + ~ = 1 with A = [~ ~] .


(2) (hyperbola) ~ - ~ = 1 or ;;. -

[* _~]

A= (3) (parabola) x

ir2 =

1 with

A=

or

[-t

~].

= ay or y2 = bx with A = [~ ~] or A = [~ ~].

2

All of these cases are illustrated in Figures 9.1 as conic sections.

0

Figure 9.1. Conic sections

Example 9.2 (The standardfour typesof quadratic surfaces) (1) (ellipsoids)

~+ ~+ ~=

1 with A

=

00&

.

2

(2) (hyperboloids of one or two sheets) ~ a 2 2 ~ + ~ = 1 (of two sheets) with

:0!I a

[o

A=

2

(3) (cones) ~

+

2

~

-

2

~

01 b! 0

0] 0

A~[~ 0] 0 0

2 trb -

A=

2

2

~ = 1 (of one sheet) or -~ c a -

[-:!I0 a

0

= 0 with A =

(4) (paraboloids; elliptic or hyperbolic) (hyperbolic) with

+

or

-1c

o1 b! o

[* X ~].

:0!I b!0 0] 0 a

[o

~+ ~

1

0

-&1

.

= ~ (elliptic) or ~ - ~ = ~, c > 1

or

01 0] -b! &? . 0

'Q! A= [ ~

0


Figure 9.2. Ellipsoid: a~

2

2

2

+

trb

2

+

~ c

323

=1

x2

2

2

2

Figure 9.3. Hyperboloid of onesheet: ~ + trb - c ~ = 1;and of twosheets: -:;r - trb + ~ a a e

2

Figure 9.4. Cone: ~ a

+

2

2

trb - 5 c

=0

All of these cases are illustrated in Figures 9.2-9.5. Problem 9.1 Find the symmetricmatricesrepresenting the quadraticforms

4xj +

(1) 9x[ - xi + (2) XjX2 + XjX3 (3) X[ + xi -

+

xj -

6XjX2 - 8XjX3 + 2X2X3, X2x3, xl + 2xjX2 - lOxjX4 + 4X3X4.

=1

o


Figure 9.5. Elliptic paraboloid: x²/a² + y²/b² = z/c; and hyperbolic paraboloid: x²/a² − y²/b² = z/c, c > 0

9.2 Diagonalization of quadratic forms

In this section, we discuss how to sketch the level surface of a quadratic equation on ℝⁿ. To do this for a quadratic equation f(x) = xᵀAx + bᵀx + c = 0, we first consider a special case of the type xᵀAx = c without a linear form.
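Diagonalizing xᵀAx amounts to an orthogonal eigenvalue decomposition of the symmetric matrix A, which is the subject of this section. As a preview, the following sketch (an illustration assuming NumPy; the 2 × 2 matrix is a sample ellipse) expresses a quadratic form in principal-axis coordinates.

```python
import numpy as np

# q(x) = x^T A x with a sample symmetric matrix (the ellipse 5x^2 + 4xy + 5y^2 = c).
A = np.array([[5., 2.],
              [2., 5.]])

w, P = np.linalg.eigh(A)                  # A = P diag(w) P^T with P orthogonal
print(w)                                  # principal-axis coefficients, here 3 and 7

x = np.array([1.0, -2.0])
y = P.T @ x                               # coordinates with respect to the principal axes
print(np.isclose(x @ A @ x, w @ (y * y))) # q(x) = 3*y1^2 + 7*y2^2
```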

A quadratic form on jRn without a linear form may be written as the sum of two parts: n

T

q(x) = x Ax = 'L,aiixl

+

;=1

2 'L,aijXiXj, i 0 b(Yi ,Yi)::SO

for i = 1, 2, fori=p'+l,

, p', and ,n.

= p', let U and W be subspaces

of V spanned by {Xl, " " x p } and 0 for any nonzero vector u E U and b(w, w) ::s 0 for any nonzero vector w E W by the bilinearity of b. Thus, W {OJ, and

To show p

{ypl+l," " Yn}, respectively. Then, b(u, u) >

un =

dim(U

+

W)

= dim U +

dim W - dim(U

n W) = p +

(n - p')

::s n,

or p ::s p', Similarly, one can show p' ::s p to conclude p = p', Therefore, any two diagonal matrix representations of b have the same number of positive diagonal entries. By considering the bilinear form -b instead of b, one can also have that any two diagonal matrix representations of b have the same number of negative diagonal entries . 0


Corollary 9.11 (1) Any two symmetric matrices which are congruent have the same inertia. (2) Any two Hermitian matrices which are Hermitian congruent have the same inertia.

Definition 9.11 Let A be a real symmetric or a Hermitian matrix. The number of positive eigenvalues of A is called the index of A. The difference between the number of positive eigenvalues and the number of negative eigenvalues of A is called the signature of A.

Hence, the index and signature together with the rank of a symmetric or a Hermitian matrix are invariants under the congruence relation, and any two of these invariants determine the third: that is,

    the index     = the number of positive eigenvalues,
    the rank      = the index + the number of negative eigenvalues,
    the signature = the index − the number of negative eigenvalues.
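These three invariants are immediate to read off from the eigenvalues. A minimal sketch (assuming NumPy; the symmetric matrix is a sample, and the small tolerance guards against round-off when deciding the sign of an eigenvalue):

```python
import numpy as np

A = np.array([[ 2.,  1.,  0.],
              [ 1., -3.,  0.],
              [ 0.,  0.,  0.]])           # a sample symmetric matrix

w = np.linalg.eigvalsh(A)
pos = int(np.sum(w > 1e-10))
neg = int(np.sum(w < -1e-10))

index = pos
rank = pos + neg
signature = pos - neg
print(index, rank, signature)             # here: 1, 2, 0
```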

We have shown the necessary condition of the following corollary.

Corollary 9.12 (1) Two symmetric matrices are congruent if and only if they have the same invariants; index, signature and rank. (2) Two Hermitian matrices are Hermitian congruent if and only if they have the same invariants. Proof: We only prove (I). Suppose that two symmetric matrices A and B have the same invariants, and let D and E be diagonal matrices congruent to A and B, respectively. Without loss of generality, one may choose D and E so that the diagonal entries are in the order of positive, negative and zero. Let p and r denote the index and the rank, respectively, of both D and E . Let d; denote the i-th diagonal entry of D. Define the diagonal matrix Q whose i-th diagonal entry q; is given by qi =

j

l/.Jdi ~/J-dj

iflSiSp ifp2

X2Y2

i].

2 1 -6

358

Chapter 9. Quadratic Forms (4) b(x, y)

= XIY2 -

X2YI

=

9.13. For a bilinear form on]R2 defined by b(x , y) XI YI +X2Y2 , find the matrix representation of b with respect to each of the following bases :

= {(I, 0),

a

(0, I)},

{3

= {(I ,

-1) , (1, I)},

= {(I,

y

2), (3, 4)}.

9.14. Which one of the following bilinear forms on lR3 are symmetric or skew-symmetric? For each symmetric one, find its matrix representation of the diagonal form , and for each skew-symmetric one, find its matrix representation of the block form in Theorem 9.7.

= XI Y3 + X3YI y) = xI YI + 2xIY3 + 2x3YI - X2Y2 y) = xIY2 + 2XIY3 - x2Y3 - x2YI -

(1) b(x, y) (2) b(x, (3) b(x,

(4) b(x, y)

= rL=1 (i -

2X3YI

+ x3Y2

j)Xj Yj

9.15. Determine whether each of the following matrices takes a local minimum, local maximum or saddle point at the given point:

= -1

+

+4(eX -x) - 5xsiny 6y 2 atthepoint (x , y) (2) f(x , y)=(x 2-2x)cosyat(x, y)=(1, zr). (1) f(x , y)

= (0,

0);

9.16. Show that the quadratic form q(x) = 2x 2 + 4xy + y 2 has a saddle point at the origin, despite the fact that its coefficients are positive . Show that q can be written as the difference of two perfect squares. 9.17. Find the eigenvalues of the following matrices and the maximum value of the associated quadratic forms on the unit sphere . (1)

[-~

o

.,

~ ],

[-~ -~ ~ ] ,

(2)

1 -1

0

9.18. A bilinear form b : V x W if it satisfies b(v , w) b(v , w)

~

(3)

1 -2

[-~ -~ ~] . 0

0 5

lRon vector spaces V and W is said to be nondegenerate

=0

=0

for all w E W

implies v

for all v E V

implies w

= 0,

= O.

and

As an example, an inner product on a vector space V is just a symmetric, nondegenerate bilinear form on V. Let b : V x W ~ lR be a nondegenerate bilinear form. For a fixed WE W , we define f{Jw : V ~ R by f{Jw(v)

= b(v,

w)

for v E V.

Then, the bilinearity of b proves that f{Jw E V*, from which we obtain a linear transformation rp : W ~ V* defined by rp(w) = f{Jw . Similarly, we can have a linear transformation 1/1 : V ~ W* defined by 1/I(v)(W)

= b(v,

Prove the following statements:

w)

for v E V and WE W.


(1) If b : V x W -+ JR is a nondegenerate bilinear form, then the linear transformations rp : W -+ V* and 1/1 : V -+ W* are isomorphisms. (2) If there exists a nondegenerate bilinear form b : V x W -+ JR, then dim V

= dim W .

9.19. Determine whether the following statements are true or false, in general, and justify your answers . (1) For any quadratic form q on JRn , there exists a basis ex for JRn with respect to which the matrix representation of q is diagonal. (2) Any two matrix representations of a quadratic form have the same inertia. (3) If A is positive definite symmetric matrix, then every square submatrix of A has positive determinant. (4) If A is negative definite, det A < O. (5) The sum of a positive definite quadratic form and a negative definite quadratic form is indefinite. (6) If A is a real symmetric positive definite matrix, then the solution set of x T Ax is an ellipsoid. (7) For any nontrivial bilinear form b v =0.

i=

0 on a vector space V, if b(v, v)

=I

= 0, then

(8) Any symmetric matrix is congruent to a diagonal matrix. (9) Any two congruent matrices have the same eigenvalues. (10) Any two congruent matrices have the same determinant. (11) The sum of two bilinear forms on V is also a bilinear form. (12) Any matrix representation of a bilinear form is diagonalizable. (13) If a real symmetric matrix A is both positive semidefinite and negative semidefinite, then A must be the zero matrix. (14) Any two similar real symmetric matrices have the same signature .

Selected Answers and Hints

Chapter1 Problems 1.2 (1) Inconsistent. (2) (XI, Xz, x3, X4) = (-1- 4t, 6 - 21, 2 - 31, 1) for any 1 E JR. 1.3 (1) (x , y, z) 1.4 (1) bl

1.7 a

+

= (1, -1 , 1). (3) (w,

x , y, z)

bZ - b3 = O. (2) For any bj's.

= -¥, b = ¥, c = .!j,

1.9 Consider the matrices : A

d

= (2, 0,

= -4.

= [; :

l

B

= [; ~

1, 3)

J.

C

= [~ ~

J-

1.10 Compare the diagonal entries of AA T and AT A.

1.12 (1) Infinitely many for a = 4, exactly one for a f= ±4, and none for a = -4. (2) Infinitely many for a = 2, none for a = -3, and exactly one otherwise. 1.14 (3) I

= IT = (AA-I)T = (A-I)T AT means by definition (AT)-I = (A-I)T.

1.16 Use Problem 1.14(3). 1 .17 Any permutation on n objects can be obtained by taking a finite number of interchangings of two objects . 1.21 Consider the case that some dj is zero.

1.22

X

= 2, y = 3, Z = 1.

1.23 True if Ax

= b is consistent, but not true in general.

=[ -~ o

~ ~],

[b -~ -~ ].

U= -I I 0 0 1.25 (1) Consider (i, j)-entries of AB for i < j .

1.24 L

I

(2) A can be written as a product of lower triangular elementary matrices .


1.26 L=

1 01 0] 0 [ -1/2 o -2/3 1

,D=

[20 3~2 ~ 0

o

] ,

4/3

u= [~

0

-~/2 -~/3] . 0

1

1.27 There are four possibilities for P.

= 0.5, 12 = 6, 13 = 0.55. (2) II = 0, h = 13 = 1,14 = 15 = 5.

II

1.29 (1)

1.30 x = k

1.31 A

=

0.35 ] 0.40 for k > O. [ 0.25

0.0 0.1 0.8] [ 90 ] 0.4 0.7 0.1 with d = 10 . [ 0.5 0.0 0.1 30

Exercises 1.1 Row-echelon forms are A, B, D, F . Reduced row-echelon forms are A, B, F.

1 -3 2

1.2 (1)

[

1

2]

~

~ ~ -1/~ 3/~

o

0 0

0

.

0

1 -3 0 3/2 1/2] 0 0 1 -1/4 3/4 o 0 0 0 0 . [ o 0 0 0 0 1.4 (1) XI = 0, X2 = 1, X3 = -1, X4 = 2. (2) X

1.3 (1)

= 17/2, y = 3, Z = -4.

1.5 (1) and (2) . 1.6 For any bi'S. 1.7 bl - 2b2 + 5b3 i= O. 1.8 (1) Take x the transpose of each row vector of A. 1.10 Try it with several kinds of diagonal matrices for B.

1.11 Ak =

[

o

0

-2227

1

101] -60 . [ o 0 87 1.14 See Problem 1.9. 1.16 (1) A-lAB = B. (2) A-lAC 1.17 a = 0, c- I = b i= O. 1.13 (2)

5 0

1 2k 3k(k - 1) ] 1 3k .

0

8 A-I _ 11 . -

1.19 A-I

=C = A +

I.

1 -1 0 0] [13/8 0 1/2 -1/2 0 B-1 = -15/8 0 0 1/3 -1/3 ' [ o 0 0 1/4 5/4

= -Is

8 -23 -19 2] 4 .

[4

1

-2 1

Selected Answers and Hints

1.22 (I)x=A-Ib=

[-~~; -~~~ ~~~] [ ; ] = [-~~;]. -1/3 -2/3

1/3

7

- 5/ 3

1.23 (1) A =

[~ ~] [~ ~] [~

1.24 (1) A =

[~3 ~1 ~] [~~ ~ ] [~0 ~0 ~] , 1 0 0 -1 1

(2)

363

[b~a ~][ ~

Ii

d _ ~2/a ][

2]

= LDU ,

~ b/~

(2) L = A, D = U = I.

l

1.25 c=[2 -I3f,x=[423]T.

1.26 (2) A

=

[~1~1~]1[~0 0 ~ ~]2 [~ ~ 4/~] . 001

1.27 (1) (Ak)- I = (A-Ii . (2) An-I = 0 if A E Mnxn. (3) (l- A)(l + A + ... + Ak- I) = I - Ak. 1.28 (1) A =

[~ ~

1

(2) A =

r

l

AZ =

r

l

A= I.

1.29 Exactly seven of them are true. (1) See Theorem 1.9. :::

(2) Consider A =

[~ ~

r)

l

;:n::e~:=[1-1~ T]1f ~:B-: [? tB~=]~ABl = BT AT =BA . 5 3 1

1 3 2 (8) If A-I exists, A-I = A-I (AB T) = B T. (9) If AB has the (right) inverse C, then A-I = BC.

(7) (A-Il = (AT)-I = A-I.

(10) Con sider EI =

[~ ~] and EZ =

(12) Consider a permutation matrix

U~ l

[~ ~] .

Chapter 2 Problems 2.2 (1) Note: 2nd column - 1st column = yd column - 2nd column. 2.4 (1) -27, (2) 0, (3) (1 - x 4)3 . 2.7 Let a be a transposition in Sn. Then the composition of a with an even (odd) permutation in Sn is an odd (even, respectively) permutation. 2.9 (1) -14. (2) O. 2.10 (I)(y-x)(-x+z)(z-y)(w-x)(w-y)(w- z)(w+y+x+z) . (2) (fa - be + cd) (fa + cd - eb) . 2.11 A-I =

[-~

I =1];

2 -!

adjA=

!

[_~ -58 -52] -1

6

I

.

364

Selected Answers and Hints

2.12 If A = 0, then clearly adj A = O. Otherwise , use A . adj A = (det A)/. 2.13 Use adj A . adj(adj A) = det(adj A) I . 2.14 If A and B are invertible matrices , then (AB)-l = B-1 A-I . Since for any invertible

=

=

matrix A we have adj A (det A)A- 1, (1) and (2) are obvious. To show (3), let AB BA and A be invertible. Then A-I B = A- 1(BA)A- 1 = A- 1(AB)A-1 = BA- 1, which gives (3).

2.15 (1) xl = 4, X2 = 1, x3 = -2. 10 5 5 (2) x = 23' y = 6' z = 2' 2.16 The solution of the system id(x) = x is Xi = «Ji~Si

= det A.

Exercises 2.1 k = 0 or 2. 2.2 It is not necessary to compute A 2 or A 3 . 2.3 -37. 2.4 (1) det A = (_1)n-1 (n - 1). (2) O.

2.5 -2,0,1,4.

L

2.6 Consider

a1u(l) . .. anu(n)'

ueS.

2.7 (1) 1, (2) 24.

2.8 (3) Xl = 1, X2 = -1, x3 =2, X4 = - 2. 2.9 (2) x = (3,0, 4/11)T.

2.10 k=Oor±1.

2.11 x = (-5, 1,2, 3)T . 2.12 x = 3, y = -1, z = 2. 2.13 (3) All = - 2, A12 = 7, An = -8, A33 = 3. 2.16 A-I

= -h.

[~~ -~ 1~ ] 6

2.17 (1) adj A = [

A-I

14 -18

.

i =~ =~] ,

-4

7

5

det A = -7, det(adj A) = 49,

= -+adj A. (2) adj A = [-1~

~ -~],

7 -3 -1

det A = 2, det(adj A) = 4, A-I = !-adj A .

2.19 Multiply

[~ ~].

2.20 If we set A

= [; ~], then the area is 11 det AI = 4.

2.21 If we set A

=

[i ~ 1

then the area is

!.fl "'I(AT A)I ~ 1#.

SelectedAnswers and Hints 2.22 Use det A =

L

sgn(a)al 17(I)'"

a n17(n)'

17eS n

2.23 Exactly seven of them are true.

[~

(I) Consider A = [ ; ; ] and B =

;

(2) det(AB) = det A det B = det B det A .

[~ ~ ] .

l

(4) (cIn - A)T = cln - AT .

(3) Consider A = [ ; ; ] and c = 3. (5) Consider E =

365

(6) and (7) Compare their determinants.

(9) Find its counter example. (8) If A = h ? (10) What happened for A = O? (11) UyT = U[VI . .. vn] = [VIU' " vnu] ; det(uyT) = VI . . . Vn det([u ··· uJ) = O.

: : ::::, U[~(:' ~ 'J': o

A

: :l IT ~ dot A

0'

1 1 (16) At = A -I for any permutation matrix A.

Chapter 3 Problems 3.1 Check the commutativity in addition. 3.2 (2), (4).

3.3 (1), (2), (4). 3.5 See Problem 1.11. 3.6 Note that any vector yin W is of the form aIxI inU .

+

a2x2

+ ... +

amXm

which is a vector

3.7 Use Lemma 3.7(2) . 3.9 Use Theorem 3.6 3.10 Any basis for W must be a basis for V already, by Corollary 3.12. 3.11 (I) dim= n - 1, (2) dim= n(nt) , (3) dim= n(yll.

3.13 63a + 39b - 13c + 5d = O. 3.15 If bj , . .. , b n denote the column vectors of B, then AB = [Abl ... Ab n]. 3.16 Consider the matrix A from Example 3.20. 3.17 (I) rank = 3, nullity = 1. (2) rank = 2, nullity = 2.

3.18 Ax = b has a solution if and only if b E C(A) . 3.19 A-I(AB) = B implies rank B = rank A-1(AB)::: rank(AB) . 3.20 By (2) of Theorem 3.21 and Corollary 3.18, a matrix A of rank r must have an invertible submatrix C of rank r. By (1) of the same theorem, the rank of C must be the largest. 3.22 dim(V

+

W) = 4 and dim(V

n W)

= 1.

366

Selected Answers and Hints

3.23 A basis for V is «1 ,0,0,0), (0, -1 , 1,0), (0, -1,0, I)}, for W : «-1 , 1,0,0), (0,0,2, I)}, and for V n W : «3, -3,2, I)}. Thus, dim(V

+

W) = 4 means V

3.26

+

W =]R4 and any basis for]R4 works for V

+

W.

1 0

A=

[

°1 °1 ° 1

Exercises 3.1 Consider 0(1, 1). 3.2 (5). 3.3 (2), (3). For (1), if f(O) = 1, 2f(O) = 2. 3.4 (1). 3.5 (1), (4). 3.6 tr(AB - BA)

3.7 No.

3.8 (1) p(x)

= 0.

= -PI (x) + 3P2(X) -

2P3(X).

3.9 No. 3.10 Linearly dependent.

3.12 No. 3.13 ((1,1,0) , (1,0, I)}. 3.14 2.

3

{}

OO }

.15 Consider (ej = ai i=l where ai =

{I°

if i = j , otherwise.

+ ... +cpAbs = A(qb l + . .. + cpb S ) implies qb l + .. . + cpb P = 0 since N (A) = 0, and this also implies Ci = for all i = 1, . . . , P since columns of Bare linearly independent. (2) B has a right inverse . (3) and (4) : Look at (1) and (2) above.

3.16 (1) 0 = qAb l

°

3.17 (1) {(-5, 3, I)}. (2) 3.

3.18 5f, and dependent. 3.19 (1) 'R,(A) = (1,2,0,3) , (0,0,1 ,2»), C(A) = (5,0,1), (0,5 ,2»), N(A) = (-2, 1,0,0), (-3,0, -2, 1») . (2) 'R,(B) (1 ,1 , -2,2), (0,2,1, -5), (0,0,0, I»), C(B) ((1, -2,0), (0,1,1), (0,0,1»), N(B) = (5, -1,2,0»).

=

=

3.20 rank = 2 when x = -3, rank = 3 when x

=f. - 3.

3.22 See Exercise 2.23: Each column vector of UyT is of the form ViU , that is, U spans the column space . Conversely, if A is of rank I, then the column space is spanned by anyone column of A, say the first column u of A, and the remaining columns are of the form ViU, i = 2, . .. , n . Take v = [1 V2 ... vnf . Then one can easily see that A = UyT. 3.23 Four of them are true. (1) A - A =1 or 2A = 1 (2) In]R2 , let a = [ej , e2} and

f3

= [ej , -e2}.

Selected Answers and Hints

367

(3) Even U = W, (X n fJ can be an empty set. (4) How can you find a basis for C(A). See Example 3.20. . (5) Consider A = [0 1 0 0 ] and B = [ 1 0 0 0] . (6) See Theorem 3.24. (7) See Theorem 3.25. (8) Ifx (9) In]R2, Consider U ]R2 x 0 and V 0 x ]R2. (10) Note dim C(A T ) dim'R(A) dim C(A). (II) By the fundamental Theorem and the Rank Theorem.

= =

=

=

= -s.

Chapter 4 Problems 4.1

[~ ~

l

since it is simply the change of coordinates x and y .

4.2 To show W is a subspace, see Theorem 4.2. Let Eij be the matrix with I at the (i, j)-th position and 0 at others . Let Fk be the matrix with 1 at the (k, k) -th position, -I at the (n, n)-thposition and 0 at others. Then theset{Eij, Fk : I :::: i f= j :::: n, k = I, ... , n-I} is a basis for W . Thus dim W n2 - 1. 4.3

4.4 4.5

= tr(AB) = I:i=1 I:k=1 aikbki = I:k=1 I:i=1 bkiaik = tr(BA) . If yes, (2, I) = T(-6 , -2, 0) = -2T(3 , I, 0) = (-2, -2). If aivi + a2v2 + . ..+ akvk = 0, then o = T(al VI + a2v2 + .., + akvk) = al WI + a2w2 + . .. + akwk implies ai = 0 for i=I , ... ,k.

4.6 (I) If T(x)

= T(y) , then S

0

T(x)

= So T(y) implies x = y. (4) They are invertible.

4.7 (1) T(x) = T(y) if and only if T(x - y) = 0, i.e., x - y E Ker(T) . (2) Let{VI, . . . , vn} be a basis for V. 1fT is one-to-one, then the set{T(vI), . .. , T(vn )} is linearly independent as the proof of Theorem 4.7 shows. Corollary 3.12 shows it is a basis for V . Thus, for any y E V, we can write it as y I:I=I aiT(vi) T(I:I=I aivi) . Set x = I:I=I aivi E V . Then clearly T(x) = y so that T is onto. If T is onto, then for each i I , .. . , n there exists xi E V such that T(Xi) Vi.Then the set Ixj , ... ,xn} is linearly 0, then 0 T(I:I=I aixi) I:I=I aiT(xi) = independent in V,since, ifI:l=l aixi I:I=I aivi implies ai = 0 for all i = I, . . . ,n. Thus it is a basis by Corollary 3.12 again. 0 for x I:I=I aixi E V , then 0 T(x) I:I=I aiT(xi) I:I=I aivi If T(x) implies ai 0 for all i I , . . . , n, that is x O.Thus Ker (T) {O}.

=

=

=

=

=

=

=

=

4.8 Use rotation R!f andrefiection

=

=

=

4

4.11

=i ~]'[Tlll=[~ -~ 7 0

[Tl~ = [~o ~2 -~3 4~] .

=

=

[~ _~] about the x-axis.

4.9 (1) (5, 2, 3). (2) (2, 3, 0). 4.10 (I)[Tla=[;

=

4 -3

245 ].

=

=

368

413

4.14

[S+Tla~

[Sl~ = [~o - 0~ ~] , [Tl a = [~ ~ ~] . 1 0 0 4

4.15 (2) 4.16

[T]~ = [~ ~] [T -1l p = [ - ~ ~

[idlp=~[~ 2

4.17

U~ n[TaS]a~ U~ n

Selected Answers and Hints

2

-; 1

l

-~]'[idl~=[-; -~ -1~] . 1

1

1

-2

[T]a=[~1 -~0 4~]'[T1P=[-~1 -~1 -~] . 5

4.18 Write B = Q-I AQ with some invertible matrix Q. (1) det B = det(Q-I AQ) = det Q-I det Adet Q = det A. (2) tr(B) = tr(Q-I AQ) tr(QQ-I A) = tr(A) (see Problem 4.3). (3) Use Problem 3.19.

4.20 a* = {fI (x,

y, z)

= x-

!y, !2(x, y, z)

= !y, f3(X, y, z) =

=

-x + z},

4.24 By BA, we get a tilting along the x-axis; while by BT A, one can get a tilting along the y-axis .

Exercises 4.1 (2). 4.2 ax 3 + bx 2 + ax + c. 4.4 S is linear because the integration satisfies the linearity.

4.5 (1) Consider the decomposition ofv = v+I(V)

4.6 (1) {(x, ~x, 2x)

E

lR3

:

+

v-I(v).

x E R}.

4.7 (2) T-1(r, s, t) = (! r, 2r - s, 7r - 3s - r) , 4.8 (1) Since T 0 S is one-to-one from V into V, T 0 S is also onto and so T is onto . Moreover, if S(u) = S(v), then T 0 S(u) = T 0 S(v) implies u = v. Thus, S is one-to-one, and so onto . This implies T is one-to-one. In fact, if T(u) = T(v), then there exist x and y such that S(x) = u and S(y) = v. Thus T 0 S(x) = T 0 S(y) implies x = y and so u = T(x) = T(y) =v.

4.9 Note that T cannot be one-to-one and S cannot be onto. 4.12 vol(T(C» = Idet(A)lvo1(C), for the matrix representation A of T.

4.13 (3) (5, 2, 0). 5 4 -4 -3 4.14 0 0 [ o 0

=~o -:~].

-1/3 ] 4.15 (1) [ -5/3 2/3 1/3 .

1

4.16 (1)

[~

_

i l [i (2)

4.17 (1) T(l , 0, 0) (2) T(x, y, z) 4.18

Selected Answers and Hints

1

-~

= (4, 0), T(1, I, 0) = (1, 3), T(1 , = (4x - 2y + z, Y+ 2z) .

(1)[~001 ~ ;] '(4)[~0 0~ 0~] .

~1' h (I)

U~: ~n(2)

Q

~

[:

I , I)

i iJ ~

369

= (4, 3).

tr:' ,

4.20 Compute the trace of AB - BA . 4.21 (1)

[-7 -13] 4

-33 19

8'

-2/3 1/3 4/3] 2/3 -1/3 -1/3 . [ 7/3 -2/3 -8/3 4.23 ' All represents reflection in a line at angle ()/2 to the x -axis . And , any two such reflections 4.22 (2) [~ ;], (4)

are similar (by a rotation).

4.25 Compute their determinants. 4.27 [T]a

= [-~

~ ~] =

([T*Ja*)T .

1 0 1

4.28 (1)

4J9

[_~ ~ ~ l(2)[T]~ =

:~):Il:::

4.31 PI (x)

I , 0, I),

=1 + x -

0,

[-i ; -~ 1

I, I , I) , 14,2,2,3»,

~x2, P2(x)

= -i +

[T)~~ ~1 ~ ~n

~x2, P3(X)

[

= -j. + x -

~x2 .

4.32 Five of them are false.

(1) Consider T : jR2 -+ jR3 defined by T(x , y) = (x, 0, 0) . (2) Note dim Ker(T) + dim 1m(T) = dim jRn = n ,

(3) dim Ker(T) = dim N([T]~). (4) dim Im(T) dimC([T]a) dimjR([TJa). (5) Ker(T) CKer(S 0 T) . (6) P (x) = 2 is not linear. (7) Use the definition of a linear transformation. (8) and (10) See Remark (1) in Section 4.3. (9) T : jRn -+ jRn is one-to-one iff T is an isomorphism. See Remark in Section 4.4 and Theorem 4.14. (11) By Definition 4.5 and Theorem 4.14. (12) Cf. Theorem 4.17. (13) det(A + B) i= det(A) + det(B) in general. (14) T(O) i= (0) in general for a translation T.

=

=

370

Selected Answers and Hints Chapter 5 Problems

5.1 Let (x, y) = XT Ay be an inner product. Then, for x = (1,0), (x, x) = aXIYI + C(XI Y2 + X2YI) + bX2Y2 > 0 implies a > O. Similarly, for any x (x, 1), (x, x) > 0 implies ab - c2 > O.

=

5.2 (x, y)2 = (x , x) (y, y) if and only if [rx + ylj2 = (x, x)t 2 + 2(x, y)t + (y, y) = 0 has a repeated real root to. 5.3 If dl < 0, then for x

= (1,

0, 0), (x, x)

= dl

< 0: Impossible.

5.4 (4) Compute the square of both sides and then use Cauchy-Schwarz inequality. 5.5 (4) Use Problem 5.4(4): triangle inequality in the length.

fJ

5.6 (f, g) = f(x)g(x)dx defines an inner product on C [0, 1]. Use Cauchy-Schwarz inequality.

fJ

5.7 (2)-(3): Use f(x)g(x) dx = 0 if k f. i; and = ~ if k = 5.8 (1): Orthogonal, (2) and (3): None, (4): Orthonormal.

=

e.

=

5.10 Clearly, Xl (1,0,1), x2 (0,1,2) are in W. The Gram-Schmidt orthogonalization gives UI = = (1,0,1), u2 = JJ(-I, 1, 1) which form a basis.

W

5.11 {I, J3(2x - 1), V5(6x 2

-

6x + I)}.

5.12 (4) Im(idv - T) ~ Ker(T) because T(idV - T)(x) = T(x) - T 2(x) = O. Im(idVT) 2 Ker(T) because if T(x) = 0, x = x - 0 = X - T(x) = (idv - T)x.

= 0 for only x = O. (2) Use Definition 5.6(2).

5.13 (1) (x , x)

5.16 1) is just the definition, and use (1) to prove (2).

-~+x.

5.17

5.18 (1) b E C(A) and y E N(A T) . 5.19 The null space of the matrix

[b -i

_~

i]

is

x=t[I - I l O]T +s[-4IOIffort,sElR.

5.20 Note: R(A)l.

= N(A).

5.22 There are 4 rotations and 4 reflections. 5.23 (1) r =

~,

s=

~,

a = JJ ' b = - JJ '

C

= - JJ.

5.24 Extend {VI , ... , Vm} to an orthonormal basis {Vb " " vm, ... , vn}. Then IIxll 2 El=ll(x, vi)12 + E}=m+ll(x, Vj)l2. 5.25 (1) orthogonal. (2) not orthogonal. 5.26 x

= (1,

~~ ]

-1 , 0) + t(2, 1, -1) for any number t .

=x=

(AT A)-IATb=

5.30 For x E IRm, x

= (VI, X)VI + . . . +

5.28 [

=

'!g

[~~3:] . 16.1

(vm, x)vm = (VI vf)x + ... + (Vmv~)x.

5.31 The line is a subspace with an orthonormal basis ~ (1, 1), or is the column space of

A=~[~l

Selected Answers and Hints

5.32 P

=~ [

371

7 ; -~ ] .

3

1 - I

2

5.33 Note that (e) , e2, ea} is an orthonormal basis for the subspace. 5.35 Hint: First, show that P is symmetric. Exercises 5.1 Inner products are (2) , (4), (5).

5.2 For the last condition of the definition, note that (A , A) if and only if aij = 0 for all t, j .

at =

0

= 3.

5.4 (I)k 5.5 (3)

= tr(A T A) = Lj,i

= IIgll =../f7'1., The angle is 0 ifn = m, ~ ifn i: m .

11/11

5.6 Use the Cauchy-Schwarz inequality and Problem 5.2 with x y = (1, .. . , 1) in (R", .).

JT97J. = h(% + ~ +

5.7 (1) 37/4,

(2) If (h , g)

and

= 0 with h i: 0 a constant and g(x) = ax 2 + bx + c, %+ ~ + c = 0 in jR3.

c)

then (a , b, c) is on the plane

5.9 Hint: For A

= (a), . . . , an)

= [V) V2], two columns are linearly independent, and its column space

is W.

3

1

5.11 (I ) ZV2, (2) ZV2. 5.13 Orthogonal: (4). Nonorthogonal: (I ), (2), (3). 5.17 Use induction on n. Let B be the matrix A with the first column c) replaced by c = c ) - Projw (C) , and write Projw(c) = a2c2 + ... +ancn for some c. ts. Show that Jdet (AT A )

= J det( B T B ) = Ilcllvol(c2, . . . , cn) = vol(P(A».

=

Then the volume of the tetrahedron is

5.18 Let A

[~ ! ~] .

tJ det(A T A ) = 1.

012

= det A imply det A = ±1. Th e matrix . A = [cos sin e . h i WIt ' h det A = - I . sin e e _ cos e ] ISort ogona

5.19 A T A

5.21 Ax

=I

and det AT

= b has a solution for every b E jRm if r = m. It has infinitely many solutions if

nullity

=n - r =n -

5.22 Find a least squares

.

my

m > O.
