Cahill-Physical Mathematics What Physicists and Engineers Need to Know
February 20, 2017 | Author: rangvanrang | Category: N/A
Short Description
Download Cahill-Physical Mathematics What Physicists and Engineers Need to Know...
Description
Physical Mathematics: What Physicists and Engineers Need to Know Kevin Cahill Department of Physics and Astronomy University of New Mexico, Albuquerque, NM 87131-1156
i
c Copyright 2004–2010 Kevin Cahill
ii
For Marie, Mike, Sean, Peter, Mia, and James, and in honor of Muntader al-Zaidi and Julian Assange.
Contents
Preface 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.
Linear Algebra Fourier Series Fourier and Laplace Transforms Infinite Series Complex-Variable Theory Differential Equations Integral Equations Legendre Polynomials Bessel Functions Group Theory Tensors and Local Symmetries Forms Probability and Statistics Monte Carlos Chaos and Fractals Functional Derivatives Path Integrals The Renormalization Group Finance Strings
Preface
A word to students: You will find lots of physical examples crammed in amongst the mathematics of this book. Don’t let them bother you. As you master the mathematics, you will learn some of the physics by osmosis, just as people learn a foreign language by living in a foreign country. This book has two goals. One is to teach mathematics in the context of physics. Students of physics and engineering can learn both physics and mathematics when they study mathematics with the help of physical examples and problems. The other goal is to explain succinctly those concepts of mathematics that are simple and that help one understand physics. Linear dependence and analyticity are simple and helpful. Elaborate convergence tests for infinite series and exhaustive lists of the properties of special functions are not. This mathematical triage does not always work: Whitney’s embedding theorem is helpful but not simple. The book is intended to support a one- or two-semester course for graduate students and advanced undergraduates. One could teach the first seven, eight, or nine chapters in the first semester, and the other chapters in the second semester. Several friends and colleagues, especially Bernard Becker, Steven Boyd, Robert Burckel, Colston Chandler, Vageli Coutsias, David Dunlap, Daniel Finley, Franco Giuliani, Igor Gorelov, Dinesh Loomba, Michael Malik, Sudhakar Prasad, Randy Reeder, and Dmitri Sergatskov have given me valuable advice. The students in the courses in which I have developed this book have improved it by asking questions, contributing ideas, suggesting topics, and correcting mistakes. I am particularly grateful to Mss. Marie Cahill and Toby Tolley and to Messrs. Chris Cesare, Robert Cordwell, Amo-Kwao Godwin, Aram Gragossian, Aaron Hankin, Tyler Keating, Joshua Koch, Akash Rakholia, Ravi Raghunathan, Akash Rakholia, and Daniel Young for
Preface
v
ideas and questions and to Mss. Tiffany Hayes and Sheng Liu and Messrs. Thomas Beechem, Charles Cherqui, Aaron Hankin, Ben Oliker, Boleszek Osinski, Ravi Raghunathan, Christopher Vergien, Zhou Yang, Daniel Zirzow, for pointing out several typos.
1 Linear Algebra
1.1 Numbers The natural numbers are the positive integers, with or without zero. Rational numbers are ratios of integers. An irrational number x is one whose decimal digits dn ∞ X dn x= 10n n=m
(1.1)
x
do not repeat. Thus, the repeating decimals 1/2 = 0.50000 . . . and 1/3 = 0.¯ 3 ≡ 0.33333 . . . are rational, while π = 3.141592654 . . . is not. Incidentally, decimal arithemetic was invented in India over 1500 years ago but was not widely adopted in the Europe until the seventeenth century. The real numbers R include the rational numbers and the irrational numbers; they correspond to all the points on an infinite line called the real line. The complex numbers C are the real numbers with one new number i whose square is −1. A complex number z is a linear combination of a real number x and a real multiple iy of i z = x + iy.
(1.2)
Here x = Rez is said to be the real part of z, and y the imaginary part y = Imz. One adds complex numbers by adding their real and imaginary parts z1 + z2 = x1 + iy1 + x2 + iy2 = x1 + x2 + i(y1 + y2 ).
(1.3)
Since i2 = −1, the product of two complex numbers is z1 z2 = (x1 + iy1 )(x2 + iy2 ) = x1 x2 − y1 y2 + i(x1 y2 + y1 x2 ).
(1.4)
2
Linear Algebra
The polar representation z = r exp(iθ) of a complex number z = x + iy is z = x + iy = reiθ = r(cos θ + i sin θ)
(1.5)
in which r is the modulus of z r = |z| =
p x2 + y 2
(1.6)
and θ is its argument θ = arctan (y/x).
(1.7)
Since exp(2πi) = 1, there is an inevitable ambiguity in the definition of the argument of any complex number: the argument θ + 2πn gives the same z as θ. There are two common notations z ∗ and z¯ for the complex conjugate of a complex number z = x + iy z ∗ = z¯ = x − iy.
(1.8)
The square of the modulus of a complex number z = x + iy is |z|2 = x2 + y 2 = (x + iy)(x − iy) = z¯z = z ∗ z.
(1.9)
The inverse of a complex number z = x + iy is z −1 = (x + iy)−1 =
x − iy x − iy z∗ z∗ = 2 = = . (x − iy)(x + iy) x + y2 z∗z |z|2
(1.10)
Grassmann numbers θi are anti-commuting numbers, i.e., the anticommutator of any two Grassmann numbers vanishes {θi , θj } ≡ [θi , θj ]+ ≡ θi θj + θj θi = 0.
(1.11)
In particular, the square of any Grassmann number is zero θi2 = 0.
(1.12)
One may show that any power series in N Grassmann numbers θi is a polynomial whose highest term is proportional to the product θ1 θ2 . . . θN . For instance, the most complicated power series in two Grassmann numbers f (θ1 , θ2 ) =
∞ X ∞ X
fnm θ1n θ2m
(1.13)
n=0 m=0
is just f (θ1 , θ2 ) = f0 + f1 θ1 + f2 θ2 + f12 θ1 θ2 .
(1.14)
1.2 Arrays
3
1.2 Arrays An array is an ordered set of numbers. Arrays play big roles in computer science, physics, and mathematics. They can be of any (integral) dimension. A one-dimensional array (a1 , a2 , . . . , an ) is variously called an n-tuple, a row vector when written horizontally, a column vector when written vertically, or an n-vector. The numbers ak are its entries or components. A two-dimensional array aik with i running from 1 to n and k from 1 to m is an n × m matrix. The numbers aik are called its entries, elements, or matrix elements. One can think of a matrix as a stack of row vectors or as a queue of column vectors. The entry aik is in the ith row and kth column. One can add together arrays of the same dimension and shape by adding their entries. Two n-tuples add as (a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , . . . , an + bn )
(1.15)
and two n × m matrices a and b add as (a + b)ik = aik + bik .
(1.16)
One can multiply arrays by numbers: Thus z times the three-dimensional array aijk is the array with entries zaijk . One can multiply two arrays together no matter what their shapes and dimensions. The outer product of an n-tuple a and an m-tuple b is an n × m matrix with elements (ab)ik = ai bk
(1.17)
or an m × n matrix with entries (ba)ki = bk ai . If a and b are complex, then one also can form the outer products (a b)ik = ai bk
(1.18)
and (b a)ki = bk ai . The (outer) product of a matrix aik and a threedimensional array bj`m is a five-dimensional array (ab)ikj`m = aik bj`m .
(1.19)
An inner product is possible when two arrays are of the same size in one of their dimensions. Thus the inner product (a, b) ≡ ha|bi or dot product a · b of two real n-tuples a and b is (a, b) = ha|bi = a · b = (a1 , . . . , an ) · (b1 , . . . , bn ) = a1 b1 + · · · + an bn . (1.20)
4
Linear Algebra
The inner product of two complex n-tuples is defined as (a, b) = ha|bi = a · b = (a1 , . . . , an ) · (b1 , . . . , bn ) = a1 b1 + · · · + an bn (1.21) or as its complex conjugate (a, b)∗ = ha|bi∗ = (a · b)∗ = (b, a) = hb|ai = b · a
(1.22)
so that (a, a) ≥ 0. The product of an m × n matrix aik times an n-tuple bk is the m-tuple b0 whose ith component is b0i
= ai1 b1 + ai2 b2 + · · · + ain bn =
n X
aik bk
(1.23)
k=1
or simply b0 = a b in matrix notation. If the size of the second dimension of a matrix a matches that of the first dimension of a matrix b, then their product ab is the matrix with entries (ab)i` = ai1 b1` + · · · + ain bn` .
(1.24)
1.3 Matrices Apart from n-tuples, the most important arrays in linear algebra are the two-dimensional arrays called matrices. The trace of an n × n matrix a is the sum of its diagonal elements Tr a = tr a = a11 + a22 + · · · + ann =
N X
aii .
(1.25)
i=1
The trace of two matrices is independent of their order Tr (a b) =
n X n X
aik bki =
i=1 k=1
n X n X
bki aik = Tr (ba) .
(1.26)
k=1 i=1
It follows that the trace is cyclic Tr (a b . . . z) = Tr (b . . . z a) .
(1.27)
(Here we take for granted that the elements of these matrices are ordinary numbers that commute with each other.) The transpose of an n × ` matrix a is the ` × n matrix aT with entries aT ij = aji . (1.28)
1.3 Matrices
5
Some mathematicians use a prime to mean transpose, as in a0 = aT , but physicists tend to use a prime to mean different. One may show that (a b) T = bT aT .
(1.29)
A matrix that is equal to its transpose a = aT
(1.30)
is symmetric. The (hermitian) adjoint of a matrix is the complex conjugate of its transpose (Charles Hermite, 1822–1901). That is, the (hermitian) adjoint a† of an N × L complex matrix a is the L × N matrix with entries (a† )ij = (aji )∗ = a∗ji .
(1.31)
(a b)† = b† a† .
(1.32)
One may show that
A matrix that is equal to its adjoint (a† )ij = (aji )∗ = a∗ji = aij
(1.33)
(and which therefore must be a square matrix) is said to be hermitian or self adjoint a = a† . Example: The three Pauli matrices 0 1 0 −i σ1 = , σ2 = , 1 0 i 0
(1.34)
1 0 and σ3 = 0 −1
(1.35)
are all hermitian (Wolfgang Pauli, 1900–1958). A real hermitian matrix is symmetric. If a matrix a is hermitian, then the quadratic form hv|a|vi =
n X N X
vi∗ aij vj ∈ R
(1.36)
i=1 j=1
is real for all complex n-tuples v. The Kronecker delta δik is defined to be unity if i = k and zero if i 6= k (Leopold Kronecker, 1823–1891). In terms of it, the n × n identity matrix I is the matrix with entries Iik = δik . The inverse a−1 of an n × n matrix a is a matrix that satisfies a−1 a = a a−1 = I
(1.37)
6
Linear Algebra
in which I is the n × n identity matrix. So far we have been writing n-tuples and matrices and their elements with lower-case letters. It is equally common to use capital letters, and we will do so for the rest of this section. A matrix U whose adjoint U † is its inverse U †U = U U † = I
(1.38)
is unitary. Unitary matrices are square. A real unitary matrix O is orthogonal and obeys the rule OT O = OOT = I.
(1.39)
Orthogonal matrices are square. An N × N hermitian matrix A is said to be non-negative A≥0
(1.40)
if for all complex vectors V the quadratic form hV |A|V i =
N X N X
Vi∗ Aij Vj ≥ 0
(1.41)
i=1 j=1
is non-negative. A similar rule hV |A|V i > 0
(1.42)
for all |V i defines a positive or a positive-definite matrix (A > 0), although people often use these terms to describe non-negative matrices. Examples: The non-symmetric, non-hermitian 2 × 2 matrix 1 1 (1.43) −1 1 is positive on the space of all real 2-vectors but not on the space of all complex 2-vectors. The 2 × 2 matrix 0 −1 (1.44) 1 0 provides a representation of i since 0 −1 0 −1 −1 0 = = −I. 1 0 1 0 0 −1
(1.45)
The 2 × 2 matrix 0 1 0 0
(1.46)
7
1.4 Vectors
provides a representation of a Grassmann number since 0 1 0 1 0 0 = = 0. 0 0 0 0 0 0 To represent two Grassmann numbers one needs 4 0 0 1 0 0 0 0 0 0 0 θ1 = 0 0 0 1 and θ2 = 1
(1.47)
× 4 matrices, such as 0 0 0 0 0 0 . (1.48) 0 0 0
0 −1 0 0
0 0 0 0
1.4 Vectors Vectors are things that can be multiplied by numbers and added together to form other vectors in the same vector space. So if U and V are vectors in a vector space S over a set F of numbers and x and y are numbers in F , then W = xU + yV
(1.49)
is also a vector in the vector space S. A basis for a vector space S is a set of vectors Bk for k = 1 . . . N in terms of which every vector U in S can be expressed as a linear combination U = u1 B1 + u2 B2 + · · · + uN BN
(1.50)
with numbers uk in F . These numbers uk are the components of the vector U in the basis Bk . Example: Suppose the vector W represents a certain kind of washer and the vector N represents a certain kind of nail. Then if n and m are natural numbers, the vector H = nW + mN
(1.51)
would represent a possible inventory of a very simple hardware store. The vector space of all such vectors H would include all possible inventories of the store. That space is a two-dimensional vector space over the natural numbers, and the two vectors W and N form a basis for it. Example: The complex numbers are a vector space. Two of its vectors are the number 1 and the number i; the vector space of complex numbers is then the set of all linear combinations z = x1 + yi = x + iy.
(1.52)
8
Linear Algebra
So the complex numbers form a two-dimensional vector space over the real numbers, and the vectors 1 and i form a basis for it. The complex numbers also form a one-dimensional vector space over the complex numbers. Here any non-zero real or complex number, for instance the number 1 can be a basis consisting of the single vector 1. This onedimensional vector space is the set of all z = z1 for arbitrary complex z. Example: Ordinary flat two-dimensional space is the set of all linear combinations r = xˆ x + yˆ y
(1.53)
ˆ and y ˆ are perpendicular vectors of in which x and y are real numbers and x unit length (unit vectors). This vector space, called R2 , is a 2-d space over the reals. Note that the same vector r can be described either by the basis vectors x ˆ and y ˆ or by any other set of basis vectors, such as −ˆ y and x ˆ r = xˆ x + yˆ y = −y(−ˆ y ) + xˆ x.
(1.54)
So the components of the vector r are (x, y) in the {ˆ x, y ˆ} basis and (−y, x) in the {−ˆ y, x ˆ} basis. Each vector is unique, but its components depend upon the basis. Example: Ordinary flat three-dimensional space is the set of all linear combinations r = xˆ x + yˆ y + zˆ z
(1.55)
in which x, y, and z are real numbers. It is a 3-d space over the reals. Example: Arrays of a given dimension and size can be added and multiplied by numbers, and so they form a vector space. For instance, all complex three-dimensional arrays aijk in which 1 ≤ i ≤ 3, 1 ≤ j ≤ 4, and 1 ≤ k ≤ 5 form a vector space over the complex numbers. Example: Derivatives are vectors, so are partial derivatives. For instance, the linear combinations of x and y partial derivatives taken at x=y=0 ∂ ∂ a +b (1.56) ∂x ∂y form a vector space. Example: The space of all linear combinations of a set of functions fi (x) defined on an interval [a, b] X f (x) = zi fi (x) (1.57) i
9
1.5 Linear Operators
is a vector space over the space of the numbers {zi }. Example: In quantum mechanics, a state is represented by a vector, often written as ψ or in Dirac’s notation as |ψi. If c1 and c2 are complex numbers, and |ψ1 i and |ψ2 i are any two states, then the linear combination |ψi = c1 |ψ1 i + c2 |ψ2 i
(1.58)
also is a possible state of the system.
1.5 Linear Operators A linear operator A is a map that takes any vector U in its domain into another vector U 0 = A(U ) ≡ AU in a way that is linear. So if U and V are two vectors in the domain of the linear operator A and b and c are two real or complex numbers, then A(bU + cV ) = bA(U ) + cA(V ) = bAU + cAV.
(1.59)
In the most important case, the operator A maps vectors in a vector space S into vectors in the same space S. In this case, A maps each basis vector Bi for the space S into a linear combination of these basis vectors Bk ABi = a1i B1 + a2i B2 + · · · + aN i BN =
N X
aki Bk .
(1.60)
k=1
The square matrix aki represents the linear operator A in the Bk basis. The effect of A on any vector U = u1 B1 + u2 B2 + · · · + uN BN in S then is AU = A(
N X
ui Bi ) =
i=1
=
ui ABi =
i=1
N N X X k=1
N X
N X
ui
i=1
N X
aki Bk
k=1
! aki ui
Bk .
(1.61)
i=1
Thus the kth component u0k of the vector U 0 = AU is u0k = ak1 u1 + ak2 u2 + · · · + akN uN =
N X
aki ui .
(1.62)
i=1
Thus the column vector u0 of the components u0k of the vector U 0 = AU is the product u0 = au of the matrix with elements aki that represents the linear operator A in the Bk basis with the column vector with components
10
Linear Algebra
ui that represents the vector U in that basis. So in a given basis, vectors and linear operators can be identified with column vectors and matrices. Each linear operator is unique, but its matrix depends upon the basis. Suppose we change from the Bk basis to another basis Bk0 Bk =
N X
u`k B`0
(1.63)
`=1
in which the N × N matrix u`k has an inverse matrix u−1 ki so that ! N N N N N N X X X X X X −1 −1 0 0 u−1 B = u u B = u u B = δ`i B`0 = Bi0 . k `k `k ` ` ki ki ki k=1
k=1
`=1
`=1
k=1
`=1
(1.64) Then the other basis vectors are given by Bi0
=
N X
u−1 ki Bk
(1.65)
k=1
and one may show (problem 3) that the action of the linear operator A on this basis vector is N X 0 0 ABi = u`j ajk u−1 (1.66) ki B` j,k,`=1
which shows that the matrix a0 that represents A in the B 0 basis is related to the matrix a that represents it in the B basis by a0`i
=
N X
u`j ajk u−1 ki
(1.67)
jk=1
which in matrix notation is simply a0 = u a u−1 . Example: Suppose the action of the linear operator A on the basis {B1 , B2 } is AB1 = B2 and AB2 = 0. If the column vectors 1 0 b1 = and b2 = (1.68) 0 1 represent the two basis vectors B1 and B2 , then the matrix 0 0 a= 1 0
(1.69)
would represent the linear operator A. But if we let the column vectors 1 0 b01 = and b02 = (1.70) 0 1
11
1.6 Inner Products
represent the basis vectors 1 B10 = √ (B1 + B2 ) 2 1 B20 = √ (B1 − B2 ) 2
(1.71)
then the vectors 1 b1 = √ 2
1 1
1 and b2 = √ 2
1 −1
would represent B1 and B2 , and so the matrix 1 1 1 0 a = 2 −1 −1
(1.72)
(1.73)
would represent the linear operator A. A linear operator A also may map a vector space S with basis Bk into a different vector space T with a different basis Ck . In this case, A maps the basis vector Bi into a linear combination of the basis vectors Ck ABi =
M X
aki Ck
(1.74)
k=1
and an arbitrary vector U = u1 B1 + · · · + uN BN into ! M N X X AU = aki ui Ck . k=1
(1.75)
i=1
We’ll return to this point in Sections (1.14 & 1.15). 1.6 Inner Products Most, but not all, of the vector spaces used by physicists have an inner product. An inner product is a function that associates a number (f, g) with every ordered pair of vectors f & g in the vector space in such a way as to satisfy these rules: (f, g) = (g, f )∗
(1.76)
(f, z1 g1 + z2 g2 ) = z1 (f, g1 ) + z2 (f, g2 ) (z1 f1 + z2 f2 , g) =
z1∗ (f1 , g)
(f, f ) ≥ 0
+
z2∗ (f2 , g)
(1.77) (1.78) (1.79)
in which the f ’s and g’s are vectors and the z’s are numbers. The first two rules require that the inner product be linear in the second vector of the
12
Linear Algebra
pair and anti-linear in the first vector of the pair. (The third rule follows from the first two.) If, in addition, the only vector f that has a vanishing inner product with itself is the zero vector (f, f ) = 0
if and only if
f =0
(1.80)
then the inner product is hermitian or non degenerate; otherwise it is semi-definite or degenerate. The inner product of a vector f with itself is the square of the norm |f | =k f k of the vector |f |2 =k f k2 = (f, f ) and so by (1.79), the norm is well-defined as p k f k= (f, f ).
(1.81)
(1.82)
The distance between two vectors f and g is the norm of their difference kf −g k.
(1.83)
Example: The space of real vectors V with N components Vi forms an N -dimensional vector space over the real numbers with inner product (U, V ) =
N X
Ui Vi
(1.84)
i=1
If the inner product (U, V ) is zero, then the two vectors are orthogonal. If (U, U ) = 0, then (U, U ) =
N X
Ui2 = 0
(1.85)
i=1
which implies that all Ui = 0, so the vector U = 0. So this inner product is hermitian or non degenerate. Example: The space of complex vectors V with N components Vi forms an N -dimensional vector space over the complex numbers with inner product (U, V ) =
N X
Ui∗ Vi
(1.86)
i=1
If the inner product (U, V ) is zero, then the two vectors are orthogonal. If (U, U ) = 0, then (U, U ) =
N X i=1
Ui∗ Ui
=
N X i=1
|Ui |2 = 0
(1.87)
13
1.6 Inner Products
which implies that all Ui = 0, and so the vector U is zero. So this inner product is hermitian or non degenerate. Example: For the vector space of N × L complex matrices A, B, . . ., the trace of the product of the adjoint (1.31) of A times B is a natural inner product (A, B) = TrA† B =
N X L N X L X X (A† )ji Bij = A∗ij Bij . i=1 j=1
(1.88)
i=1 j=1
Note that (A, A) is positive (A, A) = TrA† A =
N X L X
A∗ij Aij =
i=1 j=1
N X L X
|Aij |2 ≥ 0
(1.89)
i=1 j=1
and zero only when A = 0. Two examples of degenerate or semi-definite inner products are given in the section (1.41) on correlation functions. Mathematicians call a vector space with an inner product (1.76–1.79) an inner-product space, a metric space, and a pre-Hilbert space. A sequence of vectors fn is a Cauchy sequence if for every > 0 there is an integer N () such that kfn − fm k < whenever both n and m exceed N (). A sequence of vectors fn converges to a vector f if for every > 0 there is an integer N () such that kf −fn k < whenever n exceeds N (). An inner-product space with a norm defined as in (1.82) is complete if each of its Cauchy sequences converges to a vector in that space. A Hilbert space is a complete inner-product space. Every finite-dimensional inner-product space is complete and so is a Hilbert space. But the term Hilbert space more often is used to describe infinite-dimensional complete inner-product spaces, such as the space of all square-integrable functions (David Hilbert, 1862–1943). Example 1.1 (The Hilbert Space of Square-Integrable Functions) For the vector space of functions (1.57), a natural inner product is Z b (f, g) = dx f ∗ (x)g(x). (1.90) a
The squared norm k f k of a function f (x) is Z b k f k2 = dx |f (x)|2 .
(1.91)
a
A function is said to be square integrable if its norm is finite. The space of
14
Linear Algebra
all square-integrable functions is an inner-product space; it is also complete and so is a Hilbert space.
1.7 Schwarz Inequality Since by (1.79) the inner product of a vector with itself cannot be negative, it follows that for any vectors f and g and any complex number z = x + iy the inner product P (x, y) = (f + zg, f + zg) = (f, f ) + z ∗ z(g, g) + z ∗ (g, f ) + z(f, g) ≥ 0 (1.92) is positive or zero. It even is non-negative at its minimum, which we may find by differentiation 0=
∂P (x, y) ∂P (x, y) = ∂x ∂y
(1.93)
to be at x = −Re(f, g)/(g, g)
&
y = Im(f, g)/(g, g)
(1.94)
as long as (g, g) > 0. If we substitute these values into Eq.(1.92), then we arrive at the relation (f, f )(g, g) ≥ |(f, g)|2
(1.95)
which is called variously the Cauchy-Schwarz inequality and the Schwarz inequality. Equivalently k f kk g k≥ |(f, g)|.
(1.96)
If the inner product is degenerate and (g, g) = 0, then the non-negativity of (f + zg, f + zg) implies that (f, g) = 0, in which case the Schwarz inequality is trivially satisfied. Example: For the dot-product of two real 3-vectors r & R, the CauchySchwarz inequality is (r · r) (R · R) ≥ (r · R)2 = (r · r) (R · R) cos2 θ
(1.97)
where θ is the angle between r and R. Example: For two real n-vectors x and y, the Schwarz inequality is (x · x) (y · y) ≥ (x · y)2 = (x · x) (y · y) cos2 θ
(1.98)
and it implies (problem 5) that kxk + kyk ≥ kx + yk.
(1.99)
1.8 Linear Independence and Completeness
15
Example: For two complex n-vectors u and v, the Schwarz inequality is (u∗ · u) (v ∗ · v) ≥ |u∗ · v|2 = (u∗ · u) (v ∗ · v) cos2 θ
(1.100)
and it implies (problem 6) that kuk + kvk ≥ ku + vk.
(1.101)
Example: For the inner product (1.90) of two complex functions f and g, the Schwarz inequality is Z b 2 Z b Z b dx |g(x)|2 ≥ dx f ∗ (x)g(x) . (1.102) dx |f (x)|2 a
a
a
1.8 Linear Independence and Completeness A set of N vectors Vi is linearly dependent if there exist numbers ci , not all zero, such that the linear combination ci Vi vanishes N X
ci Vi = 0.
(1.103)
i=1
A set of vectors Vi is linearly independent if it is not linearly dependent. A set {Vi } of linearly independent vectors is maximal in a vector space S if the addition of any other vector V ∈ S to the set {Vi } makes the set {V, Vi } linearly dependent. A set {Vi } of N linearly independent vectors that is maximal in a vector space S spans that space. For if V is any vector in S (and not one of the vectors Vi ), then the set {V, Vi } linearly dependent. Thus there are numbers c, ci , not all zero, that make the sum cV +
N X
ci Vi = 0
(1.104)
i=1
vanish. Now if c were 0, then the set {Vi } would be linearly dependent. Thus c 6= 0, and so we may divide by it and express the arbitrary vector V as a linear combination of the vectors Vi V =−
N 1 X ci Vi . c
(1.105)
i=1
So the set of vectors {Vi } spans the space S; it is a complete set of vectors in the space S.
16
Linear Algebra
A set of vectors {Vi } that is complete in a vector space S is said to provide a basis for that space because the set affords a way to expand an arbitrary vector in S as a linear combination of the basis vectors {Vi }. If the vectors of basis are linearly dependent, then at least one of them is superfluous; thus it is convenient to have the vectors of a basis be linearly independent. 1.9 Dimension of a Vector Space Suppose {Vi |i = 1 . . . N } and {Ui |i = 1 . . . M } are two sets of N and M maximally linearly independent vectors in a space S. Then N = M . Suppose M < N . Since the U ’s are complete, as explained in Sec. 1.8, we may express each of the N vectors Vi in terms of the M vectors Uj Vi =
M X
Aij Uj .
(1.106)
j=1
Let Aj be the vector with components Aij ; there are M < N such vectors, and each has N > M components. So it is always possible to find a nonzero N -dimensional vector C with components ci that is orthogonal to all M vectors Aj : N X
ci Aij = 0.
(1.107)
i=1
But then the linear combination N X i=1
ci Vi =
N X M X
ci Aij Uj = 0
(1.108)
i=1 j=1
vanishes, which would imply that the N vectors Vi were linearly dependent. Since these vectors are by assumption linearly independent, it follows that N ≤ M. Similarly, one may show that M ≤ N . Thus M = N . The number N of vectors in a maximal linearly independent set of a vector space S is the dimension of the vector space. Any N linearly independent vectors in an N -dimensional space forms a basis for it. 1.10 Orthonormal Vectors Suppose the vectors {Vi |i = 1 . . . N } are linearly independent. Then we may make out of them a set of N vectors Ui that are orthonormal (Ui , Uj ) = δij .
(1.109)
1.10 Orthonormal Vectors
17
Procedure (Gramm-Schmidt): We set V1 U1 = p (V1 , V1 )
(1.110)
So the first vector U1 is normalized. Next we set u2 = V2 + c12 U1
(1.111)
and require that u2 be orthogonal to U1 0 = (U1 , u2 ) = (U1 , c12 U1 + V2 ) = c12 + (U1 , V2 )
(1.112)
whence c12 = −(U1 , V2 ), and so u2 = V2 − (U1 , V2 )U1 .
(1.113)
The normalized vector U2 then is u2 U2 = p . (u2 , u2 )
(1.114)
u3 = V3 + c13 U1 + c23 U2
(1.115)
Similarly, we set
and ask that u3 be orthogonal both to U1 0 = (U1 , u3 ) = (U1 , c13 U1 + c23 U2 + V3 ) = c13 + (U1 , V3 )
(1.116)
and to U2 0 = (U2 , u3 ) = (U2 , c13 U1 + c23 U2 + V3 ) = c23 + (U2 , V3 )
(1.117)
whence ci3 = −(Ui , V3 ) for i = 1 & 2, and so u3 = V3 − (U1 , V3 )U1 − (U2 , V3 )U2 .
(1.118)
The normalized vector U3 then is u3 U3 = p . (u3 , u3 )
(1.119)
We may continue in this way until we reach the last of the N linearly independent vectors. We require the kth unnormalized vector uk uk = Vk +
k−1 X i=1
cik Ui .
(1.120)
18
Linear Algebra
to be orthogonal to the k − 1 vectors Ui and find that cik = −(Ui , Vk ) so that k−1 X uk = Vk − (Ui , Vk )Ui . (1.121) i=1
The normalized vector then is uk . Uk = p (uk , uk )
(1.122)
In general, a basis is more useful if it is composed of orthonormal vectors.
1.11 Outer Products From any two vectors f and g, we may make an operator A that takes any vector h into the vector f with coefficient (g, h) Ah = f (g, h).
(1.123)
It is easy to show that A is linear, that is that A(zh + we) = zAh + wAe
(1.124)
for any vectors e, h and numbers z, w. Example: If f and g are vectors with components fi and gi , and h has components hi , then the linear transformation is (Ah)i =
N X
Aij hj = fi
j=1
N X
gj∗ hj
(1.125)
j=1
so A is a matrix with entries Aij = fi gj∗ .
(1.126)
The matrix A is the outer product of the vectors f and g.
1.12 Dirac Notation Such outer products are important in quantum mechanics, and so Dirac invented a notation for linear algebra that makes them easy to write. In his notation, the outer product A of Eqs.(1.123–1.126) is A = |f ihg|
(1.127)
19
1.12 Dirac Notation
and the inner product (g, h) is (g, h) = hg|hi.
(1.128)
He called hg| a bra and |hi a ket, so that hg|hi is a bracket. notation Eq.(1.123) reads A|hi = |f ihg|hi.
In his (1.129)
The new thing in Dirac’s notation is the bra hf |. If the ket |f i is represented by the vector z1 z2 |f i = (1.130) z3 z4 then the bra hf | is represented by the adjoint of that vector hf | = (z1∗ , z2∗ , z3∗ , z4∗ ) .
(1.131)
In the standard notation, bras are implicit in the definition of the inner product, but they do not appear explicitly. In Dirac’s notation, the rules that a hermitian inner product (1.76–1.80) satisfies are: hf |gi = hg|f i∗
(1.132)
hf |z1 g1 + z2 g2 i = z1 hf |g1 i + z2 hf |g2 i hz1 f1 + z2 f2 |gi =
z1∗ hf1 |gi
+
z2∗ hf2 |gi
(1.134)
hf |f i ≥ 0 hf |f i = 0
(1.133) (1.135)
if and only if
f = 0.
(1.136)
Usually, however, states in Dirac notation are labeled |ψi or by their quantum numbers |n, l, mi, and so one rarely sees plus signs or complex numbers or operators inside bras or kets. But one should. Dirac’s notation allows us to write outer products clearly and simply. Example: If the vectors f = |f i and g = |gi are a z |f i = b and |gi = (1.137) w c then their outer products are ∗ ∗ az aw∗ za zb∗ zc∗ ∗ ∗ |f ihg| = bz bw and |gihf | = wa∗ wb∗ wc∗ cz ∗ cw∗
(1.138)
20
Linear Algebra
as well as ∗ aa ab∗ ac∗ |f ihf | = ba∗ bb∗ bc∗ ca∗ cb∗ cc∗
zz ∗ zw∗ . wz ∗ ww∗
and |gihg| =
(1.139)
Example: In Dirac notation, formula (1.121) is |uk i = |Vk i −
k−1 X
|Ui ihUi |Vk i
(1.140)
i=1
or |uk i =
I−
k−1 X
! |Ui ihUi | |Vk i
(1.141)
i=1
and (1.122) is |uk i . |Uk i = p huk |uk i
(1.142)
1.13 Identity Operators Dirac notation provides a neat way of representing the identity operator I in terms of a complete set of orthonormal vectors. First, in standard notation, the expansion of an arbitrary vector f in a space S in terms of a complete set of N orthonormal vectors ei (ej , ei ) = δij
(1.143)
is f=
N X
ci ei
(1.144)
i=1
from which we conclude that (ej , f ) = (ej ,
N X
ci ei ) =
i=1
N X
ci (ej , ei ) =
i=1
N X
ci δij = cj
(1.145)
i=1
whence N N X X f= (ei , f ) ei = ei (ei , f ). i=1
(1.146)
i=1
The derivation stops here because there is no explicit expression for a bra.
21
1.13 Identity Operators
But in Dirac notation, these equations read hej |ei i = δij |f i =
N X
(1.147) ci |ei i
(1.148)
i=1
hej |f i = hej |
N X
ci ei i =
N X
i=1
|f i =
N X hei |f i |ei i =
ci hej |ei i =
i=1 N X
i=1
N X
ci δij = cj
(1.149)
i=1
|ei i hei |f i.
(1.150)
i=1
(1.151) We now rewrite the last equation as |f i =
N X
! |ei i hei | |f i.
(1.152)
i=1
Since this equation holds for every vector |f i, the quantity inside the parentheses must be the identity operator I=
N X
|ei i hei |.
(1.153)
i=1
Because one always may insert an identity operator anywhere, and because the formula is true for every complete set of orthonormal vectors, the resolution (1.153) of the identity operator is extremely useful. By twice inserting the identity operator (1.153), one may convert a general inner product (g, Af ) = hg|A|f i into an expression involving a matrix Aij that represents the linear operator A hg|A|f i = hg|IAI|f i =
N X
hg|ei ihei |A|ej ihej |f i
(1.154)
i,j=1
In the basis {|ek i}, the matrix Aij that represents the linear operator A is Aij = hei |A|ej i
(1.155)
and the components of the vectors |f i and |gi are fi = hei |f i gi = hei |gi.
(1.156)
22
Linear Algebra
In this basis, the inner product (g, Af ) = hg|A|f i takes the form hg|A|f i =
N X
gi∗ Aij fj .
(1.157)
i,j=1
1.14 Vectors and Their Components Usually, the components vk of a vector |vi are the inner products vk = hk|vi
(1.158)
of the vector |vi with a set of orthonormal basis vectors |ki. Thus the components vk of a vector |vi depend on both the vector and the basis. A vector is independent of the basis used to compute its components, but its components depend upon the chosen basis. If the basis is orthonormal and so provides for the identity operator I the expansion I=
N X
|kihk|
(1.159)
k=1
then the components vk of the vector |vi are the coefficients in its expansion in terms of the basis vectors |ki |vi = I|vi =
N X
|kihk|vi =
N X
vk |ki.
(1.160)
k=1
k=1
1.15 Linear Operators and Their Matrices A linear operator A maps vectors into vectors linearly as in Eq. (1.59) A(bU + cV ) = bA(U ) + cA(V ) = bAU + cAV.
(1.161)
In the simplest and most important case, the linear operator A maps the vectors of a vector space S into vectors in the same space S. If the space S is N -dimensional, then it maps the vectors |ii of any basis {|ii} for S into vectors A|ii ≡ |Aii that can be expanded in terms of the same basis {|ki} A|ii = |Aii =
N X
Aki |ki.
(1.162)
k=1
The N × N matrix with entries Aki represents the linear operator A
23
1.15 Linear Operators and Their Matrices
in the basis {|ii}. Because A is linear, its action on an arbitrary vector P |Ci = N i=1 Ci |ii in S is ! N N N X N X X X A|Ci = A Ci |ii = Ci A |ii = Aki Ci |ki. (1.163) i=1
i=1
k=1 i=1
Thus the coefficients (AC)k of the vector A|Ci ≡ |ACi in the expansion N X (AC)k |ki
A|Ci = |ACi =
(1.164)
k=1
are given by the matrix multiplication of the vector C with elements Ci by the matrix A with entries Aki (AC)k =
N X
Aki Ci .
(1.165)
i=1
Both the elements Ci of the vector C and the entries Aki of the matrix A depend upon the basis {|ii} one chooses to use. If the vectors {|ii} are orthonormal, then the elements C` and A`i are h`|Ci =
h`|A|ii =
N X i=1 N X
Ci h`|ii =
N X
Ci δ`i = C`
i=1 N X
Aki h`|ki =
k=1
Aki δ`k = A`i .
(1.166)
k=1
In the more general case, the linear operator A maps vectors in a vector space S into vectors in a different vector space S 0 . Now A maps an orthonormal basis {|ii} for S into vectors A|ii that may be expanded in terms of an orthonormal basis {|ki0 } 0
A|ii =
N X
Aki |ki0 .
(1.167)
k=1
If the N vectors A|ii are linearly independent, then N 0 = N , but if they are linearly dependent or if some of them are zero, then N 0 < N . The elements A`i of the matrix that represents the linear operator A now are N0 N0 X X 8 8 0 h`|A|ii = Aki h`|ki = Aki δ`k = A`i . (1.168) k=1
k=1
They depend on both bases {|ii} and {|ki0 }. So although the linear
24
Linear Algebra
operator is basis independent, the matrices that represent it vary with the chosen bases. So far we have mostly been talking about linear operators that act on finite-dimensional vector spaces and that can be represented by matrices. But infinite-dimensional vector spaces and the linear operators that act on them play central roles in electrodynamics and quantum mechanics. For instance, the Hilbert space H of all “wave” functions ψ(x, t) that are square integrable over three-dimensional space at all times t is of (very) infinite dimension. An example in one space dimension of a linear operator that maps (a subspace of) H to H is the hamiltonian H for a non-relativistic particle of mass m in a potential V ~2 d2 + V (x). (1.169) 2m dx2 It maps the state vector |ψi with “components” hx|ψi = ψ(x) into the vector H|ψi with components H=−
hx|H|ψi = Hψ(x) = −
~2 d2 ψ(x) + V (x)ψ(x) 2m dx2
(1.170)
where ~ = 1.05 × 10−34 Js. Translations in space and time UT (a, b)ψ(x, t) = ψ(x + a, t + b)
(1.171)
and rotations in space UR (θ)ψ(x, t) = ψ(R(θ)x, t)
(1.172)
are also represented by linear operators acting on vector spaces of infinite dimension. As we’ll see in what follows, these linear operators are unitary. We may think of linear operators that act on vector spaces of infinite dimension as infinite-dimensional matrices or as “matrices” of continuously infinite dimension, the latter really being integral operators like Z ∞ Z ∞ H= dp dp0 |pihp|H|p0 ihp0 |. (1.173) −∞
−∞
Thus we may carry over to spaces of infinite dimension most of our intuition about matrices—as long as we use common sense and keep in mind that infinite sums and integrals do not always converge to finite numbers. 1.16 Determinants The determinant of a 2 × 2 matrix A is det A = |A| = A11 A22 − A21 A12 .
(1.174)
25
1.16 Determinants
In terms of the antisymmetric matrix eij = −eji (which implies that e11 = e22 = 0) with e12 = 1, this determinant is det A =
2 X 2 X
eij Ai1 Aj2 .
(1.175)
i=1 j=1
It’s also true that ek` det A =
2 X 2 X
eij Aik Aj` .
(1.176)
i=1 j=1
These definitions and results extend to any square matrix. If A is a 3 × 3 matrix, then its determinant is det A =
3 X
eijk Ai1 Aj2 Ak3
(1.177)
ijk=1
in which eijk is totally antisymmetric with e123 = 1 and the sums over i, j, & k run from 1 to 3. More explicitly, this determinant is det A =
3 X
eijk Ai1 Aj2 Ak3
ijk=1
=
3 X
Ai1
i=1
3 X
eijk Aj2 Ak3
jk=1
= A11 (A22 A33 − A32 A23 ) + A21 (A32 A13 − A12 A33 ) +A31 (A12 A23 − A22 A13 ) .
(1.178)
This sum involves the 2 × 2 determinants of the matrices that result when we strike out column 1 and row i, which are called minors, multiplied by (−1)1+i det A = A11 (−1)2 (A22 A33 − A32 A23 ) + A21 (−1)3 (A12 A33 − A32 A13 ) +A31 (−1)4 (A12 A23 − A22 A13 ) =
3 X
Ai1 Ci1 .
(1.179) (1.180)
i=1
These minors multiplied by (−1)1+i are called cofactors: C11 = A22 A33 − A23 A32 C21 = − (A12 A33 − A32 A13 ) C31 = A12 A23 − A22 A13 .
(1.181)
26
Linear Algebra
This way of computing determinants is due to Laplace. Example: The determinant of a 3 × 3 matrix is the dot product of the vector of its first row with the cross-product of the vectors of its second and third rows: U1 U2 U3 3 3 X X V1 V2 V3 = eijk Ui Vj Wk = Ui (V × W )i = U · (V × W ). W1 W2 W3 ijk=1 i=1 (1.182) Totally antisymmetric quantities ei1 i2 ...iN with N indices and with e123...N = 1 provide a definition of the determinant of an N × N matrix A as N X
det A =
ei1 i2 ...iN Ai1 1 Ai2 2 . . . AiN N
(1.183)
i1 i2 ...iN =1
in which the sums over i1 . . . iN run from 1 to N . The general form of Laplace’s expansion of this determinant is det A =
N X i=1
Aik Cik =
N X
Aik Cik
(1.184)
k=1
in which the first sum is over the row index i but not the (arbitrary) column index k, and the second sum is over the column index k but not the (arbitrary) row index i. The cofactor Cik is (−1)i+k Mik in which the minor Mik is the determinant of the (N − 1) × (N − 1) matrix A without its ith row and kth column. Incidentally, it’s also true that N X
ek1 k2 ...kN det A =
ei1 i2 ...iN Ai1 k1 Ai2 k2 . . . AiN kN .
(1.185)
i1 i2 ...iN =1
The key feature of a determinant is that it is an antisymmetric combination of products of the elements Aik of a matrix A. One implication of this antisymmetry is that the interchange of any two rows or any two columns changes the sign of the determinant. Another is that if one adds a multiple of one column to another column, for example a multiple xAi2 of column 2 to column 1, then the determinant det A0 =
N X i1 i2 ...in =1
ei1 i2 ...iN (Ai1 1 + xAi1 2 ) Ai2 2 . . . AiN N
(1.186)
1.16 Determinants
27
is unchanged. The reason is that the extra term δ det A vanishes δ det A =
N X
x ei1 i2 ...iN Ai1 2 Ai2 2 . . . AiN N = 0
(1.187)
i1 i2 ...iN =1
because it is proportional to a sum of products of a factor ei1 i2 ...iN that is antisymmetric in i1 and i2 and a factor Ai1 2 Ai2 2 that is symmetric in these indices. For instance, when i1 and i2 are 5 & 7 and 7 & 5, the two terms cancel e57...iN A52 A72 . . . AiN N + e75...iN A72 A52 . . . AiN N = 0
(1.188)
because e57...iN = −e75...iN . By repeated additions of x2 Ai2 , x3 Ai3 , etc. to Ai1 , we can change the first column of the matrix A to a nearly arbitrary linear combination of all the columns N X Ai1 −→ Ai1 + xk Aik (1.189) k=2
without changing det A. This linear combination is not completely arbitrary because the coefficient of Ai1 remains unity. The analogous operation Ai` −→ Ai` +
N X
yk Aik
(1.190)
k=1,k6=`
replaces the `th column by a nearly arbitrary linear combination of all the columns without changing det A. The key concepts of linear dependence and independence were explained in Sec. 1.8. Suppose that the columns of an N × N matrix A are linearly dependent, so that for some coefficients yk not all zero the linear combination N X
yk Aik = 0
∀i
(1.191)
k=1
vanishes for all i (the upside-down A means for all ). Suppose y1 6= 0. Then by adding suitable linear combinations of columns 2 through N to column 1, we could make all the elements Ai1 of column 1 vanish without changing det A. But then the det A as given by (1.183) would vanish. It follows that the determinant of any matrix whose columns are linearly dependent must vanish. The converse also is true: if columns of a matrix are linearly independent, then the determinant of that matrix can not vanish. To see why, let us
28
Linear Algebra
recall, as explained in Sec. 1.8, that any linearly independent set of vectors is complete. Thus if the columns of a matrix A are linearly independent and therefore complete, some linear combination of all columns 2 through N when added to column 1 will convert column 1 into a (non-zero) multiple of the N -dimensional column vector (1, 0, 0, . . . 0), say (c1 , 0, 0, . . . 0). Similar operations will convert column 2 into a (non-zero) multiple of the column vector (0, 1, 0, . . . 0), say (0, c2 , 0, . . . 0). Continuing in this way, we may convert the matrix A to a matrix with non-zero entries along the main diagonal and zeros everywhere else. The determinant det A is then the product of the non-zero diagonal entries c1 c2 . . . cN 6= 0, and so det A can not vanish. We may extend these arguments to the rows of a matrix. The addition to row k of a linear combination of the other rows Aki −→ Aki +
N X
z` A`i
(1.192)
`=1,`6=k
does not change the value of the determinant. In this way, one may show that the determinant of a matrix vanishes if and only if its rows are linearly dependent. The reason why these results apply to the rows as well as to the columns is that the determinant of a matrix A may be defined either in terms of the columns as in definitions (1.183) & 1.185) or in terms of the rows: det A =
N X
ei1 i2 ...iN A1i1 A2i2 . . . AN iN
(1.193)
ei1 i2 ...iN Ak1 i1 Ak2 i2 . . . AkN iN .
(1.194)
i1 i2 ...iN =1
ek1 k2 ...kN det A =
N X i1 i2 ...iN =1
These and many other properties of determinants follow from a study of permutations, which are discussed in Section 10.13. Detailed proofs can be found in the book by Aitken (Aitken, 1959). By comparing the row (1.183) & 1.185) and (1.193 & 1.194) column definitions of determinants, we see that the determinant of the transpose of a matrix is the same as the determinant of the matrix itself: det AT = det A.
(1.195)
Let us return for a moment to Laplace’s expansion (1.184) for the determinant det A of an N × N matrix A as a sum of Aik Cik over the row index
29
1.16 Determinants
i with the column index k held fixed det A =
N X
Aik Cik
(1.196)
i=1
in order to prove that δk` det A =
N X
Aik Ci` .
(1.197)
i=1
For k = `, this formula just repeats Laplace’s expansion (1.196). But for k 6= `, it is Laplace’s expansion for the determinant of a matrix A0 that is the same as A but with its `th column replaced by its kth one. Since the matrix A0 has two identical columns, its determinant vanishes, which explains (1.197) for k 6= `. The rule (1.197) therefore provides a formula for the inverse of a matrix A whose determinant does not vanish. Such matrices are called nonsingular. The inverse A−1 of an N × N nonsingular matrix A is the transpose of the matrix of cofactors divided by det A A−1
`i
=
Ci` det A
or A−1 =
CT . det A
(1.198)
To verify this formula, we use it for A−1 in the product A−1 A and note that by (1.197) the `kth entry of the product A−1 A is just δ`k A
−1
A
`k
=
N X
A
i=1
−1
`i
Aik
N X Ci` = Aik = δ`k det A
(1.199)
i=1
as required. Example: Let’s apply our formula general 2 × 2 matrix a A= c
(1.198) to find the inverse of the b . d
(1.200)
We find then A
−1
1 = ad − bc
d −b −c a
which is the correct inverse. The simple example of matrix multiplication a b c 1 x y a xa + b ya + zb + c d e f 0 1 z = d xd + e yd + ze + f g h i 0 0 1 g xg + h yg + zh + i
(1.201)
(1.202)
30
Linear Algebra
shows that the operations (1.190) on columns that don’t change the value of the determinant can be written as matrix multiplication from the right by a matrix that has unity on its main diagonal and zeros below. Now imagine that A and B are N × N matrices and consider the 2N × 2N matrix product A 0 I B A AB = (1.203) −I B 0 I −I 0 in which I is the N ×N identity matrix, and 0 is the N ×N matrix of all zeros. The second matrix on the left-hand side has unity on its main diagonal and zeros below, and so it does not change the value of the determinant of the matrix to its left, which thus is equal to that of the matrix on the right-hand side: A 0 A AB det = det . (1.204) −I B −I 0 By using Laplace’s expansion (1.184) along the first column to evaluate the determinant on the left-hand side (LHS) and Laplace’s expansion (1.184) along the last row to compute the determinant on the right-hand side (RHS), one may derive the general and important rule that the determinant of the product of two matrices is the product of the determinants det A det B = det AB.
(1.205)
Example: The case in which the matrices A and B are both 2 × 2 is easy to understand. The LHS of Eq.(1.204) gives a11 a12 0 0 a21 a22 0 0 A 0 (1.206) det = det −I B −1 0 b11 b12 0 −1 b21 b22 = a11 a22 det B − a21 a12 det B = det A det B while its RHS comes to det
A AB −I 0
a11 a12 ab11 ab12 a21 a22 ab21 ab22 = det −1 0 0 0 0 −1 0 0 = (−1)C42 = (−1)(−1) det AB = det AB. (1.207)
Often one uses an absolute-value notation to denote a determinant, |A| =
1.16 Determinants
31
det A. In this more compact notation, the obvious generalization of the product rule is |ABC . . . Z| = |A||B| . . . |Z|.
(1.208)
The product rule (1.208) implies that the determinant of A−1 is the inverse of |A| since 1 = |I| = |AA−1 | = |A||A−1 |.
(1.209)
Incidentally, Gauss, Jordan, and others have developed much faster ways of computing determinants and matrix inverses than those (1.184 & 1.198) due to Laplace. Octave, Matlab, Maple, and Mathematica use these more modern techniques, which also are freely available as programs in C and fortran from www.netlib.org/lapack. Numerical Example: Adding multiples of rows to other rows does not change the value of a determinant, and interchanging two rows only changes a determinant by a minus sign. So we can use these operations, which leave determinants invariant, to make a matrix upper triangular, a form in which its determinant is just the product of the factors on its diagonal. For instance, to make the matrix 1 2 1 (1.210) A = −2 −6 3 4 2 −5 upper triangular, we add twice the first 1 2 0 −2 4 2
row to the second row 1 5 −5
and then subtract four times the first row from the third 1 2 1 0 −2 5 . 0 −6 −9 Next, we subtract three times the second row from the third 1 2 1 0 −2 5 . 0 0 −24
(1.211)
(1.212)
(1.213)
We now find as the determinant of A the product of its diagonal elements: |A| = 1(−2)(−24) = 48. The Matlab command is d = det(A).
(1.214)
32
Linear Algebra
1.17 Systems of Linear Equations Suppose we wish to solve the system of linear equations N X
Aik xk = yi
(1.215)
k=1
for the N unknowns xk . In matrix notation, with A an N × N matrix and x and y N -vectors, this system of equations is Ax = y.
(1.216)
If the matrix A is non-singular, that is, if det(A) 6= 0, then it has an inverse A−1 given by (1.198), and we may multiply both sides of (1.216) by A−1 and so find x as x = Ix = A−1 Ax = A−1 y.
(1.217)
When A is non-singular, this is the unique solution to (1.215). When A is singular, det(A) = 0, and so its columns are linearly dependent as explained in Sec. 1.16. In this case, the linear dependence of the columns of A implies that Az = 0 for some non-zero vector z, and so if x is a solution, then Ax = y implies that x + cz for all c is also a solution since A(x + cz) = Ax + cAz = Ax = y. So if det(A) = 0, then there may be solutions, but there can be no unique solution. Whether equation (1.215) has any solutions when det(A) = 0 depends on whether the vector y can be expressed as a linear combination of the columns of A. Since these columns are linearly dependent, they span a subspace of fewer than N dimensions, and so (1.215) has solutions only when the N -vector y lies in that subspace. A system of M equations N X
Aik xk = yi
for i = 1, 2, . . . , M
(1.218)
k=1
in N , more than M , unknowns is under-determined. As long as at least M of the N columns Aik of the matrix A are linearly independent, such a system always has solutions, but they are not unique. 1.18 Linear Least Squares Suppose we are confronted with a system of M equations N X k=1
Aik xk = yi
for i = 1, 2, . . . , M
(1.219)
33
1.18 Linear Least Squares
in fewer unknowns N < M . This problem is over-determined. In general, it has no solution, but it does have an approximate solution due to Carl Gauss (1777–1855). If the matrix A and the vector y are real, then Gauss’s solution is the N values xk that minimize the sum of the squares of the errors E=
M X i=1
yi −
N X
!2 Aik xk
.
The minimizing values x` make the N derivatives of E vanish ! M N X X ∂E =0= 2 yi − Aik xk (−Ai` ) ∂x` i=1
(1.220)
k=1
(1.221)
k=1
so in matrix notation AT y = AT Ax.
(1.222)
Since A is real, the matrix of the form AT A is non-negative (1.41); if it also is positive (1.42), then it has an inverse, and our least-squares solution is −1 T x = AT A A y. (1.223) If the matrix A and the vector y are complex, and if the matrix A† A is positive, then one may show (problem 16) that minimization of the sum of the squares of the absolute values of the errors gives −1 x = A† A A† y. (1.224) Example from biophysics: If the wavelength of visible light were a nanometer, microscopes would yield much sharper images. Each photon from a (single-molecule) fluorofore entering the lens of a microscope would follow ray optics and be focused within a tiny circle of about a nanometer on a detector. Instead, a photon that should arrive at x = (x1 , x2 ) arrives at yi = (y1i , y2i ) according to an approximately gaussian probability distribution 2 /(2σ 2 )
P (yi ) = c e−(yi −x)
(1.225)
in which c is a normalization constant and σ is about 150 nm. What to do? Keith Lidke and his merry band of biophysicists collect about N = 500
34
Linear Algebra
Figure 1.1 Conventional (left, fuzzy) and STORM (right, sharp) images of microtubules. The tubulin is labeled with a fluorescent anti-tubulin antibody. The white rectangles are 1 micron in length. Images courtesy of Keith Lidke.
points yi and determine the point x that maximizes the joint probability of the ensemble of image points " N # N N Y X Y N 2 2 −(yi −x)2 /(2σ 2 ) N P = P (yi ) = c (yi − x) /(2σ ) e = c exp − i=1
i=1
i=1
(1.226) by solving for k = 1 and 2 the equations " N # N X P X ∂P ∂P 2 2 − =0=P (yi − x) /(2σ ) = 2 (yik − xk ) . ∂xk ∂xk σ i=1
(1.227)
i=1
Thus this maximum likelihood estimate of the image point x is the average of the observed points yi N 1 X x= yi . N
(1.228)
i=1
Their stochastic optical reconstruction microscopy (STORM) is more complicated because they also account for the finite accuracy of their detector. Microtubules are long hollow tubes made of the protein tubulin. They are 25 nm in diameter and typically have one end attached to a centrosome. Together with actin and intermediate filaments, they form the cytoskeleton of a eukaryotic cell. Fig. 1.1 shows conventional (left, fuzzy) and STORM
1.19 The Adjoint of an Operator
35
(right, sharp) images of microtubules. The fluorophore attaches at a random point on an anti-tubulin antibody of finite size, which binds to the tubulin of a microtubule. This spatial uncertainty and the motion of the molecules of living cells limits the improvement in resolution is by a factor of 10 to 20.
1.19 The Adjoint of an Operator The adjoint
A†
of a linear operator A is defined by (g, A† f ) = (A g, f ) = (f, A g)∗ .
(1.229)
Equivalent expressions in Dirac notation are hg|A† f i = hg|A† |f i = hAg|f i = hf |Agi∗ = hf |A|gi∗ .
(1.230)
So if the vectors {ei } are orthonormal and complete in a space S, then with f = ej and g = ei , the definition (1.229) or (1.230) of the adjoint A† of a linear operator A implies hei |A† |ej i = hej |A|ei i∗
(1.231)
or
A†
ij
= (Aji )∗ = A∗ji
(1.232)
in agreement with our definiton (1.31) of the adjoint of a matrix as the transpose of its complex conjugate A† = A∗ T . Since both (A∗ )∗ = A and (AT )T = A, it follows that † ∗ A† = A∗ T T = A
(1.233)
(1.234)
so the adjoint of an adjoint is the original operator. By applying this rule (1.234) to the definition (1.229) of the adjoint, we find the related rule (g, Af ) = (g, A†† f ) = (A† g, f ).
(1.235)
1.20 Self-Adjoint or Hermitian Linear Operators An operator A that is equal to its adjoint A† = A
(1.236)
36
Linear Algebra
is self adjoint or hermitian. In view of definition (1.229), a self-adjoint linear operator A satisfies (g, A f ) = (A g, f ) = (f, A g)∗
(1.237)
hg| A |f i = hA g|f i = hf |A gi∗ = hf | A |gi∗ .
(1.238)
or equivalently
By Eq.(1.232), a hermitian operator A that acts on a finite-dimensional vector space is represented in an orthonormal basis by a matrix that is equal to the transpose of its complex conjugate = (Aji )∗ = A∗ji . (1.239) Aij = (A)ij = A† ij
Such matrices are said to be hermitian. Conversely, a linear operator that is represented by a hermitian matrix in an orthonormal basis is self adjoint (problem 17). A matrix Aij that is real and symmetric or imaginary and antisymmetric is hermitian. But a self-adjoint linear operator A that is represented by a matrix Aij that is real and symmetric (or imaginary and anti-symmetric) in one orthonormal basis will not in general be represented by a matrix that is real and symmetric (or imaginary and anti-symmetric) in a different orthonormal basis, but it will be represented by a hermitian matrix in every orthonormal basis. As we’ll see in section (1.30), hermitian matrices have real eigenvalues and complete sets of orthonormal eigenvectors. Hermitian operators and matrices represent physical variables in quantum mechanics. 1.21 Real, Symmetric Linear Operators In quantum mechanics, we usually consider complex vector spaces, that is, spaces in which the vectors |f i are complex linear combinations |f i =
N X
zi |ii
(1.240)
i=1
of complex orthonormal basis vectors |ii. But real vector spaces also are of interest. A real vector space is a vector space in which the vectors |f i are real linear combinations |f i =
N X n=1
xn |ni
(1.241)
37
1.22 Unitary Operators
of real orthonormal basis vectors, x∗n = xn and |ni∗ = |ni. A real linear operator A on a real vector space A=
N X
|nihn|A|mihm| =
n,m=1
N X
|niAnm hm|
(1.242)
n,m=1
is represented by a real matrix A∗nm = Anm . A real linear operator A that is self adjoint on a real vector space satisfies the condition (1.237) of hermiticity but with the understanding that complex conjugation has no effect (g, A f ) = (A g, f ) = (f, A g)∗ = (f, A g).
(1.243)
Thus, its matrix elements are symmetric: hg|A|f i = hf |A|gi. Since A is hermitian as well as real, the matrix Anm that represents it (in a real basis) is real and hermitian, and so is symmetric Anm = A∗mn = Amn .
(1.244)
1.22 Unitary Operators A unitary operator U is one whose adjoint is its inverse U U † = U † U = I.
(1.245)
In general, the unitary operators we’ll consider also are linear, that is U (z|ψi + w|φi) = zU |ψi + wU |φi
(1.246)
for all states or vectors |ψi and |φi and all complex numbers z and w. In standard notation, U † U = I implies that for any vectors f and g (g, f ) = (g, U † U f ) = (U g, U f )
(1.247)
(g, f ) = (g, U U † f ) = (U † g, U † f ).
(1.248)
as well as
In Dirac notation, these equations are hg|f i = hg|U † U |f i = hU g|U |f i = hU g|U f i
(1.249)
hg|f i = hg|U U † |f i = hU † g|U † |f i = hU † g|U † f i.
(1.250)
and
Suppose the states {|ni} form an orthonormal basis for a given vector space. Then if U is any unitary operator, the relations (1.247–1.250) show
38
Linear Algebra
that the states {U |ni} also form an orthonormal basis. The orthonormality of the image states {U |ni} follows from that of the basis states {|ni} δnm = hn|mi = hU n|U mi = hn|U † U |mi.
(1.251)
The completeness relation for the basis states {|ni} is that the sum of their dyadics is the identity operator X |nihn| = I (1.252) n
and it implies that the images states {U |ni} also are complete X U |nihn|U † = U IU † = U U † = I.
(1.253)
n
So a unitary matrix U maps an orthonormal basis into another orthonormal basis. In fact, any linear map from one orthonormal basis {|φn i} to another {|ψn i} must be unitary. Such an operator will be of the form U=
N X
|ψn ihφn |
(1.254)
n=1
with hφn |φm i = δnm
and hψn |ψm i = δnm .
(1.255)
The unitarity of such a sum is evident: U U† =
=
N X n=1 N X
|ψn ihφn |
N X
|φm ihψm |
m=1 N X
n=1 m=1
|ψn i δnm hψm | =
N X
|ψn ihψn | = I.
(1.256)
n=1
The product U † U similarly collapses to unity. Unitary matrices have unimodular determinants. To show this, we use the definition (1.245), that is, U U † = I, and the product rule for determinants (1.208) to write 1 = |I| = |U U † | = |U ||U † | = |U ||U T |∗ = |U ||U |∗ .
(1.257)
A unitary matrix that is real is said to be orthogonal. An orthogonal matrix O satisfies OOT = OT O = I.
(1.258)
1.23 Antiunitary, Antilinear Operators
39
1.23 Antiunitary, Antilinear Operators Certain maps on states ψ → ψ 0 , such as those involving time reversal, are implemented by operators K that are antilinear K (zψ + wφ) = K (z|ψi + w|φi) = z ∗ K|ψi + w∗ K|φi = z ∗ Kψ + w∗ Kφ (1.259) and antiunitary (Kφ, Kψ) = hKφ|Kψi = (φ, ψ)∗ = hφ|ψi∗ = hψ|φi = (ψ, φ) .
(1.260)
Don’t feel bad if you find such operators spooky. I do too.
1.24 Symmetry in Quantum Mechanics In quantum mechanics, a symmetry is a map of states f → f 0 that preserves their inner products |hφ0 |ψ 0 i|2 = |hφ|ψi|2
(1.261)
and so their predicted probabilities. The inner products of the primed and unprimed vectors are the same. Eugene Wigner (1902–1995) has shown that every symmetry in quantum mechanics can be represented either by an operator U that is linear and unitary or by an operator K that is anti-linear and anti-unitary. The antilinear, anti-unitary case seems to occur only when the symmetry involves time-reversal; most symmetries are represented by operators U that are linear and unitary. So unitary operators are of great importance in quantum mechanics. They are used to represent rotations, translations, Lorentz transformations, internal-symmetry transformations — just about all symmetries not involving time-reversal.
1.25 Lagrange Multipliers The maxima and minima of a function f (x) of several variables x1 , x2 , . . . , xn are among the points at which its gradient vanishes ∇f (x) = 0.
(1.262)
These are the stationary points of f . Example 1.2 (Minimum) minimum is at
For instance, if f (x) = x21 + 2x22 + 3x23 , then its
∇f (x) = (2x1 , 4x2 , 6x3 ) = 0
(1.263)
40
Linear Algebra
that is, at x1 = x2 = x3 = 0. But how do we find the extrema of f (x) if x must satisfy k constraints c1 (x) = 0, c2 (x) = 0, . . . , ck (x) = 0? We use Lagrange multipliers (JosephLouis Lagrange, 1736–1813). In the case of one constraint c(x) = 0, we no longer expect the gradient ∇f (x) to vanish, but its projection ∇f (x)·dx must vanish in those directions dx that preserve the constraint. So ∇f (x)·dx = 0 for all dx that make ∇c(x)· dx = 0. This means that ∇f (x) and ∇c(x) must be parallel. Thus, the extrema of f (x) subject to the constraint c(x) = 0 satisfy the two equations ∇f (x) = λ ∇c(x)
and c(x) = 0.
(1.264)
These equations define the extrema of the unconstrained function L(x, λ) = f (x) − λ c(x)
(1.265)
of the n + 1 variables x, . . . , xn , λ ∂L(x, λ) = − c(x) = 0. (1.266) ∂λ The extra variable λ is a Lagrange multiplier. In the case of k constraints c1 (x) = 0, . . . , ck (x) = 0, the projection ∇f · dx must vanish in those directions dx that preserve all the constraints. So ∇f (x) · dx = 0 for all dx that make all ∇cj (x) · dx = 0 for j = 1, . . . , k. The gradient ∇f will satisfy this requirement if it’s a linear combination ∇L(x, λ) = ∇f (x) − λ ∇c(x) = 0
and
∇f = λ1 ∇c1 + · · · + λk ∇ck
(1.267)
of the k gradients because then ∇f · dx will vanish if ∇cj · dx = 0 for j = 1, . . . , k. The extrema also must satisfy the constraints c1 (x) = 0, . . . , ck (x) = 0.
(1.268)
Equations (1.267 & 1.268) define the extrema of the unconstrained function L(x, λ) = f (x) − λ1 c1 (x) + . . . λk ck (x)
(1.269)
of the n + k variables x and λ ∇L(x, λ) = ∇f (x) − λ ∇c1 (x) − · · · − λ ∇ck (x) = 0
(1.270)
and ∂L(x, λ) = − cj (x) = 0 j = 1, . . . , k. ∂λj
(1.271)
1.26 Eigenvectors and Invariant Subspaces
41
Example 1.3 (Constrained Extrema and Eigenvectors) Suppose we want to find the extrema of a real, symmetric quadratic form f (x) = xT A x subject to the constraint c(x) = x · x − 1 which says that the vector x is of unit length. We form the function L(x, λ) = xT A x − λ (x · x − 1)
(1.272)
and since the matrix A is real and symmetric, we find its unconstrained extrema as ∇L(x, λ) = 2A x − 2λ x = 0
and x · x = 1.
(1.273)
The extrema of f (x) = xT A x subject to the constraint c(x) = x · x − 1 are the normalized eigenvectors A x = λ x and x · x = 1.
(1.274)
of the real, symmetric matrix A.
1.26 Eigenvectors and Invariant Subspaces Let A be a linear operator that maps vectors |vi in a vector space S into vectors in the same space. If T ⊆ S is a subspace of S, and if the vector A|ui is in T whenever |ui is in T , then T is an invariant subspace of S. The whole space S is a trivial invariant subspace of S, as is the null set ∅. If T ⊆ S is a one-dimensional invariant subspace of S, then A maps each vector |ui ∈ T into another vector λ|ui ∈ T , that is A|ui = λ|ui.
(1.275)
In this case, we say that |ui is an eigenvector of A with eigenvalue λ. (The German adjective eigen means own, proper, singular.) Example: The matrix equation cos θ sin θ 1 1 ±iθ =e (1.276) − sin θ cos θ ±i ±i tells us that the eigenvectors of this 2 × 2 orthogonal matrix are the 2-tuples (1, ±i) with eigenvalues e±iθ . Problem 18 is to show that the eigenvalues λ of a unitary (and hence of an orthogonal) matrix are unimodular, |λ| = 1.
42
Linear Algebra
Example: Let us consider the eigenvector equation N X
Aik Vk = λVi
(1.277)
k=1
for a matrix A that is anti-symmetric Aik = −Aki . The anti-symmetry of A implies that N X
Vi Aik Vk = 0.
(1.278)
i,k=1
Thus the last two relations imply that 0=
N X
Vi Aik Vk = λ
N X
Vi2 = 0.
(1.279)
i=1
i,k=1
Thus either the eigenvalue λ or the dot-product of the eigenvector with itself vanishes. Problem 19 is to show that the sum of the eigenvalues of an anti-symmetric matrix vanishes.
1.27 Eigenvalues of a Square Matrix Let A be an N × N matrix with complex entries Aik . A non-zero N dimensional vector V with entries Vk is an eigenvector of the matrix A with eigenvalue λ if A|V i = λ|V i ⇐⇒ AV = λ V ⇐⇒
N X
Aik Vk = λ Vi .
(1.280)
k=1
Every N × N matrix A has N eigenvectors V (`) and eigenvalues λ` AV (`) = λ` V (`)
(1.281)
for ` = 1 . . . N . To see why, we write the top equation (1.280) as N X
(Aik − λ δik ) Vk = 0
(1.282)
(A − λ I) V = 0
(1.283)
k=1
or in matrix notation as
in which I is the N × N matrix with entries Iik = δik . These equivalent
43
1.27 Eigenvalues of a Square Matrix
equations (1.282 & 1.283) say that the columns of the matrix A − λI, considered as vectors, are linearly dependent, as defined in section 1.8. We saw in section 1.16 that the columns of a matrix, A − λI, are linearly dependent if and only if the determinant |A − λI| vanishes. Thus a non-zero solution of the eigenvalue equation (1.280) exists if and only if the determinant det (A − λI) = |A − λI| = 0
(1.284)
vanishes. This requirement that the determinant of A − λI vanish is called the characteristic equation. For an N × N matrix A, it is a polynomial equation of the N th degree in the unknown eigenvalue λ |A − λI| ≡ P (λ, A) = |A| + · · · + (−1)N −1 λN −1 TrA + (−1)N λN =
N X
pk λk = 0
(1.285)
k=0
in which p0 = |A|, pN −1 = (−1)N −1 TrA, and pN = (−1)N . (All the pk ’s are basis independent.) By the fundamental theorem of algebra, proved in Sec. 5.9, the characteristic equation always has N roots or solutions λ` lying somewhere in the complex plane. Thus, the characteristic polynomial has the factored form P (λ, A) = (λ1 − λ)(λ2 − λ) . . . (λN − λ).
(1.286)
For every root λ` , there is a non-zero eigenvector V (`) whose components (`) Vk are the coefficients that make the N vectors Aik − λ` δik that are the columns of the matrix A − λ` I sum to zero in (1.282). Thus, every N × N matrix has N eigenvalues λ` and N eigenvectors V (`) . Setting λ = 0 in the factored form (1.286) of P (λ, A) and in the characteristic equation (1.285), we see that the determinant of every N × N matrix is the product of its N eigenvalues P (0, A) = |A| = p0 = λ1 λ2 . . . λN .
(1.287)
These N roots usually are all different, and when they are, the eigenvectors V are linearly independent. This result is trivially true for N = 1. Let’s assume its validity for N − 1 and deduce it for the case of N eigenvectors. If it were false for N eigenvectors, then there would be N numbers c` , not all zero, such that (`)
N X `=1
c` V (`) = 0.
(1.288)
44
Linear Algebra
We now multiply this equation from the left by the linear operator A and use the eigenvalue equation (1.281) A
N X `=1
c` V
(`)
=
N X
c` A V
`=1
(`)
=
N X
c` λ` V (`) = 0.
(1.289)
`=1
On the other hand, the product of equation (1.288) multiplied by λN is N X
c` λN V (`) = 0.
(1.290)
`=1
When we subtract (1.290) from (1.289), the terms with ` = N cancel leaving N −1 X
c` (λ` − λN ) V (`) = 0
(1.291)
`=1
in which all the factors (λ` − λN ) are different from zero since by assumption all the eigenvalues are different. But this last equation says that N −1 eigenvectors with different eigenvalues are linearly dependent, which contradicts our assumption that the result holds for N − 1 eigenvectors. This contradiction tells us that if the N eigenvectors of an N × N square matrix have different eigenvalues, then they are linearly independent. An eigenvalue λ` that is a single root of the characteristic equation (1.285) is associated with a single eigenvector; it is called a simple eigenvalue. An eigenvalue λ` that is an nth root of the characteristic equation is associated with n eigenvectors; it is said to be an n-fold degenerate eigenvalue or to have algebraic multiplicity n. Its geometric multiplicity is the number n0 ≤ n of linearly independent eigenvectors with eigenvalue λ` . A matrix whose eigenvectors are linearly dependent is said to be defective. Example: The 2 × 2 matrix 0 1 (1.292) 0 0 has only one linearly independent eigenvector (1, 0)T and so is defective. Suppose A is an N × N matrix that is not defective. We may use its N linearly independent eigenvectors V (`) = |`i to define the columns of an N × N matrix S as Sk` = hk, 0|`i
(1.293)
in which the vectors |k, 0i are the basis in which Aik = hi, 0|A|k, 0i. The
45
1.28 A Matrix Obeys Its Characteristic Equation
inner product of the eigenvalue equation AV (`) = λ` V (`) with the bra hi, 0| is hi, 0|A|`i = hi, 0|A
N X
|k, 0ihk, 0|`i =
k=1
N X
Aik Sk` = λ` Si` .
(1.294)
k=1
Since the columns of S are linearly independent, the determinant of S does not vanish—the matrix S is nonsingular—and so its inverse S −1 is welldefined by (1.198). It follows that N X
S −1
A S = ni ik k`
N X
λ` S −1
ni
Si` = an δn` = λ`
(1.295)
i=1
i,k=1
or in matrix notation S −1 AS = A(d)
(1.296)
in which A(d) is the diagonal form of the matrix A with its eigenvalues λ` arranged along its main diagonal and zeros elsewhere. Equation (1.296) is a similarity transformation. Any nondefective square matrix can be diagonalized by a similarity transformation A = SA(d) S −1 .
(1.297)
By using the product rule (1.208), we see that the determinant of any nondefective square matrix is the product of its eigenvalues |A| = |SA(d) S −1 | = |S| |A(d) | |S −1 | = |SS −1 | |A(d) | = |A(d) | =
N Y
λ`
`=1
(1.298) which is a special case of (1.287).
1.28 A Matrix Obeys Its Characteristic Equation Every square matrix obeys its characteristic equation (1.285). That is, the characteristic equation P (λ, A) = |A − λI| =
N X
pk λk = 0
(1.299)
k=0
remains true when the matrix A replaces the unknown variable λ P (A, A) =
N X k=0
pk Ak = 0.
(1.300)
46
Linear Algebra
To see why, we recall the formula (1.198) for the inverse of the matrix A − λI C(λ, A)T (1.301) (A − λI)−1 = |A − λI| in which C(λ, A)T is the transpose of the matrix of cofactors of the matrix A − λI. Since the determinant |A − λI| is the characteristic polynomial P (λ, A), we have rearranging (A − λI)C(λ, A)T = P (λ, A)I.
(1.302)
The transpose of the matrix of cofactors of the matrix A−λI is a polynomial in λ with matrix coefficients C(λ, A)T = C0 + C1 λ + · · · + CN −1 λN −1 .
(1.303)
The LHS of equation (1.302) is then (A − λI)C(λ, A)T = AC0 + (AC1 − C0 )λ + (AC2 − C1 )λ2 + . . . + (ACN −1 − CN −2 )λN −1 − CN −1 λN .
(1.304)
Equating equal powers of λ on both sides of (1.302), we have using (1.299) and (1.304), we have AC0 = p0 I AC1 − C0 = p1 I AC2 − C1 = p2 I ... = ...
(1.305)
ACN −1 − CN −2 = pN −1 I −CN −1 = pN I. We now multiply on the left the first of these equations by I, the second by A, the third by A2 , . . . , and the last by AN and then add the resulting equations. All the terms on the left-hand sides cancel, while the sum of those on the right give P (A, A). Thus the matrix A obeys its characteristic equation 0=
N X
pk Ak = |A| I +p1 A+· · ·+(−1)N −1 (TrA) AN −1 +(−1)N AN (1.306)
k=0
a result known as the Cayley-Hamilton theorem (Arthur Cayley, 1821– 1895, and William Hamilton, 1805–1865). This derivation is due to Israel Gelfand (1913–2009) (Gelfand, 1961, pp. 89–90).
1.29 Functions of Matrices
47
Because every N × N matrix A obeys its characteristic equation, its N th power AN can be expressed as a linear combination of its lesser powers AN = (−1)N −1 |A| I + p1 A + p2 A2 + · · · + (−1)N −1 (TrA) AN −1 . (1.307) 2 Thus the square A of every 2 × 2 matrix is given by A2 = −|A|I + (TrA)A.
(1.308)
Example 1.4 (Spin-one-half rotation matrix) If θ is a real 3-vector and σ is the 3-vector of Pauli matrices (1.35), then the square of the traceless 2 × 2 matrix A = θ · σ is θ3 θ1 − iθ2 2 (θ · σ) = − I = θ2 I (1.309) θ1 + iθ2 −θ3 in which θ2 = θ · θ. One may use this identity to show (problem (20)) that exp (−iθ · σ/2) = cos(θ/2) − iθˆ · σ sin(θ/2)
(1.310)
in which θˆ is a unit 3-vector. This matrix represents a right-handed rotation of θ radians about the axis θˆ for a spin-one-half object. 1.29 Functions of Matrices What sense can we make of a function f of an N × N matrix A? and how would we compute it? One way is to use the characteristic equation (1.307) to express every power of A in terms of I, A, . . . , AN −1 and the coefficients p0 = |A|, p1 , p2 , . . . , pN −2 , and pN −1 = (−1)N −1 TrA. Then if f (x) is a polynomial or a function with a convergent power series f (x) =
∞ X
ck xk
(1.311)
k=0
in principle we may express f (A) in terms of N functions fk (p) of the coefficients p ≡ (p0 , . . . , pN −1 ) as f (A) =
N −1 X
fk (p) Ak .
(1.312)
k=0
The identity (1.310) for exp (−iθ · σ/2) is an example of this technique for N = 2. which can become challenging for N > 3. Example: In problem (21), one finds the characteristic equation (1.306) for the 3×3 matrix −iθ · J in which the generators are (Jk )ij = iikj and ijk is totally antisymmetric with 123 = 1. These generators satisfy the
48
Linear Algebra
commutation relations [Ji , Jj ] = iijk Jk in which sums over repeated indices from 1 to 3 are understood. In problem (22), one uses this characteristic equation to show that the 3×3 real orthogonal matrix exp(−iθ · J ), which ˆ is represents a right-handed rotation by θ radians about the axis θ, ˆ θ) ˆT exp(−iθ · J ) = cos θ I − iθˆ · J sin θ + (1 − cos θ) θ(
(1.313)
or in terms of indices exp(−iθ · J )ij = δij cos θ − sin θ ijk θˆk + (1 − cos θ) θˆi θˆj .
(1.314)
Direct use of the characteristic equation can become unwieldy for larger values of N . Fortunately, another trick is available if A is a non-defective square matrix, and if the power series (1.311) for f (x) converges. For then A is related to its diagonal form A(d) by a similarity transformation (1.297), and we may define f (A) as f (A) = Sf (A(d) )S −1 in which f (A(d) ) is the diagonal matrix with f (a1 ) 0 0 f (a2 ) f (A(d) ) = . .. . . . 0
0
(1.315)
entries f (a` ) 0 ... 0 ... .. .. . . ...
(1.316)
f (aN )
the a` ’s being the eigenvalues of the matrix A. This definition makes sense because we’d expect f (A) to be f (A) =
∞ X n=0
n
cn A =
∞ X
n cn SA(d) S −1 .
(1.317)
n=0
n n But since S −1 S = I, we have SA(d) S −1 = S A(d) S −1 and so "∞ # n X (d) f (A) = S cn A S −1 = Sf (A(d) )S −1 (1.318) n=0
which is (1.315). Example: In quantum mechanics, the time-evolution operator is taken to be the exponential exp(−iHt/~) in which H = H † is a hermitian linear operator, the hamiltonian, named after William Rowan Hamilton (1805– 1865), and ~ = h/(2π) ≈ 10−34 Js where h is the constant named after Max Planck (1858–1947). As we’ll see in the next section, hermitian operators
1.29 Functions of Matrices
49
are never defective, so H can be diagonalized by a similarity transformation H = SH (d) S −1 .
(1.319)
The diagonal elements of the diagonal matrix H (d) are the energies E` of the states of the system described by the hamiltonian H. The time-evolution operator U (t) then is U (t) = S exp(−iH (d) t/~) S −1 .
(1.320)
If the system has three states with angular frequencies ωi = Ei /~, then U (t) is −iω1 t e 0 0 S −1 U (t) = S 0 (1.321) e−iω2 t −iω t 3 0 0 e in which the angular frequencies are ω` = E` /~. Example: For a system described by the density operator ρ, the entropy S is defined as the trace S = −k Tr (ρ ln ρ)
(1.322)
in which k = 1.38 × 10−23 J/K is the constant named after Ludwig Boltzmann (1844–1906). The density operator ρ is hermitian, non-negative, and of unit trace. Since ρ is hermitian, the matrix that represents it is never defective, and so that matrix can be diagonalized by a similarity transformation ρ = S ρ(d) S −1 .
(1.323)
Thus since the trace is cyclic (1.27), we may compute the entropy as S = −kTr S ρ(d) S −1 S ln(ρ(d) ) S −1 = −kTr ρ(d) ln(ρ(d) ) . (1.324) (d)
A vanishing eigenvalue ρk = 0 contributes nothing to this trace since limx→0 x ln x = 0. If the system has three states, populated with probabilities ρi , the elements of ρ(d) , then the entropy is S = −k (ρ1 ln ρ1 + ρ2 ln ρ2 + ρ3 ln ρ3 ) = k [ρ1 ln (1/ρ1 ) + ρ2 ln (1/ρ2 ) + ρ3 ln (1/ρ3 )] .
(1.325)
50
Linear Algebra
1.30 Hermitian Matrices Hermitian matrices have especially nice properties. By definition (1.33), a hermitian matrix A is square and unchanged by hermitian conjugation A† = A. Since it is square, the results of section 1.27 ensure that an N × N hermitian matrix A has N eigenvectors |ni with eigenvalues an A|ni = an |ni.
(1.326)
In fact, these eigenvalues are all real. To see why, we form the adjoint of equation (1.326) hn|A† = a∗n hn|
(1.327)
and use the property A† = A to find hn|A† = hn|A = a∗n hn|.
(1.328)
We now form the inner product of both sides of this equation with the ket |ni and use the eigenvalue equation (1.326) to get hn|A|ni = an hn|ni = a∗n hn|ni
(1.329)
which tells us that the eigenvalues are real a∗n = an .
(1.330)
Since A† = A, the matrix elements of A between two of its eigenvectors satisfy a∗m hm|ni = (am hn|mi)∗ = hn|A|mi∗ = hm|A† |ni = hm|A|ni = an hm|ni (1.331) which implies that (a∗m − an ) hm|ni = 0.
(1.332)
But since the all the eigenvalues of the hermitian matrix A are real, we have (am − an ) hm|ni = 0.
(1.333)
This equation tells us that when the eigenvalues are different, then the eigenvectors are orthogonal. In the absence of a symmetry, all n eigenvalues usually are different, and so the eigenvectors usually are mutually orthogonal. When two or more eigenvectors |nα i of a hermitian matrix have the same eigenvalue an , their eigenvalues are said to be degenerate. In this case, any
1.30 Hermitian Matrices
51
linear combination of the degenerate eigenvectors will also be an eigenvector with the same eigenvalue an ! ! X X A cα |nα i = an cα |nα i (1.334) α∈D
α∈D
where D is the set of labels α of the eigenvectors with the same eigenvalue. If the degenerate eigenvectors |nα i are linearly independent, then we may use the Gramm-Schmidt procedure (1.110–1.122) to choose the coefficients cα so as to construct degenerate eigenvectors that are normalized and orthogonal to each other and to the non-degenerate eigenvectors. We then may normalize these mutually orthogonal eigenvectors. But two related questions arise: Are the degenerate eigenvectors |nα i linearly independent? And if so, what orthonormal linear combinations of them should we choose for a given physical problem? Let’s consider the second question first. We saw in Sec. 1.22 that unitary transformations preserve the orthonormality of a basis. Any unitary transformation that commutes with the matrix A [A, U ] = 0
(1.335)
maps each set of orthonormal degenerate eigenvectors of A into another set of orthonormal degenerate eigenvectors of A with the same eigenvalue because AU |nα i = U A|nα i = an U |nα i.
(1.336)
So there’s a huge spectrum of choices for the orthonormal degenerate eigenvectors of A with the same eigenvalue. What is the right set for a given physical problem? A sensible way to proceed is to add to the matrix A a second hermitian matrix B multiplied by a tiny, real scale factor A() = A + B.
(1.337)
The matrix B must completely break whatever symmetry led to the degeneracy in the eigenvalues of A. Ideally, the matrix B should be one that represents a modification of A that is physically plausible and relevant to the problem at hand. The hermitian matrix A() then will have N different eigenvalues an () and N orthonormal non-degenerate eigenvectors A()|nβ , i = anβ ()|nβ , i.
(1.338)
52
Linear Algebra
These eigenvectors |nβ i of A() are orthogonal to each other hnβ , |nβ 0 i = δβ,β 0
(1.339)
and to the eigenvectors of A() with other eigenvalues, and they remain so as we take the limit |ni = lim |n, i. →0
(1.340)
We may choose them as the orthogonal degenerate eigenvectors of A. Since one always may find a crooked hermitian matrix B that breaks any particular symmetry, it follows that every N × N hermitian matrix A possesses N orthonormal eigenvectors, which are complete in the vector space in which A acts. (Any N linearly independent vectors span their N -dimensional vector space, as explained in section 1.9.) Now let’s return to the first question and show that an N × N hermitian matrix has N orthogonal eigenvectors. To do this, we’ll first show that the space of vectors orthogonal to an eigenvector |ni of a hermitian operator A A|ni = λ|ni
(1.341)
is invariant under the action of A. We must show that if |yi is any vector orthogonal to the eigenvector |ni hn|yi = 0
(1.342)
then A|yi also is orthogonal to |ni, that is, hn|A|yi = 0. We use successively the definition of A† , the hermiticity of A, the eigenvector equation (1.341), the definition of the inner product, and the reality of the eigenvalues of a hermitian matrix: ¯ hn|A|yi = hA† n|yi = hAn|yi = hλn|yi = λhn|yi = λhn|yi = 0.
(1.343)
Thus the space of vectors orthogonal to an eigenvector of a hermitian operator is invariant under it. Now a hermitian operator A acting on an N -dimensional vector space S is represented by an N × N hermitian matrix, and so it has at least one eigenvector |1i. The subspace of S consisting of all vectors orthogonal to |1i is an (N − 1)-dimensional vector space SN −1 that is invariant under the action of A. On this space SN −1 , the operator A is represented by an (N − 1)×(N −1) hermitian matrix AN −1 . This matrix has at least one eigenvector |2i. The subspace of SN −1 consisting of all vectors orthogonal to |2i is an (N − 2)-dimensional vector space SN −2 that is invariant under the action of A. On SN −2 , the operator A is represented by an (N −2)×(N −2) hermitian matrix AN −2 which has at least one eigenvector |3i. By construction, the
1.30 Hermitian Matrices
53
vectors |1i, |2i, and |3i are mutually orthogonal. Continuing in this way, we see that A has N orthogonal eigenvectors |ki for k = 1, 2, . . . , N . The N orthogonal eigenvectors |ki of an N × N matrix A can be normalized and used to write the N × N identity operator I as I=
N X
|kihk|.
(1.344)
k=1
On multiplying from the left by the matrix A, we find A = AI = A
N X
|kihk| =
k=1
N X
ak |kihk|
(1.345)
k=1
which is the diagonal form of the hermitian matrix A. This expansion of A as a sum over outer products of its eigenstates multiplied by their eigenvalues is important in quantum mechanics. The expansion represents the possible selective, non-destructive measurements of the physical quantity represented by the matrix A. The hermitian matrix A is diagonal in the basis provided by its eigenstates |ki Akj = hk|A|ji = ak δkj .
(1.346)
But in any other basis |`, oi, the matrix A appears as Ai` = hi, o|A|`, oi =
N X hi, o|kiak hk|`, oi.
(1.347)
k=1
The linear operator U=
N X
|kihk, o|
(1.348)
k=1
is unitary because it maps the arbitrary orthonormal basis |k, oi into the orthonormal basis of eigenstates |ki. In the |k, oi basis, U is the matrix whose nth column is the N -tuple hi, o|ni that represents |ni in the basis |i, oi Uin = hi, o|U |n, oi = hi, o|ni.
(1.349)
So equation (1.347) tells us that an arbitrary N × N hermitian matrix A can be diagonalized by a unitary transformation A = U A(d) U † . (d)
Here A(d) is the diagonal matrix Anm = am δnm .
(1.350)
54
Linear Algebra
A matrix that is real and symmetric is hermitian; so is one that is imaginary and antisymmetric. A real, symmetric matrix R can be diagonalized by an orthogonal transformation R = O R(d) OT
(1.351)
in which the matrix O is a real unitary matrix, that is, an orthogonal matrix (1.258). Example: Suppose we wish to find the eigenvalues of the real, symmetric mass matrix 0 m M= (1.352) m M in which m is an ordinary mass and M is a huge mass. The eigenvalues µ of this hermitian mass matrix satisfy the equation det (M − µI) = µ(µ − M ) − m2 = 0
(1.353)
with solutions p 1 M ± M 2 + 4m2 . 2 The larger mass µ+ is approximately the huge mass M µ± =
µ+ ≈ M +
m2 M
(1.354)
(1.355)
and the smaller mass µ− is very tiny m2 . (1.356) M The physical mass of a fermion is the absolute value of its mass parameter, so the negative value of this tiny mass is not a problem. The product of the two eigenvalues is exactly the constant µ− ≈ −
µ+ µ− = det M = −m2
(1.357)
so as µ− goes down, µ+ must go up. In 1975, Murray Gell-Mann and Jerry Stephenson invented this “see-saw” mechanism as an explanation of why neutrinos have such small masses, less than 1 eV/c2 . If mc2 = 10 MeV, and µ− c2 ≈ 0.01 eV, which is a plausible light-neutrino mass, then the rest energy of the huge mass would be M c2 = 107 GeV. This huge mass would come from new physics, beyond the standard model. But if we knew how to combine relativity and quantum mechanics without the silly infinities of quantum field theory, then we might try to explain
55
1.31 Normal Matrices
the small masses of the neutrinos by the weakness of their interactions with themselves and with other particles without any reference to an energy scale three orders of magnitude beyond LHC energies. If we return to the orthogonal transformation (1.351) q and multiply column
` of the matrix O and row ` of the matrix OT by the congruency transformation
(d)
|R` |, then we arrive at
ˆ (d) C T R=CR
(1.358)
ˆ (d) are either ±1 or 0 because the matrices in which the diagonal entries R ` (d) C and C T have absorbed the moduli |R` |. Some call this result Sylvester’s theorem. Example: If G is a real, symmetric 4 × 4 matrix then there’s a real 4 × 4 matrix D = C T−1 such that g1 0 0 0 0 g2 0 0 Gd = DT GD = (1.359) 0 0 g3 0 0
0
0
g4
in which the diagonal entries gi are all ±1 or 0. In particular, there’s a real 4 × 4 matrix D that casts the non-singular metric gik of spacetime into the form −1 0 0 0 0 1 0 0 gd = DT gD = (1.360) 0 0 1 0 0 0 0 1 at any particular point of spacetime. (In general, one needs different Ds at different points.)
1.31 Normal Matrices The largest set of matrices that can be diagonalized by a unitary transformation is the set of normal matrices. These are square matrices that commute with their (hermitian) adjoints [A, A† ] = AA† − A† A = 0.
(1.361)
This broad class of matrices includes not only all hermitian matrices but also all unitary ones since [U, U † ] = U U † − U † U = I − I = 0.
(1.362)
56
Linear Algebra
To see why a normal matrix can be diagonalized by a unitary transformation, let us consider an N × N normal matrix V which (since it is square (section 1.27)) has N eigenvectors |ni with eigenvalues vn (V − vn I) |ni = 0.
(1.363)
The square of the norm (1.81) of this vector must vanish k (V − vn I) |ni k2 = hn| (V − vn I)† (V − vn I) |ni = 0.
(1.364)
But since V is normal, we also have hn| (V − vn I)† (V − vn I) |ni = hn| (V − vn I) (V − vn I)† |ni (1.365) = hn| (V − vn I) V † − vn∗ I |ni = 0. So the square of the norm (1.81) of the vector V † − vn∗ I |ni also must vanish hn| (V − vn I)† V † − vn∗ I |ni =k V † − vn∗ I |ni k2 = 0 (1.366) which tells us that |ni also is an eigenvector of V † with eigenvalue vn∗ V † |ni = vn∗ |ni.
(1.367)
Suppose now that |ni and |mi are eigenvectors of V with eigenvalues vn and vm so that (V − vn ) |ni = (V − vm ) |mi = 0.
(1.368)
hm|V |ni = vn hm|ni
(1.369)
Then ∗ |mi, and the adjoint of this equation is But by (1.367), we have V † |mi = vm
hm|V = vm hm|
(1.370)
hm|V |ni = vm hm|ni.
(1.371)
from which we may infer that
Thus by combining (1.369 & 1.370), we see that (vn − vm ) hm|ni = 0.
(1.372)
Thus any two eigenvectors of a normal matrix V with different eigenvalues must be orthogonal. Usually, all N eigenvalues of an N ×N normal matrix are different. In this case, all the eigenvectors are orthogonal and may be individually normalized.
1.32 Normal-Matrix Identities
57
But even when a set D of eigenvectors has the same (degenerate) eigenvalue, one may use the argument of Eqs.(1.334–1.340) to pick a suitable set of orthonormal eigenvectors with that eigenvalue. Thus every N × N normal matrix has N orthonormal eigenvectors. One may then apply the argument of equations (1.344–1.350) to show that every N × N normal matrix V can be diagonalized by an N × N unitary matrix U V = U V (d) U †
(1.373)
whose nth column is the eigenvector |ni in the basis |i, 0i Uin = hi, 0|ni
(1.374)
as in (1.349).
1.32 Normal-Matrix Identities One may show that the determinant |V | of a normal matrix V satisfies these three identities: |V | = exp [Tr(ln V )] ln |V | = Tr(ln V ) δ ln |V | = Tr V
−1
(1.375) (1.376)
δV .
(1.377)
1.33 Tricks with Dirac Notation Suppose the eigenstates |ni of a matrix A A|ni = an |ni
(1.378)
are complete and orthonormal. In Sec. 1.31, we saw that this is true for normal matrices. Since the states |ni are complete and orthonormal, the identity operator I can be written as I=
N X
|nihn|.
(1.379)
n=1
The product AI is A itself, so A = AI = A
N X n=1
|nihn| =
N X n=1
an |nihn|.
(1.380)
58
Linear Algebra
It follows therefore that if f is a function, then f (A) is f (A) =
N X
f (an ) |nihn|
(1.381)
n=1
which is simpler than the expression (1.315) for an arbitrary non-defective matrix. This is a good way to think about normal matrices and operators. Example: How do we make sense of the operator exp(−iHt/~) that translates states in time by t? We assume that the hamiltonian H is hermitian (and so normal) with orthonormal eigenstates |ni with energies En H|ni = En |ni.
(1.382)
Then we apply (1.381) with A → H and find e−iHt/~ =
N X
e−iEn t/~ |nihn|.
(1.383)
n=1
So the time evolution of any state |ψi is e−iHt/~ |ψi =
N X
e−iEn t/~ |nihn|ψi.
(1.384)
n=1
1.34 Compatible Normal Matrices Suppose B is a linear operator that commutes with A [A, B] ≡ AB − BA = 0.
(1.385)
Then B maps every eigenvector |ui of A A|ui = z|ui
(1.386)
into another eigenvector B|ui of A with the same eigenvalue z. To see why, we multiply both sides of Eq.(1.386) from the left by B BA|ui = Bz|ui
(1.387)
and use the fact that A and B commute A (B|ui) = z (B|ui) .
(1.388)
Two normal matrices A and B that commute [A, B] = 0
(1.389)
1.34 Compatible Normal Matrices
59
are said to be compatible. We have seen that any normal matrix A can be written as a sum (1.380) of outer products A=
N X
|an ian han |
(1.390)
n=1
of its orthonormal eigenvectors |an i which are complete in the N -dimensional vector space S on which A acts. Suppose now that the eigenvalues an of A are non-degenerate, and that B is another normal matrix acting on S and that the matrices A and B are compatible. Then in the basis provided by the eigenvectors (or eigenstates) |an i of the matrix A, the matrix B must satisfy 0 = han |AB − BA|ak i = (an − ak ) han |B|ak i
(1.391)
which says that han |B|ak i is zero unless an = ak . Since the eigenvalues an of the operator A are assumed to be non-degenerate, the operator B is diagonal B = IBI =
N X n=1
|an ihan |B
N X
|ak ihak | =
N X
|an ihan |B|an ihan |
(1.392)
n=1
k=1
in the |an i basis. Moreover, B maps each eigenket |ak i of A into B|ak i =
N X n=1
|an ihan |B|an ihan |ak i =
N X
|an ihan |B|an iδnk = hak |B|ak i|ak i
n=1
(1.393) which says that each eigenvector |ak i of the matrix A is also an eigenvector of the matrix B with eigenvalue hak |B|ak i. Thus, two compatible normal matrices can be simultaneously diagonalized if one of them has nondegenerate eigenvalues. If A’s eigenvalues an are degenerate, each eigenvalue an may have dn orthonormal eigenvectors |an , ki for k = 1, . . . , dn . In this case, the matrix elements han , k|B|am , k 0 i of B are zero unless the eigenvalues are the same, an = am . The matrix representing the operator B in this basis consists of square, dn × dn , normal submatrices han , k|B|an , k 0 i arranged along its main diagonal; it is said to be in block-diagonal form. Since each submatrix is a dn × dn , normal matrix, we may find linear combinations |an , bk i of the degenerate eigenvectors |an , ki that are orthonormal eigenvectors of both compatible operators A|an , bk i = an |an , bk i B|an , bk i = bk |an , bk i.
(1.394)
60
Linear Algebra
The operators A and B can be diagonalized simultaneously. The converse is also true and is seen more easily: If the operators A and B can be simultaneously diagonalized as in (1.394), then they commute AB|an , bk i = Abk |an , bk i = an bk |an , bk i = an B|an , bk i = BA|an , bk i (1.395) and so are compatible. Thus, normal matrices can be simultaneously diagonalized if and only if they are compatible. In quantum mechanics, compatible hermitian operators represent physical observables that can be measured simultaneously to arbitrary precision (in principle). A set of compatible hermitian operators {A, B, C, . . . } is said to be complete if to every set of eigenvalues {an , bk , c` , . . . } there is only a single eigenvector |an , bk , c` , . . .i. Example 1.5 (Compatible Photon Observables) For example, state of a photon is completely characterized (as far as we know) by its momentum and by its angular momentum about its direction of motion. For a photon then, the momentum operator P and the dot-product J · P of the angular momentum J with the momentum form a complete set of compatible hermitian observables. Incidentally, because its mass is zero, the angular momentum J of a photon about its direction of motion can have only two values ±~, which correspond to its two possible states of circular polarization.
Example 1.6 (Thermal Density Operator) A density operator ρ is the most general description of a quantum-mechanical system. It is hermitian, positive, and of unit trace. Since it is hermitian, it can be diagonalized (section 1.30) X ρ= |nihn|ρ|nihn| (1.396) n
and its eigenvalues ρn = hn|ρ|ni are real. Each ρn is the probability that the system is in the state |ni, and so is non-negative. The unit-trace rule X ρn = 1. (1.397) n
ensures that these probabilities add up to one. The mean value of an operator F is the trace hF i = Tr(ρF ). So the mean energy E, which is the mean value of the hamiltonian H, is the trace E = hHi = Tr(ρH).
(1.398)
61
1.34 Compatible Normal Matrices
The entropy operator S is the negative logarithm of the density operator multiplied by Boltzmann’s constant S = −k ln ρ, and the mean entropy S is S = hSi = −kTr(ρ ln ρ).
(1.399)
A density operator that describes a system in thermal equilibrium at a constant temperature T is time independent and so commutes with the hamiltonian, [ρ, H] = 0. Since ρ and H commute, they are compatible operators (1.389), and so they can be simultaneously diagonalized. Each eigenstate |ni of ρ is an eigenstate of H; its energy En is its eigenvalue, H|ni = En |ni. If we have no information about the state of the system other than its mean energy E, then we take ρ to be the density operator that maximizes the mean entropy S while respecting the constraints X ρn − 1 = 0 c1 = n
c2 = Tr(ρH) − E = 0.
(1.400)
We introduce two Lagrange multipliers (section 1.25) and maximize the unconstrained function L(ρ, λ1 , λ2 ) = S − λ1 c1 − λ2 c2 (1.401) " # " # X X X = −k ρn ln ρn − λ1 ρn − 1 − λ2 ρn En − E n
n
n
by setting its derivatives with respect to ρn , λ1 , and λ2 equal to zero ∂L = −k (ln ρn + 1) − λ1 − λ2 En = 0 ∂ρn X ∂L = ρn − 1 = 0 ∂λ1 n X ∂L = ρn En − E = 0. ∂λ2 n
(1.402) (1.403) (1.404)
The first (1.402) of these conditions implies that ρn = exp [−(λ1 + λ2 En + k)/k]
(1.405)
We satisfy the second condition (1.403) by choosing λ1 so that exp(−λ2 En /k) ρn = P . n exp(−λ2 En /k)
(1.406)
62
Linear Algebra
Setting λ2 = 1/T , we define the temperature T so that ρ satisfies the third condition (1.404). Its eigenvalue ρn then is exp(−En /kT ) ρn = P . n exp(−En /kT )
(1.407)
In terms of the inverse temperature β ≡ 1/(kT ), the density operator is ρ=
e−βH Tr (e−βH )
(1.408)
a form known as the Boltzmann distribution.
1.35 The Singular-Value Decomposition Suppose A is a linear operator that maps vectors in an N -dimensional vector space VN into vectors in an M -dimensional vector space VM . The spaces VN and VM will have infinitely many orthonormal bases {|n, ai ∈ VN } and {|m, bi ∈ VM } labeled by continuous parameters a and b. Each pair of bases provides a resolution of the identity operator IN for VN and IM for VM IN =
N X
|n, aihn, a|
and IM =
n=1
M X
|m, bihm, b|.
(1.409)
m=1
These identity operators give us many ways of writing the linear operator A M X N X A = IM AIN = |m, bihm, b|A|n, aihn, a| (1.410) m=1 n=1
in which the hm, b|A|n, ai are the elements of a complex M × N matrix. The singular-value decomposition of the linear operator A is a choice among all these expressions for IN and IM that expresses A as min(M,N )
A=
X
|Ui iSi hVi |
(1.411)
i=1
in which the min(M, N ) singular values Si are non-negative Si ≥ 0.
(1.412)
Let’s use the notation |Ani ≡ A|ni for the image of a vector |ni in an orthonormal basis {|ni} of VN under the map A. We seek a special orthonormal basis {|ni} of VN that has the property that the vectors |Ani are
63
1.35 The Singular-Value Decomposition
orthogonal. It turns out that this special basis {|ni} of VN consists of the N orthonormal eigenstates of the N × N (non-negative) hermitian operator A† A A† A|ni = en |ni.
(1.413)
For since A|n0 i = |An0 i and A† A|ni = en |ni, it follows that hAn0 |Ani = hn0 |A† A|ni = en hn0 |ni = en δn0 n
(1.414)
which shows that the vectors |Ani are orthogonal and that their eigenvalues en = hAn|Ani are non-negative. This is the key to the singular-value decomposition. If N = M , so that matrices hm, b|A|n, ai representing the linear operator A are square, then the N = M singular values Sn are the non-negative square-roots of the eigenvalues en p √ Sn = en = hAn|Ani ≥ 0. (1.415) We therefore may normalize each vector |Ani whose singular value Sn is positive as 1 |Ani for Sn > 0 (1.416) |mn i = Sn so that the vectors {|mn i} with positive singular values are orthonormal hmn0 |mn i = δn0 ,n .
(1.417)
If only P < N of the singular values are positive, then we may augment this set of P vectors {|mn i} with N − P = M − P new normalized vectors |mn0 i that are orthogonal to each other and to P vectors defined by Eq. (1.416) (with positive singular values Sn > 0) so that the set of N = M vectors {|mn i, |mn0 }i are complete in the space VM =N . When N > M , the matrix A maps the N -dimensional space VN into the smaller M -dimensional space VM , and so A must annihilate N − M basis vectors A|n0 i = 0
for M < n0 ≤ N.
(1.418)
In this case, there are only M singular values Sn of which Z may be zero. The Z vectors |Ani = A|ni with vanishing Sn ’s are vectors of length zero; for these values of n, the matrix A maps the vector |ni to the zero vector. If there are more than N − M zero-length vectors |Ani = A|ni, then we must replace the extra ones by new normalized vectors |mn0 i that are orthogonal to each other and to the vectors defined by Eq. (1.416) so that we have M
64
Linear Algebra
orthonormal vectors in the augmented set {|mn i, |mn0 i}. These vectors then form a basis for VM . When N ≤ M , there are only N singular values Sn of which Z may be zero. If Z of the N Sn ’s do vanish, then one must add Q = Z + M − N new normalized vectors |mn0 i that are orthogonal to each other and to the vectors defined by (1.416) hmn0 |mn i =
1 hmn0 |A|ni = 0 Sn
for n0 > N − Z
and Sn > 0
(1.419)
so that we have M orthonormal vectors in the augmented set {|mn i, |mn0 i}. These vectors then form a basis for VM . In both cases, N > M and M ≥ N , there are min(M, N ) singular values, Z of which may be zero. We may choose the new vectors {|mn0 i} arbitrarily—as long as the augmented set {|mn i, |mn0 i} includes all the vectors defined by (1.416) and forms an orthonormal basis for VM . We now have two special orthonormal bases: the N N -dimensional eigenvectors |ni ∈ VN that satisfy (1.413) and the M M -dimensional vectors |mn i ∈ VM . To make the singular-value decomposition of the linear operator A, we choose as the identity operators IN for the N -dimensional space VN and IM for the M -dimensional space VM the sums IN =
N X
|nihn| and IM =
M X
|mn0 ihmn0 |.
(1.420)
n0 =1
n=1
The singular-value decomposition of A then is A = IM AIN =
M X
|mn0 ihmn0 |A
n0 =1
N X
|nihn|.
(1.421)
n=1
There are min(M, N ) singular values Sn ’s. For the positive singular values, equations (1.414 & 1.416) show that the matrix element hmn0 |A|ni vanishes unless n0 = n 1 hmn0 |A|ni = hAn0 |Ani = Sn0 δn0 n . (1.422) Sn0 For the Z vanishing singular values, equation (1.415) shows that A|ni = 0 and so hmn0 |A|ni = 0.
(1.423)
Thus only the min(M, N ) − Z singular values that are positive contribute to the singular-value decomposition (1.421). If N > M , then there can be at most M non-zero eigenvalues en . If N ≤ M , there can be at most N
1.35 The Singular-Value Decomposition
65
non-zero en ’s. The final form of the singular-value decomposition then is a sum of dyadics weighted by the positive singular values min(M,N )
A=
X
min(M,N )−Z
|mn iSn hn| =
n=1
X
|mn iSn hn|.
(1.424)
n=1
The vectors |mn i and |ni respectively are the left and right singular vectors. The non-negative numbers Sn are the singular values. The linear operator A maps the min(M, N ) right singular vectors |ni into the min(M, N ) left singular vectors Sn |mn i scaled by their singular values A|ni = Sn |mn i
(1.425)
and its adjoint A† maps the min(M, N ) left singular vectors |mn i into the min(M, N ) right singular vectors |ni scaled by their singular values A† |mn i = Sn |ni.
(1.426)
The N -dimensional vector space VN is the domain of the linear operator A. If N > M , then A annihilates (at least) N − M of the basis vectors |ni. The null space or kernel of A is the space spanned by the basis vectors |ni that A annihilates. The vector space spanned by the left singular vectors |mn i with non-zero singular values Sn > 0 is the range or image of A. It follows from the singular value decomposition (1.424) that the dimension N of the domain is equal to the dimension of the kernel N − M plus that of the range M , a result called the rank-nullity theorem. Incidentally, the vectors |mn i are the eigenstates of the hermitian matrix A A† as one may see from the explicit product of the expansion (1.424) with its adjoint min(M,N )
A A† =
X
min(M,N )
|mn iSn hn|
X
|n0 iSn0 hmn0 |
n0 =1
n=1 min(M,N ) min(M,N )
=
X
X
n=1
n0 =1
|mn iSn δnn0 Sn0 hmn0 |
min(M,N )
=
X
|mn iSn2 hmn |
(1.427)
n=1
which shows that |mn i is an eigenvector of A A† with eigenvalue en A A† |mn i = Sn2 |mn i.
(1.428)
The SVD expansion (1.424) usually is written as a product of three explicit
66
Linear Algebra
matrices, A = U ΣV † . The middle matrix Σ is an M × N matrix with √ the min(M, N ) singular values Sn = en on its main diagonal and zeros elsewhere. By convention, one writes the Sn in decreasing order with the biggest Sn as entry Σ11 . The first matrix U and the third matrix V † depend upon the bases one uses to compute the matrix elements of the linear operator A. If these basis vectors are |i, 0i & |k, 0i, then min(M,N )
X
Aik = hi, 0|A|k, 0i =
hi, 0|mn iSn hn|k, 0i
(1.429)
n=1
so that the i, nth entry in the matrix U is Uin = hi, 0|mn i. The columns of the matrix U are the left-singular vectors of the matrix A: U1n h1, 0|mn i U2n h2, 0|mn i (1.430) .. = . .. . . UM n
hM, 0|mn i
Similarly, the n, kth entry of the matrix V † is V † n,k = hn|k, 0i. Thus Vk,n = V T n,k = hn|k, 0i∗ = hk, 0|ni. The columns of the matrix V are the right-singular vectors of the matrix A: V1n h1, 0|ni V2n h2, 0|ni (1.431) .. = . .. . . VN n
hN, 0|ni
Since the columns of U and of V respectively are M and N orthonormal vectors, both of these matrices are unitary (U † U and V † V are the M × M and N × N identity matrices). The matrix form of the SVD for A then is † Aik = Uin0 Σn0 n Vnk
(1.432)
in which Σn0 n = Sn δn0 n is diagonal, or in matrix notation A = U ΣV † .
(1.433)
The usual statement of the SVD theorem is: Every M ×N complex matrix A can be written as the matrix product of an M × M unitary matrix U , an M × N matrix Σ that is zero except for its min(M, N ) diagonal elements, and an N × N unitary matrix V † A = U Σ V †.
(1.434)
The first min(M, N ) diagonal elements of S are the singular values Si ; they
1.35 The Singular-Value Decomposition
67
are real and non-negative. The first min(M, N ) columns of U and V are the left and right singular vectors of A. The right-most max(N − M, 0) + Z columns (1.431) of the matrix V span the null space or kernel of A, and the left-most min(M, N ) − Z columns (1.430)of the matrix U span the range of A. Numerical Example: Suppose A is the 2 × 3 matrix 0 1 0 A= . (1.435) 1 0 1 Then the positive hermitian matrix A† A is 1 0 1 A† A = 0 1 0 . 1 0 1
(1.436)
One may use Matlab’s eig or Maple’s Eigenvectors command to find the normalized eigenvectors and eigenvalues of A† A as 1 0 −1 1 1 |1i = √ 0 , e1 = 2; |2i = 1 , e2 = 1; |3i = √ 0 , e3 = 0. 2 1 2 0 1 (1.437) The third eigenvalue e3 had to vanish because A is a 3 × 2 matrix. The vector A|1i is 0 |A1i = A|1i = √ (1.438) 2 and its norm is
q √ h1|A† A|1i = 2
(1.439)
so that the normalized vector |m1 i is |A1i |m1 i = √ = 2
0 . 1
(1.440)
Similarly, the normalized vector |m2 i is 1 . 0
(1.441)
|mn iSn hn| = U ΣV †
(1.442)
|m2 i = p
A|2i h2|A† A|2i
=
The SVD of A then is A=
2 X n=1
68
where Sn = hk, 0|ni are
Linear Algebra
√
en and the unitary matrices Ui,n = hi, 0|mn i and Vk,n = 1 0 −1 1 √ and V = √ 0 2 0 2 1 0 1
0 1 U= 1 0
(1.443)
and the diagonal matrix Σ is 2 0 0 . 0 1 0
√ Σ=
(1.444)
So finally the SVD of A = U ΣV † is √ 1 √0 1 1 0 1 2 0 0 √ 0 A= 2 0 . 1 0 0 1 0 2 −1 0 1
(1.445)
The null space or kernel of A is the set of vectors that are real multiples c −1 c NA = √ (1.446) 0 2 1 of the third column of the matrix V displayed in (1.443). Application: We may use the SVD to solve, when possible, the matrix equation A |xi = |yi
(1.447)
for the N -dimensional vector |xi in terms of the M -dimensional vector |yi and the M × N matrix A. Using the SVD expansion (1.424), we have min(M,N )
X
|mn iSn hn|xi = |yi.
(1.448)
n=1
The orthonormality (1.417) of the vectors |mn i then tells us that Sn hn|xi = hmn |yi.
(1.449)
If the singular value is positive Sn > 0 whenever hmn |yi 6= 0, then we may divide by the singular value to get hn|xi = hmn |yi/Sn and so find the solution min(M,N ) X hmn |yi |xi = |ni. (1.450) Sn n=1
But this solution is not always available or unique.
1.35 The Singular-Value Decomposition
69
For instance, if for some n0 the inner product hmn0 |yi = 6 0 while the singular value Sn0 = 0, then there is no solution to equation (1.447). This problem often occurs when M > N . Example: Suppose A is the 3 × 2 matrix r1 p1 A = r2 p2 (1.451) r3 p3 and the vector |yi is the cross-product |yi = L = r × p. Then no solution |xi exists to the equation A|xi = |yi
(1.452)
(unless r and p are parallel) because A|xi is a linear combination of the vectors r and p while |yi = L is perpendicular to both r and p. Even when the matrix A is square, the equation (1.447) sometimes has no solutions. For instance, if A is a square matrix that vanishes, A = 0, then (1.447) has no solutions whenever |yi = 6 0. And when N > M , as in for instance x1 a b c y1 (1.453) x2 = d e f y2 x3 the solution (1.450) is never unique, for we may add to it any linear combination of the vectors |ni that A annihilates for M < n ≤ N min(M,N )
|xi =
X n=1
N X hmn |yi |ni + xn |ni. Sn
(1.454)
n=M +1
These are the vectors |ni for M < n ≤ N which A maps to zero since they do not occur in the sum (1.424) which stops at n = min(M, N ) < N . Example: In the standard model, the quark mass matrix is a 3 × 3 complex, symmetric matrix M . Since M is symmetric (M = M T ), its adjoint is its complex conjugate, M † = M ∗ . So the right singular vectors |ni are the eigenstates of M ∗ M as in(1.413) M ∗ M |ni = Sn2 |ni
(1.455)
and the left singular vectors |mn i are the eigenstates of M M ∗ as in (1.428) M M ∗ |mn i = (M ∗ M )∗ |mn i = Sn2 |mn i.
(1.456)
Thus the left singular vectors are just the complex conjugates of the right singular vectors, |mn i = |ni∗ . But this means that the unitary matrix
70
Linear Algebra
V is the complex conjugate of the unitary matrix U , so the SVD of M is (Autonne, 1915) M = U ΣU T .
(1.457)
The masses of the quarks then are the non-negative singular values Sn along the diagonal of the matrix Σ. By redefining the quark fields, one may make the (CKM) matrix U real—except for a single complex phase, which causes a violation of charge-conjugation-parity (CP ) symmetry. 1.36 The Moore-Penrose Pseudoinverse Although a matrix A has an inverse A−1 if and only if it is square and has a non-zero determinant, one may use the singular-value decomposition to make a pseudoinverse A+ for an arbitrary M × N matrix A. If the singularvalue decomposition of the matrix A is A = U ΣV †
(1.458)
then the Moore-Penrose pseudoinverse (Eliakim H. Moore 1862–1932, Roger Penrose 1931–) is A+ = V Σ+ U †
(1.459)
in which Σ+ is the transpose of the matrix Σ with every non-zero entry replaced by its inverse (and the zeros left as they are). One may show that the pseudoinverse A+ satisfies the four relations A A+ A = A A+ A A+ = A+ † A A+ = A A+ † A+ A = A+ A.
(1.460)
and that it is the only matrix that does so. Suppose that all the singular values of the M × N matrix A are positive. In this case, if A has more rows than columns, so that M > N , then the product AA+ is the N × N identity matrix IN A+ A = V † Σ+ ΣV = V † IN V = IN
(1.461)
and AA+ is an M × M matrix that is not the identity matrix IM . If instead A has more columns than rows, so that N > M , then AA+ is the M × M identity matrix IM AA+ = U ΣΣ+ U † = U IM U † = IM
(1.462)
1.36 The Moore-Penrose Pseudoinverse
71
but A+ A is an N × N matrix that is not the identity matrix IN . If the matrix is square with positive singular values, then it has a true inverse A−1 which is equal to its pseudoinverse A−1 = A+ .
(1.463)
If the columns of A are linearly independent, then the matrix A† A has an inverse, and the pseudoinverse is −1 A+ = A† A A† . (1.464) The solution (1.224) to the complex least-squares method used this pseudoinverse. If the rows of A are linearly independent, then the matrix AA† has an inverse, and the pseudoinverse is −1 . (1.465) A+ = A† AA† If both the rows and the columns of A are linearly independent, then the matrix A has an inverse A−1 , and it’s the pseudoinverse A−1 = A+ . Example: The pseudoinverse A+ of the matrix A 0 1 0 A= 1 0 1
(1.466)
(1.467)
with singular-value decomposition (1.445) is A+ = V Σ + U † √ 1 √0 −1 1/ 2 0 1 0 1 =√ 0 2 0 0 1 1 0 2 1 0 0 0 1 0 1/2 = 1 0 0 1/2 which satisfies the four conditions (1.460). identity matrix 0 0 1 0 A A+ = 1 1 0 1 0
(1.468)
The product A A+ gives the 2×2 1/2 1 0 = 0 0 1 1/2
(1.469)
72
Linear Algebra
which is an instance of (1.462). Moreover, the rows of A are linearly independent, and so the simple rule (1.465) works: −1 A+ = A† AA† −1 1 0 −1 1 0 1 0 0 1 0 0 1 0 1 = 0 1 = 0 1 1 0 1 2 0 1 0 1 0 1 0 0 1/2 1 0 0 1/2 (1.470) = 1 0 = 0 1 1 0 0 1/2 1 0 which is (1.468). The columns of the matrix A are not linearly independent, however, and so the simple rule (1.464) fails. Thus the product A+ A 0 1/2 1 0 1 1 0 1 0 A+ A = 1 0 = 0 2 0 (1.471) 1 0 1 2 0 1/2 1 0 1 is not the 3 × 3 identity matrix which it would be if (1.464) held. 1.37 The Rank of a Matrix There are at least three equivalent definitions of the rank R(A) of an M ×N matrix A. The rank of A is 1. 2. 3. 4.
the the the the
number number number number
of of of of
its linearly independent rows, its linearly independent columns, its nonzero singular values, and rows in its biggest square nonsingular submatrix.
A matrix of rank zero has no non-zero singular values and so vanishes identically. Example 1.7 (Rank)
The 3 × 4 matrix 1 0 1 −2 A = 2 2 0 2 4 3 1 1
(1.472)
has three rows, so its rank can be at most 3. But twice the first row added to thrice the second row equals twice the third row or 2r1 + 3r2 − 2r3 = 0
(1.473)
73
1.38 Software
so R(A) ≤ 2. The first two rows obviously are not parallel, so they are linearly independent. Thus the number of linearly independent rows of A is 2, and so A has rank 2.
1.38 Software Free, high-quality software for virtually all imaginable problems of linear algebra are available in lapack—the linear algebra package. The fortran version is available at the web-site http://www.netlib.org/lapack/ and the C++ version at http://math.nist.gov/tnt/. The commercial programs Maple, Mathematica, and matlab can solve problems in many areas, including linear algebra. A free gnu version of matlab is available at http://www.gnu.org/software/octave/.
1.39 The Tensor/Direct Product The tensor product (also called the direct product) is simple, but it can confuse students if they see it for the first time in a course on quantum mechanics. The tensor product is used to describe composite systems, such as an angular momentum composed of orbital and spin angular momenta. If A is an M × N matrix with elements Aij and Λ is a K × L matrix with elements Λαβ , then their direct product C = A ⊗ Λ is an M K × N L matrix with elements Ciα,jβ = Aij Λαβ .
(1.474)
This direct-product matrix A ⊗ Λ maps the vector Vjβ into the vector Wiα =
N X L X j=1 β=1
Ciα,jβ Vjβ =
N X L X
Aij Λαβ Vjβ .
(1.475)
j=1 β=1
In this sum, the second indices of A and Λ match those of the vector V . The most important case is when both A and Λ are square matrices, as will be their product C = A ⊗ Λ. We’ll focus on this case in the rest of this section. The key idea here is that the direct product is a product of two operators that act on two different spaces. The operator A acts on the space S spanned by the N kets |ii, and the operator Λ acts on the space Σ spanned by the K kets |αi. Let us assume that both operators map into these spaces, so that
74
Linear Algebra
we may write them as N X
A = IS AIS =
|iihi|A|jihj|
(1.476)
|αihα|Λ|βihβ|.
(1.477)
i,j=1
and as Λ = IΣ ΛIΣ =
K X α,β=1
Then the direct product C = A ⊗ Λ C =A⊗Λ=
N K X X
|ii ⊗ |αi hi|A|jihα|Λ|βi hj| ⊗ hβ|
(1.478)
i,j=1 α,β=1
acts on the direct product of the two vector spaces S ⊗ Σ which is spanned by the direct-product kets |i, αi = |ii |αi = |ii ⊗ |αi.
(1.479)
In general, the direct-product space S ⊗ Σ is much bigger than the spaces S and Σ. For although S ⊗ Σ is spanned by the direct-product kets |ii ⊗ |αi, most vectors in the space S ⊗ Σ are of the form |ψi =
N X K X
ψ(i, α)|ii ⊗ |αi
(1.480)
i=1 α=1
and not the direct product |si ⊗ |σi |si ⊗ |σi =
=
N X
!
si |ii i=1 N X K X
⊗
K X
! σα |αi
α=1
si σα |ii ⊗ |αi
(1.481)
i=1 α=1
of a pair of vectors |si ∈ S and |σi ∈ Σ. Using the simpler notation |i, αi for |ii ⊗ |αi, we may write the action of the direct-product operator A ⊗ Λ on the state |ψi =
N X K X
|i, αihi, α|ψi
(1.482)
i=1 α=1
as (A ⊗ Λ)|ψi =
N K X X i,j=1 α,β=1
|i, αi hi|A|jihα|Λ|βi hj, β|ψi.
(1.483)
1.39 The Tensor/Direct Product
75
Example: Suppose the states |n, `, mi are the eigenvectors of the hamiltonian H, the square L2 of the orbital angular momentum L, and the third component of the orbital angular momentum L3 for a hydrogen atom without spin: H|n, `, mi = En |n, `, mi L2 |n, `, mi = ~2 `(` + 1)|n, `, mi L3 |n, `, mi = ~m|n, `, mi.
(1.484)
Suppose the states |σi for σ = ± are the eigenstates of the third component S3 of the operator S that represents the spin of the electron ~ S3 |σi = σ |σi. 2 Then the direct- or tensor-product states |n, `, m, σi ≡ |n, `, mi ⊗ |σi ≡ |n, `, mi|σi
(1.485)
(1.486)
represent a hydrogen atom including the spin of its electron. They are eigenvectors of all four operators H, L2 , L3 , and S3 : H|n, `, m, σi = En |n, `, m, σi L2 |n, `, m, σi = ~2 `(` + 1)|n, `, m, σi L3 |n, `, m, σi = ~m|n, `, m, σi S3 |n, `, m, σi = σ~|n, `, m, σi.
(1.487)
Suitable linear combinations of these states are eigenvectors of the square J 2 of the composite angular momentum J = L + S as well as of J3 , L3 , and S3 . Example: The least-positive value of angular momentum is ~/2. The spin-one-half angular momentum operators S are represented by three 2 × 2 matrices ~ Sa = σa (1.488) 2 in which the σa are the Pauli matrices 0 1 0 −i 1 0 σ1 = , σ2 = , and σ3 = . (1.489) 1 0 i 0 0 −1 Consider two spin operators S (1) and S (2) acting on two spin-one-half (1) systems. The states |±i1 are eigenstates of S3 ~ (1) S3 |±i1 = ± |±i1 2
(1.490)
76
Linear Algebra (2)
and the states |±i2 are eigenstates of S3
~ (2) S3 |±i2 = ± |±i2 . 2 Then the direct-product states |±, ±i = |±i1 |±i2 = |±i1 ⊗ |±i2 (1)
are eigenstates of both S3
(2)
and S3
(1.491)
(1.492)
so that, for instance,
~ |+, −i 2 ~ (2) S3 |+, −i = − |+, −i. (1.493) 2 These states also are eigenstates of the third component of the spin operator of the combined system (1)
S3 |+, −i =
(1)
(2)
S3 = S3 + S3 .
(1.494)
The state |+, +i is an eigenstate of S3 with eigenvalue ~ (1)
(2)
S3 |+, +i = S3 |+, +i + S3 |+, +i =
~ ~ |+, +i + |+, +i = ~|+, +i. (1.495) 2 2
Similarly, the state |−, −i is an eigenstate of S3 with eigenvalue −~ ~ ~ (1) (2) S3 |−, −i = S3 |−, −i + S3 |−, −i = − |−, −i − |−, −i = − ~|−, −i. 2 2 (1.496) The states |+, −i and |−, +i are eigenstates of S3 with eigenvalue 0 (1)
(2)
S3 |+, −i = S3 |+, −i + S3 |+, −i =
~ ~ |+, −i − |+, −i = 0 2 2
(1.497)
and ~ ~ (1) (2) S3 |−, +i = S3 |−, +i + S3 |−, +i = − |−, +i + |−, +i = 0. 2 2
(1.498)
Now let’s consider the effect of the operator S12 on the state | + +i ~2 (1) (1) (2) 2 (2) 2 | + +i = S12 | + +i = S1 + S1 σ1 + σ1 | + +i 4 ~2 ~2 (1) (2) (1) (2) = 1 + σ1 σ1 | + +i = | + +i + σ1 |+iσ1 |+i 2 2 ~2 = (| + +i + | − −i) . (1.499) 2 The rest of this exercise will left to problem (27).
77
1.40 Density Operators
1.40 Density Operators A general quantum-mechanical system is represented by a density operator ρ that is hermitian, of unit trace, and positive ρ† = ρ
(1.500)
Trρ = 1
(1.501)
ρ ≥ 0.
(1.502)
By positive we mean that every diagonal matrix element is non-negative pV = hV |ρ|V i =
N X N X
Vi∗ ρij Vj ≥ 0.
(1.503)
i=1 j=1
If the state |V i is normalized, then pV is the non-negative probability that the system is in that state. We already knew from (1.36) that this probability is real because the density matrix is hermitian. If {|ni} is any complete set X I= |nihn| (1.504) n
of orthonormal states, then the probability that the system is in the state |ni is pn = hn|ρ|ni = Tr (ρ|nihn|) .
(1.505)
The unit-trace condition (1.501) implies that the sum of these probabilities is unity ! X X X pn = (1.506) hn|ρ|ni = Tr ρ |nihn| = Tr (ρI) = Trρ = 1. n
n
n
A system that is measured to be in a state |ni cannot simultaneously be measured to be in an orthogonal state |mi. The probabilities sum to unity because the system must be in some state. Since the density operator ρ is hermitian, it has a complete, orthonormal set of eigenvectors |k 0 i all of which have non-negative eigenvalues ρk ρ|k 0 i = ρk |k 0 i.
(1.507)
They afford for it the expansion ρ=
N X k=1
ρk |k 0 ihk 0 |
(1.508)
78
Linear Algebra
in which the eigenvalue ρk is the probability that the system is in the state |k 0 i. 1.41 Correlation Functions Two degenerate inner products can be defined in terms of a density matrix ρ. If |f i and |gi are two states, then the inner product (f, g) = hf |ρ|gi
(1.509)
satisfies the conditions (1.76–1.79) for a degenerate inner product. It will not, however, satisfy the hermitian condition (1.80) if the probability that the system is in some non-zero state |f i pf = hf |ρ|f i = (f, f ) = 0
(1.510)
vanishes, as often might happen. The second degenerate inner product applies to operators A and B and is defined (Titulaer and Glauber, 1965) as (1.511) (A, B) = Tr ρA† B = Tr BρA† = Tr A† Bρ . This inner product satisfies the conditions (1.76–1.79) for a degenerate inner product but not (1.80) since the operators A and B could be taken to be the outer products of a non-zero state |f i with pf = 0. Our derivation of the Schwarz inequality (1.95) (f, f )(g, g) ≥ |(f, g)|2
(1.512)
did not use the hermiticity condition (1.80), and so the Schwarz inequality holds for all inner products, including degenerate ones like (1.509 & 1.511). Applied to the vector inner product (1.509), the Schwartz inequality gives hf |ρ|f ihg|ρ|gi ≥ |hf |ρ|gi|2
(1.513)
which is a useful property of density matrices. Application of the Schwarz inequality to the operator inner product (1.511) gives (Titulaer and Glauber, 1965) Tr ρA† A Tr ρB † B ≥ |Tr ρA† B |2 . (1.514) The operator Ei (x) that represents the ith component of the electric field (+) at the point x is the hermitian sum of the “positive-frequency” part Ei (x) (−) (+) and its adjoint Ei (x) = (Ei (x))† (+)
Ei (x) = Ei
(−)
(x) + Ei
(x).
(1.515)
79
1.42 Problems (1)
Glauber has defined the first-order correlation function Gij (x, y) as (Glauber, 1963b) (1) (−) (+) Gij (x, y) = Tr ρEi (x)Ej (y) (1.516) or in terms of the inner product (1.511) as (1)
(+)
Gij (x, y) = (Ei
(+)
(x), Ej (y)).
(1.517)
(+)
By setting A = Ei (x), etc., it follows then from the Schwartz inequality (1) (1.514) that the correlation function Gij (x, y) is bounded by (Titulaer and Glauber, 1965) (1)
(1)
(1)
Gii (x, x)Gjj (y, y) ≥ |Gij (x, y)|2 .
(1.518)
Incidentally, the interference fringes are sharpest when this inequality is saturated (1)
(1)
(1)
Gii (x, x)Gjj (y, y) = |Gij (x, y)|2 .
(1.519)
(1)
which can occur only if the correlation function Gij (x, y) factorizes (Titulaer and Glauber, 1965) (1)
Gij (x, y) = Ei∗ (x)Ej (y)
(1.520)
as it does when the density operator is an outer product of coherent states ρ = |{αk }ih{αk }| (+)
which are eigenstates of Ei
(+)
Ei
(1.521)
(x) with eigenvalue Ei (x)
(x)|{αk }i = Ei (x)|{αk }i
(1.522)
as discussed by Glauber (Glauber, 1963b,a). The higher-order correlation functions (n) (−) (−) (+) (+) Gi1 ...i2n (x1 . . . x2n ) = Tr ρEi1 (x1 ) . . . Ein (xn )Ein+1 (xn+1 ) . . . Ei2n (xn ) (1.523) satisfy similar inequalities (Glauber, 1963b) which also follow from the Schwartz inequality (1.514).
1.42 Problems 1. Show that any function that is a power series in two Grassmann numbers (1.13) is a polynomial with at most four terms as in (1.14).
80
Linear Algebra
2. Show that the two 4 × 4 matrices (1.48) satisfy Grassmann’s algebra (1.11) for N = 2. 3. Derive (1.66) from (1.63–linear operator 5.3). 4. Show that the matrix (1.43) is positive on the space of all real 2-vectors but not on the space of all complex 2-vectors. 5. Show that the inequality (1.99) follows from the Schwarz inequality (1.98). 6. Show that the inequality (1.101) follows from the Schwarz inequality (1.100). 7. Show that (AB) T = B T AT , which is Eq.(1.29). 8. Show that a real hermitian matrix is symmetric. 9. Show that (AB)† = B † A† , which is Eq.(1.32). 10. Show that the anti-linearity (1.78) of the inner product follows from its first two properties (1.76 & 1.77). 11. Derive Eq.(1.94) and use it to show that (1.92) implies the Schwarz inequality (1.95). 12. Find orthonormal linear combinations of the three vectors 1 1 1 s1 = 0 , s2 = 1 , s 3 = 1 . (1.524) 0 0 1 13. Derive the linearity (1.124) of the outer product from its definition (1.123). 14. Derive the cyclicity (1.27) of the trace from Eq.(1.26). 15. For the 2 × 2 matrices 1 2 2 −1 A= and B = (1.525) 3 −4 4 −3 16. 17. 18. 19. 20. 21.
verify equations (1.203–1.205). Derive the least-squares solution (1.224) for complex A, x, and y when the matrix A† A is positive. Show that a linear operator A that is represented by a hermitian matrix (1.239) in an orthonormal basis is self adjoint (1.236). Show that the eigenvalues λ of a unitary matrix are unimodular, that is, |λ| = 1. Show that the sum of the eigenvalues of an anti-symmetric matrix vanishes. Use (1.309) to derive expression (1.310) for the 2 × 2 rotation matrix exp(iθ · σ). Compute the characteristic equation for the matrix −iθ · J in which the generators are (Jk )ij = iikj and ijk is totally antisymmetric with 123 = 1.
1.42 Problems
81
22. Use the characteristic equation of problem (21) to derive identities (1.313) and (1.314) for the 3×3 real orthogonal matrix exp(−iθ · J ). 23. Consider the 2 × 3 matrix A 1 2 3 A= . (1.526) −3 0 1 Perform the singular value decomposition A = U SV > , where V > the transpose of V . Find the singular values and the real orthogonal matrices U and V . Students may use Lapack, Octave, Matlab, Maple or any other program to do this problem. 24. Consider the 6 × 9 matrix A with elements Aj,k = x + xj + i(y − y k )
(1.527)
in which x = 1.1 and y = 1.02. Find the singular values, and the first left and right singular vectors. Students may use Lapack, Octave, Matlab, Maple or any other program to do this problem. 25. Show that the totally antisymmetric Levi-Civita tensor ijk satisfies the useful relation 3 X ijk inm = δjn δkm − δjm δkn . (1.528) i=1
26. Consider the hamiltonian H = 12 ~ωσ3
(1.529)
where σ3 is defined in (1.489). The entropy S of this system at temperature T is S = −kTr [ρ ln(ρ)]
(1.530)
in which the density operator ρ is ρ=
e−H/(kT ) . Tr e−H/(kT )
(1.531)
Find expressions for the density operator ρ and its entropy S. 2 27. Find the action of the operator S2 = S(1) + S(2) defined by (1.488) on the four states | ± ±i and then find the eigenstates and eigenvalues of S2 in the space spanned by these four states.
2 Fourier Series
“There are two things that matter in politics. The first is money, and I can’t remember the second.” Senator Marcus Hanna (R, OH, 1837–1904)
2.1 Complex Fourier Series A set of orthonormal vectors spans a vector space; a set of orthonormal functions spans a space of functions. There are infinitely many basis sets of orthonormal functions to choose from. Fourier chose sines and cosines, which are periodic and easy to integrate, and invented the series that bear his name in order to compute how heat diffuses in solid bodies (Joseph Fourier, 1768–1830). An even simpler set of orthonormal functions is the complex exponentials en (x) = hx|en i =
exp(inx) √ 2π
(2.1)
one for each integer n. Their orthonormality follows from the integral formula Z 2π Z 2π hen |em i = hen |xihx|em i dx = hx|en i∗ hx|em i dx 0 0 Z 2π inx ∗ imx e e 1 if n = m √ √ = dx = δnm = (2.2) 0 if n 6= m. 2π 2π 0 Let’s expand a function f in terms of these vectors |en i as |f i =
∞ X n=−∞
cn |en i
(2.3)
83
2.1 Complex Fourier Series
or, equivalently as f (x) = hx|f i =
∞ X
cn hx|en i =
n=−∞
∞ X
∞ X
cn en (x) =
n=−∞
n=−∞
einx cn √ . 2π
(2.4)
Then by using the orthonormality (2.2) of the vectors |en i, we may identify the complex number cm as the inner product hem |f i hem |f i =
∞ X
cn hem |en i =
n=−∞
∞ X
cn δm,n = cm
(2.5)
n=−∞
which is the integral Z
2π
Z
hem |xihx|f i dx = cm = hem |f i = 0 Z 2π −imx e √ = f (x) dx. 2π 0
2π
dx
0
eimx √ 2π
∗ f (x) (2.6)
What kind of functions can we represent this way? They are periodic functions with period 2π because the relation exp(in2π) = 1 implies that f (x ± 2π) =
∞ X n=−∞
∞ X ein(x±2π) einx cn √ = cn √ = f (x). 2π 2π n=−∞
(2.7)
Note that if the function f (x) is not periodic, then its Fourier series (2.4) will nevertheless be strictly periodic, as illustrated in Figs. 2.1 & 2.2. In equations (2.2–2.7), we singled out the interval [0, 2π], but to represent a periodic function with period 2π, we could have used any interval of length 2π, such as the interval [ − π, π] or [r, r + 2π], as explained in Sec. 2.2. The expansion (2.4) for f (x) and the rule (2.6) for the coefficients cn lead to a useful statement of the completeness of the exponential functions en (x) on our arbitrarily chosen interval [0, 2π] Z 2π ∞ ∞ X X einx e−iny einx f (x) = cn √ = dy √ f (y) √ (2.8) 2π 2π 2π 0 n=−∞ n=−∞ for x ∈ [0, 2π]. Interchanging and rearranging, we find for x ∈ [0, 2π] ! Z 2π ∞ X ein(x−y) f (y). (2.9) f (x) = dy 2π 0 n=−∞ Dirac’s notation for this is Z f (x) =
2π
dy δ(x − y) f (y) 0
(2.10)
84
Fourier Series
for x ∈ [0, 2π]. So the exponential functions en (x) provide for Dirac’s delta function on the interval [0, 2π] the representation δ(x − y) =
∞ X
e∗n (y)en (x)
n=−∞
∞ X ein(x−y) = 2π n=−∞
(2.11)
which is valid on the space of functions f (x) that can be represented by a Fourier series of the form (2.4). With hx|en i = en (x) and hen |yi = e∗n (y), one may write this as ! ∞ X δ(x − y) = hx|yi = hx|I|yi = hx| |en ihen | |yi (2.12) n=−∞
whence I=
∞ X
|en ihen |.
(2.13)
n=−∞
This completeness relation is analogous to that (1.379) for the vectors |ni. In the representation (2.11) for the Dirac delta function, the sine terms for ±n cancel, and so we may drop them to get the simpler representation δ(x − y) =
∞ ∞ X ein(x−y) 1 X = [cos n(x − y) + i sin n(x − y)] 2π 2π n=−∞ n=−∞
∞ ∞ 1 1 X 1X cos n(x − y) = = + cos n(x − y). 2π n=−∞ 2π π
(2.14)
n=1
How is the Fourier series for the complex-conjugate function f ∗ (x) related to the series for f (x)? The complex conjugate of Eqs.(2.4) is f ∗ (x) =
∞ X n=−∞
∞ X e−inx einx = c∗n √ c∗−n √ 2π 2π n=−∞
(2.15)
so the coefficients cn (f ∗ ) for f ∗ (x) are related to those cn (f ) for f (x) by cn (f ∗ ) = c∗−n (f ).
(2.16)
Thus, if the function f (x) is real, then cn (f ) = cn (f ∗ ) = c∗−n (f ).
(2.17)
Dropping all reference to the functions, we see that the Fourier coefficients cn for a real function f (x) satisfy cn = c∗−n .
(2.18)
2.2 The Interval
85
2.2 The Interval The interval of integration need not run from 0 to 2π; it can run from r to r + 2π. The choice r = −π often is convenient. With this choice of interval, the coefficient cn is the integral (2.6) shifted by −π: Z π e−inx f (x). (2.19) cn = dx √ 2π −π Note that for functions f (x) that are periodic with period 2π, this integral is the same as that of Eq.(2.6).
2.3 Where to Put the 2Pi’s
√ In Eqs.(2.4 & 2.6), we √ used the orthonormal functions exp(inx)/ 2π, and so we had factors of 1/ 2π in both equations. If one gets tired of having so many explicit square-roots, then one may set cn dn = √ 2π
(2.20)
and write (2.4) and (2.6) as f (x) =
∞ X
dn einx
(2.21)
n=−∞
with cn 1 dn = √ = 2π 2π
Z
2π
dx e−inx f (x).
(2.22)
0
2.4 Example Let’s compute the Fourier series for the real function f (x) = exp(−m|x|) on the interval (−π, π). Using Eq.(2.19) for the shifted interval and the 2π-placement convention (2.20–2.22), we find for the coefficient dn Z π dx −inx −m|x| dn = e e (2.23) −π 2π which we may split into the two pieces Z 0 Z π dx (m−in)x dx −(m+in)x dn = e + e . 2π 2π −π 0
(2.24)
86
Fourier Series
1
0.8
0.6 e−2|x| 0.4
0.2
0 -6
-4
-2
0
2
4
6
x Figure 2.1 The Fourier series (2.27) for the function exp(−2|x|) with 10 terms (solid, red) and 100 terms (dashed, green). The function f (x) = exp(−2|x|) itself is plotted in blue dashes. The Fourier series is periodic, but the function f (x) is not.
After doing the integrals and clearing away the mathematical foliage, we find
dn =
1 m 1 − (−1)n e−πm . 2 2 π m +n
(2.25)
Here, since m is real, dn = d∗n , but also dn = d−n . So the coefficients dn satisfy the condition dn = d∗n (Eq. (2.18)) which holds when the function
87
2.5 Real Fourier Series for Real Functions
f (x) is real. The Fourier series for exp(−m|x|) then is
e−m|x| = =
∞ X
dn einx
(2.26)
n=−∞ ∞ X
m 1 1 − (−1)n e−πm einx 2 2 π m +n n=−∞
∞ X 1 1 m −πm n −πm 1−e +2 1 − (−1) e cos(nx) = mπ π m2 + n2 n=1
In Fig. 2.1, the 10-term (solid, red) and 100-term (green dashes) Fourier series for m = 2 are plotted from x = −2π to x = 2π. The function exp(−2|x|) itself is represented by short blue dashes. Note that its Fourier series is periodic with period 2π even though the function exp(−2|x|) is not periodic. Usually, one does not use different letters cn and dn to distinguish between the symmetric (2.4 & 2.6) and asymmetric (2.20–2.22) conventions on the placement of the 2π’s. So we’ll use cn for the coefficients in both conventions.
2.5 Real Fourier Series for Real Functions The Fourier series outlined above are simple and apply to functions that are continuous and periodic — whether complex or real. If the function f (x) is real, then by (2.18)
c−n = c∗n
whence
c0 = c∗0
(2.27)
88
Fourier Series
so c0 is real. Thus the Fourier series (2.21) for a real function f (x) is f (x) = c0 + = c0 + = c0 + = c0 + = c0 +
∞ X n=1 ∞ X n=1 ∞ X n=1 ∞ X n=1 ∞ X
−1 X
cn einx +
cn einx
n=−∞
cn einx + c−n e−inx
cn einx + c∗n e−inx cn (cos nx + i sin nx) + c∗n (cos nx − i sin nx) (cn + c∗n ) cos nx + i(cn − c∗n ) sin nx.
(2.28)
n=1
Let’s write cn as cn =
1 (an − ibn ), 2
so that
an = cn + c∗n
and bn = i(cn − c∗n ). (2.29)
Then the Fourier series (2.28) for a real function f (x) is ∞
f (x) =
a0 X + an cos nx + bn sin nx. 2
(2.30)
n=1
What are the formulas for an and bn ? By (2.29 & 2.22), the coefficient an is dx e−inx + einx an = f (x) π 2 0 0 (2.31) since the function f (x) is real. So the coefficient an of cos nx in (2.30) is the cosine integral of f (x) Z 2π dx an = cos nx f (x). (2.32) π 0 Z
2π
dx −inx e f (x) + einx f ∗ (x) = 2π
Z
2π
Similarly, Eqs.(2.29) & (2.22) and the reality of f (x) imply that the coefficient bn is the sine integral of f (x) Z 2π Z 2π dx dx e−inx − einx bn = i f (x) = sin nx f (x). (2.33) π 2 π 0 0 The real Fourier series (2.30) and the cosine (2.32) and sine (2.33) integrals
89
2.5 Real Fourier Series for Real Functions
for the coefficients an and bn also follow from the orthogonality relations 2π
Z
dx sin mx sin nx =
0 2π
Z 0
Z
π if n = m 6= 0 0 otherwise,
π if n = m 6= 0 dx cos mx cos nx = 2π if n = m = 0 0 otherwise, and
(2.34)
(2.35)
2π
dx sin mx cos nx = 0,
(2.36)
0
which hold for integer values of n and m. What if a function f (x) is not periodic? The Fourier series for a function that is not periodic is itself strictly periodic. In such cases, the Fourier series differs somewhat from the function near the ends of the interval and differs markedly from it outside the interval, where the series but not the function is periodic. Example 2.1 (The Gibbs Overshoot) The function f (x) = x on the interval (−π, π) is certainly not periodic. So we expect trouble if we represent it as a Fourier series. We will not be disappointed. Since x is an odd function, Eq.(2.32) tells us that the coefficients an all vanish. By (2.33), the bn ’s are π
dx 1 x sin nx = 2 (−1)n+1 . π n −π
Z bn =
(2.37)
As shown in Fig. 2.2, the series ∞ X n=1
2 (−1)n+1
1 sin nx n
(2.38)
differs from the function f (x) = x near the ends x = ±π of the interval at [ − π, π]. This defect, amounting to about 9% of the discontinuity, is called the Gibbs phenomenon or the Gibb overshoot (J. Willard Gibbs, 1839– 1903; incidentally, Gibbs’s father successfully defended the Africans of the schooner Amistad ). The Fourier series (2.38) is periodic with period 2π while the function f (x) = x has no periodicity. Thus, the Fourier series (2.38) differs markedly from the function outside the interval [ − π, π].
90
Fourier Series
6
4
2
f (x)
0
-2
-4
-6 -6
-4
-2
0
2
4
6
x Figure 2.2 The Fourier series (2.38) for the function x with 10 terms (solid, red), 100 terms (dashed, green), and 1000 terms (short blue dashes). The function f (x) = x itself is plotted in purple dots. The Fourier series is peridic, but the function f (x) is not.
2.6 Stretched Intervals If the interval of periodicity is of length L instead of 2π, then we may use the functions en,L (x) = hx|en , Li =
ei2πnx/L √ L
(2.39)
ei2πmx/L √ = δnm . L
(2.40)
which are orthonormal on the interval (0, L) Z hen , L|em , Li =
L
dx 0
ei2πnx/L √ L
!∗
91
2.6 Stretched Intervals
The coefficient f` in the expansion |f i =
∞ X
fn |en , Li
(2.41)
n=−∞
is given by the inner product ∞ X
he` , L|f i =
fn he` , L|en , Li =
n=−∞
∞ X
fn δ`,n = f`
(2.42)
n=−∞
which is the integral L
Z f` = he` , L|f i = 0
e−i2π`x/L √ f (x) dx. L
(2.43)
The Fourier series for f (x) on the stretched interval (0, L) is then f (x) = hx|f i =
∞ X
fn hx|en , Li =
n=−∞
∞ X
fn
n=−∞
ei2πnx/L √ . L
(2.44)
The relations (2.40, 2.44, & 2.43) generalize to an interval of length L our earlier formulas (2.2, 2.4, & 2.6) which hold for an interval of length 2π. If the function f (x) is periodic with period L, f (x ± L) = f (x), then we may shift the domain of integration by any real number r Z L+r −i2π`x/L e √ f` = f (x) dx. (2.45) L r An obvious choice is r = −L/2 for which this equation and its counterpart (2.44) gives the pair Z L/2 −i2πnx/L ∞ X e ei2πnx/L √ f (x) = fn √ and fn = f (x) dx. (2.46) L L −L/2 n=−∞ Combining equations (2.41 & 2.42), we get |f i =
∞ X
|en , Lihe` , L|f i
(2.47)
n=−∞
which gives for the identity operator the resolution I=
∞ X
|en , Lihen , L|.
(2.48)
n=−∞
Thus, for functions that are suitably smooth and periodic with period L,
92
Fourier Series
Dirac’s delta function is given by the sum ∞ X
∞ X ei2πn(x−y)/L δ(x − y) = hx|yi = hx|I|yi = hx|en , Lihe` , L|yi = . L n=−∞ n=−∞ (2.49) Another way to arrive at the same result is to combine Eqs. (2.43 & 2.44):
f (x) =
∞ X n=−∞
Z L ∞ X ei2πnx/L i2πnx/L 1 e−i2πny/L f (y) dy (2.50) = fn √ e L L 0 n=−∞
or Z f (x) = 0
L
"
# Z L ∞ 1 X i2πn(x−y)/L δ(x − y) f (y) dy, (2.51) e f (y) dy = L n=−∞ 0
where δ(x − y) is Dirac’s delta function. Thus ∞ ∞ 1 X i2πn(x−y)/L 1 2X δ(x − y) = e = + cos[2πn(x − y)/L] L n=−∞ L L
(2.52)
n=1
is the stretched version of (2.11 & 2.14). Dirac’s delta function is a distribution, probably the most useful one. A distribution is a continuous linear map from a space of functions to a field of numbers. The delta function δ(x − y), for example, maps the function f (x) into the number f (y): Z f (y) = δ(x − y) f (x) dx. (2.53) It follows that an expression that represents a distribution on one space of functions must be changed to apply to another space of functions. That is why the representation (2.49 & 2.52) for Dirac’s delta function is valid for smooth functions that are periodic with period L, while the representation (2.14) applies to smooth periodic functions of period 2π. As one example of the use of this delta-function formula, let us consider two functions — f (x) with Fourier coefficients fn and g(x) with Fourier coefficients dn . Then using Eqs.(2.44) & (2.43) for f (x) and g(x), we may relate the integral of the product f ∗ (x) g(x) to a sum of products fn∗ dn of their Fourier coefficients: Z L Z L ∞ ∞ X X e−i2πny/L ei2πnx/L ∗ ∗ √ √ f (x) dy g(y). (2.54) fn dn = dx L L 0 0 n=−∞ n=−∞
93
2.6 Stretched Intervals
This sum contains Dirac’s delta function (2.52) and so Z L Z L ∞ ∞ X 1 X i2πn(x−y)/L ∗ ∗ e fn dn = dx dy f (x) g(y) L n=−∞ 0 0 n=−∞ Z L Z L = dx dy f ∗ (x) g(y) δ(x − y) 0 0 Z L = dx f ∗ (x) g(x) (2.55) 0
is the inner product (f, g). In particular, if the two functions are the same, then Z L ∞ X 2 dx |f (x)|2 (2.56) |fn | = 0
n=−∞
which is Parseval’s identity. Thus if a function is square integrable on an interval, then the sum of the squares of the absolute values of its Fourier coefficients converges to the integral of the square of its absolute value. Formulas (2.55 & 2.56) follow more simply from the orthonormality (2.40) of the vectors |en , Li on taking the inner product of the two expansions |f i = |gi =
∞ X n=−∞ ∞ X
fn |en , Li gn |en , Li
(2.57)
n=−∞
for we then have ∞ X
fn∗ gn
L
Z = hf |gi =
f ∗ (x)g(x) dx.
(2.58)
0
n=−∞
If the function f (x) is real, then on the interval (0, L) in place of Eqs.(2.30), (2.32), & (2.33), one has ∞ a0 X 2πnx 2πnx f (x) = + an cos + bn sin , (2.59) 2 L L n=1
L
2 L
Z
2 bn = L
Z
an =
dx cos
0
2πnx L
2πnx L
f (x),
(2.60)
f (x).
(2.61)
and L
dx sin
0
94
Fourier Series
The corresponding orthogonality relations, which (2.35), & (2.36), are: Z L 2πnx 2πmx L/2 sin = dx sin 0 L L 0 L/2 Z L 2πmx 2πnx dx cos cos = L L L 0 0 Z L 2πmx 2πnx dx sin cos = 0. L L 0
follow from Eqs.(2.34), if n = m 6= 0 otherwise,
(2.62)
if n = m 6= 0 if n = m = 0 otherwise, and
(2.63)
(2.64)
They hold for integer values of n and m, and they imply Eqs(2.59), (2.60), & (2.61).
2.7 Fourier Series in Several Variables On the interval [−L, L], the Fourier-series formulas (2.43 & 2.44) are ∞ X
eiπnx/L fn √ 2L n=−∞ Z L −iπnx/L e √ f (x) dx. fn = 2L −L
f (x) =
(2.65) (2.66)
We may generalize these equations from a single variable x to N variables x = (x1 . . . xN ) ∞ X
f (x) =
···
n1 =−∞
Z
∞ X nN =−∞
L
fn =
Z
L
dx1 . . . −L
fn
dxN −L
eiπn·x/L (2L)N/2 e−iπn·x/L f (x). (2L)N/2
(2.67) (2.68)
2.8 How Fourier Series Converge A Fourier series represents a function f (x) as the limit of a sequence of functions fN (x) given by fN (x) =
N X k=−N
fk
ei2πnx/L √ . L
(2.69)
2.8 How Fourier Series Converge
95
A sequence of functions fN (x) is said to converge to a function f (x) on an interval [a, b] if for every > 0 and each point x ∈ [a, b], there exists an integer N (, x) such that |f (x) − fN (x)| < for all
N > N (, x).
(2.70)
If one can find an N () that is independent of x ∈ [a, b], then the sequence of functions fN (x) is said to converge uniformly to f (x) on the interval [a, b]. A function f (x) is continuous on an interval [a, b] if for every point x ∈ [a, b] and every > 0, there is a δ > 0 such that |f (y) − f (x)| <
(2.71)
whenever |x−y| < δ. The function is uniformly continuous on the interval [a, b] if one can pick δ independently of x ∈ [a, b]. A function is piecewise continuous on [a, b] if it is continuous there except at a finite number of points. If the functions fN (x) of a uniformly convergent sequence are all continuous on an interval [a, b], then the sequence converges to a function f (x) that is continuous on that interval. The exponentials exp(i2πnx/L) (and the cosines and sines that are their real and imaginary parts) are continuous functions on the whole real axis (−∞, ∞). So any uniformly convergent Fourier series fN (x) defines a continuous function f (x). Hence, a function that is not continuous cannot be represented by a uniformly convergent Fourier series. But if a function f (x) is continuous on an interval [a, b], satisfies periodic boundary conditions f (b) = f (a), and has a derivative f 0 (x) that is piecewise continuous on [a, b], then the Fourier series fN (x) for f (x) will converge uniformly (and absolutely) to f (x) on [a, b] (Courant, 1937, p. 439). We shall give two examples of this result, known as Fourier’s convergence theorem, in section 2.9. Fourier series, however, can represent a much wider class of functions than those that are continuous. If a function f (x) is square integrable on an interval [a, b], then its Fourier series fN (x) will converge to f (x) in the mean, that is Z b lim dx|f (x) − fN (x)|2 = 0. (2.72) N →∞ a
What happens to the convergence of a Fourier series if we integrate or
96
Fourier Series
differentiate term by term? If we integrate the series ∞ X
f (x) =
n=−∞
fn
ei2πnx/L √ , L
(2.73)
then we get a series Z x ∞ L X fn i2πnx/L 0 0 √ dx f (x ) = c0 (x − a) − i F (x) = e − ei2πna/L 2π n=−∞ n L a (2.74) that converges better because of the extra factor of 1/n. In general, a function f (x) becomes smoother when it is integrated; this is why its Fourier series converges better. But if we differentiate the same series, then we get a series f 0 (x) = i
∞ 2π X n fn ei2πnx/L L3/2 n=−∞
(2.75)
that converges less well because of the extra factor of n. In general, a function becomes less smooth when it is differentiated; this is why its Fourier series converges less well.
2.9 Quantum-Mechanical Examples Suppose a particle if mass m is trapped in an infinite, one-dimensional square well of potential energy V (x) 0 if 0 < x < L V (x) = (2.76) ∞ otherwise. The hamiltonian operator is H=−
~2 d2 + V (x), 2m dx2
(2.77)
in which ~ is Planck’s constant divided by 2π. This tiny bit of action, ~ = 1.055 × 10−34 J s, sets the scale at which quantum mechanics becomes important; quantum-mechanical corrections to classical predictions are often important in processes whose action is less than ~. The eigenfunction ψ(x) of the hamiltonian H with energy E satisfies the equation Hψ(x) = Eψ(x),
(2.78)
97
2.9 Quantum-Mechanical Examples
which breaks into two simple equations: −
~2 d2 ψ(x) = Eψ(x) 2m dx2
for
0 0 and square integrable for Res > . The function F (t) = exp(kt) diverges exponentially for Rek > 0, but its Laplace transform Z ∞ Z ∞ 1 −st dt e−(s−k)t = dt e F (t) = f (s) = (3.169) s − k 0 0 is well defined for Res > k with a simple pole at s = k (as explained in Sec.5.10) and is square integrable for Res > k + . The Laplace transform of cosh kt and sinh kt are Z ∞ Z 1 ∞ s −st dt e cosh kt = dt e−st ekt + e−kt = 2 (3.170) f (s) = 2 0 s − k2 0 and Z
∞
−st
f (s) =
dt e 0
1 sinh kt = 2
Z
∞
dt e−st ekt − e−kt =
0
s2
k . (3.171) − k2
Similarly, the Laplace transform of cos ωt is Z Z ∞ s 1 ∞ −st dt e−st eiωt + e−iωt = 2 (3.172) f (s) = dt e cos ωt = 2 0 s + ω2 0 while that of the sin ωt is Z ∞ Z ∞ ω 1 dt e−st eiωt − e−iωt = 2 . f (s) = dt e−st sin ωt = 2i s + ω2 0 0 (3.173) Example 3.11 (Lifetime of a Fluorophore) Fluorophores are molecules that emit visible light when excited by light. The probability P (t, t0 ) that a fluorophore with a lifetime τ will emit a photon at time t if excited by a photon at time t0 is 0
P (t, t0 ) = A e−(t−t )/τ θ(t − t0 )
(3.174)
in which A is a constant and θ(t − t0 ) = (t − t0 + |t − t0 |)/(2|t − t0 |) is the Heaviside function. One way to measure the lifetime τ of a fluorophore is to modulate the exciting laser beam at a frequency ν = 2πω of the order of 60 MHz and to detect the phase-shift θ in the light L(t) emitted by the fluorophore. That light is the integral of P (t, t0 ) times the modulated beam sin ωt or equivalently the convolution of e−t/τ θ(t) with sin ωt Z ∞ Z ∞ 0 0 0 0 L(t) = P (t, t ) sin(ωt ) dt = A e−(t−t )/τ θ(t − t0 ) sin(ωt0 ) dt0 −∞ t
Z =
−∞ 0
A e−(t−t )/τ sin(ωt0 ) dt0 .
−∞
(3.175)
138
Fourier and Laplace Transforms
Letting u = t − t0 and using the trigonometric formula sin(a − b) = sin a cos b − cos a sin b
(3.176)
we may relate this integral to the Laplace transform of a sine (3.173) and a cosine (3.172) Z ∞ L(t) = − A e−u/τ sin ω(u − t) du 0 Z ∞ e−u/τ (sin ωu cos ωt − cos ωu sin ωt) du = −A 0 ω cos ωt sin(ωt)/τ − . (3.177) =A 1/τ 2 + ω 2 1/τ 2 + ω 2 p p Setting cos θ = (1/τ )/ 1/τ 2 + ω 2 and sin θ = ω/ 1/τ 2 + ω 2 , we have A A (sin ωt cos θ − cos ωt sin θ) = p sin(ωt − θ). L(t) = p 2 2 1/τ + ω 1/τ 2 + ω 2 (3.178) The phase-shift θ is then given by ω π θ = arcsin p ≤ . 2 2 2 1/τ + ω
(3.179)
One finds the lifetime of the fluorophore by measuring the phase-shift θ τ = (1/ω) tan θ
(3.180)
which is much easier than measuring the lifetime directly.
3.11 Derivatives and Integrals of Laplace Transforms If we differentiate Laplace transform f (s) using its definition (3.167), then we find Z ∞ dn f (s) = dt (−t)n e−st F (t) (3.181) dsn 0 which usually is well defined if f (s) is. For instance, if we differentiate the Laplace transform f (s) = 1/s of the function F (t) = 1 as given by (3.168), then we find Z ∞ n −1 n! n d s (−1) = n+1 = dt e−st tn (3.182) dsn s 0 which tells us that the Laplace transform of tn is n!/sn+1 .
3.12 Laplace Transforms and Differential Equations
139
The result of differentiating the function F (t) also has a simple form. Integrating by parts, we find for the Laplace transform of F 0 (t) Z ∞ Z ∞ d −st d −st −st 0 dt e F (t) = dt e F (t) − F (t) e dt dt 0 0 Z ∞ dt F (t) s e−st = − F (0) + 0
= − F (0) + s f (s).
(3.183)
The indefinite integral of the Laplace transform (3.167) is Z Z ∞ e−st 0 dt f (s) ≡ ds1 f (s1 ) = F (t) (−s) 0 and its nth indefinite integral is Z Z Z (n) f (s) ≡ dsn . . . ds1 f (s1 ) =
∞
dt
0
e−st F (t). (−s)n
(3.184)
(3.185)
If f (s) is a well-behaved function, then these indefinite integrals usually are well defined, except possibly at s = 0.
3.12 Laplace Transforms and Differential Equations Suppose we wish to solve the differential equation P (d/ds) f (s) = j(s). By writing f (s) and j(s) as the Laplace transforms Z ∞ f (s) = e−st F (t) dt 0 Z ∞ e−st J(t) dt. j(s) =
(3.186)
(3.187)
0
and using the formula (3.181) for the nth derivative of a Laplace transform, we see that the differential equation (3.186) amounts to Z ∞ Z ∞ −st e P (−t) F (t) dt. = e−st J(t) dt. (3.188) 0
0
which is equivalent to the algebraic equation F (t) =
J(t) . P (−t)
(3.189)
140
Fourier and Laplace Transforms
A particular solution to the inhomogeneous equation (3.186) is then the Laplace transform of this ratio Z ∞ J(t) e−st f (s) = dt. (3.190) P (−t) 0 The general solution of the associated homogeneous equation P (d/ds) f (s) = 0
(3.191)
is the Laplace transform Z f (s) =
∞
e−st δ(P (−t)) H(t) dt
(3.192)
0
in which the function H(t) is arbitrary. The general solution of the inhomogeneous equation (3.186) is the sum of the two Z ∞ Z ∞ −st J(t) dt + e−st δ(P (−t)) H(t) dt. (3.193) f (s) = e P (−t) 0 0 One easily may generalize this method to differential equations in n variables. Of course, to carry out this procedure, one must be able to find the inverse Laplace transform J(t) of the source function j(s) as outlined in the next section.
3.13 Inversion of Laplace Transforms How do we invert the Laplace transform Z ∞ f (s) = dt e−st F (t)?
(3.194)
0
First, we realize that this definition also works when s is replaced by the complex number x + is Z ∞ f (x + is) = dt e−(x+is)t F (t). (3.195) 0
Next, we choose x to be sufficiently positive that the integral Z ∞ Z ∞ Z ∞ ds (x+is)t ds 0 e f (x + is) = dt0 e(x+is)t e−(x+is)t F (t0 ) (3.196) 2π 2π −∞ −∞ 0
141
3.14 Problems
converges, and then we apply to it the delta-function formula (3.32) Z ∞ Z ∞ Z ∞ ds is(t−t0 ) ds (x+is)t 0 x(t−t0 ) 0 dt e F (t ) e f (x + is) = e −∞ 2π −∞ 2π Z0 ∞ 0 = dt0 ex(t−t ) F (t0 ) δ(t − t0 ) 0
= F (t).
(3.197)
So the inversion formula is F (t) = e
xt
Z
∞
−∞
ds ist e f (x + is) 2π
(3.198)
for sufficiently large x. Some call this inversion formula a Bromwich integral, others a Fourier-Mellin integral. 3.14 Problems 1. For the state |ψ, ti given by Eqs. (3.81 & 3.76), find the wave function ψ(x, t) = hx|ψ, ti at time t. Then find the dispersion of the position operator at that time. Does it grow as time goes by? 2. Show that the commutation relations (3.112) of the annihilation and creation operators imply the equal-time commutation relations (3.113) for the field φ and its conjugate momentum π. 3. Use the linearity of the diffusion equation and equations (3.129–3.132) to derive the general solution (3.133) of the diffusion equation. 4. Derive (3.152) from (3.150 & 3.151). 5. Derive (3.153) from (3.152). 6. Derive (3.154) from (3.153). 7. Use the Green’s function relations (3.145) and (3.146) to show that (3.154) satisfies (3.152).
4 Infinite Series
4.1 Convergence A sequence of partial sums SN =
N X
cn
(4.1)
n=0
converges to a number S if for every > 0, there exists an integer N () such that |S − SN | < for all
N > N ().
(4.2)
The number S is then said to be the limit of the convergent infinite series S=
∞ X
cn .
(4.3)
n=0
Some series that do not converge wander or oscillate; others diverge, heading for ±∞. A series whose absolute values converges S=
∞ X
|cn |
(4.4)
n=0
is said to converge absolutely. A convergent series that is not absolutely convergent is said to converge conditionally. Example 4.1 (Two Infinite Series) The series of inverse factorials converges to the number e = 2.718281828 . . . ∞ X 1 = e. n!
n=0
(4.5)
4.2 Tests of Convergence
143
But the harmonic series of inverse integers diverges ∞ X 1 →∞ n
(4.6)
n=1
as one may see by grouping its terms 1 1 1 1 1 1 1 1 1 1 1+ + + + · · · ≥ 1 + + + + . . . (4.7) + + + + 2 3 4 5 6 7 8 2 2 2 to form a series that obviously diverges.
4.2 Tests of Convergence The Cauchy criterion for the convergence of a sequence SN is that for every > 0 there is an integer N () such that for N > N () and M > N () one has |SN − SM | < .
(4.8)
Cauchy’s criterion is equivalent to the defining condition (4.2). Suppose the convergent series ∞ X
bn
(4.9)
n=0
has only positive terms, 0 ≤ bn , and that for all n, |cn | ≤ bn . Then the series ∞ X
cn
(4.10)
n=0
also (absolutely) converges. This test is called the comparison test. Similarly, if for all n, the inequality 0 ≤ cn ≤ bn holds and the series of numbers cn diverges, then so does the series of numbers bn . If for some N , the terms cn satisfy |cn |1/n ≤ x < 1
(4.11)
for all n > N , then the series ∞ X n=0
converges by the Cauchy root test.
cn
(4.12)
144
Infinite Series
In the ratio test of d’Alembert, the series cn+1 =r 1. Probably the most useful test is the Intel test in which one writes a computer program to sum the first N terms of the series and then runs it for N = 100, N = 10, 000, N = 1, 000, 000, etc., as seems appropriate.
4.3 Convergent Series of Functions A sequence of partial sums SN (z) =
N X
fn (z)
(4.14)
n=0
of functions fn (z) converges to a function S(z) on a set D if for every > 0 and every z ∈ D, there exists an integer N (, z) such that |S(z) − SN (z)| < for all
N > N (, z).
(4.15)
The numbers z may be real or complex. The function S(z) is said to be the limit on D of the convergent infinite series of functions S(z) =
∞ X
fn (z).
(4.16)
n=0
A sequence of partial sums SN (z) of functions converges uniformly on the set D if the integers N (, z) can be chosen to be independent of the point z ∈ D, that is, if for every > 0 and every z ∈ D, there exists an integer N () such that |S(z) − SN (z)| < for all
N > N ().
(4.17)
A real or complex-valued function f (x) of a real variable x is square integrable on an interval [a, b] if the integral Z b |f (x)|2 dx (4.18) a
exists and is finite. A sequence of partial sums SN (x) =
N X n=0
fn (x)
(4.19)
145
4.4 Power Series
of square-integrable functions fn (x) converges in the mean to a function S(x) if Z b |S(x) − SN (x)|2 dx = 0. (4.20) lim N →∞ a
Convergence in the mean sometimes is defined as Z b ρ(x) |S(x) − SN (x)|2 dx = 0. lim N →∞ a
(4.21)
in which ρ(x) ≥ 0 is a weight function that is positive except at isolated points where it may vanish. If the functions fn are real, then this definition of convergence in the mean takes the simpler form Z b ρ(x) [S(x) − SN (x)]2 dx = 0. (4.22) lim N →∞ a
4.4 Power Series A power series is a series of functions with fn (z) = cn z n S(z) =
∞ X
cn z n .
(4.23)
n=0
By the ratio test (4.13), this power series converges if cn+1 z n+1 cn+1 |z| = |z| lim ≡ lim 0) are nearly decoupled from those in the cytosol. 4.15 Problems 1. Test the following series for convergence: 1. ∞ X
(ln n)−2
n=2
2. ∞ X n! 20n
n=1
3. ∞ X n=1
1 n(n + 2)
4. ∞ X n=2
1 . n ln n
In each case, say whether the series converges and how you found out.
170
Infinite Series
2. Olber’s paradox: Assume a static universe with a uniform density of stars. With you at the origin, divide space into successive shells of thickness t, and assume that the stars in each shell subtend the same solid angle ω (as follows from the first assumption). Take into account the occulting of distant stars by nearer ones and show that the total solid angle subtended by all the stars would be 4π. The sky would be dazzlingly bright at night. 3. (a) Find the radius of convergence of the series (4.36) for the Bessel function Jn (ρ). (b) Show that this series converges even faster than the one (4.34) for the exponential function. 4. Derive the approximation (9.85) for j` (ρ) for |ρ| 1. 5. Derive formula (4.50) for the gamma function from its definition (5.125). 6. Use formula (4.50) to compute Γ(1/2). 7. Show that z! = Γ(z + 1) diverges when z is a negative integer. 8. Derive formula (4.52) for Γ((2n + 1)/2). 9. Show that the area of the surface of the unit sphere in d dimensions is Ad = 2π d/2 /Γ(d/2).
(4.184)
Hint: Compute the integral of the gaussian exp(−x2 ) in d dimensions using both rectangular and spherical coordinates. This formula (4.184) is used in dimensional regularization (Weinberg, 1995, p. 477). 10. Derive 4.72) from 4.70) and (4.71) from (4.67). √ √ 11. Derive the expansions (4.76 & 4.77) for 1 + x and 1/ 1 + x. 12. Find the first three Bernoulli polynomials Bn (y) by using their generating function (4.99). 13. How are the two definitions (4.97) and (4.100) of the Bernoulli numbers related? 14. Show that the Lerch transcendent Φ(z, s, α) defined by the series (4.101) converges when |z| < 1 and 0 and 0. 15. Langevin’s classical formula for the electrical polarization of a gas or liquid of molecules of electric dipole moment p is cosh x 1 P (x) = N p − (4.185) sinh x x
4.15 Problems
171
where x = pE/(kT ), E is the applied electric field, and N is the number density of the molecules per unit volume. (a) Expand P (x) as an infinite power series involving the Bernoulli numbers for small x. (b) What are the first three terms expressed in terms of familiar constants? (c) Find the saturation limit of P (x) as x → ∞. 16. Derive the closed form (4.161) from the sum above it.
5 Complex-Variable Theory
5.1 Analytic Functions A complex-valued function f (z) of a complex variable z is differentiable at z with derivative f 0 (z) if the limit f 0 (z) = lim 0
z →z
f (z 0 ) − f (z) z0 − z
(5.1)
exists as z 0 approaches z in the complex plane. The interesting point is that the limit must exist no matter how or from what direction the variable z 0 approaches z. If the function f (z) is differentiable in a small region, e.g., a small disk, around a point z0 , then f (z) is said to be analytic at z0 . Example, Polynomials: Suppose f (z) = z n is an integral power of z. Then for tiny dz and z 0 = dz + z, the difference f (z 0 ) − f (z) is f (z 0 ) − f (z) = (z + dz)n − z n ≈ nz n−1 dz
(5.2)
and so the limit lim 0
z →z
f (z 0 ) − f (z) nz n−1 dz = lim = nz n−1 dz→0 z0 − z dz
(5.3)
exists independently of the direction from which z 0 approaches z. Thus the function z n is analytic for all z with derivative dz n = nz n−1 . dz A function that is everywhere analytic is entire. All polynomials P (z) =
N X n=0
cn z n
(5.4)
(5.5)
5.2 Cauchy’s Integral Theorem
173
are entire.
5.2 Cauchy’s Integral Theorem Suppose f (z) is analytic at z0 . Then to first order in z − z0 f (z) = f (z0 ) + f 0 (z0 ) (z − z0 )
(5.6)
near z0 . Let’s compute the contour integral of f (z) along a small circle of radius and center z0 . The points on the contour are z = z0 + eiθ
(5.7)
for θ ∈ [0, 2π]. So dz = i eiθ dθ, and the contour integral is I Z 2π f (z) dz = f (z0 ) + f 0 (z0 ) (z − z0 ) i eiθ dθ.
(5.8)
0
Since z − z0 = eiθ , the contour integral breaks into two pieces I Z 2π Z 2π iθ 0 f (z) dz = f (z0 ) i e dθ + f (z0 ) eiθ i eiθ dθ 0
(5.9)
0
which vanish because the θ-integrals are zero. So the contour integral of the analytic function f (z) I f (z) dz = 0 (5.10) is zero around the tiny circle — at least to order 2 . What about the contour integral of an analytic function f (z) around a tiny square of size ? Again, we use the analyticity of f (z) at z = z0 to expand it as f (z) = f (z0 ) + f 0 (z0 ) (z − z0 )
(5.11)
on the tiny square. Now the contour consists of the four complex segments dzn are , i , −, and −i , so that dzn = in−1 for n = 1, 2, 3, 4. The centers of these segments are displaced from z0 by −i /2, /2, i /2, and −/2, so zn − z0 = in+2 /2. The contour integral of f (z) around the square then is I f (z) dz =
4 X n=1
f (zn ) dzn =
4 X
f (z0 ) + (zn − z0 ) f 0 (z0 ) dzn .
(5.12)
n=1
Again the contour integral splits into two pieces, one involving f (z0 ), and
174
Complex-Variable Theory
the other involving the derivative f 0 (z0 ) I f (z) dz = f (z0 ) I1 + f 0 (z0 ) I2 .
(5.13)
The four complex segments dzn add up to a path that goes around the square and ends where it started, so the first piece f (z0 )I1 is zero f (z0 ) I1 = f (z0 ) [ + i + (−) + (−i )] = 0.
(5.14)
And so is the second one f 0 (z0 )I2 f 0 (z0 ) I2 = f 0 (z0 )
4 X
(zn − z0 ) dzn
n=1
= f 0 (z0 ) [(−i /2) + (/2) i + (i /2) (−) + (−/2) (−i )] = f 0 (z0 ) (2 /2) [(−i ) + i + (−i ) + i] = 0.
(5.15)
So the contour integral of an analytic function f (z) around a tiny square of side is zero to order 2 . Thus, the integral around such a square can be at most of order 3 . This is very important. We’ll use it to prove Cauchy’s integral theorem. Let’s now consider a function f (z) that is analytic on a square of side L, as pictured in Fig. 5.1. The contour integral of f (z) around the square can be expressed as the sum of L2 /2 contour integrals around tiny squares of side . All interior integrals cancel, leaving the integral around the perimeter. Each contour integral around its tiny square is at most of order 3 . So the sum of the L2 /2 tiny contour integrals is at most (L2 /2 ) 3 = L2 , which vanishes as → 0. Thus the contour integral of a function f (z) along the perimeter of a square of side L vanishes if f (z) is analytic on the perimeter and inside the square. This is an example of Cauchy’s integral theorem. Suppose a function f (z) is analytic in a region R and that I is a contour integral along a straight line within that region from z1 to z2 . The contour integral of f (z) around any square inside the region R of analyticity is zero. So by successively adding contour integrals around small squares to the straight-line contour integral, one may deform the straight-line contour into an arbitrary contour from z1 to z2 without changing its value. So if a function f (z) is analytic in a region R, then its integral I along a contour C from z1 to z2 remains invariant as we continuously deform its contour C to C 0 as long as these contours and all the intermediate contours lie entirely within the region R and the end points z1 and z2 are kept fixed
175
5.2 Cauchy’s Integral Theorem
Figure 5.1 A contour integral around a big L × L square is equal to the sum of the contour integrals around the L2 /2 tiny × squares that tile the big square.
and lie within R: Z
z2
I=
Z
z2
dz f (z) = z1 C
dz f (z).
(5.16)
z1 C 0
Thus a contour integral depends upon its end points and upon the function f (z) but not upon the actual contour as long as the deformations of the contour do not push it outside of the region R of analyticity. If the end points z1 and z2 are the same, then the contour C is closed, and
176
Complex-Variable Theory
Figure 5.2 As long as the four contours are within the domain of analyticity and have the same end-points, the value of the contour integral is unchanged.
we write the integral as I
z1
I=
I dz f (z) ≡
z1 C
dz f (z)
(5.17)
C
with a little circle to denote that the contour is a closed loop. The value of that integral is independent of the contour as long as our deformations of the contour keep it within the domain of analyticity of the function and as long as the contour starts and ends at z1 = z2 . Now suppose that the function f (z) is analytic at all points within the contour. Then we can shrink the contour, staying within the domain of analyticity of the function, until the area enclosed is zero and the contour is of zero length — all this without changing the value of the integral. But the value of the integral along this null contour of zero length is zero. Thus the value of the original contour integral also must be zero I
z1
dz f (z) = 0.
(5.18)
z1 C
Thus we arrive at Cauchy’s integral theorem: The contour integral of a function f (z) around a closed contour C lying entirely within the domain R
5.2 Cauchy’s Integral Theorem
of analyticity of the function vanishes I dz f (z) = 0
177
(5.19)
C
as long as the function f (z) is analytic at all points within the contour. A region in the complex plane is simply connected if we can shrink every loop in the region to a point while keeping the loop in the region. A slice of American cheese is simply connected, but a slice of Swiss cheese is not. A dime is simply connected, but a washer isn’t. The surface of a sphere is, but the surface of a bagel isn’t. With this definition, we can restate the integral theorem of Cauchy (1789– 1857): The contour integral of a function f (z) around a closed contour C vanishes I dz f (z) = 0
(5.20)
C
if the contour lies within a simply connected domain of analyticity of the function. If a region R is simply connected, then we may deform any contour C from z1 to z2 in R into any other contour C 0 from z1 to z2 in R while keeping the moving contour in the region R. So another way of understanding the Cauchy integral theorem is to ask, What is the value of the contour integral Z IM =
z1
dz f (z)?
(5.21)
z2 C 0
This integral is the same as the integral along C 0 from z1 to z2 , except for the sign of the dz’s and the order in which the terms are added. Thus Z z1 Z z2 IM = dz f (z) = − dz f (z). (5.22) z2 C 0
z1 C 0
Now consider a closed contour running along the contour C from z1 to z2 and backwards along C 0 from z2 to z1 all within a simply connected region R of analyticity. Since IM = −I, the integral of f (z) along this closed contour vanishes: I dz f (z) = I + I 0 = I − I = 0
(5.23)
and we have again derived Cauchy’s integral theorem. Example: Every polynomial P (z) =
N X n=0
cn z n
(5.24)
178
Complex-Variable Theory
is analytic everywhere, and so the integral of any polynomial along any closed contour C is zero I P (z) dz = 0. (5.25)
5.3 Cauchy’s Integral Formula Let’s now consider a simply connected region R in which a function f (z) is analytic. Pick a point z0 well inside the region R . We will compute the value of a closed contour integral around the point z0 . We take the contour to be a tiny counterclockwise circle of radius with center at the point z0 . Then the points on the contour are z = z0 + eiθ
(5.26)
for θ ∈ [0, 2π], and so dz = i eiθ dθ. As our function in this contour integral I0 , we’ll pick not f (z) but f (z)/(z − z0 ). So since z − z0 = eiθ , the contour integral I0 in the limit → 0 is I Z 2π [f (z0 ) + f 0 (z0 ) (z − z0 )] f (z) I0 = dz = i eiθ dθ z − z0 z − z0 0 Z 2π f (z0 ) + f 0 (z0 ) eiθ i eiθ dθ = iθ e 0 Z 2π h i (5.27) f (z0 ) + f 0 (z0 ) eiθ idθ = 2πi f (z0 ) = 0
since the θ-integral involving f 0 (z0 ) vanishes. This result I 1 f (z) f (z0 ) = dz 2πi z − z0
(5.28)
is a miniature version of Cauchy’s integral formula, in which the subscript means that the contour is around a tiny circle. Now consider a big counterclockwise contour C 0 within the simply connected region R in which a function f (z) is analytic as in Fig. 5.3. The whole contour lies within a simply connected part of R in which the ratio f (z)/(z − z0 ) is analytic. So the integral along the contour C 0 vanishes I f (z) 1 dz. (5.29) 0= 2πi C 0 z − z0 We let the two straight-line segments approach each other so that they
179
5.3 Cauchy’s Integral Formula
z
Figure 5.3 The full contour is the sum of a big ccw contour and a small cw contour, both around z0 .
cancel. Now split the contour C 0 into a contour C that is a big counterclockwise contour around z0 and a tiny clockwise circle of radius around z0 . That tiny clockwise-circle integral is the negative of the corresponding counterclockwise-circle integral, so I I I 1 f (z) 1 f (z) 1 f (z) 0= dz = dz − dz. (5.30) 2πi C 0 z − z0 2πi C z − z0 2πi z − z0 Using the miniature result (5.28), we find I f (z) 1 f (z0 ) = dz 2πi C z − z0
(5.31)
which is Cauchy’s integral formula. Let’s now use this formula to compute the first derivative f 0 (z) of f (z): f (z + dz) − f (z) dz I 1 1 1 1 0 0 = dz f (z ) − 2πi dz z 0 − z − dz z 0 − z I 1 f (z 0 ) = dz 0 0 . 2πi (z − z − dz)(z 0 − z)
f 0 (z) =
(5.32)
So in the limit dz → 0, we have 1 f (z) = 2πi 0
I
dz 0
f (z 0 ) (z 0 − z)2
(5.33)
180
Complex-Variable Theory
which is what we’d have gotten had we simply differentiated behind the integral sign. If we continue in this way, we find that the second derivative f (2) (z) of f (z) is I 2 f (z 0 ) f (2) (z) = dz 0 0 . (5.34) 2πi (z − z)3 And the nth derivative f (n) (z) is f
(n)
n! (z) = 2πi
I
dz 0
f (z 0 ) , (z 0 − z)n+1
(5.35)
where the contour runs counter-clockwise about the point z and all the points inside the contour are within the domain R in which f (z) is analytic. Thus, if a function f (z) is analytic in a region R, then it is infinitely differentiable there. Example 5.1 (Bessel Functions of the First Kind) A simple application of Cauchy’s integral formula is the counter-clockwise integral around the unit circle z = eiθ of the ratio z m /z n in which both m and n are integers I Z 2π Z 2π 1 zm 1 1 dz n = ieiθ dθ ei(m−n)θ = dθ ei(m+1−n)θ . (5.36) 2πi z 2πi 0 2π 0 If m + 1 − n 6= 0, this integral vanishes because exp 2πi(m + 1 − n) = 1 " #2π Z 2π 1 1 ei(m+1−n)θ i(m+1−n)θ dθ e = = 0. (5.37) 2π 0 2π i(m + 1 − n) 0
If m + 1 − n = 0, the exponential is unity exp i(m + 1 − n)θ = 1, and the integral is 2π/2π = 1. Thus, the original integral is the Kronecker delta I 1 zm dz n = δm+1,n . (5.38) 2πi z The generating function for Bessel functions Jm of the first kind is e
t(z−1/z)/2
=
∞ X
z m Jm (t).
(5.39)
m=−∞
Applying our integral formula (5.38) to it, we find I I ∞ X 1 1 1 zm dz et(z−1/z)/2 n+1 = dz J (t) n+1 m 2πi z 2πi z m=−∞ =
∞ X m=−∞
δm+1,n+1 Jm (t) = Jn (t).
(5.40)
5.4 The Cauchy-Riemann Conditions
Thus letting z = eiθ , we have " # Z 2π eiθ − e−iθ 1 dθ exp t − inθ Jn (t) = 2π 0 2 or more simply (problem 1) Z 2π Z 1 1 π Jn (t) = dθ ei(t sin θ−nθ) = dθ cos(t sin θ − nθ). 2π 0 π 0
181
(5.41)
(5.42)
5.4 The Cauchy-Riemann Conditions Why do the real and imaginary parts u and v of an analytic function f (z) = f (x + iy) = u(x, y) + iv(x, y)
(5.43)
satisfy the Cauchy-Riemann conditions ∂v ∂u = ∂x ∂y
and
∂u ∂v =− ? ∂y ∂x
(5.44)
The reason is that the derivative f 0 is independent of the direction from which dz → 0. If we use subscripts for partial differentiation, ux = ∂u/∂x and uy = ∂u/∂y, etc., then the derivative in the x-direction is df = (ux + ivx ) dx while that in the iy-direction is df 1 = (uy + ivy ) = −i (uy + ivy ) . diy i
(5.45)
(5.46)
Equating the two directional derivatives, we get the Cauchy-Riemann conditions ux = vy
and uy = −vx .
(5.47)
The directions in the x-y plane in which the real u(x, y) and imaginary v(x, y) parts of an analytic function increase most rapidly are given by the vectors (ux , uy ) and (vx , vy ). The Cauchy-Riemann conditions (5.47) imply that these directions must be perpendicular (ux , uy ) · (vx , vy ) = ux vx + uy vy = vy vx − vx vy = 0.
(5.48)
The Cauchy-Riemann conditions (5.47) and a two-dimensional version of
182
Complex-Variable Theory
Stokes’s theorem provide an alternative proof of Cauchy’s integral theorem (5.19). We recall that Stokes’s theorem says that the line integral of a vector field V(r) along a closed contour C is equal to the surface integral of the curl of V(r) over any surface bounded by the contour Z Z (∇ × V) · da. (5.49) V · dr = S
C
The direction of the normal vector da is given by the right-hand rule: If one curls the index finger of one’s right hand parallel to a representative fraction of the contour, then one’s right thumb will point more or less in the direction of da. Let us try to express the line integral of a not necessarily analytic function f (x, y) = u(x, y) + iv(x, y) along a closed ccw contour C as an integral over the surface enclosed by the contour. The contour integral is I I I (u + iv)(dx + idy) = (u dx − v dy) + i (v dx + u dy). (5.50) C
C
C
Now since the contour C is counterclockwise, the differential dx is negative at the top of the curve with coordinates (x, y+ (x) and positive at the bottom (x, y− (x). So the first line integral is the surface integral I Z u dx = [u(x, y− (x)) − u(x, y+ (x))] dx C # Z "Z y+ (x)
uy (x, y)dy dx
= − y− (x)
Z = −
Z uy |dxdy| = −
uy da
(5.51)
in which da = |dxdy| is a positive element of area. Similarly, we find I Z Z i v dx = −i vy |dxdy| = −i vy da. (5.52) C
The dy integrals are then: I Z Z − v dy = − vx |dxdy| = − vx da IC Z Z i u dy = i ux |dxdy| = i ux da.
(5.53) (5.54)
C
Combining (5.50–5.54), we find I Z Z (u + iv)(dx + idy) = − (uy + vx ) da + i (−vy + ux ) da. C
(5.55)
5.4 The Cauchy-Riemann Conditions
183
This formula holds whether or not the function f (x, y) is analytic. But if f (x, y) is analytic on and within the contour C, then it satisfies the CauchyRiemann conditions (5.47) within the contour, and so both surface integrals vanish. The contour integral then is zero, which is Cauchy’s integral theorem (5.19). The extent to which the contour integral of the function f (x, y) = u(x, y)+ iv(x, y) differs from zero, its value if f (x, y) is analytic in z = x+iy, is related to surface integrals of uy + vx and ux − vy 2 Z 2 Z 2 2 I I f (z)dz = (u + iv)(dx + idy) = (uy + vx )da + (ux − vy )da C
C
(5.56) which vanish when f = u + iv satisfies the Cauchy-Riemann conditions (5.47). Example: The Integral of Nonanalytic Function. The integral formula (5.55) can be useful for evaluating contour integrals of functions that are not analytic. The function f (x, y) =
1 1 x + iy + i 1 + x2 + y 2
(5.57)
is the product of an analytic function 1/z and a nonanalytic real one r(x, y) = 1/(z ∗ z) . The real and imaginary parts of f are U (x, y) = u(x, y) r(x, y) =
x2
x 1 2 + (y + ) 1 + x2 + y 2
(5.58)
x2
−y − 1 2 + (y + ) 1 + x2 + y 2
(5.59)
and V (x, y) = v(x, y) r(x, y) =
in which 1/z = u + iv. Let’s use Eq.(5.55) to compute the contour integral I of f along the real axis from −∞ to ∞ and then around the upper half plane along z = x + iy = Reiθ as θ runs from 0 to π and R → ∞ I Z ∞ Z ∞ I = f (x, y) dz = dx dy [−Uy − Vx + i (−Vy + Ux )] . (5.60) −∞
0
The contour integral around the UHP is an example of the ghost contours discussed in Sec.(5.13). Because u and v satisfy the Cauchy-Riemann conditions (5.47), the terms in the area integral simplify to −Uy − Vx = −ury − vrx
(5.61)
−Vy + Ux = −vry + urx .
(5.62)
and
184
Complex-Variable Theory
So the integral I is Z I=
∞
∞
Z
dx
−∞
dy [−ury − vrx + i(−vry + urx )]
(5.63)
0
or explicitly Z
∞
I=
Z
dx −∞
∞
dy 0
−2x − 2i(x2 + y 2 + y) [x2 + (y + )2 ] (1 + x2 + y 2 )2
.
(5.64)
We let → 0 and find Z I = −2i
∞
Z
−∞
∞
dy
dx 0
1 (1 +
x2
+ y 2 )2
.
Changing variables to ρ2 = x2 + y 2 , we have Z ∞ Z ∞ ρ d 1 dρ I = −4πi dρ = 2πi = −2πi. 2 2 (1 + ρ ) dρ 1 + ρ2 0 0
(5.65)
(5.66)
5.5 Harmonic Functions Let’s use the Cauchy-Riemann conditions (5.47) to compute the laplacian of the real part u of an analytic function f . First, the second x-derivative uxx is uxx = vyx = vxy = −uyy .
(5.67)
So the real part u of an analytic function f is a harmonic function uxx + uyy = 0
(5.68)
or equivalently one with a vanishing laplacian. Similarly, vxx = −uyx = −vyy
(5.69)
so the imaginary part of an analytic function f also is a harmonic function vxx + vyy = 0.
(5.70)
A harmonic function h(x, y) can have saddle points, but not local minima or maxima because at a local minimum of both hxx > 0 and hyy > 0, while at a local maximum both hxx < 0 and hyy < 0. So in its domain of analyticity, the real and imaginary parts of an analytic function f have neither minima nor maxima. For static fields, the electrostatic potential φ(x, y, z) is a harmonic function
5.5 Harmonic Functions
185
of the three spatial variables x, y , and z in regions that are free of charge because the electric field is E = −∇φ
(5.71)
∇·E=0
(5.72)
and its divergence vanishes
where the charge density is zero. Thus the laplacian of the electrostatic potential φ(x, y, z) vanishes ∇ · ∇φ = φxx + φyy + φzz = 0
(5.73)
and so φ(x, y, z) is harmonic. Note that in the presence of point charges, the location of each positive charge is a local maximum of the electrostatic potential φ(x, y, z) and the location of each negative charge is a local minimum of φ(x, y, z). But in the absence of charges, the electrostatic potential has neither local maxima nor local minima. For this reason, one cannot trap charged particles with an electrostatic potential, a result known as Earnshaw’s theorem. We have seen that the real and imaginary parts of an analytic function are harmonic functions with two-dimensional gradients that are mutually perpendicular. And we know that the electrostatic potential is a harmonic function. Thus the real part u(x, y) (or the imaginary part v(x, y)) of any analytic function f (z) = u(x, y) + iv(x, y) describes the electrostatic potential φ(x, y) for some electrostatic problem that does not involve the third spatial coordinate z. The surfaces of constant u(x, y) are the equipotential surfaces, and since the two gradients are orthogonal, the surfaces of constant v(x, y) are the electric field lines. Examples of Two-dimensional Potentials: The function f (z) = Ez = Ex + iEy
(5.74)
has u = Ex and v = Ey. It may represent a potential V (x, y, z) = Ex for which the electric-field lines E = −E x ˆ are lines of constant y. Equivalently, it may represent a potential V (x, y, z) = Ey in which E points in the negative y-direction, which is to say along lines of constant x. Another simple example is the function f (z) = z 2 = x2 − y 2 + 2ixy
(5.75)
for which u = x2 − y 2 and v = 2xy. This function gives us a potential V (x, y, z) whose equipotentials are the hyperbolas u = x2 − y 2 = c2
(5.76)
186
Complex-Variable Theory
and whose electric-field lines are the perpendicular hyperbolas v = 2xy = d2 .
(5.77)
Equivalently, we may take these last hyperbolas (5.77) to be the equipotentials and the other ones (5.77) to be the lines of the electric field. As a third example, the function of z = reiθ = exp(ln r + iθ) λ λ ln z = − (ln r + iθ) (5.78) 2π0 2π0 p describes the potential V (x, y, z) = −(λ/(2π0 )) ln x2 + y 2 due to a line of charge per unit length λ = q/L. The electric-field lines are the lines of constant v, that is, of constant θ f (z) = u(x, y) + iv(x, y) = −
E=
λ (x, y, 0) . 2π0 x2 + y 2
(5.79)
5.6 Taylor Series for Analytic Functions Let’s consider the contour integral of the function f (z 0 )/(z 0 − z) along a circle C inside the region R in which f (z) is analytic. For any point z inside the circle, Cauchy’s integral formula (5.31) tells us that I 1 f (z 0 ) 0 f (z) = dz . (5.80) 2πi C z 0 − z Now, we rewrite the denominator z 0 − z in terms of the center z0 of the circle: I 1 f (z 0 ) f (z) = dz 0 . (5.81) 2πi C z 0 − z0 − (z − z0 ) Next, we factor the denominator I 1 f (z 0 ) f (z) = 2πi C (z 0 − z ) 1 − 0
z−z0 z 0 −z0
dz 0 .
(5.82)
Now by the drawing (5.4), the modulus of the ratio (z − z0 )/(z 0 − z0 ) is less than unity, so the power series ∞ z − z0 −1 X z − z0 n 1− 0 = (5.83) z − z0 z 0 − z0 n=0
converges absolutely and uniformly on the circle by (4.23–4.26). We are
187
5.6 Taylor Series for Analytic Functions
z
z
0
z
Figure 5.4 Contour integral for Taylor series.
therefore allowed to integrate the series f (z) =
1 2πi
I C
∞ f (z 0 ) dz 0 X z − z0 n z 0 − z0 z 0 − z0
(5.84)
f (z 0 ) dz 0 . (z 0 − z0 )n+1
(5.85)
n=0
term by term f (z) =
∞ X n=0
1 (z − z0 ) 2πi n
I C
By Eq.(5.35), the integral is just the nth derivative f (n) (z) divided by n-
188
Complex-Variable Theory
factorial. Thus the function f (z) possesses the Taylor series ∞ X (z − z0 )n (n) f (z) = f (z0 ) n!
(5.86)
n=0
which converges as long as the point z is inside a circle centered at z0 that lies within a simply connected region R in which f (z) is analytic.
5.7 Cauchy’s Inequality Suppose a function f (z) is analytic in a region that includes the disk with perimeter z = R eiθ
(5.87)
|f (z)| ≤ M
(5.88)
and that f (z) is bounded by
on that circle. Then by using Cauchy’s integral formula (5.35), we may bound the nth derivative f (n) (0) of f (z) at z = 0 by I |f (z)||dz| n! (n) |f (0)| ≤ 2π |z|n+1 Z 2π R dθ n!M n!M = n (5.89) ≤ n+1 2π 0 R R which is called Cauchy’s inequality.
5.8 Liouville’s Theorem Suppose now that f (z) is analytic everywhere (such functions are called entire) and is bounded by |f (z)| ≤ M
for all |z| ≥ R0 .
(5.90)
Then by applying Cauchy’s inequality (5.89) at successively larger values of R, we have n!M |f (n) (0)| ≤ lim = 0 for n ≥ 1 (5.91) R→∞ Rn which shows that every derivative of f (z) vanishes f (n) (0) = 0
for
n≥1
(5.92)
5.9 The Fundamental Theorem of Algebra
189
at z = 0. But then the Taylor series (4.53) about z = 0 for the function f (z) consists of only a single term, and f (z) is a constant ∞ X z n (n) f (z) = f (0) = f (0) (0) = f (0). n!
(5.93)
n=0
In particular, any function that is everywhere bounded and analytic is a constant, which is Liouville’s theorem. 5.9 The Fundamental Theorem of Algebra Gauss applied Liouville’s theorem to the function −1 N X 1 = cj z j f (z) = PN (z)
(5.94)
j=0
which is the inverse of an arbitrary polynomial of order N . Suppose that the polynomial PN (z) had no zero, that is, no root anywhere in the complex plane. Then f (z) would be analytic everywhere. Moreover, for sufficiently large |z|, the polynomial PN (z) is approximately PN (z) ≈ cN z N , and so f (z) is bounded by something like |f (z)| ≤
1 ≡M |cN |R0N
for all |z| ≥ R0 .
(5.95)
So if PN (z) had no root, then the function f (z) would be analytic everywhere and would satisfy condition (5.90); hence f (z) would be a constant, Eq.(5.93). But of course, f (z) = 1/PN (z) is not a constant unless N = 0. Thus, the polynomial PN (z) must have a root, a pole of f (z), so that f (z) is not entire; this is the only exit from the contradiction. If the root of PN (z) is at z = z1 , then PN (z) = (z − z1 ) PN −1 (z), in which PN −1 (z) is a polynomial of order N − 1, and we may repeat the argument for its reciprocal f1 (z) = 1/PN −1 (z). In this way, one arrives at P j the fundamental theorem of algebra: Every polynomial PN (z) = N j=0 cj z has N roots somewhere in the complex plane PN (z) = cN
N Y
(z − zj ).
(5.96)
j=1
5.10 Laurent Series Consider a function f (z) that is analytic between an outer circle C1 of radius R1 and an inner circle C2 of radius R2 , as in Fig. 5.5. We will integrate f (z)
190
Complex-Variable Theory
* * z
*
*
z0
*
*
Figure 5.5 The contour consisting of two concentric circles with center at z0 encircles the point z in a counter-clockwise sense. The asterisks are poles or other singularities of the function f (z).
along a contour C12 that encircles the point z in a counter-clockwise fashion by following C1 counter-clockwise and C2 clockwise and a line joining them in both directions. By Cauchy’s integral formula (5.31), this contour integral yields f (z) I 1 f (z 0 )dz 0 f (z) = . (5.97) 2πi C12 z 0 − z The integrations in opposite directions along the line joining C1 and C2 cancel, and we are left with a counter-clockwise integral around the outer circle C1 and a clockwise one around C2 or minus a counter-clockwise integral around C2 I I 1 f (z 0 )dz 0 1 f (z 00 )dz 00 f (z) = − . (5.98) 2πi C1 z 0 − z 2πi C2 z 00 − z Now from the figure (5.5), the center z0 of the two concentric circles is
5.10 Laurent Series
191
closer to the points z 00 on the inner circle C2 than it is to z and also closer to z than to the points z 0 on C1 00 z − z0 < 1 and z − z0 < 1. (5.99) z − z0 z 0 − z0 To use these inequalities, as we did in Eq.(5.83), we add and subtract z0 from each of the denominators and absorb the minus sign before the second integral into its denominator I I f (z 0 )dz 0 f (z 00 )dz 00 1 1 f (z) = + . (5.100) 2πi C1 z 0 − z0 − (z − z0 ) 2πi C2 z − z0 − (z 00 − z0 ) After factoring the two denominators I f (z 0 )dz 0 1 f (z) = 2πi C1 (z 0 − z0 ) [1 − (z − z0 )/(z 0 − z0 )] I 1 f (z 00 )dz 00 + 2πi C2 (z − z0 ) [1 − (z 00 − z0 )/(z − z0 )]
(5.101)
we expand them, as in Eq.(5.83), in power series that converge absolutely and uniformly on the two contours I ∞ X f (z 0 )dz 0 1 f (z) = (z − z0 )n 2πi C1 (z 0 − z0 )n+1 n=0 I ∞ X 1 1 (z 00 − z0 )m f (z 00 )dz 00 . (5.102) + (z − z0 )m+1 2πi C2 m=0
Having removed the point z from the two integrals, we now we apply cosmetics. Since the functions being integrated are analytic between the two circles, we may shift them to a common counter-clockwise (ccw) contour C about any circle of radius R2 ≤ R ≤ R1 between the two circles C1 and C2 . Then we set m = −n − 1, or n = −m − 1, so as to combine the two sums into one sum on n from −∞ to ∞ I ∞ X f (z 0 )dz 0 n 1 f (z) = (z − z0 ) (5.103) 2πi C (z 0 − z0 )n+1 n=−∞ which is called a Laurent series. The Laurent series (5.103) is often written as f (z) =
∞ X n=−∞
an (z0 ) (z − z0 )n
(5.104)
192
Complex-Variable Theory
with 1 an (z0 ) = 2πi
I C
f (z) dz . (z − z0 )n+1
(5.105)
In particular, the coefficient a−1 (z0 ) is called the residue of the function f (z) at z0 . It is of special significance in contour integrals, as we shall see in Sec.5.12. Most functions have Laurent series that start at some least integer L f (z) =
∞ X
an (z0 ) (z − z0 )n
(5.106)
n=L
rather than at −∞. For such functions, we can pick off the coefficients an one by one without doing the integrals (5.105). The first one aL is the limit aL (z0 ) = lim (z − z0 )−L f (z). z→z0
(5.107)
The second is given by aL+1 (z0 ) = lim (z − z0 )−L+1 f (z) − (z − z0 )L aL (z0 ) . z→z0
(5.108)
The third requires two subtractions, etc. This is a reasonable place to gather some definitions: A function f (z) that is analytic for all z is called entire or, equivalently, holomorphic. Entire functions have no singularities, except possibly as |z| → ∞, which is called the “point at infinity.” A function f (z) has an isolated singularity at z0 if it is analytic in a small disk about z0 but not analytic that point. A function f (z) has a pole of order n > 0 at a point z0 if (z − z0 )n f (z) is analytic at z0 but (z − z0 )n−1 f (z) has an isolated singularity at z0 . A pole of order n = 1 is called a simple pole. Poles are isolated singularities. A pole of infinite order is called an essential singularity. If a function f (z) has an essential singularity at z0 , then its Laurent series (5.103) really runs from n = −∞ and not from n = L as in (5.106). Essential singularities are spooky: if a function f (z) has an essential singularity at w, then inside every disk around w, f (z) takes on every complex number, except possibly one, an infinite number of times—a result due to Picard (1856–1941). A function is said to be meromorphic if it is analytic for all z except for poles. Example: The function f (z) = exp(1/z) has an essential singularity at
193
5.10 Laurent Series
z = 0 because its Laurent series (5.103) f (z) = e1/z =
0 ∞ X X 1 n 1 1 = z m m! z |n|! n=−∞
(5.109)
m=0
runs from n = −∞. Near z = 0, f (z) = exp(1/z) takes on every complex number except 0 an infinite number of times. Example: The function f (z) =
1 z(z + 1)
(5.110)
has poles at z = 0 and at z = −1 but is otherwise analytic; it is meromorphic. We may expand it in a Laurent series (5.104–5.105) about z = 0 ∞ X 1 f (z) = = an z n z(z + 1) n=−∞
where the coefficient an is the integral I dz 1 an = n+2 2πi C z (z + 1)
(5.111)
(5.112)
in which the contour C is a ccw circle of radius r < 1. Since |z| < 1, we may expand 1/(1 + z) as the series ∞ X 1 = (−z)m . 1+z
(5.113)
m=0
Doing the integral, we find an = =
∞ X m=0 ∞ X
1 2πi
I
(−z)m
C
dz z n+2
(−1)m rm−n−1 δm,n+1
m=0
= (−1)n+1
for n ≥ −1
(5.114)
and zero otherwise. So the Laurent series for f (z) is ∞ X 1 = (−1)n+1 z n . f (z) = z(z + 1)
(5.115)
n=−1
This series starts at n = −1, not at n = −∞, because f (z) is meromorphic with only a simple pole at z = 0.
194
Complex-Variable Theory
Example 5.2 (The Argument Principle) Suppose f (z) is analytic and g(z) meromorphic in a region R. Consider the counter-clockwise integral I 1 g 0 (z) dz (5.116) f (z) 2πi C g(z) along a contour C that lies inside R. If the function g(z) is simply a zero or a pole of order n at w ∈ R g(z) = an (w)(z − w)n
(5.117)
g 0 (z) n n(z − w)n−1 = = n g(z) (z − w) z−w
(5.118)
then the ratio g 0 /g is
and the integral is I I g 0 (z) n 1 1 f (z) f (z) dz = dz = nf (w). 2πi C g(z) 2πi C z−w
(5.119)
More generally, any function g(z) meromorphic in R will possess a Laurent series ∞ X g(z) = ak (w)(z − w)k (5.120) k=n
about each point w ∈ R. One may show (problem 13) that as z → w the ratio g 0 /g again approaches (5.118). It follows that the integral (5.117) is a sum of f (w` ) at the zeros of g(z) minus a similar sum at the poles of g(z) I X 1 g 0 (z) f (z) dz = n` f (w` ) (5.121) 2πi C g(z) `
in which |n` | is the multiplicity of the `th zero or pole. 5.11 Analytic Continuation We saw in Sec. 5.6 that a function f (z) that is analytic within a circle of radius R about a point z0 possesses a Taylor series (5.86) f (z) =
∞ X (z − z0 )n (n) f (z0 ) n!
(5.122)
n=0
that converges for all z inside the disk |z − z0 | < R. Suppose z 0 is the singularity of f (z) that is closest to z0 . Pick a point z1 in the disk |z−z0 | < R that is not on the line from z0 to the nearest singularity z 0 . The function
5.11 Analytic Continuation
195
f (z) is analytic at z1 because z1 is within the circle of radius R about the point z0 , and so f (z) has a Taylor series expansion like (5.122) but about the point z1 . Usually the circle of convergence of this power series about z1 will extend beyond the original disk |z − z0 | < R. If so, the two power series, one about z0 and the other about z1 , define the function f (z) and extend its domain of analyticity beyond the original disk |z − z0 | < R. Such an extension of the range of an analytic function is called analytic continuation. We often can analytically continue a function more easily than by the successive construction of Taylor series. Example 5.3 (The Geometric Series) f (z) =
The power series
∞ X
zn
(5.123)
n=0
converges and defines an analytic function for |z| < 1. But for such z, we may sum the series to 1 f (z) = . (5.124) 1−z By summing the series (5.123), we have analytically continued the function f (z) to the whole complex plane apart from its simple pole at z = 1. Example 5.4 (The Gamma Function) Euler’s form of the gamma function is the integral Z ∞ Γ(z) = e−t tz−1 dt = (z − 1)! (5.125) 0
which make Γ(z) analytic in the right half-plane 0. But by successively using the relation Γ(z + 1) = z Γ(z), we may extend Γ(z) into the left halfplane; the last expression defines it as a function that is analytic for −3 Γ(z) =
1 1 1 1 1 1 Γ(z + 1) = Γ(z + 2) = Γ(z + 3) z z z+1 z z+1 z+2
(5.126)
apart from simple poles at z = 0, −1, and −2. Proceeding in this way, we may analytically continue the gamma function to the whole complex plane apart from the negative integers and zero. The analytically continued gamma function may be represented by the formula "∞ #−1 1 −γz Y z −z/n Γ(z) = e 1+ e (5.127) z n n=1
196
Complex-Variable Theory
due to Weierstrass. Example 5.5 (Dimensional Regularization) The loop diagrams of quantum field theory involve badly divergent integrals like Z ∞ p4+1 dp . (5.128) I(4) ≡ [p2 + ν 2 ]2 0 Gerardus ’t Hooft (1946–) and Martinus J. G. Veltman (1931–) promoted the number of space-time dimensions from 4 to a complex number d. The resulting integral is well-defined for 0 and arbitrary complex z √ Z ∞ Z ∞ π −m2 (x+z)2 −m2 x2 dx e = dx e = . m −∞ −∞
(5.145)
Example: Third-Harmonic Microscopy An ultra-short laser pulse intensely focused in a featureless medium generates a third-harmonic electric field E3 in the forward direction proportional to the integral (Boyd, 2000) Z ∞ dz (3) 3 E3 ∝ χ E0 ei ∆k z (5.146) (1 + 2iz/b)2 −∞ along the axis of the beam in the medium as in the drawing on the righthand side of Fig. 5.6. Here b = 2πw02 n/λ = kw02 in which n = n(ω) is the index of refraction of the medium, λ is the wavelength of the laser light in the medium, and w0 is the (tranverse) waist radius of the gaussian beam, defined by E(r) = Ee−r
2 /w 2 0
.
(5.147)
When the dispersion is normal, that is when dn(ω)/dω > 0, the shift in the wave vector ∆k = 3ω(n(ω) − n(3ω))/c is negative. Since ∆k < 0, the exponential is damped when z = x + iy is in the lower half plane ei ∆k z = ei ∆k (x+iy) = ei ∆k x e−∆k y .
(5.148)
So we may add a ghost contour GC around the lower half plane (LHP) z = R eiθ ,
π ≤ θ ≤ 2π,
dz = iReiθ dθ,
(5.149)
200
Complex-Variable Theory
Figure 5.6 In the limit in which the distance L is much larger than the wavelength λ, the integral (5.146) would be zero when the focused beam lies within the medium as in the drawing on the right, but not when it extends beyond the edge of the medium as in the drawing on the left.
because in the limit R → ∞, the absolute value of the integral along it vanishes Z Z 2π π dz 1 i ∆k z lim e ≤ lim dθR 2 = lim → 0. 2 R→∞ R→∞ R R→∞ π (1 + 2iz/b) R GC (5.150) The function f (z) = exp(i ∆k z)/(1+2iz/b)2 has a double pole at z = ib/2 which is in the UHP since the length b > 0, but no singularity in the LHP y < 0. So the integral of f (z) along the closed contour from z = −R to z = R and then along the ghost contour GC vanishes. But since the integral along the ghost contour vanishes, so does the integral from −R to R. Thus, when the dispersion is normal, the third-harmonic signal vanishes E3 = 0—as long as the medium effectively extends from −∞ to ∞, as in the drawing on the RHS of Fig. 5.6. But when the medium is off-center, as in the drawing on the LHS of the figure, a third-harmonic signal E3 does
5.13 Ghost Contours
201
appear. Third-harmonic microscopy lets us see edges or features instead of background. Example 5.6 (Green and Bessel) Let us evaluate the integral Z ∞ eikx I(x) = dk 2 , k + m2 −∞
(5.151)
which is the unnormalized Fourier transform of the function 1/(k 2 + m2 ). If x > 0, then the exponential deceases with k in the upper half plane (UHP). So along the semicircular ghost contour k = R eiθ ,
0 ≤ θ ≤ π,
dk = iReiθ dθ,
(5.152)
and in the limit of large R, the absolute value of the integral along the ghost contour vanishes Z π 1 π dθR 2 = lim lim |I(x)GC | ≤ lim → 0. (5.153) R→∞ R R→∞ R→∞ 0 R So if x > 0, then we may add this ghost contour to the integral I(x) without changing it. Thus I(x) is equal to the closed contour integral along the real axis and also along the semicircular ghost contour (5.152) I I eikx eikx I(x) = dk 2 = dk . (5.154) k + m2 (k + im)(k − im) This closed contour encircles the simple pole at k = im and no other singularity, and so we may shrink the contour into a tiny circle around the pole. Along that tiny circle, the function eikx /(k + im) is simply e−mx /2im, and so I e−mx dk e−mx πe−mx I(x) = = 2πi = for x > 0. (5.155) 2im k − im 2im m Similarly, if x < 0, we can add the semicircular ghost contour k = R eiθ ,
π ≤ θ ≤ 2π,
dk = iReiθ dθ
(5.156)
with k running around the perimeter of the lower half plane (LP). So if x < 0, then we may write the integral I(x) as a shrunken closed contour that runs clockwise around the pole at k = −im I dk emx πemx emx I(x) = = −2πi = for x < 0. (5.157) −2im k + im −2im m The two cases (5.155) and (5.157) may be combined into the result Z ∞ eikx π −m|x| dk 2 = e . (5.158) 2 k + m m −∞
202
Complex-Variable Theory
We may use this formula to develop an expression for the Green’s function of the Laplacian in cylindical coordinates. Setting x0 = 0 and r = |x| = p ρ2 + z 2 in the Coulomb Green’s function (3.148), we have Z 1 1 d3 k 1 ik·x p G(r) = = = e . (5.159) 4πr (2π)3 k2 4π ρ2 + z 2 The integral over the z-component of k is (5.158) with m2 = kx2 + ky2 ≡ k 2 Z
∞
dkz −∞
eikz z π = e−k|z| . 2 2 kz + k k
So with kx x + ky y ≡ kρ cos φ, the Green’s function is Z ∞ Z 2π πdk 1 p = dφ eikρ cos φ e−k|z| . (2π)3 0 4π ρ2 + z 2 0 The φ integral is a representation (9.7) Bessel function J0 (kρ) Z 2π dφ ikρ cos φ J0 (kρ) = e . 0 2π
(5.160)
(5.161)
(5.162)
Thus we arrive at Bessel’s formula for the Coulomb Green’s function Z ∞ 1 dk p = J0 (kρ) e−k|z| (5.163) 2 2 4π 4π ρ + z 0 in cylindical coordinates (Schwinger et al., 1998, p. 166). Example 5.7 (Yukawa and Green) We saw in Example 3.10 that the Green’s function for Yukawa’s differential operator (3.165) is Z 3 eik·x d k GY (x) = . (5.164) (2π)3 k2 + m2 Letting k · x = kr cos θ in which r = |x|, we find Z ∞ 2 Z eikr cos θ k dk 1 GY (r) = d cos θ 2 k 2 + m2 0 (2π) −1 Z ∞ 1 dk k ikr −ikr = e − e ir 0 (2π)2 k 2 + m2 Z k 1 ∞ dk = eikr 2 2 ir −∞ (2π) k + m2 Z 1 ∞ dk k = eikr . ir −∞ (2π)2 (k − im)(k + im)
(5.165)
203
5.13 Ghost Contours
We add a ghost contour that loops over the upper-half plane and get GY (r) =
e−mr 2πi im −mr e = (2π)2 ir 2im 4πr
(5.166)
which Yukawa proposed as the potential between two hadrons due to the exchange of a particle of mass m, the pion. Because the mass of the pion is 140 MeV, the range of the Yukawa potential is ~/mc = 1.4 × 10−15 m. Example: As another example, let’s consider the integral Z ∞ eikx J(x) = dk. 2 2 2 −∞ (k + m )
(5.167)
We may add ghost contours as in the preceding example, but now the integrand has double poles at k = ±im, and so we must use Cauchy’s integral formula (5.35) for the case of n = 1, which is Eq.(5.33). For x > 0, we add a ghost contour in the UHP and find I eikx d eikx J(x) = dk = 2πi (k + im)2 (k − im)2 dk (k + im)2 k=im π 1 = x+ e−mx . (5.168) 2m2 m If x < 0, then we add a ghost contour in the LHP and find I d eikx eikx dk = −2πi J(x) = = (k + im)2 (k − im)2 dk (k − im)2 k=−im π 1 = −x + emx . (5.169) 2 2m m Putting the two together, we get as the Fourier transform of 1/(k 2 + m2 )2 Z ∞ eikx π 1 dk = |x| + e−m|x| . (5.170) J(x) = 2 2 2 2m2 m −∞ (k + m ) Example: A Complex Gaussian Integral As another example of the use of ghost contours, let us use one to do the integral Z ∞ 2 I= ewx dx (5.171) −∞
in which the real part of the non-zero complex number w = u + iv = ρeiφ is negative or zero π 3π u ≤ 0 ⇐⇒ ≤φ≤ . (5.172) 2 2
204
Complex-Variable Theory
We first write the integral I as twice that along half the x-axis Z ∞ 2 ewx dx. I=2
(5.173)
0
If we promote x to a complex variable z = reiθ , then wz 2 will be negative if φ + 2θ = π, that is, if 1 θ = (π − φ) (5.174) 2 where in view of (5.172) θ lies in the interval π π − ≤θ≤ . (5.175) 4 4 The closed pie-shaped contour of Fig. 5.7—down the real axis from z = 0 to z = R, along the arc z = R exp(iθ0 ) as θ0 goes from 0 to θ, and then down the line z = r exp(iθ) from z = R exp(iθ) to z = 0—encloses no singularities of the function f (z) = exp(wz 2 ). Hence the integral of exp(wz 2 ) along that contour vanishes. To show that the arc is a ghost contour, we bound it by Z θ Z θ 2 e2iθ 0 (u+iv)R 0 e Rdθ ≤ exp uR2 cos 2θ0 − vR2 sin 2θ0 Rdθ0 0
0
Z ≤
θ
e−vR
2
sin 2θ0
Rdθ0 .
(5.176)
0
Here v sin 2θ0 ≥ 0, and so if v is positive, then so is θ. In this case, 0 ≤ θ0 ≤ π/4, and so sin(2θ0 ) ≥ 4θ0 /π. We therefore have the upper bound Z θ Z θ 2 0 0 π(e−4vR θ /π − 1) −4vR2 θ0 /π 0 e(u+iv)R2 e2iθ Rdθ0 ≤ e Rdθ = (5.177) 4vR 0 0 which vanishes in the limit R → ∞. (If v is negative, then so is θ, the pie-shaped contour is in the fourth quadrant, sin(2θ0 ) ≤ 4θ0 /π, and the inequality (5.177) holds with absolute-value signs around the second integral.) Since by Cauchy’s integral theorem (5.20) the integral along the pieshaped contour of Fig. 5.7 vanishes, it follows that Z 0 1 2 I+ ewz dz = 0. (5.178) 2 Reiθ But the choice (5.174) implies that on the line z = r exp(iθ) the quantity wz 2 is negative, wz 2 = −ρr2 . Thus with dz = exp(iθ)dr, we have Z Reiθ Z R 2 wz 2 iθ I=2 e dz = 2e e−ρr dr (5.179) 0
0
5.13 Ghost Contours
Figure 5.7 The integral of the entire function exp(wz 2 ) along the pieshaped closed contour vanishes by Cauchy’s theorem.
205
206
Complex-Variable Theory
so that as R → ∞ I = 2e
iθ
∞
Z
−ρr2
e
iθ
r
dr = e
0
π = ρ
π
r
ρe−2iθ
.
Finally, using (5.174) and w = ρ exp(iφ), we get for 0 and z is an arbitrary complex number √ Z ∞ π −m2 (x+z)2 e dx = . (5.184) m −∞ These last two formulas are used in chapter 17 on path integrals.
5.14 Logarithms and Cuts When exponentiated, the logarithm ln z of a complex number z returns z exp(ln z) = z.
(5.185)
z = r exp(iθ)
(5.186)
ln z = ln r + iθ.
(5.187)
So if z is then the logarithm is But what is θ? In the polar representation (5.186) of z, the argument θ can just as well be θ plus any integral multiple of 2π θ0 = θ + 2πn
(5.188)
z = r exp(iθ) = r exp(iθ + i2πn).
(5.189)
because both give z
5.15 Roots
207
Two conventions are common. In the first convention, the angle θ is zero along the positive real axis and increases continuously as the point z moves around the origin in a ccw way, until at points just below the positive real axis, θ = 2π − is slightly less than 2π. In this convention, the value of θ drops by 2π as one crosses the positive real axis moving ccw or up. Because of this discontinuity, the positive real axis is called a cut in this convention. The second common convention puts the cut on the negative real axis. Here, the value of θ is the same as in the first convention when the point z is in the upper half-plane. But in the lower half-plane, θ decreases from 0 to −π as the point z moves clockwise from the positive real axis to just below the negative real axis, where θ = −π + . As one crosses the negative real axis moving clockwise or up, θ jumps by 2π, so the negative real axis is a cut. The two conventions agree in the upper half-plane but differ by 2π in the lower half-plane. Sometimes, it is convenient to place the cut on the positive or negative imaginary axis — or along a line that makes an arbitrary angle with the real axis. In any particular calculation, we are at liberty to define the polar angle θ by placing the cut anywhere we like, but we must not change from one convention to another in the same computation. 5.15 Roots The logarithm is the key to many other functions and passes its arbitrariness to them. The square-root, for instance, is defined as √ z = exp 12 ln z . (5.190) It changes sign when we change θ by 2π as we cross a cut. Similarly, the m-th root is √ 1 1/m m z=z = exp ln z . (5.191) m It changes by exp(±2πi/m) when we change θ by 2π crossing a cut. We could, of course, define a sequence of mth-root functions 1 1/m [ln r + i(θ + 2πn)] (5.192) z = exp m n one for each integer n. These functions are called the branches of the mthroot function. If we didn’t use the index n and just merged all the branches, then the mth-root function would be multivalued. Using a convention for θ, we could extend the n = 0 branch to the n = 1 branch by winding around
208
Complex-Variable Theory
the point z = 0 in a ccw fashion. One encounters no discontinuity as one passes from one branch to another. The point z = 0, where the cut starts, is called a branch point because by winding around it, one passes smoothly from one branch to another. Such branches, introduced by Riemann, can be associated with any multivalued analytic function not just with the mth root. Examples of Cuts Cuts cause discontinuities, so people often place them where they do the least harm. For instance, if we had to do with the function p p f (z) = z 2 − 1 = (z − 1)(z + 1) (5.193) then either of the two principal conventions would work. We could put the cut in the definition of the angle θ along either the positive or the negative real axis. And we’d get a bonus: the√sign discontinuity (a factor of −1) from √ z − 1 would cancel the one from z + 1 except for −1 ≤ z ≤ 1. So the function f (z) would have a discontinuity or a cut only for −1 ≤ z ≤ 1. But now suppose we had to work with the function p p f (z) = z 2 + 1 = (z − i)(z + i). (5.194) If we used one of the principal conventions, we’d have two semi-infinite cuts. So we place the θ-cut on the positive or negative imaginary axis, and the function f (z) now has a cut running along the imaginary axis only from −i to i. Example: Contour Integral with Cut Let’s compute the integral Z ∞ xa I= dx (5.195) 2 0 (x + 1) for −1 < a < 1. We promote x to a complex variable z and put the cut on the positive real axis. Since lim
|z|→∞
|z|a+1 = 0, |z + 1|2
(5.196)
the integrand vanishes faster than 1/|z|, and we may add two ghost contours, G+ ccw around the UHP and G− ccw around the LHP, as shown in the figure (5.8). We also add a contour C− that runs from −∞ to the double pole at z = −1, loops around that pole, and then runs back to −∞; the two long contours along the negative real axis cancel because the cut in θ lies on the positive real axis. So the contour integral along C− is just the clockwise
209
5.15 Roots
Figure 5.8 The contour consisting of the ghost contours G+ and G− , the contour C− , the contour I− , and the contour I encircles no poles of the function f (z) = z a /(z + 1), and so the integral of f (z) along that contour is zero.
integral around the double pole which by Cauchy’s integral formula (5.33) is I za dz a dz = − 2πi = 2πi a eπai . (5.197) 2 (z − (−1)) dz C− z=−1 We also add the integral I− from ∞ to 0 just below the real axis Z 0 Z 0 exp(a(ln(x) + 2πi)) (x − i)a I− = dx = dx (5.198) 2 (x + 1)2 ∞ ∞ (x − i + 1) which is I− = − e
2πai
Z 0
∞
xa dx = − e2πai I. (x + 1)2
(5.199)
Now the sum of all these contour integrals is zero because it is a closed
210
Complex-Variable Theory
contour that encloses no singularity. So 0 = 1 − e2πai I + 2πi a eπai
(5.200)
or I=
πa . sin(πa)
(5.201)
5.16 Cauchy’s Principal Value Suppose that f (x) is differentiable or analytic at and near the point x = 0, and that we wish to evaluate the integral Z b f (x) dx K = lim (5.202) →0 −a x − i for a > 0 and b > 0. First, we regularize the pole at x = 0 by using a method devised by Cauchy: Z b Z −δ Z δ f (x) f (x) f (x) + dx + dx . (5.203) K = lim lim dx δ→0 →0 x − i x − i x − i −δ δ −a In the first and third integrals, since |x| ≥ δ, we may set = 0 Z −δ Z b Z δ f (x) f (x) f (x) K = lim dx + dx + lim lim dx . (5.204) →0 δ→0 δ→0 x x x − i −a δ −δ We’ll discuss the first two integrals before analyzing the last one. The limit of the first two integrals is called Cauchy’s principal value Z −δ Z b Z b f (x) f (x) f (x) P dx ≡ lim dx + dx . (5.205) δ→0 x x x −a −a δ If the function f (x) is nearly constant near x = 0, then the large negative values of 1/x for x slightly less than zero cancel the large positive values of 1/x for x slightly greater than zero. The point x = 0 is not special; Cauchy’s principal value is more generally defined by the limit Z y−δ Z b Z b f (x) f (x) f (x) P dx ≡ lim dx + dx . (5.206) x − y δ→0 x−y x−y −a −a y+δ Using Cauchy’s principal value, we may write the quantity K as Z b Z δ f (x) f (x) K=P dx + lim lim dx . →0 δ→0 x x − i −a −δ
(5.207)
5.16 Cauchy’s Principal Value
211
To evaluate the last integral, we use differentiability of f (x) near x = 0 to write f (x) = f (0) + xf 0 (0) and then extract f (0) from the integral: Z δ Z δ f (0) + x f 0 (0) f (x) dx lim lim dx = lim lim δ→0 →0 −δ x − i δ→0 →0 −δ x − i Z δ 1 = f (0) lim lim dx . (5.208) →0 δ→0 x − i −δ Now since 1/(z − i) is analytic, we may deform the straight contour from x = −δ to x = δ into the tiny semicircle x → z = δ eiθ
for π ≤ θ ≤ 2π
which avoids the point x = 0 Z δ Z b 1 f (x) + f (0) lim lim dz . K=P dx δ→0 →0 −δ x z − i −a We now can set = 0 and so write K as Z b Z 2π f (x) 1 K=P dx + f (0) lim iδeiθ dθ iθ δ→0 π x δe −a Z b f (x) =P dx + iπf (0). x −a Recalling the definition (5.257) of K, we have Z b Z b f (x) f (x) =P + iπf (0). lim dx →0 −a x − i x −a
(5.209)
(5.210)
(5.211)
(5.212)
for any function f (z) that is differentiable at x = 0. This trick is of wide applicability. Physicists write it as 1 1 = P + iπδ(x). x − i x
(5.213)
1 1 = P − iπδ(x) x + i x
(5.214)
1 1 =P ∓ iπδ(x − y). x − y ± i x−y
(5.215)
It has a brother
and cousins
Examples of Cauchy’s Trick We may use trick (5.214) to evaluate the integral Z ∞ 1 1 I= dx (5.216) x + i 1 + x2 −∞
212
Complex-Variable Theory
as Z I=P
∞
dx −∞
1 1 − iπ x 1 + x2
Z
∞
dx −∞
δ(x) . 1 + x2
(5.217)
Because the function 1/[x(1 + x2 )] is odd, the principal part is zero. The integral over the delta function gives unity, so we have I = −iπ. Example: To compute the integral Z ∞ dk I= sin k k 0
(5.218)
(5.219)
which we used to derive the formula (3.148) for the Green’s function of the laplacian in three dimensions, we first express it as an integral along the whole real axis Z ∞ Z ∞ dk dk ik e − e−ik = eik (5.220) I= 2ik 2ik −∞ 0 and then add a ghost contour along the path k = R exp(iθ) for θ = 0 → π in the limit R → ∞ I I dk ik dk ik I= e ≡P e (5.221) 2ik 2ik in which we interpret the integral across the point k = 0 as Cauchy’s principal value. Using Cauchy’s trick (5.214), we have I I I dk ik dk dk ik I=P e = e + iπδ(k) eik . (5.222) 2ik 2i(k + i) 2i The first integral vanishes because the pole is below the real axis leaving the desired result Z ∞ dk π I= sin k = (5.223) k 2 0 as stated in (3.147). Example—The Feynman Propagator: Adding ±i to the denominator of a pole term of an integral formula for a function f (x) can slightly shift the pole into the upper or lower half plane, causing the pole to contribute if a ghost contour goes around the UHP or the LHP. Such i’s impose boundary conditions on Green’s functions. The Feynman propagator ∆F (x) is a Green’s function for the KleinGordon differential operator (Weinberg, 1995, pp. 274–280) (2 − m2 )∆F (x) = − δ 4 (x)
(5.224)
5.16 Cauchy’s Principal Value
213
in which x = (x0 , x) and 2=4−
∂2 ∂2 =4− 2 ∂t ∂(x0 )2
(5.225)
is the four-dimensional version of the laplacian 4 ≡ ∇ · ∇. Here δ 4 (x) is the four-dimensional version of Dirac’s delta function (3.32) Z Z d4 q d4 q iqx 4 0 0 δ (x) = exp[i(q · x − q x )] = e (5.226) (2π)4 (2π)4 in which qx = q · x − q 0 x0 is the Lorentz-invariant inner product of the 4vectors q and x. There are many Green’s functions that satisfy Eq.(5.224). Feynman’s propagator ∆F (x) is the one that satisfies certain boundary conditions which will become evident when we analyze the effect of its i Z d4 q exp(iqx) ∆F (x) = . (5.227) 4 2 (2π) q + m2 − i p The quantity Eq = q 2 + m2 is the energy of a particle of mass m and momentum q in natural units with the speed of light c = 1. Using this abbreviation and setting 0 = /(2Eq ), we may write the denominator as 2 q 2 + m2 − i = q · q − q 0 + m2 − i = Eq − i0 − q 0 Eq − i0 + q 0 + 02 (5.228) in which 02 is negligible. We now drop the prime on the and do the q 0 integral Z ∞ 0 dq −iq0 x0 1 I(q) = − e . (5.229) 0 [q − (Eq − i)] [q 0 − (−Eq + i)] −∞ 2π The function f (q 0 ) = e−iq
0 x0
[q 0
1 − (Eq − i)] [q 0 − (−Eq + i)]
(5.230)
has poles at Eq − i and at −Eq + i, as shown in Fig. 5.9. If x0 > 0, then we can add a ghost contour that goes cw around the LHP, and we get 0
I(q) = ie−iEq x
1 2Eq
x0 > 0.
(5.231)
If x0 < 0, we add a ghost contour that goes ccw around the UHP, and we get 1 0 I(q) = ieiEq x x0 < 0. (5.232) 2Eq
214
Complex-Variable Theory
Figure 5.9 In Eq. (5.230), the function f (q 0 ) has poles at ±(Eq − i), and the function exp(−iq 0 x0 ) is exponentially suppressed in the LHP if x0 > 0 and in the UHP if x0 < 0. So we can add a ghost contour in the LHP if x0 > 0 and in the UHP if x0 < 0.
Using Heaviside’s step function θ(x) =
x + |x| , 2
(5.233)
we may combine the last two equations into i 1 h 0 −iEq x0 0 iEq x0 −iI(q) = θ(x ) e + θ(−x ) e . 2Eq
(5.234)
5.16 Cauchy’s Principal Value
In terms of the Lorentz-invariant function Z 3 d q 1 exp[i(q · x − Eq x0 )] ∆+ (x) = 3 (2π) 2Eq
215
(5.235)
and with a factor of −i, the Feynman propagator is −i∆F (x) = θ(x0 ) ∆+ (x) + θ(−x0 ) ∆+ (x, −x0 ).
(5.236)
But the integral (5.235) defining ∆+ (x) is insensitive to the sign of q, and so Z 3 1 d q exp[i(−q · x + Eq x0 )] (5.237) ∆+ (−x) = 3 (2π) 2Eq Z 3 1 d q = exp[i(q · x + Eq x0 )] = ∆+ (x, −x0 ). 3 (2π) 2Eq Thus we arrive at the standard form of the Feynman propagator −i∆F (x) = θ(x0 ) ∆+ (x) + θ(−x0 ) ∆+ (−x).
(5.238)
The Lorentz-invariant function ∆+ (x−y) is the commutator of the positivefrequency part Z d3 p + p φ (x) = exp[i(p · x − p0 x0 )] a(p) (5.239) (2π)3 2p0 of a scalar field φ = φ+ + φ− with its negative-frequency part Z d3 q − p φ (y) = exp[−i(q · y − q 0 y 0 )] a† (q) (5.240) (2π)3 2q 0 p where p0 = Ep = p2 + m2 and q 0 = Eq . For since the annihilation operators a(q) and the creation operators a† (p) satisfy the commutation relation [a(q), a† (p)] = δ 3 (q − p)
(5.241)
we have d3 p d3 q p eipx−iqy [a(p), a† (q)] 3 0 0 (2π) 2 q p Z d3 p = eip(x−y) = ∆+ (x − y) (2π)3 2p0
[φ+ (x), φ− (y)] =
Z
(5.242)
in which px = p · x − p0 x0 , etc. Incidentally, at points x that are space-like x2 = x2 − (x0 )2 ≡ r2 > 0
(5.243)
216
Complex-Variable Theory
√ the Lorentz-invariant function ∆+ (x) depends only upon r = + x2 and has the value (Weinberg, 1995, p. 202) ∆+ (x) =
m K1 (mr) 4π 2 r
(5.244)
in which the Hankel function K1 is z π 1 z 1 ln + ... K1 (z) = − [J1 (iz) + iN1 (iz)] = + +γ− 2 z 2j + 2 2 2j + 2 (5.245) where J1 is the first Bessel function, N1 is the first Neumann function, and γ = 0.57721 . . . is the Euler-Mascheroni constant. The Feynman propagator arises most simply as the mean value in the vacuum of the time-ordered product of the fields φ(x) and φ(y) T {φ(x)φ(y)} ≡ θ(x0 − y 0 )φ(x)φ(y) + θ(y 0 − x0 )φ(y)φ(x).
(5.246)
Since the operators a(p) and a† (p) respectively annihilate the vacuum ket a(p)|0i = 0 and bra h0|a† (p) = 0, by (5.239 & 5.240) the same is true of the positive- and negative-frequency parts of the field: φ+ (z)|0i = 0 and h0|φ− (z) = 0. Thus, the mean value in the vacuum of the time-ordered product is proportional to the Feynman propagator −i∆F (x − y) h0|T {φ(x)φ(y)} |0i = h0|θ(x0 − y 0 )φ(x)φ(y) + θ(y 0 − x0 )φ(y)φ(x)|0i = h0|θ(x0 − y 0 )φ+ (x)φ− (y) + θ(y 0 − x0 )φ+ (y)φ− (x)|0i = h0|θ(x0 − y 0 )[φ+ (x), φ− (y)] + θ(y 0 − x0 )[φ+ (y), φ− (x)]|0i.
(5.247)
But by (5.242), these commutators are ∆+ (x − y) and ∆+ (y − x). It follows then from the standard form (5.238) of the Feynman propagator that h0|T {φ(x)φ(y)} |0i is −i∆F (x − y) h0|T {φ(x)φ(y)} |0i = θ(x0 − y 0 )∆+ (x − y) + θ(y 0 − x0 )∆+ (y − x) = −i∆F (x − y).
(5.248)
Feynman put i in the denominator of the Fourier transform of his propagator to get this result. Example 5.8 (The Abel-Plana Formula) Suppose the function f (z) is analytic and bounded for n1 ≤ 0 (and n1 < p, they are p p p2 − y 2 ± 2iy = ±i y 2 − p2 .
(5.280)
222
Complex-Variable Theory
p Their difference is 2i y 2 − p2 , and so E(`) is Z ∞ p 2 Z −2 y − p2 θ(y − p) E(`) π 2 ~c ∞ p dp = dy A 2`3 0 e2πy − 1 0
(5.281)
in which the Heaviside function θ(x) ≡ (x+|x|)/(2|x|) enforces the condition y>p Z ∞p 2 Z π 2 ~c y y − p2 E(`) p dp = − 3 dy. (5.282) 2πy A ` e −1 0 0 The p-integration is elementary, and so the energy difference is Z π 2 ~c ∞ y 3 dy π 2 ~c B2 π 2 ~c E(`) = − = − = − A 3`3 0 e2πy − 1 3`3 8 720 `3
(5.283)
in which B2 is the second Bernoulli number (4.100). The pressure forcing the plates together then is 1 ∂E(`) π 2 ~c =− (5.284) A ∂` 240 `4 a result due to Casimir (Hendrik Brugt Gerhard Casimir, 1909–2000). Although the Casimir effect is very attractive because of its direct connection with the symmetric ordering of the creation and annihilation operators in the hamiltonian (5.262), the reader should keep in mind that neutral atoms are mutually attractive, which is why most gases are diatomic, and that Lifshitz explained the effect in terms of the mutual attraction of the atoms in the metal plates (Lifshitz, 1956; Milonni and Shih, 1992) (Evgeny Mikhailovich Lifshitz, 1915–1985). p= −
5.17 Dispersion Relations In many physical contexts, functions occur that are analytic in the upper half plane (UHP). Suppose for instance that fˆ(t) is a transfer function that determines an effect e(t) due to a cause c(t) Z ∞ e(t) = dt0 fˆ(t − t0 ) c(t0 ). (5.285) −∞
If this system is causal, then the transfer function fˆ(t − t0 ) will vanish for t − t0 < 0, and so its Fourier transform Z ∞ Z ∞ dt dt √ fˆ(t) eizt = √ fˆ(t) eizt f (z) = (5.286) 2π −∞ 2π 0 will be analytic in the UHP and will shrink as the imaginary part of z = x+iy increases.
5.17 Dispersion Relations
223
So let us assume that the function f (z) is analytic in the UHP and on the real axis and further that lim |f (reiθ )| = 0
for
r→∞
0 ≤ θ ≤ π.
(5.287)
By Cauchy’s integral formula (5.31), if z0 lies in the UHP, then f (z0 ) is given by the closed ccw contour integral I 1 f (z) f (z0 ) = dz (5.288) 2πi z − z0 in which the contour runs along the real axis and then loops over the semicircle lim reiθ
for
r→∞
0 ≤ θ ≤ π.
(5.289)
Our assumption (5.287) about the behavior of f (z) in the UHP implies that this contour (5.289) is a ghost contour because its modulus is bounded by Z 1 |f (reiθ )|r lim dθ = lim |f (reiθ )| = 0. (5.290) r→∞ 2π r→∞ r So we may drop the ghost contour and write f (z0 ) as Z ∞ 1 f (x) f (z0 ) = dx. 2πi −∞ x − z0 We now let the imaginary part y0 of z0 = x0 + iy0 shrink to Z ∞ f (x) 1 f (x0 ) = dx. 2πi −∞ x − x0 − i and use the trick (5.215): Z ∞ Z ∞ 1 f (x) iπ f (x0 ) = P dx + f (x) δ(x − x0 ) dx 2πi 2πi −∞ −∞ x − x0
(5.291)
(5.292)
(5.293)
so that 1 P f (x0 ) = 2πi
Z
∞
−∞
f (x) 1 dx + f (x0 ). x − x0 2
(5.294)
The result 1 f (x0 ) = P πi is called a dispersion relation.
Z
∞
−∞
f (x) dx x − x0
(5.295)
224
Complex-Variable Theory
If we break f (z) = u(z)+iv(z) into its real u(z) and imaginary v(z) parts, then this dispersion relation (5.295) Z ∞ u(x) + iv(x) 1 dx u(x0 ) + iv(x0 ) = P πi x − x0 −∞ Z ∞ Z ∞ v(x) i u(x) 1 dx − P dx (5.296) = P π π −∞ x − x0 −∞ x − x0 breaks into its real and imaginary parts Z ∞ 1 v(x) u(x0 ) = P dx π −∞ x − x0
(5.297)
and 1 v(x0 ) = − P π
∞
Z
−∞
u(x) dx. x − x0
(5.298)
One says that u and v are Hilbert transforms of each other. In applications of dispersion relations, the function f (z) for z < 0 sometimes is either physically meaningless or experimentally inaccessible. In such cases, there may be a symmetry that relates f (−x) to f (x). For instance, if f (x) is the Fourier transform of a real function fˆ(k), then by Eq.(3.28) it obeys the symmetry relation f ∗ (x) = u(x) − iv(x) = f (−x) = u(−x) + iv(−x),
(5.299)
which implies that u(x) is even u(−x) = u(x)
(5.300)
v(−x) = −v(x).
(5.301)
and that v(x) is odd
Using these relations (5.300 & 5.301), one may show that the Hilbert transformations (7.56 & 7.57) become Z ∞ 2 x v(x) dx (5.302) u(x0 ) = P π x2 − x20 0 and 2x0 v(x0 ) = − P π
Z 0
∞
u(x) dx, − x20
x2
which do not require input at negative values of x.
(5.303)
5.18 Kramers-Kronig Relations
225
5.18 Kramers-Kronig Relations If in Maxwell’s equation ∇ × B = µJ + µE˙
(5.304)
we replace the current density J by σE and the time derivative E˙ by −iωE, then we find σ ∇ × B = −iωµ 1 + i E ≡ −iωn2 0 µ0 E (5.305) ω which reveals the squared index of refraction as σ µ 1+i . (5.306) n2 (ω) = 0 µ0 ω The imaginary part of n2 represents the scattering of light mainly by electrons. At high frequencies in non-magnetic materials n2 (ω) → 1, and so Kramers and Kronig applied the Hilbert-transform relations (5.302–5.303) to the function n2 (ω)−1 in order to satisfy condition (5.287). Their relations are Z ∞ 2 ω Im(n2 (ω)) Re(n2 (ω0 )) = 1 + P dω (5.307) π ω 2 − ω02 0 and Im(n2 (ω0 )) = −
2ω0 P π
Z 0
∞
Re(n2 (ω)) − 1 dω. ω 2 − ω02
(5.308)
What Kramers and Kronig actually wrote was slightly different from the dispersion relations (5.307 & 5.308). H. A. Lorentz had shown that the index of refraction n(ω) is related to the forward scattering amplitude f (ω) for the scattering of light by a density N of scatterers (Sakurai, 1982) 2πc2 N f (ω). (5.309) ω2 They used this formula to infer that the real part of the index of refraction approached unity in the limit of infinite frequency and applied the Hilbert transform (5.302) Z ∞ 0 2 ω Im[n(ω 0 )] 0 Re[n(ω)] = 1 + P dω . (5.310) π ω 02 − ω 2 0 n(ω) = 1 +
The Lorentz relation (5.309) expresses the imaginary part Im[n(ω)] of the index of refraction in terms of the imaginary part of the forward scattering amplitude f (ω) Im[n(ω)] = 2π(c/ω)2 N Im[f (ω)].
(5.311)
226
Complex-Variable Theory
And the optical theorem relates that quantity to the total cross-section σtot =
4πc 4π Im[f (ω)] = Im[f (ω)]. |k| ω
(5.312)
Thus we have Im[n(ω)] = cN σtot /(2ω), and by the Lorentz relation (5.309) Re[n(ω)] = 1 + 2π(c/ω)2 N Re[f (ω)]. Insertion of these formulas into the Kramers-Kronig integral (5.310) gives a dispersion relation for the real part of the forward scattering amplitude f (ω) in terms of the total cross-section σtot (ω) Z ∞ ω2 σtot (ω 0 ) Re[f (ω)] = 2 dω 0 . (5.313) 2π c 0 ω 02 − ω 2 Dispersion relations have many applications, but they can be overemphasized. In the 1960’s, many otherwise-bright theoretical physicists abandoned quantum field theory and tried to explain the strong interactions in terms of causality and the analyticity of scattering amplitudes. It was a wild-goose chase.
5.19 Conformal Mapping Any analytic function f (z) maps curves in the z plane into curves in the f (z) plane. In general, this mapping preserves angles. To see why, consider the angle dθ between two tiny complex lines dz = exp(iθ) and dz 0 = exp(iθ0 ) that radiate from the same point z. This angle dθ = θ0 − θ is the argument of the phase of the ratio 0
dz 0 eiθ 0 = iθ = ei(θ −θ) = exp(i(θ0 − θ)). dz e
(5.314)
Let’s use w = ρeiφ for f (z). Then the analytic function f (z) maps dz into dw = f (z + dz) − f (z) ≈ f 0 (z) dz
(5.315)
dw0 = f (z + dz 0 ) − f (z) ≈ f 0 (z) dz 0 .
(5.316)
and dz 0 into
The angle dφ = φ0 − φ between dw and dw0 is the argument of the phase of the ratio 0
0
eiφ f 0 (z) dz 0 dz 0 eiθ dw0 = iφ = 0 = = iθ . dw f (z) dz dz e e
(5.317)
5.20 The Method of Steepest Descent
227
So as long as the derivative f 0 (z) does not vanish, the angle in the w-plane is the same as the angle in the z-plane dφ = dθ.
(5.318)
Analytic functions preserve angles. They are conformal maps. 00 00 What if f 0 (z) = 0? In this case, dw ≈ f (z) dz 2 /2 and dw0 ≈ f (z) dz 02 /2, and so the angle dφ = dφ0 − dφ between these two tiny complex lines is the argument of the phase of the ratio 0
00
f (z) dz 02 dz 02 dw0 eiφ = = iφ = 00 = exp(2i(θ0 − θ)). dw dz 2 e f (z) dz 2
(5.319)
So angles are doubled, dφ = 2dθ. In general, if the first non-zero derivative is f (n) (z), then 0
eiφ f (n) (z) dz 0n dz 0n dw0 = iφ = (n) = = exp(ni(θ0 − θ)) dw dz n e f (z) dz n
(5.320)
and so dφ = ndθ. The angles increase by a factor of n. Example: The function f (z) = cz n has only one non-zero derivative at the origin z = 0 f (k) (0) = c n! δnk
(5.321)
so at z = 0 the conformal map z → cz n stretches angles by n, dφ = n dθ.
5.20 The Method of Steepest Descent Suppose we have an integral involving two analytic functions h(z) and f (z) Z I(x) =
b
dz h(z) exp(xf (z))
(5.322)
a
that we want to approximate in the limit of large positive values of the real variable x. We assume that the functions h(z) and f (z) are analytic in a simply connected region around the complex points a and b, and so we may vary the contour from a to b, keeping the end-points fixed, without changing the value of the contour integral. In the limit x → ∞, the integral I(x) is dominated by the exponential. So the key factor is the real part u of f (z) = u(z) + iv(z). But since f (z) is analytic, by Eq.(5.68) its real and imaginary parts are harmonic functions and so have no minima or maxima, only saddle points. Suppose that the real part u(z) of f (z) has only one saddle point between
228
Complex-Variable Theory
the points a and b. (If it has more than one, then we must repeat the computation that follows.) If w is the saddle point, then f 0 (w) = 0
(5.323)
and near w the function f (z) is approximately 1 00 f (z) ≈ f (w) + (z − w)2 f (w). 2
(5.324)
Let’s write the second derivative as 00
f (w) = ρ eiφ .
(5.325)
We choose our contour through the saddle point w to be a straight line near w z = w + y eiθ
(5.326)
with θ fixed. As we vary y along this line, we want 00
(z − w)2 f (w) = y 2 ρ e2iθ eiφ < 0
(5.327)
2θ + φ = π.
(5.328)
or
In these terms, f (z) ≈ f (w) −
1 2 ρy . 2
(5.329)
Since z = w + y eiθ , the differential dz is dz = exp(iθ) dy. So the integral I(x) is approximately Z ∞ 1 2 00 I(x) ≈ h(w) exp x f (w) + (z − w) f (w) dz 2 −∞ Z ∞ = h(w) eiθ exf (w) exp − 21 xρy 2 dy −∞ r 2π = h(w) eiθ exf (w) . (5.330) xρ We may resolve the phase exp(iθ) by moving it inside the square-root r 2π xf (w) I(x) ≈ h(w) e (5.331) xρ e−2iθ and using Eqs.(5.325 & 5.328) to show that 00
ρ e−2iθ = ρ eiφ−iπ = −ρ eiφ = −f (w).
(5.332)
229
5.21 Phase and Group Velocities
Our formula for the saddle-point integral (5.322) then is 1/2 2π I(x) ≈ h(w) exp (x f (w)) −xf 00 (w)
(5.333)
which echoes (5.181). If there are N saddle points wj for j = 1 . . . N , then we form the sum 1/2 N X 2π I(x) ≈ h(wj ) exp (x f (wj )) . (5.334) −xf 00 (wj ) j=1
5.21 Phase and Group Velocities Suppose A(x, t) is the amplitude Z Z i(p·x−Et)/~ 3 A(x, t) = e A(p) d p = ei(k·x−ωt) B(k) d3 k
(5.335)
where B(k) = ~A(~k) varies slowly compared to the phase exp[i(k · x − ωt)]. The phase velocity v p is the linear relation x = v p t between x and t that keeps the phase φ = p · x − Et constant, that is 0 = p · dx − E dt = (p · v p − E) dt
⇐⇒
vp =
ωˆ E pˆ = k p k
(5.336)
ˆ in which p = |p|, and k = |k|. For light in the vacuum, v p = c = (ω/k) k. The group velocity v g is the linear relation x = v g t between x and t that maximizes the amplitude A(x, t) by keeping the phase φ = p · x − Et constant as a function of the momentum p ∇p (px − Et) = x − ∇p E(p) t = 0.
(5.337)
This condition of stationary phase gives the group velocity as v g = ∇p E(p) = ∇k ω(k).
(5.338)
If E = p2 /(2m), then v g = p/m. When light traverses a medium with a complex index of refraction n(k), the wave vector k becomes complex, and its (positive) imaginary part represents the scattering of photons from forward direction of the beam, typically by the electrons of the medium. For simplicity, we’ll consider the propagation of light through a medium in one dimension, that of the forward direction of the beam. In this case, the (real) frequency ω(k) is related to the (complex) wave number k by k = n(k) ω(k)/c
(5.339)
230
Complex-Variable Theory
and the phase velocity of the light is vp =
c ω = . Re(k) Re(n(k))
(5.340)
If we regard the index of refraction as a function of the frequency ω, instead of the wave-number k, then by differentiating the real part of the relation ωn(ω) = ck
(5.341)
with respect to ω, we find nr (ω) + ω
dnr (ω) dkr =c dω dω
(5.342)
in which the subscript r means real part. Thus the group velocity (5.338) of the light is vg =
c dω = . dkr nr (ω) + ω dnr /dω
(5.343)
Optical physicists call the denominator of this formula the group index of refraction dnr (ω) (5.344) ng (ω) = nr (ω) + ω dω so that in harmony with the expression (5.340) for the phase velocity vp , the group velocity is vg =
c c = ng (ω) nr (ω) + ω
dnr (ω) dω
.
(5.345)
In some media, the derivative dnr /dω is large and positive, and the group velocity vg of light there can be much less than c—as slow as 30 m/s! This effect is called slow light. In certain other media, the derivative dn/dω is so negative that the group index of refraction ng (ω) is less than unity, and in them the group velocity vg exceeds c ! This effect is called fast light. In some media, the derivative dnr /dω is so negative that dnr /dω < −nr (ω)/ω, and then ng (ω) is not only less than unity but also less than zero. In such a medium, the group velocity vg of light is negative! This effect is called backwards light. None of these effects violates causality or special relativity. Slow, fast, and backwards light can occur when the frequency ω of the light is near a peak or resonance in the total cross-section σtot for the scattering of light by the atoms of the medium. To see why, recall that the
5.21 Phase and Group Velocities
231
index of refraction n(ω) is related to the forward scattering amplitude f (ω) and the density N of scatterers by the formula (5.309) 2πc2 N f (ω) (5.346) ω2 and that the real part of the forward scattering amplitude is given by the Kramers-Kronig integral (5.313) of the total cross-section Z ∞ σtot (ω 0 ) dω 0 ω2 Re(f (ω)) = 2 . (5.347) 2π c 0 ω 02 − ω 2 n(ω) = 1 +
In the idealized case in which σtot (ω) = σ0 δ(ω/ω0 −1), this integral reduces to Re(f (ω)) =
σ0 ω0 ω 2 2π 2 c ω02 − ω 2
(5.348)
so that the real part of the index of refraction is 2πc2 2πc2 σ 0 ω0 ω 2 N Re(f (ω)) = 1 + N ω2 ω2 2π 2 c ω02 − ω 2 cN σ0 ω0 =1+ . 2 π ω0 − ω 2
nr (ω) = 1 +
(5.349)
The derivative of the real part of n(ω) then is dnr (ω) cN σ0 2ω0 ω = dω π (ω02 − ω 2 )2 so that by (5.345) the group velocity is −1 cN σ0 ω0 (ω02 + ω 2 ) vg = c 1 + π (ω02 − ω 2 )2
(5.350)
(5.351)
which implies slow light at frequencies near the resonance at ω0 . A more realistic form for nr (ω) nr (ω) = 1 +
cN σ0 ω0 π (ω − ω0 )2 + Γ2
has as its derivative cN σ0 2ω0 (ω0 − ω) dnr (ω) = dω π [(ω − ω0 )2 + Γ2 ]2 so that by (5.345) the group velocity is −1 ω02 − ω 2 + Γ2 cN σ0 ω0 vg = c 1 + . π [(ω − ω0 )2 + Γ2 ]2
(5.352)
(5.353)
(5.354)
232
Complex-Variable Theory
Now the group velocity vg is less than c whenever ω 2 < ω02 + Γ2 and much < ω . But we get fast less than c when ω ≈ ω0 . So we get slow light for ω ∼ 0 2 2 2 > ω light, vg > c, if ω > ω0 + Γ , and even backwards light, vg < 0, if ω ∼ 0 with Γ2 /ω02 tiny and N σ0 huge. Robert W. Boyd’s papers explain how to make slow and fast light (Bigelow et al., 2003) and backwards light (Gehring et al., 2006). 5.22 Applications to String Theory String theory may or may not have anything to do with physics, but it does provide many amusing applications of complex-variable theory. The coordinates σ and τ of the world sheet of a string form the complex variable z = e2(τ −iσ) .
(5.355)
The product of one operator T (z) with another operator U (w) often has poles in z − w as z → w and is well defined only if they are radially ordered, that is T (z) U (w)
if |z| > |w| and U (w) T (z)
if |w| > |z|.
(5.356)
Since by (5.355) the modulus |z| depends only upon τ , which is a temporal variable, radial order is like time order τz > τw . The modes Ln of the principal component of the energy-momentum tensor T (z) are defined by its Laurent series T (z) =
∞ X
Ln z n+2 n=−∞
(5.357)
and the inverse relation 1 Ln = 2πi
I
z n+1 T (z) dz.
Thus the commutator of two modes involves two loop integrals I I 1 1 z m+1 T (z) dz, wn+1 T (w) dw [Lm , Ln ] = 2πi 2πi
(5.358)
(5.359)
which we may deform as long as we cross no poles. Let’s hold w fixed and deform the z loop so as to keep the T ’s radially ordered when z is near w, as in the drawing (5.10). The operator-product expansion of the radially ordered product R{T (z)T (w)} is R{T (z)T (w)} =
c/2 2 1 + T (w) + T 0 (w) + . . . (z − w)4 (z − w)2 z−w
(5.360)
5.22 Applications to String Theory
233
Figure 5.10 The two counterclockwise circles about the origin preserve radial order when z is near w by veering slightly to |z| > |w| for the product T (z)T (w) and to |w| > |z| for the product T (w)T (z).
in which the prime means derivative, c is a constant, and the dots denote terms that are analytic in z and w. The commutator introduces a minus sign that cancels most of the two contour integrals and converts what remains into a tiny circle Cw about the point w as in Fig. 5.10 I I dw n+1 dz m+1 c/2 2T (w) T 0 (w) [Lm , Ln ] = w z + + . 2πi (z − w)4 (z − w)2 z−w Cw 2πi (5.361) After doing the z-integral, which is left as a homework problem (15), one may use the Laurent series (5.357) for T (w) to do the w-integral, which
234
Complex-Variable Theory
one may choose to be along a tiny circle about w = 0, and so find the commutator c m(m2 − 1) δm+n,0 (5.362) [Lm , Ln ] = (m − n) Lm+n + 12 of the Virasoro algebra.
5.23 Problems 1. Derive the two integral representations (5.42) for Bessel’s functions Jn (t) of the first kind from the integral formula (5.41). Hint: Think of the integral (5.41) as running from −π to π. 2. Do the integral I C
dz −1
z2
in which the contour C is ccw about the circle |z| = 2. 3. Let P (z) be the polynomial P (z) = (z − a1 )(z − a2 )(z − a3 )
(5.363)
with roots a1 , a2 , and a3 . Let R be the maximum of the three moduli |ak |. (a) If the three roots are all different, evaluate the integral I dz I= (5.364) C P (z) along the ccw contour z = 2Reiθ for 0 ≤ θ ≤ 2π. (b) Same problem, but for a1 = a2 6= a3 . 4. Derive the first three non-zero terms of the Laurent series for f (z) = 1/(ez − 1) about z = 0. 5. Assume that a function g(z) is meromorphic in R and has a Laurent series (5.120) about a point w ∈ R. Show that as z → w, the ratio g 0 (z)/g(z) becomes (5.118). 6. Derive (5.145) from (5.144). 7. Show that if the real part of w is negative, then for arbitrary complex z r Z ∞ π w(x+z)2 e dx = . (5.365) −w −∞
235
5.23 Problems
8. Use a ghost contour to evaluate the integral Z ∞ x sin x dx. 2 + a2 x −∞ Show your work; do not just quote the result of a commercial math program. 9. Show that the quarter-circles of the contours C± contribute f (n2 )) to the sum S in Abel-Plana example 5.8.
1 2 (f (n1 )
+
10. Derive (5.257) from (5.256). 11. Derive (5.259) from (5.258). 12. Show that the function (5.275) is analytic in the right half-plane. 13. The Bessel function Jn (x) is given by the integral I dz 1 e(x/2)(z+1/z) n+1 Jn (x) = 2πi C z
(5.366)
along a ccw contour about the origin. Find the generating function for these Bessel functions, that is, the function G(x, z) whose Laurent series has the Jn (x)’s as coefficients G(x, z) =
∞ X
Jn (x) z n .
(5.367)
n=−∞
14. Show that the Heaviside function θ(y) = (y + |y|)/(2|y|) is given by the integral Z ∞ 1 dx θ(y) = eiyx (5.368) 2πi −∞ x − i in which is an infinitesimal positive number. 15. (a) Perform the z-integral in Eq.(5.361). (b) Use the result of the part (a) to find the commutator [Lm , Ln ] of the Virasoro algebra. Hint: use the Laurent series (5.357). 16. Suppose that (z) is analytic in a disk that contains a tiny circular contour Cw about the point w as in drawing Fig 5.10. Do the contour integral I c/2 2T (w) T 0 (w) dz (5.369) (z) + + (z − w)4 (z − w)2 z − w 2πi Cw and express your result in terms of (w), T (w), and their derivatives.
6 Differential Equations
There are many kinds of differential equations — linear and nonlinear, ordinary and partial, homogeneous and inhomogeneous. Any way of correctly solving any of them is legitimate. The subject is therefore intrinsically varied and eclectic. We start with some definitions. 6.1 Ordinary Linear Differential Equations An operator of the form Lx =
N X m=0
hm (x)
dm . dxm
(6.1)
is an N th-order, ordinary, linear, differential operator. It is N thorder because the highest derivative is dN /dxN . It is ordinary because all the derivatives are with respect to the same independent variable x. And it is linear because derivatives are linear operators Lx [a1 f1 (x) + a2 f2 (x)] = a1 Lx f1 (x) + a2 Lx f2 (x).
(6.2)
If, in addition, the functions hm (x) in equation (6.1) are all constants, independent of x, then that equation is an N th-order, ordinary, linear, differential operator with constant coefficients. Example: The operator d2 + k2 dx2 is a 2d-order, linear, differential operator with constant coefficients. A differential equation of the form Lx =
Lx f (x) = 0
(6.3)
(6.4)
6.1 Ordinary Linear Differential Equations
237
in which Lx is an N th-order, ordinary, linear, differential operator is a homogeneous N th-order, linear, ordinary, differential equation. It is homogeneous because each of its terms is linear in f or one of its derivatives f (m) —there is no term that is not proportional to f or one of its derivatives. The equation Lx f (x) = g(x)
(6.5)
is an inhomogeneous N th-order, linear, ordinary, differential equation. If a differential equation is linear and homogeneous, then we can add solutions. Suppose f1 (x) and f2 (x) are two solutions of the linear, homogeneous differential equation (6.4) Lx f1 (x) = 0 Lx f2 (x) = 0.
(6.6)
Then any linear combination of these solutions f (x) = a1 f1 (x) + a2 f2 (x)
(6.7)
in which a1 and a2 are numbers, is also a solution since Lx f (x) = Lx [a1 f1 (x) + a2 f2 (x)] = a1 Lx f1 (x) + a2 Lx f2 (x) = 0.
(6.8)
This additivity of solutions often makes it possible to find general solutions of linear homogeneous differential equations. For instance, two solutions of the 2d-order linear, homogeneous ODE 2 d 2 + k f (x) = 0 (6.9) dx2 are sin kx and cos kx or equivalently exp(±ikx), and so any linear combination f (x) = a1 cos kx + a2 sin kx
(6.10)
is also a solution. If the only numbers k1 . . . kN for which the linear combination of N functions 0 = k1 y1 (x) + k2 y2 (x) + · · · + kN yN (x)
(6.11)
vanishes for all x are k1 = 0, k2 = 0, . . . kN = 0, then the functions y1 , . . . , yN are linearly independent. Otherwise, they are linearly dependent. Suppose an ordinary homogeneous linear differential equation Lx f (x) = 0
(6.12)
has N linearly independent solutions fj and that all other solutions can be
238
Differential Equations
expressed in terms of these N solutions. Then these linearly independent solutions form a complete basis for the space of solutions of equation (6.12), and the general solution is a linear combination of the fj ’s f (x) =
N X
aj fj (x)
(6.13)
j=1
in which the aj are arbitrary constants. With an added source term s(x), the differential equation (6.12) becomes an inhomogeneous ordinary linear differential equation Lx f i (x) = s(x).
(6.14)
If f1i (x) and f2i (x) are any two solutions of this inhomogeneous differential equation, then their difference f1i (x) − f2i (x) is a solution of the associated homogeneous equation (6.12) Lx f1i (x) − f2i (x) = Lx f1i (x) − Lx f2i (x) = s(x) − s(x) = 0. (6.15) Thus this difference must be given by the general solution (6.13) of (6.12) for some set of coefficients aj f1i (x)
−
f2i (x)
=
N X
aj fj (x).
(6.16)
j=1
It follows therefore that every solution f1 (x) of the inhomogeneous differential equation (6.14) is the sum of another solution f2 (x) of that equation and some solution of the associated homogeneous equation (6.12) f1i (x) = f2i (x) +
N X
aj fj (x).
(6.17)
j=1
In other words, the general solution of an inhomogeneous equation is a particular solution of that equation plus the general solution of the associated homogeneous equation. Notation: Often one uses primes or dots to denote derivatives: d2 f (x) df (x) and f 00 ≡ dx dx2 df (t) d2 f (t) f˙(t) ≡ and f¨(t) ≡ . dt dt2 For higher derivatives, one sometimes uses superscripts, as in f 0 (x) ≡
f (k) (x) ≡
dk f (x) . dxk
(6.18)
(6.19)
6.2 Linear Partial Differential Equations
239
Some use subscripts for partial derivatives, as in fx (x, y) ≡
∂f (x, y) ∂x
and fx,yy (x, y) ≡
∂ 3 f (x, y) . ∂x∂y 2
(6.20)
A nonlinear differential equation is one in which n a power of the unknown function f n (x) or one of its derivatives f (k) (x) other than n = 1 or n = 0 appears. For instance, the equations f 00 (x) = f 3 (x) f 0 (x)
2
+ f (x) = 0
(6.21)
are nonlinear differential equations. In general, we can not add two solutions of a nonlinear equation and expect to get a third solution.
6.2 Linear Partial Differential Equations An operator of the form Lx1 ,...,xk =
N1X ,...,Nk m1 ,...,mk =0
gm1 ,...,mk (x1 , . . . , xk )
∂ m1 +···+mk mk 1 ∂xm 1 . . . ∂xk
(6.22)
is a linear partial differential operator of order N = N1 + . . . Nk in k variables. The equation Lx1 ,...,xk f (x1 , . . . , xk ) = 0
(6.23)
is a linear homogeneous partial differential equation of order N = N1 + . . . Nk in k variables. That is, a partial differential equation is a (whole) differential equation that has partial derivatives. Example: The equation 2 ∂2 ∂2 ∂ Lxy f (x, y) = + + f (x, y) = 0 (6.24) ∂x2 ∂x∂y ∂y 2 is a 2d-order linear homogeneous partial differential equation. Linear combinations of solutions of a linear homogeneous partial differential equation (6.23) are also solutions of that equation. Thus if f1 and f2 are solutions of (6.24), then so is f = a1 f1 + a2 f2 Lxy f (x, y) = a1 Lxy f1 (x, y) + a2 Lxy f2 (x, y) = 0.
(6.25)
Additivity of solutions is a property of all linear homogeneous differential equations, whether ordinary or partial.
240
Differential Equations
The general solution f of a linear homogeneous partial differential equation (6.23) is a sum over a complete set of solutions fi of that equation with arbitrary coefficients ai X f (x1 . . . xk ) = ai fi (x1 . . . xk ). (6.26) i
A linear partial differential equation with a source term s(x1 , . . . , xk ) Lx1 ,...,xk f (x1 , . . . , xk ) = s(x1 , . . . , xk )
(6.27)
is an inhomogeneous linear partial differential equations because of the added source term. Just as with ordinary differential equations, the difference of two solutions of the inhomogeneous linear partial differential equation (6.27) is a solution of the associated homogeneous equation (6.23) Lx1 ,...,xk [f1 (x1 , . . . , xk ) − f2 (x1 , . . . , xk )] = 0.
(6.28)
Thus the general solution of the inhomogeneous linear partial differential equation (6.27) is the sum of a particular solution of that equation and the general solution (6.26) of the associated homogeneous equation (6.23) X f1 (x1 . . . xk ) = f2 (x1 . . . xk ) + ai fi (x1 . . . xk ). (6.29) i
6.3 Separable Partial Differential Equations A linear partial differential equation (PDE) is separable if it can be decomposed into ordinary differential equations (ODEs). One then finds solutions to the ODEs and thus to the original PDE. The general solution to the PDE is then a sum over all of its linearly independent solutions with arbitrary coefficients. Sometimes the separability of a differential operator or of a differential equation depends upon the choice of coordinates. Example—The Helmholtz equation: In several coordinate systems, one can convert the linear homogeneous partial differential equation −∇ · ∇f (r) = k 2 f (r)
(6.30)
known as Helmholtz’s equation, into ordinary differential equations by writing the function f (r) as a product of functions of a single variable. In two dimensions and in rectangular coordinates r = (x, y), the function f (x, y) = X(x)Y (y) is a solution of the Helmholtz equation as long as
6.3 Separable Partial Differential Equations
241
X and Y satisfy −
d2 X(x) = kx2 X(x) dx2
and
−
d2 Y (y) = ky2 Y (y) dy 2
(6.31)
with kx2 +ky2 = k 2 . The general solution to these ODEs is given by Eq. (6.10). In polar coordinates (r, θ), the laplacian is (11.261) ∇ · ∇f = 4f =
∂2f 1 ∂f 1 ∂2f + + ∂r2 r ∂r r2 ∂θ2
(6.32)
and Helmholtz’s equation − 4f = k 2 f is again separable. We let f (r, θ) = R(r) Θ(θ) and get − R00 Θ + r−1 R0 Θ + r−2 RΘ00 = k 2 R Θ. (6.33) Multiplying both sides by r2 /R Θ, we find R00 R0 Θ00 − r − r2 k2 = = −λ. R R Θ Thus if Θ is a linear combination of sines and cosines √ √ Θ(θ) = a sin( λ θ) + b cos( λ θ) − r2
(6.34)
(6.35)
and if function R satisfies Bessel’s equation r2 R00 + rR0 + r2 k 2 R = λR
(6.36)
(Friedrich Bessel, 1784–1846), then the function f = RΘ is a solution to Helmholtz’s equation − 4f = k 2 f . The eigenvalue λ must be an integer if Θ is to be single valued on the interval [0, 2π]. In three dimensions and in rectangular coordinates r = (x, y, z), the function f (x, y, z) = X(x)Y (y)Z(z) is a solution of the Helmholtz equation as long as X, Y , and Z satisfy d2 Z(z) = kz2 Z(z) dz 2 (6.37) 2 2 2 2 with kx + ky + kz = k . The general solution to these ODEs is given by Eq. (6.10). In cylindrical coordinates r = (ρ, φ, z), the laplacian is (11.263) 1 1 ∇ · ∇f = 4f = (ρ f,ρ ),ρ + f,φφ + ρ f,zz . (6.38) ρ ρ −
d2 X(x) = kx2 X(x), dx2
−
d2 Y (y) = ky2 Y (y), dy 2
and
−
Thus the function f (ρ, φ, z) = B(ρ)Φ(φ)Z(z)
(6.39)
242
Differential Equations
will obey Helmholtz’s equation −4f = α2 f if B satisfies Bessel’s equation d ρ dρ
dB ρ + (α2 + k 2 )ρ2 − n2 B = 0 dρ
(6.40)
and Φ and Z satisfy −
d2 Φ = n2 Φ(φ) dφ2
and
d2 Z = k 2 Z(z). dz 2
(6.41)
For by substituting (6.39) for f into (6.38) and using (6.41), we get 1 BZ ∇ · ∇f = ΦZ (ρ B,ρ ),ρ + Φ,φφ + ρBΦ Z,zz ρ ρ 1 2 BΦZ 2 = ΦZ (ρ B,ρ ),ρ − n + k ρBΦ Z ρ ρ n2 1 2 (ρ B,ρ ),ρ + (k − 2 )B . (6.42) = ΦZ ρ ρ Thus by using Bessel’s equation (6.40), we find that f = BΦZ satisfies the Helmholtz equation (6.30) i ΦZ h ∇ · ∇ + α2 BΦZ = 2 ρ (ρ B,ρ ),ρ + (α2 + k 2 )ρ2 − n2 B = 0. (6.43) ρ The solution f is f (ρ, φ, z) = Jn (
p α2 + k 2 ρ)e±inφ e±kz
(6.44)
where n must be an integer if the solution is to apply to the full range of φ from 0 to 2π. The case in which α = 0 corresponds to Laplace’s equation with solution f (ρ, φ, z) = Jn (kρ)e±inφ e±kz .
(6.45)
In spherical coordinates, the laplacian is given by the formula (11.266) of Sec. 11.40 as r2 f,r ,r (sin θf,θ ),θ f,φφ 4f = + + 2 2 . (6.46) 2 2 r r sin θ r sin θ The function f (r, θ, φ) f (r, θ, φ) = R(r)Θ(θ)Φ(φ)
(6.47)
will be a solution of the Helmholtz equation (6.30) if the radial function R satisfies Bessel’s equation for spherical coordinates 1 d R 2 dR r + k 2 R − `(` + 1) 2 = 0 (6.48) r2 dr dr r
6.4 Wave Equations
243
or r2 R00 + 2rR0 + k 2 r2 − `(` + 1) R = 0 and if Θ satisfies the associated Legendre equation dΘ m2 1 d sin θ + `(` + 1) − Θ=0 sin θ dθ dθ sin2 θ
(6.49)
(6.50)
and if Φ obeys the equation d2 Φ = −m2 Φ dφ2
(6.51)
in which ` and m are integers, satisfying −` ≤ m ≤ ` ≥ 0. In three dimensions, Helmholtz’s equation separates in 11 standard coordinate systems (Morse and Feshbach, 1953, pp. 655–664).
6.4 Wave Equations Some of the most important linear homogeneous partial differential equations are easy to solve. The Klein-Gordon equation, for example, is the wave equation ∂2 2 2 ij − m A(x) = 0 (6.52) 2 − m A(x) ≡ η ∂xi ∂xj in natural units with ~ = c = 1. We can solve this linear homogeneous partial differential equation by choosing A(x) to depend only upon the specific linear combination px = pi xi A(x) = B(px) = B(pi xi ).
(6.53)
∂ ∂ A(x) = B(pi xi ) = pi B 0 (pi xi ) ∂xi ∂xi
(6.54)
For then
and so η ij
∂2 ∂2 2 ij 2 − m A(x) = η − m B(px) ∂xi ∂xj ∂xi ∂xj = η ij pi pj B 00 (pi xi ) − m2 B(px) = p2 B 00 (pi xi ) − m2 B(px) = 0.
(6.55)
So if B is a function like sin px or cos px or exp(±ipx) that satisfies B 00 = −B
(6.56)
244
Differential Equations
then A(x) = B(pi xi ) will solve the wave equation (6.52) as long as the momentum p satisfies p2 + m2 = 0
(6.57)
which is the mass-shell condition (11.60) for a particle of mass m. Example—Field of a Spinless Boson: A spinless boson of mass m is described by a quantum field of the form Z h i d3 p p φ(x) = a(p)eipx + a† (p)e−ipx (6.58) 2p0 (2π)3 p which satisfies the wave equation (6.52) because p0 = p2 + m2 . The † annihilation and creation operators a(p) and a (p) respectively represent the annihilation and the creation of the bosons and obey the commutation relations [a(p), a† (p0 )] = δ 3 (p − p0 )
(6.59)
in units with ~ = c = 1. Incidentally, these relations imply that the field ˙ φ(x) and its time derivative φ(y) satisfy the canonical, equal-time commutation relation ˙ [φ(x), φ(y)] 0 0 = i δ 3 (x − y). (6.60) x =y
Example—The Field of the Photon: Similarly, in the Coulomb or radiation gauge ∇ · A(x) = 0
(6.61)
the field A0 is a function of the charge density and the vector potential A in the absence of charges and currents satisfies the wave equation 2A(x) = 0.
(6.62)
Unlike the wave equation (6.52) for a massive scalar field, this equation is a vector equation without a mass term because the photon is a spin-one, massless particle. The radiation-gauge electromagnetic field in vacuum is 2 Z X
h i d3 p p e(p, s) a(p, s) eipx + e∗ (p, s) a† (p, s) e−ipx 2p0 (2π)3 s=1 (6.63) in which the sum is over the two possible polarizations s; the energy p0 A(x) =
6.4 Wave Equations
245
is equal to the modulus |p| of the momentum, so that p2 = 0; and the polarization vectors e(p, s) are perpendicular to the momentum p p · e(p, s) = 0
(6.64)
so as to respect the gauge condition (6.61). The annihilation and creation operators obey the commutation relation [a(p, s), a† (p0 , s0 )] = δ 3 (p − p0 ) δs,s0 .
(6.65)
Example—The Dirac Equation: Fields χc (x) that describe particles of spin one-half have four components, c = 1 . . . 4. Apart from interactions, they satisfy the Dirac equation which has the explicit form a (γbc ∂a + mδbc ) χc (x) = 0
(6.66)
in which repeated indices are summed over — b, c from 1 to 4 and a from 0 to 3. In matrix notation, the Dirac equation is (γ a ∂a + m) χ(x) = 0.
(6.67)
The Dirac gamma matrices are defined by the 16 rules {γ a , γ b } = 2η ab
(6.68)
in which η is the flat space-time metric, η = diag(−1, 1, 1, 1). Suppose now that φ(x) is any four-component field that satisfies the KleinGordon equation (6.52) (6.69) m2 − η ab ∂a ∂b φ(x) = 0. Then the field (Cahill and Cahill, 2006) χ(x) = (m − γ a ∂a )φ(x) will satisfy the Dirac equation (6.67) because (γ a ∂a + m) χ(x) = (γ a ∂a + m) m − γ b ∂b φ(x) = m2 − γ a γ b ∂a ∂b φ(x) = m2 − 21 ({γ a , γ b } + [γ a , γ b ])∂a ∂b φ(x) = m2 − η ab ∂a ∂b φ(x) = 0.
(6.70)
(6.71)
The simplest of these fields, the Majorana field, describes a single particle, usually thought of as neutral Z i d3 p X h ipx † −ipx u (p, s) a(p, s)e + v (p, s) a (p, s)e (6.72) χb (x) = b b (2π)3/2 s
246
Differential Equations
p in which p0 = p2 + m2 , s labels the two spin states, and the operators a and a† obey the anti-commutation relations {a(p, s), a† (p0 , s0 )} ≡ a(p, s) a† (p0 , s0 ) + a† (p0 , s0 ) a(p, s) = δss0 δ(p − p0 ). (6.73) 2 The field satisfies the Klein-Gordon equation (6.52) because p = −m2 . If two particles of the same mass are described by two Majorana fields χ1 and χ2 , then one may represent them by a single Dirac field 1 ψ(x) = √ [χ1 (x) + iχ2 (x)] . 2 Quarks and leptons are described by Dirac fields.
(6.74)
6.5 First-Order Differential Equations The equation dy P (x, y) = f (x, y) = − dx Q(x, y)
(6.75)
P (x, y) dx + Q(x, y) dy = 0
(6.76)
or “system” is a first-order ordinary differential equation. 6.6 Separable First-Order Differential Equations If in the first-order ordinary differential equation (6.76) one can separate the dependent variable y from the independent variable x F (x) dx + G(y) dy = 0
(6.77)
then the equation (6.76) is said to be separable and its daughter (6.77) is said to be separated. Once the variables are separated, one can integrate and so obtain an equation, called the general integral Z x Z y 0= F (x0 ) dx0 + G(y 0 ) dy 0 (6.78) x0
y0
relating y to x and providing a solution y(x) of the differential equation. Example: Zipf ’s Law (Gell-Mann, 1994, 2008, pp.92–100) is one of the relations discovered in 1913 by Auerbach who noticed that many quantities are distributed as dx dn = −a k+1 (6.79) x
6.6 Separable First-Order Differential Equations
247
an ODE that is separable and separated. For k 6= 0, we may integrate this to a (6.80) n+c= k xk or 1/k a x= (6.81) k(n + c) in which c is a constant. The case k = 1 occurs frequently x=
a n+c
(6.82)
and is called Zipf ’s law. With c = 0, it applies approximately to the populations of cities: if the largest city (n = 1) has population x, then the populations of the second, third, fourth cities (n = 2, 3, 4) will be x/2, x/3, and x/4. Again with c = 0, Zipf’s law applies to the occurrence of numbers x in a table of some sort. The rank of the number x is approximately a (6.83) n= . x So the number of numbers that occur with first digit D and, say, 4 trailing digits will be 1 1 n(D0000) − n(D9999) = a − D0000 D9999 9999 =a D0000 × D9999 104 a 10−4 ≈a = . (6.84) D(D + 1) 108 D(D + 1) So the ratio of the number of numbers with first digit D to the number with first digit D0 is D0 (D0 + 1) . (6.85) D(D + 1) For example, the first digit is more likely to be 1 than 9 by a factor of 45. The German government uses this formula (6.85) to catch tax cheaters. Example: The logistic equation (Gell-Mann, 2008) y dy = ay 1 − (6.86) dt Y is both separable and separated. It describes a wide range of phenomena
248
Differential Equations
whose evolution with time t is sigmoidal such as the cumulative number of casualties in a war, the cumulative number of deaths in London’s great plague, and the cumulative number of papers in an academic’s career. It also describes the effect y on an animal of a given dose t of a drug. With f = y/Y , the logistic equation (6.86) is df = af (1 − f (t)) dt
(6.87)
or a dt =
df df df = + f (1 − f ) f 1−f
(6.88)
which we may integrate to a(t − th ) = ln
f 1−f
(6.89)
Exponentiating ea(t−th ) =
f 1−f
(6.90)
and solving for f , we find f (t) = Example 6.1 (Lattice QCD)
ea(t−th ) . 1 + ea(t−th )
(6.91)
In lattice field theory, the beta-function
dg (6.92) d ln a tells us how the coupling constant g depends upon the lattice spacing a. The beta-function of quantum chromodynamics is β(g) ≡ −
β(g) = −β0 g 3 − β1 g 5
(6.93)
where 1 2 β0 = 11 − nf (4π)2 3 8 1 β1 = 102 − 10 n − n f f (4π)4 3
(6.94)
in which nf is the number of light quark flavors. Combining the definition (6.92) of the β-function with its expansion (6.93) for small g, one arrives at the differential equation dg = β0 g 3 + β1 g 5 d ln a
(6.95)
249
6.7 Hidden Separability
which one may integrate Z Z d ln a = ln a + c =
1 β1 dg =− + 2 ln 3 5 2 β0 g + β1 g 2β0 g 2β0
β0 + β1 g 2 g2 (6.96)
to find a(g) = d
β0 + β1 g 2 g2
β1 /2β02
e−1/2β0 g
2
(6.97)
in which d is a constant of integration. The term β1 g 2 is of higher order in g, and if one drops it and absorbs a factor of β02 into a new constant of integration Λ, then one gets a(g) =
−β1 /2β02 −1/2β0 g2 1 β0 g 2 e . Λ
(6.98)
As g → 0, the lattice spacing a(g) goes to zero very fast (as long as nf ≤ 16). The inverse of this relation (18.37) is −1/2 g(a) ≈ β0 ln(a−2 Λ−2 ) + (β1 /β0 ) ln ln(a−2 Λ−2 ) . (6.99) It shows that the coupling constant slowly goes to zero with a, which is a lattice version of asymptotic freedom.
6.7 Hidden Separability As long as the functions P (x, y) = U (x)V (y) and Q(x, y) = R(x)S(y) in the ODE (6.76) can be factored into the product of a function of x times a function of y, then the ODE is separable (Ince, 1956). Thus, by dividing the ODE U (x)V (y)dx + R(x)S(y)dy = 0
(6.100)
by R(x)V (y), we may separate the variables U (x) S(y) dx + dy = 0 R(x) V (y)
(6.101)
U (x) dx + R(x)
(6.102)
and integrate Z
Z
S(y) dy = C V (y)
in which C is a constant of integration. Example: We may separate the variables in the ODE x(y 2 − 1) dx − y(x2 − 1) dy = 0
(6.103)
250
Differential Equations
by dividing by (y 2 − 1)(x2 − 1) x y dx − 2 dy = 0. 2 x −1 y −1
(6.104)
Integrating, we find ln(x2 − 1) − ln(y 2 − 1) = − ln C
(6.105)
C(x2 − 1) = y 2 − 1
(6.106)
or
which we may solve for y(x) y(x) =
p
1 + C(x2 − 1).
(6.107)
6.8 Exact First-Order Differential Equations The differential equation P (x, y) dx + Q(x, y) dy = 0
(6.108)
is exact if its left-hand side is the differential of some function φ(x, y) P (x, y) dx + Q(x, y) dy = dφ(x, y) = φx (x, y) dx + φy (x, y) dy.
(6.109)
The criteria of exactness are P (x, y) =
∂φ(x, y) ≡ φx (x, y) ∂x
(6.110)
Q(x, y) =
∂φ(x, y) ≡ φy (x, y). ∂y
(6.111)
and
Thus, if the ODE (6.108) is exact, then Py (x, y) = φyx (x, y) = φxy (x, y) = Qx (x, y)
(6.112)
which is called the condition of integrability. This condition implies that the ODE (6.108) is exact and integrable, as we’ll see after a couple of examples. A first-order ODE that is separable and separated P (x)dx + Q(y)dy = 0
(6.113)
Py = 0 = Qx .
(6.114)
is exact because
6.8 Exact First-Order Differential Equations
251
But a first-order ODE may be exact without being separable. Example: Boyle’s Law constrains changes in the pressure P and volume V of an ideal gas P dV + V dP = 0
(6.115)
at a fixed temperature T . Since P dV + V dP = d(P V )
(6.116)
the ODE (6.115) is exact. Its integrated form is the ideal-gas law P V = N kT
(6.117)
in which N is the number of molecules in the gas and k = 1.38066 × 10−23 J/K = 8.617385×10−5 eV/K is Boltzmann’s constant. Incidentally, a more accurate formula, proposed by van der Waals (1837– 1923) in his doctoral thesis in 1873, is " 2 # N P+ a0 V − N b0 = N kT (6.118) V in which a0 represents the mutual attraction of the molecules and has the dimensions of energy times volume and b0 is the effective volume of a single molecule. This equation was one of many signs that molecules were real particles, independent of the imagination of chemists. Lamentably, most physicists refused to accept the reality of molecules until 1905 when Einstein related the viscous-friction coefficient ζ and the diffusion constant D to the energy kT of a thermal fluctuation by the equation ζ D = kT , as explained in Sec. 13.9. Example: Human Population Growth (Gell-Mann, 2008) is approximately described by the ODE dN N2 = (6.119) dt b which incorporates the fact that it takes two to tango. We may integrate its separated and hence exact form dN dt = N2 b
(6.120)
and get N (t) =
b T −t
in which T is the time at which the population becomes infinite.
(6.121)
252
Differential Equations
Surprisingly, this simple model with T = 2025 years and b = 2 × 1011 years gives a reasonable description of the world’s population between the years 1 and 1970.
6.9 The Meaning of the Condition of Integrability We may integrate the differentials of any first-order ordinary differential equation P (x, y) dx + Q(x, y) dy = 0
(6.122)
along any contour C in the x-y plane, but in general the functional we obtain Z
(x,y)
P (x0 , y 0 ) dx0 + Q(x0 , y 0 ) dy 0
φ(x, y, C, x0 , y0 ) =
(6.123)
(x0 ,y0 )C
will depend upon the contour C of integration as well as upon the end-points. The functional φ(x, y, C, x0 , y0 ) will be a simple function φ(x, y, x0 , y0 ) that depends on the end points but not upon the contour C linking them if and only if the integral along any closed contour vanishes I P (x, y) dx + Q(x, y) dy = 0. (6.124) But this contour integral is the real part of the complex contour integral I Re (P (x, y) − iQ(x, y)) (dx + idy) = I Re (u(x, y) + iv(x, y)) (dx + idy) = 0 (6.125) which vanishes if and only if the real and imaginary parts u(x, y) and v(x, y) satisfy the second of the two Cauchy-Riemann conditions (5.47) Py = uy = −vx = Qx
(6.126)
at all points inside any of the contours. But Py = Qx is the condition of integrability (6.112). Thus, the condition of integrability guarantees that all the closed-contour integrals (6.124) vanish, and so that the contour integral Z
(x,y)
φ(x, y, x0 , y0 ) =
P (x0 , y 0 ) dx0 + Q(x0 , y 0 ) dy 0
(6.127)
(x0 ,y0 )
defines a function that depends only upon the end points and not upon the
6.9 The Meaning of the Condition of Integrability
253
contour. Setting this function equal to a constant (of integration) φ(x, y, x0 , y0 ) = C
(6.128)
gives us a solution y(x) and implies that dφ(x, y, x0 , y0 ) = P (x, y) dx + Q(x, y) dy = 0
(6.129)
which is the ODE (6.122). A Method of Integration: (Ince, 1956) We may use the criteria of exactness (6.110 & 6.111) P (x, y) =
∂φ(x, y) ≡ φx (x, y) ∂x
(6.130)
Q(x, y) =
∂φ(x, y) ≡ φy (x, y). ∂y
(6.131)
and
to integrate the differential equation P (x, y) dx + Q(x, y) dy = 0.
(6.132)
To do so, we use the first criterion (6.130) to integrate the condition φx = P in the x-direction getting a known integral R(x, y) and an unknown “constant” of integration C(y) which depends upon y Z φ(x, y) = P (x, y) dx + C(y) = R(x, y) + C(y). (6.133) Then the second criterion (6.131) tells us that φy = Q or Q(x, y) = φy (x, y) = Ry (x, y) + Cy (y).
(6.134)
Since we know Q and Ry , we can find C(y) by integrating Cy in the ydirection Z C(y) = Q(x, y) − Ry (x, y) dy + D. (6.135) We then put C(y) into the formula (6.133) for φ(x, y). Our solution y(x) then follows from setting φ(x, y) equal to a constant φ(x, y) = R(x, y) + C(y) Z = R(x, y) + Q(x, y) − Ry (x, y) dy + D = E.
(6.136)
Example of a Method of Integration: (Ince, 1956) The ODE P (x, y) dx + Q(x, y) dy = log(y 2 + 1) dx +
2y(x − 1) dy = 0 y2 + 1
(6.137)
254
Differential Equations
has factorized P and Q, so it’s separable. It’s also exact since Py =
2y = Qx y2 + 1
(6.138)
and so we can apply the method just outlined. First, as in (6.133), we integrate φx = P in the x-direction Z φ(x, y) = log(y 2 + 1) dx + C(y) = x log(y 2 + 1) + C(y). (6.139) Then, as in (6.134), we use φy = Q φ(x, y)y =
2xy 2y(x − 1) + Cy (y) = Q(x, y) = +1 y2 + 1
y2
(6.140)
to find for Cy the formula Cy (y) = −
2y +1
y2
(6.141)
which we integrate in the y-direction as in (6.135) C(y) = − log(y 2 + 1) + D.
(6.142)
We now put C(y) into our formula (6.139) for φ(x, y) φ(x, y) = x log(y 2 + 1) − log(y 2 + 1) + D = (x − 1) log(y 2 + 1) + D
(6.143)
which we set equal to a constant φ(x, y) = (x − 1) log(y 2 + 1) + D = E
(6.144)
(x − 1) log(y 2 + 1) = F
(6.145)
h i1/2 y(x) = eF/(x−1) − 1 .
(6.146)
or more simply
which we solve for y(x)
6.10 Integrating Factors With great luck, one might invent an integrating factor α(x, y) that makes the ODE P (x, y) dx + Q(x, y) dy = 0
(6.147)
6.11 Homogeneous Functions
255
exact α(x, y) P (x, y) dx + α(x, y) Q(x, y) dy = dφ(x, y).
(6.148)
Then α(x, y) P (x, y) = φ(x, y)x
(6.149)
α(x, y) Q(x, y) = φ(x, y)y
(6.150)
and
and so (α(x, y) P (x, y))y = φ(x, y)xy = (α(x, y) Q(x, y))x .
(6.151)
Example of an Integrating Factor: The equation ydx − xdy = 0 is not exact, but α(x, y) = 1/x2 is an integrating factor. For after dividing by x2 , we have 1 y (6.152) − 2 dx + dy = 0 x x so that P = −y/x2 and Q = 1/x and Py = −
1 = Qx x2
(6.153)
which shows that (6.152) is exact. Another integrating factor is α(x, y) = 1/(xy) which separates the variables dx dy − =0 (6.154) x y so that we can integrate and get ln(y/y0 ) = ln(x/x0 )
(6.155)
or ln(yx0 /(xy0 )) = 0 which implies that y = (y0 /x0 )x.
6.11 Homogeneous Functions A function f (x) = f (x1 , . . . , xk ) of k variables xi is homogeneous of degree n if f (tx) = f (tx1 , . . . , txk ) = tn f (x). For instance, z 2 ln(x/y) is homogeneous of degree 2 because (tz)2 ln(tx/(ty)) = t2 z 2 ln(x/y) .
(6.156)
(6.157)
256
Differential Equations
By differentiating Eq.(6.156) with respect to t, we find k
X ∂f (tx) d = ntn−1 f (x). f (tx) = xi dt ∂txi
(6.158)
i=1
Setting t = 1, we see that a function that is homogeneous of degree n satisfies k X i=1
xi
∂f (x) = n f (x) ∂xi
(6.159)
which is one of Euler’s many theorems. 6.12 The Virial Theorem Consider N particles moving non-relativistically in a potential V (x) of 3N variables that is homogeneous of degree n. Their virial is the sum of the products of the coordinates xi multiplied by the momenta pi G=
3N X
xi p i .
(6.160)
i=1
If the particles are bound by the potential V , then it is reasonable to assume that the positions and momenta of the particles are bounded for all times t, and we will make this assumption. The time derivative of the virial is 3N
3N
X dG X = (vi pi + xi Fi ) = 2T + xi Fi dt i=1
(6.161)
i=1
in which Fi = ∂V (x)/∂xi is a component of the force. We now form the infinite time average of both sides of this equation 3N
X G(t) − G(0) dG =h i = 2hT i + h xi Fi i. lim t→∞ t dt
(6.162)
i=1
By assumption, the positions and momenta of the particles are bounded for all times t, and so the virial G(t) is bounded for all times t. Thus the time average of G˙ must vanish and so 3N X 0 = 2hT i + h xi Fi i.
(6.163)
i=1
Newton’s law Fi = −
∂V (x) xi
(6.164)
6.13 Homogeneous First-Order ODEs
257
then implies that 3N X ∂V (x) i. 2hT i = h xi xi
(6.165)
i=1
Thus, if the potential V (x) is a homogeneous function of degree n, then Euler’s theorem (6.159) gives the virial theorem n hT i = hV (x)i. (6.166) 2 The long-term time average of the kinetic energies of particles trapped in a homogeneous potential of degree n is n/2 times the long-term time average of their potential energies. For example, a 1/r gravitational or electrostatic potential is homogeneous of degree −1, and so the virial theorem asserts that particles bound in such wells must have long-term time averages that satisfy 1 hT i = − hV (x)i. 2
(6.167)
Similarly, the harmonic potential V (r) = r2 is homogeneous of degree 2, and so particles confined in such a potential must have long-term time averages that satisfy hT i = hV (x)i.
(6.168)
6.13 Homogeneous First-Order ODEs Suppose the functions P (x, y) and Q(x, y) in the first-order ODE (Ince, 1956) P (x, y) dx + Q(x, y) dy = 0
(6.169)
are homogeneous of degree n. We change variables from x and y to x and y(x) = xv(x) so that dy = xdv + vdx, and so P (x, xv)dx + Q(x, xv)(xdv + vdx) = 0.
(6.170)
The homogeneity of P (x, y) and Q(x, y) imply that xn P (1, v)dx + xn Q(1, v)(xdv + vdx) = 0.
(6.171)
Rearranging this equation, we are able to separate the variables dx Q(1, v) + dv = 0 x P (1, v) + vQ(1, v)
(6.172)
258
Differential Equations
and integrate Z log x +
Q(1, v) dv = C P (1, v) + vQ(1, v)
(6.173)
and find v(x) and so the solution y(x) = xv(x). Example: In the differential equation (x2 − y 2 ) dx + 2xy dy = 0
(6.174)
the coefficients of the differentials P (x, y) = x2 − y 2 and Q(x, y) = 2xy are homogeneous functions of degree n = 2, so the above method applies. With y(x) = xv(x), we have x2 (1 − v 2 )dx + 2x2 v(vdx + xdv) = 0
(6.175)
in which x2 cancels out, leaving (1 + v 2 )dx + 2vxdv = 0. Separating variables and integrating, we find Z Z dx 2v dv + = log C x 1 + v2 or log(1 + v 2 ) + log x = log C.
(6.176)
(6.177)
(6.178)
So (1 + v 2 )x = C which leads to the general integral x2 + y 2 = Cx
(6.179)
or y(x) =
p Cx − x2 .
(6.180)
6.14 Linear First-Order ODEs The general form of a linear first-order ODE is dy + r(x)y = s(x). dx
(6.181)
We always can find an integrating factor α(x) that makes 0 = α(ry − s)dx + αdy
(6.182)
259
6.14 Linear First-Order ODEs
exact. Here P = α(ry − s) while Q = α, so the condition (6.112) for this equation to be exact is Py = αr = Qx = αx
(6.183)
d ln α =r dx
(6.184)
or
which we integrate to x
Z α(x) = α(x0 ) exp
0
r(x )dx
0
.
(6.185)
x0
So now since αr = αx , the original equation (6.181) with this integrating factor is αyx + αry = αyx + αx y = (αy)x = αs
(6.186)
which we may integrate to x
Z α(x)y(x) = α(x0 )y(x0 ) +
α(x0 )s(x0 )dx0
(6.187)
x0
so that α(x0 )y(x0 ) 1 y(x) = + α(x) α(x)
Z
x
α(x0 )s(x0 )dx0
(6.188)
x0
in which α(x) is the exponential (6.185). More explicitly, y(x) is ! # Z x " Z x Z x0 0 0 00 00 0 0 y(x) = exp − r(x )dx y(x0 ) + exp r(x )dx s(x )dx . x0
x0
x0
(6.189) The first term in the square brackets multiplied by the prefactor α(x0 )/α(x) is the general solution of the associated homogeneous equation yx + ry = 0. The second term in the square brackets multiplied by the prefactor α(x0 )/α(x) is a particular solution of the inhomogeneous equation yx + ry = s. Thus equation (6.189) expresses the general solution of the inhomogeneous equation (6.181) as the sum of a particular solution of the inhomogeneous equation and the general solution of the associated homogeneous equation. We were able to find an integrating factor α because the original equation (6.181) had P (x, y) = r(x)y − s(x) and Q(x, y) = 1. When P and Q are more complicated, integrating factors are harder to find or nonexistent. Example—Bodies Falling in Air: The downward speed v of a mass
260
Differential Equations
m in a gravitational field of constant acceleration g is described by the inhomogeneous first-order ODE mvt = mg − bv
(6.190)
in which b represents air resistance. This equation is like (6.181) with p(t) = b/m and q(t) = g, and so by (6.189), its solution is mg mg −bt/m v(t) = + v(0) − e . (6.191) b b The terminal speed mg/b is roughly proportional to the ratio of the weight of the body to its surface area. For a sphere of radius r, this ratio is proportional to r3 /r2 = r, the radius. For a falling man, it is between 100 and 200 mph, but for a mouse it is so much slower that mice can fall down mine shafts and run off unhurt. Now you know why insects and birds can fly. When the falling body is microscopic, a statistical model is preferable. The potential energy of a mass m at height h is V = mgh. In an ensemble of particles at temperature T Kelvin, the height distribution P (h) follows Boltzmann’s distribution (1.408) P (h) = P (0)e−mgh/(kT )
(6.192)
in which k = 1.380 6504×10−23 J/K = 8.617 343×10−5 eV/K is his constant. The probability depends exponentially upon the mass m dropping by a factor of e with the scale height S = kT /mg, which can be a few kilometers for a small molecule. Example—An R-C Circuit: The capacitance C of a capacitor is the charge Q it holds (on each plate) divided by the applied voltage V C = Q/V.
(6.193)
The current I through the capacitor is the time derivative of the charge I = Q˙ = C V˙ .
(6.194)
The voltage across a resistor of R Ω (Ohms) through which a current I flows is V = IR.
(6.195)
So if a time-dependent voltage V (t) is applied to a capacitor in series with a resistor, then V (t) = Q/C + IR.
(6.196)
The current I therefore obeys the first-order differential equation I˙ + I/(RC) = V˙ /R
(6.197)
6.14 Linear First-Order ODEs
261
which is like (6.181) with x → t, y(x) → I(t), p(x) → 1/(RC), and q(x) → V˙ /R. Since p is a constant, the integrating factor α(x) → α(t) is α(t) = α(t0 ) e(t−t0 )/(RC) . So by equation (6.188) or (6.189), the current I(t) is # " Z t ˙ 0 (t0 −t0 )/(RC) V (t ) 0 −(t−t0 )/(RC) e dt . I(t) = e I(t0 ) + R t0
(6.198)
(6.199)
Example—Emission Rate from Fluorophores A fluorophore is a molecule that emits light when illuminated, e.g., by a laser. In most cases, the frequency of the emitted photon is less than that of the incident one; stimulated emission is rare. Consider a population of N fluorophores of which N+ are excited and can emit light and N− = N − N+
(6.200)
are unexcited. Suppose that the fluorophores are exposed to an illuminating flux I of photons, and that the cross-section for the excitation of an unexcited fluorophore is σ, so that the rate at which unexcited fluorophores become excited is IσN− . Then if the decay rate (aka, the emission rate) of the excited fluorophores is 1/τ , the time derivative of the number of excited fluorophores will be 1 1 N˙ + = − N+ + IσN− = − N+ + Iσ (N − N+ ) . τ τ
(6.201)
Using the shorthand a = Iσ + 1/τ , we have N˙ + = −aN+ + IσN
(6.202)
which we may solve by using the general formula (6.189) with p = a and q = IσN : Z t −at at0 0 0 N+ (t) = e N+ (0) + e I(t )σN dt . (6.203) 0
If the illumination I(t) is constant, then by doing the integral we find N+ (t) =
IσN a
1 − e−at + N+ (0)e−at .
(6.204)
The emission rate E of photons from the N+ (t) excited fluorophores then is N+ (t)/τ or N+ (0) −at IσN 1 − e−at + e . (6.205) E= aτ τ
262
Differential Equations
If no fluorophores were excited at t = 0, then the emission rate per fluorophore is E Iσ = 1 − e−(Iσ+1/τ )t . (6.206) N 1 + Iστ
6.15 Systems of Differential Equations Actual physical problems often involve several differential equations. The motion of n particles in three dimensions is described by 3n equations and problems in electrodynamics by the four Maxwell’s equations (11.74 & 11.75). This field is too vast to cover in these pages, but we may hint at some of its features by considering the Friedmann equations of general relativity (11.391 & 11.393) for the scale factor a(t) of a homogeneous, isotropic universe a ¨ 4πG =− (ρ + 3p) a 3
(6.207)
2 a˙ 8πG k = ρ− 2 a 3 a
(6.208)
and
in which k respectively is 1, 0, and -1 for closed, flat, and open geometries. (The scale factor tells how much space is expanding or contracting.) These equations become more tractable when the energy density ρ is due to a single constituent whose pressure p is related to it by the equation of state p = wρ. In this case, conservation of energy ensures (11.408–11.413) that the product ρ a3(1+w)
(6.209)
is independent of time. The constant w respectively is −1, 1/3, and 0 for the vacuum, radiation, and matter. The Friedmann equations then are a2+3w a ¨=−
4πG (1 + 3w) ρ a3(1+w) ≡ −D2 3
(6.210)
where D is a constant and 2D2 a1+3w a˙ 2 + k = . 1 + 3w Example—An open universe of radiation: Here w =
(6.211) 1 3
and k = −1,
6.15 Systems of Differential Equations
263
and so the first-order Friedmann equation (6.211) becomes a˙ 2 =
D2 + 1. a2
(6.212)
Since the visible universe is expanding, we take the positive square-root and find a da (6.213) dt = √ a2 + D 2 which leads to the general integral p t = a2 + D2 + C. If we choose the constant of integration C = D2 , then we have s t 2 1− 2 a(t) = D −1 D
(6.214)
(6.215)
which satisfies the boundary condition a(0) = 0, which may or may nor be appropriate. As t → ∞, asymptotically a(t) = t/D. Example—A closed universe of matter: Here w = 0 and k = 1, and so the first-order Friedmann equation (6.211) is a˙ 2 =
2D2 − 1. a
Since the universe is expanding, we take the positive square-root r 2D2 −1 a˙ = a which leads to the general integral √ Z a da √ t= 2D2 − a p = − a(2D2 − a) − D2 arcsin 1 − D−2 a + C
(6.216)
(6.217)
(6.218)
in which C is a constant of integration. Example—An open universe of matter: Here w = 0 and k = −1, and so the first-order Friedmann equation (6.211) is a˙ 2 =
2D2 +1 a
(6.219)
264
Differential Equations
which leads to the general integral √ Z a da √ t= 2D2 + a np o p = a(2D2 + a) − D2 ln a(2D2 + a) + a + D2 + C
(6.220)
in which C is a constant of integration.
6.16 Singular Points of Second-Order ODEs Consider the 2d-order ODE y 00 = f (x, y, y 0 ).
(6.221)
If y 00 = f (x0 , y, y 0 ) is finite for all finite y and y 0 , then x0 is a regular point of the ODE. If y 00 = f (x0 , y, y 0 ) is infinite for any pair of finite y and y 0 , then x0 is a singular point of the ODE. If the 2d-order ODE is linear and homogeneous y 00 + P (x)y 0 + Q(x)y = 0
(6.222)
and both P (x0 ) and Q(x0 ) are finite, then x0 is a regular point of the ODE. But if P (x0 ) or Q(x0 ) or both are infinite, then x0 is a singular point. Some singular points are okay. If P (x) or Q(x) diverges as x → x0 , but both (x − x0 )P (x) and (x − x0 )2 Q(x) remain finite as x → x0 , then x0 is a regular singular point or, equivalently, a nonessential singular point. But if either (x − x0 )P (x) or (x − x0 )2 Q(x) diverges as x → x0 , then x0 is an irregular singular point or, equivalently, an essential singularity. The point at infinity is tricky. One sets z = 1/x. Then if (2z −P (1/z))/z 2 and Q(1/z)/z 4 remain finite as z → 0, the point x0 = ∞ is a regular point of the ODE. If (2z − P (1/z))/z and Q(1/z)/z 2 remain finite as z → 0, then x0 = ∞ is a regular singular point. Otherwise, the point at infinity is an irregular singular point or, equivalently, an essential singularity. Example: Legendre’s equation (1 − x2 )y 00 − 2xy 0 + `(` + 1)y = 0
(6.223)
or y 00 −
2x 0 `(` + 1) y + y=0 1 − x2 1 − x2
(6.224)
has regular singular points at x = ±1 and also at the point at infinity.
6.17 Frobenius’s Series Solutions
265
6.17 Frobenius’s Series Solutions Frobenius studied second-order linear homogeneous ODEs h2 (x) y 00 + h1 (x) y 0 + h0 (x) y = 0
(6.225)
in which the hn (x) are simple polynomials in the independent variable x. He expanded y as a power series in x y(x) = xk
∞ X
an xn
(6.226)
n=0
with an initial factor xk that allowed him to start the summation at n = 0. He then computed 0
y (x) =
∞ X
(k + n) an xk+n−1
(6.227)
n=0
and 00
y (x) =
∞ X
(k + n)(k + n − 1) an xk+n−2
(6.228)
n=0
and substituted these three series into the ODE (6.225). If the ODE is to be satisfied for all x, then the coefficient of every power of x must vanish. This requirement implies that the index k must satisfy a quadratic equation, called the indicial equation, and that the coefficients an must obey an infinite set of linear equations, which often lead to recurrence relations that express an in terms of an−1 and an−2 . Example: If we try this method on the simple ODE y 00 + ω 2 y = 0
(6.229)
then we get ∞ X
(k + n)(k + n − 1) an x
k+n−2
+ω
2
n=0
∞ X
an xk+n = 0.
(6.230)
n=0
In the first sum, the n = 0 term a0 k(k − 1)xk−2 can not be canceled by any of the other terms, so its coefficient must vanish k(k − 1) = 0.
(6.231)
The solutions to this indicial equation are k = 0 and k = 1, which correspond to the two solutions cos ωx and sin ωx. The n = 1 term in the first sum a1 (k + 1)kxk−1 also is not canceled by any of the other terms, and so it too must vanish. If the index k = 0, this
266
Differential Equations
term does vanish, and the coefficient a1 is arbitrary. We set a1 = 0 when k = 0 to avoid mixing in the solution sin ωx. When k = 1, we are obliged to set a1 = 0. Having caused the first two terms of the first summation to vanish, we now set j = n − 2, so that n = j + 2; we also rename n in the second summation. Then we have ∞ X (k + j + 2)(k + j + 1) aj+2 + ω 2 aj xk+j = 0.
(6.232)
j=0
We cancel all the remaining terms by requiring that the coefficients aj satisfy the recurrence relation aj+2 = −
ω2 aj . (k + j + 2)(k + j + 1)
(6.233)
Putting all this together, we find for k = 0 that a1 and all the other odd terms vanish so that y(x) = a0
∞ X (ωx)2j (−1)j = a0 cos ωx. (2j)!
(6.234)
j=0
For k = 1, the even terms a2n vanish, and we get ∞ (ωx)2j+1 a0 a0 X (−1)j = sin ωx. y(x) = ω (2j + 1)! ω
(6.235)
j=0
The method of Frobenius also allows one to expand the solution about a point x0 6= 0 ∞ X y(x) = (x − x0 )k an (x − x0 )n . (6.236) n=0
6.18 Even and Odd Differential Operators Under the parity transformation x → −x, a representative term transforms as p d k! k! n x xk = xn+k−p → (−1)n+k−p xn+k−p (6.237) dx (k − p)! (k − p)! and so the corresponding differential operator transforms as p p d d n n+p n x → (−1) x . dx dx
(6.238)
6.18 Even and Odd Differential Operators
267
The reflected form of the second-order linear differential operator L(x) = h0 (x) + h1 (x)
d d2 + h2 (x) 2 dx dx
(6.239)
therefore is L(−x) = h0 (−x) − h1 (−x)
d2 d + h2 (−x) 2 . dx dx
(6.240)
The operator L(x) is even if it is unchanged by reflection, that is, if h0 (−x) = h0 (x), h1 (−x) = −h1 (x), and h2 (−x) = h2 (x), so that L(−x) = L(x).
(6.241)
It is odd if it changes sign under reflection, that is, if h0 (−x) = −h0 (x), h1 (−x) = h1 (x), and h2 (−x) = −h2 (x), so that L(−x) = −L(x).
(6.242)
The general differential operator L(x) is neither even nor odd. But every function f (x) for which the reflected function f (−x) is well defined can be written as the sum of an even function (f (x)+f (−x))/2 and an odd function (f (x) − f (−x))/2 f (x) = 21 (f (x) + f (−x)) + 12 (f (x) − f (−x)).
(6.243)
Similarly, every differential operator L(x) can be decomposed as the sum of one that is even and one that is odd L(x) = 21 (L(x) + L(−x)) + 21 (L(x) − L(−x)).
(6.244)
Many of the standard differential operators have h0 = 1 and are even. If y(x) is a solution of the ODE 0 = L(x) y(x)
(6.245)
0 = L(−x) y(−x)
(6.246)
then also
and so if L(−x) = ±L(x), then 0 = L(x) y(−x).
(6.247)
Thus, if a differential operator L(x) has a definite parity, that is, if L(x) is either even or odd, then y(−x) is a solution if y(x) is. In this case, solutions come in pairs y(x) ± y(−x), one even, one odd.
268
Differential Equations
6.19 Fuch’s Theorem The method of Frobenius can run amok, especially if one expands about a singular point x0 . One can get only one solution or none at all. The Theorem of Fuch: If we apply Frobenius’s method to a linear homogeneous second-order ODE and expand about a regular point or a regular singular point, then we always get at least one power-series solution. 1. If the two roots of the indicial equation are equal, we get only one solution. 2. If the two roots differ by a non-integer, we get two solutions. 3. If the two roots differ by an integer, then the bigger root yields a solution.
6.20 Wronski’s Determinant If the N functions y1 (x), . . . , yN (x) are linearly dependent, then by (6.11) there is a set of coefficients k1 , . . . , kN , not all zero, such that the sum 0 = k1 y1 (x) + · · · + kN yN (x)
(6.248)
vanishes for all x. Differentiating i times, we get (i)
(i)
0 = k1 y1 (x) + · · · + kN yN (x)
(6.249)
for all x. So if we use the yj and their derivatives to define the matrix Y (i−1)
Yij (x) = yj
(x)
(6.250)
then we may express Eqs.(6.248 & 6.249) for linearly dependent functions y1 (x), . . . , yN (x) in matrix notation as 0 = Y (x)k
(6.251)
for some non-zero vector k = (k1 , k2 . . . kN ). Since the matrix Y (x) maps the nonzero vector k into zero, its determinant must vanish 0 = det(Y (x)) ≡ |Y (x)| .
(6.252)
(i−1) W (x) = |Y (x)| = yj (x)
(6.253)
This determinant
is called Wronski’s determinant or the wronskian. It vanishes on an interval if and only if the functions are linearly dependent on that interval.
269
6.21 A Second Solution
6.21 A Second Solution If we have one solution to a second-order linear homogeneous ODE, then we may use the wronskian to find a second solution. Here’s how. If y1 and y2 are two linearly independent solutions of the 2d-order linear homogeneous ODE y 00 (x) + P (x) y 0 (x) + Q(x) y(x) = 0 then their wronskian does not vanish y1 (x) y2 (x) = y1 (x) y20 (x) − y2 (x) y10 (x) 6= 0 W (x) = 0 y1 (x) y20 (x)
(6.254)
(6.255)
except perhaps at isolated points. Its derivative W 0 = y10 y20 + y1 y200 − y20 y10 − y2 y100 = y1 y200 − y2 y100
(6.256)
must obey W 0 = −y1 P y20 + Q y2 + y2 P y10 + Q y1 = −P y1 y20 − y2 y10
(6.257)
W 0 (x) = −P (x) W (x)
(6.258)
or
which we integrate to Z W (x) = W (x0 ) exp −
x
0
P (x )dx
0
(6.259)
x0
which is Abel’s formula for the wronskian (Niels Abel, 1802–1829). Having expressed the wronskian in terms of the known function P (x), we now use it to find y2 (x) from y1 (x). We note that y2 0 0 2 d W = y1 y2 − y2 y1 = y1 . (6.260) dx y1 So d dx
y2 y1
=
W y12
(6.261)
which we integrate to Z y2 (x) = y1 (x)
x
W (x0 ) 0 dx + c . y12 (x0 )
(6.262)
270
Differential Equations
Using our formula (6.259) for the wronskian, we find as the second solution " Z 0 # Z x x 1 00 00 exp − y2 (x) = y1 (x) P (x )dx dx0 . (6.263) y12 (x0 ) apart from additive and multiplicative constants. In the important special case in which P (x) = 0
(6.264)
W 0 (x) = 0
(6.265)
the wronskian is a constant
and the second solution is simply Z y2 (x) = y1 (x)
x
dx0
. y12 (x0 )
(6.266)
By Fuchs’s theorem, Frobenius’s method applied at a regular point or at a regular singular point of a second-order linear homogeneous ODE always yields at least one solution. Then we may use Wronski’s trick to find a second solution. So in principle, we always can get two linearly independent solutions expanded about a regular point or a regular singular point. 6.22 Why Not Three Solutions? We have seen that a second-order linear homogeneous ODE has two linearly independent solutions. Why not three? If y1 , y2 , and y3 were three linearly independent solutions of the secondorder linear homogeneous ODE 0 = yj00 + P yj0 + Q yj ,
(6.267)
then their 3d-order wronskian y1 y2 y3 0 W = y1 y20 y30 y 00 y 00 y 00 1 2 3
(6.268)
would not vanish except at isolated points. But the ODE (6.267) relates the second derivatives yj00 = −(P yj0 + Q yj ) to the yj0 and the yj , and so the third row of this 3d-order wronskian is a linear combination of the first two rows y1 y2 y3 0 0 0 =0 (6.269) W = y1 y2 y3 −P y 0 − Qy1 −P y 0 − Qy2 −P y 0 − Qy3 1 2 3
6.23 Boundary Conditions
271
and so it vanishes identically. One may extend this argument to show that an nth-order linear homogeneous ODE can have at most n linearly independent solutions. It will be convenient to use superscript notation (6.19) in which y (n) denotes the nth derivative of y(x) with respect to x dn y . (6.270) dxn Suppose there were n + 1 linearly independent solutions yj of the ODE y (n) ≡
y (n) + P1 y (n−1) + P2 y (n−2) + · · · + Pn−1 y 0 + Pn y = 0
(6.271)
in which the Pk ’s are functions of x. Then we could form a Wronski determinant of order (n + 1) in which the first row would be y1 , . . . , yn+1 ; the 0 second row would be the first derivatives y10 , . . . , yn+1 ; and so forth, with (n)
(n)
(n)
the row n + 1 being y1 , . . . , yn+1 . We could then replace each term yk in the last row by (n)
yk
(n−1)
= −P1 yk
(n−2)
+ P 2 yk
+ · · · + Pn−1 yk0 + Pn yk .
(6.272)
But then the last row would be a linear combination of the first n rows, and the determinant would vanish. Thus, an nth-order linear homogeneous ODE can have at most n linearly independent solutions. 6.23 Boundary Conditions Since an nth-order linear homogeneous ordinary differential equation can have at most n linearly independent solutions, it follows that we can make a solution unique by requiring it to satisfy n boundary conditions. We’ll see that the n arbitrary coefficients ck of the general solution y(x) =
n X
ck yk (x)
(6.273)
k=1
of the differential equation (6.271) are fixed by the n boundary conditions y(x1 ) = b1 ,
y(x2 ) = b2 ,
...
y(xn ) = bn
(6.274)
as long as the matrix Y with entries Yjk = yk (xj ) is nonsingular, that is det Y 6= 0. In matrix notation, with B a vector with components Bj = bj and C a vector with components Ck = ck , the n boundary conditions y(xj ) =
n X k=1
ck yk (xj ) = bj
(6.275)
272
Differential Equations
are Y C = B. So we may invert Y and express the vector C of coefficients as C = Y −1 B. (` ) The boundary conditions can involve the derivatives yk j (xj ). One may (` )
show (problem 9) that in this case as long as the matrix Yjk = yk j (xj ) is nonsingular, the n boundary conditions y (`j ) (xj ) =
n X
(` )
ck yk j (xj ) = bj
(6.276)
k=1
are Y C = B and so the n coefficients are C = Y −1 B. But what if all the bj are zero? If the boundary conditions are totally homogeneous, Y C = 0, then there is no solution if the matrix Y is nonsingular. But if the n×n matrix Y has rank n−1, then it maps a unique vector C to zero, apart from normalization. So if all the boundary conditions are homogeneous, and the matrix Y has rank n − 1, then the solution y = ck yk is unique. But if the rank of Y is less than n − 1, the solution is not unique. Since a matrix of rank zero vanishes identically, any non-zero 2 × 2 matrix Y must be of rank 1 or 2. Thus, a second-order ODE with two homogeneous boundary conditions either has a unique solution or none at all. Example 6.2 (Boundary Conditions and Eigenvalues) of the differential equation − y 00 = k 2 y
The solutions yk
(6.277)
are y1 (x) = sin kx and y2 (x) = cos kx. If we impose the boundary conditions y(−a) = 0 and y(a) = 0, then the matrix Yjk = yk (xj ) is Y =
− sin ka cos ka sin ka cos ka
(6.278)
with determinant det Y = − 2 sin ka cos ka = − sin 2ka. This determinant vanishes only if ka = nπ/2 for some integer n, so if ka 6= nπ/2, then no solution y of the differential equation (6.277) satisfies the boundary conditions y(−a) = 0 = y(a). But if ka = nπ/2, then there is a solution, and it is unique because for even (odd) n, the first (second) column of Y vanishes, but not the second (first), which implies that Y has rank 1. One may regard the condition ka = nπ/2 either as determining the eigenvalue k 2 or as telling us what interval to use.
273
6.24 A Variational Problem
6.24 A Variational Problem For what functions u(x) is the “energy” functional Z E[u] ≡
b
p(x)u02 (x) + q(x)u2 (x) dx
(6.279)
a
stationary? That is, for what functions u is E[u + δu] unchanged to first order in δu when u(x) is changed by an arbitrary but tiny function δu(x) to u(x) + δu(x)? Our equations will be less cluttered if we drop explicit mention of the x-dependence of p, q, and u which we assume to be real functions of x. The first-order change in E is Z
b
δE[u] ≡
p 2u0 δu0 + q 2u δu dx
(6.280)
a
in which the change in the derivative of u is δu0 = u0 + (δu)0 − u0 = (δu)0 . Setting δE = 0 and integrating by parts, we have Z b 0 0 = δE = p u (δu)0 + q u δu dx a Z bh i 0 0 = p u0 δu − p u0 δu + q u δu dx a Z bh i h ib 0 = − p u0 + q u δu dx + p u0 δu . a
a
(6.281)
So if E is to be stationary with respect to all tiny changes δu, then u must satisfy both the differential equation 0 L u = − p u0 + q u = 0
(6.282)
and the natural boundary conditions 0 = p(b) u0 (b)
and
0 = p(a) u0 (a).
(6.283)
If the function p(x) is strictly positive on the interval [a, b], then these conditions imply Neumann’s boundary conditions u0 (a) = 0 (Carl Neumann, 1832–1925).
and
u0 (b) = 0.
(6.284)
274
Differential Equations
6.25 Self-Adjoint Differential Operators If p(x) and q(x) are real, then the differential operator d d L= − p(x) + q(x) dx dx
(6.285)
is self adjoint (or formally self adjoint). To see why this definition makes sense, we take any two functions u and v that are twice differentiable on an interval [a, b] and integrate v L u twice by parts over it Z b h Z b i 0 v − pu0 + qu dx v L u dx = (v, L u) = a a Z b 0 0 b = pu v + uqv dx − vpu0 a a Z b b = −(pv 0 )0 + qv u dx + puv 0 − vpu0 a a Z b b = (L v) u dx + p(uv 0 − vu0 ) a . (6.286) a
Interchanging u and v and subtracting, we get Green’s formula Z b (vL u − u L v) dx = p(uv 0 − vu0 ) a = [pW (u, v)]ba
(6.287)
(George Green, 1793–1841). Its differential form is Lagrange’s identity 0 vL u − u L v = p W (u, v) (6.288) (Joseph-Louis Lagrange, 1736–1813). Thus if the twice-differentiable functions u and v satisfy boundary conditions at x = a and x = b that make the boundary term (6.287) vanish b p(uv 0 − vu0 ) a = [pW (u, v)]ba = 0 (6.289) then the real differential operator L is symmetric Z b Z b (v, L u) = v L u dx = u L v dx = (u, L v). a
(6.290)
a
We recall that a real linear operator A that acts in a real vector space and satisfies the analogous relation (1.243) (g, A f ) = (f, A g)
(6.291)
for all vectors in the space is said to be symmetric and self adjoint. This is why we call the differential operator (6.285) self adjoint. Because L is
6.26 Self-Adjoint Differential Systems
275
symmetric only on the space of functions that satisfy the boundary condition (6.289), some mathematicians use the term formally self adjoint. In quantum mechanics, we usually deal with wave-functions that are complex. So keeping L real, let’s replace u and v by twice-differentiable, complex-valued functions ψ = u1 + iu2 and χ = v1 + iv2 . If u1 , u2 , v1 , and v2 satisfy boundary conditions at x = a and x = b that make the boundary terms (6.289) vanish b p(ui vj0 − vj u0i ) a = [pW (ui , vj )]ba = 0 for i, j = 1, 2 (6.292) then (6.290) implies that Z b Z b (L vj ) ui dx vj L ui dx =
for i, j = 1, 2.
(6.293)
a
a
Under these assumptions, one may show (problem 10) that (6.292) makes the complex boundary term vanish b [p W (ψ, χ∗ )]ba = p ψ χ∗0 − ψ 0 χ∗ a = 0 (6.294) and (problem 11) that since L is real, the identity (6.293) holds for complex functions Z b Z b (χ, L ψ) = χ∗ L ψ dx = (L χ)∗ ψ dx = (L χ, ψ). (6.295) a
a
We recall that a linear operator A that satisfies the analogous relation (1.237) (g, A f ) = (A g, f )
(6.296)
is said to be self adjoint or hermitian. This is why we call the differential operator (6.285) self adjoint. The self-adjoint differential operator (6.285) will satisfy the inner-product integral equations (6.290 or 6.295) only when the function p and the twicedifferentiable functions u and v or ψ and χ conspire to make the boundary terms (6.289 or 6.294) vanish. This requirement leads us to define a selfadjoint differential system.
6.26 Self-Adjoint Differential Systems A differential system consists of a differential operator, a differential equation on an interval, boundary conditions, and a set of twice differentiable functions that obey them. A second-order differential equation needs two boundary conditions to
276
Differential Equations
make a solution unique (section 6.23). In a self-adjoint differential system, the two boundary conditions are linear and homogeneous so that the set of all twice differentiable functions u that satisfy them is a vector space D called the domain of the system. Examples for the interval [a, b] are Dirichlet’s boundary conditions (Johann Dirichlet,1805–1859) u(a) = 0
and
u(b) = 0
(6.297)
u0 (a) = 0
and
u0 (b) = 0.
(6.298)
and Neumann’s (6.284)
These two kinds of boundary conditions are special cases of the conditions ca u(a) + da u0 (a) = 0
and
cb u(b) + db u0 (b) = 0
(6.299)
for da = db = 0 and for ca = cb = 0. The adjoint domain D∗ of a differential system is the set of all twicedifferentiable functions v that make the boundary term (6.289) vanish b p(uv 0 − vu0 ) a = [pW (u, v)]ba = 0 (6.300) for all functions u that are in the domain D. A differential system is regular and self adjoint if the differential operator Lu = −(pu0 )0 + qu is self-adjoint, if the interval [a, b] is finite, if p, p0 , and q are continuous functions of x on the interval, if p(x) > 0 on [a, b], and if the two domains D and D∗ coincide, D = D∗ . Since any two functions u and v in the domain D of a regular and selfadjoint differential system make the boundary term (6.289 = 6.300) vanish, a real formally self-adjoint differential operator L is symmetric and self adjoint (6.290) on any two functions in its domain Z b Z b (v, L u) = v L u dx = u L v dx = (u, L v). (6.301) a
a
If functions in the domain are complex, then by (6.294 & 6.295) the operator L is self adjoint or hermitian Z b Z b ∗ (χ, L ψ) = χ L ψ dx = (L χ)∗ ψ dx = (L χ, ψ) (6.302) a
a
on all ψ and χ in its domain. One may show (problems 12, 13, & 14) that if D is the set of all twicedifferentiable functions u(x) on [a, b] that satisfy either Dirichlet’s boundary conditions (6.297) or Neumann’s boundary conditions (6.298) or the more general boundary conditions (6.299), and if the function p(x) is continuous
6.26 Self-Adjoint Differential Systems
277
and positive on [a, b], then the adjoint set D∗ is the same as D. Thus, a selfadjoint differential operator Lu = −(pu0 )0 +qu together with either Dirichlet (6.297) or Neumann (6.298) or general (6.299) boundary conditions will be a regular and self-adjoint system if p, p0 , and q are continuous on a finite interval [a, b], and p > 0 on [a, b]. Example 6.3 (Sines and Cosines) adjoint differential operator
The differential system with the self-
L= −
d2 dx2
(6.303)
on an interval [a, b] and the differential equation L u = − u00 = λ u
(6.304)
has p(x) = 1. If we choose the interval to be [−π, π] and the domain D to be the set of all functions that are twice-differentiable on this interval and satisfy Dirichlet (6.297), then we get a self-adjoint differential system in which the domain includes linear combinations of un (x) = sin nx. If instead, we impose Neumann (6.298), then the domain D contains linear combinations of un (x) = cos nx. In both cases, the system is regular and self adjoint. Some important differential systems are self adjoint but singular because the function p(x) vanishes at one or both of the end points of the interval [a, b] or because the interval is infinite, for instance [0, ∞) or (−∞, ∞). In these singular, self-adjoint differential systems, the boundary term (6.300) vanishes if u and v are in the domain D = D∗ . Example 6.4 (Legendre’s System) Legendre’s self-adjoint differential operator is d 2 d L= − (1 − x ) (6.305) dx dx and his differential equation is L u = − (1 − x2 )u0
0
= `(` + 1) u (6.306) on the interval [−1, 1]. The function p(x) = 1 − x2 is zero at both end points, x = ±1, and so this self-adjoint system is singular. Because p(±1) = 0, the boundary term (6.300) vanishes as long as the functions u and v have continuous derivatives. The domain D is the set of all functions that are twice differentiable on the interval [−1, 1].
278
Differential Equations
Example 6.5 (Hermite’s System) ator is L= −
Hermite’s self-adjoint differential operd2 + x2 dx2
(6.307)
and his differential equation is L u = − u00 + x2 u = (2n + 1) u
(6.308)
on the interval (−∞, ∞). This system has p(x) = 1 and q(x) = x2 . It is self adjoint but singular because the interval is infinite. The domain D consists of all functions that are twice-differentiable and that go to zero as x → ±∞ faster than 1/x2 , which ensures that the relevant integrals converge and that the boundary term (6.300) vanishes.
6.27 Making Operators Self Adjoint We can make the generic second-order linear homogeneous differential operator d2 d L = h2 2 + h1 + h0 (6.309) dx dx self adjoint d d d2 d L=− p(x) + q(x) = −p(x) 2 − p0 (x) + q(x) (6.310) dx dx dx dx by first dividing through by −h2 (x) L1 = −
1 h1 d h0 d2 − L=− 2 − h2 dx h2 dx h2
and then by multiplying L1 by the positive prefactor Z x h1 (y) dy > 0. p(x) = exp h2 (y)
(6.311)
(6.312)
The product p L1 then is self adjoint Z x 2 h1 (y) d h1 (x) d h0 (x) L = p(x) L1 = − exp dy + + h2 (y) dx2 h2 (x) dx h2 (x) Z x Z x d h1 (y) d h1 (y) h0 (x) =− exp dy − exp dy dx h2 (y) dx h2 (y) h2 (x) d d =− p +q (6.313) dx dx
6.28 Wronskians of Self-Adjoint Operators
279
with q(x) = −p(x)
h0 (x) . h2 (x)
(6.314)
So we may turn any second-order linear homogeneous differential operator L into a self adjoint operator L by multiplying it by an appropriate prefactor ρ(x) = −
exp
Rx
h1 (y)/h2 (y)dy p(x) =− . h2 (x) h2 (x)
(6.315)
The two differential equations Lu = 0 and Lu = 0 have the same solutions, and so we can restrict our attention to self-adjoint differential equations. But the eigenvalue equation Lu = λ u
(6.316)
becomes a self-adjoint eigenvalue equation Lu = −(pu0 )0 + qu = λ ρ u
(6.317)
with a weight function ρ(x). Such an eigenvalue problem is known as a Sturm-Liouville problem (Jacques Sturm, 1803–1855; Joseph Liouville, 1809–1882). If the original differential operator L is positive, then h2 (x) is negative, and the weight function ρ(x) = −p(x)/h2 (x) is positive.
6.28 Wronskians of Self-Adjoint Operators We saw in (6.254–6.259) that if y1 (x) and y2 (x) are two linearly independent solutions of the ODE y 00 (x) + P (x) y 0 (x) + Q(x) y(x) = 0 then their Wronskian W (x) = y1 (x) y20 (x) − y2 (x) y10 (x) is Z x 0 0 W (x) = W (x0 ) exp − P (x )dx .
(6.318)
(6.319)
x0
But if the ODE (6.318) is made self adjoint as 0 dy(x) d2 y(x) + p0 (x) + q(x)y(x) = 0 (6.320) p(x)y 0 (x) + q(x)y(x) = p(x) 2 dx dx then P (x) = p0 (x)/p(x), and so the Wronskian (6.319) is Z x W (x) = W (x0 ) exp − p0 (x0 )/p(x)dx0 x0
(6.321)
280
Differential Equations
which we may integrate directly to W (x) = W (x0 ) exp [− ln [p(x)/p(x0 )]] = W (x0 )
p(x0 ) . p(x)
(6.322)
Now we learned in (6.254–6.263) that if we had one solution y1 (x) of the ODE (6.318 or 6.320), then we could find another y2 (x) that is linearly independent of the first one by doing the integral Z x W (x0 ) 0 y2 (x) = y1 (x) dx + c (6.323) y12 (x0 ) which in view of (6.319) is really an iterated integral. But the helpful formula (6.322) reduces it to Z x 1 0 y2 (x) = y1 (x) W (x0 )p(x0 ) dx + c . (6.324) p(x0 ) y12 (x0 ) The part of the second solution that is linearly independent of y1 (x) is Z x dx0 . (6.325) y2 (x) = y1 (x) p(x0 ) y12 (x0 ) Example 6.6 (Legendre functions of the second kind) adjoint differential equation (6.306) is 0 − (1 − x2 ) y 0 = `(` + 1) y.
Legendre’s self(6.326)
and an obvious solution for ` = 0 is y(x) ≡ P0 (x) = 1.
(6.327)
Since p(x) = (1 − x2 ), we get a second solution Q0 (x) by using the integral formula (6.325) Z x 1 Q0 (x) = P0 (x) dx0 p(x0 ) P02 (x0 ) Z x 1 = dx0 (1 − x2 ) 1 1+x = ln . (6.328) 2 1−x This second solution Q0 (x) is singular at both ends of the interval [−1, 1], and so does not satisfy the Dirichlet (6.297) or Neumann (6.298) boundary conditions that make the system self adjoint or hermitian.
6.29 Self-Adjoint First-Order ODEs
281
6.29 Self-Adjoint First-Order ODEs What about first-order linear homogeneous differential operators? The operator d L=p +q (6.329) dx will be self adjoint if Z b Z b Z b ∗ ∗ † (Lχ)∗ ψ dx. (6.330) χ Lψ dx = L χ ψ dx = a
a
a
Starting from the first term, we find Z b Z b ∗ χ∗ p ψ 0 + qψ dx χ Lψ dx = a a Z b (−χ∗ p)0 + χ∗ q ψ dx + [χ∗ pψ]ba = a Z b ∗ = (−χp∗ )0 + χq ∗ ψ dx + [χ∗ pψ]ba a Z b ∗ 0 ∗ = −p χ + (q ∗ − p∗0 )χ ψ dx + [χ∗ pψ]ba . (6.331) a
So if the boundary terms vanish [χ∗ pψ]ba = 0
(6.332)
and if both p∗ = −p
and q ∗ − p∗0 = q
obtain, then Z b Z b Z b 0 ∗ ∗ χ Lψ dx = pχ + qχ ψ dx = (Lχ)∗ ψ dx a
a
(6.333)
(6.334)
a
and so L is self adjoint or hermitian, L† = L. The general form of a firstorder self-adjoint linear operator is d i + s(x) + r0 (x) dx 2 in which r and s are arbitrary real functions of x. An example is the momentum operator L = ir(x)
(6.335)
~ d (6.336) i dx in which p = ~/i is imaginary and q = 0. The functions ψ and χ are then taken to vanish at the end points a and b, which may be infinite. p=
282
Differential Equations
6.30 A Constrained Variational Problem In quantum mechanics, we usually deal with normalized wave functions. So let’s find the function u(x) that minimizes the energy functional Z b p(x) u02 (x) + q(x) u2 (x) dx (6.337) E[u] = a
subject to the constraint that u(x) be normalized on [a, b] with respect to some weight function ρ(x) > 0 Z b 2 N [u] = kuk = ρ(x) u2 (x) dx = 1. (6.338) a
Introducing λ as a Lagrange multiplier (1.266) and suppressing explicit mention of the x-dependence of p, q, ρ, and u, we minimize the unconstrained functional Z b Z b 02 2 2 ρ u dx − 1 . (6.339) E[u, λ] = p u + q u dx − λ a
a
This functional will be stationary at the function u that minimizes it. The first-order change in the functional is Z b δE[u, λ] = p 2u0 δu0 + q 2u δu − λ ρ 2u δu dx (6.340) a
in which the change in the derivative of u is δu0 = u0 + (δu)0 − u0 = (δu)0 .
(6.341)
Setting δE = 0 and integrating by parts, we have Z b 0 0 = 21 δE = p u (δu)0 + (q − λ ρ) u δu dx a Z bh i 0 0 = p u0 δu − p u0 δu + (q − λ ρ) u δu dx a Z bh i h ib 0 = − p u0 + (q − λ ρ) u δu dx + p u0 δu . a
a
(6.342)
So if E is to be stationary with respect to all tiny changes δu, then u must satisfy both the self-adjoint differential equation 0 0 = − p u0 + (q − λ ρ) u (6.343) and the natural boundary conditions 0 = p(b) u0 (b)
and
0 = p(a) u0 (a).
(6.344)
6.30 A Constrained Variational Problem
283
If instead, we require E[u, λ] to be stationary with respect to all variations δu that vanish at the end points, δu(a) = δu(b) = 0, then u must satisfy the differential equation (6.343) but need not satisfy the natural boundary conditions. In both cases, the function u(x) that minimizes the energy E[u] subject to the normalization condition N [u] = 1 is an eigenfunction of the self-adjoint differential operator d p(x) + q(x) dx
(6.345)
0 Lu = − p u0 + q u = λ ρ u.
(6.346)
d L= − dx with eigenvalue λ
The Lagrange multiplier λ has become an eigenvalue of a Sturm-Liouville equation (6.317). Is the eigenvalue λ related to E[u] and N [u]? To keep things simple, we restrict ourselves to a regular and self-adjoint differential system (section 6.26) consisting of the self-adjoint differential operator (6.345), the differential equation (6.346), and a domain D = D∗ of functions that are twice differentiable on [a, b] and that satisfy two homogeneous boundary conditions on [a, b], such as (6.297, 6.298, or 6.299). Since the system is self adjoint, the domain and the adjoint domain D∗ coincide, and so u satisfies the boundary condition (6.300) ib h 0 upu = 0. (6.347) a
We now multiply the Sturm-Liouville equation (6.346) from the left by u and integrate from a to b; after integrating by parts and noting the vanishing of the boundary terms (6.347), we find Z
b
Z
2
ρ u dx =
λ
b
Z u Lu dx =
a
a
Z =
b
h i 0 u − p u0 + q u dx
a b
h ib p u02 + q u2 dx − upu0
a
a
Z =
b
p u02 + q u2 dx = E[u].
(6.348)
a
Thus in view of the normalization constraint (6.338), we see that the eigen-
284
Differential Equations
value λ is the ratio of the energy E[u] to the norm N [u] Z b 02 p u + q u2 dx E[u] = . λ= a Z b N [u] 2 ρ u dx
(6.349)
a
Is the function that minimizes the ratio R[u] ≡
E[u] N [u]
(6.350)
the eigenfunction u of the Sturm-Liouville equation (6.346)? And is the minimum of R[u] the least eigenvalue λ of the Sturm-Liouville equation (6.346)? To see that the answers are yes and yes, we require δR[u] to vanish δR[u] =
δE[u] E[u] δN [u] − =0 N [u] N 2 [u]
(6.351)
to first order in tiny changes δu(x) that are zero at the end points of the interval, δu(a) = δu(b) = 0. Multiplying both sides by N [u], we have δE[u] = R[u] δN [u].
(6.352)
Referring back to our derivation (6.339–6.342) of the Sturm-Liouville equation, we see that since δu(a) = δu(b) = 0, the change δE is Z bh i h ib 0 δE[u] = 2 − p u0 + q u δu dx + 2 p u0 δu a a (6.353) Z bh i 0 0 =2 − p u + q u δu dx a
while δN is Z
b
ρ u δu dx.
δN [u] = 2
(6.354)
a
Substituting these changes (6.353) and (6.354) into the condition (6.352) that R[u] be stationary, we find that the integral Z bh i 0 − p u0 + (q − R[u] ρ ) u δu dx = 0 (6.355) a
must vanish for all tiny changes δu(x) that are zero at the end points of the interval. Thus on (a, b), the function u must satisfy the Sturm-Liouville equation (6.346) 0 − p u0 + q u = R[u] ρ u (6.356)
6.30 A Constrained Variational Problem
285
with an eigenvalue λ ≡ R[u] that is the minimum value of the ratio R[u]. So the eigenfunction u1 with the smallest eigenvalue λ1 is the one that minimizes the ratio R[u], and λ1 = R[u1 ]. What about other eigenfunctions with larger eigenvalues? How do we find the eigenfunction u2 with the next smallest eigenvalue λ2 ? Simple: we minimize R[u] with respect to all functions u that are in the domain D and that are orthogonal to u1 . Example 6.7 (Infinite Square Well) Let us consider a particle of mass m trapped in an interval [a, b] by a potential that is V for a < x < b but infinite for x < a and for x > b. Because the potential is infinite outside the interval, the wave function u(x) will satisfy the boundary conditions u(a) = u(b) = 0. The mean value of the hamiltonian is then the energy functional Z b p(x) u02 (x) + q(x) u2 (x) dx hu|H|ui = E[u] =
(6.357)
(6.358)
a
in which p(x) = ~2 /2m and q(x) = V , a constant independent of x. Wave functions in quantum mechanics are normalized when possible. So we need to minimize the functional Z b 2 ~ 02 2 E[u] = u (x) + V u (x) dx (6.359) a 2m subject to the constraint Z c=
b
u2 (x) dx − 1 = 0
(6.360)
a
for all tiny variations δu that vanish at the end points of the interval. The weight function ρ(x) = 1, and the eigenvalue equation (6.346) is ~2 00 u + V u = λ u. 2m For any positive integer n, the normalized function 1/2 2 nπ un (x) = sin b−a b−a −
(6.361)
(6.362)
satisfies the boundary conditions (6.357) and the eigenvalue equation (6.361) with energy eigenvalue nπ~ 2 1 + V. (6.363) λn = E[un ] = 2m b − a The second eigenfunction u2 minimizes the energy functional E[u] over the
286
Differential Equations
space of normalized functions that satisfy the boundary conditions (6.357) and are orthogonal to the first eigenfunction u1 . The eigenvalue λ2 is higher than λ1 (4 times higher). As the quantum number n increases, the energy λn = E[un ] goes to infinity as n2 . That λn → ∞ as n → ∞ turns out to be essential to the completeness of the eigenfunctions un . Example 6.8 (Bessel’s System) Bessel’s energy functional is Z 1 n2 2 02 x u (x) + u (x) dx E[u] = x 0
(6.364)
in which n ≥ 0 is an integer. We will minimize it for twice differentiable functions u(x) on [0, 1] that are normalized Z 1 N [u] = kuk2 = x u2 (x) dx = 1 (6.365) 0
and satisfy the boundary conditions u(0) = 0 for n > 0 and u(1) = 0. We’ll let λ be a Lagrange multiplier (section 1.25) and minimize the unconstrained functional E[u] − λ(N [u] − 1). Proceeding as in (6.337–6.346), we find that u must obey the self-adjoint differential equation L u = − (x u0 )0 +
n2 u = λ x u. x
(6.366)
2 The eigenvalues λ = λn,m ≡ kn,m are positive. By changing variables to ρ = kn,m x and letting u(x) = Jn (ρ), we arrive at (problem 15) d2 Jn 1 dJn n2 + + 1 − 2 Jn = 0 (6.367) dρ2 ρ dρ ρ
which is Bessel’s equation (Friedrich Bessel, 1784–1846). The eigenvalues 2 are determined by the condition u(1) = Jn (kn,m ) = 0; they λn,m ≡ kn,m are the squares of the zeros of Jn (ρ). Asymptotically, as m → ∞ one has (Courant and Hilbert, 1955, p. 416) lim
m→∞
λn,m =1 m2 π 2
(6.368)
which shows that the eigenvalues λn,m rise like m2 as m → ∞. Example 6.9 (Harmonic Oscillator) Let’s minimize the energy functional Z
∞
E[u] = −∞
~2 02 1 u (x) + m ω 2 x2 u2 (x) dx 2m 2
(6.369)
287
6.31 Eigenfunctions and Eigenvalues of Self-Adjoint Systems
subject to the normalization condition Z ∞ 2 u2 (x) dx = 1. N [u] = kuk =
(6.370)
−∞
We introduce λ as a Lagrange multiplier and find the minimum of the unconstrained function E[u]−λ (N [u] − 1). Following equations (6.337–6.346), we find that u must satisfy Schr¨odinger’s equation −
~2 00 1 u + m ω 2 x2 u = λu 2m 2
which we write as ~ d ~ d 1 mω x− x+ + u = λu. ~ω 2~ mω dx mω dx 2
(6.371)
(6.372)
The lowest eigenfunction u0 is mapped to zero by the second factor inside the square brackets ~ d x+ u0 (x) = 0 (6.373) mω dx and so its eigenvalue λ0 is ~ω/2. We may integrate this differential equation (6.373) and find mω 1/4 mωx2 u0 (x) = exp − (6.374) π~ 2~ in which the prefactor is a normalization constant. One may get the higher eigenfunctions by acting on u0 with powers of the first factor inside the square brackets 1 mω n/2 ~ d n un (x) = √ u0 (x). (6.375) x− mω dx n! 2~ The eigenvalue of un is λn = ~ω(n + 1/2). Again, λn → ∞ as n → ∞. 6.31 Eigenfunctions and Eigenvalues of Self-Adjoint Systems A regular Sturm-Liouville system is a set of regular and self-adjoint differential systems (section 6.26) that have the same differential operator, interval, boundary conditions, and domain, and whose differential equations are of Sturm-Liouville (6.346) type L ψ = − (p ψ 0 )0 + q ψ = λ ρ ψ
(6.376)
each distinguished only by its eigenvalue λ. Their common weight function ρ(x) is real and positive, except at isolated points where it may vanish.
288
Differential Equations
Since the differential systems are self adjoint, the real or complex functions in the common domain D are twice differentiable on the interval [a, b] and satisfy two homogeneous boundary conditions that make the boundary terms (6.300) vanish b p W (ψ 0 , ψ ∗ ) a = 0 (6.377) and so the differential operator L obeys the condition (6.302) Z b Z b ∗ (χ, L ψ) = (L χ)∗ ψ dx = (L χ, ψ) χ L ψ dx =
(6.378)
a
a
of being self adjoint or hermitian. Let ψi and ψj be eigenfunctions of L with eigenvalues λi and λj L ψi = λi ρ ψi
(6.379)
L ψj = λj ρ ψj
(6.380)
in a regular Sturm-Liouville system. Taking the complex conjugate of (6.380) (L ψj )∗ = λ∗j ρ ψj∗
(6.381)
and then multiplying (6.379) by ψj∗ and (6.381) by ψi , we get ψj∗ L ψi = ψj∗ λi ρ ψi ∗
ψi (L ψj ) =
(6.382)
ψi λ∗j ρ ψj∗ .
(6.383)
Subtracting (6.383) from (6.382), integrating over the interval [a, b], and using the hermiticity condition (6.378), we find Z b Z b ∗ ∗ ∗ ∗ 0 = ψj L ψi − (L ψj ) ψi dx = λj − λi ψj ψi dx. (6.384) a
a
Setting i = j, one has 0=
(λ∗i
Z − λi )
b
ρ|ψi |2 dx
(6.385)
a
which, since the integral is positive, shows that the eigenvalue λi must be real. Thus all the eigenvalues of a regular Sturm-Liouville system are real. Using λ∗j = λj in (6.384), we find Z b 0 = (λj − λi ) ψj∗ ρ ψi dx (6.386) a
which says that eigenfunctions that have different eigenvalues are orthogonal on the interval [a, b] with weight function ρ(x).
6.31 Eigenfunctions and Eigenvalues of Self-Adjoint Systems
289
But each eigenfunction ψi in the domain D satisfies two homogeneous boundary conditions as well as its second-order differential equation − (p ψi0 )0 + q ψi = λi ρ ψi
(6.387)
and so it is the unique solution in D to this equation. Thus there can be no other eigenfunction in D with the same eigenvalue. In a regular SturmLiouville system, there is no degeneracy. All the eigenfunctions ψi are orthonormal (and normalizable) on the interval [a, b] with weight function ρ(x) Z b ψj∗ ρ ψi dx = δij . (6.388) a
It is true that the eigenfunctions of a second-order differential equation come in pairs because one can use Wronski’s formula (6.325) Z x dx0 y2 (x) = y1 (x) (6.389) p(x0 ) y12 (x0 ) to find a linearly independent, second solution with the same eigenvalue. But the second solution will not obey the boundary conditions of the domain D. A set of eigenfunctions ψi is complete in the mean in a space S of functions if every function f ∈ S can be represented as a Fourier series f (x) =
∞ X
ai ψi (x)
(6.390)
i=1
that converges in the mean, that is 2 Z b N X f (x) − a ψ (x) lim ρ(x) dx = 0. i i N →∞ a
(6.391)
i=1
The natural space S is the space L2 (a, b) of all functions f that are squareintegrable on the interval [a, b] Z b f 2 (x) dx < ∞. (6.392) a
The orthonormal eigenfunctions of every regular Sturm-Liouville system on an interval [a, b] are complete in the mean in L2 (a, b). The completeness of these eigenfunctions follows from the fact that the eigenvalues λn of a regular Sturm-Liouville system when arranged in ascending order λn < λn+1 tend to infinity with the index n lim λn = ∞
n→∞
(6.393)
290
Differential Equations
as we’ll see in section 6.33.
6.32 Unboundedness of Eigenvalues We have seen (section 6.30) that the function u(x) that minimizes the ratio b
Z R[u] =
E[u] = N [u]
a
p u02 + q u2 dx Z b ρ u2 dx
(6.394)
a
is a solution of the Sturm-Liouville equation 0 Lu = − p u0 + q u = λ ρ u
(6.395)
with eigenvalue λ=
E[u] . N [u]
(6.396)
Let us call this least value λ of the ratio (6.394) λ = λ1 ; it also is the least eigenvalue of the differential equation (6.395). The second smallest eigenvalue λ2 is the minimum of the same ratio (6.394) but for functions that are orthogonal to u1 in the sense that Z b ρ u1 u2 dx = 0. (6.397) a
And λ3 is the minimum of the ratio R[u] but for functions that are orthogonal to both u1 and u2 . Continuing in this way, we make a sequence of orthogonal eigenfunctions un (x) (which we can normalize, N [un ] = 1) with eigenvalues λ1 ≤ λ2 ≤ λ3 ≤ . . . λn . How do the eigenvalues λn behave as n → ∞? Since the function p(x) is positive for a < x < b, it is clear that the energy functional (6.337) Z b 02 E[u] = p u + q u2 dx (6.398) a
u02
gets bigger as increases. In fact, if we let the functionR u(x) zigzag up and down about a given Rcurve u ¯, then the kinetic energy pu02 dx will rise 2 but the potential energy qu dx will remain approximately constant. Thus by increasing the frequency of the zigzags, we can drive the energy E[u] to infinity. The case of u(x) = sin x and a zigzag version uz (x) = u(x)(1 + 0.2(sin ωx)) is illustrated in Fig. 6.1 for ω = 100. Clearly, in the limit as ω → ∞, the energy E[u] → ∞.
291
6.32 Unboundedness of Eigenvalues 1.2 1
u(x), uz (x)
0.8 0.6 0.4 0.2 0 0
0.5
1
1.5
2
2.5
3
x Figure 6.1 The energy functional E[u] of Eq. (6.337) assigns very different energies to the functions u(x) = sin(x) (smooth curve, blue) and uz (x) = u(x)(1 + 0.2 sin(ωx)) (zigzag curve, red) on the interval [0, π], plotted here for ω = 100. As the frequency ω → ∞, the energy E[uz ] → ∞.
It is therefore intuitively clear (or at least plausible) that if the functions p(x), q(x), and ρ(x) are continuous on [a, b] and if p(x) > 0 and ρ(x) > 0 on (a, b), then there are infinitely many energy eigenvalues λn , and that they increase without limit as n → ∞ lim λn = ∞.
n→∞
(6.399)
Courant and Hilbert (Richard Courant, 1888–1972 and David Hilbert, 1862–1943) provide several proofs of this result (Courant and Hilbert, 1955, pp. 397–429). One of their proofs involves the change of variables f = (pρ)1/4 and v = f u, after which the eigenvalue equation 0 L u = − p u0 + q u = λρu (6.400) becomes Lf v = −v 00 + rv = λv v
(6.401)
with r = f 00 /f + q/ρ. Were this r(x) a constant, the eigenfunctions of Lf
292
Differential Equations
would be vn (x) = sin(nπ/(b − a))
(6.402)
with eigenvalues λvn =
nπ b−a
2 +r
(6.403)
rising as n2 . Courant and Hilbert show that as long as r(x) is bounded for a ≤ x ≤ b, the actual eigenvalues of Lf are λv,n = c n2 + dn
(6.404)
in which dn is bounded and that the the eigenvalues λn of L differ from the λvn by a scale factor, so that n2 =g n→∞ λn lim
(6.405)
where g is a constant. 6.33 Completeness of Eigenfunctions We have seen in section (6.32) that the eigenvalues of every regular SturmLiouville system when arranged in ascending order tend to infinity with the index n lim λn = ∞.
n→∞
(6.406)
We’ll now use this property to show that the corresponding eigenfunctions un (x) are complete in the mean (6.391) in the domain D of the system. To do so, we follow Courant and Hilbert (Courant and Hilbert, 1955, pp. 397–428) and extend the energy E and norm N functionals to inner products on the domain of the system Z b E[f, g] ≡ p(x) f 0 (x) g 0 (x) + q(x) f (x) g(x) dx (6.407) a Z b N [f, g] ≡ ρ(x) f (x) g(x) dx (6.408) a
for any f and g in D. Integrating E[f, g] by parts, we have Z bh i 0 0 E[f, g] = p f g 0 − f pg 0 + q f g dx a Z bh i 0 b = − f pg 0 + f q g dx + p f g 0 a a
(6.409)
293
6.33 Completeness of Eigenfunctions
or in terms of the self-adjoint differential operator L of the system Z b b f L g dx + p f g 0 a . (6.410) E[f, g] = a
Since the boundary term vanishes when the functions f and g are in the domain D of the system, it follows that for f and g in D Z b E[f, g] = f L g dx. (6.411) a
We can use the first n orthonormal eigenfunctions uk of the system L uk = λk ρ uk
(6.412)
to approximate an arbitrary function in f ∈ D as the linear combination f (x) ∼
n X
ck uk (x)
(6.413)
k=1
with coefficients ck given by b
Z ck = N [f, uk ] =
ρ f uk dx.
(6.414)
a
We’ll show that this series converges in the mean to the function f . By construction (6.414), the remainder or error of the nth sum rn (x) = f (x) −
n X
ck uk (x)
(6.415)
k = 1, . . . , n.
(6.416)
k=1
is orthogonal to the first n eigenfunctions N [rn , uk ] = 0
for
The next eigenfunction un+1 minimizes the ratio R[φ] =
E[φ, φ] N [φ, φ]
(6.417)
over all φ that are orthogonal to the first n eigenfunctions uk in the sense that N [φ, uk ] = 0 for k = 1, . . . n. And that minimum is the eigenvalue λn+1 R[un+1 ] = λn+1
(6.418)
which therefore must be less than the ratio R[rn ] λn+1 ≤ R[rn ] =
E[rn , rn ] . N [rn , rn ]
(6.419)
294
Differential Equations
Thus the square of the norm of the remainder is bounded by the ratio krn k2 ≡ N [rn , rn ] ≤
E[rn , rn ] . λn+1
(6.420)
So since λn+1 → ∞ as n → ∞, we’re done if we can show that the energy E[rn , rn ] is bounded. This energy is " # n n X X E[rn , rn ] = E f − ck uk , f − ck uk k=1 n X
= E[f, f ] −
k=1
ck k=1 n X
= E[f, f ] − 2
(E[f, uk ] + E[uk , f ]) +
n X n X
ck c` E[uk , u` ]
k=1 `=1
ck E[f, uk ] +
k=1
n X n X
ck c` E[uk , u` ].
(6.421)
k=1 `=1
Since f and all the uk are in the domain of the system, they satisfy the boundary condition (6.300 or 6.377), and so (6.410, 6.412, & 6.388) imply that Z b Z b E[f, uk ] = ρf Luk dx = λk ρf uk dx = λk ck (6.422) a
a
and that Z E[uk , u` ] =
b
Z
b
ρuk Lu` dx = λ` a
ρuk u` dx = λk δk,` .
(6.423)
a
Using these relations to simplify our formula (6.421) for E[rn , rn ] we find E[rn , rn ] = E[f, f ] −
n X
λk c2k .
(6.424)
k=1
Since λn → ∞ as n → ∞, we can be sure that for high enough n, the sum n X
λk c2k > 0
for n > N
(6.425)
k=1
is positive. It follows from (6.424) that the energy of the remainder rn is bounded by that of the function f E[rn , rn ] = E[f, f ] −
n X k=1
λk c2k ≤ E[f, f ].
(6.426)
295
6.33 Completeness of Eigenfunctions
By substituting this upper bound E[f, f ] on E[rn , rn ] into our upper bound (6.420) on the squared norm krn k2 of the remainder, we find krn k2 ≤
E[f, f ] . λn+1
(6.427)
Thus since λn → ∞ as n → ∞, we see that the series (6.413) converges in the mean (section 4.3) to f lim krn k2 = lim kf −
n→∞
n→∞
n X
E[f, f ] = 0. n→∞ λn+1
ck uk k2 ≤ lim
k=1
(6.428)
The eigenfunctions uk of a regular Sturm-Liouville system are therefore complete in the mean in the domain D of the system: they span D. It is a short step from spanning D to spanning the space L2 (a, b) of functions that are square integrable on the interval [a, b] of the system. To take this step, we assume that the domain D is dense in L2 (a, b), that is, that for every function g ∈ L2 (a, b) there is a sequence of functions fn ∈ D that converges to it in the mean, that is, for any > 0 there is an integer N1 such that Z b 2 kg − fn k ≡ |g(x) − fn (x)|2 ρ(x) dx < for n > N1 . (6.429) a
Since fn ∈ D, we can find a series of eigenfunctions uk of the system that converges in the mean to fn so that for any > 0 there is an integer N2 such that 2 Z b N N X X 2 kfn − cn,k uk k ≡ cn,k uk (x) ρ(x) dx < for N > N2 . fn (x) − a k=1
k=1
(6.430) The Schwartz inequality applies to these inner products, and so kg −
N X
cn,k uk k ≤ kg − fn k + kfn (x) −
N X
k=1
cn,k uk k.
(6.431)
k=1
Combining the last three inequalities, we have for n > N1 and N > N2 kg −
N X
cn,k uk k < 2
√
.
(6.432)
k=1
So the eigenfunctions uk of a regular Sturm-Liouville system span the space of functions that are square integrable on its interval L2 (a, b). One may further show (Courant and Hilbert 1955, p. 360; Stakgold 1967,
296
Differential Equations
p. 220) that the eigenfunctions ui (x) of any regular Sturm-Liouville system form a complete orthonormal set in the sense that every function f (x) that satisfies Dirichlet (6.297) or Neumann (6.298) boundary conditions and has a continuous first and a piecewise continuous second derivative may be expanded in a series ∞ X f (x) = ai ui (x) (6.433) i=1
that converges absolutely and uniformly on the interval [a, b] of the system. Our discussion (6.406–6.428) of the completeness of the eigenfunctions of a regular Sturm-Liouville system was insensitive to the finite length of the interval [a, b] and to the positivity of p(x) on [a, b]. What was essential was the vanishing of the boundary terms (6.300) which can happen if p vanishes at the end points of a finite interval or if the functions u and v tend to zero as |x| → ∞ on an infinite one. This is why the results of this section have been extended to singular Sturm-Liouville systems made of self-adjoint differential systems that are singular because the interval is infinite or has p vanishing at one of its ends. If the eigenfunctions ui are orthonormal with respect to a weight function ρ(x) Z b δij = u∗i (x) ρ(x) uj (x) dx (6.434) a
then the coefficients ai of the expansion (6.413) are given by the integrals (6.414) Z b ai = u∗i (x) ρ(x) f (x) dx. (6.435) a
By combining equations (6.390) and (6.435), we have ∞ Z b X f (x) = u∗i (x0 ) ρ(x0 ) f (x0 ) dx0 ui (x) i=1
(6.436)
a
or rearranging b
Z f (x) =
f (x0 )
"∞ X
a
# u∗i (x0 ) ui (x) ρ(x0 ) dx0
(6.437)
i=1
which implies the representation ∞ X p 0 u∗i (x0 ) ui (x) δ(x − x ) = ρ(x)ρ(x ) 0
i=1
(6.438)
6.34 The Inequalities of Bessel and Schwarz
297
for Dirac’s delta function. This representation is suitable for functions f in the domain D of the regular Sturm-Liouville system.
6.34 The Inequalities of Bessel and Schwarz The inequality Z a
b
2 N X w(x) f (x) − ai ui (x) dx ≥ 0
(6.439)
i=1
and the formula (6.435) for ai lead to Bessel’s inequality Z b ∞ X 2 w(x) |f (x)| dx ≥ |ai |2 . a
(6.440)
i=1
The argument we used to derive the Schwarz inequality (1.96) for vectors applies also to functions and leads to the Schwarz inequality Z b 2 Z b Z b 2 2 ∗ |f (x)| dx |g(x)| dx ≥ g (x)f (x) dx . (6.441) a
a
a
6.35 Green’s Functions Physics is full of equations of the form L G(x) = δ (n) (x)
(6.442)
in which L is a 2d-order linear homogeneous differential operator in n variables. The solution G(x) is a Green’s function for the operator L. For instance, the Feynman propagator 4F (x) is a Green’s function for L = 2−m2 (2 − m2 )∆F (x) = −δ 4 (x)
(6.443)
as we saw in (5.224); it is given by (5.238). A more common Green’s function arises in classical electrodynamics. Gauss’s law says that the divergence of the electric field is ∇ · E = 4πρ
(6.444)
where ρ is the charge density. The electric field E is E = −∇φ −
1 ∂A . c ∂t
(6.445)
298
Differential Equations
in which −φ = A0 is the time-component of the gauge field Ai and A are its spatial components. In the Coulomb or radiation gauge ∇ · A = 0 and so −4φ = −∇ · ∇φ = 4πρ.
(6.446)
The needed Green’s function satisfies −4G(x) = −∇ · ∇G(x) = δ (3) (x).
(6.447)
For then the scalar potential φ is the integral Z φ(t, x) = 4π
G(x − x0 ) ρ(t, x0 ) d3 x0 .
(6.448)
To verify that this formula for φ satisfies the PDE (6.446), we apply to it the operator −4 Z
(−4) G(x − x0 ) ρ(t, x0 ) d3 x0
Z
δ (3) (x − x0 ) ρ(t, x0 ) d3 x0
− 4φ(t, x) = 4π = 4π
= 4πρ(t, x)
(6.449)
which is Poisson’s equation. The reader might wonder how the potential φ(t, x) can depend upon the charge density ρ(t, x0 ) at different points at the same time. The scalar potential is instantaneous because of the Coulomb gauge condition ∇·A = 0, which is not Lorentz invariant. The gauge-invariant, physical fields E and B are not instantaneous and do describe electrodynamics in a Lorentz-invariant manner. It is easy to find the Green’s function G(x) by expressing it as a Fourier transform Z G(x) = eik·x g(k) d3 k (6.450) and by using the three-dimensional version δ
(3)
Z (x) =
d3 k ik·x e (2π)3
(6.451)
of Dirac’s delta function (3.32). If we insert these Fourier transforms into
299
6.35 Green’s Functions
the equation (6.447) that defines the Green’s function G(x), then we find Z − 4G(x) = −4 eik·x g(k) d3 k Z = eik·x k2 g(k) d3 k = δ (3) (x) Z d3 k . (6.452) = eik·x (2π)3 Thus, the Green’s function G(x) is the Fourier transform Z ik·x 3 e d k G(x) = 2 (2π)3 k
(6.453)
which we may integrate to G(x) =
1 1 = 4π|x| 4πr
(6.454)
where r = |x| is the length of the vector x. The Green’s function for the Helmholtz equation (6.30) must satisfy (−4 − k2 )GH (x) = δ (3) (x).
(6.455)
By using the same Fourier-transform method, one can show that it is given by eikr (6.456) 4πr in which k and r are the lengths of k and x. Similarly, the Green’s function GmH for the modified Helmholtz equation GH (x) =
(−4 + m2 )GmH (x) = δ (3) (x)
(6.457)
is GmH (x) =
e−mr . 4πr
(6.458)
Of these Green’s functions, probably the most important is G(x) = 1/(4πr) which has the expansion ∞ X ` ` X r< 1 1 = G(x − x ) = Y m (θ, φ)Y`m∗ (θ0 , φ0 ) `+1 ` 4π|x − x0 | 2` + 1 r> `=0 m=−` (6.459) in terms of the spherical harmonics Y`m (θ, φ). Here, r, θ, and φ are the 0
300
Differential Equations
spherical coordinates of the point x, and r0 , θ0 , and φ0 are those of the point x0 ; r> is the larger of r and r0 , and r< is the smaller of r and r0 . If we substitute this expansion into the formula (6.448) for the potential φ, then we arrive at the multipole expansion Z φ(t, x) = 4π =
G(x − x0 ) ρ(t, x0 ) d3 x0
∞ X ` X `=0 m=−`
4π 2` + 1
Z
` r< `+1 r>
(6.460)
Y`m (θ, φ)Y`m∗ (θ0 , φ0 ) ρ(t, x0 ) d3 x0 .
Typically, this expansion is used to compute the potential due to a localized distribution of charge ρ(t, x0 ) on a remote point x. In this case, the integration is only over the restricted region where ρ(t, x0 ) 6= 0, and so r< = r0 and r> = r, and the multipole expansion is φ(t, x) =
∞ X `=0
Z ` X Y`m (θ, φ) 4π r0` Y`m∗ (θ0 , φ0 ) ρ(t, x0 ) d3 x0 . (6.461) 2` + 1 r`+1 m=−`
In terms of the multipoles Qm `
Z =
r0` Y`m∗ (θ0 , φ0 ) ρ(t, x0 ) d3 x0
(6.462)
∞ X ` m X 4πQm ` Y` (θ, φ) . 2` + 1 r`+1
(6.463)
the potential is φ(t, x) =
`=0 m=−`
The spherical harmonics provide for the Legendre polynomial the expansion ` 4π X m ˆ)= P` (ˆ x·x Y` (θ, φ)Y`m∗ (θ0 , φ0 ) 2` + 1 0
(6.464)
m=−`
which abbreviates the Green’s function formula (6.459) to G(x − x0 ) =
∞ ` 1 1 X r< ˆ 0 ). = P (ˆ x·x `+1 ` 4π|x − x0 | 4π r `=0 >
(6.465)
6.36 Eigenfunctions and Green’s Functions
301
6.36 Eigenfunctions and Green’s Functions Our construction of the Green’s function (6.453) was based on the resolution (6.451) of the delta function Z d3 k ik·(x−y) (3) e (6.466) δ (x − y) = (2π)3 in terms of the eigenfunctions exp(ik · x) of the differential operator L = 4 4eik·x + k2 eik·x = 0.
(6.467)
We may generalize this way of making Green’s functions. Suppose L is a differential operator with eigenfunctions fn (x) and eigenvalues λn L fj (x) + λj fj (x) = 0.
(6.468)
Suppose these eigenfunctions are orthonormal with unit weight function Z δij = (fi , fj ) = fi∗ (x)fj (x) dn x. (6.469) Suppose they form a complete set of functions for the space S of functions, so that any function g ∈ S possesses an expansion of the form g(x) =
∞ X
aj fj (x)
(6.470)
j=1
with coefficients aj given by (6.469) as Z aj = fj∗ (x) g(x) dn x
(6.471)
so that Z g(x) =
∞ X fj∗ (y) fj (x) g(y) dn y
(6.472)
j=1
implies the completeness relation δ (n) (x − y) =
∞ X
fj (x) fj∗ (y).
(6.473)
j=1
Then we may make a Green’s function G(x − y) that satisfies L G(x − y) = δ (n) (x − y)
(6.474)
302
Differential Equations
by expanding both sides in terms of the complete set of eigenfunctions fj as
L
∞ X
aj fj (x)fj∗ (y) =
j=1
∞ X
−aj λj fj (x) fj∗ (y) =
j=1
∞ X
fj (x) fj∗ (y)
(6.475)
j=1
which works if aj = −1/λj G(x − y) =
∞ X fj (x) fj∗ (y)
−λj
j=1
.
(6.476)
Often one wants the Green’s function to satisfy (L + λ) G(x − y) = δ (n) (x − y)
(6.477)
instead of (6.474). In this case, one sets (L + λ)
∞ X j=1
aj fj (x)fj∗ (y)
=
∞ X
aj (−λj +
λ) fj (x) fj∗ (y)
j=1
=
∞ X
fj (x) fj∗ (y)
j=1
(6.478) whence the required Green’s function is G(x − y) =
∞ X fj (x) fj∗ (y) j=1
λ − λj
.
(6.479)
6.37 Green’s Functions in One Dimension In one dimension, a more explicit treatment allows us to solve the inhomogeneous ODE L f (x) + g(x) = 0
(6.480)
in which L d L= dx
d p(x) dx
+ q(x)
(6.481)
is a real, self-adjoint differential operator. We will construct a Green’s function from two solutions u and v of the homogeneous ODE L u(x) = L v(x) = 0. We’ll take the Green’s function to be 1 G(x, y) = − [θ(x − y)u(y)v(x) + θ(y − x)u(x)v(y)] A
(6.482)
(6.483)
in which θ(x) = (x + |x|)/(2|x|) is the Heaviside step function (Oliver
6.37 Green’s Functions in One Dimension
303
Heaviside, 1850–1925), and A is a constant which we’ll presently identify. We’ll show that Z b f (x) = G(x, y) g(y) dy a Z Z u(x) b v(x) x u(y) g(y) dy − v(y) g(y) dy (6.484) =− A a A x solves the inhomogeneous equation (6.480). Differentiating, we find after a cancellation Z Z v 0 (x) x u0 (x) b 0 f (x) = − u(y) g(y) dy − v(y) g(y) dy. (6.485) A A a x Differentiating again, we have Z Z v 00 (x) x u00 (x) b 00 f (x) = − u(y) g(y) dy − v(y) g(y) dy A A a x v 0 (x)u(x)g(x) u0 (x)v(x)g(x) − + A A Z x Z 00 v (x) u00 (x) b u(y) g(y) dy − v(y) g(y) dy =− A A a x W (x) − g(x) A
(6.486)
in which W (x) is the wronskian W (x) = u(x)v 0 (x) − u0 (x)v(x).
(6.487)
By applying the result (6.322) for the wronskian of two linearly independent solutions of a self-adjoint homogeneous ODE (6.482), we find W (x) = W (x0 )
p(x0 ) . p(x)
(6.488)
We set the constant A = [W (x0 )p(x0 )] so that the last term in (6.486) is −g(x)/p(x). It follows that Z Z Lu(x) b Lv(x) x Lf (x) = − u(y) g(y) dy − v(y) g(y) dy − f (x) = −g(x). A A a x (6.489) But Lu(x) = Lv(x) = 0, so we see that f satisfies the inhomogeneous ODE (6.480) Lf (x) + g(x) = 0.
(6.490)
304
Differential Equations
6.38 Problems 1. Suppose a voltage V (t) = V sin(ωt) is applied to a resistor of R (Ω) in series with a capacitor of capacitance C (F ). If the current through the circuit at time t = 0 is zero, what is the current at time t? 2. (a) Show whether the ODE (1 + y 2 )y dx + (1 + x2 )x dy (1 + x2 + y 2 )3/2
=0
(6.491)
is or is not exact. (b) Find its general integral and the solution y(x). Hint: Use the method of integration outlined in SubSec. (6.9). 3. (a) Separate the variables of the ODE (1 + y 2 )y dx + (1 + x2 )x dy = 0.
(6.492)
(b) Find its general integral and the solution y(x). 4. James Bernoulli studied ODEs of the form y0 + p y = q yn
(6.493)
in which p and q are functions of x. Division by y n and the substitution v = y 1−n gives v 0 + (1 − n)p v = (1 − n) q
(6.494)
which is soluble as shown in Sec. (6.14). Use this method to solve the ODE dy y − = 5x2 y 5 . (6.495) dx 2x 5. Integrate the ODE (xy + 1) dx + 2x2 (2xy − 1) dy = 0.
(6.496)
Hint: Use the variable v(x) = xy(x) instead of y(x). 6. Derive the relation (11.413) between the energy density ρ and the RobertsonWalker scale factor a(t) from the conservation law (11.408) and the equation of state (11.410). 7. For a universe with w = 1/3 and k = 1, use Friedmann’s equations (6.210 & 6.211) to derive the solution (11.432) subject to the boundary condition a(0) = 0. When does the universe collapse in a big crunch?
305
6.38 Problems
8. For a universe with w = 0 and k = 0, use Friedmann’s equations (6.210 & 6.211) to derive the solution (11.439) subject to the boundary condition a(0) = 0. (` )
9. Show that as long as the matrix Ykj = yk j (xj ) is nonsingular, the n boundary conditions bj = y (`j ) (xj ) =
n X
(` )
ck yk j (xj )
(6.497)
k=1
determine the n coefficients ck of the expansion (6.273) to be C T = B Y −1
or Ck =
n X
−1 bj Yjk .
(6.498)
j=1
10. Show that if the real and imaginary parts u1 , u2 , v1 , and v2 of ψ and χ satisfy boundary conditions at x = a and x = b that make the boundary term (6.287) vanish, then its complex analog (6.294) also vanishes. 11. Show that if the real and imaginary parts u1 , u2 , v1 , and v2 of ψ and χ satisfy boundary conditions at x = a and x = b that make the boundary term (6.287) vanish, and if the differential operator L is real and self adjoint, then (6.290) implies (6.295). 12. Show that if D is the set of all twice-differentiable functions u(x) on [a, b] that satisfy Dirichlet’s boundary conditions (6.297) and if the function p(x) is continuous and positive on [a, b], then the adjoint set D∗ defined as the set of all twice-differentiable functions v(x) that make the boundary term (6.300) vanish for all functions u ∈ D coincides with the set D. 13. Same as problem (12) but for Neumann boundary conditions (6.298). 14. Same as problem (12) but for themore general boundary conditions (6.299). 15. Show that after the change of variables u(x) = Jn (kx) = Jn (ρ), the selfadjoint differential equation (6.366) becomes Bessel’s equation (6.367). 16. Derive Bessel’s inequality (6.440) from the inequality (6.439). 17. Derive the Yukawa potential (6.458) as the Green’s function for the modified Helmholtz equation (6.457).
7 Integral Equations
Differential equations when integrated become integral equations with boundary conditions. Thus if we integrate the first-order ODE du(x) ≡ ux (x) = p(x) u(x) + q(x) dx then we get the integral equation Z x Z x u(x) = p(y) u(y) dy + q(y) dy + u0 a
(7.1)
(7.2)
a
and the boundary condition u(a) = f (a) = u0 . With a little more effort, we may integrate the second-order ODE u00 = pu0 + qu + r
(7.3)
(problems 1 & 2) to Z u(x) = f (x) +
x
k(x, y) u(y) dy
(7.4)
a
with k(x, y) = p(y) + (x − y) q(y) − p0 (y)
(7.5)
and f (x) = u(a) + (x − a) u0 (a) − p(a) u(a) +
Z
x
(x − y)r(y) dy.
(7.6)
a
In some physical problems, integral equations arise independently of differential equations. Whatever their origin, integral equations tend to have properties more suitable to mathematical analysis because derivatives are unbounded operators.
7.1 Fredholm Integral Equations
307
7.1 Fredholm Integral Equations An equation of the form Z b k(x, y) u(y) dy = λ u(x) + f (x)
(7.7)
a
for a ≤ x ≤ b in which the kernel k(x, y) and the function f (x) are given is an inhomogeneous Fredholm equation of the second kind for the function u(x) and the parameter λ. (Erik Ivar Fredholm, 1866–1927). If f (x) = 0, then it is a homogeneous Fredholm equation of the second kind Z b k(x, y) u(y) dy = λ u(x), a ≤ x ≤ b. (7.8) a
Such an equation typically has non-trivial solutions only for certain eigenvalues λ. Each such solution u(x) is an eigenfunction. If λ = 0 but f (x) 6= 0, then equation (7.7) is an inhomogeneous Fredholm equation of the first kind Z b k(x, y) u(y) dy = f (x), a ≤ x ≤ b. (7.9) a
Finally, if both λ = 0 and f (x) = 0, then it is a homogeneous Fredholm equation of the first kind Z b k(x, y) u(y) dy = 0, a ≤ x ≤ b. (7.10) a
These Fredholm equations are linear because they involve only the first (and zeroth) power of the unknown function u(x).
7.2 Volterra Integral Equations If the kernel k(x, y) in the equations (7.7– 7.10) that define the Fredholm integral equations is causal, that is, if k(x, y) = k(x, y) θ(x − y)
(7.11)
in which θ is the Heaviside function θ(x − y) =
(x − y + |x − y|) 2|x − y|
(7.12)
308
Integral Equations
then the corresponding equations bear the name Volterra (Vito Volterra, 1860–1941). Thus, an equation of the form Z x k(x, y) u(y) dy = λ u(x) + f (x) (7.13) a
in which the kernel k(x, y) and the function f (x) are given, is an inhomogeneous Volterra equation of the second kind for the function u(x) and the parameter λ. If f (x) = 0, then it is a homogeneous Volterra equation of the second kind Z x k(x, y) u(y) dy = λ u(x). (7.14) a
Such an equation typically has non-trivial solutions only for certain eigenvalues λ. These solutions u(x) are the eigenfunctions. If λ = 0 but f (x) 6= 0, then equation (7.13) is an inhomogeneous Volterra equation of the first kind Z x k(x, y) u(y) dy = f (x). (7.15) a
Finally, if both λ = 0 and f (x) = 0, then it is a homogeneous Volterra equation of the first kind Z x k(x, y) u(y) dy = 0. (7.16) a
These Volterra equations are linear because they involve only the first (and zeroth power) of the unknown function u(x). In what follows, we’ll mainly discuss Fredholm integral equations, since those of the Volterra type are a special case of the Fredholm type.
7.3 Implications of Linearity Because the Fredholm and Volterra integral equations are linear, one may add solutions of the homogeneous equations (7.8, 7.10, 7.14, & 7.16). Thus if uj (x) are eigenfunctions Z
b
k(x, y) uj (y) dy = λ uj (x), a
a≤x≤b
(7.17)
309
7.4 Numerical Solutions
P
with the same eigenvalue λ, then the sum j aj uj (x) is also an eigenfunction with the same eigenvalue Z b X X X Z b k(x, y) aj uj (y) dy = k(x, y) uj (y) dy = aj λ uj (x) aj a
j
a
j
=λ
j
X
aj uj (x) .
(7.18)
j
It is also true that the difference between any two solutions ui1 (x) and of one of the inhomogeneous Fredholm (7.7, 7.9) or Volterra (7.13, 7.15) equations is a solution of the associated homogeneous equation (7.8, 7.10, 7.14, or 7.16). Thus, if ui1 (x) and ui2 (x) satisfy the inhomogeneous Fredholm equation of the second kind Z b k(x, y) uij (y) dy = λ uij (x) + f (x), j = 1, 2 (7.19) ui2 (x)
a
then their difference ui1 (x)−ui2 (x) satisfies the homogeneous Fredholm equation of the second kind Z b k(x, y) ui1 (y) − ui2 (y) dy = λ ui1 (x) − ui2 (x) . (7.20) a
Thus, the most general solution ui (x) of the inhomogeneous Fredholm equation of the second kind (7.19) is a particular solution uip (x) of that equation plus the general solution of the homogeneous Fredholm equation of the second kind (7.17) X ui (x) = uip (x) + aj uj (x). (7.21) j
Linear integral equations are much easier to solve than non-linear ones. 7.4 Numerical Solutions Let us break the real interval [a, b] into N segments [yk , yk+1 ] of equal length ∆y = (b − a)/N with y0 = a, yk = a + k ∆y, and yN = b. We’ll also set xk = yk and define U as the vector with entries Uk = u(yk ) and K as the (N + 1) × (N + 1) square matrix with elements Kk` = k(xk , y` ) ∆y. Then we may approximate the homogeneous Fredholm equation of the second kind (7.8) Z b k(x, y) u(y) dy = λ u(x), a≤x≤b (7.22) a
310
Integral Equations
as the algebraic equation N X
Kk,` U` = λ Uk .
(7.23)
K U = λ U.
(7.24)
`=0
or, in matrix notation, We saw in section 1.27 that every such equation has N + 1 eigenvectors U (α) and eigenvalues λ(α) , and that the eigenvalues λ(α) are the solutions of the characteristic equation (1.284) det(K − λ(α) I) = K − λ(α) I = 0. (7.25) In general, as N → ∞ and ∆y → 0, the number N + 1 of eigenvalues λ(α) and eigenvectors U (α) also becomes infinite. We may apply the same technique to the inhomogeneous Fredholm equation of the first kind Z b k(x, y) u(y) dy = f (x), a ≤ x ≤ b. (7.26) a
The resulting matrix equation is KU =F
(7.27)
in which the kth entry in the vector F is Fk = f (xk ). This equation has the solution U = K −1 F
(7.28)
as long as the matrix K is non-singular, that is, as long as det K 6= 0.
(7.29)
This technique applied to the inhomogeneous Fredholm equation of the second kind Z b k(x, y) u(y) dy = λ u(x) + f (x) (7.30) a
leads to the matrix equation K U = λ U + F.
(7.31)
The associated homogeneous matrix equation K U = λ U.
(7.32)
has N + 1 eigenvalues λ(α) and eigenvectors U (α) ≡ |αi. For any value of λ
311
7.4 Numerical Solutions
that is not one of the eigenvalues λ(α) , the matrix K − λI has a non-zero determinant and hence an inverse, and so the vector U i = (K − λ I)−1 F
(7.33)
is a solution of the inhomogeneous matrix equation (7.31). If λ = λ(β) is one of the eigenvalues λ(α) of the homogeneous matrix equation (7.32), then the matrix K − λ(β) I will not have an inverse, but it will have a pseudoinverse (section 1.36). If its singular-value decomposition (1.424) is K − λ(β) I =
N +1 X
|mn iSn hn|
(7.34)
n=1
then its pseudoinverse (1.459) is
K − λ(β) I
+
N +1 X
=
|niSn−1 hmn |
(7.35)
n=1 Sn 6=0
in which the sum is over the positive singular values. So if the vector F is a linear combination of the left singular vectors |mn i whose singular values are positive F =
N +1 X
fn |mn i
(7.36)
n=1 Sn 6=0
then the vector + U i = K − λ(β) I F
(7.37)
will be a solution of the inhomogeneous matrix Fredholm equation (7.31). For in this case + K − λ(β) I U i = K − λ(β) I K − λ(β) I F =
N +1 X
|mn00 iSn00 hn00 |
n00 =1
=
N +1 X
N +1 X n0 =1
Sn0 6=0
fn |mn i = F.
|n0 iSn−1 0 hmn0 |
N +1 X
fn |mn i
n=1 Sn 6=0
(7.38)
n=1 Sn 6=0
The most general solution will be the sum of this particular solution of
312
Integral Equations
the inhomogeneous equation (7.31) and the most general solution of the homogeneous equation (7.32) + X X F+ fβ,k U (β,k) . (7.39) U = Ui + fβ,k U (β,k) = K − λ(β) I k
k
Open-source programs are available in C++ (math.nist.gov/tnt/) and in fortran (www.netlib.org/lapack/) that can solve such equations for the N + 1 eigenvalues λ(α) and eigenvectors U (α) and for the inverse K −1 for N = 100, 1000, 10,000, etc. in milliseconds on a PC.
7.5 Integral Transformations Integral transformations (Courant and Hilbert, 1955, chap. VII) help us solve linear homogeneous differential equations like Lu + cu = 0
(7.40)
in which L is a linear operator involving derivatives of u(z) with respect to its complex argument z = x + iy and c is a constant. We choose a kernel K(z, w) analytic in both variables and write u(z) as an integral along a contour in the complex w-plane weighted by an unknown function v(w) Z u(z) = K(z, w) v(w) dw. (7.41) C
If the differential operator L commutes with the contour integration as it usually would, then our differential equation (7.40) is Z [L K(z, w) + c K(z, w)] v(w) dw = 0. (7.42) C
The next step is to find a linear operator M that acting on K(z, w) with w-derivatives (but no z-derivatives) gives L acting on K(z, w) M K(z, w) = L K(z, w). We then get an integral equation Z [M K(z, w) + c K(z, w)] v(w) dw = 0
(7.43)
(7.44)
C
involving w-derivatives which we can integrate by parts. We choose the contour C so that the resulting boundary terms vanish. By using our freedom to pick the kernel and the contour, we often can make the resulting differential equation for v simpler than the one (7.40) we started with.
7.5 Integral Transformations
313
Example 7.1 (Fourier, Laplace, and Euler Kernels) We already are familiar with the most important integral transforms. In chapter 3, we learned that the kernel K(z, w) = exp(izw) leads to the Fourier transform Z ∞ eizw v(w) dw (7.45) u(z) = −∞
and the kernel K(z, w) = exp(−zw) to the Laplace transform Z ∞ u(z) = e−zw v(w) dw
(7.46)
0
of section 3.10. Euler’s kernel K(z, w) = (z − w)a occurs in many applications of Cauchy’s integral theorem (5.19) and integral formula (5.35). These kernels help us solve differential equations. Example 7.2 (Bessel Functions) equation (6.367)
The differential operator L for Bessel’s
z 2 u00 + z u0 + z 2 u − λ2 u = 0.
(7.47)
is L = z2
d2 d +z + z2 2 dz dz
(7.48)
and the constant c is − λ2 . If we choose M = − d2 /dw2 , then the kernel should satisfy (7.43) L K − M K = z 2 Kzz + z Kz + z 2 K + Kww = 0
(7.49)
in which subscripts indicate differentiation as in (6.20). The kernel K(z, w) = e±iz sin w
(7.50)
is a solution (problem 3) that is entire in both variables. In terms of it, our integral equation (7.44) is Z Kww (z, w) + λ2 K(z, w) v(w) dw = 0. (7.51) C
We now integrate by parts once Z dKw v 0 2 dw − Kw v + λ K v + dw C
(7.52)
and then again Z C
d(Kw v − Kv 0 ) K v +λ v + dw. dw 00
2
(7.53)
314
Integral Equations
If we choose the contour so that Kw v − Kv 0 = 0 at its ends, then the unknown function v must satisfy the differential equation v 00 + λ2 v = 0
(7.54)
which is vastly simpler than Bessel’s; the solution v(w) = exp(iλw) is an entire function of w for every complex λ. Our solution u(z) then is Z Z e±iz sin w eiλw dw. (7.55) K(z, w) v(w) dw = u(z) = C
C
For Re(z) > 0 and any complex λ, the contour C1 that runs from − i∞ to the origin w = 0, then to w = −π, and finally up to − π + i∞ has Kw v − Kv 0 = 0 at its ends (problem 4) provided we use the minus sign in the exponential. The function defined by this choice Z 1 (1) Hλ (z) = − e−iz sin w+iλw dw (7.56) π C1 is the first Hankel function (Hermann Hankel, 1839–1873). The second Hankel function is defined for Re(z) > 0 and any complex λ by a contour C2 that runs from π + i∞ to w = π, then to w = 0, and lastly to − i∞ Z 1 (2) e−iz sin w+iλw dw. (7.57) Hλ (z) = − π C2 Because the integrand exp(−iz sin w + iλw) is an entire function of z and w, one may deform the contours C1 and C2 and analytically continue the Hankel functions beyond the right half-plane (Courant and Hilbert, 1955, chap. VII). One may verify (problem 5) that the Hankel functions are related by complex conjugation (1)
(2)∗
Hλ (z) = Hλ
(z)
(7.58)
when both z > 0 and λ are real.
7.6 Problems 1. Show that Z
x
Z dz
a
z
Z dy f (y) =
a
x
(x − y) f (y) dy. a
Hint: differentiate both sides with respect to x.
(7.59)
7.6 Problems
315
2. Use this identity (7.59) to integrate (7.3) and derive equations (7.4, 7.5, & 7.6). 3. Show that the kernel K(z, w) = exp(±iz sin w) satisfies the differential equation (7.49). 4. Show that for Rez > 0 and arbitrary complex λ, the boundary terms in the integral (7.53) vanish for the two contours C1 and C2 that define the two Hankel functions. 5. Show that the Hankel functions are related by complex conjugation (9.103) when both z > 0 and λ are real.
8 Legendre Polynomials
8.1 The Legendre Polynomials The Green’s function 1 1 1 =√ =√ 2 2 2 2 |R − r| R + r − 2R · r R + r − 2Rr cos θ
(8.1)
occurs throughout physics. If R ≡ |R| > r ≡ |r|, then we may set x = cos θ, t = r/R, and factor out 1/R 1 1 1 p = |R − r| R 1 + (r/R)2 − 2(r/R)x 1 1 √ = R 1 + t2 − 2xt 1 = g(t, x) R
(8.2)
and then expand the inverse square-root in a power series in t = r/R g(t, x) = 1 + t2 − 2xt
−1/2
=
∞ X
Pn (x) tn .
(8.3)
n=0
The coefficients Pn (x) are the Legendre polynomials. The function g(t, x) is a generating function for these polynomials. The Legendre polynomials provide for the Green’s function G(R − r) = 1/|R − r| the expansion ∞ 1 1 X r n = Pn (cos θ) |R − r| R R
(8.4)
n=0
in which cos θ =
R·r . Rr
(8.5)
317
8.2 Recurrence Relations
With x ≡ cos θ, the first few Legendre polynomials are: P0 (x) = 1 P1 (x) = x 1 3x2 − 1 P2 (x) = 2 1 5x3 − 3x P3 (x) = 2 1 35x4 − 30x2 + 3 P4 (x) = 8 1 63x5 − 70x3 + 15x P5 (x) = 8 1 P6 (x) = 231x6 − 315x4 + 105x2 − 5 16 1 429x7 − 693x5 + 315x3 − 35x P7 (x) = 16 1 P8 (x) = 6435x8 − 12012x6 + 6930x4 − 1260x2 + 35 . 128
(8.6)
By using the binomial series (4.77) with x replaced by t2 − 2tx we find g(t, x) = 1 + t2 − 2xt
−1/2
=
∞ X
Pn (x) tn
n=0
=1+
∞ X n=1
(2n − 1)!! (2tx − t2 )n 2n n!
(8.7)
where the double factorial (4.37) is (2n − 1)!! = (2n − 1)(2n − 3)(2n − 5) · · · 1. Clearly P0 (x) = 1, and (less clearly) P1 (x) = x, and eventually [n/2]
Pn (x) =
X k=0
(−1)k (2n − 2k)! xn−2k 2n k! (n − k)! (n − 2k)!
(8.8)
in which [x] is the greatest integer not bigger than x, and we recall (4.33) that 0! = 1. This impressive formula is a hard way to compute Pn (x).
8.2 Recurrence Relations We differentiate the definition (8.3) of the generating function g(t, x) with respect to t ∞
X ∂g(t, x) x−t = = n Pn (x) tn−1 ∂t (1 − 2xt + t2 )3/2 n=1
(8.9)
318
Legendre Polynomials
and so find (x−t) g(t, x) = (1−2xt+t2 )
∞ X
n Pn (x) tn−1 = (x−t)
n=1
∞ X
Pn (x) tn . (8.10)
n=0
By equating the coefficients of tn in the last two expressions, we arrive at the recurrence relation (2n + 1) x Pn (x) = (n + 1) Pn+1 (x) + n Pn−1 (x).
(8.11)
This recurrence relation implies for n = 1 that 3xP1 (x) = 2P2 (x) + P0 (x)
(8.12)
and so that P2 (x) = (3x2 − 1)/2 since P1 = x and P0 = 1. We differentiate the definition (8.3) of the generating function g(t, x) with respect to x ∞
X t ∂g(t, x) = = P 0 (x) tn ∂x (1 − 2xt + t2 )3/2 n=1 n
(8.13)
and so get (1 − 2xt + t2 )
∞ X
Pn0 (x) tn = t g(t, x) =
n=1
∞ X
Pn (x) tn+1 .
(8.14)
n=0
Equating coefficients of tn , we have 0 0 Pn+1 (x) + Pn−1 (x) = 2x Pn0 (x) + Pn (x).
(8.15)
By differentiating the recurrence relation (8.11) and combining it with this last equation, we find 0 0 Pn+1 (x) − Pn−1 (x) = (2n + 1) Pn (x).
(8.16)
The last two recurrence relations (8.15 & 8.16) lead to several more: 0 Pn+1 (x) = (n + 1) Pn (x) + x Pn0 (x)
(1 − (1 −
0 Pn−1 (x) 2 0 x ) Pn (x) x2 ) Pn0 (x)
= −n Pn (x) +
x Pn0 (x)
(8.17) (8.18)
= n Pn−1 (x) − nx Pn (x)
(8.19)
= (n + 1)x Pn (x) − (n + 1) Pn+1 (x).
(8.20)
319
8.3 Legendre’s Equation
8.3 Legendre’s Equation We now differentiate (8.19) 0 0 (x) − n Pn (x) − nx Pn0 (x) (1 − x2 ) Pn0 (x) = n Pn−1
(8.21)
0 and use (8.18) for Pn−1 (x) to get Legendre’s equation 0 (1 − x2 ) Pn0 (x) + n(n + 1) Pn (x) = 0
(8.22)
which self adjoint. Setting x = cos θ, (1 − x2 ) = sin2 θ, and d d cos θ d d = = − sin θ dθ dθ dx dx
(8.23)
1 d d =− dx sin θ dθ
(8.24)
so that
we may write Legendre’s equation in terms of the angle θ as 1 d d sin θ Pn (cos θ) + n(n + 1) Pn (cos θ) = 0 sin θ dθ dθ
(8.25)
which is how it appears when we separate the variables of the laplacian 4 in spherical coordinates.
8.4 Special Values of Legendre’s Polynomials The definition (8.3) of the generating function g(t, x) at x = 1 reduces to −1/2 g(t, 1) = 1 + t2 − 2t =
∞
∞
n=0
n=0
X X 1 = tn = Pn (1) tn 1−t
(8.26)
which implies that Pn (1) = 1.
(8.27)
Similarly g(t, −1) = 1 + t2 + 2t
−1/2
∞
=
∞
X X 1 = (−t)n = Pn (−1) tn 1+t n=0
(8.28)
n=0
implies that Pn (−1) = (−1)n .
(8.29)
320
Legendre Polynomials
With more effort, one shows that P2n (0) = (−1)n
(2n − 1)!! (2n)!!
(8.30)
and that P2n+1 (0) = 0.
(8.31)
The generating function g(t, x) is even under the reflection of both independent variables, so g(t, x) =
∞ X
n
t Pn (x) =
n=0
∞ X
(−t)n Pn (−x) = g(−t, −x)
(8.32)
n=0
whence Pn (−x) = (−1)n Pn (x)
(8.33)
By writing the term cos θ in the definition (8.3) of the generating function in terms of the exponentials exp(±iθ), one may show that Pn (cos θ) =
n X
am cos mθ
(8.34)
m=0
in which the coefficients am are non-negative. It follows that |Pn (cos θ)| = |Pn (x)| ≤ Pn (1) = 1.
(8.35)
8.5 Orthogonality The differential operator L=
d d (1 − x2 ) dx dx
(8.36)
is self adjoint and the real function p(x) = 1 − x2 vanishes at x = ±1, so Legendre’s differential operator L is self adjoint on the interval [ − 1, 1]. The Legendre polynomial Pn (x) is an eigenfunction of L with eigenvalue n(n+1) and weight function w(x) = 1 0 (1 − x2 ) Pn0 (x) + n(n + 1) Pn (x) = 0 (8.37) which repeats (8.22). The orthogonality relation (6.386) tells us that eigenfunctions of a selfadjoint differential operator that have different eigenvalues are orthogonal on the appropriate interval (here [−1, 1]) with respect to the weight function
321
8.5 Orthogonality
w(x), which here is unity. Thus Pn (x) and Pm (x) are orthogonal for n 6= m Z
1
Pn (x) Pm (x) dx = 0
(8.38)
−1
on the interval [−1, 1]. Since the Legendre polynomials are real, we did not complex-conjugate one of them in this equation. In terms of the angle θ, this orthogonality relation is for n 6= m Z π Pn (cos θ) Pm (cos θ) sin θ dθ = 0. (8.39) 0
For n = m, the integral (8.38) is Z 1 Pn2 (x) dx = −1
2 . 2n + 1
(8.40)
One may derive this result by integrating the square of the generating function g(t, x) as defined by the series (8.3) and using the orthogonality of the polynomials (8.38) !2 Z 1 Z 1 Z 1 X ∞ ∞ X dx n n t Pn (x) dx = t Pn2 (x) dx. (8.41) = 2 − 2xt 1 + t −1 −1 −1 n=0
n=0
We may do the first integral by setting u = 1 + t2 − 2xt Z 1 Z 2 dx 1 (1+t) du 1 1+t = = ln 2 2t (1−t)2 u t 1−t −1 1 + t − 2xt which has the power series expansion (4.82) ∞ X 1 1+t t2n =2 ln . t 1−t 2n + 1
(8.42)
(8.43)
n=0
If we now equate the coefficients of t2n , then we get (8.40) and so Z 1 2 δnm . Pn (x) Pm (x) dx = 2n + 1 −1
(8.44)
The scaled polynomials r Ln (x) =
2n + 1 Pn (x) 2
(8.45)
are orthonormal on [−1, 1]. Since the weight function w(x) of the Legendre polynomials is unity,
322
Legendre Polynomials
w(x) = 1, it follows from Eq.(6.438) that they provide for Dirac’s delta function the resolution 0
δ(x − x ) =
∞ X 2n + 1
2
n=0
Pn (x0 ) Pn (x)
which in turn leads to the Fourier-Legendre expansion Z 1 ∞ X 2n + 1 f (x) = Pn (x0 ) f (x0 ) dx0 . Pn (x) 2 −1
(8.46)
(8.47)
n=0
8.6 The Azimuthally Symmetric Laplacian We saw in Eqs.(9.69–6.51) that the laplacian 4 = ∇·∇ separates in spherical coordinates r, θ, φ. If the system has no dependence on the angle φ, then it is said to have azimuthal symmetry. In this case, the function f (r, θ, φ) = Rk,` (r) Θ` (θ)
(8.48)
will be a solution of Helmholtz’s equation −4f = k 2 f
(8.49)
if the functions Rk,` (r) and Θ` (θ) satisfy 1 d `(` + 1) 2 dRk,` 2 r + k − Rk,` = 0 r2 dr dr r2 1 d dΘ` sin θ + `(` + 1)Θ` = 0 sin θ dθ dθ
(8.50) (8.51)
in which ` is a non-negative integer. This last equation is the Legendre equation (8.25), and so we may set Θ` (θ) = P` (cos θ).
(8.52)
For k > 0, the solutions of the radial equation (8.50) that are regular at r = 0 are the spherical Bessel functions Rk,` (r) = j` (kr)
(8.53)
given by `
j` (x) = (−1) x
`
d xdx
`
sin x x
.
(8.54)
323
8.6 The Azimuthally Symmetric Laplacian
So the general, azimuthally symmetric solution of the Helmholtz equation (8.49) that is regular at r = 0 is f (r, θ) =
∞ X
ak,` j` (kr) P` (cos θ)
(8.55)
`=0
in which the ak,` are constants. If the solution need not be regular at the origin, then the Neumann functions d ` cos x ` ` (8.56) n` (x) = −(−1) x xdx x must be included, and the general solution then is f (r, θ) =
∞ X
[ak,` j` (kr) + bk,` n` (kr)] P` (cos θ)
(8.57)
`=0
in which the ak,` and bk,` are constants. When k = 0, Helmholtz’s equation reduces to Laplace’s 4f = 0
(8.58)
which describes the Coulomb-gauge electric potential in the absence of charges and the Newtonian gravitational potential in the absence of masses. Now the radial equation is simply d dR` r2 = `(` + 1)R` (8.59) dr dr since k = 0. We try setting R` (r) = rn
(8.60)
n(n + 1) = `(` + 1)
(8.61)
which works if
that is, if n=`
or n = −(` + 1).
(8.62)
So the general solution to (8.58) is f (r, θ) =
∞ h X
i a` r` + b` r−`−1 P` (cos θ).
(8.63)
`=0
If the solution must be regular at r = 0, then all the b` ’s must be zero.
324
Legendre Polynomials
8.7 Laplacian in Two Dimensions In Eqs.(6.38–6.43), we saw that Helmholtz’s equation separates in cylindrical coordinates, but that the equation for B(ρ) is Bessel’s equation (6.40) for the cylindrical Bessel functions, which are not simple. But if α = 0, Helmholtz’s equation reduces to Laplace’s equation, and if the potential f is also independent of z, then simple solutions exist. For in this case both α and k are zero, and equations (6.40 & 6.41) become dB d2 Φ d ρ = m2 B and − = m2 Φ(φ). (8.64) ρ dρ dρ dφ2 The function Φ(φ) may be taken to be Φ(φ) = exp(imφ) or a linear combination of cos(mφ) and sin(mφ). If the whole range of φ from 0 to 2π is physically relevant, then Φ(φ) must be periodic, and so m must be an integer. It’s easy to solve the equation for Bm ; we set Bm = ρn and substitute that into Eq.(8.64), obtaining n2 ρn = m2 ρn
(8.65)
or n = ±m. The general z-independent solution of Laplace’s equation in cylindrical coordinates is then f (ρ, φ) =
∞ X
(am cos(mφ) + bm sin(mφ)) cm ρm + dm ρ−m .
(8.66)
m=0
8.8 The Formulas of Rodrigues and Schlaefli Rodrigues showed that the Legendre polynomial Pn (x) is proportional to the nth derivative of [−p(x)]n = (x2 − 1)n n 1 d Pn (x) = n (x2 − 1)n . (8.67) 2 n! dx Incidentally, this formula gives us another way of deriving the parity relation (8.29) Pn (−x) = (−1)n Pn (x).
(8.68)
Schlaefli used this formula to express Pn (z 0 ) as the contour integral I 1 (z 2 − 1)n 0 Pn (z ) = n dz 0 (8.69) 2 2πi (z − z 0 )n+1 as long as the contour encircles the complex point z 0 in a ccw manner.
325
8.9 The Laplacian in Spherical Coordinates
8.9 The Laplacian in Spherical Coordinates The laplacian 4 separates in spherical coordinates, as we saw in Eqs.(9.69– 6.51). Thus, a function f (r, θ) = Rk,` (r) Θ`,m (θ) Φm (φ)
(8.70)
will be a solution of the Helmholtz equation −4f = k 2 f if Rk,` is a linear combination of the spherical Bessel functions j` (8.54) and n` 8.56 Rk,` (r) = ak,` j` (kr) + bk,` n` (kr)
(8.71)
Φm (φ) = eimφ
(8.72)
if Φm is
and if Θ`,m satisfies associated Legendre equation dΘ`,m 1 d m2 sin θ + `(` + 1) − Θ`,m = 0. sin θ dθ dθ sin2 θ
(8.73)
8.10 The Associated Legendre Polynomials The associated Legendre functions P`m (x) are polynomials in both sin θ and cos θ. They arise as solutions of the separated θ equation (8.73) dP`m 1 d m2 Pm = 0 (8.74) sin θ + `(` + 1) − sin θ dθ dθ sin2 θ ` of the laplacian in spherical coordinates. In terms of x = cos θ, this selfadjoint ODE is 0 m2 (1 − x2 )P`0m (x) + `(` + 1) − P m (x) = 0. (8.75) 1 − x2 ` To find the P`m ’s, we use Leibnitz’s rule n n X X dn n! n (n−k) (k) AB = A B ≡ A(n−k) B (k) n k dx k! (n − k)! k=0
(8.76)
k=0
to differentiate Legendre’s equation (8.22) 0 (1 − x2 ) P`0 + `(` + 1) P` = 0
(8.77)
m times, obtaining (m+2)
(1 − x2 )P`
(m+1)
− 2x(m + 1)P`
(m)
+ (` − m)(` + m + 1)P`
= 0. (8.78)
326
Legendre Polynomials
We may make this equation self adjoint by using the prefactor formula (6.312) Z x −2x0 1 0 exp (m + 1) dx F = 1 − x2 1 − x02 1 2 = exp (m + 1) ln(1 − x ) = (1 − x2 )m . (8.79) 1 − x2 The resulting ODE is h i 0(m) 0 (m) (1 − x2 )m+1 P` + (1 − x2 )m (` − m)(` + m + 1)P` = 0
(8.80)
is self adjoint, but it is not (8.75). (m) Instead, we set P` (m)
P`
(x) = (1 − x2 )−m/2 P`m (x)
(8.81)
and compute its derivatives: mxP`m (m+1) 0m (1 − x2 )−m/2 (8.82) P` = P` + 1 − x2 2mxP`0m mP`m m(m + 2)x2 P`m (m+2) P` = P`00m + + + (1 − x2 )−m/2 . 1 − x2 1 − x2 (1 − x2 )2 If we put these three expressions in equation (8.78), then we get the desired ODE (8.75). The associated Legendre functions are (m)
P`m (x) = (1 − x2 )m/2 P`
(x) = (1 − x2 )m/2
They are simple polynomials in x = cos θ and P`m (cos θ) = sinm θ
√
dm P` (x) dxm
(8.83)
1 − x2 = sin θ
dm P` (cos θ). d cosm θ
(8.84)
It follows from Rodrigues’s formula (8.67) for the Legendre polynomial P` (x) that P`m (x) is given by the similar formula P`m (x) =
(1 − x2 )m/2 d`+m (x2 − 1)` 2` `! dx`+m
(8.85)
which tells us that under parity P`m (x) changes by (−1)`+m P`m (−x) = (−1)`+m P`m (x).
(8.86)
Rodrigues’s formula (8.85) for the associated Legendre function makes sense as long as ` + m ≥ 0. This last condition is the requirement in quantum mechanics that m not be less than −`. Similarly, if m exceeds `,
8.11 Spherical Harmonics
327
then P`m (x) is given by more than 2` derivatives of a polynomial of degree 2`, which is to say that P`m (x) = 0 if m > `. This last condition is the requirement in quantum mechanics that m not be greater than `. That is, −` ≤ m ≤ `.
(8.87)
One may show that P`−m (x) = (−1)m
(` − m)! m P (x). (` + m)! `
(8.88)
In fact, since m occurs only as m2 in the ODE (8.75), P`−m (x) must be proportional to P`m (x). Under reflections, the parity of P`m is (−1)`+m , that is, P`m (−x) = (−1)`+m P`m (x). √ If m 6= 0, then P`m (x) has a power of 1 − x2 in it, so P`m (±1) = 0.
(8.89)
(8.90)
We may consider either `(` + 1) or m2 as the eigenvalue in the ODE (8.75) 0 2 0m (1 − x )P` (x) + `(` + 1) −
m2 P m (x) = 0. 1 − x2 `
(8.91)
If `(` + 1) is the eigenvalue, then the weight function is unity, and since this ODE is self adjoint on the interval [−1, 1] (at the ends of which p(x) = (1 − x2 ) = 0), the eigenfunctions P`m (x) and P`m 0 (x) must be orthogonal on 0 that interval when ` 6= ` . The full integral formula is Z 1 2 (` + m)! P`m (x) P`m δ`,`0 . (8.92) 0 (x) dx = 2` + 1 (` − m)! −1 If m2 for fixed ` is the eigenvalue, then the weight function is 1/(1 − x2 ), and the eigenfunctions P`m (x) and P`m 0 (x) must be orthogonal on [−1, 1] when m 6= m0 . The full formula is Z 1 dx (` + m)! 0 P`m (x) P`m (x) = δm,m0 . (8.93) 2 1−x m(` − m)! −1 8.11 Spherical Harmonics The spherical harmonic Y`m (θ, φ) ≡ Y`m (θ, φ) is the product Y`m (θ, φ) = Θm ` (θ) Φm (φ)
(8.94)
328
Legendre Polynomials
m in which Θm ` (θ) is proportional to the associated Legendre function P` s 2` + 1 (` − m)! m m P (cos θ) (8.95) Θm ` (θ) = (−1) 2 (` + m)! `
and eimφ Φm (φ) = √ . 2π The big square-root in the definition (8.95) ensures that Z 2π Z π 0 sin θ dθ Y`m∗ (θ, φ) Y`m dφ 0 (θ, φ) = δ``0 δmm0 . 0
(8.96)
(8.97)
0
In spherical coordinates, the parity transformation x0 = −x
(8.98)
is r0 = r, θ0 = π − θ, and φ0 = φ ± π. So under parity, cos θ0 = − cos θ and exp(imφ0 ) = (−1)m exp(imφ). This factor of (−1)m cancels the mdependence (8.86) of P`m (θ) under parity, so that under parity Y`m (θ0 , φ0 ) = Y`m (π − θ, φ ± 1) = (−1)`+m (−1)m Y`m (θ, φ) = (−1)` Y`m (θ, φ). (8.99) Thus the parity of the state |n, `, mi is (−1)` . The spherical harmonics are complete on the unit sphere. They may be used to expand any smooth function f (θ, φ) as f (θ, φ) =
∞ X ` X
a`m Y`m (θ, φ).
(8.100)
`=0 m=−`
By using the preceding orthonormality relation (8.97), we may identify the coefficients a`m as Z 2π Z π (8.101) a`m = dφ sin θ dθ Y`m∗ (θ, φ) f (θ, φ). 0
0
Putting the last two equations together, we find "∞ ` # Z 2π Z π X X 0 0 0 m∗ 0 0 m f (θ, φ) = dφ sin θ dθ Y` (θ , φ ) Y` (θ, φ) f (θ0 , φ0 ) 0
0
`=0 m=−`
(8.102) and so, we may identify the sum within the brackets as proportional to an
329
8.11 Spherical Harmonics
Figure 8.1 The Cosmic Microwave Background temperature fluctuations from the 7-year Wilkinson Microwave Anisotropy Probe data seen over the full sky. The image is a mollweide projection of the temperature variations over the celestial sphere. The average temperature is 2.725 K, and the colors represent the tiny temperature fluctuations. Red regions are warmer and blue regions are colder by about 0.0002 degrees. Courtesy of NASA, WMAP, and Wikipedia.
angular delta function ∞ X ` X
Y`m∗ (θ0 , φ0 ) Y`m (θ, φ) =
`=0 m=−`
1 δ(θ − θ0 ) δ(φ − φ0 ) sin θ
(8.103)
which sometimes is abbreviated as ∞ X ` X
Y`m∗ (Ω0 ) Y`m (Ω) = δ (2) (Ω − Ω0 ).
(8.104)
`=0 m=−`
The spherical-harmonic expansion (8.100) of the Legendre polynomial P` (ˆ n·n ˆ 0 ) of the cosine n ˆ·n ˆ 0 in which the polar angles of the unit vectors respectively are θ, φ and θ0 , φ0 is the addition theorem P` (ˆ n·n ˆ 0) =
` 4π X m Y` (θ, φ)Y`m∗ (θ0 , φ0 ) 2` + 1 m=−`
` 4π X m∗ = Y` (θ, φ)Y`m (θ0 , φ0 ). 2` + 1
(8.105)
m=−`
Example 8.1 (CMB Radiation)
Instruments on the WMAP and Planck
330
Legendre Polynomials
Figure 8.2 The power spectrum of the cosmic microwave background radiation temperature anisotropy in terms of the angular scale (or multipole moment). The data shown come from the WMAP (2006), Acbar (2004) Boomerang (2005), CBI (2004), and VSA (2004) instruments. Also shown is a fit to the model (solid line). (Courtesy of NASA and Wikipedia.)
satellites in orbit at the Lagrange point L2 (some 1.5×106 km farther from the Sun in the Earth’s shadow) measure the temperature T (θ, φ) of the cosmic microwave background (CMB) radiation as a function of the polar angles θ and φ in the sky as shown in Fig. 8.1. This radiation is photons last scattered as the visible universe became transparent at an age of 360,000 years and a temperature cool enough (3,000 K) for hydrogen atoms to be stable. This initial tranparency is inexplicably called recombination. Since the spherical harmonics Y`m (θ, φ) are complete on the unit sphere, we can expand the temperature as T (θ, φ) =
∞ X ` X `=0 m=−`
a`,m Y`m (θ, φ)
(8.106)
8.12 Problems
in which the coefficients are by (8.101) Z 2π Z π a`,m = dφ sin θ dθ Y`m∗ (θ, φ) T (θ, φ). 0
331
(8.107)
0
The average temperature T contributes only to a0,0 = T = 2.725 K; the other coefficients describe the difference ∆T (θ, φ) = T (θ, φ) − T . The angular power spectrum is C` =
` X 1 |a`,m |2 . 2` + 1
(8.108)
m=−`
If we let the unit vector n ˆ point in the direction θ, φ and use the addition theorem (8.105), then we can write the angular power spectrum as Z Z 1 C` = d2 n ˆ d2 n ˆ 0 P` (ˆ n·n ˆ 0 ) T (ˆ n) T (ˆ n0 ). (8.109) 4π The measured values of the quantity `(` + 1) C` /2π are plotted in Fig. 8.2 for 8 < ` < 1600. They agree with an inflationary cosmological model (solid curve) with cold dark matter and a cosmological constant Λ. In this ΛCDM cosmology, the age of the visible universe is 13.77 Gyr; the Hubble constant is H0 = 70.4 km/sMpc; the total energy density of the universe is enough to make the universe flat as required by inflation; and the fractions of the energy density respectively due to baryons, dark matter, and dark energy are 4.55%, 22.8%, and 72.7%. 8.12 Problems 1. Suppose that Pn (x) and Qn (x) are two solutions of (8.22). Find an expression for their wronskian, apart from an overall constant. 2. Use the method of Secs. (6.21 & 6.28) and the solution f (r) = r` to find a second solution of the ODE (8.59). 3. For a uniformly charged circle of radius a, find the resulting scalar potential φ(r, θ) for r < a. 4. Imagine using the Gramm-Schmidt method (Sec. 1.10) to turn the functions fn (x) = xn for n = 0, 1, . . . ∞ into a set of functions zn (x) that are orthonormal on the interval [−1, 1] with unit weight function w(x) = 1. Here the inner product is Z 1 (g, h) = g(x)h(x)dx. (8.110) −1
(a) Find the first four z0 (x) . . . z3 (x). (b) Identify them.
9 Bessel Functions
9.1 Bessel Functions of the First Kind Friedrich Bessel (1784–1846) invented functions for problems with circular symmetry. The most useful ones are defined for any integer n by the series z2 z4 zn 1− + − ... Jn (z) = n 2 n! 2(2n + 2) 2 · 4(2n + 2)(2n + 4) (9.1) ∞ z n X z 2m (−1)m = 2 m! (m + n)! 2 m=0
the first term of which tells us that for small |z| 1 Jn (z) ≈
zn . 2n n!
(9.2)
The alternating signs make the waves plotted in Fig. 9.1, and we have for big |z| 1 the approximation (Courant and Hilbert, 1955, chap. VII) r 2 nπ π Jn (z) ≈ cos z − − + O(|z|−3/2 ). (9.3) πz 2 4 The Jn (z) are entire transcendental functions. They obey Bessel’s equation d2 Jn 1 dJn n2 + + 1 − 2 Jn = 0 dz 2 z dz z
(9.4)
(6.367) as one may show (problem 1) by substituting the series (9.1) into the differential equation (9.4). Their generating function is exp
hz 2
∞ i X (u − 1/u) = un Jn (z) n=−∞
(9.5)
333
9.1 Bessel Functions of the First Kind
1
0.5
0
−0.5 0
2
4
6
8
10
12
8
10
12
ρ 0.4 0.2 0 −0.2 −0.4 0
2
4
6
ρ
Figure 9.1 Top: Plots of J0 (ρ) (solid curve), J1 (ρ) (dot-dash), and J2 (ρ) (dashed) for real ρ. Bottom: Plots of J3 (ρ) (solid curve), J4 (ρ) (dot-dash), and J5 (ρ)(dashed). The points at which Bessel functions cross the ρ-axis are called zeros or roots; we use them to satisfy boundary conditions.
from which one may derive (problem 5) the series expansion (9.1) and the integral representation (5.42, problem 6) Z 1 π Jn (z) = cos(z sin θ − nθ) dθ = J−n (−z) = (−1)n J−n (z) (9.6) π 0 for all complex z. For n = 0, this integral is (problem 7) more simply 1 J0 (z) = 2π
Z
2π iz cos θ
e 0
1 dθ = 2π
Z
2π
eiz sin θ dθ.
(9.7)
0
These integrals (problem 8) imply that for n 6= 0, Jn (0) = 0, while J0 (0) = 1. By differentiating the generating function (9.5) with respect to u and identifying the coefficients of powers of u, one finds the recursion relation Jn−1 (z) + Jn+1 (z) =
2n Jn (z). z
(9.8)
334
Bessel Functions
Similar reasoning after taking the z derivative gives (problem 10) Jn−1 (z) − Jn+1 (z) = 2 Jn0 (z).
(9.9)
By using the gamma function (section 5.4), one may extend Bessel’s equation (9.4) and its solutions Jn (z) to non-integral values of n ∞ z ν X z 2m (−1)m Jν (z) = . (9.10) 2 m! Γ(m + ν + 1) 2 m=0
Letting z = a x, we arrive (problem 11) at the self-adjoint form (6.366) of Bessel’s equation d d n2 x Jn (ax) + Jn (ax) = a2 xJn (ax). − (9.11) dx dx x In the notation of equation (6.346), p(x) = x, a2 is an eigenvalue, and ρ(x) = x is a weight function. To have a self-adjoint system (section 6.26) on an interval [0, b], we need the boundary condition (6.300) b b 0 = p(Jn v 0 − Jn0 v) 0 = x(Jn v 0 − Jn0 v) 0 (9.12) for all functions v(x) in the domain D of the system. Since p(x) = x, J0 (0) = 1, and Jn (0) = 0 for integers n > 0, the terms in this boundary condition vanish at x = 0 as long as the domain consists of functions v(x) that are continuous on the interval [0, b]. To make these terms vanish at x = b, we require that Jn (ab) = 0 and that v(b) = 0. So ab must be a zero zn,m of Jn (z), that is Jn (ab) = Jn (zn,m ) = 0. With a = zn,m /b, Bessel’s equation (9.11) is 2 zn,m d d n2 − x Jn (zn,m x/b) + Jn (zn,m x/b) = 2 x Jn (zn,m x/b) . (9.13) dx dx x b 2 /b2 is different for each positive integer For fixed n, the eigenvalue a2 = zn,m m. Moreover as m → ∞, the zeros zn,m of Jn (x) rise as mπ as one might expect since the leading term of the asymptotic form (9.3) of Jn (x) is proportional to cos(x−nπ/2−π/4) which has zeros at mπ +(n+1)π/2+π/4. It follows that the eigenvalues a2 ≈ (mπ)2 /b2 increase without limit as m → ∞ in accordance with the general result of section 6.32. It follows then from the argument of section 6.33 and from relation (6.388) that for every fixed n, the infinite sequence of eigenfunctions Jn (zn,m x/b), one for each zero, are complete in the mean, orthogonal, and normalizable on the interval [0, b] with weight function ρ(x) = x Z b b2 x Jn (zn,m x/b) Jn (zn,m0 x/b) dx = δm,m0 Jn02 (zn,m ) (9.14) 2 0
9.1 Bessel Functions of the First Kind
335
and a normalization constant (problem 12) that depends upon the first derivative of the Bessel function at the zero. The analogous relation on an infinite interval is Z b 1 x Jn (kx) Jn (k 0 x) dx = δ(k − k 0 ). (9.15) k 0 One may generalize these relations (9.11–9.15) from integral n to real nonnegative ν (and to ν > −1/2). Example 9.1 (Bessel’s Drum) The top of a drum is a circular membrane with a fixed circumference. The membrane’s potential energy is approximately proportional to the extra area it has when it’s not flat. Let h(x, y) be the displacement of the membrane in the z direction normal to the x-y plane of the flat membrane, and let hx and hy denote its partial derivatives (6.20). The extra length of a line segment dx on the stretched membrane is p 1 + h2x dx, and so the extra area of an element dx dy is q 1 2 1 + h2x + h2y − 1 dx dy ≈ dA ≈ hx + h2y dx dy. (9.16) 2 The (non-relativistic) kinetic energy of the area element is proportional to the square of its speed. So if σ is the surface tension and µ the surface mass density of the membrane, then to lowest order in d the action functional is Z h i µ 2 σ 2 S[h] = ht − hx + h2y dx dy dt. (9.17) 2 2 We minimize this action for h’s that vanish on the boundary x2 + y 2 = rd2 Z 0 = δS[h] = [µ ht δht − σ (hx δhx + hy δhy )] dx dy dt. (9.18) As before (6.341), δht = (δh)t etc., and so integrating by parts, we get Z 0 = δS[h] = [ − µ htt + σ (hxx + hyy )] δh dx dy dt (9.19) apart from a surface term proportional to δh which vanishes because the boundary condition keeps δh = 0 on the circumference of the membrane. The membrane therefore obeys the wave equation µ htt = σ (hxx + hyy ) ≡ σ 4d.
(9.20)
This equation is separable, and so letting h(x, y, t) = s(t) v(x, y), we have σ 4v stt = = − ω2. s µ v
(9.21)
336
Bessel Functions
The eigenvalues of the Helmholtz equation − 4v = λ v (9.22) p give the angular frequencies as ω = σλ/µ. The time dependence then is p s(t) = a sin( σλ/µ (t − t0 )) (9.23) in which a and t0 are constants. In polar coordinates (11.261) Helmholtz’s equation is separable (6.32– 6.36) − 4v = − vrr − r−1 vr − r−2 vθθ = λv
(9.24)
so we set v(r, θ) = u(r)h(θ) and find − u00 h − r−1 u0 h − r−2 uh00 = λuh.
(9.25)
After multiplying both sides by r2 /uh, we get r2
u00 u0 h00 + r + λr2 = − = n2 . u u h
(9.26)
The general solution for h then is h(θ) = b sin(n(θ − θ0 ))
(9.27)
in which b and θ0 are constants and n must be an integer so that h is single valued on the circumference of the membrane. The function u is thus an eigenfunction of the self-adjoint differential equation (6.366) 0 n2 − r u0 + u = λru r
(9.28)
whose eigenvalues λ ≡ z 2 /rd2 are all positive. By changing variables to ρ = zr/rd and letting u(r) = Jn (ρ), we arrive at (problem 15) d2 Jn 1 dJn n2 + + 1 − 2 Jn = 0 (9.29) dρ2 ρ dρ ρ which is Bessel’s equation (6.367). The eigenvalues λ = z 2 /rd2 are determined by the boundary condition u(rd ) = Jn (z) = 0. For each integer n ≥ 0, there are an infinite number of zeroes zn,m at which the Bessel function vanishes, Jn (zn,m )p= 0. Thus 2 /r 2 and so the frequency is ω = (z λ = λn,m = zn,m σ/µ. The n,m /rd ) d
9.1 Bessel Functions of the First Kind
337
general solution to the wave equation (9.20) of the membrane then is r ∞ X ∞ X zn,m σ r (t − t0 ) sin [n(θ − θ0 )] Jn zn,m cn,m sin . h(r, θ, t) = rd µ rd n=0 m=1 (9.30) For any n, the zeros zn,m are the square-roots of the dimensionless eigenvalues (6.368) and rise like mπ as m → ∞. We learned in section 6.3 that in three dimensions Helmholtz’s equation − 4V = α2 V separates in cylindrical coordinates (and in several other coordinate systems). That is, the function V (ρ, φ, z) = B(ρ)Φ(φ)Z(z) satisfies the equation 1 1 − 4V = − (ρ V,ρ ),ρ + V,φφ + ρ V,zz = α2 V (9.31) ρ ρ if B(ρ) obeys Bessel’s equation d dB ρ ρ + (α2 + k 2 )ρ2 − n2 B = 0 dρ dρ
(9.32)
and Φ and Z respectively satisfy −
d2 Φ = n2 Φ(φ) dφ2
and
d2 Z = k 2 Z(z) dz 2
or if B(ρ) obeys the Bessel equation d dB ρ ρ + (α2 − k 2 )ρ2 − n2 B = 0 dρ dρ
(9.33)
(9.34)
and Φ and Z satisfy −
d2 Φ = n2 Φ(φ) dφ2
and
d2 Z = −k 2 Z(z). dz 2
(9.35)
In the first case (9.41 & 9.42), the solution V is p Vk,n (ρ, φ, z) = Jn ( α2 + k 2 ρ)e±inφ e±kz
(9.36)
while in the second case (9.41 & 9.42) it is p Vk,n (ρ, φ, z) = Jn ( α2 − k 2 ρ)e±inφ e±ikz .
(9.37)
In both cases, n must be an integer if the solution is to be single valued on the full range of φ from 0 to 2π.
338
Bessel Functions
When α = 0, the Helmholtz equation reduces to the Laplace equation 4V = 0 of electrostatics which the simpler functions Vk,n (ρ, φ, z) = Jn (kρ)e±inφ e±kz
and Vk,n (ρ, φ, z) = Jn (ikρ)e±inφ e±ikz (9.38) −ν satisfy. The product i Jν (ikρ) is real and is known as the modified Bessel function Iν (kρ) ≡ i−ν Jν (ikρ).
(9.39)
Modified Bessel functions occur in solutions of the diffusion equation 4V = α2 V . The function V (ρ, φ, z) = B(ρ)Φ(φ)Z(z) satisfies 1 1 4V = (ρ V,ρ ),ρ + V,φφ + ρ V,zz = α2 V (9.40) ρ ρ if B(ρ) obeys Bessel’s equation dB d ρ − (α2 − k 2 )ρ2 + n2 B = 0 ρ dρ dρ
(9.41)
and Φ and Z respectively satisfy −
d2 Φ = n2 Φ(φ) dφ2
and
d2 Z = k 2 Z(z) dz 2
or if B(ρ) obeys the Bessel equation d dB ρ ρ − (α2 + k 2 )ρ2 + n2 B = 0 dρ dρ
(9.42)
(9.43)
and Φ and Z satisfy −
d2 Φ = n2 Φ(φ) dφ2
and
d2 Z = −k 2 Z(z). dz 2
(9.44)
In the first case (9.41 & 9.42), the solution V is p Vk,n (ρ, φ, z) = In ( α2 − k 2 ρ)e±inφ e±kz
(9.45)
while in the second case (9.41 & 9.42) it is p Vk,n (ρ, φ, z) = In ( α2 + k 2 ρ)e±inφ e±ikz .
(9.46)
In both cases, n must be an integer if the solution is to be single valued on the full range of φ from 0 to 2π. Example 9.2 (Charge near a Membrane) We will use ρ to denote the density of free charges, that is, the charges that are free to move in or out of a dielectric medium—as opposed to those that are part of the medium and are
9.1 Bessel Functions of the First Kind
339
bound to it by molecular forces. The time-independent Maxwell equations are Gauss’s law ∇ · D = ρ for the divergence of the electric displacement D, and the static form ∇ × E = 0 of Faraday’s law which implies that the electric field E is the gradient of an electrostatic potential E = −∇V . Across an interface between two dielectrics with normal vector n ˆ , the tangential component n ˆ × E of the electric field is continuous n ˆ × E2 = n ˆ × E1
(9.47)
while the normal component of the electric displacement jumps by the surface density σ of free charge n ˆ · (D2 − D1 ) = σ.
(9.48)
In a linear dielectric, the electric displacement D is proportional to the electric field D = E, and the coefficient is the permittivity of the material. The membrane of a eukaryotic cell is a lipid bilayer whose area is some 3×108 nm2 , and whose thickness t is about 5 nm. On a scale of nanometers, the membrane is flat. We will take it to be a plane extending to infinity in the x and y directions. If the interface between the lipid bilayer and the extracellular salty water is at z = 0, then the cytosol extends down from z = −t = −5 nm. The permittivity ` of the lipid bilayer is about twice that of the vacuum ` ≈ 20 ; those of the extracellular water and of the cytosol roughly are w ≈ c ≈ 800 . We will compute the electrostatic potential V due to a charge q at a point (0, 0, h) on the z-axis above the membrane. This potential is cylindrically symmetric about the z-axis, so V = V (ρ, z). The functions Jn (kρ) einφ e±kz form a complete set of solutions of Laplace’s equation, but due to the symmetry, we only need the n = 0 functions J0 (kρ) e±kz . Since there are no free charges in the lipid bilayer or in the cytosol, we may express the potential in the lipid bilayer V` and in the cytosol Vc as Z ∞ h i V` (ρ, z) = dk J0 (kρ) m(k) ekz + f (k) e−kz Z0 ∞ (9.49) kz Vc (ρ, z) = dk J0 (kρ) d(k) e . 0
The Green’s function (3.148) for Poisson’s equation −4G(x) = δ (3) (x) in cylindrical coordinates is (5.163) Z ∞ 1 1 dk G(x) = = p = J0 (kρ) e−k|z| . (9.50) 4π|x| 4π 4π ρ2 + z 2 0
340
Bessel Functions
Thus we may expand the potential in the salty water as Z ∞ q −k|z−h| −kz dk J0 (kρ) e + u(k) e . Vw (ρ, z) = 4πw 0
(9.51)
Imposing the constraints (9.47 & 9.48), suppressing the dependence upon k, and setting β ≡ qe−kh /4π0 w and y = e2kt , we get the four equations m+f −u=β ` m − ` yf − c d = 0
and ` m − ` f + w u = w β and m + yf − d = 0.
(9.52)
In terms of the abbreviations w` = (w + ` ) /2 and c` = (c + ` ) /2 as well as p = (w − ` )/(w + ` ) and p0 = (c − ` )/(c + ` ), the solutions are p − p0 /y w 1 and m(k) = β 0 1 − pp /y w` 1 − pp0 /y 0 w p /y w ` 1 and d(k) = β . f (k) = − β 0 w` 1 − pp /y w` c` 1 − pp0 /y u(k) = β
(9.53)
Inserting these solutions into the Bessel expansions (9.49) for the potentials, expanding their denominators ∞
X 1 = (pp0 )n e−2nkt 1 − pp0 /y
(9.54)
0
and using the integral (9.50), we find that the potential Vw in the extracellular water of a charge q at (0, 0, h) in the water is " # ∞ X p0 1 − p2 (pp0 )n−1 q 1 p p Vw (ρ, z) = +p − 4πw r ρ2 + (z + h)2 n=1 ρ2 + (z + 2nt + h)2 (9.55) p in which r = ρ2 + (z − h)2 is the distance to the charge q. The principal image charge pq is at (0, 0, −h). Similarly, the potential V` in the lipid bilayer is " # ∞ (pp0 )n pn p0n+1 q X p V` (ρ, z) = −p 4πw` ρ2 + (z − 2nt − h)2 ρ2 + (z + 2(n + 1)t + h)2 n=0 (9.56) and that in the cytosol is ∞
Vc (ρ, z) =
q ` X (pp0 )n p . 4πw` c` ρ2 + (z − 2nt − h)2
(9.57)
n=0
These potentials are the same as those of example 4.17, but this derivation is much simpler and less error prone than the method of images.
341
9.1 Bessel Functions of the First Kind
Since p = (w − ` )/(w + ` ) > 0, the principal image charge pq at (0, 0, − h) has the same sign as the charge q and so contributes a positive term proportional to pq 2 to the energy. So a lipid membrane repels a nearby charge in water no matter what the sign of the charge. In a mean-field theory, all of the particles move under the influence of a common potential V (x), and so the force qE(x) = −q∇V (x), being proportional to the charge q, must be opposite for charges of opposite signs. Thus no mean-field theory can describe how a lipid membrane repels both cations and anions that are nearby in water. Example 9.3 (Cylindrical Wave Guides) An electromagnetic wave traveling in the z-direction down a cylindrical wave guide looks like E ei(kz−ωt)
and B ei(kz−ωt)
(9.58)
in which E and B depend upon ρ and φ ˆ + Ez zˆ and B = Bρ ρˆ + Bφ φ ˆ + Bz zˆ E = Eρ ρˆ + Eφ φ
(9.59)
in cylindrical coordinates (11.170–11.175 & 11.216). If the wave guide is an evacuated, perfectly conducting cylinder of radius r, then on the surface of the wave guide the parallel components of E and the normal component of B must vanish which leads to the boundary conditions Ez (r, φ) = 0,
Eφ (r, φ) = 0,
and Bρ (r, φ) = 0.
(9.60)
Since the E and B fields have subscripts, we will use commas to denote derivatives as in ∂Ez /∂φ ≡ Ez,φ and ∂(ρEφ )/∂ρ ≡ (ρEφ ),ρ and so forth. In ˙ 2 of the this notation, the vacuum forms ∇ × E = − B˙ and ∇ × B = E/c Faraday and Maxwell-Amp`ere laws give us (problem 14) the field equations Bz,φ /ρ − ikBφ = −iωEρ /c2
Ez,φ /ρ − ikEφ = iωBρ
ikBρ − Bz,ρ = −iωEφ /c2
ikEρ − Ez,ρ = iωBφ
(9.61) 2
[(ρEφ ),ρ − Eρ,φ ] /ρ = iωBz
[(ρBφ ),ρ − Bρ,φ ] /ρ = −iωEz /c .
Solving them for the ρ and φ components of E and B in terms of their z components (problem 15), we find kEz,ρ + ωBz,φ /ρ k 2 − ω 2 /c2 kBz,ρ − ωEz,φ /c2 ρ Bρ = − i k 2 − ω 2 /c2
kEz,φ /ρ − ωBz,ρ k 2 − ω 2 /c2 kBz,φ /ρ + ωEz,ρ /c2 Bφ = − i . k 2 − ω 2 /c2
Eρ = − i
Eφ = − i
(9.62)
The fields Ez and Bz obey the separable wave equations (11.83) ¨z /c2 = ω 2 Ez /c2 −4Ez = −E
and
¨z /c2 = ω 2 Bz /c2 . (9.63) − 4Bz = −B
342
Bessel Functions
Because their z-dependence (9.58) is periodic, they are (problem 16) linear p 2 2 combinations of Jn ( ω /c − k 2 ρ)einφ ei(kz−ωt) . Modes with Bz = 0 are transverse magnetic orpTM modes. For them the boundary conditions (9.60) will be satisfied if ω 2 /c2 − k 2 r is a zero zn,m of Jn . So the frequency ωn,m (k) of the n, m TM mode is q 2 /r 2 . ωn,m (k) = c k 2 + zn,m (9.64) Since first zero of a Bessel function is z0,1 ≈ 2.4048, the minimum frequency ω0,1 (0) = c z0,1 /r ≈ 2.4048 c/r occurs for n = 0 and k = 0. If the radius of the wave-guide is r = 1 cm, then ω0,1 (0)/2π is about 11 GHz, which is a microwave frequency with a wavelength of 2.6 cm. In terms of the frequencies (9.64), the field of a pulse moving in the +z-direction is ∞ X ∞ Z ∞ z X n,m ρ Ez (ρ, φ, z, t) = cn,m (k) Jn einφ exp i [kz − ωn,m (k)t] dk. r n=0 m=1 0 (9.65) Modes with Ez = 0 are transverse electric or TE modes.pFor them the boundary conditions (9.60) will be satisfied (problem 18) ifq ω 2 /c2 − k 2 r 0 02 /r 2 . is a zero zn,m of Jn0 . Their frequencies are ωn,m (k) = c k 2 + zn,m
0 ≈ 1.8412, the Since first zero of a first derivative of a Bessel function is z1,1 0 minimum frequency ω1,1 (0) = c z1,1 /r ≈ 1.8412 c/r occurs for n = 1 and k = 0. If the radius of the wave-guide is r = 1 cm, then ω1,1 (0)/2π is about 8.8 GHz, which is a microwave frequency with a wavelength of 3.4 cm.
Example 9.4 (Cylindrical Cavity) The modes of an evacuated, perfectly conducting cylindrical cavity of radius r and height h are like those of a cylindrical wave guide (example 9.3) but with added boundary conditions Bz (ρ, φ, 0, t) = 0
and Bz (ρ, φ, h, t) = 0
Eρ (ρ, φ, 0, t) = 0
and Eρ (ρ, φ, h, t) = 0
Eφ (ρ, φ, 0, t) = 0
and Eφ (ρ, φ, h, t) = 0 (9.66) p at the two ends of the cylinder. If ` is an integer and if ω 2 /c2 − π 2 `2 /h2 r 0 is a zero zn,m of Jn0 , then the TE fields Ez = 0 and 0 Bz = Jn (zn,m ρ/r) einφ sin(π`z/h) e−iωt
(9.67)
satisfy both these (9.66) boundary conditions at z = 0 and h and those (9.60) at ρ = r as well as the separable wave equations (9.63). The frequencies of q the resonant TE modes then are ωn,m,` = c
02 /r 2 + π 2 `2 /h2 . zn,m
343
9.2 Spherical Bessel Functions
The TM modes are Bz = 0 and Ez = Jn (zn,m ρ/r) einφ cos(π`z/h) e−iωt q 2 /r 2 + π 2 `2 /h2 . with resonant frequencies ωn,m,` = c zn,m
(9.68)
9.2 Spherical Bessel Functions Helmholtz’s equation separates in spherical coordinates (11.266 h i 1 − 4V = − 2 2 sin2 θ r2 V,r ,r + sin θ (sin θV,θ ),θ + V,φφ = k 2 V. r sin θ (9.69) The product V (r, θ, φ) = R(kr) Θ(θ) Φ(φ) is a solution if the radial function R satisfies Bessel’s equation for spherical coordinates − r2 R00 − 2rR0 + `(` + 1)R = k 2 r2 R if Θ obeys the associated Legendre equation 0 − sin θ sin θ Θ0 + m2 Θ = `(` + 1) sin2 θ Θ
(9.70)
(9.71)
and if Φ satisfies − Φ00 = m2 Φ with −` ≤ m ≤ `. The product Θ Φ is the spherical harmonic Y`m (θ, φ) of section 8.11. √ Letting x = kr and R(kr) = R(x) = B(x)/ x, we see that B must satisfy the equation 0 (` + 1/2)2 B = xB (9.72) − xB 0 + x which is Bessel’s equation (9.4) of index n = `+1/2. Thus B(x) = J`+1/2 (x) √ and R` (kr) = J`+1/2 (kr)/ kr ≡ j` (kr). The solution V = B Θ Φ to Helmholtz’s equation (9.69) is then V (r, θ, φ) = j` (kr) Y`m (θ, φ).
(9.73)
The square root in j` cancels one in J`+1/2 . With a cosmetic factor of p π/2, Rayleigh’s formula gives the spherical Bessel function r π j` (x) ≡ J (x) (9.74) 2x `+1/2 as the `th derivative of sin x/x `
j` (x) = (−1) x
`
1 d x dx
`
sin x x
(9.75)
(Lord Rayleigh (John William Strutt) 1842–1919). In particular, j0 (x) =
344
Bessel Functions
sin x/x and j1 (x) = sin x/x2 − cos x/x. Rayleigh’s formula leads to the recursion relation (problem 21) ` j` (x) − j`0 (x) (9.76) x with which one can show (problem 22) that the spherical Bessel functions as defined by Rayleigh’s formula do satisfy their differential equation (9.70) with x = kr. The spherical Bessel functions j` (kr) satisfy the self-adjoint Sturm-Liouville (6.395) equation (9.70) j`+1 (x) =
− r2 j`00 − 2rj`0 + `(` + 1)j` = k 2 r2 j`
(9.77)
with eigenvalue k 2 and weight function ρ = r2 . If j` (z`,n ) = 0, then the functions j` (kr) = j` (z`,n r/a) vanish at r = a and form an orthogonal basis Z 0
a
j` (z`,n r/a) j` (z`,m r/a) r2 dr =
a3 2 j (z`,n ) δn,m 2 `+1
(9.78)
for a self-adjoint system on the interval [0, a]. Moreover, since the eigenvalues 2 = z 2 /a2 ≈ (nπ)2 /a2 → ∞ as n → ∞, the eigenfunctions j (z r/a) k`,n ` `,n `,n also are complete in the mean. On an infinite interval, the analogous relation is Z ∞ π j` (kr) j` (k 0 r) r2 dr = 2 δ(k − k 0 ). (9.79) 2k 0 If we write the spherical Bessel function j0 (x) as the integral Z sin z 1 1 izx j0 (z) = = e dx z 2 −1
(9.80)
and use Rayleigh’s formula (9.75), we may find an integral for j` (z) Z 1 d ` sin z 1 d ` 1 1 izx ` ` ` ` j` (z) = (−1) z = (−1) z e dx z dz z z dz 2 −1 Z Z z ` 1 (1 − x2 )` izx (−i)` 1 (1 − x2 )` d` izx = e dx = e dx 2 −1 2` `! 2 2` `! dx` −1 Z Z (−i)` 1 izx d` (x2 − 1)` (−i)` 1 e dx = P` (x) eizx dx (9.81) = 2 2 dx` 2` `! −1 −1 (problem 23) that contains Rodrigues’s formula (8.67) for the Legendre polynomial P` (x). With z = kr and x = cos θ, this formula Z 1 1 i` j` (kr) = P` (cos θ)eikr cos θ d cos θ (9.82) 2 −1
345
9.2 Spherical Bessel Functions
0.5
j 1 (ρ) 0
−0.5 0
2
4
6
8
10
12
8
10
12
ρ 0.5
j 2 (ρ) 0
−0.5 0
2
4
6
ρ
Figure 9.2 Top: Plot of j1 (ρ) (solid curve) and its approximations ρ/3 for small ρ (9.85, dot-dash) and sin(ρ − π/2)/ρ for big ρ (9.87, dashed). Bottom: Plot of j2 (ρ) (solid curve) and its approximations ρ2 /15 for small ρ (9.85, dot-dash) and sin(ρ − π)/ρ for big ρ (9.87, dashed). The values of ρ at which j` (ρ) = 0 are the zeros or roots of j` ; we use them to fit boundary conditions.
and the Fourier-Legendre expansion (8.47) gives Z 1 ∞ X 2` + 1 0 ikr cos θ e = P` (cos θ) P` (cos θ0 ) eikr cos θ d cos θ0 2 −1 `=0
∞ X = (2` + 1) P` (cos θ) i` j` (kr).
(9.83)
`=0
θ0 , φ0
If θ, φ and are the polar angles of the vectors r and k, then by using the addition theorem (8.105) we get eik·r =
∞ X
4π i` j` (kr) Y`m (θ, φ) Y`m∗ (θ0 , φ0 ).
(9.84)
`=0
The series expansion (9.1) for Jn and the definition (9.74) of j` give us for
346
Bessel Functions
small |ρ| 1 the approximation j` (ρ) ≈
(ρ)` ` ! (2ρ)` = . (2` + 1)! (2` + 1)!!
(9.85)
To see how j` (ρ) behaves for large |ρ| 1, we use Rayleigh’s formula (9.75) to compute j1 (ρ) and notice that the derivative d/dρ d sin ρ cos ρ sin ρ j1 (ρ) = − = − + 2 (9.86) dρ ρ ρ ρ adds a factor of 1/ρ when it acts on 1/ρ but not when it acts on sin ρ. Thus the dominant term is the one in which all the derivatives act on the sine, and so for large |ρ| 1, we have approximately 1 d` sin ρ 1 π j` (ρ) ≈ (−1)` = sin ρ − ` (9.87) ρ dρ` ρ 2 with an error that falls off as 1/ρ2 . The quality of the approximation, which is exact for ` = 0, is illustrated for ` = 1 and 2 in Fig. 9.2. Example 9.5 (Partial Waves) Spherical Bessel functions occur in the wave-functions of free particles with well defined angular momentum. The hamiltonian H0 = p2 /2m for a free particle of mass m and the square L2 of the orbital angular momentum operator are both invariant under rotations; thus they commute with the orbital angular momentum operator L. Since the operators H0 , L2 , and Lz commute with each other, simultaneous eigenstates |k, `, mi of these compatible operators (section 1.34) exist p2 (~ k)2 |k, `, mi = |k, `, mi 2m 2m L2 |k, `, mi = ~2 `(` + 1) |k, `, mi
H0 |k, `, mi =
(9.88)
Lz |k, `, mi = ~ m |k, `, mi. Their wave-functions are products of spherical Bessel functions and spherical harmonics (8.94) r 2 hr|k, `, mi = hr, θ, φ|k, `, mi = k j` (kr) Y`m (θ, φ). (9.89) π They satisfy the normalization condition Z Z 2kk 0 ∞ 0 hk, `, m|k 0 , `0 , m0 i = j` (kr)j` (k 0 r) r2 dr Y`m∗ (θ, φ)Y`m 0 (θ, φ) dΩ π 0 = δ(k − k 0 ) δ`,`0 δm,m0 (9.90)
9.2 Spherical Bessel Functions
and the completeness relation Z ∞ X ∞ X ` 1= dk |k, `, mihk, `, m|. 0
347
(9.91)
`=0 m=−`
Their inner products with an eigenstate |k0 i of a free particle of momentum p0 = ~k0 are hk, `, m|k0 i =
i` δ(k − k 0 ) Y`m∗ (θ0 , φ0 ) k
(9.92)
in which the polar coordinates of k0 are θ0 , φ0 . Using the resolution (9.91) of the identity operator and the inner-product formulas (9.89 & 9.92), we recover the expansion (9.84) Z ∞ X 0 ∞ X ` eik ·r 0 dk = hr|k i = hr|k, `, mihk, `, m|k0 i (2π)3/2 0 `=0 m=−` ∞ r X 2 ` = i j` (kr) Y`m (θ, φ) Y`m∗ (θ0 , φ0 ). π
(9.93)
`=0
The small kr approximation (9.85) and the definition (9.89) tell us that the probability that a particle with angular momentum ~` about the origin has r = |r| 1/k is Z Z r 2k 2 r 2 2 (4` + 6)(kr)2`+3 2 2`+2 P (r) = j` (kr)r dr ≈ (kr) dr = π 0 π((2` + 1)!!)2 0 π((2` + 3)!!)2 k (9.94) which is very small for big ` and tiny k. So a short-range potential can only affect partial waves of low angular momentum. When physicists found that nuclei scattered low-energy hadrons into s-waves, they knew that the range of the nuclear force was short, about 10−15 m. If the potential V (r) that scatters a particle is of short range, then at big r the radial wave-function u` (r) of the scattered wave should look like that of a free particle (9.93) which by the big kr approximation (9.87) is i sin(kr − `π/2) 1 h i(kr−`π/2) (0) u` (r) = j` (kr) ≈ = e − e−i(kr−`π/2) . kr 2ikr (9.95) (0) Thus at big r the radial wave-function u` (r) differs from u` (r) only by a phase shift δ` i sin(kr − `π/2 + δ` ) 1 h i(kr−`π/2+δ` ) u` (r) ≈ = e − e−i(kr−`π/2+δ` ) . kr 2ikr (9.96)
348
Bessel Functions
The phase shifts determine the cross-section σ to be (Cohen-Tannoudji et al., 1977, chap. VIII) ∞ 4π X σ= 2 (2` + 1) sin2 δ` . k
(9.97)
`=0
If the potential V (r) is negligible for r > r0 , then for momenta k 1/r0 the cross-section is σ ≈ 4π sin2 δ0 /k 2 . Example 9.6 (Quantum Dots) The active region of some quantum dots is a CdSe sphere whose radius a is less than 2 nm. Photons from a laser excite electron-hole pairs which fluoresce in nanoseconds. I will model a quantum dot simply as an electron trapped in a sphere of radius a. Its wave-function ψ(r, θ, φ) satisfies Schr¨odinger’s equation ~2 4ψ = Eψ (9.98) 2m 2 /a2 , with the boundary condition ψ(a, θ, φ) = 0. With k 2 = 2mE/~2 = z`,n the unnormalized eigenfunctions are −
ψn,`,m (r, θ, φ) = j` (z`,n r/a) Y`m (θ, φ) θ(a − r)
(9.99)
in which the Heaviside function θ(a − r) makes ψ vanish for r > a, and ` and m are integers with −` ≤ m ≤ ` because ψ must be single valued for all angles θ and φ. The zeros z`,n of j` (x) fix the energy levels as En,`,m = (~z`,n /a)2 /2m. Since z0,n = nπ, the ` = 0 levels are En,0,0 = (~nπ/a)2 /2m. If the coupling to a photon is via a term like p · A, then one expects ∆` = 1. The energy gap from the n, ` = 1 state to the n = 1, ` = 0 ground state thus is ~2 . (9.100) 2ma2 Inserting factors of c2 and using ~c = 197 eV nm, and mc2 = 0.511 MeV, we find from the zero z1,2 = 7.72525 that ∆E2 = 1.89 (nm/a)2 eV, which is red light if a = 1 nm. The next zero z1,3 = 10.90412 gives ∆E3 = 4.14 (nm/a)2 eV, which is in the visible if 1.2 < a < 1.5 nm. The Mathematica command solve SphericalBesselJ[1,x]==0, 0 < x < 20 gives the first five zeros of j1 (x). 2 ∆En = En,1,0 − E1,0,0 = (z1,n − π2)
9.3 Bessel Functions of the Second Kind In section 7.5 we derived integral representations (7.56 & 7.57) for the Hankel (1) (2) functions Hλ (z) and Hλ (z) for Rez > 0. One may analytically continue
9.3 Bessel Functions of the Second Kind
them (Courant and Hilbert, 1955, chap. VII) to the upper Z 1 −iλ/2 ∞ iz cosh x−λx (1) e dx Imz ≥ 0 Hλ (z) = e πi −∞
349
(9.101)
and lower (2)
Hλ (z) = −
1 +iλ/2 e πi
Z
∞
e−iz cosh x−λx dx
Imz ≤ 0.
(9.102)
−∞
half z-planes. When both z = ρ and λ = ν are real, the two Hankel functions are complex conjugates of each other Hν(1) (ρ) = Hν(2)∗ (ρ)
(9.103)
and their real and imaginary parts are Bessel functions of the first Jν (ρ) and second Yν (ρ) kind Hν(1) (ρ) = Jν (ρ) + iYν (ρ) Hν(2) (ρ) = Jν (ρ) − iYν (ρ).
(9.104)
Bessel functions of the second kind are also called Neumann functions; the symbols Yν (ρ) = Nν (ρ) refer to the same function. They are infinite at ρ = 0 as illustrated in Fig. 9.3. When do we need to use these functions? If we are representing functions that are finite at the origin ρ = 0, then we don’t need them. But if the point ρ = 0 lies outside the region of interest or if the function we are representing is infinite at that point, then we do need the Yν (ρ)’s. Example 9.7 (Coaxial Wave Guides) An ideal coaxial wave guide is perfectly conducting for ρ < r0 and ρ > r, and the waves occupy the region r0 < ρ < r. Since points with ρ = 0 are not in the physical domain of the problem, the electric field E(ρ, φ) exp(i(kz − ωt)) is a linear combination of Bessel functions of the first and second kinds with h i p p Ez (ρ, φ) = a Jn ( ω 2 /c2 − k 2 ρ) + b Yn ( ω 2 /c2 − k 2 ρ) (9.105) in the notation of example 9.3. A similar equation represents the magnetic field Bz . The fields E and B obey the equations and boundary conditions of example 9.3 as well as Ez (r0 , φ) = 0,
Eφ (r0 , φ) = 0,
and Bρ (r0 , φ) = 0
(9.106)
at ρ = r0 . In TM modes with Bz = 0, one may show (problem 25) that the boundary conditions Ez (r0 , φ) = 0 and Ez (r, φ) = 0 can be satisfied if Jn (x) Yn (vx) − Jn (vx) Yn (x) = 0
(9.107)
350
Bessel Functions
0.5
0
−0.5
−1 0
2
4
6
8
10
12
8
10
12
ρ 0.5
0
−0.5
−1 0
2
4
6
Figure 9.3 Top: Plots of Y0 (ρ) (solid curve), Y1 (ρ) (dot-dash), and Y2 (ρ) (dashed) for real ρ. Bottom: Plots of Y3 (ρ) (solid curve), Y4 (ρ) (dot-dash), and Y5 (ρ)(dashed). The points at which Bessel functions cross the ρ-axis are called zeros or roots; we use them to satisfy boundary conditions.
in which v = r/r0 and x =
p ω 2 /c2 − k 2 r0 . One can use the Matlab code
n = 0.; v = 10.; f=@(x)besselj(n,x).*bessely(n,v*x)-besselj(n,v*x).*bessely(n,x) x=linspace(0,5,1000); figure plot(x,f(x)) % we use the figure to guess at the roots set (gca,’xgrid’,’on’,’ygrid’,’on’) options=optimset(’tolx’,1e-9); fzero(f,0.3) % we tell fzero to look near 0.3 fzero(f,0.7) fzero(f,1) to find that for n = 0 and v = 10, the first three solutions are x0,1 = 0.3314, x0,2 = 0.6858, and x0,3 = 1.0377. Setting n = 1 and adjusting the guesses
9.4 Spherical Bessel Functions of the Second Kind
351
in the code, one finds x1,1 = 0.3941, x1,2 = 0.7331, q and x1,3 = 1.0748. The corresponding dispersion relations are ωn,i (k) = c k 2 + x2n,i /r02 . 9.4 Spherical Bessel Functions of the Second Kind Spherical Bessel functions of the second kind are defined as r π y` (ρ) = Y (ρ) 2ρ `+1/2 and Rayleigh formulas express them as d ` cos ρ `+1 ` . y` (ρ) = (−1) ρ ρ dρ ρ
(9.108)
(9.109)
The term in which all the derivatives act on the cosine dominates at big ρ y` (ρ) ≈ (−1)`+1
1 d` cos ρ = − cos (ρ − `π/2) /ρ. ρ dρ`
(9.110)
The second kind of spherical Bessel functions at small ρ are approximately y` (ρ) ≈ − (2` − 1)!!/ρ`+1 .
(9.111)
They all are infinite at x = 0 as illustrated in Fig. 9.4. Example 9.8 (Scattering off a Hard Sphere) In the notation of example 9.5, the potential of a hard sphere of radius r0 is V (r) = ∞ θ(r0 − r) in which θ(x) = (x + |x|)/2|x| is Heaviside’s function. Since the point r = 0 is not in the physical region, the scattered wave function is a linear combination of spherical Bessel functions of the first and second kinds u` (r) = c` j` (kr) + d` y` (kr).
(9.112)
The boundary condition u` (kr0 ) = 0 fixes the ratio v` = d` /c` of the constants c` and d` . Thus for ` = 0, Rayleigh’s formulas (9.75 & 9.109) and the boundary condition say that kr0 u0 (r0 ) = c0 sin(kr0 )−d0 cos(kr0 ) = 0 or d0 /c0 = tan kr0 . The s-wave then is u0 (kr) = c0 sin(kr − kr0 )/(kr cos kr0 ), which tells us that the phase shift is δ0 (k) = − kr0 . By (9.97), the crosssection at low energy is σ ≈ 4πr02 , or four times the classical value. Similarly, one finds (problem 26) that the p-wave phase shift is δ1 (k) =
kr0 cos kr0 − sin kr0 . cos kr0 + kr0 sin kr0
(9.113)
For kr0 1, we have δ1 (k) ≈ −(kr0 )3 /6; more generally the `th phase shift δ` (k) ≈ (kr0 )2`+1 for a potential of range r0 at low energy k 1/r0 .
352
Bessel Functions
0.2 0 −0.2
y 1 (ρ) −0.4 0
2
4
6
8
10
12
8
10
12
ρ 0.2 0 −0.2
y 2 (ρ) −0.4 0
2
4
6
ρ
Figure 9.4 Top: Plot of y1 (ρ) (solid curve) and its approximations −1/ρ2 for small ρ (9.111, dot-dash) and − cos(ρ−π/2)/ρ for big ρ (9.110, dashed). Bottom: Plot of y2 (ρ) (solid curve) and its approximations −3/ρ3 for small ρ (9.111, dot-dash) and − cos(ρ−π)/ρ for big ρ (9.110, dashed). The values of ρ at which y` (ρ) = 0 are the zeros or roots of y` ; we use them to fit boundary conditions.
9.5 Problems 1. Show that the series (9.1) for Jn (ρ) satisfies Bessel’s equation (9.4). 2. Show that the generating function exp(z(u − 1/u)/2) for the Bessel functions is invariant under the substitution u → −1/u. 3. Use the invariance of exp(z(u − 1/u)/2) under u → −1/u to show that J−n (z) = (−1)n Jn (z). 4. By writing the generating function (9.5) as the product of the exponentials exp(zu/2) and exp(−z/2u), derive the expansion exp
hz 2
u − u−1
i
=
∞ X ∞ X z m+n um+n z n u−n − . (9.114) 2 (m + n)! 2 n! m=−n n=0
5. From this expansion (9.114) of the generating function (9.5), derive the power-series expansion (9.1) for Jn (z).
353
9.5 Problems
6. In the formula (9.5) for the generating function exp(z(u−1/u)/2), replace u by exp iθ and then derive the integral representation (9.6) for Jn (z). 7. From the general integral representation (9.6) for Jn (z), derive the two integral formulas (9.7) for J0 (z). 8. Show that the integral representations (9.6 & 9.7) imply that for any integer n 6= 0, Jn (0) = 0, while J0 (0) = 1. 9. By differentiating the generating function (9.5) with respect to u and identifying the coefficients of powers of u, derive the recursion relation Jn−1 (z) + Jn+1 (z) =
2n Jn (z). z
(9.115)
10. By differentiating the generating function (9.5) with respect to z and identifying the coefficients of powers of u, derive the recursion relation Jn−1 (z) − Jn+1 (z) = 2 Jn0 (z).
(9.116)
11. Show that the change of variables ρ = ax turns (9.4) into the self-adjoint form of Bessel’s equation (9.11). 12. Letting y = Jn (ax), equation (9.11) is (xy 0 )0 +(xa2 −n2 /x)y = 0. Multiply this equation by xy 0 , integrate from 0 to b, and so show that if ab = zn,m and Jn (zn,m ) = 0, then Z b 2 x Jn2 (ax) dx = b2 Jn02 (zn,m ) (9.117) 0
13. 14.
15. 16. 17. 18. 19.
which is the normalization condition (9.14). Show that with λ ≡ z 2 /rd2 , the change of variables ρ = zr/rd and u(r) = Jn (ρ) turns (9.28) into (9.29). Use the formula (11.216) for the curl in cylindrical coordinates and the ˙ 2 of the laws of Faraday vacuum forms ∇ × E = − B˙ and ∇ × B = E/c and Maxwell-Amp`ere to derive the field equations (9.61). Derive equations (9.62) from (9.61). p Show that Jn ( ω 2 /c2 − k 2 ρ)einφ ei(kz−ωt) is a traveling-wave solution (9.58) of the wave equations (9.63). Find expressions for the non-zero TM fields in terms of the formula (9.65) for Ez . p Show that the field Bz = Jn ( pω 2 /c2 − k 2 ρ)einφ ei(kz−ωt) will satisfy the 0 boundary conditions (9.60) if ω 2 /c2 − k 2 r is a zero zn,m of Jn0 . p 0 Show that if ` is an integer and if ω 2 /c2 − π 2 `2 /h2 r is a zero zn,m of 0 0 inφ −iωt Jn , then the fields Ez = 0 and Bz = Jn (zn,m ρ/r) e sin(`πz/h) e satisfy both the boundary conditions (9.60) at ρ = r and those (9.66) at
354
Bessel Functions
z = 0 and h as well as the wave equations (9.63). Hint: Use Maxwell’s ˙ 2 as in (9.61). equations ∇ × E = − B˙ and ∇ × B = E/c 20. Show that the resonant frequencies of the TM modes of the cavity of q 2 2 example 9.4 are ωn,m,` = c zn,m /r + π 2 `2 /h2 . 21. Show that Rayleigh’s formula (9.75) implies the recursion relation (9.76). 22. Use the recursion relation (9.76) to show by induction that the spherical Bessel functions j` (x) as given by Rayleigh’s formula (9.75) satisfy their differential equation (9.70), which with x = kr is − x2 j`00 − 2xj`0 + `(` + 1)j` = x2 j` .
(9.118)
Hint: start by showing that j0 (x) = sin(x)/x satisfies (9.70). This problem involves some tedium. 23. Iterate the trick Z 1 Z Z 1 i 1 izx i d eizx dx = xe dx = eizx d(x2 − 1) zdz −1 z −1 2z −1 (9.119) Z 1 Z i 1 1 2 2 izx izx =− (x − 1)de = (x − 1)e dx 2z −1 2 −1 to show that (Schwinger et al., 1998, p. 227) Z Z 1 2 d ` 1 izx (x − 1)` izx e dx = e dx. zdz 2` `! −1 −1
(9.120)
24. Show that (−1)` d` sin ρ/dρ` = sin(ρ − π`/2) and so complete the derivation of the approximation (9.87) for j` (ρ) for big ρ. 25. In the context of examples 9.3 and 9.7, show that the boundary conditions Ez (r0 , φ) = 0 and Ez (r, φ) = 0 imply (9.107). 26. Show that for scattering off a hard sphere of radius r0 as in example 9.8, the p-wave phase shift is given by (9.113).
10 Group Theory
Why study group theory? One reason is that all of the fundamental interactions of physics arise from requirements of symmetry under groups of transformations. Invariance of the action under the phase transformations of the group of unimodular complex numbers, exp(iθ(x)), in which the phase θ(x) depends upon the space-time point x, leads to electromagnetism. Invariance under the group of three-dimensional unitary matrices g(x) of unit determinant leads to chromodynamics. Invariance under the group of general coordinate transformations leads to gravity. Howard Georgi has written the best book (Georgi, 1999) ever on group theory for physicists; much of this chapter is drawn from his excellent text. 10.1 What Is a Group? A group G is a set of objects f , g, h, . . . and an operation called multiplication such that: 1. 2. 3. 4.
If f ∈ G and g ∈ G, the product f g ∈ G (closure). If f , g, & h are in G, then f (gh) = (f g)h (associativity). There is an identity e ∈ G such that if g ∈ G, then ge = eg = g. Every g ∈ G has an inverse g −1 ∈ G such that gg −1 = g −1 g = e.
Physical transformations naturally form groups. The product T 0 T represents the successive transformations: first T , then T 0 . The identity element e is the null transformation, the one that does nothing. The inverse T −1 is the transformation that reverses the effect of T . Such a set {T } of transformations will form a group if any two successive transformations is a transformation in the set (closure). If membership in the set depends upon whether a transformation leaves something invariant, then multiplication
356
Group Theory
will be closed. For if both T and T 0 leave that thing unchanged, then so will their product T 0 T . Example: The set of all transformations that leave invariant the distance from the origin of every point in n-dimensional space is the group O(n) of rotations and reflections. Rotations in Rn form the group SO(n). Example: The set of all transformations that leave invariant the spatial difference x − y between every two points x and y in n-dimensional space is the group of translations. In this case, group multiplication is vector addition. Example: The set of all linear transformations that leave invariant the square of the Minkowski distance x21 + x22 + x23 − x20 between any 4-vector x and the origin is the Lorentz group (Hermann Minkowski, 1864–1909; Hendrik Lorentz, 1853–1928). Example: The set of all linear transformations that leave invariant the square of the Minkowski distance (x1 −y1 )2 +(x2 −y2 )2 +(x3 −y3 )2 −(x0 −y 0 )2 between any two 4-vectors x and y is the Poincar´ e group, which includes Lorentz transformations and translations (Henri Poincar´e, 1854–1912). Except for the group of translations, the order of the physical transformations in these examples matters: the transformation T 0 T usually is not the same as T T 0 . Such groups are called nonabelian. A group whose elements all commute [T 0 , T ] ≡ T 0 T − T T 0 = 0
(10.1)
is said to be abelian (Niels Abel, 1802–1829). Matrices also naturally form groups. Since matrix multiplication is associative, any set {D} of n × n non-singular matrices that includes the inverse D−1 of every matrix in the set as well as the identity matrix I automatically satisfies properties 2–4 with group multiplication defined as matrix multiplication. The only tricky part is property 1, closure under multiplication. A set {D} of matrices will form a group as long as the product of any two matrices is in the set. As with physical transformations, one way to ensure closure is to have every matrix leave some property invariant or unchanged. Example: The set of all n×n real matrices that leave the quadratic form x21 + x22 + · · · + x2n unchanged forms the group O(n) of all n × n orthogonal matrices (problems 1 & 2). The set of all n × n orthogonal matrices that have unit determinant forms the group SO(n). Example: The set of all n × n complex matrices that leave invariant the quadratic form x∗1 x1 + x∗2 x2 + · · · + x∗n xn forms the group U (n) of all n × n
10.2 Representations of Groups
357
unitary matrices (problems 3 & 4). The set of all n × n unitary matrices that have unit determinant forms the group SU (n) (problem 5). The groups SO(3) and SU (2) represent rotations. The group SU (3) is the symmetry group of quantum chromodynamics. Physicists have used the groups SU (5) and SO(10) to unify the electro-weak and strong interactions; whether Nature does is unclear. The number of elements in a group is the order of the group. A finite group is a group with a finite number of elements, or equivalently a group of finite order. The parity group whose elements are 1 and −1 under‘ ordinary multiplication is the finite group Z2 , which is abelian and of order 2. A group whose elements g = g({α}) depend continuously upon a set of parameters αa is a continuous group or a Lie group. Continuous groups are of infinite order. Example: The set of all real n×n matrices forms a group called GL(n, R); the subset with unit determinant forms the group SL(n, R). The corresponding groups of matrices with complex entries are GL(n, C) and SL(n, C). The group SL(2, C) is used to represent Lorentz transformations. These groups are continuous (Lie) groups of infinite order as are those of the examples—the translations and rotations, the Lorentz group, the Poincar´e group, O(n), SO(n), U (n), and SU (n).
10.2 Representations of Groups Often one may associate with every element g ∈ G a square, finite-dimensional matrix D(g) such that group multiplication is faithfully copied by matrix multiplication. That is, D(f ) D(g) = D(f g).
(10.2)
The set of matrices D(g) is said to form a representation of the group G. If the matrices of the representation are n × n, then n is the dimension of the representation. Equivalently, the dimension of a representation is the dimension of the vector space on which the matrices act. If the matrices D(g) are unitary, D† (g) = D−1 (g), then they form a unitary representation of the group G. Every n × n non-singular matrix S (i.e., det S 6= 0, aka, invertible) maps any n × n representation D(g) of a group G into an equivalent representation D0 (g) through the similarity transformation D0 (g) = S −1 D(g)S
(10.3)
358
Group Theory
with the same law of multiplication D0 (f ) D0 (g) = S −1 D(f )S S −1 D(g)S = S −1 D(f ) D(g)S = S −1 D(f g)S = D0 (f g).
(10.4)
A proper subspace W of a vector space V is a subspace of lower (but not zero) dimension. A proper subspace W is invariant under the action of a representation D(g) if D(g) maps every vector v ∈ W to a vector D(g)v = v 0 ∈ W . A representation that has a proper invariant subspace is reducible. A representation that is not reducible is irreducible. There is no need to keep track of several equivalent irreducible representations D(a) , D0(a) , D00(a) of any group. So in what follows, we shall choose one of these equivalent irreducible representations D(a) and use it exclusively. A representation is completely reducible if it is equivalent to a representation whose matrices are in block-diagonal form D1 (g) 0 ... 0 D2 (g) . . . (10.5) .. .. .. . . . with each representation Di (g) irreducible. A representation in block-diagonal form is said to be a direct sum of the irreducible representations Di D1 ⊕ D2 ⊕ . . . .
(10.6)
10.3 Representations Acting in Hilbert Space A symmetry transformation g is a map (1.261) of states g : ψ → ψ 0 that preserves their inner products |hφ0 |ψ 0 i|2 = |hφ|ψi|2
(10.7)
and so their predicted probabilities. The action of a group G of symmetry transformations g on the Hilbert space of a quantum theory can be represented either by operators U (g) that are linear and unitary (the usual case) or by ones K(g) that are anti-linear (1.259) and anti-unitary (1.260), as in the case of time reversal. Wigner proved this theorem in the 1930’s, and Weinberg improved it in his 1995 classic (Weinberg, 1995, p. 51) (Eugene Wigner, 1902–1995; Steven Weinberg, 1933–). Example: Suppose, for example, that the hamiltonian H, the square of the angular momentum J 2 , and its 3d component Jz form a complete set of compatible observables, so that the identity operator can be expressed as a
359
10.4 Subgroups
sum over the eigenvectors of these operators X I= |E, j, mihE, j, m|.
(10.8)
E,j,m
Then the matrix element of the unitary operator U (g) between the states |ψi and |φi is X X hφ|U (g)|ψi = hφ| |E 0 , j 0 , m0 ihE 0 , j 0 , m0 |U (g) |E, j, mihE, j, m|ψi. E 0 ,j 0 ,m0
E,j,m
(10.9) Suppose that both H and J 2 are invariant under the action of U (g), that is, that U † (g)HU (g) = H and U † (g)J 2 U (g) = J 2 . Then, HU (g) = U (g)H, and so if H|E, j, mi = E|E, j, mi, then HU (g)|E, j, mi = U (g)H|E, j, mi = EU (g)|E, j, mi
(10.10)
and similarly for J 2 . Thus U (g) cannot change E or j, and so (j)
hE 0 , j 0 , m0 |U (g)|E, j, mi = δE 0 E δj 0 j hm0 |U (g)|mi = δE 0 E δj 0 j Dm0 m (g). (10.11) The matrix element (10.9) then is more simply a sum over the energy E and (j) over the irreducible representations Dm0 m (g) of the group SU (2) X (j) hφ|U (g)|ψi = hφ|E, j, m0 iDm0 m (g)hE, j, m|ψi. (10.12) E,j,m0 ,m
This is how the block-diagonal form (10.5) often appears in calculations. (j) The matrices Dm0 m (g) inherit the unitarity of the operator U (g). Only compact groups can be represented by unitary matrices that are of finite dimension. (A group G is compact if its space of parameters is closed and bounded. A set is closed if the limit of every convergent sequence of its points lies in the set. For example, the interval [a, b] ≡ {x|a ≤ x ≤ b} is closed, but (a, b) ≡ {x|a < x < b} is open.) The group of rotations is compact, but the group of translations and the Lorentz group are not. 10.4 Subgroups If all the elements of a group S are also elements of a group G, then S is a subgroup of G. Every group G has two trivial subgroups—the identity element e and the whole group G itself. Many groups have more interesting nontrivial subgroups. For example, the rotations about a fixed axis is an abelian subgroup of the group of all rotations in 3-dimensional space. A subgroup S ⊂ G is an invariant subgroup if every element f of the
360
Group Theory
subgroup S is left inside the subgroup under the action of every element g of the whole group G, that is, if g −1 f g = f 0 ∈ S.
(10.13)
This condition often is written as Sg = gS
∀g ∈ G.
(10.14)
Invariant subgroups are also called normal subgroups. A set C ⊂ G is called a conjugacy class if it’s invariant under the action of the whole group G, that is, if Cg = gC or g −1 Cg = C
∀g ∈ G.
(10.15)
A subgroup that is the union of a set of conjugacy classes is invariant (aka, normal). The center C of a group G is the set of all elements c ∈ G that commute with every element g of the group, that is, their commutators [c, g] ≡ cg − gc = 0
(10.16)
vanishes for all g ∈ G. Example: Does the center C always form an abelian subgroup of its group G? The product g1 g2 of any two elements g1 and g2 of the center must commute with every element g of G since g1 g2 g = g1 gg2 = gg1 g2 . So the center is closed under multiplication. The elements of the center also must commute with themselves. The identity element e commutes with every g ∈ G, so e ∈ C. If g 0 ∈ C, then g 0 g = gg 0 for all g ∈ G, and so multiplication of this equation from the left and the right by g 0−1 gives gg 0−1 = g 0−1 g, which shows that g 0−1 ∈ G. So the center of any group always is one of its abelian invariant subgroups. The center may be trivial, however, consisting either of the identity or of the whole group. But a group with a non-trivial center can not be simple or semi-simple.
10.5 Cosets If H is a subgroup of a group G, then for every element g ∈ G the set of elements Hg ≡ {hg|h ∈ H, g ∈ G} is a right coset of the subgroup H ⊂ G. (Here ⊂ means is a subset of or equivalently is contained in.) If H is a subgroup of a group G, then for every element g ∈ G the set of elements gH is a left coset of the subgroup H ⊂ G.
10.6 Morphisms
361
The number of elements in a coset is the same as the number of elements of H, which is the order of H. An element g of a group G is in one and only one right coset (and in one and only one left coset) of the subgroup H ⊂ G. Suppose instead that g were in two right cosets g ∈ Hg1 and g ∈ Hg2 , so that g = h1 g1 = h2 g2 for suitable h1 , h2 ∈ H and g1 , g2 ∈ G. Then since H is a (sub)group, we have g2 = h−1 2 h1 g1 = h3 g1 , which says that g2 ∈ Hg1 . But this means that every element of hg2 ∈ Hg2 is of the form hg2 = hh3 g1 = h4 g1 ∈ Hg1 . So every element of hg2 ∈ Hg2 is in Hg1 : the two right cosets are identical, Hg1 = Hg2 . The right (or left) cosets are the points of the quotient coset space G/H. If the subgroup H is an invariant subgroup of G, then by definition (10.14) Hg = gH ∀g ∈ G, and so the left cosets are the same sets as the right cosets. In this case, the coset space G/H is itself a group with multiplication defined by (Hg1 ) (Hg2 ) = {hi g1 hj g2 |hi , hj ∈ H} = hi g1 hj g1−1 g1 g2 |hi , hj ∈ H = {hi hk g1 g2 |hi , hk ∈ H} = {h` g1 g2 |h` ∈ H} = Hg1 g2
(10.17)
which is the multiplication rule of the group G. This group G/H is called the factor group of G by H.
10.6 Morphisms An isomorphism is a one-to-one map between groups that respects their multiplication laws. For example, the relation between two equivalent representations D0 (g) = S −1 D(g)S
(10.18)
is an isomorphism (problem 7). An automorphism is an isomorphism between a group and itself. The map gi → g gi g −1 is one to one because g g1 g −1 = g g2 g −1 implies that g g1 = g g2 , and so that g1 = g2 . This map also preserves the law of multiplication since g g1 g −1 g g2 g −1 = g g1 g2 g −1 . So the map G → gGg −1
(10.19)
362
Group Theory
is an automorphism. It is called an inner automorphism because g is an element of G. An automorphism not of this form (10.19) is called an outer automorphism. 10.7 Schur’s Lemma Part 1: If D1 (g)A = AD2 (g) for all g ∈ G, and if D1 & D2 are inequivalent irreducible representations, then A = 0. Proof: First suppose that A annihilates some vector |xi, that is, A|xi = 0. Let P be the projection operator P into the subspace that A annihilates, which is of at least one dimension. This subspace, incidentally, is called the null space N (A) or the kernel of the matrix A. The representation D2 must leave this null space N (A) invariant since AD2 (g)P = D1 (g)AP = 0.
(10.20)
If N (A) were a proper subspace, then the representation D2 would be reducible, which is contrary to our assumption that D1 and D2 are irreducible. So the null space N (A) must be the whole space upon which A acts, that is, A = 0. A similar argument shows that if hy|A = 0 for some bra hy|, then A = 0. So either A is zero or it annihilates no ket and no bra. In the latter case, A must be square and invertible, which would imply that D2 (g) = A−1 D1 (g)A, that is, that D1 and D2 are equivalent representations, which is contrary to our assumption that they are inequivalent. The only way out is that A vanishes. Part 2: If for a finite-dimensional, irreducible representation D(g) of a group G, we have D(g)A = AD(g) for all g ∈ G, then A = cI. That is, any matrix that commutes with every element of a finite-dimensional, irreducible representation must be a multiple of the identity matrix. Proof: Every square matrix A has at least one eigenvector |xi and eigenvalue c so that A|xi = c|xi because its characteristic equation det(A−cI) = 0 always has at least one root by the fundamental theorem of algebra (5.96). So the null space N (A − cI) has dimension greater than zero. Now D(g)A = AD(g) for all g ∈ G implies that D(g)(A − cI) = (A − cI)D(g) for all g ∈ G. Let P be the projection operator onto the null space N (A − cI). Then we have (A − cI)D(g)P = D(g)(A − cI)P = 0 for all g ∈ G which implies that D(g)P maps vectors into the null space N (A − cI). This null space is therefore invariant under D(g), which means that D is reducible unless the null space N (A − cI) is the whole space. Since by assumption D is irreducible, it follows that N (A − cI) is the whole space, that is, that A = 0.
363
10.8 Characters
Example and Application: Suppose an arbitrary observable O is invariant under the action of the rotation group SU (2) represented by unitary operators U (g) for g ∈ SU (2) U † (g)OU (g) = O
or [O, U (g)] = 0.
(10.21)
These unitary rotation operators commute with the square J 2 of the angular momentum [J 2 , U ] = 0. Suppose that they also leave the hamiltonian H unchanged [H, U ] = 0. Then as shown in Sec. 10.3, the state U |E, j, mi is a sum of states all with the same values of j and E. It follows that X hE, j, m|O|E 0 , j 0 , m0 ihE 0 , j 0 , m0 |U (g)|E 0 , j 0 , m00 i = m0
X
hE, j, m|U (g)|E, j, m0 ihE, j, m0 |O|E 0 , j 0 , m00 i
(10.22)
m0
or more simply in view of (10.11) X X 0 hE, j, m|O|E 0 , j 0 , m0 iDj (g)m0 m00 = D(j) (g)mm0 hE, j, m0 |O|E 0 , j 0 , m00 i. m0
m0
(10.23) hE, j, m|O|E 0 , j 0 , m0 i
Now Part 1 of Schur’s lemma tells us that the matrix must vanish unless the representations are equivalent, which is to say unless j = j 0 . So we have X X hE, j, m|O|E 0 , j, m0 iDj (g)m0 m00 = D(j) (g)mm0 hE, j, m0 |O|E 0 , j, m00 i. m0
m0
(10.24) hE, j, m|O|E 0 , j, m0 i
Now Part 2 of Schur’s lemma tells us that the matrix must be a multiple of the identity. Thus the symmetry of O under rotations simplifies the matrix element to hE, j, m|O|E 0 , j 0 , m0 i = δjj 0 δmm0 Oj (E, E 0 ).
(10.25)
This result is a special case of the Wigner-Eckart theorem (Eugene Wigner, 1902–1995, and Carl Eckart, 1902–1973).
10.8 Characters Suppose the n × n matrices Dij (g) form a representation of a group G 3 g. The character χD (g) of the matrix D(g) is the trace χD (g) = TrD(g) =
n X i=1
Dii (g).
(10.26)
364
Group Theory
Traces are cyclic, that is, TrABC = TrBCA = TrCAB. So if two representations D and D0 are equivalent, so that D0 (g) = S −1 D(g)S, then they have the same characters because χD0 (g) = TrD0 (g) = Tr S −1 D(g)S = Tr D(g)SS −1 = TrD(g) = χD (g). (10.27) If two group elements g1 and g2 are in the same conjugacy class, that is, if g2 = gg1 g −1 for some g ∈ G, then they have the same character in a given representation D(g) because χD (g2 ) = TrD(g2 ) = TrD(gg1 g −1 ) = Tr D(g)D(g1 )D(g −1 ) = Tr D(g1 )D−1 (g)D(g) = TrD(g1 ) = χD (g1 ). (10.28) 10.9 Tensor Products Suppose D1 (g) is a k × k-dimensional representation of a group G, and D2 (g) is an n × n-dimensional representation of the same group. Suppose the vectors |`i for ` = 1 . . . k are the basis vectors of the k-dimensional space Vk on which D1 (g) acts, and that the vectors |mi for m = 1 . . . n are the basis vectors of the n-dimensional space Vn on which D2 (g) acts. The k × n vectors |`, mi are basis vectors for the kn-dimensional tensor-product space Vkn . The matrices DD1 ⊗D2 (g) defined as h`0 , m0 |DD1 ⊗D2 (g)|`, mi = h`0 |D1 (g)|`ihm0 |D2 (g)|mi
(10.29)
act in this kn-dimensional space Vkn and form a representation of the group G; this tensor-product representation usually is reducible. Georgi’s book (Georgi, 1999, p. 309) explains the tricks one may use to decompose reducible tensorproduct representations into direct sums of irreducible representations. Example: The addition of angular momenta illustrates both the tensor product and its reduction to a direct sum of irreducible representations. Here Dj1 (g) and Dj2 (g) respectively are the angular momentum j1 , (2j1 + 1) × (2j1 + 1) and the angular momentum j2 , (2j2 + 1) × (2j2 + 1) representations of the rotation group SU (2). The tensor-product representation DDj1 ⊗Dj2 defined as hm01 , m02 |DDj1 ⊗Dj2 |m1 , m2 i = hm01 |Dj1 (g)|m1 ihm02 |Dj2 (g)|m2 i
(10.30)
is reducible into a direct sum of all the irreducible representations of SU (2) from Dj1 +j2 (g) down to D|j1 −j2 | (g) in integer steps: DDj1 ⊗Dj2 = Dj1 +j2 ⊕ Dj1 +j2 −1 ⊕ · · · ⊕ D|j1 −j2 |+1 ⊕ D|j1 −j2 | each irreducible representation occurring once in the direct sum.
(10.31)
10.10 Finite Groups
365
Example: When one adds two angular momenta of angular momentum (or spin) one-half, the tensor-product matrix DD1/2 ⊗D1/2 is equivalent to the direct sum D1 ⊕ D0 0 −1 D1 (θ) DD1/2 ⊗D1/2 (θ) = S S (10.32) 0 D0 (θ) in which the matrices S, D1 , and D0 respectively are 4 × 4, 3 × 3, and 1 × 1. 10.10 Finite Groups A finite group is one that has a finite number of elements. The number of elements in a group is the order of the group. Example: The group Z2 consists of two elements e and p with multiplication rules ee = e ep = p pe = p pp = e.
(10.33)
Clearly, Z2 is abelian, and its order is 2. The identification e → 1 and p → −1 gives a 1-dimensional representation of the group Z2 in terms of 1 × 1 matrices, i.e., numbers. It is tedious to write the multiplication rules as individual equations. Normally people compress them into a multiplication table like this: ×
e
p
e p
e p
p e
(10.34)
A simple generalization of Z2 is the group Zn whose elements may be represented as exp(i2πm/n) for m = 1, . . . , n. This group is also abelian, and its order is n. Example: The multiplication table for Z3 is ×
e
a
b
e a b
e a b
a b e
b e a
(10.35)
366
Group Theory
10.11 The Regular Representation For any finite group G we can associate an orthonormal vector |gi i with each element gi of the group. So hgi |gj i = δij . These orthonormal vectors |gi i form a basis for a vector space whose dimension is the order of the group. The matrix D(gk ) of the regular representation of G is defined to map any vector |gi i into the vector |gk gi i associated with the product gk gi D(gk )|gi i = |gk gi i.
(10.36)
Since group multiplication is associative, we have D(gj )D(gk )|gi i = D(gj )|gk gi i = |gj (gk gi )i = |(gj gk )gi )i = D(gj gk )|gi i. (10.37) Because the vector |gi i was an arbitrary basis vector, it follows that D(gj )D(gk ) = D(gj gk )
(10.38)
which means that the matrices D(g) satisfy the closure criterion (10.2) for their being a representation of the group G. The matrix D(g) has entries [D(g)]ij = hgi |D(g)|gj i.
(10.39)
The sum of dyadics |g` ihg` | over all the elements g` of a finite group G is the unit matrix X |g` ihg` | = Io (10.40) g` ∈G
in which o is the order of G, that is, the number of elements in G. So by taking the mn matrix element of Eq.( 10.38), we find [D(gj gk )]mn = hgm |D(gj gk )|gn i = hgm |D(gj )D(gk )|gn i (10.41) X X = hgm |D(gj )|g` ihg` |D(gk )|gn i = [D(gj )]m` [D(gk )]`n . g` ∈G
g` ∈G
Example: The regular representation of Z3 is 1 0 0 0 0 1 0 1 0 D(e) = 0 1 0 , D(a) = 1 0 0 , D(b) = 0 0 1 . 0 0 1 0 1 0 1 0 0 (10.42)
367
10.12 Properties of Finite Groups
10.12 Properties of Finite Groups In his book (Georgi, 1999, ch. 1), Georgi proves the following theorems: 1. Every representation of a finite group is equivalent to a unitary representation. 2. Every representation of a finite group is completely reducible. 3. The irreducible representations of a finite abelian group are one dimensional. 4. If D(a) (g) and D(b) (g) are two unitary irreducible representations of dimensions na and nb of a group G of N elements g1 , . . . , gN , then the functions r na (a) D (g) (10.43) N jk are orthonormal and complete in the sense that N na X (a)∗ (b) Dik (gj )D`m (gj ) = δab δi` δkm . N
(10.44)
j=1
5. The order N of a finite group is the sum of the squares of the dimensions of its inequivalent irreducible representations X N= n2a . (10.45) a
Example: The abelian cyclic group ZN with elements gj = e2πij/N
(10.46)
has N one-dimensional irreducible representations D(a) (gj ) = e2πiaj/N
(10.47)
for a = 1, 2, . . . , N . Their orthonormality relation (10.44) is the Fourier formula N 1 X −2πiaj/N 2πibj/N e e = δab . N
(10.48)
j=1
The na are all unity, there are N of them, and the sum of the n2a is N as required by the sum rule (10.45).
368
Group Theory
10.13 Permutations The permutation group on n objects is called Sn . Permutations are made of cycles that change the order of some of the n objects. For instance, the permutation (1 2) is a 2-cycle that means x1 → x2 → x1 ; the unitary operator U ((1 2)) that represents it interchanges states like this: U ((1 2))|+, −i = U ((1 2))|+, 1i |−, 2i = |−, 1i, |+, 2i = |−, +i.
(10.49)
The 2-cycle (34) means x3 → x4 → x3 , it changes (a, b, c, d) into (a, b, d, c). The 3-cycle (1 2 3) means x1 → x2 → x3 → x1 , it changes (a, b, c, d) into (b, c, a, d). The 4-cycle (1 3 2 4) means x1 → x3 → x2 → x4 → x1 and changes (a, b, c, d) into (c, d, b, a). The 1-cycle (2) means x2 → x2 and leaves everything unchanged. The identity element of Sn is the product of 1-cycles e = (1)(2) . . . (n). The inverse of the cycle (1 3 2 4) must invert x1 → x3 → x2 → x4 → x1 , so it must be (1 4 2 3) which means x1 → x4 → x2 → x3 → x1 so that it changes (c, d, b, a) back into (a, b, c, d). Every element of Sn has each integer from 1 to n in one and only one cycle. So an arbitrary element of Sn with `k k-cycles must satisfy n X k `k = n. (10.50) k=1
10.14 Compact and Noncompact Lie Groups Imagine rotating an object repeatedly. Notice that the biggest rotation is by an angle π about some axis and the possible angles form a circle since the rotations are periodic. The rotations form a compact group. The parameter space of a compact group is compact—closed and bounded. Now consider the translations. Imagine moving a pebble to the Sun, then moving it to the next-nearest star, then moving it to the nearest galaxy. If space is flat, then there is no limit to how far one can move a pebble. So the translations form a noncompact group. The parameter space of a non-compact group is not compact. we’ll see why compact Lie groups possess unitary representations, with N × N unitary matrices D(α), while noncompact ones don’t. 10.15 Lie Algebra Continuous groups can be very complicated. So one uses not only algebra but also calculus, and one studies the part of the group that is simplest —
10.15 Lie Algebra
369
the set of group elements g(dα) that are near the identity, e = g(0), for which all αa = 0. If D(g({αa })) is a representation of a Lie group with parameters {αa }, it gets tedious to write D(g({αa })) over and over. So instead one writes g(α) = g({αa }) and D(α) = D(g(α)) = D(g({αa }))
(10.51)
leaving out the explicit mentions both of g and of {αa }. Any matrix D(dα) representing a group element g(dα) that is near the identity is approximately X D(dα) = I + i dαa Xa (10.52) a
where the generators Xa of the group are the partial derivatives ∂ Xa = −i D(α) . ∂αa α=0
(10.53)
The i is inserted so that if the matrices D(α) are unitary, then the generators are hermitian matrices Xa† = Xa .
(10.54)
Compact groups have finite-dimensional, unitary representations and hermitian generators. Our formulas will look nicer if we adopt the convention that we sum over all indices that occur twice in a product. That is, we drop the summation symbol when summing over a repeated index so that (10.52) looks like this D(dα) = I + idαa Xa .
(10.55)
Unless the parameters αa are redundant, the N (G) generators are linearly independent. They span a vector space over the real numbers X = αa Xa
(10.56)
and any such linear combination may be called a generator. By using the Gramm-Schmidt procedure, we may make the N (G) generators Xa orthogonal with respect to the inner product (1.88) (Xa , Xb ) = Tr Xa† Xb = k δab (10.57) in which k is a non-negative normalization constant that in general depends upon the representation. The reason why we don’t normalize the generators will become apparent shortly.
370
Group Theory
Since group multiplication is closed, any power g n (dα) ∈ G, and so we may take the limit iαa Xa n n D(α) = lim D (α/n) = I + = eiαa Xa . (10.58) n→∞ n This parametrization of a representation of a group is called the exponential parametrization. Now for tiny the product 2 2 X )(1 + iXa − 2 b 2 × (1 − iXb − Xb2 )(1 − iXa − 2
eiXb eiXa e−iXb e−iXa ≈ (1 + iXb −
2 2 X ) 2 a 2 2 X )(10.59) 2 a
to order 2 is eiXb eiXa e−iXb e−iXa ≈ 1 + 2 (Xa Xb − Xb Xa ) = 1 + 2 [Xa , Xb ]. (10.60) Since this product represents a group element near the identity, the commutator must be a linear combination of generators of order 2 eiXb eiXa e−iXb e−iXa = ei
2f c X ab c
c Xc . ≈ 1 + i2 fab
(10.61)
By matching (10.60) with (10.61) we have c Xc . [Xa , Xb ] = ifab
(10.62)
c are the structure constants of the group G. The numbers fab By taking the trace of the last equation (10.62) multiplied by Xd† and by using the orthogonality relation (10.57), we find d c c kδcd = ikfab (10.63) Tr Xc Xd† = ifab Tr [Xa , Xb ]Xd† = ifab c is the trace which implies that the structure constant fab c fab = (−i/k)Tr [Xa , Xb ]Xc† .
(10.64)
Because of the anti-symmetry of the commutator [Xa , Xb ], the structure c is anti-symmetric in its lower indices constant fab c c fab = −fba .
(10.65)
From any n × n matrix A, one may make a hermitian matrix A + A† and an anti-hermitian one A − A† . Thus, one may separate the N (G) generators (h) (ah) into a set that are hermitian Xa and a set that are anti-hermitian Xa .
371
10.15 Lie Algebra
The exponential of any imaginary linear combination of n × n hermitian generators D(α) = exp iαa Xa(h) (10.66) is an n × n unitary matrix since † †(h) (h) D (α) = exp −iαa Xa = exp −iαa Xa = D−1 (α).
(10.67)
A group with only hermitian generators is compact and has finite-dimensional unitary representations. On the other hand, the exponential of any imaginary linear combination of anti-hermitian generators D(α) = exp iαa Xa(ah) (10.68) (ah)
is a real exponential of their hermitian counterparts iXa D(α) = exp αa iXa(ah)
(10.69)
whose squared norm kD(α)k2 = Tr D(α)† D(α) = Tr exp 2αa iXa(ah)
(10.70)
grows exponentially and without limit as the parameters αa → ∞. A group with some anti-hermitian generators is non-compact and does not have finite-dimensional unitary representations. (The unitary representations of the translations and of the Lorentz and Poincar´e groups are infinite dimensional.) The rest of this chapter is about compact Lie groups with hermitian generators. The structure-constant formula (10.64) reduces in this case to c † (10.71) fab = (−i/k)Tr [Xa , Xb ]Xc = (−i/k)Tr ([Xa , Xb ]Xc ) . Now, since the trace is cyclic, we have b fac = (−i/k)Tr ([Xa , Xc ]Xb ) = (−i/k)Tr (Xa Xc Xb − Xc Xa Xb )
= (−i/k)Tr (Xb Xa Xc − Xa Xb Xc ) c c = (−i/k)Tr ([Xb , Xa ]Xc ) = fba = −fab .
(10.72)
Interchanging a and b, we get a c c . fbc = fab = −fba
(10.73)
Finally, interchanging b and c gives c b b fab = fca = −fac .
(10.74)
372
Group Theory
Combining (10.72, 10.73, & 10.74), we find that the structure constants of a compact Lie group are totally anti-symmetric b b c c a a fac = −fca = fba = −fab = −fbc = fcb .
(10.75)
Because of this anti-symmetry, it is usual to lower the upper index c fabc ≡ fab
(10.76)
and write the anti-symmetry of the structure constants compact Lie groups more simply as facb = −fcab = fbac = −fabc = −fbca = fcba .
(10.77)
For compact Lie groups, the generators are hermitian, and so the structure constants fabc are real, as we may see by taking the complex conjugate of Eq.(10.71) ∗ fabc = (i/k)Tr (Xc [Xb , Xa ]) = (−i/k)Tr ([Xa , Xb ]Xc ) = fabc .
(10.78)
All the representations of a given group must obey the same multiplication law, that of the group. Thus in the exponential parametrization, if the representation D1 satisfies (10.61) eiXb eiXa e−iXb e−iXa ≈ ei
2f
abc Xc
(10.79)
that is, if with a being the vector with kth component δak and b being the vector with kth component δbk , we have D1 (b )D1 (a )D1 (−b )D1 (−a ) = D1 (2 fabc )
(10.80)
then any other representation D2 must satisfy the same relation with 2 replacing 1: D2 (b )D2 (a )D2 (−b )D2 (−a ) = D2 (2 fabc ).
(10.81)
Such uniformity will occur if the structure constants (10.64) are the same for all representations of a group. To ensure that this is so, we must allow the normalization parameter k in the trace relation (10.64) to vary with each representation Dr (α). The structure constants fabc then are a property of the group G, not of any particular representation D(α). This is why we didn’t make the generators Xa orthonormal.
10.16 The Rotation Group
373
10.16 The Rotation Group The rotations and reflections in three-dimensional space form a compact group O(3) whose elements R are real 3 × 3 matrices that leave invariant the dot-product of any two three vectors (Rx) · (Ry) = xT RT R y = xT Iy = x · y.
(10.82)
These matrices therefore are orthogonal (1.258) RT R = I.
(10.83)
Taking the determinant of both sides and using the transpose (1.195) and product (1.208) rules, we have (det R)2 = 1
(10.84)
whence det R = ±1. The subgroup with det R = 1 is the group SO(3). An SO(3) element near the identity R = I + ω must satisfy (I + ω) T (I + ω) = I.
(10.85)
Neglecting the tiny quadratic term, we find that the infinitesimal matrix ω is antisymmetric ω T = − ω.
(10.86)
One complete set of real 3 × 3 antisymmetric matrices is 0 0 0 0 0 1 0 −1 0 Y1 = 0 0 −1 , Y2 = 0 0 0 , Y3 = 1 0 0 (10.87) 0 1 0 −1 0 0 0 0 0 which we may write as [Yb ]ac = abc
(10.88)
in which abc is the Levi-Civita tensor which is totally antisymmetric with 123 = 1. The Yb are antihermitian, but we make them hermitian by multiplying by i Xb = iYb
(10.89)
so that R = I − iθb Xb . The three hermitian generators Xa satisfy (problem 14) the commutation relations [Xa , Xb ] = ifabc Xc
(10.90)
374
Group Theory
in which the structure constants are given by the Levi-Civita tensor abc fabc = abc
(10.91)
[Xa , Xb ] = iabc Xc .
(10.92)
so that
Physicists usually scale the generators by ~ and define the angular-momentum generator La as La = ~Xa
(10.93)
so that the eigenvalues of the angular-momentum operators are the physical values of the angular momenta. With ~, the commutation relations are [La , Lb ] = i~ abc Lc .
(10.94)
The matrix that represents a right-handed rotation about the axis θ of angle θ = |θ| is D(θ) = eθ·Y = e−iθ·X = e−iθ·L/~ .
(10.95)
By using the fact (1.306) that a matrix obeys its characteristic equation, one may show (problem 16) that the 3 × 3 matrix D(θ) that represents a right-handed rotation of θ radians about the axis θ is Dij (θ) = cos θ δij − sin θ ijk θk /θ + (1 − cos θ) θi θj /θ2
(10.96)
in which a sum over k = 1, 2, 3 is understood. Example 10.1 (Demonstration of Commutation Relations) Take a big sphere with a distinguished point and orient the sphere so that the point lies in the y-direction from the center of the sphere. Now rotate the sphere by a small angle, say 15 degrees or = π/12, right-handedly about the x-axis, then right-handedly about the y-axis by the same angle, then left-handedly about the x-axis and then about the y-axis. These rotations amount to a smaller, left-handed rotation about the (vertical) z-axis in accordance with Eq.(10.79) with ~Xa = L1 = Lx , ~Xb = L2 = Ly , and ~fabc Xc = 12c Lc = L3 = Lz eiLy /~ eiLx /~ e−iLy /~ e−iLx /~ ≈ ei
2 L /~ z
.
(10.97)
And the magnitude of that rotation should be about 2 = (π/12)2 ≈ 0.069 or about 3.9 degrees. Photos of an actual demo, performed by the author, are displayed in Fig. 10.1. By expanding both sides of the demonstrated equation (10.97) in powers of and keeping only the biggest terms that don’t cancel, you may show
375
10.17 The Lie Algebra and Representations of SU(2)
(problem 15) that the generators Lx and Ly satisfy the commutation relation [Lx , Ly ] = i~Lz
(10.98)
of the rotation group.
10.17 The Lie Algebra and Representations of SU(2) The three generators of SU (2) in its 2 × 2 defining representation are Xa = σa /2, and the structure constants of SU (2) are fabc = abc which is totally anti-symmetric with 123 = 1 [Xa , Xb ] = ifabc Xc = [
σa σb σc , ] = iabc . 2 2 2
(10.99)
For every half-integer j=
n 2
for n = 0, 1, 2, 3, . . .
(10.100)
there is an irreducible representation of SU (2) D(j) (θ) = e−iθ·J (j)
(j)
(10.101)
(j)
in which the three generators Xa ≡ Ja are (2j + 1) × (2j + 1) square (j) hermitian matrices. In a basis in which J3 is diagonal, the matrix elements (j) (j) (j) of the complex linear combinations J± ≡ J1 ± iJ2 are h i p (j) (j) 0 ,s±1 (j ∓ s)(j ± s + 1) (10.102) J1 ± iJ2 = δ s 0 s ,s
(j)
and those of J3
are h
(j)
J3
i s0 ,s
= s δs0 ,s .
(10.103) (j)
The sum of the squares of the three generators Ja (2j + 1) × (2j + 1) identity matrix
Ja(j)
2
= j(j + 1) I.
is a multiple of the
(10.104)
Combinations of generators that are a multiple of the identity are called Casimir operators.
376
Group Theory (2)
Example 10.2 (Spin 2) Thus for j = 2, the spin-two matrices J+ and (2) J3 are 0 2 √0 0 0 2 0 0 0 0 0 0 0 1 0 0 6 √0 0 0 (2) (2) J+ = 0 0 0 0 6 0 and J3 = 0 0 0 0 0 0 0 0 0 0 −1 0 0 2 0 0 0 0 −2 0 0 0 0 0 (10.105) (2) † . and J− = J+ The tensor product of any two irreducible representations D(j) and D(k) is equivalent to the direct sum of all the irreducible representations D` for |j − k| ≤ ` ≤ j + k D
(j)
⊗D
(k)
j+k M
=
D`
(10.106)
`=|j−k|
each D` occurring once. Under a rotation R, a field ψ` (x) that transforms under the D(j) representation of SU (2) responds as (j,j 0 )
U (R) ψ` (x) U −1 (R) = D``0 (R−1 ) ψ`0 (Rx).
(10.107)
Example 10.3 (Spin and Statistics) Suppose |a, mi and |b, mi are any eigenstates of the rotation operator J3 with eigenvalue m (in units with ~ = c = 1). Let y and z be any two points whose separation y − z is spacelike (y − z)2 > 0. Then in some Lorentz frame, the two points are at the same time t, and we may chose our coordinate system so that y 0 = (t, x) and z 0 = (t, − x). Let U be the unitary operator that represents a right-handed rotation by π about the 3-axis. Then U |a, mi = e−imπ |a, mi
and hb, m|U −1 = hb, m|eimπ .
(10.108)
And by (10.107), under this rotation, a field ψ of spin j transforms as (j,j 0 )
U (R) ψ` (t, ) U −1 (R) = D``0 (R−1 ) ψ`0 (t, − x) = eiπ` ψ` (t, − x). (10.109) Thus by inserting the identity operator in the form I = U −1 U and using both (10.108) and (10.107), we find, since the phase factors exp(−imπ) and exp(imπ) cancel, hb, m|ψ` (t, x) ψ` (t, −x)|b, mi = hb, m|U ψ` (t, x)U −1 U ψ` (t, −x)U −1 |a, mi = e2iπ` hb, m|ψ` (t, −x)ψ` (t, x)|a, mi. (10.110)
10.18 The Defining Representation of SU(2)
377
Now if j is an integer, then so is `, and the phase factor exp(2iπ`) = 1 is unity. In this case, we find that the mean value of the equal-time commutator vanishes hb, m|[ψ` (t, x), ψ` (t, − x)]|a, mi = 0.
(10.111)
One the other hand, if j is half an odd integer, that is, j = (2n + 1)/2, where n is an integer, then the phase factor exp(2iπ`) = −1 is minus one. In this case, the mean value of the equal-time anti-commutator vanishes hb, m|{ψ` (t, x), ψ` (t, − x)}|a, mi = 0.
(10.112)
While not a proof of the spin-statistics theorem, this argument shows that the behavior of fields under rotations does determine their statistics.
10.18 The Defining Representation of SU(2) The smallest positive value of angular momentum is ~/2. The spin-one-half angular momentum operators are represented by three 2 × 2 matrices ~ σa 2 matrices −i 1 0 , and σ3 = 0 0 −1 Sa =
in which the σa are the Pauli 0 1 0 σ1 = , σ2 = 1 0 i
(10.113)
(10.114)
which obey the multiplication law σi σj = δij + i
3 X
ijk σk .
(10.115)
k=1
The Pauli matrices divided by 2 satisfy the commutation relations (10.92) of the rotation group 1 1 1 [ σa , σb ] = iabc σc (10.116) 2 2 2 and generate the elements of the group SU (2) σ θ θ exp i θ · = I cos + i θˆ · σ sin 2 2 2 √ 2 in which I is the 2×2 identity matrix, θ = θ and θˆ = θ/θ. It follows from (10.116) that the spin operators satisfy [Sa , Sb ] = i~abc Sc .
(10.117)
(10.118)
378
Group Theory
The raising and lowering operators S± = S1 ± iS2
(10.119)
have simple commutators with S3 [S3 , S± ] = ±i~S± .
(10.120)
This relation implies that if the state |j, mi is an eigenstate of S3 with eigenvalue ~m, then the states S± |j, mi either vanish or are eigenstates of S3 with eigenvalues ~(m ± 1) S3 S± |j, mi = S± S3 |j, mi ± i~S± |j, mi = ~(m ± 1)S± |j, mi.
(10.121)
Thus the raising and lowering operators raise and lower the eigenvalues of S3 . When j = 1/2, the possible values of m are m = ±1/2, and so with the usual sign and normalization conventions S+ |−i = ~|+i
and S− |+i = ~|−i
(10.122)
while S+ |+i =)
and S− |−i = 0.
(10.123)
The square of the total spin operator is simply related to the raising and lowering operators and to S3 1 1 S 2 = S12 + S22 + S32 = S+ S− + S− S+ + S32 . 2 2
(10.124)
But the squares of the Pauli matrices are unity, and so Sa2 = (~/2)2 for all three values of a. Thus 3 S 2 = ~2 (10.125) 4 for a spin one-half system. Example: Consider two spin operators S (1) and S (2) as in Eq.(10.113) acting on two spin-one-half systems. Let the tensor-product states |±, ±i = |±i1 |±i2 = |±i1 ⊗ |±i2 (1)
be the eigenstates of S3
(2)
and S3
(10.126)
so that, for instance
~ |+, −i 2 ~ (2) S3 |+, −i = − |+, −i. 2 (1)
S3 |+, −i =
(10.127)
379
10.18 The Defining Representation of SU(2)
In terms of these states, find the eigenstates and eigenvalues of 2 S 2 = S (1) + S (2)
(10.128)
and of (1)
(2)
S3 = S3 + S3 .
(10.129)
Solution: From Eqs.(1.490–1.498), the state |+, +i is an eigenstate of S3 with eigenvalue ~ ~ ~ |+, +i + |+, +i = ~|+, +i. 2 2 (10.130) So the state of angular momentum ~ in the 3-direction is (1)
(2)
S3 |+, +i = S3 |+, +i + S3 |+, +i =
|1, 1i = |+, +i.
(10.131)
Similarly, the state |−, −i is an eigenstate of S3 with eigenvalue −~ ~ ~ (1) (2) S3 |−, −i = S3 |−, −i + S3 |−, −i = − |−, −i − |−, −i = − ~|−, −i. 2 2 (10.132) So the state of angular momentum ~ in the negative 3-direction is |1, −1i = |−, −i.
(10.133)
The states |+, −i and |−, +i are eigenstates of S3 with eigenvalue 0 (1)
(2)
S3 |+, −i = S3 |+, −i + S3 |+, −i =
~ ~ |+, −i − |+, −i = 0 2 2
(10.134)
and ~ ~ (1) (2) S3 |−, +i = S3 |−, +i + S3 |−, +i = − |−, +i + |−, +i = 0. (10.135) 2 2 To see which of these states are eigenstates of S 2 , we use the lowering operator for the combined system (1)
(2)
S− = S− + S−
and the rules (10.122 & 10.123) to lower find the state |1, 0i (1) (2) S− |+, +i = S− + S− |+, +i = ~ (|−, +i + |+, −i) .
(10.136)
(10.137)
Thus the state |1, 0i is 1 |1, 0i = √ (|+, −i + |−, +i) . 2
(10.138)
380
Group Theory
The orthogonal and normalized combination of |+, −i and |−, +i must be the state of spin zero 1 |0, 0i = √ (|+, −i − |−, +i) 2
(10.139)
with the usual sign convention. To check that the states |1, 0i and |0, 0i really are eigenstates of S 2 , we use (10.124 & 10.125) to write S 2 as 2 3 S 2 = S (1) + S (2) = ~2 + 2S (1) · S (2) 2 3 2 (1) (2) (1) (2) (1) (2) = ~ + S+ S− + S− S+ + 2S3 S3 . (10.140) 2 (1) (2)
(1) (2)
Now the sum S+ S− + S− S+ merely interchanges the states |+, −i and |−, +i and multiplies them by ~2 , so 2 3 2 ~ |1, 0i + ~2 |1, 0i − ~2 |1, 0i 2 4 2 2 = 2~ |1, 0i = s(s + 1)~ |1, 0i
S 2 |1, 0i =
(10.141)
so s = 1. Because of the relative minus sign in formula (10.139) for the state |0, 0i, 3 2 1 ~ |0, 0i − ~2 |1, 0i − ~2 |1, 0i 2 2 = 0~2 |1, 0i = s(s + 1)~2 |1, 0i
S 2 |0, 0i =
(10.142)
so s = 0.
10.19 The Jacobi Identity Let A, B, and C be any three square matrices. They satisfy the product rule of commutators [A, BC] = ABC − BCA = ABC − BAC + BAC − BCA = [A, B]C + B[A, C].
(10.143)
Interchanging B and C gives [A, CB] = [A, C]B + C[A, B].
(10.144)
If we now subtract the second equation from the first, we get [A, [B, C]] = [[A, B], C] + [B, [A, C]]
(10.145)
10.20 The Adjoint Representation
381
which is the Jacobi identity. The cyclic form of the Jacobi identity is equivalent [A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0
(10.146)
because [B, [C, A]] = −[B, [A, C]] and [C, [A, B]] = −[[A, B], C].
10.20 The Adjoint Representation Like any three square matrices, the generators Xa , Xb , and Xc satisfy the Jacobi identity (10.146) [Xa , [Xb , Xc ]] + [Xb , [Xc , Xa ]] + [Xc , [Xa , Xb ]] = 0.
(10.147)
By using the structure constants of the group, we may express each of these double commutators as a linear combination of the generators [Xa , [Xb , Xc ]] = [Xa , ifbcd Xd ] = − fbcd fade Xe
[Xb , [Xc , Xa ]] = [Xb , ifcad Xd ] = − fcad fbde Xe
[Xc , [Xa , Xb ]] = [Xc , ifabd Xd ] = − fabd fcde Xe .
(10.148)
So the Jacobi identity (10.147) implies that (fbcd fade + fcad fbde + fabd fcde ) Xe = 0
(10.149)
or since the generators are linearly independent fbcd fade + fcad fbde + fabd fcde = 0.
(10.150)
If we define a set of matrices Ta by (Tb )ac = ifabc
(10.151)
then, since the structure constants are totally anti-symmetric, we may write the three terms in the preceding equation (10.150) as fbcd fade = fcbd fdae = (−Tb Ta )ce
(10.152)
fcad fbde = −fcad fdbe = (Ta Tb )ce
(10.153)
fabd fcde = −ifabd (Td )ce
(10.154)
and
382
Group Theory
or in matrix notation [Ta , Tb ] = ifabc Tc .
(10.155)
So the matrices Ta , which we made out of the structure constants by the rule (Tb )ac = ifabc (10.151), obey the same algebra (10.62) as do the generators Xa . They are the generators in the adjoint representation of the Lie algebra. If the Lie algebra has N generators Xa , then the N generators Ta in the adjoint representation are N × N matrices.
10.21 The Casimir Operator For any compact Lie algebra, the sum of the squares of all the generators is the Casimir operator C=
N X
ta ta .
(10.156)
n=1
This operator commutes with every generator tb [C, tb ] = [ta ta , tb ] = [ta , tb ]ta + ta [ta , tb ]ta = ifabc tc ta + ta ifabc tc = i (fabc + fcba ) tc ta = 0
(10.157)
because of the total anti-symmetry (10.77) of the structure constants. It follows that the Casimir C must commute with every matrix D(α) [C, D(α)] = [C, exp(iαa ta )] = 0
(10.158)
of the representation generated by the ta ’s. Thus, by part 2 of Schur’s lemma (Sec. 10.7), the Casimir operator must be a multiple of the identity matrix C = ta ta = cI.
(10.159)
The constant c depends upon the representation D(α) and is called the quadratic Casimir.
10.22 Tensor Operators for SU(2) (j)
Suppose Am is a set of 2j + 1 operators whose commutation relations with the generators Ji of rotations are (j)
(j)
[Ji , A(j) m ] = A` (Ji )`m
(10.160)
in which the sum over ` runs from −j to j. Then A(j) is said to be a spin-j tensor operator for the group SU (2).
383
10.23 Simple and Semi-Simple Lie Algebras (1)
For instance, if j = 1, then (Ji )`m = i~`im , and so a spin-1 tensor operator of SU (2) is a vector A1m that transforms as (j)
(j)
[Ji , A(1) m ] = A` i~`im = i~im` A`
(10.161)
under rotations. Let’s rewrite the definition (10.160) as (j)
(j)
(j) Ji A(j) m = A` (Ji )`m + Am Ji (j)
(10.162) (j)
and specialize to the case i = 3 so that (J3 )`m is diagonal, (J3 )`m = ~mδ`m (j)
(j)
(j)
(j) (j) (j) J3 A(j) m = A` (J3 )`m + Am J3 = A` ~mδ`m + Am J3 = Am (~m + J3 ) (10.163) Thus if the state |j, m0 , Ei is an eigenstate of J3 with eigenvalue ~m0 , then (j) the state Am |j, m0 , Ei is an e-vec of J3 with eigenvalue ~(m + m0 ) (j) 0 (j) 0 0 0 J3 A(j) m |j, m , Ei = Am (~m + J3 ) |j, m , Ei = ~ m + m Am |j, m , Ei. (10.164) The J3 eigenvalues add.
10.23 Simple and Semi-Simple Lie Algebras An invariant subalgebra is a set of generators Xa whose commutator with every generator Yb of the group is a linear combination of the Xc ’s [Xa , Yb ] = ifabc Xc .
(10.165)
The whole algebra and the null algebra are trivial invariant subalgebras. An algebra with no non-trivial invariant subalgebras is a simple algebra. A simple algebra generates a simple group. An algebra that has no nontrivial abelian invariant subalgebras is a semi-simple algebra. A semisimple algebra generates a semi-simple group. Example: The symmetry group of the standard model of particle physics is a direct product of an SU (3) group that acts on colored fields, an SU (2) group that acts on left-handed quark and lepton fields, and a U (1) group that acts on fields that carry hypercharge. Each of these three groups, is an invariant subgroup of the full symmetry group SU (3)c ⊗ SU (2)` ⊗U (1)Y , and the last one is abelian. Thus the symmetry group of the standard model is neither simple nor semi-simple. A simple symmetry group has very nice properties, and for this reason physicists have invented grand unification in which a simple symmetry group G contains the symmetry
384
Group Theory
group of the standard model. Georgi and Glashow suggested the group SU (5) in 1976 (Howard Georgi, 1947– ; Sheldon Glashow, 1932– ). Others have proposed SO(10) and even bigger groups. 10.24 SU(3) The group SU (3) consists of all 3 × 3 unitary matrices of determinant unity. The generators Xa of the defining 3×3 representation are half the hermitian Gell-Mann matrices λa 1 Xa = λa (10.166) 2 which are 1 0 0 0 −i 0 0 1 0 λ1 = 1 0 0 , λ2 = i 0 0 , λ3 = 0 −1 0 , 0 0 0 0 0 0 0 0 0 0 0 1 0 0 −i 0 0 0 λ4 = 0 0 0 , λ5 = 0 0 0 , λ6 = 0 0 1 , 1 0 0 i 0 0 0 1 0 0 0 0 1 0 0 1 λ7 = 0 0 −i , λ8 = √ 0 1 0 . (10.167) 3 0 i 0 0 0 −2 (Murray Gell-Mann, 1929–. The eight generators Xa are orthogonal with the normalization constant k = 1/2 1 Tr (Xa Xb ) = δab (10.168) 2 and satisfy the commutation relation [Xa , Xb ] = ifabc Xc .
(10.169)
The trace formula (10.64) gives us the SU (3) structure constants as fabc = −2iTr ([Xa , Xb ]Xc ) .
(10.170)
While no two generators of SU (2) commute, two generators of SU (3) do. In the representation (10.166–10.167), X3 and X8 are diagonal and so commute [X3 , X8 ] = 0. They generate the Cartan subalgebra of SU (3).
(10.171)
10.25 SU(3) and Quarks
385
10.25 SU(3) and Quarks The generators defined by Eqs.(10.166 & 10.167) give us the 3 × 3 representation D(α) = exp (iαa Xa )
(10.172)
in which the sum is over the eight generators Xa a = 1, 2, . . . 8. This representation acts on complex 3-vectors and is called the 3. Note that if D(α1 )D(α2 ) = D(α3 )
(10.173)
then the complex conjugates of these matrices obey the same multiplication rule D∗ (α1 )D∗ (α2 ) = D∗ (α3 )
(10.174)
and so form another representation of SU (3). It turns out that (unlike in SU (2)) this representation is inequivalent to the 3; it is the 3. There are three quarks with masses less than about 100 MeV/c2 —the u, d, and s quarks. The other three quarks c, b, and t are more massive by factors of 12, 45, and 173. Nobody knows why. Gell-Mann suggested that the low-energy strong interactions were approximately invariant under unitary transformations of the three light quarks, which he represented by a 3, and of the three light anti-quarks, which he represented by a 3. He imagined that the eight light pseudo-scalar mesons, that is, the three pions π − , π 0 , π + , the 0 neutral η, and the four kaons K 0 , K + , K − K , were composed of a quark and an anti-quark. So they should transform as the tensor product 3 ⊗ 3 = 8 ⊕ 1.
(10.175)
He put the eight pseudo-scalar mesons into an 8. He imagined that the eight light baryons — the two nucleons N & P , the three sigmas Σ− , Σ0 , Σ+ , the neutral lambda Λ, and the two cascades Ξ− & Ξ0 were each made of three quarks. They should transform as the tensor product 3 ⊗ 3 ⊗ 3 = 10 ⊕ 8 ⊕ 8 ⊕ 1.
(10.176)
He put the eight light baryons into one of these 8’s. When he was writing these papers, there were nine spin-3/2 resonances with masses somewhat heavier than 1200 MeV/c2 — four ∆’s, three Σ∗ ’s, and two Ξ∗ ’s. He put these into the 10 and predicted the tenth and its mass. When a tenth spin3/2 resonance, the Ω− , was found with a mass close to his prediction of 1680 MeV/c2 , his SU (3) theory became wildly popular among high-energy
386
Group Theory
physicists. Within a few years, a SLAC team discovered quarks, and GellMann won the Nobel prize. 10.26 Cartan Subalgebra In any Lie group, the maximum set of mutually commuting generators Ha generate the Cartan subalgebra [Ha , Hb ] = 0
(10.177)
which is an abelian subalgebra. The number of generators in the Cartan subalgebra is the rank of the Lie algebra. The Cartan generators Ha can be simultaneously diagonalized, and their eigenvalues or diagonal elements are the weights Ha |µ, x, Di = µa |µ, x, Di
(10.178)
in which D labels the representation and x whatever other variables are needed to specify the state. The vector µ is the weight vector. The roots are the weights of the adjoint representation. 10.27 The Quaternions If z and w are any two complex numbers, then the 2 × 2 matrix z w q= −w∗ z ∗
(10.179)
is a quaternion. The quaternions are closed under addition and multiplication and under multiplication by a real number (problem 19), but not under multiplication by an arbitrary complex number. The squared norm of q is its determinant kqk2 = det q = |z|2 + |w|2 .
(10.180)
The matrix products q † q and q q † are the squared norm kqk2 multiplied by the 2 × 2 identity matrix q † q = q q † = kqk2 I
(10.181)
The 2 × 2 matrix iσ2 =
0 1 −1 0
(10.182)
provides another expression for kqk2 in terms of q and its transpose q T q T iσ2 q = kqk2 iσ2 .
(10.183)
10.28 The Symplectic Group Sp(2n)
387
Clearly kqk = 0 implies q = 0, and the norm of a product of quaternions is the product of their norms p p (10.184) kq1 q2 k = det(q1 q2 ) = det q1 det q2 = kq1 kkq2 k. The quaternions therefore form an associative division algebra (over the real numbers); the only others are the real numbers and the complex numbers; the octonions are a non-associative division algebra. One equivalently may use the Pauli matrices to define for any real vector x with four components a quaternion q(x) as q(x) = x0 − iσk xk = x0 − iσ · x 0 x − ix3 −x2 − ix1 = . x2 − ix1 x0 + ix3
(10.185)
The product rule (10.115) for the Pauli matrices tells us that the product of two quaternions is q(x) q(y) = (x0 − iσ · x)(y 0 − iσ · y) = x0 y 0 − iσ · (y 0 x + x0 y) − i(x × y) · σ − x · y (10.186) so their commutator is [q(x), q(y)] = −2i(x × y) · σ.
(10.187)
10.28 The Symplectic Group Sp(2n) The symplectic group Sp(2n) consists of 2n × 2n matrices W that map ntuples q of quaternions into n-tuples q 0 = W q of quaternions with the same value of the quadratic quaternionic form kq 0 k2 = kq10 k2 + kq20 k2 + · · · + kqn0 k2 = kq1 k2 + kq2 k2 + · · · + kqn k2 = kqk2 . (10.188) By (10.181), the quadratic form kq 0 k2 times the 2 × 2 identity matrix I is equal to the hermitian form q 0† q 0 kq 0 k2 I = q 0† q 0 = q10† q10 + · · · + qn0† qn0 = q † W † W q
(10.189)
and so any matrix W that is both a 2n × 2n unitary matrix and an n × n matrix of quaternions keeps kq 0 k2 = kqk2 kq 0 k2 I = q † W † W q = q † q = kqk2 I.
(10.190)
388
Group Theory
The group Sp(2n) thus consists of all 2n × 2n unitary matrices that also are n × n matrices of quaternions. (This last requirement is needed so that q 0 = W q is an n-tuple of quaternions.) The generators Xa of the symplectic group Sp(2n) are 2n × 2n directproduct matrices of the form I ⊗ A,
σ1 ⊗ S1 ,
σ2 ⊗ S2 ,
and σ3 ⊗ S3
(10.191)
in which I is the 2 × 2 identity matrix, the three σi ’s are the Pauli matrices, A is an imaginary n × n anti-symmetric matrix, and the Si are n × n real symmetric matrices. These generators Xa close under commutation [Xa , Xb ] = ifabc Xc .
(10.192)
Any imaginary linear combination iαa Xa of these generators is not only a 2n×2n anti-hermitian matrix but also an n×n matrix of quaternions. Thus the matrices D(α) = eiαa Xa
(10.193)
are both unitary 2n × 2n matrices and n × n quaternionic matrices and so are elements of the group Sp(2n). Example: There is no 1 × 1 anti-symmetric matrix, and there is only one 1 × 1 symmetric matrix. So the generators Xa of the group Sp(2) are just the Pauli matrices Xa = σa , and Sp(2) = SU (2). Since the elements g(α) of the group SU (2) are quaternions of unit norm (problem 17), it follows that the product g(α)q is a quaternion with the same squared norm kg(α)qk2 = det(g(α)q) = det(g(α)) det q = det q = kqk2 .
(10.194)
Example: Apart from scale factors, there are three real symmetric 2 × 2 matrices S1 = σ1 , S2 = I, and S3 = σ3 and one imaginary anti-symmetric 2 × 2 matrix A = σ2 . So there are 10 generators of Sp(4) = SO(5) 0 −iI 0 σk X1 = I ⊗ σ2 = , Xk1 = σk ⊗ σ1 = iI 0 σk 0 σk 0 σk 0 Xk2 = σk ⊗ I = , Xk3 = σk ⊗ σ3 = (10.195) 0 σk 0 −σk where k runs from 1 to 3. We may see Sp(2n) from a different viewpoint if we use (10.183) to write the quadratic form kqk2 in terms of a 2n × 2n matrix J that has n copies of
389
10.28 The Symplectic Group Sp(2n)
iσ2 on its 2 × 2 diagonal
iσ2 0 0 0 iσ2 0 0 0 iσ2 J = 0 0 0 .. .. .. . . . 0 0 0
0 0 0 .. . .. . 0
... ... ... ... .. . 0
0 0 0
0 0 iσ2
(10.196)
(and zeros elsewhere) as kqk2 J = q T Jq.
(10.197)
Thus any n × n matrix of quaternions W that satisfies W T JW = J
(10.198)
kW qk2 J = q T W T JW q = q T Jq = kqk2 J
(10.199)
also satisfies
and so leaves invariant the quadratic form (10.188). The group Sp(2n) therefore consists of all 2n × 2n matrices W that satisfy (10.198) and that also are n × n matrices of quaternions. The symplectic group is something of a physics orphan. Its only wellknown application is in classical mechanics, and that application uses the non-compact symplectic group Sp(2n, R), not the compact symplectic group Sp(2n). The elements of Sp(2n, R) are all real 2n × 2n matrices T that satisfy T T JT = J with the J of (10.196); those near the identity are of the form T = exp(JS) in which S is a 2n × 2n real symmetric matrix (problem 21). Example: The matrices cosh θ sinh θ T =± (10.200) sinh θ cosh θ are elements of the non-compact symplectic group Sp(2, R) (problem 22). A dynamical map M takes the phase-space 2n-tuple z = (q1 , p1 , . . . , qn , pn ) from z(t1 ) to z(t2 ). One may show that M’s jacobian matrix Mab =
∂za (t2 ) ∂zb (t1 )
(10.201)
is in Sp(2n, R) if and only if its dynamics are hamiltonian q˙a =
∂H ∂pa
and p˙a = −
∂H ∂qa
(10.202)
390
Group Theory
(Carl Jacobi, 1804–1851; William Hamilton, 1805–1865, inventor of quaternions).
10.29 Compact Simple Lie Groups ´ Cartan (1869–1951) showed that all compact, simple Lie groups fall into Elie four infinite classes and five discrete cases. For n = 1, 2, . . ., his four classes are • An = SU (n + 1) which are (n + 1) × (n + 1) unitary matrices with unit determinant, • Bn = SO(2n + 1) which are (2n + 1) × (2n + 1) orthogonal matrices with unit determinant, • Cn = Sp(2n) which are 2n × 2n symplectic matrices, and • Dn = SO(2n) which are 2n × 2n orthogonal matrices with unit determinant. The five discrete cases are the exceptional groups G2 , F4 , E6 , E7 , and E8 . The exceptional groups are associated with the octonians a + bα iα
(10.203)
where the α-sum runs from 1 to 7; the eight numbers a and bα are real; and the seven iα ’s obey the multiplication law iα iβ = −δαβ + gαβγ iγ
(10.204)
in which gαβγ is totally anti-symmetric with g123 = g247 = g451 = g562 = g634 = g375 = g716 = 1.
(10.205)
Like the quaternions and the complex numbers, the octonians form a division algebra with an absolute value 1/2 (10.206) |a + bα iα | = a2 + b2α that satisfies |AB| = |A||B|
(10.207)
but they lack associativity. The group G2 is the subgroup of SO(7) that leaves the gαβγ ’s of (10.204) invariant.
10.30 Group Integration
391
10.30 Group Integration Suppose we want to integrate some function f (g) over a group. Naturally, we’d want to do so in a way that gave equal weight to every element of the group. In particular, if g 0 is any group element, we’d want the integral of the shifted function f (g 0 g) to be the same as the integral of f (g) Z Z f (g) dg = f (g 0 g) dg. (10.208) Such a measure dg is said to be left invariant (Creutz, 1983, chap. 8). Let’s use the letters a = a1 , . . . , an , b = b1 , . . . , bn , and so forth to label the elements g(a), g(b), so that an integral over the group is Z Z f (g) dg = f (g(a)) m(a) dn a (10.209) in which m(a) is the the left-invariant measure and the integration is over the whole set of a’s that label all the elements of the group. To find the left-invariant measure m(a), we use the multiplication law of the group g(a(c, b)) = g(c) g(b)
(10.210)
and impose the requirement (10.208) of left invariance with g 0 ≡ g(c) Z Z Z n n f (g(b)) m(b) d b = f (g(c)g(b)) m(b) d b = f (g(a(c, b))) m(b) dn b. (10.211) We change variables from b to a by using the jacobian det(∂b/∂a) which gives us dn b = det(∂b/∂a) dn a Z Z f (g(b)) m(b) dn b = f (g(a)) det(∂b/∂a) m(b) dn a. (10.212) Replacing b by a on the left-hand side of this equation, we find m(a) = det(∂b/∂a) m(b)
(10.213)
or since det(∂b/∂a) = 1/ det(∂a(c, b)/∂b) m(a(c, b)) = m(b)/ det(∂a(c, b)/∂b).
(10.214)
So if we let g(b) → g(0) = e, the identity element of the group, and set m(e) = 1, then we find for the measure m(c) = m(a(c, b))|b=0 = 1/ det(∂a(c, b)/∂b)|b=0 .
(10.215)
392
Group Theory
Example 10.4 (The Invariant Measure for SU(2)) A general element of the group SU (2) is given by (10.117) as σ θ θ exp i θ · = I cos + i θˆ · σ sin . (10.216) 2 2 2 Setting a0 = cos(θ/2) and a = θˆ sin(θ/2), we have g(a) = a0 + i a · σ
(10.217)
in which a2 ≡ a20 + a · a = 1. Thus, the parameter space for SU (2) is the unit sphere S3 in four dimensions. Its invariant measure is Z Z Z 2 4 2 2 4 δ(1 − a ) d a = δ(1 − a0 − a ) d a = (1 − a2 )−1/2 d3 a (10.218) or m(a) = (1 − a2 )−1/2 . We also can write the arbitrary element (10.217) of SU (2) as p g(a) = ± 1 − a2 + i a · σ and the group-multiplication law (10.210) as p p p 1 − a2 + i a · σ = 1 − c2 + i c · σ 1 − b2 + i b · σ .
(10.219)
(10.220)
(10.221)
Thus, by multiplying both sides of this equation by σi and taking the trace, we find (problem 23) that the parameters a(c, b) that describe the product g(c) g(b) are p p (10.222) a(c, b) = 1 − c2 b + 1 − b2 c + c × b. To compute the jacobian of our formula (10.215) for the invariant measure, we differentiate this expression (10.222) at b = 0 and so find (problem 24) m(a) = 1/ det(∂a(c, b)/∂b)|b=0 = (1 − a2 )−1/2
(10.223)
as the left-invariant measure in agreement with (10.219).
10.31 The Lorentz Group The Lorentz group O(3, 1) is the set of all linear transformations L that leave invariant the Minkowski inner product xy ≡ x · y − x0 y 0 = xT ηy
(10.224)
10.31 The Lorentz Group
in which η is the diagonal matrix −1 0 η= 0 0
0 1 0 0
0 0 1 0
0 0 . 0 1
393
(10.225)
So L is in O(3, 1) if for all 4-vectors x and y (Lx) T ηL y = xT LT η Ly = xT η y.
(10.226)
Since x and y are arbitrary, this condition amounts to LT η L = η.
(10.227)
Taking the determinant of both sides and using the transpose (1.195) and product (1.208) rules, we have (det L)2 = 1.
(10.228)
So det L = ±1, and every Lorentz transformation L has an inverse. Multiplying (10.227) by η, we find ηLT ηL = η 2 = I
(10.229)
L−1 = ηLT η.
(10.230)
which identifies L−1 as
The subgroup of O(3, 1) with det L = 1 is the proper Lorentz group SO(3, 1). To find its Lie algebra, we consider a Lorentz matrix L = I +ω that differs from the identity matrix I by a tiny matrix ω and require it to satisfy the condition (10.227) for membership in the Lorentz group I + ω T η (I + ω) = η + ω T η + η ω + ω T ω = η. (10.231) Neglecting ω T ω, we have ω T η = −η ω or since η 2 = I ω T = − η ω η.
(10.232)
This equation says (problem 26) that under transposition the time-time and space-space elements of ω change sign, while the time-space and space-time elements do not. That is, the tiny matrix ω must be for infinitesimal θ and λ a linear combination ω =θ·R+λ·B
(10.233)
394
Group Theory
of the six matrices 0 0 R1 = 0 0
0 0 0 0
0 0 0 0 , 0 −1 1 0
0 0 0 0 R2 = 0 0 0 −1
0 0 0 0
0 1 , 0 0
0 0 R3 = 0 0
0 0 0 0 −1 0 1 0 0 0 0 0 (10.234)
and 1 0 0 0 (10.235) which satisfy condition (10.232). The three Rj are 4 × 4 versions of the rotation generators (10.87); the three Bj generate Lorentz boosts. If we write L = I + ω as 0 1 B1 = 0 0
1 0 0 0
0 0 0 0
0 0 , 0 0
0 0 B2 = 1 0
0 0 0 0
1 0 0 0
0 0 , 0 0
0 0 B3 = 0 1
L = I − iθ` iR` − iλj iBj ≡ I − iθ` J` − iλj Kj
0 0 0 0
0 0 0 0
(10.236)
then the three matrices J` = iR` are imaginary and anti-symmetric, and therefore hermitian. But the three matrices Kj = iBj are imaginary and symmetric, and so are anti-hermitian. Thus, the 4 × 4 matrix L is not unitary. The reason is that the Lorentz group is not compact. One may verify (problem 27) that the six generators J` and Kj satisfy three sets of commutation relations: [Ji , Jj ] = iijk Jk
(10.237)
[Ji , Kj ] = iijk Kk
(10.238)
[Ki , Kj ] = − iijk Jk .
(10.239)
The first (10.237) says that the three J` generate the rotation group SO(3); the second (10.238) says that the three boost generators transform as a 3-vector under SO(3); and the third (10.239) mumbles something to the effect that a null sequence of infinitesimal boosts is a rotation. These three sets of commutation relations form the Lie algebra of the Lorentz group SO(3, 1). Incidentally, one may show (problem 28) that if J and K satisfy these commutation relations (10.237–10.239), then so do J
and
− K.
(10.240)
10.31 The Lorentz Group
395
The infinitesimal Lorentz transformation (10.236) is the 4 × 4 matrix 1 λ1 λ2 λ3 λ1 1 − θ3 θ2 . (10.241) L = I + ω = I + θ` R` + λj Bj = λ2 θ3 1 − θ1 λ 3 − θ2 θ1 1 It moves any 4-vector x to x0 = L x or in components x0a = Lab xb x00 = x0 + λ1 x1 + λ2 x2 + λ3 x3 x01 = λ1 x0 + x1 − θ3 x2 + θ2 x3 x02 = λ2 x0 + θ3 x1 + x2 − θ1 x3 x03 = λ3 x0 − θ2 x1 + θ1 x2 + x3 .
(10.242)
More succinctly with t = x0 , this is t0 = t + λ · x x0 = x + tλ + θ ∧ x
(10.243)
in which ∧ ≡ × means cross-product. For arbitrary real θ and λ, the matrices L = e−iθ` J` −iλj Kj
(10.244)
form the subgroup of SO(3, 1) that is connected to the identity matrix I. This subgroup preserves the sign of the time of any time-like vector, that is, if x2 < 0, and y = Lx, then y 0 x0 > 0. It is called the proper orthochronous Lorentz group. The rest of the (homogeneous) Lorentz group can be obtained from it by space P, time T , and space-time PT reflections. The task of finding all the finite-dimensional irreducible representations of the proper orthochronous homogeneous Lorentz group becomes vastly simpler when we write the commutation relations (10.237–10.239) in terms of the non-hermitian matrices J`± =
1 (J` ± iK` ) 2
(10.245)
which generate two independent rotation groups [Ji+ , Jj+ ] = iijk Jk+ [Ji− , Jj− ] = iijk Jk− [Ji+ , Jj− ] = 0.
(10.246)
Thus the Lie algebra of the Lorentz group is equivalent to two copies of
396
Group Theory
the Lie algebra (10.99) of SU (2) but with generators J ± that are not hermitian. The finite-dimensional irreducible representations of the proper orthochronous homogeneous Lorentz group are the direct products 0
0
D(j,j ) = D(j) ⊗ D(j )
(10.247) 0
of the non-unitary representations D(j) and D(j ) generated by these Lie algebras of SU (2). Under a Lorentz transformation L, a field ψ` (x) that 0 transforms under the D(j,j ) representation of the Lorentz group responds as (j,j 0 )
U (L) ψ` (x) U −1 (L) = D``0 (L−1 ) ψ`0 (Lx).
(10.248)
Although these representations are not unitary, the SO(3) subgroup of the Lorentz group is represented unitarily by the hermitian matrices J = J + + J −.
(10.249)
(j,j 0 )
Thus, the representation D describes objects of the spins s that can arise from the direct product of spin-j with spin-j 0 (Weinberg, 1995, p. 231) s = j + j 0 , j + j 0 − 1, . . . , |j − j 0 |.
(10.250)
For instance, D(0,0) describes a spinless field or particle, while D(1/2,0) and D(0,1/2) respectively describe right-handed and left-handed spin-1/2 fields or particles. The representation D(1/2,1/2) describes objects of spin 1 and spin 0—the spatial and time components of a 4-vector. The generators Kj of the Lorentz boosts are related to J ± by K = −iJ + + iJ −
(10.251)
which like (10.249) follows from the definition (10.245). The interchange of J + and J − replaces the generators J and K with J and − K, a substitution that we know (10.240) is legitimate. 10.32 Two-Dimensional Representations of the Lorentz Group The generators of the representation D(1/2,0) with j = 1/2 and j 0 = 0 are given by (10.249 & 10.251) with J + = σ/2 and J − = 0; they are 1 J= σ 2
1 and K = −i σ. 2
(10.252)
Thus 2×2 matrix D(1/2,0) that represents the Lorentz transformation (10.244) L = e−iθ` J` −iλj Kj
(10.253)
397
10.32 Two-Dimensional Representations of the Lorentz Group
is D(1/2,0) (θ, λ) = exp (−iθ · σ/2 − λ · σ/2) .
(10.254)
And so the generic D(1/2,0) matrix is D(1/2,0) (θ, λ) = e−z·σ/2
(10.255)
with λ = 0, the “standard” p boost that takes the 4-vector k = (m, 0) to p = (p0 , p), where p0 = m2 + p2 , is a boost in the pˆ direction B(p) = R(p) ˆ B3 (p0 ) R−1 (p) ˆ = exp (α pˆ · B)
(10.256)
in which cosh α = p0 /m and sinh α = |p|/m, as one may show by expanding the exponential (problem 30). For λ = α p, ˆ one may show (problem 31) that the matrix D(1/2,0) (0, λ) is ˆ D(1/2,0) (0, α p) ˆ = e−αp·σ/2 = I cosh(α/2) − pˆ · σ sinh(α/2) p p = I (p0 + m)/(2m) − pˆ · σ (p0 − m)/(2m) p0 + m − pˆ · σ = p (10.257) 2m(p0 + m)
in the third line of which the 2 × 2 identity matrix I is suppressed. Under D(1/2,0) , the vector (−I, σ) transforms like a 4-vector. For tiny θ and λ, one may show (problem 33) that the vector (−I, σ) transforms as D†(1/2,0) (θ, λ)(−I)D(1/2,0) (θ, λ) = −I + λ · σ D†(1/2,0) (θ, λ) σ D(1/2,0) (θ, λ) = σ + (−I)λ + θ × σ (10.258) which is how the 4-vector (t, x) transforms (10.243). Under a finite Lorentz transformation L the 4-vector S a ≡ (−I, σ) becomes D†(1/2,0) (L) S a D(1/2,0) (L) = Lab S b .
(10.259)
A field ξ(x) that responds to a unitary Lorentz transformation U (L) as U (L) ξ(x) U −1 (L) = D(1/2,0) (L−1 ) ξ(Lx)
(10.260)
is called a left-handed Weyl spinor. One may show (problem 34) that the action density L` (x) = i ξ † (x) (∂0 I − ∇ · σ) ξ(x)
(10.261)
398
Group Theory
is Lorentz covariant, that is U (L) L` (x) U −1 (L) = L` (Lx).
(10.262)
Example 10.6 (Why L` is Lorentz covariant) To see why, we first note that the derivatives ∂b0 in L` (Lx) are with respect to x0 = Lx. Since the inverse matrix L−1 takes x0 back to x = L−1 x0 or in tensor notation xa = L−1ab x0b , the derivative ∂b0 is ∂b0 =
∂ ∂xa ∂ ∂ = = L−1ab a = ∂a L−1ab . ∂x ∂x0b ∂x0b ∂xa
(10.263)
Now using the abbreviation ∂0 I − ∇ · σ ≡ − ∂a S a and the transformation laws (10.259 & 10.260), we have U (L) L` (x) U −1 (L) = i ξ † (Lx)D(1/2,0)† (L−1 )( − ∂a S a )D(1/2,0) (L−1 ) ξ(Lx) = i ξ † (Lx)( − ∂a L−1ab S b ) ξ(Lx) = i ξ † (Lx)( − ∂b0 S b ) ξ(Lx) = L` (Lx)
(10.264)
which shows that L` is Lorentz covariant. Incidentally, the rule (10.263) ensures, among other things, that the divergence ∂a V a is invariant (∂a V a )0 = ∂a0 V 0a = ∂b L−1b a Lac V c = ∂b δ b c V c = ∂b V b .
(10.265)
Example 10.7 (Why ξ is left handed) The space-time integral S of the action density L` is stationary when ξ(x) satisfies the wave equation (∂0 I − ∇ · σ) ξ(x) = 0
(10.266)
(E + p · σ) ξ(p) = 0.
(10.267)
or in momentum space
Multiplying from the left by (E − p · σ), we see that the energy of a particle created or annihilated by the field ξ is the same as its momentum E = |p| in accord with the absence of a mass term in the action density L` . And because the spin of the particle is represented by the matrix J = σ/2, the momentum-space relation (10.267) says that ξ(p) is an eigenvector of pˆ · J pˆ · J ξ(p) = −
1 ξ(p) 2
(10.268)
with eigenvalue − 1/2. A particle whose spin is opposite to its momentum is said to have negative helicity or to be left handed. Nearly massless neutrinos are nearly left handed.
399
10.32 Two-Dimensional Representations of the Lorentz Group
One may add to this action density the Majorana mass term † LM (x) = − m ξ T (x) σ2 ξ(x) − m ξ T (x) σ2 ξ(x) (10.269) which is Lorentz covariant because the matrices σ1 and σ3 anti-commute with σ2 which is anti-symmetric (problem 36). Since charge is conserved, only neutral fields like neutrinos can have Majorana mass terms. The generators of the representation D(0,1/2) with j = 0 and j 0 = 1/2 are given by (10.249 & 10.251) with J + = 0 and J − = σ/2; they are 1 J= σ 2
1 and K = i σ. 2
(10.270)
Thus 2 × 2 matrix D(0,1/2) (θ, λ) that represents the Lorentz transformation (10.244) L = e−iθ` J` −iλj Kj
(10.271)
is D(0,1/2) (θ, λ) = exp (−iθ · σ/2 + λ · σ/2) = D(1/2,0) (θ, − λ)
(10.272)
which differs from D(1/2,0) (θ, λ) merely by the sign of λ. The generic D(0,1/2) matrix is the complex unimodular 2 × 2 matrix D(0,1/2) (θ, λ) = ez
∗ ·σ/2
(10.273)
with λ = 0, the “standard” boost (10.256) that takes the 4-vector k = (m, 0) to p = (p0 , p) is the 4 × 4 matrix B(p) = exp (α pˆ · B) in which cosh α = p0 /m and sinh α = |p|/m. This Lorentz transformation with θ = 0 and λ = α p, ˆ is represented by the matrix (problem 32) ˆ D(0,1/2) (0, α p) ˆ = eαp·σ/2 = I cosh(α/2) + pˆ · σ sinh(α/2) p p = I (p0 + m)/(2m) + pˆ · σ (p0 − m)/(2m) p0 + m + pˆ · σ = p (10.274) 2m(p0 + m)
in the third line of which the 2 × 2 identity matrix I is suppressed. Under D(0,1/2) , the vector (I, σ) transforms as a 4-vector; for tiny z D†(0,1/2) (θ, λ) I D(0,1/2) (θ, λ) = I + λ · σ D†(0,1/2) (θ, λ) σ D(0,1/2) (θ, λ) = σ + Iλ + θ × σ as in (10.243).
(10.275)
400
Group Theory
A field ζ(x) that responds to a unitary Lorentz transformation U (L) as U (L) ζ(x) U −1 (L) = D(0,1/2) (L−1 ) ζ(Lx)
(10.276)
is called a right-handed Weyl spinor. One may show (problem 35) that the action density Lr (x) = i ζ † (x) (∂0 I + ∇ · σ) ζ(x)
(10.277)
U (L) L(x) U −1 (L) = L(Lx).
(10.278)
is Lorentz covariant
Example 10.9 (Why ζ is right handed) An argument like that of example (10.7) shows that the field ζ(x) satisfies the wave equation (∂0 I + ∇ · σ) ζ(x) = 0
(10.279)
(E − p · σ) ζ(p) = 0.
(10.280)
or in momentum space
Thus, E = |p|, and ζ(p) is an eigenvector of pˆ · J pˆ · J ζ(p) =
1 ζ(p) 2
(10.281)
with eigenvalue 1/2. A particle whose spin is parallel to its momentum is said to have positive helicity or to be right handed. Nearly massless anti-neutrinos are nearly right handed. The Majorana mass term LM (x) = − m ζ T (x) σ2 ζ(x) − m ζ T (x) σ2 ζ(x)
†
(10.282)
like (10.269) is Lorentz covariant.
10.33 The Dirac Representation of the Lorentz Group Dirac’s representation of SO(3, 1) is the direct sum D(1/2,0) ⊕ D(0,1/2) of D(1/2,0) and D(0,1/2) . Its generators are the 4 × 4 matrices i −σ 0 1 σ 0 J= and K = . (10.283) 0 σ 2 0 σ 2 Dirac’s representation uses the Clifford algebra of the gamma matrices γ a which satisfy the anti-commutation relation {γ a , γ b } ≡ γ a γ b + γ b γ a = 2η ab
(10.284)
10.33 The Dirac Representation of the Lorentz Group
401
in which η is the 4 × 4 diagonal matrix (10.225) with η 00 = −1 and η jj = 1 for j = 1, 2, and 3. Remarkably, the generators of the Lorentz group J ij = ijk Jk
and J 0j = Kj .
(10.285)
may be represented as commutators of gamma matrices i J ab = − [γ a , γ b ]. 4
(10.286)
They transform the gamma matrices as a 4-vector [J ab , γ c ] = −iγ a η bc + iγ b η ac
(10.287)
(problem 37) and satisfy the commutation relations i[J ab , J cd ] = η bc J ad − η ac J bd − η da J cb + η db J ca
(10.288)
of the Lorentz group (Weinberg, 1995, p. 213–217) (problem 38). The gamma matrices γ a are not unique; if S is any 4 × 4 matrix with an inverse, then the set γ 0a ≡ Sγ a S −1 also satisfies the definition (10.284). The choice 0 1 0 σ 0 γ = −i and γ = −i (10.289) 1 0 −σ 0 is useful in high-energy physics because it lets us assemble a left-handed spinor and a right-handed spinor into a 4-component Majorana spinor ξ ψM = . (10.290) ζ (1)
(2)
If two Majorana spinors ψM and ψM have the same mass, then one may combine them into a Dirac spinor 1 ξ (1) + iξ (2) 1 (1) ξD (2) ψD = √ ψM + iψM = √ . (10.291) (1) + iζ (2) = ζ ζ 2 2 D The action for a 4-spinor ψ, whether Majorana or Dirac, often is written as L = − ψ (γ a ∂a + m) ψ ≡ − ψ (6 ∂ + m) ψ
(10.292)
in which ψ ≡ iψ † γ 0 = ψ †
0 1 = ζ † ξ† . 1 0
(10.293)
402
Group Theory
The kinetic part is the sum of the left-handed L` and right-handed Lr action densities (10.261 & 10.277) − ψ γ a ∂a ψ = iξ † (∂0 I − ∇ · σ) ξ + i ζ † (∂0 I + ∇ · σ) ζ.
(10.294)
The Dirac mass term − m ψψ = − m ζ † ξ + ξ † ζ
(10.295)
conserves charge even if ψ is a charged Dirac 4-spinor ψD 1 ψD = √ ψ (1) + iψ (2) 2
(10.296)
in which case it is † † ξD + ξD ζD − mψ D ψD = − m ζD m h (1)† = − ζ − iζ (2)† ξ (1) + iξ (2) 2 i + ξ (1)† − iξ (2)†
ζ (1) + iζ (2)
.
(10.297)
One may show (problem 40) that if ξ is a left-handed spinor transforming as (10.260), then the spinor †! 0 −i ξ1 ∗ ζ = σ2 ξ ≡ (10.298) i 0 ξ2† transforms as a right-handed spinor (10.276), that is ∗ ∗ ez ·σ/2 σ2 ξ ∗ = σ2 e−z·σ/2 ξ .
(10.299)
Similarly, ξ = σ2 ζ ∗ is left handed if ζ is right handed. Thus ζ † ξ = ξ T σ2 ξ = ζ † σ2 ζ ∗ .
(10.300)
One therefore can write a Dirac mass term (10.301) as a specific combination of Majorana mass terms m h (1) T − mψ D ψD = − ξ − iξ (2) T σ2 ξ (1) + iξ (2) 2 i + ζ (1) T − iζ (2) T σ2 ζ (1) + iζ (2) (10.301) or entirely in terms of either left-handed ξ or right-handed ζ spinors.
403
10.34 The Poincar´e Group
10.34 The Poincar´ e Group The elements of the Poincar´e group are products of Lorentz transformations and translations in space and time. The Lie algebra of the Poincar´e group therefore includes the commutators (10.237–10.239) of the generators J and K of the Lorentz group as well as the hamiltonian H and the momentum operator P that respectively generate translations in time and space. Suppose T (y) is a translation that takes a 4-vector x to x + y and T (z) is a translation that takes a 4-vector x to x + z. Then T (z)T (y) and T (y)T (z) both take x to x + y + z. So if a translation T (y) = T (t, y) is represented by a unitary operator U (t, y) = exp(iHt − iP · y), then the hamiltonian H and the momentum operator P commute with each other [H, P j ] = 0
and [P i , P j ] = 0.
(10.302)
We can figure out the commutation relations of H and P with the angularmomentum J and boost K operators by realizing that P a = (H, P ) is a 4-vector. Let U (θ, λ) = e−iθ·J−iλ·K
(10.303)
be the (infinite-dimensional) unitary operator that represents (in Hilbert space) the infinitesimal Lorentz transformation L=I +θ·R+λ·B
(10.304)
where R and B are the six 4 × 4 matrices (10.234 & 10.235). Then because P is a 4-vector under Lorentz transformations, we have U −1 (θ, λ)P U (θ, λ) = e+iθ·J+iλ·K P e−iθ·J−iλ·K = (I + θ · R + λ · B) P (10.305) or using (10.275) (I + iθ · J + iλ · K) H (I − iθ · J − iλ · K) = H + λ · P
(10.306)
(I + iθ · J + iλ · K) P (I − iθ · J − iλ · K) = P + Hλ + θ ∧ P . Thus, one finds (problem 40) that H is invariant under rotations, while P transforms as a 3-vector [Ji , H] = 0
and [Ji , Pj ] = iijk Pk
(10.307)
and that [Ki , H] = −iPi
and [Ki , Pj ] = iδij H.
(10.308)
By combining these equations with (10.288), one may write (problem 41)
404
Group Theory
the Lie algebra of the Poincar´e group as i[J ab , J cd ] = η bc J ad − η ac J bd − η da J cb + η db J ca i[P a , J bc ] = η ab P c − η ac P b [P a , P b ] = 0.
(10.309)
10.35 Problems 1. Show that all n × n (real) orthogonal matrices O leave invariant the quadratic form x21 + x22 + · · · + x2n , that is, that if x0 = Ox, then x02 = x2 . 2. Show that the set of all n × n orthogonal matrices forms a group. 3. Show that all n × n unitary matrices U leave invariant the quadratic form |x1 |2 + |x2 |2 + · · · + |xn |2 , that is, that if x0 = U x, then |x|02 = |x|2 . 4. Show that the set of all n × n unitary matrices forms a group. 5. Show that the set of all n × n unitary matrices with unit determinant forms a group. 6. Invent a group of order 3 and compute its multiplication table. For extra credit, prove that the group is unique. 7. Show that the relation (10.18) between two equivalent representations is an isomorphism. 8. Suppose that D1 and D2 are equivalent, irreducible representations of a finite group G so that D2 (g) = SD1 (g)S −1
∀g ∈ G.
(10.310)
What can you say about a matrix A that satisfies D2 (g) A = A D1 (g)
∀g ∈ G?
9. Find all components of the matrix exp(iαA) in which 0 0 −i A = 0 0 0 . i 0 0
(10.311)
(10.312)
10. If [A, B] = B, find eiαA Be−iαA . Hint: what are the α-derivatives of this expression (10.313)?
(10.313)
10.35 Problems
405
11. Show that the tensor-product matrix (10.29) of two representations D1 and D2 is a representation. 12. Find a 4 × 4 matrix S that relates the tensor-product representation D 1 1 to the direct sum D1 ⊕ D0 . 2⊗2
13. Find the generators in the adjoint representation of the group with structure constants fabc = abc
(10.314)
where a, b, c run from 1 to 3. Hint: The answer is three 3 × 3 matrices Xa , often written as La . 14. Show that the generators (10.89) satisfy the commutation relations (10.92). 15. Show that the demonstrated equation (10.97) implies the commutation relation (10.98). 16. Use the Cayley-Hamilton theorem (1.306) to show that the 3 × 3 matrix (10.95) that represents a right-handed rotation of θ radians about the axis θ is given by (10.96). 17. Show that every 2 × 2 unitary matrix of unit determinant is a quaternion of unit norm. 18. For the group SU (3), find the structure constants f123 and f231 . 19. Show that the quaternions as defined by (10.179) are closed under addition and multiplication and that the product xq is a quaternion if x is real and q is a quaternion. 20. Show that the generators (10.191) of Sp(2n) obey commutation relations of the form (10.192) for some real structure constants fabc . 21. Show that for 0 < 1, the real 2n × 2n matrix T = exp(JS) satisfies T T JT = J (at least up to terms of order 2 ) and so is in Sp(2n, R). 22. Show that the matrices T of (10.200) are in Sp(2, R). 23. Use the parametrization (10.220) of the group SU (2), show that the parameters a(c, b) that describe the product g(a(c, b)) = g(c) g(b) are those of (10.222). 24. Use formulas (10.222) and (10.215) to show that the left-invariant measure for SU (2) is given by (10.223).
406
Group Theory
25. In tensor notation, which is explained in chapter 11, the condition (10.232) a that I + ω be an infinitesimal Lorentz transformation reads ω T b = ωb a = − ηbc ω c d η da in which sums over c and d from 0 to 3 are understood. In this notation, the matrix ηef lowers indices and η gh raises them, so that ωb a = − ωbd η da . (Both ηef and η gh are numerically equal to the matrix η displayed in equation (10.225).) Multiply both sides of this equation by ηae and use the relation η da ηae = η de ≡ δ de to show that the matrix ωab with both indices lowered (or raised) is anti-symmetric, that is, ωba = − ωab
and ω ba = − ω ab .
(10.315)
26. Show that the six matrices (10.234) and (10.235) satisfy the SO(3, 1) condition (10.232). 27. Show that the six generators J and K obey the commutations relations (10.237–10.239). 28. Show that if J and K satisfy the commutation relations (10.237–10.239) of the Lie algebra of the Lorentz group, then so do J and − K. 29. Show that the six generators J + and J − obey the commutations relations (10.246). 30. Relate the parameter α in the definition (10.256) of the standard boost B(p) to the 4-vector p and the mass m. 31. Derive the formulas for D(1/2,0) (0, α p) ˆ given in equation (10.257). 32. Derive the formulas for D(0,1/2) (0, α p) ˆ given in equation (10.257). 33. For infinitesimal complex z, derive the 4-vector properties (10.258 & 10.275) of (−I, σ) under D(1/2,0) and of (I, σ) under D(0,1/2) . 34. Show that under the unitary Lorentz transformation (10.260), the action density (10.277) is Lorentz covariant (10.278). 35. Show that under the unitary Lorentz transformation (10.276), the action density (10.261) is Lorentz covariant (10.262). 36. Show that under the unitary Lorentz transformations (10.260 & 10.276), the Majorana mass terms (10.269 & 10.282) are Lorentz covariant. 37. Show that the definitions of the gamma matrices (10.284) and of the generators (10.286) imply that the gamma matrices transform as a 4vector under Lorentz transformations (10.287).
10.35 Problems
407
38. Show that (10.286) and (10.287) imply that the generators J ab satisfy the commutation relations of the Lorentz group. 39. Show that the spinor ζ = σ2 ξ ∗ defined by (10.298) is right handed (10.276) if ξ is left handed (10.260. 40. Use (10.306) to get (10.307 & 10.308). 41. Derive (10.309) from (10.288, 10.302, & 10.308).
408
Group Theory
Figure 10.1 Demonstration of equation (10.97) and the commutation relation (10.98). Upper left: black ball with a white stick pointing in the y-direction; the x-axis is to the reader’s left, the z-axis is vertical. Upper right: ball after a small right-handed rotation about the x-axis. Center left: ball after that rotation is followed by a small right-handed rotation about the y-axis. Center right: ball after these rotations are followed by a small left-handed rotation about the x-axis. Bottom: ball after these rotations are followed by a small left-handed rotation about the y-axis. The approximate effect of the four rotations is a small left-handed rotation about the z-axis.
11 Tensors and Local Symmetries
11.1 Points and Coordinates 11.2 Points A point on a curved surface or in a curved space also is a point in a flat space of higher dimension. This big, flat space is the embedding space. For instance, a point p on a sphere also is a point in three-dimensional space. Since we always can add extra dimensions, the embedding space is arbitrary. For instance, a point p on a sphere also is a point in four-dimensional space-time. One could add more dimensions to p, but we shall use as few dimensions as possible, three in the case of a sphere. Whitney’s embedding theorem assures us that every n-dimensional connected, smooth manifold can be embedded in 2n-dimensional flat space R2n . So the embedding space for general relativity has no more than eight dimensions, at least for smooth curved spaces.
11.3 Coordinates Points are distinguished by coordinates. A point p on a sphere can be labeled by its polar and azimuthal angles (θ, φ) with respect to a polar axis and a choice of meridian. If we use a different axis and meridian, then we’ll have different coordinates (θ0 , φ0 ) for the same point p. Points are physical, coordinates are imaginary. When we change our system of coordinates, the points don’t change, but their coordinates do. Because points p in space-time with coordinates xi (p) are so important in physics, we’ll often use the letter x for coordinates. When we write x = xi (p), we mean that x is a vector of coordinates xi where i runs over the dimensions of the manifold — two in the case of the sphere. Often we’ll
410
Tensors and Local Symmetries
call the coordinates in various coordinate systems x = xi (p) and x0 = x0i (p) and so forth. Most points p have unique coordinates x = xi (p) and x0 = x0i (p) in each coordinate system. For instance, the polar coordinates θ and φ of all points p on a sphere are unique — except for the north (and south) poles which are labeled by θ = 0 (and θ = π) and all 0 ≤ φ < 2π. By using more than one coordinate system, one can arrange to label every point uniquely in most systems. Of course, in the flat three-dimensional space in which the sphere is a surface, each point of the sphere does have unique x, y, z coordinates, p~ = (x, y, z). If we are using suitable coordinate systems that uniquely represent points p in some local region, then the maps x0 = x0i = x0i (p) = x0i (p(x)) = x0i (x)
(11.1)
x = xi = xi (p) = xi (p(x0 )) = xi (x0 )
(11.2)
and are well defined and one to one. Since the coordinates x = xi (p) label the point p, one might call the coordinates x = xi (p) “the point x,” but the reader should keep in mind that p and x are different. The point p is unique, but it has many coordinates x, x0 , etc. 11.4 Vectors 11.5 Contravariant Vectors By differentiating the expression x0i (x) with respect to x, one has dx0i =
X ∂x0i j
∂xj
dxj .
(11.3)
This rule defines contravariant vectors. A quantity Ai is a contravariant vector if it transforms like dxi X ∂x0i A0i = Aj . (11.4) ∂xj j
The coordinate differentials dxi form a contravariant vector. A contravariant vector Ai (x) that depends upon the coordinates x of a spacetime point p is called a contravariant vector field. It transforms like (11.4) A0i (x0 ) =
X ∂x0i j
∂xj
Aj (x).
(11.5)
411
11.6 Scalars
By differentiating the expression xi (x0 ) with respect to x0 , we find dxi =
X ∂xi dx0j . ∂x0j
(11.6)
j
11.6 Scalars A quantity B that is the same in all coordinate systems B0 = B
(11.7)
is a scalar. If the quantity B depends upon the spacetime point x, and if x and x0 describe the same physical point, then B(x) is a scalar field if B 0 (x0 ) = B(x).
(11.8)
The temperature T (x) is a scalar field.
11.7 Covariant Vectors The chain rule tells us that partial derivatives transform as X ∂xj ∂ ∂ = . 0i ∂x ∂x0i ∂xj
(11.9)
j
This rule defines covariant vectors: A vector Ci that transforms as Ci0 =
X ∂xj Cj ∂x0i
(11.10)
j
is a covariant vector. If Ci is also a function of x, then Ci (x) is called a covariant vector field. Example: The spacetime derivatives of a scalar field form a covariant vector field. For by differentiating the equation that defines a scalar field B 0 (x0 ) = B(x)
(11.11)
one finds with the aid of the chain rule that ∂B(x) X ∂xj ∂B(x) ∂B 0 (x0 ) = = 0i ∂x ∂x0i ∂x0i ∂xj j
which shows that ∂B(x)/∂xj is a covariant vector field.
(11.12)
412
Tensors and Local Symmetries
11.8 Euclidean Space In some cases, the distinction between covariant and contravariant vectors vanishes or is minor. If we use euclidean coordinates to describe points in euclidean space, then covariant and contravariant vectors are the same. Euclidean space has a natural inner product, the usual dot-product. If the space has N dimensions, the dot-product of two points p and p0 is (p, p0 ) = p · p0 =
N X
0
pα p α
(11.13)
α=1
p0
p0
which is symmetric, p · = · p. We can choose different sets {ei } and {e0i } of N orthonormal basis vectors ei · ej =
N X
eαi eαj = δij = e0i · e0j =
α=1
N X
0
0
eiα ejα
(11.14)
α=1
to represent any point p in the space either as a linear combination of the vectors ei with coefficients xi or as a linear combination of the vectors e0i with coefficients x0i N N X X i p= ei x = e0i x0i . (11.15) i=1
i=1
For now, we will work with basis vectors that are independent of the coordinates ∂ei = 0 etc. (11.16) ∂xj In fact, here, the basis vectors define the coordinates. The basis vectors are the derivatives ∂p ∂p ei = and e0i = (11.17) i ∂x ∂x0i so the coordinates also define the basis vectors. But these basis vectors are not functions of the point p or of the coordinates x and x0 . Since the basis vectors are orthonormal, the component x0i is given by x0i = e0i · p =
N X
e0i · e0j x0j =
j=1
N X
δij x0j = x0i
(11.18)
j=1
but we also can write x0i = e0i · p =
N X j=1
e0i · ej xj .
(11.19)
11.9 Minkowski Space
413
Because the basis vectors e and e0 are all independent of x, the coefficients ∂x0i /∂xj of the transformation law (11.4) for contravariant vectors are ∂x0i = e0i · ej . ∂xj
(11.20)
Similarly, the coefficient xj is xj = e j · p =
N X
ej · e0i x0i
(11.21)
i=1
so the coefficients ∂xj /∂x0i of the transformation law (11.10) for covariant vectors are ∂xj = ej · e0i . (11.22) ∂x0i Because the euclidean dot-product (11.13) is symmetric ∂x0i ∂xj 0 0 = e · e = e · e = j j i i ∂xj ∂x0i
(11.23)
contravariant and covariant vectors transform the same way in euclidean space. The coefficients e0i · ej form an orthogonal matrix, and the linear operator N X
T
ei e0i =
i=0
N X
|ei ihe0i |
(11.24)
i=0
is an orthogonal (real, unitary) transformation.
11.9 Minkowski Space Minkowski space is a space-time with one time dimension, labeled by a = 0, and N space dimensions. In special relativity, N = 3, and the Minkowski metric η −1 if α = β = 0 αβ ηαβ = η = (11.25) 1 if α = β > 0 0 if α 6= β defines a real, symmetric inner product (p, q) between any pair of points p and q (p, q) = p · q =
N X N X α=0 β=0
pα ηαβ q β = (q, p).
(11.26)
414
Tensors and Local Symmetries
If one time component vanishes, this inner product reduces to the dotproduct (11.13). We can use different sets {ei } and {e0i } of N + 1 Lorentz-orthonormal basis vectors N X N X
ei · ej =
eαi ηαβ eβj = ηij = e0i · e0j
(11.27)
α=0 β=0
to represent any point p in the space either as a linear combination of the vectors ei with coefficients xi or as a linear combination of the vectors e0i with coefficients x0i N N X X p= e i xi = e0i x0i . (11.28) i=0
i=0
The dual vectors, which carry upper indices, are defined as ei =
N X
η ij ej and e0i =
j=0
N X
η ij e0j .
(11.29)
j=0
They are orthonormal to the vectors ei and e0i with respect to the inner product (11.26) i
e · ej =
N X
N X
ik
η ek · ej =
j=0
η ik ηkj = δji
(11.30)
j=0
and similarly e0i · e0j = δji .
(11.31)
The component x0i is therefore given by 0i
0i
x =e ·p=
N X
e0i · ej xj .
(11.32)
j=0 0
Such a linear map from a four-vector xi to a four-vector x i is called a Lorentz transformation Lij = e0i · ej .
(11.33)
The inner product (p, q) of two points p = ei xi = e0i x0i and q = ek y k = is physical and therefore invariant under Lorentz transformations e0k y 0k
(p, q) = xi y k ei · ek = ηik xi y k = x0i y 0k e0i · e0k = ηik x0i y 0k .
(11.34)
415
11.9 Minkowski Space
That is, ηrs xr y s = ηik Li r xr Lks y s
(11.35)
or since xr and y s are arbitrary i k ηrs = ηik Li r Lks = Li r ηik Lks = LT r ηik L s
(11.36)
or in matrix notation η = LT η L.
(11.37)
The matrix p γ2 − 1 γ p 2 γ γ −1 L= 0 0 0 0
0 0 1 0
0 0 0 1
(11.38)
p where γ = 1/ 1 − v 2 /c2 represents a Lorentz transformation that is a boost in the x-direction. Boosts and rotations are Lorentz transformations. If the basis vectors e and e0 are independent of p and of x, then the coefficients of the transformation law (11.4) for contravariant vectors are ∂x0i = e0i · ej = ej · e0i . ∂xj
(11.39)
Similarly, the component xj is j
j
x =e ·p=
N X
ej · e0i x0i
(11.40)
i=0
so that the coefficients of the transformation law (11.10) for covariant vectors are ∂xj = ej · e0i = e0i · ej . (11.41) ∂x0i The contravariant coefficients (11.4) are related to the covariant ones (11.10) by N
N
N
N
XX XX ∂x0i ∂x` 0i ik 0 ` ik = e · e = η η e · e = η η j j` j` k ∂xj ∂x0k k=0 `=0
(11.42)
k=0 `=0
so in Minkowski space, the coefficients for the two kinds of transformation laws differ only by occasional minus signs.
416
Tensors and Local Symmetries
So if Ai is a contravariant vector, then by (11.4) N X ∂x0i
0
Ai =
∂xj
j=0
Aj
(11.43)
and so N X
N X N X
0i
ηsi A =
i=0
ηsi
i=0 j=0
∂x0i j A . ∂xj
(11.44)
The relation (11.42) between the two kinds of coefficients now implies that N X
0i
ηsi A =
i=0
N X N X N X N X
ηsi η ik ηj`
i=0 j=0 k=0 `=0
=
N X N X N X
δsk ηj`
k=0 `=0 j=0
=
N X N X `=0 j=0
∂x` j A ∂x0k
∂x` j A ∂x0k
N N ∂x` j X ∂x` X η`j Aj ηj` 0s A = ∂x ∂x0s `=0
(11.45)
j=0
which shows that A` =
N X
η`j Aj
(11.46)
j=0
is a covariant vector A0s =
X ∂x` A` . ∂x0s
(11.47)
`
Since η with upper indices is the inverse of η with lower indices, the transformation 3 X
k`
η A` =
3 X N X
k`
j
η η`j A =
`=0 j=0
`=0
N X
δjk Aj = Ak
(11.48)
j=0
switches the covariant vector A` back to its contravariant form Ak . So in Minkowski space, one uses η to raise and lower indices Ai =
N X j=0
j
i
ηij A and A =
N X
η ij Aj
(11.49)
j=0
that is, to change vectors from covariant to contravariant and vice versa. This trick works for tensors too, as we’ll see.
417
11.10 Special Relativity
11.10 Special Relativity Here N = 3, and the Minkowski space is flat and has four dimensions. The inner product (p − q) · (p − q) of the interval p − q between two points is physical and independent of the coordinates and therefore invariant. If the points p and q are close neighbors with coordinates xi + dxi for p and xi for q, then that invariant inner product is (p − q) · (p − q) =
3 X 3 X i=0 j=0
i
j
ei dx · ej dx =
3 X 3 X
dxi ηij dxj = dx2 − (dx0 )2
i=0 j=0
(11.50) with = c dt. (At some point in what follows, we’ll measure distance in light-seconds so that c = 1.) If the points p and q are on the trajectory of a particle moving at velocity v, then this invariant quantity is the square of the invariant distance ds2 = dx2 − c2 dt2 = v 2 − c2 dt2 (11.51) dx0
which always is negative since v < c. The time in the rest frame of the particle is the proper time. The square of its differential element is dτ 2 = − ds2 /c2 = 1 − v 2 /c2 dt2 . (11.52) A particle of mass zero moves at the speed of light, and so its proper time is zero. But for a particle of mass m > 0 moving at speed v, the element of proper time dτ isp smaller than the corresponding element of laboratory time dt by the factor 1 − v 2 /c2 . The proper time is the time in the rest frame of the particle, dτ = dt when v = 0. So if T (0) is the lifetime of a particle (at rest), then the apparent lifetime T (v) when the particle is moving at speed v is dτ T (0) T (v) = dt = p =p 1 − v 2 /c2 1 − v 2 /c2
(11.53)
which is longer — an effect known as time dilation.
11.11 Example: Time Dilation in Muon Decay A muon at rest has a mean life of T (0) = 2.2×10−6 seconds. Cosmic rays hitting nitrogen and oxygen nuclei make pions high in the Earth’s atmosphere; the pions rapidly decay into muons in 2.6 × 10−8 s. A muon moving at the speed of light from 10 km takes at least T = 10 km/300, 000 (km/sec) = 3.3 × 10−5 s to hit the ground. Were it not for time dilation, the probability
418
Tensors and Local Symmetries
P of such a muon reaching the ground as a muon would be P = e−T /T = exp(−33/2.2) = e−15 = 2.6 × 10−7 .
(11.54)
The (rest) mass of a muon is 105.66 MeV. So a muon of energy E = 749 MeV has by (11.61) a time-dilation factor of 749 1 E 1 p = = 7.0888 = p = . 2 2 2 mc 105.7 1 − v /c 1 − (0.99)2
(11.55)
So a muon moving at a speed of v = 0.99 c has an apparent mean life T (v) given by Eq.(11.53) as T (v) =
E 2.2 × 10−6 s T (0) p p = = 1.6 × 10−5 s. (11.56) T (0) = mc2 1 − v 2 /c2 1 − (0.99)2
The probability of survival with time dilation is P = e−T /T (v) = exp(−33/16) = 0.12
(11.57)
so that 12% survive. Time dilation increases the chance of survival by a factor of 470,000 — no small effect. 11.12 Kinematics From the scalar dτ , and the contravariant vector dxi , we can make the 4-vector dxi dt dx0 dx 1 i u = = , =p (c, v) (11.58) dτ dτ dt dt 1 − v 2 /c2 p in which u0 = c dt/dτ = c/ 1 − v 2 /c2 and u = u0 v/c. The product mui is the energy-momentum 4-vector pi dxi dt dxi m dxi =m =p dτ dτ dt 1 − v 2 /c2 dt m E =p (c, v) = ,p . 2 2 c 1 − v /c
pi = m ui = m
(11.59)
Its invariant inner product is a constant characteristic of the particle proportional to the square of its mass c2 pi pi = mc ui mc ui = −E 2 + c2 p 2 = −m2 c4 .
(11.60)
Note that the time-dilation factor is the ratio of the energy of a particle to its rest energy E 1 p = (11.61) mc2 1 − v 2 /c2
419
11.13 Electrodynamics
and the velocity of the particle is its momentum divided by its equivalent mass E/c2 p . (11.62) v= E/c2 The analogue of F = m a is m
d2 xi dui dpi = m = = fi dτ 2 dτ dτ
(11.63)
in which p0 = E, and f i is a 4-vector force. Example: p + π + → Σ+ + K + What is the minimum energy that a beam of pions must have to produce a sigma hyperon and a kaon by striking a proton at rest? We set c = 1 and compute the LHS of the invariant quantity (pp + pπ )2 = (pΣ + pK )2 in the pp = (mp , 0) frame and its RHS in the frame in which both the Σ and the K have zero spatial momentum: (pp + pπ )2 = p2p + p2π + 2pp · pπ = −m2p − m2π − 2mp Eπ = (pΣ + pK )2 = − (mΣ + mK )2 .
(11.64)
Thus, since the relevant masses (in MeV) are mΣ+ = 1189.4, mK + = 493.7, mp = 938.3, and mπ+ = 139.6, the minimum total energy of the pion is (mΣ + mK )2 − m2p − m2π ≈ 1030 Eπ = 2mp
MeV.
(11.65)
11.13 Electrodynamics In electrodynamics and in mksa (si) units, the three-dimensional vector potential A and the scalar potential φ form a covariant 4-vector potential −φ Ai = ,A . (11.66) c The contravariant 4-vector potential is Ai = (φ/c, A). The magnetic induction is B =∇×A
(11.67)
and the electric field is Ei = c
∂A0 ∂Ai − ∂xi ∂x0
= −
∂φ ∂Ai − ∂xi ∂t
(11.68)
where x0 = ct. In 3-vector notation, E is given by the gradient of φ and the time-derivative of A ˙ E = −∇φ − A. (11.69)
420
Tensors and Local Symmetries
In terms of the second-rank, antisymmetric Faraday field-strength tensor Fij =
∂Aj ∂Ai − = −Fji i ∂x ∂xj
(11.70)
the electric field is Ei = c Fi0 and the magnetic field Bi is 3 3 1 X ∂Ak 1 X ∂Ak Bi = = (∇ × A)i . − ijk Fjk = ijk 2 2 ∂xj ∂xj jk=1
(11.71)
jk=1
The inverse equation Fjk =
3 X
jki Bi
(11.72)
i=1
for spatial j and k follows from the Levi-Civita identity (1.528) 3 X
jki Bi =
i=1
= =
3 3 3 3 1 X X 1 X X jki inm Fnm = ijk inm Fnm 2 2
1 2
nm=1 i=1 3 X
nm=1 i=1
(δjn δkm − δjm δkn ) Fnm
nm=1
1 (Fjk − Fkj ) = Fjk . 2
(11.73)
In 3-vector notation and mksa = si units, Maxwell’s equations are the homogeneous pair ∇·B =0
and ∇ × E + B˙ = 0
(11.74)
the second being Faraday’s law an inhomogeneous pair consisting of Gauss’s law and the Maxwell-Amp` ere law ∇ · D = ρf
˙ and ∇ × H = j f + D.
(11.75)
Here ρf is the density of free charge and j f is the free current density. By free, we understand charges and currents that are not restrained by chemical bonds. The divergence of ∇ × H vanishes (like that of any curl), and so the Maxwell-Amp`ere law and Gauss’s law imply that free charge is conserved ˙ = ∇ · j f + ρ˙ f . 0 = ∇ · (∇ × H) = ∇ · j f + ∇ · D
(11.76)
If we use this continuity equation to replace ∇ · j f with − ρ˙ f in this ˙ then we see that the Maxwell-Amp`ere same equation 0 = ∇ · j f + ∇ · D,
11.13 Electrodynamics
421
law preserves Gauss’s law in time ∂ (−ρf + ∇ · D) . ∂t Similarly, Faraday’s law preserves the constraint ∇ · B = 0 ˙ = 0 = ∇ · jf + ∇ · D
(11.77)
∂ ∇ · B = 0. (11.78) ∂t In a linear medium, the electric displacement D is related to the electric field E by a constant D = E and the magnetic field H differs from the magnetic induction B by a constant H = B/µ. On a sub-nanometer scale, the microscopic form of Maxwell’s equations applies. On this scale, the homogeneous equations (11.74) are unchanged, but the inhomogeneous ones are 0 = − ∇ · (∇ × E) =
∇·E =
ρ 0
E˙ and ∇ × B = µ0 j + 0 µ0 E˙ = µ0 j + 2 c
(11.79)
in which ρ and j are the total charge and current densities, and 0 = 8.854 × 10−12 F/m and µ0 = 4π × 10−7 N/A2 are the electric and magnetic constants, whose product is the inverse of the square of the speed of light, 0 µ0 = 1/c2 . Gauss’s law and the Maxwell-Amp`ere law (11.79) imply (problem 3) that the microscopic (total) current 4-vector j = (cρ, j) obeys the continuity equation ρ˙ + ∇ · j = 0. In vacuum, ρ = j = 0, D = 0 E, and H = B/µ0 , and so Maxwell’s equations in vacuum are and ∇ × E + B˙ = 0 (11.80) 1 ˙ ∇ · E = 0 and ∇ × B = 2 E. c Two of these equations ∇ · B = 0 and ∇ · E = 0 are constraints. Taking the curl of the other two equations, we find ∇·B =0
1 ¨ 1 ¨ E and ∇ × (∇ × B) = − 2 B. (11.81) 2 c c One may use the Levi-Civita identity (1.528) to show that (problem 4) ∇ × (∇ × E) = −
∇ × (∇ × E) = ∇ (∇ · E) − 4E and ∇ × (∇ × B) = ∇ (∇ · B) − 4B (11.82) in which 4 ≡ ∇2 . Since in vacuum the divergence of E vanishes, and since that of B always vanishes, these identities and the curl-curl equations (11.81) tell us that E and B satisfy the wave equations 1 ¨ E − 4E = 0 c2
and
1 ¨ B − 4B = 0. c2
(11.83)
422
Tensors and Local Symmetries
The homogeneous pair (11.74) are equivalent to the identity ∂i Fjk + ∂k Fij + ∂j Fki = ∂i (∂j Ak − ∂k Aj ) + ∂k (∂i Aj − ∂j Ai ) + ∂j (∂k Ai − ∂i Ak ) = 0
(11.84)
which follows from the definition (11.70) of the antisymmetric Maxwell fieldstrength tensor. This relation, known as the Bianchi identity, actually is a generally covariant tensor equation `ijk ∂i Fjk = 0
(11.85)
in which `ijk is totally anti-symmetric, as explained in Sec. 11.36. There are four versions of this identity (corresponding to the four ways of choosing three different indices i, j, k from among four and leaving out one, `). The ` = 0 case gives the scalar equation ∇ · B = 0, and the three that have ` 6= 0 give the vector equation ∇ × E + B˙ = 0. In tensor notation, the microscopic form of the two inhomogeneous equations (11.79)—the laws of Gauss and Amp`ere—are ∂i F ki = µ0 j k
(11.86)
in which j k is the current 4-vector j k = (cρ, j) .
(11.87)
The Lorentz force law for a particle of charge q is m
dxj d2 xi dui dpi = m = = f i = q F ij = q F ij uj . 2 dτ dτ dτ dτ
(11.88)
We may cancel a factor of dt/dτ from both sides and find for i = 1, 2, 3 3 X dpi = q −F i0 + ijk Bk vj (11.89) dt jk=1
or in 3-vector notation dp = q (E + v × B) . dt
(11.90)
The only correction needed in Maxwell’s electrodynamics is to replace p = mv with p = mu in the this equation. The reason why so little of classical electrodynamics was changed by special relativity is that electric and magnetic effects were accessible to measurement during the 1800’s. Classical electrodynamics was almost perfect.
423
11.14 Lorentz Transformations
The i = 0 component of the Lorentz force law (11.88) gives the rate of change of the particle’s energy as dE = qE ·v (11.91) dt which incidentally shows that work is done by the electric field but not by the magnetic field. Keeping track of factors of the speed of light is a lot of trouble and a distraction; in what follows, we’ll often use units with c = 1. 11.14 Lorentz Transformations Lorentz transformations are linear transformations of the vectors ei that preserve the inner product ei · ej = ηij . These 4 × 4 matrices can get messy, so students are advised to think mainly in terms of scalars, like pi ηij xj = px = p · x − Et. The same physical point p can be represented by many sets of basis vectors
p=
3 X
i
ei x =
i=0
3 X
e0i x0i .
(11.92)
i=0
We consider only basis vectors that are fixed, that is, independent of p and of the coordinates x and x0 , and that are lorthonormal, that is, that satisfy (11.27) for N = 3 ei · ej = ηij = e0i · e0j .
(11.93)
The vectors ei are four linearly independent four-dimensional vectors, and so they span 4-d Minkowski space and can represent the vectors e0i as e0i =
3 X
Lik ek
(11.94)
k=0
Lik
where the coefficients are real numbers. If we substitute this expansion into the lorthonormality condition (11.93), then we find ηij =
e0i
·
e0j
=
3 X k=0
Lik ek
·
3 X
Lj` e`
=
`=0
3 X
Lik
Lj` ek
k,`=0
· e` =
3 X
Lik Lj` ηk` .
k,`=0
(11.95) A 4 × 4 matrix
Lik
that satisfies this condition ηij =
3 X k,`=0
Lik ηk` Lj` .
(11.96)
424
Tensors and Local Symmetries
is a Lorentz transformation. In matrix notation, this condition is η = L η LT
(11.97)
where LT is the transpose of L, that is, (LT )` j = Lj ` . Incidentally, since η 2 = 1, the 4 × 4 identity matrix, this shows that 1 = η 2 = L η LT η = η L η LT
(11.98)
so the inverse matrices L−1 and (LT )−1 are L−1 = η LT η and (LT )−1 = η L η.
(11.99)
Now by Eq.(11.94), the coefficients Li k are Li k = ek · e0i = e0i · ek .
(11.100)
The expansion (11.92) of the point p in terms of the vectors ei and e0i allows us to relate the coordinates x0 and x: 3 X
x0i = e0 i · p =
e0 i · ej xj
(11.101)
j=0
If we use η to raise i and lower k, then we find, using the formula (11.100) for the matrix elements Lk` , that e0 i · ej is 0i
e ·ej =
3 X k=0
η ik e0k ·
3 X
3 X
`
ηj` e =
η
ik
ηj` e0k ·e`
=
η ik ηj` Lk` . (11.102)
k,`=0
k,`=0
`=0
3 X
So by (11.101), the coordinates x0 that describe the point p in Eq.(11.92) are related to the coordinates x by x0i =
3 X
e0 i · ej xj =
j=0
3 X
η ik ηj` Lk`
k,`=0
3 X
xj
(11.103)
j=0
or in matrix notation x0 = η L η x = (LT )−1 x
(11.104)
by (11.99). The matrix (LT )−1 = η L η is itself a Lorentz matrix Λ since it satisfies the condition (11.97): η = Λ η ΛT = η L η η η LT η = (LT )−1 LT η = η
(11.105)
so one usually writes that the coordinates transform as x0 = Λ x
and
x = Λ−1 x0
(11.106)
425
11.15 Tensors
and the vectors as e0 = (ΛT )−1 e
and
e = (ΛT ) e0
(11.107)
In special relativity, contravariant vectors transform as dx0i = Λi j dxj
(11.108)
i ∂ ∂xj ∂ ∂ = = Λ−1 j . 0i 0i j ∂x ∂x ∂x ∂xj
(11.109)
and covariant ones as
11.15 Tensors Tensors are structures that transform like products of vectors. A first-rank tensor is just a vector. There are three kinds of second-rank tensors. A second-rank contravariant tensor is a quantity M ij that transforms as M 0ij =
X ∂x0i ∂x0j k,l
∂xk ∂xl
M kl .
(11.110)
A second-rank mixed tensor is a quantity Nji that transforms as Nj0i =
X ∂x0i ∂xl N k. ∂xk ∂x0j l
(11.111)
k,l
A second-rank covariant tensor is a quantity Fij that transforms as Fij0 =
X ∂xk ∂xl Fkl . ∂x0i ∂x0j
(11.112)
k,l
The Maxwell field strength tensor Fkl (x) is an example of a second-rank covariant tensor; another is the metric of spacetime gij (x). You may define tensors of higher rank by extending the above definitions to quantities with more indices.
11.16 Adding Tensors Since the transformation laws that define tensors are linear, any linear combination of tensors of a given rank and kind is a tensor of that rank and kind. Thus if Fij and Gij are both second-rank covariant tensors, then so is their sum Hij = Fij + Gij .
(11.113)
426
Tensors and Local Symmetries
11.17 Tensor Equations Maxwell’s equations relate the derivatives of the field-strength tensor to the current density ∂F ik = ji ∂xk
(11.114)
and the derivatives of the field-strength tensor to each other 0 = ∂i Fjk + ∂k Fij + ∂j Fki .
(11.115)
They are generally covariant tensor equations, as we’ll see in Secs.(11.35 & 11.36). Suppose we can write a physical law in one coordinate system as K ij = 0.
(11.116)
Then in any other coordinate system, the corresponding equation K 0ij = 0
(11.117)
is also valid since K 0ij =
X ∂x0i ∂x0j k,l
∂xk ∂xl
K kl = 0.
(11.118)
Thus by writing a theory in terms of vectors and tensors, one may create a theory that is true in all coordinate systems. If our own coordinate system is not special, then we’d better use tensor equations so that they are the same in other coordinate systems. The standard way to make such a generally applicable theory is to begin with an action that is itself invariant under arbitrary coordinate transformations.
11.18 The Summation Convention and Contractions Clumsy notation is distracting; unwieldy notation is hard to write. To make equations easier to understand, Einstein introduced the summation convention. In this convention an index that appears twice in the same monomial, once as a subscript and once as a superscript, is considered to be a dummy index that is summed over. Sums over dummy indices are called contractions because the dummy indices disappear. Thus, for example, the summation convention together with the chain
11.19 The Kronecker Delta
427
rule imply that X ∂x0i ∂xk ∂x0i ∂xk ∂x0i ≡ = = δji . 0j 0j k k ∂x0j ∂x ∂x ∂x ∂x
(11.119)
k
The repeated index k has disappeared in a contraction.
11.19 The Kronecker Delta As an example of the use of the summation convention, let’s apply it to the transformation law for the Kronecker delta. If δji is a mixed tensor, then it transforms as ∂x0i ∂xk ∂x0i ∂xl k δ = . (11.120) δj0i = ∂xk ∂x0j l ∂xk ∂x0j By the preceding example, this last expression is δj0i =
∂x0i = δji ∂x0j
(11.121)
which shows that the Kronecker delta is an invariant, mixed, second-rank tensor.
11.20 Symmetry of Tensors A covariant tensor with more than one index is symmetric if it is unchanged by any permutation of its indices. Thus the tensor Sij is symmetric if Sij = Sji .
(11.122)
Similarly a contravariant tensor with more than one index is symmetric if it is unchanged by any permutation of its indices. Thus the tensor S ij is symmetric if S ij = S ji .
(11.123)
A second-rank covariant or contravariant tensor is antisymmetric if it changes sign when its indices are interchanged. Thus Aij and B ij are antisymmetric if Aij = −Aji
(11.124)
B ij = −B ji .
(11.125)
and
428
Tensors and Local Symmetries
11.21 Products of Tensors The product of two tensors A and B is a tensor with a rank equal to the sum of the ranks of the tensors A and B minus twice the number of contractions. Thus the product of two covariant vectors is a second-rank covariant tensor Fij = Ai Bj .
(11.126)
And the product of one covariant vector and one contravariant vector is a second-rank mixed tensor if there are no contractions Fij = Ai B j
(11.127)
and a scalar if there is one contraction D = Ai B i .
(11.128)
11.22 The Quotient Rule Suppose that the product B A of a quantity B (with unknown transformation properties) with an arbitrary tensor A (of a given rank and kind) is a tensor. Then B is itself a tensor. The simplest example is when Bi Ai is a scalar for all contravariant vectors Ai Bi0 A0i = Bj Aj .
(11.129)
Then since Ai is a contravariant vector, we have Bi0 A0i = Bi0
∂x0i j A = B j Aj ∂xj
(11.130)
or 0i 0 ∂x Bi j − Bj Aj = 0. ∂x
(11.131)
Now since this equation holds for all vectors A, we may promote it to the level of a vector equation Bi0
∂x0i − Bj = 0. ∂xj
(11.132)
If we now multiply by a transformation matrix Bi0
∂xj ∂x0i ∂xj = B j ∂xj ∂x0k ∂x0k
(11.133)
429
11.23 The Metric Tensor
and use Eq.(11.119), then we see that the unknown quantity Bi indeed transforms as a covariant vector: Bk0 =
∂xj Bj . ∂x0k
(11.134)
The quotient rule works for unknowns B and tensors A of arbitrary rank and kind. The proof in each case is very similar to the one given here.
11.23 The Metric Tensor So far we have been considering coordinate systems with constant basis vectors ei that do not vary with the physical point p. The simplest generalization is the case in which an arbitrary physical point p can be written in terms of the linearly independent sets of basis vectors ei (x) and e0i (x0 ) as p = ei (x) xi = e0i (x0 ) x0i
(11.135)
as in Eqs.(11.28) or (11.92). Ordinary 3-dimensional space in cylindrical or spherical coordinates are two examples. Instead of (11.135), we shall assume only that we can write the change in p(x) due to an infinitesimal change dxi (p) in its coordinates xi (p) as dp(x) = ei (x)dxi = e0i (x0 ) dx0i .
(11.136)
The basis vectors ei and e0i are the partial derivatives of the space-time vector p with respect to the coordinates, ei (x) =
∂p ∂xi
and e0i (x0 ) =
∂p ∂x0i
(11.137)
and they are linearly related to each other e0i (x0 ) =
∂p ∂xj ∂p ∂xj = = ej (x). ∂x0i ∂x0i ∂xj ∂x0i
(11.138)
They are covariant vectors in the embedding space, which has its own inner product ei (x) · ej (x) =
N X
eai (x) ηab ebj (x)
(11.139)
a=1
which may be euclidean or minkowskian. In general, these basis vectors are not normalized or orthogonal. Their 0 (x0 ) inner products define the metrics gij (x) and gij gij (x) = ei (x) · ej (x)
0 and gij (x0 ) = e0i (x0 ) · e0j (x0 )
(11.140)
430
Tensors and Local Symmetries
of the two coordinate systems. Since real inner products are symmetric, the metric tensor is symmetric, gij = gji . Since the basis vectors ei are covariant vectors, the metric gij is a second-rank covariant tensor, as may be seen from its transformation law ∂x` ∂xk ∂x` ∂xk e (x)· e (x) = gk` (x). (11.141) k ` ∂x0i ∂x0j ∂x0i ∂x0j Example 11.1 (The Sphere) Let the point p be a euclidean three-vector representing a point on the two-dimensional surface of a sphere of radius r. In the surface of the sphere, the usual spherical coordinates (θ, φ) label the point p, and the basis vectors are 0 gij (x0 ) = e0i (x0 )·e0j (x0 ) =
∂p = r θˆ ∂θ
(11.142)
∂p ˆ = r sin θ φ. ∂φ
(11.143)
eθ = and eφ =
So by its definition (11.140), the metric tensor for a sphere of radius r has components gθθ = eθ · eθ = r2
(11.144)
gθφ = eθ · eφ = 0
(11.145) 2
2
gφφ = eφ · eφ = r sin θ
(11.146)
and determinant det g = r4 sin2 φ. 11.24 A Basic Axiom Points are real, coordinate systems imaginary. So p, q, p−q, and (p−q)·(p−q) are all invariant quantities. When p and q = p + dp both lie in the physical subspace and are infinitesimally close to each other, the vector dp can be represented by the changes dxi of the coordinates xi as dp = ei dxi . In this case, both dp and the inner product dp · dp are independent of the coordinates. That is, dp and dp2 are the same in all systems of coordinates. So the (squared) distance dp2 in terms of the metric gij = ei · ej and the differentials dxi dp2 ≡ dp · dp = (ei dxi ) · (ej dxj ) = gij dxi dxj
(11.147)
0 = e0 · e0 and the is the same as dp2 expressed in terms of the metric gij i j differentials dx0 i 0 dp2 ≡ dp · dp = (e0i dx0 i ) · (e0j dx0 j ) = gij dx0 i dx0 j .
(11.148)
431
11.25 The Contravariant Metric Tensor
The basic axiom of riemannian geometry is that infinitesimal distances dp2 are invariant under general coordinate transformations. It follows from the fact that points are real. This invariance and the quotient rule provide a second reason why gij is a second-rank covariant tensor.
11.25 The Contravariant Metric Tensor By the quotient theorem, the inverse of the metric tensor gij as defined by the equation g ik gkj = δji
(11.149)
is a second-rank contravariant tensor.
11.26 Raising and Lowering Indices The contraction of a contravariant vector Ai with the metric tensor gives a covariant vector. By convention, one typically uses the symbol Ai for that covariant vector: Ai = gij Aj .
(11.150)
This operation is called lowering the index on Ai . Similarly the contraction of a covariant vector Bj with the inverse of the metric tensor g ij produces a contravariant vector which is conventionally called B i : B i = g ij Bj .
(11.151)
The vectors ei , for instance, are given by ei = g ij ej .
(11.152)
They are therefore orthonormal with respect to the basis vectors ei : ei · ej = ei · g jk ek = g jk ei · ek = g jk gik = g jk gki = δij .
(11.153)
11.27 Orthogonal Coordinates in Flat 2- or 3-Space In flat two- or three-dimensional space, it is convenient to use orthogonal basis vectors and orthogonal coordinates. A change dxi in the coordinates moves the point p by (11.136) dp = ei dxi .
(11.154)
432
Tensors and Local Symmetries
The metric gij is the scalar product (11.140) gij = ei · ej .
(11.155)
Since the vectors ei are orthogonal, the metric is diagonal gij = ei · ej = h2i δij .
(11.156)
g ij = h−2 i δij
(11.157)
The inverse metric g ij is and it raises indices, for instance e i = g ij ej = h−2 i ei .
(11.158)
As in (11.147), the invariant squared distance dp2 between nearby points is 2
dp = dp · dp =
n X
i
j
gij dx dx =
i,j=1
n X
h2i (dxi )2
(11.159)
i=1
where n is 2 or 3, and the invariant volume element is dn p = g
n Y
dxi = gdn x
(11.160)
i=1
in which g = h1 h2 or h1 h2 h3 is the square-root of the determinant of gij . The important special case in which all the scale factors hi are unity is rectangular coordinates. We also can use basis vectors eˆi that are orthonormal. By (11.156 & 11.158), these vectors are eˆi = ei /hi = hi e i
(11.161)
eˆi · eˆj = δij .
(11.162)
and they satisfy In terms of them, a physical and invariant vector V can be written as V = ei V i = hi eˆi V i = e i Vi = h−1 ˆi Vi = eˆi V i i e
(11.163)
where V i ≡ hi V i = h−1 i Vi
(no sum).
(11.164)
The dot-product is then i
j
V · U = gij V V =
3 X i=1
V i U i.
(11.165)
433
11.28 Polar Coordinates
11.28 Polar Coordinates In polar coordinates in two-dimensional flat space, the change dp in a point p due to changes in the coordinates is dp = rˆ dr + θˆ r dθ
(11.166)
dp = er dr + eθ dθ
(11.167)
so
ˆ The metric tensor for polar coordiwith er = eˆr = rˆ and eθ = r eˆθ = r θ. nates is 1 0 (gij ) = (ei · ej ) = . (11.168) 0 r2 The contravariant basis vectors are e r = rˆ and e θ = eˆθ /r. A physical vector V is ˆ (11.169) V = V i ei = Vi e i = V r rˆ + V θ θ.
11.29 Cylindrical Coordinates For cylindrical coordinates in three-dimensional flat space, the change dp in a point p due to changes in the coordinates is dp = ρˆ dρ + φˆ ρ dφ + zˆ dz
(11.170)
dp = eρ dρ + eφ dφ + ez dz
(11.171)
so
ˆ and ez = eˆz with eρ = eˆρ = ρˆ, eφ = ρ eˆφ = ρ φ, cylindrical coordinates is 1 0 (gij ) = (ei · ej ) = 0 ρ2 0 0
= zˆ. The metric tensor for 0 0 . 1
(11.172)
The contravariant basis vectors are e ρ = ρˆ, e φ = eˆφ /ρ, and e z = zˆ. A physical vector V is V = V i ei = Vi e i = V ρ ρˆ + V φ φˆ + V z zˆ.
(11.173)
Incidentally, since p = (ρ cos φ, ρ sin φ, z)
(11.174)
434
Tensors and Local Symmetries
the formulas for the basis vectors of cylindrical coordinates in terms of those of rectangular coordinates are (problem 5) ρˆ = cos φ x ˆ + sin φ y ˆ ˆ φ = − sin φ x ˆ + cos φ y ˆ zˆ = zˆ.
(11.175)
11.30 Spherical Coordinates For spherical coordinates in three-dimensional flat space, the change dp in a point p due to changes in the coordinates is ˆ r sin θ dφ dp = rˆ dr + θˆ r dθ + φ = er dr + eθ dθ + eφ dφ ˆ and eφ = r sin θ φ. ˆ The so er = rˆ, eθ = r θ, coordinates is 1 0 (gij ) = (ei · ej ) = 0 r2 0 0 r2
(11.176)
metric tensor for spherical 0 . 0 2 sin θ
(11.177)
ˆ and eˆφ = φ. ˆ The conThe orthonormal basis vectors are eˆr = rˆ, eˆθ = θ, r θ φ ˆ ˆ travariant basis vectors are e = rˆ, e = θ/r, e = φ/(r sin θ). A physical vector V is ˆ V = V i ei = Vi e i = V r rˆ + V θ θˆ + V φ φ. (11.178) Incidentally, since p = (r sin θ cos φ, r sin θ sin φ, r cos θ)
(11.179)
the formulas for the basis vectors of spherical coordinates in terms of those of rectangular coordinates are (problem 6) rˆ = sin θ cos φ x ˆ + sin θ sin φ y ˆ + cos θ zˆ θˆ = cos θ cos φ x ˆ + cos θ sin φ y ˆ − sin θ zˆ ˆ = − sin φ x φ ˆ + cos φ y ˆ
(11.180)
11.31 The Gradient of a Scalar Field If f (x) is a scalar field, then the difference between it and f (x + dx) is df (x) =
∂f (x) i dx . ∂xi
(11.181)
11.32 Derivatives and Affine Connections
435
The gradient ∇f is defined as df = ∇f · dp.
(11.182)
dp = ej dxj ,
(11.183)
Since
the invariant form ∇f =
∂f (x) i e ∂xi
(11.184)
satisfies the definition of the gradient ∇f : ∇f · dp =
∂f (x) i ∂f (x) i ∂f (x) i j e · ej dxj = δj dx = dx = df. i i ∂x ∂x ∂xi
(11.185)
In two polar coordinates, the gradient is ∇f =
∂f (x) i 1 ∂f (x) ∂f (r, θ) 1 ∂f (r, θ) ˆ e = eˆi = rˆ + θ. i i ∂x hi ∂x ∂r r ∂θ
(11.186)
In three cylindrical coordinates, the gradient is ∂f (x) i 1 ∂f (x) e = eˆi ∂xi hi ∂xi 1 ∂f (ρ, φ, z) ˆ ∂f (ρ, φ, z) ∂f (ρ, φ, z) ρˆ + φ+ zˆ = ∂ρ ρ ∂φ ∂z
∇f =
(11.187)
while in three spherical coordinates it is ∂f (x) i 1 ∂f (x) e = eˆi i ∂x hi ∂xi 1 ∂f (r, θ, φ) ˆ 1 ∂f (r, θ, φ) ˆ ∂f (r, θ, φ) = rˆ + θ+ φ. (11.188) ∂r r ∂θ r sin θ ∂φ
∇f =
11.32 Derivatives and Affine Connections Derivatives of vectors are a bit more subtle. If F (x) is a vector field, then its invariant description is F (x) = F i (x) ei (x).
(11.189)
Since the basis vectors ei (x) may vary with x, its derivative contains two terms ∂F (x) ∂F (x)i ∂ei (x) = ei (x) + F (x)i . (11.190) j j ∂x ∂x ∂xj
436
Tensors and Local Symmetries
In general, the derivative of a vector ei is not a linear combination of the basis vectors ek . For instance, on the 2-d surface of a sphere in 3-d, the derivative ∂eθ = −ˆ r (11.191) ∂θ is not a linear combination of the vectors eθ and eφ . The inner product of a derivative ∂ei /∂xj with a dual basis vector ek is an affine connection ∂ei (x) Γkji = ek · (11.192) ∂xj which also is called a Christoffel symbol of the second kind and written as k = Γkij . (11.193) ij This is the Levi-Civita connection; other affine connections exist but are used less often. Do not bother to memorize the different roles of i and j in the definition (11.192); the affine connection Γkji (x) is symmetric in i and j, as we’ll see in a moment. Affine connections are not tensors; they are connections. Tensors transform homogeneously; connections transform inhomogeneously. The electromagnetic field Ai (x) and other gauge fields are connections. The vectors ei themselves were defined in Eq.(11.137) as the spacetime derivatives of the point p. Thus by Eq.(11.192), the affine connections are related to double derivatives ∂ej (x) ∂ei (x) ∂2p ∂2p k k Γkji (x) = ek · = e · = e · = ek · = Γkij (x) ∂xj ∂xj ∂xi ∂xi ∂xj ∂xi (11.194) and so are symmetric in their two lower indices: Γkij = Γkji .
(11.195)
Since the affine connection Γkij is symmetric in i and j, in four-dimensional space-time, there are 10 of them for each value of k, or 40 in all. The 10 correspond to 3 rotations, 3 boosts, and 4 translations. 11.33 Green Notation To save paper, time, and nerves, we compress our notation further, denoting spacetime derivatives by indices preceded by a comma ∂A ≡ A,i ∂xi
and
∂2A ≡ A,ik . ∂xi ∂xk
(11.196)
11.34 Covariant Derivatives
437
In this notation, one may write part of (11.194) as ei,j = ej,i .
(11.197)
11.34 Covariant Derivatives In green notation, the derivative of a contravariant vector field F (x) = F i (x)ei (x) is F,j = F,ji ei + F i ei,j
(11.198)
and, in general, it lies outside the space spanned by the basis vectors ei . So we use the affine connections (11.192) to form the inner product ek · F,j = ek · F,ji ei + F i ei,j = F,ji δik + F i Γkji = F,jk + Γkji F i . (11.199) This covariant derivative of a contravariant vector field often is written with a semicolon F;jk = ek · F,j = F,jk + Γkji F i .
(11.200)
It is a mixed second-rank tensor. The invariant change dF projected into ek is ek · dF = ek · F,j dxj = F;jk dxj .
(11.201)
The covariant derivative of a vector V written in terms of its covariant components Vi must be V,j = (Vk ek ),j = Vk,j ek + Vk ek,j .
(11.202)
To relate the derivatives of the vectors ei to the affine connections Γkij , we differentiate the orthonormality relation δik = ek · ei
(11.203)
0 = ek,j · ei + ek · ei,j
(11.204)
ek,j · ei = −ek · ei,j = −Γkij .
(11.205)
ei · ek,j = −Γkij .
(11.206)
which gives us or Thus And so inner product of ei with the derivative of V is ei · V,j = ei · Vk,j ek + Vk ek,j = Vi,j − Vk Γkij .
(11.207)
438
Tensors and Local Symmetries
This covariant derivative of a covariant vector field often is written with a semicolon Vi;j = ei · V,j = Vi,j − Vk Γkij .
(11.208)
It is a second-rank covariant tensor. Note the minus sign in Vi;j and the plus sign in F;jk . The change ei · dV is ei · dV = ei · V,j dxj = Vi;j dxj .
(11.209)
Since dV is invariant, ei covariant, and dxj contravariant, the quotient rule (11.22) confirms that the covariant derivative Vi;j of a covariant vector Vi is a 2d-rank covariant tensor.
11.35 The Covariant Curl Because the connection Γkij is symmetric in its lower indices, as shown by Eq.(11.195), the covariant curl of a covariant vector Vi is simply its ordinary curl Vj;i − Vi;j = Vj,i − Vk Γkji − Vi,j + Vk Γkij = Vj,i − Vi,j .
(11.210)
Thus the Faraday field-strength tensor Fij which is defined as the curl of the covariant vector field Ai Fij = Aj,i − Ai,j
(11.211)
is a generally covariant second-rank tensor. In orthogonal coordinates, the 3-d curl is defined in terms of the totally antisymmetric invariant tensor ijk , where 123 = 1, as ∇×V =
3 X
(∇ × V )i eˆi =
i=1
3 X 1 ei ijk Vk;j h1 h2 h3
(11.212)
ijk=1
which, in view of (11.210) and the antisymmetry of ijk , is ∇×V =
3 X
(∇ × V )i eˆi =
i=1
3 X ijk=1
1 ei ijk Vk,j hi hj hk
(11.213)
or by (11.161 & 11.164) ∇×V =
3 X ijk=1
3 X 1 1 ijk hi eˆi Vk,j = hi eˆi ijk (hk V k ),j . hi hj hk hi hj hk ijk=1
(11.214)
11.36 Covariant Derivatives and Antisymmetry
One often writes this as a e1 1 ∂1 ∇×V = h1 h2 h3 V1
determinant e2 e3 1 ∂2 ∂3 = h h 1 2 h3 V2 V3
439
h1 eˆ1 h2 eˆ2 h3 eˆ3 ∂1 ∂2 ∂3 . h1 V 1 h2 V 2 h3 V 3 (11.215)
In cylindrical coordinates, the curl is ρˆ ρ φˆ zˆ 1 ∇ × V = ∂ρ ∂φ ∂z ρ V ρ ρV φ V z
.
(11.216)
In spherical coordinates, it is 1 ∇×V = 2 r sin θ
rˆ rθˆ r sin θ φˆ ∂r ∂θ ∂φ V r rV θ r sin θ V φ
.
(11.217)
11.36 Covariant Derivatives and Antisymmetry By applying our rule (11.208) for the covariant derivative of a covariant vector to a second-rank tensor Aij , we get Aij;k = Aij,k − Γ`ik A`j − Γ`jk Ai` .
(11.218)
Suppose now that our tensor is antisymmetric Aij = −Aji .
(11.219)
Then by adding together the three cyclic permutations of the indices ijk we find that the antisymmetry of the tensor and the symmetry (11.195) of the affine connection conspire to cancel the non-linear terms (in pairs of the same color) Aij;k + Aki;j + Ajk;i = Aij,k − Γ`ik A`j − Γ`jk Ai` + Aki,j − Γ`kj A`i − Γ`ij Ak` + Ajk,i − Γ`ji A`k − Γ`ki Aj` = Aij,k + Aki,j + Ajk,i
(11.220)
an identity named after Luigi Bianchi (1856–1928). The Maxwell field-strength tensor Fij is antisymmetric by construction (Fij = Aj,i − Ai,j ), and so the homogeneous Maxwell equations Fij,k + Fki,j + Fjk,i = 0
(11.221)
440
Tensors and Local Symmetries
are tensor equations valid in all coordinate systems. This is another example of how amazingly right Maxwell was back in the 1800s.
11.37 Affine Connection and Metric Tensor To relate the affine connection Γm ij to the derivatives of the metric tensor gij , it is useful to first introduce another kind of affine connection. The affine connection of the first kind [ij, k] is defined as an affine connection of the second kind with all lower indices: [ij, k] ≡ gkm Γm ij .
(11.222)
m Neither [ij, k] nor Γm ij is a tensor. Because the affine connection Γij is symmetric (11.195) in its two lower indices, the affine connection of the first kind [ij, k] is symmetric in its first two indices m [ij, k] = gkm Γm ij = gkm Γji = [ji, k].
(11.223)
One may obtain the Christoffel symbol Γsij by raising the index k of [ij, k] s m s g sk [ij, k] = g sk gkm Γm ij = δm Γij = Γij .
(11.224)
It follows from the definition (11.192) of the affine connection of the second kind that the affine connections of the first kind are related to the derivatives of the basis vectors ei [ij, k] = gkm em · ei,j = ek · ei,j .
(11.225)
Now by differentiating the definition of the metric tensor gij = ei · ej
(11.226)
and using the definition of the first kind of affine connections, we find gij,k = ei,k · ej + ei · ej,k = [ik, j] + [jk, i].
(11.227)
If we now subtract this relation from two instances of the same relation with the indices suitably permuted gik,j = [ij, k] + [kj, i]
(11.228)
gjk,i = [ji, k] + [ki, j]
(11.229)
keeping in mind the symmetry (11.223), then we find that four of the six
441
11.38 Covariant Derivative of Metric Tensor
terms cancel gik,j + gjk,i − gij,k = [ij, k] + [kj, i] + [ji, k] + [ki, j] − [ik, j] − [jk, i] = 2[ij, k]
(11.230)
leaving an expression for [ij, k] [ij, k] =
1 2
(gik,j + gjk,i − gij,k ) .
(11.231)
Thus by raising the odd index on a affine connection of the first kind Γsij = g sk [ij, k]
(11.232)
we obtain a formula expressing this affine connection of the second kind in terms of spacetime derivatives of the metric tensor: Γsij = 12 g sk (gik,j + gjk,i − gij,k ) .
(11.233)
Since the connection [ij, k] can be obtained from Γsij , it follows that all the affine connections are determined by the metric tensor and its derivatives. 11.38 Covariant Derivative of Metric Tensor Covariant derivatives of second-rank and higher-rank tensors are formed by iterating our formulas for the covariant derivatives of vectors. For instance, the covariant derivative of the metric tensor is gij;k ≡ gij,k − Γ`ik g`j − Γnkj gin .
(11.234)
One way to derive this formula is to proceed as in Sec.(11.34) by differentiating the invariant metric tensor gij ei ⊗ ej in which the vector product ei ⊗ ej is a kind of direct product g,k = (gij ei ⊗ ej ),k = gij,k ei ⊗ ej + gij ei,k ⊗ ej + gij ei ⊗ ej,k .
(11.235)
We now take the inner product of this derivative with e` ⊗ en (e` ⊗ en , g,k ) = gij,k e` · ei en · ej + gij e` · ei,k en · ej + gij e` · ei en · ej,k (11.236) and use the rules e` · ei = δ`i and e` · ei,k = −Γi`k (11.206) to write (e` ⊗ en , g,k ) = g`n;k = g`n,k − gij Γi`k δnj − gij δ`i Γjnk
(11.237)
or g`n;k = g`n,k − Γi`k gin − Γjnk g`j
(11.238)
which is (11.234) inasmuch as both gij and Γkij are symmetric in their two lower indices.
442
Tensors and Local Symmetries
If we now substitute our formula (11.233) for the connections Γlik and Γnkj gij;k = gij,k − 21 g ls (gis,k + gsk,i − gik,s ) glj − 12 g ns (gjs,k + gsk,j − gjk,s ) gin (11.239) then after using the fact (11.149) that the metric tensors gij and g jk are mutually inverse, we find gij;k = gij,k − 12 δjs (gis,k + gsk,i − gik,s ) − 12 δis (gjs,k + gsk,j − gjk,s ) = gij,k − 12 (gij,k + gjk,i − gik,j ) − 12 (gji,k + gik,j − gjk,i )
(11.240)
which vanishes identically! The covariant derivative of the metric tensor is zero gij;k = 0.
(11.241)
This result follows from our choice of the Levi-Civita connection (11.192); it is not true for some other connections. 11.39 Divergence of Contravariant Vector The contraction of the covariant derivative of a contravariant vector is a scalar known as the divergence, ∇ · V = V;ii = V,ii + Γiik V k .
(11.242)
Because two indices in the connection Γiik = 12 g im (gim,k + gkm,i − gik,m )
(11.243)
are contracted, its last two terms cancel because they differ only by the interchange of the dummy indices i and m g im gkm,i = g mi gkm,i = g im gki,m = g im gik,m .
(11.244)
So the contracted connection collapses to Γiik = 12 g im gim,k .
(11.245)
There is a nice formula for this last expression involving the absolute-value of the determinant of the metric tensor. To derive it, let us recall that like any determinant, the determinant det(g) of the metric tensor is given by the cofactor sum (1.196) X det(g) = gij Cij (11.246) j
along any row or column, that is, over j for fixed i or over i for fixed j, where
11.39 Divergence of Contravariant Vector
443
Cij is the cofactor defined as (−1)i+j times the determinant of the reduced matrix consisting of the matrix gij with row i and column j omitted. Thus the partial derivative of g with respect to the ijth element gij is ∂ det(g) = Cij , ∂gij
(11.247)
in which we consider gij and gji to be independent variables for the purposes of this differentiation. Let us also recall (1.198) that the inverse g ij of the metric tensor can be expressed in terms of the cofactor matrix as g ij =
Cji 1 ∂ det(g) . = det(g) det(g) ∂gij
(11.248)
Now by the chain rule, we may write the derivative of the determinate det(g) as ∂ det(g) det(g),k = gij,k = gij,k det(g) g ij . (11.249) ∂gij Hence by (11.245), the contracted connection Γiik is Γiik
=
1 im 2 g gim,k
√ ( g),k det(g),k | det(g)|,k g,k = = = = √ 2 det(g) 2| det(g)| 2g g
(11.250)
in which we used g for the absolute-value of the determinant of the metric tensor. g = |det(gij )| .
(11.251)
Thus from (11.242), we arrive at our formula for the covariant divergence of a contravariant vector: √ √ ( g),k ( g V k ),k ∇ · V = V;ii = V,ii + Γiik V k = V,kk + √ V k = . (11.252) √ g g √
In two orthogonal coordinates, equations (11.156 & 11.164) imply that g = h1 h2 and V k = V k /hk , and so the divergence of a vector V is 2 1 X h1 h2 ∇·V = Vk h1 h2 hk ,k
(11.253)
k=1
which in polar coordinates (section 11.28) with hr = 1 and hθ = r, is i i 1 h 1 h ∇·V = r V r ,r + V θ ,θ = r V r ,r + V θ,θ . (11.254) r r
444
√
Tensors and Local Symmetries
In three orthogonal coordinates, equations (11.156 & 11.164) imply that g = h1 h2 h3 and V k = V k /hk , and so the divergence of a vector V is 3 X 1 h1 h2 h3 ∇·V = Vk h1 h2 h3 hk ,k
(11.255)
k=1
which in cylindrical coordinates (Sec. 11.29) with hρ = 1, hφ = ρ, and hz = 1, is i 1 h ρ V ρ ,ρ + V φ ,φ + ρ V z ,z ∇·V = ρ i 1 h ρ V ρ ,ρ + V φ,φ + ρ V z,z . = (11.256) ρ In spherical coordinates (Sec. 11.30) with hr = 1, hθ = r, and hφ = r sin θ, the divergence ∇ · V is h i 1 ∇·V = 2 r2 sin θ V r ,r + r sin θ V θ ,θ + r V φ ,φ r sin θ h i 1 = 2 sin θ r2 V r ,r + r sin θ V θ ,θ + rV φ,φ . (11.257) r sin θ 11.40 The Covariant Laplacian Since the gradient of a scalar field f is a covariant vector, we may use the inverse metric tensor g ij to write the laplacian 4f of a scalar f as the covariant divergence of the contravariant vector g ik f,k 4f = ∇ · ∇f = (g ik f,k );i .
(11.258)
The divergence formula (11.252) gives for the invariant laplacian √ ( gg ik f,k ),i 4f = . (11.259) √ g In two p orthogonal coordinates, equations (11.156 & 11.157) imply that g = | det(gij )| = h1 h2 and that g ik f,k = h−2 i f,i , and so the laplacian of a scalar f is ! 2 X 1 h1 h2 f,i ,i . (11.260) 4f = h1 h2 h2i i=1 √
In polar coordinates, where h1 = 1, h2 = r, and g = r2 , the laplacian is i 1h 4f = (rf,r ),r + r−1 f,θ ,θ = f,rr + r−1 f,r + r−2 f,θθ . (11.261) r
11.41 The Principle of Stationary Action
445
In three (11.156 & 11.157) imply that p orthogonal coordinates, equations ik g = | det(gij )| = h1 h2 h3 and that g f,k = h−2 i f,i , and so the laplacian of a scalar f is ! 3 X 1 h1 h2 h3 4f = f,i ,i . (11.262) h1 h2 h3 h2i i=1
√
In cylindrical coordinates (Sec. 11.29), g = ρ2 and the laplacian is 1 1 1 1 (ρ f,ρ ),ρ + f,φφ + ρ f,zz = f,ρρ + f,ρ + 2 f,φφ + f,zz . (11.263) 4f = ρ ρ ρ ρ In spherical coordinates (Sec. 11.30), g is g = | det(gij )| = r4 sin2 θ and the inverse g ij of the metric tensor is 1 0 0 . (g ij ) = 0 r−2 0 −2 −2 0 0 r sin θ
(11.264)
(11.265)
Thus the laplacian of f is 4f = =
r2 sin θf,r r2 f,r r2
,r
,r
+ (sin θf,θ ),θ + (f,φ / sin θ),φ
r2 sin θ (sin θf,θ ),θ f,φφ + + 2 2 . 2 r sin θ r sin θ
(11.266)
If the function f is a function only of the radial variable r, then the laplacian is simply 4f (r) =
1 2 0 0 1 2 r f (r) = [rf (r)]00 = f 00 (r) + f 0 (r) 2 r r r
(11.267)
in which the primes denote r-derivatives.
11.41 The Principle of Stationary Action It follows from a path-integral formulation of quantum mechanics that the classical motion of a particle is given by the principle of stationary action δS = 0. In the simplest case of a free, non-relativistic particle, the lagrangian is L = (m/2)x˙ 2 and the action is Z t2 m 2 S= x˙ dt. (11.268) t1 2
446
Tensors and Local Symmetries
The classical trajectory is the one that when varied slightly by δx (with δx(t1 ) = δx(t2 ) = 0) does not change the action to first order in δx Z t2 Z t2 d m (x˙ · δx) − m¨ mx˙ · δ x˙ dt = 0 = δS = x · δx dt dt t1 t1 Z t2 Z t2 t2 ¨ · δx dt = −m ¨ · δx dt x x (11.269) = m [x˙ · δx]t1 − m t1
t1
since δx(t1 ) = δx(t2 ) = 0 and since the change in the velocity is the time derivative of the change in the path δ x˙ = x˙ 0 − x˙ =
d 0 d (x − x) = δx. dt dt
(11.270)
If the first-order change in the action is to vanish for arbitrary small variations δx in the path, then the acceleration must vanish ¨=0 x
(11.271)
which is the classical equation of motion for a free particle. If the particle is moving under the influence of a potential V (x), then the action is Z t2 m ˙2 S= x − V (x) dt. (11.272) 2 t1 Since δV (x) = ∇V (x) · δx, the principle of stationary action requires that Z t2 0 = δS = (−m¨ x − ∇V ) · δx dt (11.273) t1
or m¨ x = −∇V
(11.274)
which is the classical equation of motion for a particle of mass m in a potential V . The action for a free particle of mass m in special relativity is Z τ2 Z τ2 q S = −m dτ = − m 1 − x˙2 dt (11.275) τ1
τ1
where the dot labels a time derivative. The requirement of stationary action is Z τ2 q Z τ2 x˙ · δ x˙ ˙ 2 p 0 = δS = − δ m 1 − x dt = m dt (11.276) τ1 τ1 1 − x˙2
11.41 The Principle of Stationary Action
447
p But 1/ 1 − x˙2 = dt/dτ and so Z τ2 Z τ2 dx dδx dt dx dδx dt dτ 0 = δS = m · dt = m · dt dt dt dτ dt dτ dτ τ1 dt Z τ2 τ1 dx dδx · dτ. (11.277) =m dτ τ1 dτ So integrating by parts, keeping in mind that δx(τ2 ) = δx(τ1 ) = 0, we have Z τ2 2 d x d d2 x · δx dτ. 0 = δS = m (x˙ · δx) − 2 · δx dτ = −m 2 dτ dτ τ1 dτ τ1 (11.278) To have this hold for arbitrary δx, we need Z
τ2
d2 x =0 dτ 2
(11.279)
which is the equation of motion for a free particle in special relativity. What about a charged particle in an electromagnetic field Ai ? Its action is Z τ2 Z τ2 S = −m dτ + q Ai (x) dxi . (11.280) τ1
τ1
We now treat the first term in a four-dimensional manner q −ηij dxi δdxj = −uj δdxj = −uj dδxj δdτ = δ −ηij dxi dxj = p −ηij dxi dxj
(11.281)
in which uj = dxj /dτ is the 4-velocity (11.58). The variation of the other term is δ Ai dxi = (δAi ) dxi + Ai δdxi = Ai,j δxj dxi + Ai dδxi (11.282) Putting them together, we get for δS Z τ2 i dδxj dδxi j dx muj δS = + qAi,j δx + qAi dτ. dτ dτ dτ τ1
(11.283)
After integrating by parts the last term, dropping the boundary terms, and changing a dummy index, we get Z τ2 i dAj j duj j j dx δx + qAi,j δx −q δx dτ δS = −m dτ dτ dτ τ1 Z τ2 duj dxi = −m + q (Ai,j − Aj,i ) δxj dτ. (11.284) dτ dτ τ1
448
Tensors and Local Symmetries
If this first-order variation of the action is to vanish for arbitrary δxj , then the particle must follow the path 0 = −m
duj dxi + q (Ai,j − Aj,i ) dτ dτ
(11.285)
or dpj = qF ji ui dτ
(11.286)
which is the Lorentz force law (11.88).
11.42 The Invariant Action for a Particle The invariant action for a particle of mass m moving along a path xi (t) is Z Z τ2 1 − gij dxi dxj 2 . dτ = − m S = −m (11.287) τ1
Proceeding as in Eq.(11.281), we compute its variation δdτ as δdτ = δ
q −δ(gij )dxi dxj − 2gij dxi δdxj p −gij dxi dxj = 2 −gij dxi dxj
= − 12 gij,k δxk ui uj dτ − gij ui δdxj = − 12 gij,k δxk ui uj dτ − gij ui dδxj
(11.288)
in which uj = dxj /dτ is the 4-velocity (11.58). The condition of stationary action then is Z τ2 Z τ2 j k i j i dδx 1 dτ (11.289) 0 = δS = − m δdτ = m 2 gij,k δx u u + gij u dτ τ1 τ1 which we integrate by parts keeping in mind that δxj (τ2 ) = δxj (τ1 ) = 0 Z τ2 d(gij ui ) j k i j 1 0=m g δx u u − δx dτ 2 ij,k dτ τ1 Z τ2 dui j k i j i k j 1 δx =m g δx u u − g u u δx − g dτ. (11.290) ij ij,k 2 ij,k dτ τ1 Now interchanging the dummy indices j and k on the second and third terms, we have Z τ2 dui i j i j 1 0=m g u u − g u u − g δxk dτ (11.291) ik,j ik 2 ij,k dτ τ1
11.42 The Invariant Action for a Particle
449
or since δxk is arbitrary 0 = 21 gij,k ui uj − gik,j ui uj − gik
dui . dτ
(11.292)
If we multiply by this equation of motion by g rk and note that gik,j ui uj = gjk,i ui uj , then we find dur + 12 g rk (gik,j + gjk,i − gij,k ) ui uj . (11.293) dτ Finally, using the symmetry gij = gji and the formula (11.233) for Γrij , we get dur + Γrij ui uj (11.294) 0= dτ or equivalently i j d2 xr r dx dx 0= + Γ (11.295) ij dτ 2 dτ dτ which is the geodesic equation. In empty space, particles fall along geodesics independently of their masses. One may show (Weinberg, 1972) that the right-hand side of the geodesic equation (11.295) is a 4-vector because under general coordinate transformations, the inhomogeneous terms arising from x ¨r cancel those from Γrij x˙ i x˙ j . The action for a particle of mass m and charge q in a gravitational field r Γij and an electromagnetic field Ai is Z Z τ2 1 i j 2 S = −m − gij dx dx +q Ai (x) dxi (11.296) 0=
τ1
because the interaction q Ai dxi is invariant under general coordinate transformations. By Eqs.(11.284 & 11.291), the first-order change in S is Z τ2 dui i j i j i k 1 δS = m 2 gij,k u u − gik,j u u − gik dτ + q (Ai,k − Ak,i ) u δx dτ τ1 (11.297) and so by combining the Lorentz force law (11.286) and the geodesic equation (11.295) and by writing F ri x˙ i as F ri x˙ i , we have R
i j d2 xr q r dxi r dx dx + Γ − F (11.298) ij dτ 2 dτ dτ m i dτ as the equation of motion of a particle of mass m and change q. Throughout this section, dots denote derivatives with respect to proper time. It is striking how nearly perfect the electromagnetism of Faraday and Maxwell is.
0=
450
Tensors and Local Symmetries
11.43 The Principle of Equivalence The geodesic equation (11.295) follows from the principle of equivalence. To see why, we use Weinberg’s formulation (Weinberg, 1972, ch. 3) of Einstein’s principle of equivalence: In any gravitational field, one may choose free-fall coordinates in which all physical laws take the same form as in special relativity without acceleration or gravitation—at least over a suitably small volume of space-time. How small? Small enough so that the gravitational field is constant and uniform throughout the volume. Suppose a particle is moving under the influence of gravitation alone. Then one may choose coordinates y(x) so that the particle obeys the forcefree equation of motion d2 y i =0 dτ 2
(11.299)
dτ 2 = −ηik dy i dy k .
(11.300)
with dτ the proper time
The chain rule applied to y i (x) in (11.299) gives d ∂y i dxk 0= dτ ∂xk dτ ∂y i d2 xk ∂ 2 y i dxk dxj = + k j . 2 k ∂x dτ ∂x ∂x dτ dτ
(11.301)
We multiply by ∂x` /∂y i and use the identity ∂x` ∂y i = δk` ∂y i ∂xk
(11.302)
to get the equation of motion (11.299) in the x-coordinates k j d2 xk ` dx dx + Γ =0 kj dτ 2 dτ dτ
(11.303)
in which the affine connection is Γ`kj =
∂x` ∂ 2 y i . ∂y i ∂xk ∂xj
(11.304)
So the principle of equivalence tells that a particle in a gravitational field obeys the geodesic equation (11.295).
11.44 Weak, Static Gravitational Fields
451
11.44 Weak, Static Gravitational Fields Slow motion in a weak, static gravitational field is an important example. Because the motion is slow, we neglect ui compared to u0 and simplify the geodesic equation (11.294) to dur + Γr00 (u0 )2 . (11.305) dτ Because the gravitational field is static, we neglect the time derivatives gk0,0 and g0k,0 in the formula (11.233) for Γr00 and find 0=
Γr00 =
1 2
g rk (g0k,0 + g0k,0 − g00,k )
= − 12 g rk g00,k
(11.306)
and Γ000 = 0. Because the field is weak, the metric must differ from ηij by at most a tiny tensor hij gij = ηij + hij
(11.307)
so that to first order in |hij | 1 Γr00 = − 12 h00,r
(11.308)
for r = 1, 2, 3. With these three simplifications, the geodesic equation (11.295) reduces to d2 xr = 12 (u0 )2 h00,r (11.309) dτ 2 or 2 1 dx0 d2 xr = h00,r . (11.310) dτ 2 2 dτ So for slow motion the ordinary acceleration is d2 x c2 = ∇h00 . (11.311) dt2 2 In this limit of slow motion in weak static fields, Newton’s theory of gravitation works well. If φ is his potential, then g00 = −1 + h00 = −1 − 2φ/c2
and so
h00 = −2φ/c2 .
(11.312)
Thus, if the particle is at a distance r from a mass M, then φ= −
GM r
φ 2GM = c2 rc2
(11.313)
d2 x GM r = − ∇φ = ∇ = − GM 3 . 2 dt r r
(11.314)
whence
h00 = −2
and
452
Tensors and Local Symmetries
How weak are the static gravitational fields we know about? If g00 = −1 − 2φ/c2 , then |φ/c2 | ≈ 10−39 on the surface of a proton, 10−9 on the Earth, 10−6 on the surface of the sun, and 10−4 on the surface of a white dwarf.
11.45 Gravitational Time Dilation The invariant interval of proper time q dτ = (1/c) −gij dxi dxj
(11.315)
in a free-fall frame with local coordinates y k is q dτ = (1/c) −ηij dy i dy j .
(11.316)
In rest frames in the two coordinate systems, it is dτ = dy 0 /c = dty
(11.317)
and dτ =
√
−g00 dx0 /c =
√
−g00 dtx .
(11.318)
So the time interval dtx in the arbitrary coordinates xk is related to that in a free-fall rest frame by dty dτ dtx = p =p . −g00 (x) −g00 (x)
(11.319)
If the field is weak and static, then by (11.312) −g00 = 1 + 2 φ/c2
(11.320)
which generally is slightly less than unity since φ/c2 is negative and tiny. So the apparent time dtx dty dτ dtx = p =p 2 1 + 2 φ(x)/c 1 + 2 φ(x)/c2
(11.321)
is longer than the rest free-fall time dty which is the same as the invariant proper time dτ . Clocks appear to slow down near big masses.
11.46 Examples of Gravitational Time Dilation
453
11.46 Examples of Gravitational Time Dilation Since frequency is inverse to time, it follows from the relation (11.318) between proper time and apparent time that the apparent frequency ν of a clock at rest in a gravitational field is √ ν = ν0 −g00 (11.322) where ν0 is the clock’s proper or intrinsic frequency. In the limit of slow motion in weak static gravitational fields, −g00 is given by (11.320), and so the frequency ν(φ) at gravitational potential φ is p ν(φ) = ν(0) 1 + 2 φ/c2 . (11.323) The ratio of two frequencies in two potentials is s ν(φ2 ) 1 + 2 φ2 /c2 = ν(φ1 ) 1 + 2 φ1 /c2
(11.324)
and, since the fields are weak, we may approximate it as ν2 − ν1 1 + φ2 /c2 ν1 + ν2 − ν1 =1+ ≈ ≈ 1 + (φ2 − φ1 )/c2 ν1 ν1 1 + φ1 /c2
(11.325)
or as ∆ν ∆φ = 2 ν c
(11.326)
where ∆ν = ν2 − ν1 and ∆φ = φ2 − φ1 . Since the dimensionless potential at the surface of the Sun is φ /c2 = −2.12 × 10−6 , light from there is shifted to the red by 2 parts per million. Pound and Rebka in 1960 used the M¨ossbauer effect to measure the blue shift of light falling down a 22.6 m shaft. They found a blue shift of ∆ν ∆φ gh = 2 = 2 = 2.46 × 10−15 . ν c c
(11.327)
11.47 Curvature The curvature tensor is i Rmnk = Γimn,k − Γimk,n + Γikj Γjnm − Γinj Γjkm
(11.328)
which we may write as the commutator i Rmnk = (Rnk )i m = [∂k + Γk , ∂n + Γn ]i m
(11.329)
454
Tensors and Local Symmetries
in which the Γ’s are treated as matrices, 0 Γk 0 Γ0k 1 1 Γ1 k 0 Γk 1 Γk = 2 Γ2 k 0 Γk 1 3 Γk 0 Γ3k 1
that is Γ0k 2 Γ1k 2 Γ2k 2 Γ3k 2
Γ0k 3 Γ1k 3 Γ2k 3 Γ3k 3
(11.330)
and (Γk Γn )i m = Γikj Γjnm and − (Γn Γk )i m = −Γinj Γjkm . Sign ambiguity: Just as there are two popular conventions for the Faraday tensor Fik which differ by a minus sign, so too there are two conventions for the curvature i tensor Rmnk . Weinberg (Weinberg, 1972) et al. use the definition (11.328), while Carroll (Carroll, 2003) et al. use minus that. The Ricci tensor is a contraction of the curvature tensor n Rmk = Rmnk
(11.331)
and the curvature scalar is a further contraction R = g mk Rmk .
(11.332)
11.48 Example: Curvature of a Sphere While in four-dimensional space-time, the indices in Eqs.(11.328–11.332) run from 0 to 3, on the sphere, the indices are simply θ and φ. So there are only eight possible affine connections, and because of their symmetry Γiθφ = Γiφθ , as shown by (11.195), only six are independent. The point p on a sphere of radius r has xyz-coordinates p = r (sin θ cos φ, sin θ sin φ, cos θ)
(11.333)
so the two 3-vectors are eθ =
∂p = r (cos θ cos φ, cos θ sin φ, − sin θ) = r θˆ ∂θ
(11.334)
∂p ˆ = r sin θ (− sin φ, cos φ, 0) = r sin θ φ ∂φ
(11.335)
and eφ =
and the metric gij = ei · ej is (gij ) =
r2 0 . 0 r2 sin2 θ
(11.336)
455
11.48 Example: Curvature of a Sphere
Differentiating the vectors eθ and eφ , we find eθ,θ = − r (sin θ cos φ, sin θ sin φ, cos θ) = −r rˆ ˆ eθ,φ = r cos θ (− sin φ, cos φ, 0) = r cos θ φ
(11.337) (11.338)
eφ,θ = eθ,φ
(11.339)
eφ,φ = − r sin θ (cos φ, sin φ, 0) .
(11.340)
The metric with upper indices (g ij ) is the inverse of the metric (gij ) −2 r 0 ij (g ) = (11.341) 0 r−2 sin−2 θ so the dual vectors ei are e θ = r−1 (cos θ cos φ, cos θ sin φ, − sin θ) = r−1 θˆ 1 ˆ 1 (− sin φ, cos φ, 0) = φ. eφ = = r sin θ r sin θ The affine connections are given by (11.192) as Γijk = Γikj = e i · ej,k .
(11.342)
(11.343)
Since both eθ and e φ are perpendicular to rˆ, the affine connections Γθθθ and ˆ so Γφ = 0 as well. Similarly, Γφθθ both vanish. Also, eφ,φ is orthogonal to φ, φφ ˆ so Γθ = Γθ also vanishes. eθ,φ is perpendicular to θ, θφ
φθ
The two non-zero affine connections are ˆ · r cos θ φ ˆ = cot θ Γφθφ = e φ · eθ,φ = r−1 sin−1 θ φ
(11.344)
and Γθφφ = e θ · eφ,φ = − sin θ (cos θ cos φ, cos θ sin φ, − sin θ) · (cos φ, sin φ, 0) = − sin θ cos θ.
(11.345)
The curvature scalar for the two-sphere is by (11.332 & 11.331) R = g θθ Rθθ + g φφ Rφφ θ = g θθ Rθθθ +
φ g θθ Rθφθ
(11.346) θ + g φφ Rφθφ +
φ g φφ Rφφφ .
(11.347)
The curvature tensor involves the affine matrices, which in turn involve the two non-zero affine connections Γφθφ = Γφφθ = cot θ and Γθφφ = − sin θ cos θ. So the two Christoffel matrices are ! 0 0 0 0 Γθ = = (11.348) 0 cot θ 0 Γφθφ
456
Tensors and Local Symmetries
and Γφ =
0 Γθφφ Γφφθ 0
!
=
0 − sin θ cos θ . cot θ 0
(11.349)
Their commutator is 0 cos2 θ cot2 θ 0
[Γθ , Γφ ] =
= −[Γφ , Γθ ]
(11.350)
and both [Γθ , Γθ ] and [Γφ , Γφ ] vanish. So the commutator formula (11.329) gives θ Rθθθ = [∂θ + Γθ , ∂θ + Γθ ]θ θ = 0 φ Rθφθ = [∂θ + Γθ , ∂φ + Γφ ]φθ = (Γφ,θ )φθ + [Γθ , Γφ ]φθ
= (cot θ),θ + cot2 θ = −1 θ Rφθφ = [∂φ + Γφ , ∂θ + Γθ ]θ φ = − (Γφ,θ )θ φ + [Γ φ , Γθ ]θ φ
= cos2 θ − sin2 θ − cos2 θ = − sin2 θ φ Rφφφ = [∂φ + Γφ , ∂φ + Γφ ]φφ = 0
(11.351)
and thus by (11.341 & 11.347), the curvature scalar R is φ θ =− + g φφ Rφθφ R = g θθ Rθφθ
1 1 2 − 2 = − 2 2 r r r
(11.352)
for a sphere of radius r. Gauss invented a formula for the curvature K of a surface; for all twodimensional surfaces, his K = −R/2.
11.49 Einstein’s Equations The source of the gravitational field is the energy-momentum tensor Tij . In many astrophysical and most cosmological models, the energy-momentum tensor is assumed to be that of a perfect fluid, which is isotropic in its rest frame, does not conduct heat, and has zero viscosity. For a perfect fluid of pressure p and density ρ with 4-velocity ui (defined by (11.58)), the energymomentum (aka, stress-energy) tensor Tij is Tij = p gij + (p + ρ) ui uj
(11.353)
in which gij is the space-time metric. An important special case is the energy-momentum tensor due to a nonzero value of the energy density of the vacuum. In this case p = −ρ and the
11.50 Standard Form
457
energy-momentum tensor is Tij = −ρ gij
(11.354)
in which ρ is the (presumably constant) value of the energy density of the ground state of the theory. This energy density ρ is a plausible candidate for the dark-energy density. It is equivalent to a cosmological constant Λ = 8πGρ. Whatever its nature, the energy-momentum tensor usually is defined so as to satisfy the conservation law 0 = (T ab );a = ∂a T ab + Γaac T cb − T ac Γcba .
(11.355)
In terms of the energy-momentum tensor, Einstein’s field equations are Rij − 21 gij R = −8π G Tij
(11.356)
in which the Ricci tensor Rij is defined by Eq.(11.331) and the scalar curvature R is defined by Eq.(11.332). Here G = 6.7087 × 10−39 ~c (GeV/c2 )−2 is Newton’s constant, which in mksa units is G = 6.6742 × 10−11 m3 kg−1 s−2 . Taking the trace of Einstein’s equations (11.356) and using g ji gij = δjj = 4, one has R − 2R = −8πG T ii
(11.357)
R = 8πG T ii .
(11.358)
or
So another form of Einstein’s equation (11.356) is T Rij = −8πG Tij − gij 2
(11.359)
in which T = T ii is the trace of the energy-momentum tensor. On small scales, such as that of our solar system, one may neglect dark energy. So in empty space and on small scales, the energy-momentum tensor vanishes Tij = 0 = T , Einstein’s equation is Rij = 0
(11.360)
and the scalar curvature vanishes, R = 0.
11.50 Standard Form Since tensor equations are independent of the choice of coordinates, it makes sense to choose as simple a set of coordinates as possible. For a static and
458
Tensors and Local Symmetries
isotropic gravitational field, this choice is called the standard form. The standard form(Weinberg, 1972) of a static, isotropic metric is dτ 2 = B(r) dt2 − A(r) dr2 − r2 dθ2 + sin2 θ dφ2 (11.361) in which B(r) and A(r) are functions that one may find by solving the field equations (11.356). Since dτ 2 = −gij dxi dxj , the non-zero components of the metric tensor are grr = A(r), gθθ = r2 , gφφ = r2 sin2 θ, and g00 = −B(r). Since gij is diagonal, the components of its inverse are just g rr = A−1 (r), g θθ = r−2 , g φφ r−2 sin−2 θ, and g 00 = −B −1 (r). One may use its definition (11.243) to compute the connection Γiik by differentiating gij . For instance, Γθrθ = 1/r and Γφrφ = 1/r. Once one has Γiik , one may use Eqs.(11.329– 11.331) to compute the Ricci tensor Rij . One finds, for instance (Weinberg, 1972), 0 B 00 (r) 1 B 0 (r) A (r) B 0 (r) 1 A0 (r) Rrr = − + − (11.362) 2B(r) 4 B(r) A(r) B(r) r A(r) in which the primes mean d/dr. 11.51 Schwarzschild’s Solution If one ignores the small dark-energy parameter Λ, one may solve Einstein’s field equations (11.360) in empty space Rij = 0
(11.363)
outside a mass M for the standard form of the Ricci tensor. One finds (Weinberg, 1972) that A(r) B(r) = 1 and that r B(r) = r plus a constant, and one determines the constant by invoking the Newtonian limit g00 = −B → −1 + 2M G/r as r → ∞. In 1916, K. Schwarzschild found the solution 2M G 2M G −1 2 2 2 dτ = 1 − dt − 1 − dr − r2 dθ2 + sin2 θ dφ2 r r (11.364) which can serve as a basis for analyzing orbits around a star. The singularity in 2M G −1 grr = 1 − (11.365) r at the Schwarzschild radius r = 2M G is an artifact of the coordinates; the scalar curvature R and other invariant curvatures are not singular at the Schwarzschild radius. Moreover, for the Sun, the Schwarzschild radius r = 2M G is only 2.95 km, far less than the radius of the Sun, which is
11.52 Black Holes
459
6.955 × 105 km. So the surface at r = 2M G is far from the empty space in which Rij = 0.
11.52 Black Holes In this section, we will not set the speed of light equal to unity. Suppose an uncharged, spherically symmetric cloud or star of mass M has collapsed within a sphere of radius rb less than its Schwarzschild radius r = 2M G/c2 . Then for r > rb , the Schwarzschild metric (11.364) is correct. By Eq.(11.315), the apparent time dt of a process of proper time dτ at r ≥ 2M G/c2 is r √ 2M G dt = dτ / −g00 = dτ / 1 − 2 . (11.366) c r The apparent time dt becomes infinite as r → 2M G/c2 . To outside observers, the star seems frozen in time. Due to the gravitational red shift (11.322), light of frequency ν0 emitted at r ≥ 2M G/c2 will have frequency ν r √ 2M G ν = ν0 −g00 = ν0 1 − 2 (11.367) c r when observed at great distance. Light coming from the surface at r = 2M G/c2 is red-shifted to zero frequency ν = 0. The star is black. It is a black hole with a surface or horizon at its Schwarzschild radius r = 2M G/c2 , although there is no singularity there. If the radius of the Sun were less than its Schwarzschild radius of 2.95 km, then the Sun would be a black hole. The radius of the Sun is 6.955 × 105 km. Usually, we picture a black hole as very dense, but in fact a diffuse sphere of dust with density ρ and radius rb would be a black hole as long as 2M G 4 3 G rb < =2 πr ρ . (11.368) c2 3 b c2 So if the dust cloud is big enough — that is, if s 3c2 rb > 8π G ρ
(11.369)
then it is a black hole. This is true even if the density ρ is small. Of course, the density will not stay small as the cloud contracts inexorably under its own gravitation.
460
Tensors and Local Symmetries
Black holes are not really black. Hawking showed that the intense gravitational field of a black hole of mass M radiates at temperature T =
~ c3 8π k G M
(11.370)
in which k = 8.617343 × 10−5 eV K−1 is Boltzmann’s constant, and ~ is Planck’s constant h = 6.6260693 × 10−34 J s divided by 2π, ~ = h/(2π). The black hole is entirely converted into radiation after a time t=
5120 π G2 3 M ~ c4
(11.371)
proportional to the cube of its mass.
11.53 Cosmology Astrophysical observations tell us that on the largest observable scales, space is flat or very nearly flat; that the visible universe contains at least 1090 particles; and that the cosmic microwave background radiation is isotropic to one part in 105 , apart from a Doppler shift due the motion of the Earth. These and other facts suggest that the Big Bang occurred after a brief era of inflation—very rapid expansion in which the visible universe expanded by exp(60) = 1026 or 60 e-folds. During and for 65,000 years after the Big Bang, the energy density of the universe was mostly that of radiation. The era of radiation was followed by the longer era of matter in which matter dominated the energy density for 8.8 billion years. The last 5 billion years have been the beginning of an era of dark energy, which promises to last forever. During it, dark energy is the largest component of the energy density of the universe, and it accelerates the expansion of the universe. The dark-energy density now is ρv = 1.37 × 10−29 c2 g cm−3 or 75.7 ± 2.1% of the energy density needed to make the universe flat. This critical energy density is ρc = 3H 2 /(8πG) in which H is the Hubble “constant.” Matter makes up 24.6 ± 2.8% of the critical density, and baryons only 4.2 ± 0.2% of it. Baryons are only 17% of the total matter in the visible universe. The other 83% does not interact with light and is called dark matter. It is now some 13.8 billion years since the Big Bang. The present value of the Hubble constant is H0 = 100 h km s−1 Mpc−1 in which the fudge factor h is about 0.72 ± 0.03, and the critical density is ρc = 1.87837 × 10−29 h2 c2 g cm−3 . Although these constants are given in units in which the speed of light c is not unity, the display equations continue to be in units in which c = 1.
11.53 Cosmology
461
Einstein’s equations (11.356) are second-order, non-linear partial differential equations for 10 unknown functions gij (x) in terms of the energymomentum tensor Tij throughout the universe, which, of course, we don’t know. The problem is not quite hopeless, however. The ability to choose arbitrary coordinates, the appeal to symmetry, and the choice of a reasonable form of Tij all help. Hubble showed us that the universe is expanding. The cosmic microwave background—photons left over from the big bang—look the same in all spatial directions (apart from a Doppler shift due to the motion of the Earth relative to the local super-cluster of galaxies). Observations of clusters of galaxies tell us that the universe may be homogeneous on suitably large scales of distance. So it is plausible that the universe is homogeneous and isotropic in space but not in time. One may show (Carroll, 2003) that for a universe of such symmtery, the line element in comoving coordinates is 2
2
ds = −dt + a
2
dr2 2 2 2 2 + r dθ + sin θ dφ . 1 − k r2
(11.372)
Whitney’s embedding theorem tells us that any smooth four-dimensional manifold can be embedded in a flat space of eight dimensions with a suitable signature. Actually, we need only four or five dimensions to embed the space-time described by the line element (11.372). If the universe is closed, then the signature is (−1, 1, 1, 1, 1), and our three-dimensional space is the three-sphere which is the surface of a four-dimensional sphere in four space dimensions. The points of the universe then are p = (t, a sin χ sin θ cos φ, a sin χ sin θ sin φ, a sin χ cos θ, a cos χ)
(11.373)
in which 0 ≤ χ ≤ π, 0 ≤ θ ≤ π, and 0 ≤ φ ≤ 2π. If the universe is flat, then the embedding space is flat, four-dimensional Minkowski space with points p = (t, ar sin θ cos φ, ar sin θ sin φ, ar cos θ)
(11.374)
in which 0 ≤ θ ≤ π and 0 ≤ φ ≤ 2π. If the universe is open, then the embedding space is a flat five-dimensional space with signature (−1, 1, 1, 1, −1), and our three-dimensional space is the surface of a four-dimensional hyperboloid in a flat Minkowski space of four dimensions. The points of the universe then are p = (t, a sinh χ sin θ cos φ, a sinh χ sin θ sin φ, a sinh χ cos θ, a cosh χ) (11.375) in which 0 ≤ χ ≤ ∞, 0 ≤ θ ≤ π, and 0 ≤ φ ≤ 2π.
462
Tensors and Local Symmetries
In all three cases, the corresponding Robertson-Walker metric is −1 0 0 0 0 a2 /(1 − kr2 ) 0 0 gij = (11.376) 0 0 a2 r2 0 0 0 0 a2 r2 sin2 θ in which the coordinates (t, r, θ, φ) are numbered (0, 1, 2, 3), the speed of light is c = 1, and k is a constant. One always may choose coordinates (problem 15) such that k is either 0 or ±1. This constant determines whether the spatial universe is open k = −1, flat k = 0, or closed k = 1. The scale factor a, which in general is a function of time a(t), tells us how space expands and contracts. These coordinates are called comoving because a point at rest (fixed r, θ, φ) sees the same Doppler shift in all directions. Since the metric (11.376) is diagonal, its inverse g ij also is diagonal with entries −1, (1 − kr)a−2 , a−2 r−2 , and a−2 r−2 sin−2 θ. We may use the formula (11.233) to compute the affine connections Γkij = Γkji . Thus for ` = 1, 2, 3 Γ0`` = 21 g 0k (g`k,` + g`k,` − g``,k ) = 12 g 00 (g`0,` + g`0,` − g``,0 ) = 12 g``,0
(11.377)
so that Γ011 =
aa˙ , 1 − kr2
Γ022 = aa˙ r2 ,
and
Γ022 = aa˙ r2 sin2 θ.
(11.378)
in which the dot means a time-derivative. The other Γ0ij ’s vanish. Similarly, for ` = 1, 2, or 3 Γ`0` = 12 g `k (g0k,` + g`k,0 − g0`,k ) = 12 g `` (g0`,` + g``,0 − g0`,` ) a˙ = 12 g `` g``,0 = = Γ``0 for fixed `. a
(11.379)
One may further show that the remaining non-zero Γ’s are Γ122 = −r (1 − kr2 ) Γ133 = −r (1 − kr2 ) sin2 θ 1 Γ212 = Γ313 = = Γ221 = Γ331 r Γ233 = − sin θ cos θ Γ323 = cos θ = Γ332 .
(11.380) (11.381) (11.382)
Our formulas for the Ricci tensor (11.331) and the curvature tensor (11.329)
11.53 Cosmology
463
give n R00 = R0n0 = [∂0 + Γ0 , ∂n + Γn ]n0 .
(11.383)
Clearly the commutator of Γ0 with itself vanishes, and one may use (11.378– 11.382) to check that 2 a˙ (11.384) [Γ0 , Γn ]n0 = Γn0 k Γkn 0 − Γnn k Γk0 0 = 3 a and that ∂0 Γnn 0
2 a˙ a ¨ a˙ = 3 ∂0 =3 −3 a a a
(11.385)
while ∂n Γn0 0 = 0. So the 00-component of the Ricci tensor is a ¨ R00 = 3 . (11.386) a Similarly, one may show that the other non-zero components of Ricci’s tensor are A R22 = −r2 A and R33 = −r2 A sin2 θ (11.387) R11 = − 1 − kr2 in which A = a¨ a +2a˙ 2 +2k. The Ricci scalar (11.332) (aka, scalar curvature) is 6 R = g ab Rba = − 2 a¨ a + a˙ 2 + k . (11.388) a In co-moving coordinates such as those of the Robertson-Walker metric (11.376) ui = (1, 0, 0, 0), and so the energy-momentum tensor (11.353) is just ρ 0 0 0 0 pg11 0 0 . Tij = (11.389) 0 0 pg22 0 0 0 0 pg33 Its trace is T = g ij Tij = −ρ + 3p.
(11.390)
Thus using the metric (11.376) relation g00 = −1, our formula (11.386)for R00 , and these expressions (11.389 & 11.390) for Tij and T , we find that the 00 Einstein equation (11.359) becomes the second-order equation a ¨ 4πG =− (ρ + 3p) . (11.391) a 3 This apparently linear equation in general is non-linear because ρ and 3p
464
Tensors and Local Symmetries
depend upon a. It tells us that the sum ρ + 3p determines the acceleration a ¨ of the scale factor a(t). When negative, it accelerates the expansion of the universe. Because of the isotropy of the metric, the three non-zero spatial Einstein equations (11.359) give us only the one relation 2 a ¨ k a˙ + 2 2 = 4πG (ρ − p) . (11.392) +2 a a a Using the 00-equation (11.391) to eliminate the second derivative a ¨, we have 2 a˙ 8πG k = ρ− 2 a 3 a
(11.393)
which is a first-order non-linear equation. It and the second-order equation (11.391) are known as the Friedmann equations. The LHS of the first-order Friedmann equation (11.393) is the square of the Hubble constant a˙ H= (11.394) a (an oxymoron since a(t) and a(t) ˙ are different functions of time). In terms of H, Friedmann’s first-order equation (11.393) is H2 =
k 8πG ρ − 2. 3 a
(11.395)
The ratio of the energy density ρ to the critical energy density ρc = 3H 2 /(8πG) is called Ω ρ 8πG Ω= = . (11.396) ρc 3H 2 From (11.395), we see that Ω is Ω=1+
k k = 1 + 2. (aH)2 a˙
(11.397)
Thus Ω = 1 both in a flat universe (k = 0) and as aH → ∞. One use of inflation is to expand a by 1026 and so to force Ω to unity. Something like inflation is needed because in a universe in which the energy density is due to matter and/or radiation, the present value of Ω Ω0 = 1.003 ± 0.01
(11.398)
is unlikely. To see why, we note that conservation of energy ensures that a3 times the matter density ρm is constant. Radiation red-shifts by a, so energy
11.53 Cosmology
465
conservation implies that a4 times the radiation density ρr is constant. Such energy densities times an form a constant E ρan = E
(11.399)
where n = 3 for matter and 4 for radiation. In terms of E and n, Friedmann’s first-order equation (11.393) is E 8πG 2 ρa − k = n−2 − k 3 a In the small-a limit, we have a˙ 2 =
a(n−2)/2 da = Edt
(11.400)
(11.401)
which we integrate to a ∼ t2/n so that a˙ ∼ t2/n−1 . Now the first-order Friedmann equation (11.397) implies that 1 t radiation 2−4/n |Ω − 1| = 2 ∝ t = . (11.402) 2/3 t matter a˙ That is, the ratio Ω deviates from unity with the passage of time. So without inflation (or some other way of vastly expanding the scale factor), the present value of this ratio Ω0 = 1.003 ± 0.010 could be so close to unity after 13.8 billion years only if the ratio Ω at t = 1 second had been unity to within one part in 1015 . Manipulating our relation (11.397) between Ω and aH, we see that k . (11.403) Ω−1 So Ω > 1 implies k = 1, and Ω < 1 implies k = −1, and as Ω → 1 the product aH → ∞, which is the essence of flatness since curvature vanishes as the scale factor a → ∞. Staying for the moment with a universe without inflation and with an energy density composed of radiation and/or matter, we note that the firstorder equation (11.400) tells us that for a closed (k = 1) universe, in the limit a → ∞ we’d have a˙ 2 → −1 which is impossible. Thus a closed universe eventually collapses, which is incompatible with the flatness (11.403) implied by the present value Ω0 = 1.003±0.010. The first-order equation Friedmann (11.393) tells us that 3k ρ a2 ≥ . (11.404) 8πG So in a closed universe (k = 1), the energy density ρ is positive, and ρ → ∞ as it collapses a → 0. In open (k < 0) and flat (k = 0) universes, the same Friedmann equation (11.393) tells us that if ρ is positive, then a˙ 2 > 0, which (aH)2 =
466
Tensors and Local Symmetries
means that a˙ never vanishes. Hubble told us that a˙ > 0 now. So if our universe is open or flat, then it always expands. Due to the expansion of the universe, the wave-length of radiation grows with the scale factor a(t). A photon emitted at time t and scale factor a(t) with wave-length λ(t) will be seen now at time t0 and scale factor a(t0 ) to have a longer wave-length λ(t0 ) λ(t0 ) a(t0 ) = =z+1 λ(t) a(t)
(11.405)
in which the redshift z is the ratio z=
∆λ λ(t0 ) − λ(t) = . λ(t) λ
(11.406)
And since H = a/a ˙ = da/(adt), we have dt = da/(aH). And z = a0 /a − 1 implies dz = −a0 da/a2 , we find dt = −
dz (1 + z)H(z)
(11.407)
which relates time intervals to redshift intervals. This computation is trickier for macroscopic intervals, but an on-line calculator is available (Wright, 2006). The 0-component of the energy-momentum conservation law (11.355) is 0 = (T a0 );a = ∂a T a0 + Γaac T c0 − T ac Γc0a = −∂0 T00 − Γaa0 T00 − g cc Tcc Γc0c a˙ a˙ a˙ = − ρ˙ − 3 ρ − 3p = − ρ˙ − 3 (ρ + p) . a a a
(11.408)
or dρ 3 = − (ρ + p) . da a
(11.409)
The energy density ρ is composed of fractions ρk each contributing its own partial pressure pk according to its own equation of state pk = wk ρk
(11.410)
in which wk is a constant. In terms of these components, the energymomentum conservation law (11.409) is X dρk k
da
=−
3 X (1 + wk ) ρk a k
(11.411)
467
11.53 Cosmology
with solution ρ=
X
ρk0
k
a a0
−3(1+wk ) .
(11.412)
Simple cosmological models take the energy density and pressure each to have a single component with p = wρ, and in this case −3(1+w) a ρ = ρ0 . (11.413) a0 Example: If w = −1/3, then ρ = ρ0
a 2 0
a
.
(11.414)
The factor 1/a2 is common to all the terms in Friedmann’s first-order equation (11.393), and so it cancels, leaving a˙ 2 =
8πG ρ0 a20 − k. 3
(11.415)
The RHS of this equation is a non-negative constant, and so the scale factor a(t) grows linearly with the time t. Using a calendar with a(0) = 0, we find a(t) =
8πG ρ0 a20 − k 3
1/2 t.
(11.416)
Example—The Instant of Inflation: Inflation occurs when the ground state of the theory has a positive and constant energy density ρ > 0. It follows that the internal energy of the universe is proportional to its volume U = ρ V . The pressure p is then given by the thermodynamic relation p=−
∂U = −ρ. ∂V
(11.417)
Comparison with the more general equation of state (11.410) tells us that in this case w = −1.
(11.418)
The second-order Friedmann equation (11.391) now is a ¨ 4πG 8πGρ =− (ρ + 3p) = ≡ g2. a 3 3
(11.419)
By it and the first-order Friedmann equation (11.393) and by choosing t = 0
468
Tensors and Local Symmetries
as the time at which the scale factor a is minimal or zero, one may show (problem 21) that in a closed universe, k = 1, a(t) =
cosh g t . g
(11.420)
Similarly, in an open universe, k = −1, we have a(t) =
sinh g t . g
(11.421)
Finally, in a flat universe, k = 0, there are two solutions with an arbitrary scale factor a(0) a(t) = a(0) exp(±g t).
(11.422)
They can’t be added together due to the non-linearity of the first-order equation (11.393). In two non-flat cases, k = ±1, the scale of the universe is fixed at any time. Studies of the cosmic microwave background radiation suggest that inflation did occur in the very early universe—on a time scale of much less than a second. What is the origin of the vacuum energy density ρ that drove inflation? Current theories attribute the high vacuum energy density to the departure of the mean value hφi of at least one scalar field φ from the mean value that minimizes the energy density of the vacuum. When the field’s mean value returned to its equilibrium value, the vacuum energy was released as radiation in a Big Bang. Example—The Era of Radiation: Until a redshift of z & 3000 or somewhat less than 65,000 years after inflation, our universe was dominated by radiation (Frieman et al., 2008). During The First Three Minutes (Weinberg, 1988) of the era of radiation, the quarks and gluons formed hadrons, which decayed into protons and neutrons. As the neutrons decayed (τ = 885.7 s), they and the protons formed the light elements—principally hydrogen, deuterium, and helium in a process called big-bang nucleosynthesis. We can guess the value of w for radiation by noticing that the energymomentum tensor of the electromagnetic field (in suitable units) 1 T ab = F ac F bc − g ab Fcd F cd 4
(11.423)
1 T = T aa = F ac Fa c − δaa Fcd F cd = 0. 4
(11.424)
is traceless
11.53 Cosmology
469
But by (11.390) its trace must be T = 3p − ρ. So for radiation 1 1 ρ and so w = . (11.425) 3 3 The relation (11.413) between the energy density and the scale factor then is a 4 0 ρ = ρ0 . (11.426) a p=
The energy drops both with the volume a3 and with the scale factor a due to a redshift; so it drops as 1/a4 . In view of (11.426), the quantity 8πGρa4 (11.427) 3 is a constant. The Friedmann equations (11.391 & 11.392) now are C2 =
a ¨ 4πG 8πGρ =− (ρ + 3p) = − . a 3 3
(11.428)
or a ¨=−
C2 a3
(11.429)
and C2 . (11.430) a2 For a flat universe (k = 0), (11.430) tells us that with a suitable calendar a˙ 2 + k =
a(t) = (2C t)1/2 . For a k = 1 (closed) universe, we have s t 2 a(t) = C 1 − 1 − C
(11.431)
(11.432)
again we chose a calendar with a(0) = 0. The scale factor for an open (k = −1) universe then goes (6.213 & 6.214) as s t 2 a(t) = C 1+ − 1. (11.433) C Example—The Era of Matter: A universe composed only of dust or non-relativistic collisionless matter has no pressure, so p = w = 0. So by (11.413), the energy density falls with the volume a 3 0 ρ = ρ0 . (11.434) a
470
Tensors and Local Symmetries
As the scale factor a(t) increases, the matter energy density, which falls as 1/a3 , eventually dominates the radiation energy density, which falls as 1/a4 . This happened in our universe somewhat less than 65,000 years after inflation. Were baryons most of the matter, the era of radiation dominance would have lasted for a few hundred thousand years. But the kind of matter that we know about, which interacts with photons, is only about 17% of the total; the rest is called dark matter, and it shortened the era of radiation dominance by nearly 2 million years. Since ρ ∝ 1/a3 , the quantity D2 =
4πGρa3 3
(11.435)
is a constant. For a matter-dominated universe, the Friedmann equations (11.391 & 11.392) then are 4πG 4πGρ a ¨ =− (ρ + 3p) = − . a 3 3
(11.436)
or a ¨=−
D2 a2
(11.437)
and a˙ 2 + k = 2
D2 . a
(11.438)
For a flat universe, k = 0, we get 3D 2/3 . a(t) = √ t 2 For a closed universe, k = 1, using (6.217) we integrate r D2 a˙ = 2 −1 a
(11.439)
(11.440)
to p t − t0 = − a(2D2 − a) − D2 arcsin(1 − D−2 a).
(11.441)
With a suitable calendar and choice of t0 , one may parametrize this solution in terms of the development angle φ(t) as a(t) = D2 [1 − cos φ(t)] t = D2 [φ(t) − sin φ(t)] .
(11.442)
11.53 Cosmology
For an open universe, k = −1, using (6.220) we integrate r D2 +1 a˙ = 2 a
471
(11.443)
to n o 1/2 1/2 t−t0 = a(2D2 + a) −D2 ln 2 a(2D2 + a) + 2a + 2D2 . (11.444) The conventional parametrization is a(t) = D2 [cosh φ(t) − 1] t = D2 [sinh φ(t) − φ(t)] .
(11.445)
Transparency: Some 300,000 years after inflation, at a redshift of z ∼ 1250, the temperature dropped to a level at which the photons could no longer ionize hydrogen. Ordinary matter became a gas of neutral atoms rather than a plasma of ions and electrons, and the universe suddenly became transparent to light. This moment, often called recombination, should be called transparency. The Accelerating Universe: Somewhat more than 8.8 billion years after inflation, at a redshift of z & 0.5, the matter density falling as 1/a(t)3 dropped below the very small but positive value of the energy density ρ0 of the vacuum. The present time is 13.8 billion years after inflation. So for the past 5 billion years, this constant energy density, called dark energy, has accelerated the expansion of the universe approximately as (11.421) p (11.446) a(t) = a(tm ) exp t 8πGρ0 /3 in which tm = 8.8 × 109 years. It is somewhat embarrassing to relativists that on the largest scales, space is flat. At least to the precision of the measurements currently (2008) available, the universe is flat: k = 0. So the evolution of the scale factor a(t) is given by the k = 0 equations (11.421, 11.431, 11.439, & 11.446) for a flat universe. During the brief era of inflation, the scale factor a(t) grew as p a(t) = a(0) exp t 8πGρi /3 (11.447) in which ρi is the positive energy density that drove inflation. During the 65,000 years of the era of radiation, a(t) grew as p 1/2 a(t) = 2 t 8πGρ(t0r )a4 (t0r )/3 + a(ti ) (11.448) where ti is the time at the end of inflation, and t0r is any time during the era
472
Tensors and Local Symmetries
of radiation (recall ρ(t)a4 (t) is a constant during that era). During the 8.8 billion years of the matter era, a(t) grew as h i2/3 p (11.449) a(t) = (t − tr ) 3πGρ(t0m )a(t0m ) + a3/2 (tr ) where tr is the time at the end of the radiation era, and t0m is any time in the matter era. Over the past 5 billion years of the era of vacuum dominance, a(t) has been growing exponentially p a(t) = a(tm ) exp t 8πGρv /3 (11.450) in which tm is the time at the end of the matter era, and ρv is the density of dark energy, which while vastly less than the energy density ρi that drove inflation, nevertheless amounts to 76% of the total energy density. 11.54 Gauge Theory So far in this chapter, we have considered changes in points dp(x) = ei (x) dxi
(11.451)
in space or space-time due to changes in their coordinates. In this section, we will discuss field points f (x) = ea (x) ψ a (x)
(11.452)
in an internal space and changes in field points df (x) = ea (x) dψ a (x)
(11.453)
due to changes in their coordinates, which are fields ψ a (x). The interesting case is when the physically accessible fields f lie in a curved manifold of n fields that is embedded in a flat space of higher dimension N > n. In this case, the n vectors ea have N > n components. 11.55 Abelian Gauge Theory The simplest such theory occurs for n = 1; it has a single field ψ and a single complex vector e of unit length n = 1 e∗ · e =
N X
e∗α eα = 1.
(11.454)
α=1
The field point f = ψe
(11.455)
473
11.55 Abelian Gauge Theory
is invariant under multiplication of the field ψ(x) by a phase factor exp(iqθ(x)) that depends upon the space-time point x ψ 0 (x) = eiqθ(x) ψ(x)
(11.456)
as long as we also multiply the vector e(x) e0 (x) = e−iqθ(x) e(x)
(11.457)
by exp(−iqθ(x)). In electrodynamics, this symmetry is called gauge invariance of the second kind. The invariance group is U (1), the group of 1 × 1 unitary matrices — all phase factors exp(iqθ(x)). The covariant derivative of the contravariant vector field f = ψ e is by analogy with Eq.(11.200) is ψ;j = e∗ · (ψe),j = ψ,j + Γj ψ
(11.458)
where the affine connection Γj (rather undecorated with indices) is Γj = e∗ · e,j
(11.459)
by analogy with Eq.(11.192) or by direct differentiation of ψ e. The affine connection Γj is purely imaginary because e∗ · e = 1 whence 0 = (e∗ · e),j = e∗,j · e + e∗ · e,j = (e∗ · e,j )∗ + e∗ · e,j .
(11.460)
So one usually writes Γj = −iq Aj
(11.461)
where Aj is an abelian gauge field such as the electromagnetic field Aj = (i/q) Γj = (i/q) e∗ · e,j .
(11.462)
Under the gauge transformations (11.456 & 11.457) the electromagnetic field Aj (x) changes by A0j (x) = (i/q) e0∗ · e0,j = (i/q) e∗ eiqθ(x) · e−iqθ(x) e ,j
= (i/q) e∗ · e,j + (i/q) e∗ · e − iqθ(x),j = Aj (x) + θ(x),j
(11.463)
The conventional way of writing the covariant derivative (11.458) is Dj ψ = ψ;j = ψ,j − iq Aj ψ.
(11.464)
Since ψe is invariant under the gauge transformations (11.456 & 11.457), the covariant derivative Dj ψ transforms as (Dj ψ)0 = (ψ;j )0 = e0∗ · ψ 0 e0 ,j = eiqθ(x) e∗ · (ψe),j = eiqθ(x) Dj ψ. (11.465)
474
Tensors and Local Symmetries
That is, it transforms like the field ψ.
11.56 Non-Abelian Gauge Theory Now we consider a theory in which the field points f (x) = ea (x) ψ a (x)
(11.466)
are linear combinations of n > 1 fields ψ a and n complex orthonormal vectors ea δab = e∗a · eb =
N X
α e∗α a · eb
(11.467)
α=1
where N > n > 1. The field point f (x) is invariant under a matrix transformation ψ 0 a (x) = Uab (x) ψ b (x)
(11.468)
as long as the vectors ea (x) are replaced by e0a (x) = (U −1 T )ab eb (x)
(11.469)
for then e0a (x) ψ 0 a (x) = eb (x) (U −1 T )ab Uac (x) ψ c (x) −1 = eb (x) Uba Uac (x) ψ c (x)
= eb (x) δbc ψ c (x) = eb (x) ψ b (x).
(11.470)
To keep the new vectors e0a (x) orthonormal, we require that the matrix U (x) be unitary U † (x) = U −1 (x).
(11.471)
But every compact Lie group G has finite-dimensional unitary representations D(g), so we may set the matrix U (x) equal to one of them U (x) = D(g(x)).
(11.472)
Thus, the invariance group can be any compact Lie group. In particular, the formula (11.469) for e0a (x) simplifies to ∗ e0a (x) = Uab eb (x).
(11.473)
Now either by direct differentiation or by analogy with Eq.(11.200), the
11.56 Non-Abelian Gauge Theory
475
covariant derivative of the contravariant vector field f = ψ a ea is ψ;jb = e∗b · (ψ a ea ),j = e∗b · ea ψ,ja + e∗b · ea,j ψ a = δab ψ,ja + e∗b · ea,j ψ a = ψ,jb + e∗b · ea,j ψ a = ψ,jb + Γbja ψ a
(11.474)
in which Γbja = e∗b · ea,j
(11.475)
analogously to Eq.(11.192). The affine connection Γbja is no longer symmetric in its two lower indices. But it is anti-hermitian because the orthonormality (11.467) of the vectors ea implies 0 = (e∗a · eb ),j = e∗a,j · eb + e∗a · eb,j .
(11.476)
Thus, the matrix Aj defined by (Aj )ba = i Γbja = i e∗b · ea,j
(11.477)
A† = A
(11.478)
is hermitian
since by (11.476) (A†j )ab = (AT j )∗ab = (Aj )∗ba = (i e∗b · ea,j )∗ = −i e∗a,j · eb = i e∗a · eb,j = (Aj )ab . (11.479) In non-abelian gauge theory, one usually writes the covariant derivative (11.474) as Dj ψ = Djb ψ = Djba ψ a = ψ;jb = ψ,jb + Γbja ψ a = ψ,jb − iAjba ψ a
(11.480)
or as Dj = ∂j − iAj
(11.481)
in which Aj is the matrix of gauge fields Ajba , which contains a charge or coupling constant q as a factor. The distinction between upper and lower internal indices is meaningless since the vectors ea are orthonormal, as shown by (11.467). Under a non-abelian gauge transformation, the fields and vectors transform as in Eqs.(11.468 & 11.469). By its definition (11.474), the covariant derivative ψ;jb is the dot-product of e∗b with the space-time derivative of the
476
Tensors and Local Symmetries
invariant field point f = ψ a ea . Since f is invariant under U (n) gauge transformations, the covariant derivative ψ;jb varies as e∗b (Djb ψ(x))0 = e0b ∗ (x) · (ψ a (x) ea (x))0,j = e0b ∗ (x) · (ψ a (x) ea (x)),j †−1 = Ubc (x) e∗c (x) · (ψ a (x) ea (x)),j
= Ubc (x) e∗c (x) · (ψ a (x) ea (x)),j = Ubc (x) Djc ψ(x)
(11.482)
that is, like ψb (x) or covariantly. This rule looks better in matrix notation Dj0 = U Dj U −1 .
(11.483)
Since the vectors transform as −1T −1 e0a (x) = Uab (x) eb (x) = eb (x) Uba (x) = eb (x) U † (x)ba
(11.484)
it follows that the matrix of gauge fields transforms as ∗ † −1 0 ed Uda · e = i e U A0jba = ie0∗ c a,j b cb ,j −1 −1 = ee∗c ed,j Ubc Uda + iδcd Ubc Uda,j −1 −1 + iUbc Uca,j . = Ubc Ajcd Uda
(11.485)
This relation looks better in matrix notation with U (x) = D(g(x)) A0j (x) = D(g(x)) Aj (x) D−1 (g(x)) + iD(g(x)) D−1 (g(x)),j .
(11.486)
The nonabelian field-strength tensor is Fij = [Di , Dj ] = [∂i − iAi , ∂j − iAj ] = −i (Aj,i − Ai,j ) − [Ai , Aj ]. (11.487) Since the covariant derivative Dj transforms covariantly (11.483), so does Fij Fij0 = U Fij U −1 .
(11.488)
The trace of its square is the invariant TrFij F ij
(11.489)
which is used to form the invariant action of a nonabelian gauge theory along with the Fermi term ψ γ j ∂j + m ψ = Ψ γ j Dj + m Ψ (11.490) in which ψ = ea ψ a and Ψ is a conventional vector with components ψ a . Abelian and non-abelian theories are independent of the choice of gauge. The action of such a theory is invariant under gauge transformations of the
477
11.57 Higher Gauge Symmetry
fields by a unitary matrix U (x) that depends upon the space-time point x. The action contains derivatives of the fields and would not be invariant without the covariant derivatives (11.464) and (11.474) that transform like the fields themselves, as in Eqs.(11.465) and (11.482). The interactions of electrodynamics and of the standard model of chromodynamics and electroweak dynamics all arise from the gauge fields Aj and Ajab that in turn are used to achieve invariance under gauge transformations that vary with the space-time point x. Symmetry requires these interactions, just as invariance under general coordinate transformations requires a metric gij whose derivatives represent gravitational fields.
11.57 Higher Gauge Symmetry This section may be skipped on a first reading. The requirement (11.467) that the vectors ea (x) be orthonormal is gratuitous, as far as I know. If we relax it, then these vectors generate an internal metric gab (x) = e∗a (x) · eb (x)
(11.491)
that is not merely the identity matrix. We assume that gab is non-singular, and we will use it and its inverse g ab to lower and to raise indices, as in ea (x) = g ab (x) eb (x).
(11.492)
The field point f (x) = ψ a (x) ea (x) is invariant under a matrix transformation ψ 0 a (x) = Z ab (x) ψ b (x)
(11.493)
as long as the vectors ea (x) are replaced by e0a (x) = (Z −1 T )ab (x) eb (x)
(11.494)
for then f (x) = ea (x) ψ a (x) e0a (x) ψ 0 a (x) = eb (x) (Z −1 T )ab (x) Z ac (x) ψ c (x) = eb (x) (Z −1 )b a (x) Z ac (x) ψ c (x) = eb (x) δbc ψ c (x) = eb (x) ψ b (x)
(11.495)
is invariant. The metric transforms as 0
0 gab (x) = ea∗ (x) · e0b (x) = (Z −1† )ac (x) (Z −1 T )b d (x) e∗c (x) · ed (x)
= (Z −1† (x))ac (Z −1 T (x))b d gcd (x)
(11.496)
478
Tensors and Local Symmetries
and the invariance group now is GL(n, C), the group of all n×n non-singular complex matrices. The dual vectors e∗a (x) are orthonormal to the vectors eb (x) e∗a (x) · eb (x) = δba .
(11.497)
To maintain this relation after a GL(n, C) transformation δba = e0∗a (x) · e0b (x) = e0∗a (x) · (Z −1 T )b d (x) ed (x)
(11.498)
the dual vector e∗a (x) must transform as e0∗a (x) = Z ac (x) e∗c (x)
(11.499)
for then, leaving out the explicit x-dependence, we have e0∗a · e0b = Z ac e∗c · (Z −1 T )b d ed = Z ac (Z −1 T )b d δdc = Z ac (Z −1 T )b c = Z ac (Z −1 )c b = δba
(11.500)
Now either by direct differentiation or by analogy with Eq.(11.200), the covariant derivative of the contravariant vector field f = ψ a ea is ψ;jb = e∗b · (ψ a ea ),j = ψ,jb + e∗b · ea,j ψ a = ψ,jb + Γbja ψ a
(11.501)
in which Γbja = e∗b · ea,j
(11.502)
analogously to Eq.(11.192). As in Eq.(11.482) for unitary transformations, under a GL(n, C) transformation, the point ψ a ea remains invariant, and so the covariant derivative transforms as e∗b or like ψ b (ψ;jb )0 = e0∗b · (ψ a ea ),j = Z bc e∗c · (ψ a ea ),j = Z bc ψ;jc
(11.503)
that is, covariantly. So the form ψ a∗ gab ψ;jb is an internal scalar and a covariant vector ∂xi a∗ (ψ a∗ gab ψ;jb )0 = ψ gab ψ;ib . (11.504) ∂x0j 11.58 Problems 1. Show that the matrix (11.38) satisfies the Lorentz condition (11.37). 2. The LHC collides 7 TeV protons against 7 TeV protons for a total collision energy of 14 TeV. Suppose one used a linear accelerator to fire a beam of protons at a target of protons at rest at one end of the accelerator. What energy would you need to see the same physics as at the LHC?
479
11.58 Problems
3. Use Gauss’s law and the Maxwell-Amp`ere law (11.79) to show that the microscopic (total) current 4-vector j = (cρ, j) obeys the continuity equation ρ˙ + ∇ · j = 0. 4. In rectangular coordinates, use the Levi-Civita identity (1.528) to derive the curl-curl equations (11.82. 5. Use the flat-space formula (11.174) to compute the change dp due to dρ, dφ, and dz to derive the expressions (11.175) for the orthonormal basis ˆ and zˆ. vectors ρ, ˆ φ, 6. Similarly, derive (11.180) from (11.179). 7. Using the formulas (11.180) for the basis vectors of spherical coordinates in terms of those of rectangular coordinates, compute the derivatives of ˆ and φ ˆ with respect to the variables r, θ, and φ. Your the unit vectors rˆ, θ, formulas should express these derivatives in terms of the basis vectors rˆ, ˆ and φ. ˆ (b) Using the formulas of (a), derive the formula (11.266) for θ, the laplacian ∇ · ∇. 8. Consider the torus with coordinates θ, φ labeling the arbitrary point p = (cos φ(R + r sin θ), sin φ(R + r sin θ), r cos θ)
(11.505)
in which R > r. Both coordinates run from 0 to 2π. (a) Find the basis vectors eθ and eφ . (b) Find the metric tensor and its inverse. 9. For the same torus, (a) find the dual vectors eθ and eφ and (b) find the non-zero connections Γijk where i, j, & k take the values θ & φ. 10. For the same torus, (a) Find the two Christoffel matrices Γθ and Γφ , θ , Rφ , (b) find their commutator [Γθ , Γφ ], and (c) find the elements Rθθθ θφθ θ , and Rφ Rφθφ φφφ of the curvature tensor.
11. Find the curvature scalar R of the torus with points (11.505). Hint: In these four problems, you may imitate the corresponding calculation for the sphere in Sec. 11.48. 12. For the points (11.373), derive the metric (11.376) with k = 1. Don’t forget to relate dχ to dr. 13. For the points (11.374), derive the metric (11.376) with k = 0. 14. For the points (11.375), derive the metric (11.376) with k = −1. Don’t forget to relate dχ to dr. 15. Suppose the constant k in the Roberson-Walker metric (11.372 or 11.376) is some number other than 0 or ±1. Find a coordinate transformation
480
Tensors and Local Symmetries
such that in the new coordinates, the Roberson-Walker metric has k = k/|k| = ±1. 16. Derive the affine connections in Eq.(11.380). 17. Derive the affine connections in Eq.(11.381). 18. Derive the affine connections in Eq.(11.382). 19. Derive the spatial Einstein equation (11.392) from (11.359, 11.376, 11.387, 11.389, & 11.390). 20. Derive the relation (11.413) between the energy density ρ and the RobertsonWalker scale factor a(t) from the conservation law (11.409) and the equation of state p = wρ. 21. Use the Friedmann equations (11.391 & 11.393) with ρ = −p and k = 1 to derive (11.420) subject to the boundary condition that a(t) has its minimum at t = 0. 22. Use the Friedmann equations (11.391 & 11.393) with w = −1 and k = −1 to derive (11.421) subject to the boundary condition that a(0) = 0. 23. Use the Friedmann equations (11.391 & 11.393) with w = −1 and k = 0 to derive (11.422). Show why a linear combination of the two solutions (11.422) does not work. 24. Use the Friedmann equations (11.391 & 11.393) with w = 1/3 and k = 0 to derive (11.431) subject to the boundary condition that a(0) = 0. 25. Derive Eq.(11.490).
12 Forms
Many treatments of differential forms are confusing. Two clear discussions are V. I. Arnold’s (Arnold, 1989) and B. Schutz’s (Schutz, 1980); they form the basis of this chapter.
12.1 Exterior Forms 1-Forms: A 1-form is a linear function ω that maps vectors into numbers. Thus, if A and B are vectors in Rn and z and w are numbers, then ω(zA + wB) = z ω(A) + w ω(B).
(12.1)
The n coordinates x1 . . . xn are 1-forms; they map a vector A into its coordinates: x1 (A) = A1 , . . . , xn (A) = An . Every 1-form may be expanded in terms of these basic 1-forms as ω = B1 x1 + · · · + Bn xn
(12.2)
so that ω(A) = B1 x1 (A) + · · · + Bn xn (A) = B 1 A1 + · · · + B n An = (B, A) = B · A.
(12.3)
Thus, every 1-form is associated with a (dual) vector, in this case B. 2-Forms: A 2-form is a function that maps pairs of vectors into numbers linearly and skew-symmetrically. Thus, if A, B, and C are vectors in Rn and z and w are numbers, then ω 2 (zA + wB, C) = z ω 2 (A, C) + w ω 2 (B, C) ω 2 (A, B) = − ω 2 (B, A).
(12.4)
482
Forms
Usually, one drops the superscript-2 and writes the addition of two 2-forms as (ω1 + ω2 )(A, B) = ω1 (A, B) + ω2 (A, B).
(12.5)
Example: The oriented area of the parallelogram defined by two 2vectors A and B is the determinant A1 A2 . (12.6) ω(A, B) = B1 B2 This 2-form maps the ordered pair of vectors (A, B) into the oriented area (which is ± the usual area) of the parallelogram they describe. To check that this 2-form gives the area to within a sign, rotate the coordinates so that the 2-vector A runs from the origin along the x-axis. Then A2 = 0, and the 2-form gives A1 B2 which is the base A1 of the parallelogram times its height B2 . Example: The triple scalar product of three 3-vectors A1 A2 A3 2 ωA (B, C) = A · B × C = B1 B2 B3 = ω 3 (A, B, C) (12.7) C1 C2 C3 is both a 2-form that depends upon the vector A and also a 3-form that maps the triplet of vectors A, B, C into the signed volume of their parallelepiped. k-Forms: A k-form (or an exterior form of degree k) is a linear function of k vectors that is antisymmetric. For vectors A1 . . . Ak and numbers z and w ω(zA01 + wA001 , A2 , . . . Ak ) = z ω(A01 , A2 , . . . Ak ) + w ω(A001 , A2 , . . . Ak ) (12.8) and the interchange of any two vectors makes a minus sign ω(A2 , A1 , . . . Ak ) = −ω(A1 , A2 , . . . Ak ).
(12.9)
Exterior Product of Two 1-Forms: The 1-form ω1 maps the vectors A and B into the numbers ω1 (A) and ω1 (B), and the 1-form ω2 does the same thing with 1 → 2. The value of the exterior product ω1 ∧ ω2 on the two vectors A and B is the 2-form defined by the 2 × 2 determinant ω1 (A) ω2 (A) = ω1 (A)ω2 (B) − ω2 (A)ω1 (B) (12.10) ω1 ∧ ω2 (A, B) = ω1 (B) ω2 (B) or, more formally, ω1 ∧ ω2 = ω1 ⊗ ω2 − ω2 ⊗ ω1 .
(12.11)
12.2 Differential Forms
483
The most general 2-form on Rn is a linear combination of the basic 2-forms xi ∧ xj X ω2 = aik xi ∧ xk . (12.12) 1≤i t2 e−iT H φ(x, t1 )φ(x, t2 )e−iT H = e−i(T −t1 )H φ(x, 0)e−i(t1 −t2 )H φ(x, 0)e−i(T −t2 )H . (17.124) So by following the logic that led to the finite-temperature formula (17.116) and to the path integral (17.121), we find for the matrix element of this time-ordered product sandwiched between two factors of exp(−iT H) the path-integral formula Z φ00 hφ00 |e−iT H T [φ(x1 )φ(x2 )] e−iT H |φ0 i = N φ(x1 ) φ(x2 ) eiS[φ] Dφ (17.125) φ0
in which the integration and the factor N are as in the path integral (17.121). The time-ordered product of any combination of fields is given by Z 00 −iT H −iT H 0 hφ |e T [φ(x1 ) . . . φ(xn )] e |φ i = N φ(x1 ) . . . φ(xn ) eiS[φ] Dφ. (17.126) Like the position eigenstates |q 0 i of quantum mechanics, the eigenstates |φ0 i are states of infinite energy that overlap a wide variety of states. But we often are interested in the ground state |0i and in states of a few particles. To form such matrix elements, we multiply both sides of equations (17.121 & 17.126) by h0|φ00 ihφ0 |0i and integrate over φ0 and φ00 . Since the ground state
610
Path Integrals
is a normalized eigenstate of the hamiltonian H|0i = E0 |0i with eigenvalue E0 , we find from (17.121) Z h0|φ00 ihφ00 |e−i2T H |φ0 ihφ0 |0iDφ00 Dφ0 = h0|e−i2T H |0i (17.127) Z = e−i2T E0 = N h0|φ00 ieiS[φ] hφ0 |0i DφDφ00 Dφ0 and from (17.126) e
−2iT E0
Z
h0|T [φ(x1 ) . . . φ(xn )] |0i = N h0|φ00 iφ(x1 ) . . . φ(xn ) eiS[φ] hφ0 |0i Dφ
(17.128) in which we suppressed the differentials The mean value in the ground state of a time-ordered product of field operators is then the ratio of these path integrals Z h0|φ00 i φ(x1 ) . . . φ(xn ) eiS[φ] hφ0 |0i Dφ Z h0|T [φ(x1 ) . . . φ(xn )] |0i = (17.129) h0|φ00 i eiS[φ] hφ0 |0i Dφ Dφ00 Dφ0 .
in which both E0 and N cancel. Here the integration is over all fields φ that go from φ(x, −t) = φ0 (x) to φ(x, t) = φ00 (x) and over φ0 (x) and φ00 (x) as well.
17.11 Perturbation Theory Field theories with hamiltonians that are quadratic in their fields like (17.102) with P = 0 Z h i 1 2 H0 = π (x) + (∇φ(x))2 + m2 φ2 (x) d3 x (17.130) 2 are soluble. Their fields evolve in time as φ(x, t) = eitH0 φ(x, 0)e−itH0 .
(17.131)
The mean value in the ground state of H0 of a time-ordered product of these fields is by (17.129) a ratio of path integrals Z h0|φ00 i φ(x1 ) . . . φ(xn ) eiS0 [φ] hφ0 |0i Dφ Z h0|T [φ(x1 ) . . . φ(xn )] |0i = h0|φ00 i eiS0 [φ] hφ0 |0i Dφ (17.132)
611
17.11 Perturbation Theory
in which the action S0 [φ] is quadratic in the fields Z h i 1 ˙2 φ (x) − (∇φ(x))2 − m2 φ2 (x) d4 x S0 [φ] = 2 Z 1 = − ∂a φ(x)∂ a φ(x) − m2 φ2 (x) d4 x. 2
(17.133)
The path integrals in the ratio (17.132) are gaussian and doable. The Fourier transforms Z Z d4 p −ipx 4 ˜ ˜ φ(p) = e φ(x) d x and φ(x) = eipx φ(p) (17.134) (2π)4 turn the space-time derivatives in the action into a quadratic form Z 1 d4 p 2 ˜ S0 [φ] = − |φ(p)| (p2 + m2 ) (17.135) 2 (2π)4 ˜ in which p2 = p2 − p02 , and φ(−p) = φ˜∗ (p) by (3.28) since the field φ is real. 0 The initial hφ |0i and final h0|φ00 i wave-functions produce the i in the Feynman propagator (5.227). Although its exact form doesn’t matter here, the wave-function hφ0 |0i of the ground state of H0 is the exponential (16.49)
1 hφ |0i = N exp − 2 0
Z
|φ˜0 (p)|2
p
p2
+
m2
d3 p (2π)3
in which φ˜0 (p) is the spatial Fourier transform Z 0 ˜ φ (p) = e−ip·x φ0 (x) d3 x
(17.136)
(17.137)
and N is a normalization factor which will cancel in ratios of path integrals. Apart from −2i ln N , which we will not keep track of, the wave-functions add to the action S0 [φ] the term Z d3 p i p 2 ˜ t)|2 + |φ(p, ˜ −t)|2 p + m2 |φ(p, (17.138) ∆S0 [φ] = 2 (2π)3 in which we envision taking the limit t → ∞ with φ(x, t) = φ00 (x) and φ(x, −t) = φ0 (x). The identity (Weinberg, 1995, pp. 386–388) Z ∞ f (+∞) + f (−∞) = lim f (t) e−|t| dt (17.139) →0+
−∞
allows us to write ∆S0 [φ] as Z Z ∞ 3 i p 2 ˜ t)|2 e−|t| dt d p . p + m2 ∆S0 [φ] = lim |φ(p, →0+ 2 (2π)3 −∞
(17.140)
612
Path Integrals
To first order in , the change in the action is Z Z ∞ 3 i p 2 2 ˜ t)|2 dt d p |φ(p, p +m ∆S0 [φ] = lim →0+ 2 (2π)3 −∞ Z p 4 i 2 d p ˜ = lim p2 + m2 |φ(p)| . →0+ 2 (2π)4
(17.141)
The modified action is therefore Z d4 p p 1 2 ˜ S0 [φ, ] = S0 [φ] + ∆S0 [φ] = − |φ(p)| p2 + m2 − i p2 + m2 2 (2π)4 Z 4 d p 1 2 ˜ (17.142) |φ(p)| p2 + m2 − i = − 2 (2π)4 since the square-root is positive. In terms of the modified action, our formula (17.132) for the time-ordered product is the ratio Z φ(x1 ) . . . φ(xn ) eiS0 [φ,] Dφ Z h0|T [φ(x1 ) . . . φ(xn )] |0i = . (17.143) eiS0 [φ,] Dφ This formula (17.143) expresses the mean value in the state |0i of the timeordered product of the exponential of a space-time integral of a classical (aka c-number, aka external) current j(x) as the ratio Z R 4 ei jφ d x eiS0 [φ,] Dφ h R i 4 Z (17.144) Z0 [j] ≡ h0| T ei j(x) φ(x) d x |0i = eiS0 [φ,] Dφ which leads us to define the modified action S0 [φ, , j] in the presence of the classical current j(x) as Z S0 [φ, , j] = S0 [φ, ] + j(x) φ(x) d4 x. (17.145) Since the state |0i is normalized, the mean value Z0 [0] is unity, Z0 [0] = 1. In terms of the Fourier transform ˜j(p) of the external current Z ˜j(p) = e−ipx j(x) d4 x
(17.146)
(17.147)
17.11 Perturbation Theory
613
the action S0 [φ, , j] is Z i 4 1 h ˜ ˜ − φ˜∗ (p)˜j(p) d p . |φ(p)|2 p2 + m2 − i − ˜j ∗ (p)φ(p) S0 [φ, , j] = − 2 (2π)4 (17.148) If we now change variables to ˜ ˜ − ˜j(p)/(p2 + m2 − i) ψ(p) = φ(p)
(17.149)
then the action S0 [φ, , j] becomes (problem 9) 4 Z ˜j ∗ (p)˜j(p) 1 d p 2 2 2 ˜ S0 [φ, , j] = − |ψ(p)| p + m − i − 2 2 2 (p + m − i) (2π)4 4 Z ˜∗ ˜ j (p)j(p) d p 1 = S0 [ψ, ] + . (17.150) 2 2 2 (p + m − i) (2π)4 And since Dφ = Dψ, our formula (17.144) gives simply Z i |˜j(p)|2 d4 p Z0 [j] = exp . 2 p2 + m2 − i (2π)4 Going back to position space, one finds (problem 10) Z i 0 0 4 4 0 Z0 [j] = exp j(x) ∆(x − x ) j(x ) d x d x 2
(17.151)
(17.152)
in which ∆(x − x0 ) is Feynman’s propagator (5.227) ∆(x − x0 ) =
Z
0
eip(x−x ) d4 p . 2 2 p + m − i (2π)4
(17.153)
The functional derivative (chapter 16) of Z0 [j], defined by (17.144), is Z 1 δZ0 [j] 4 = h0| T φ(x) exp i j(x)φ(x)d x |0i (17.154) i δj(x) while that of equation (17.152) is Z 1 δZ0 [j] = Z0 [j] ∆(x − x0 ) j(x0 ) d4 x0 . i δj(x)
(17.155)
Thus the second functional derivative of Z0 [j] evaluated at j = 0 gives 1 δ 2 Z0 [j] 0 h0| T φ(x)φ(x ) |0i = 2 = −i ∆(x − x0 ). (17.156) i δj(x)δj(x0 ) j=0
614
Path Integrals
Similarly, one may show (problem 11) that δ 4 Z0 [j] 1 h0| T [φ(x1 )φ(x2 )φ(x3 )φ(x4 )] |0i = 4 i δj(x1 )δj(x2 )δj(x3 )δj(x4 ) j=0 = −∆(x1 − x2 )∆(x3 − x4 ) − ∆(x1 − x3 )∆(x2 − x4 ) − ∆(x1 − x4 )∆(x2 − x3 ).
(17.157)
Suppose now that we add a potential V = P (φ) to the free hamiltonian (17.130). Scattering are matrix elements of the time-ordered R amplitudes 4 exponential exp(−i P (φ) d x). Our formula (17.143) for the mean value in the ground state |0i of the free hamiltonian H0 of any time-ordered product of fields leads us to Z R 4 e−i P (φ) d x eiS0 [φ,] Dφ h R i 4 Z . (17.158) h0|T e−i P (φ) d x |0i = eiS0 [φ,] Dφ Using (17.156 & 17.157), we can cast this expression into the magical form Z Z δ 4 4 h0|T exp −i P (φ) d x |0i = exp −i P d x Z0 [j] . iδj(x) j=0 (17.159) The generalization of the path-integral formula (17.143) to the ground state |Ωi of an interacting theory with action S is Z φ(x1 ) . . . φ(xn ) eiS[φ,] Dφ Z hΩ|T [φ(x1 ) . . . φ(xn )] |Ωi = (17.160) eiS[φ,] Dφ in which a term like iφ2 is added to make the modified action S[φ, ]. To compute scattering amplitudes, one must use these techniques to make initial and final states of a few particles and to generate the nonlinear part of the hamiltonian. But to keep this book at a reasonable length, I will end with only a few sentences about fermionic path integrals. 17.12 Application to Quantum Electrodynamics In the Coulomb gauge ∇ · A = 0, the hamiltonian for quantum electrodynamics is Z 1 2 1 3 2 H = Hm + (17.161) 2 π + 2 (∇ × A) − A · j d x + VC
17.12 Application to Quantum Electrodynamics
615
in which Hm is the matter hamiltonian, and VC is the Coulomb term Z 0 j (x, t)j 0 (y, t) 3 3 1 d xd y. (17.162) VC = 2 4π|x − y| The operators A and π are canonically conjugate, but they satisfy the Coulomb-gauge conditions ∇·A=0
and ∇ · π = 0.
(17.163)
One may show (Weinberg, 1995, pp. 413–418) that in this theory, the analog of equation (17.160) is Z O1 . . . On eiSC δ[∇ · A] DA Dψ Z (17.164) hΩ|T [O1 . . . On ] |Ωi = iSC e δ[∇ · A] DA Dψ in which the Coulomb-gauge action is Z Z 2 4 1 ˙2 1 SC = A − (∇ × A) + A · j + L d x − VC dt m 2 2
(17.165)
and the functional delta-function δ[∇ · A] =
Y
δ(∇ · A(x))
(17.166)
x
enforces the Coulomb-gauge condition. The term Lm is the action density of the matter field ψ. We now get tricky. We introduce a new field A0 (x) and consider the factor Z F =
Z 1 0 −1 0 2 4 ∇A − ∇4 j d x DA0 exp i 2
(17.167)
which is just a number independent of the charge density j 0 , as we can see by shifting the new field A0 so as to cancel the term involving j 0 . The cute thing about F is that by integrating by parts, we can write it as Z Z 1 1 0 −1 0 4 0 2 0 0 F = exp i ∇A − A j − j 4 j d x DA0 2 2 Z Z Z 1 0 2 0 0 4 = exp i ∇A − A j d x + i VC dt DA0 . (17.168) 2 So when we multiply the numerator and denominator of the amplitude (17.164 by the numerical factor F , we find that the pesky Coulomb term VC
616
Path Integrals
cancels, and we have Z hΩ|T [O1 . . . On ] |Ωi =
0
O1 . . . On eiS δ[∇ · A] DA Dψ Z 0 eiS δ[∇ · A] DA Dψ
(17.169)
where now DA includes all four components Aµ and Z 2 1 ˙2 1 1 0 A − (∇ × A)2 + ∇A0 + A · j − A0 j 0 + Lm d4 x. (17.170) S = 2 2 2 Since the delta-function δ[∇ · A] enforces the Coulomb-gauge condition, we can add to the action S 0 the term − A˙ · ∇A0 and so make it gauge invariant Z 1 ˙ 1 S= (A − ∇A0 )2 − (∇ × A)2 + A · j − A0 j 0 + Lm d4 x 2 2 Z 1 = − Fµν F µν − Aµ jµ + Lm d4 x. (17.171) 4 Thus, at this point we have Z hΩ|T [O1 . . . On ] |Ωi =
O1 . . . On eiS δ[∇ · A] DA Dψ Z eiS δ[∇ · A] DA Dψ
(17.172)
in which S is the gauge-invariant action (17.171), and the integral is over all fields. The only relic of the Coulomb gauge is the gauge-fixing delta functional δ[∇ · A]. We now make the gauge transformation A0µ (x) = Aµ (x) + ∂µ Λ(x) ψ 0 (x) = eiqΛ(x) ψ(x)
(17.173)
and replace the fields Aµ (x) and ψ(x) everywhere in the ratio (17.169) of 0 path integrals by their gauge transforms (17.173) A0µ (x) and R ∞ψ (x). This of variables changes nothing; it is like replacing −∞ f (x)dx by Rchange ∞ −∞ f (y)dy. So we just have hΩ|T [O1 . . . On ] |Ωi = hΩ|T [O1 . . . On ] |Ωi0
(17.174)
in which the prime means that all fields have been replaced by their gauge transforms. We’ve seen that the action S is gauge invariant. So is the measure DA Dψ, and we now restrict ourselves to operators O1 . . . On that are gauge invariant. So in the right-hand side of equation (17.174), the replacement of the fields
17.12 Application to Quantum Electrodynamics
617
by their gauge transforms affects only the term δ[∇ · A] that enforces the Coulomb-gauge condition Z O1 . . . On eiS δ[∇ · A + 4Λ) DA Dψ Z hΩ|T [O1 . . . On ] |Ωi = . (17.175) eiS δ[∇ · A + 4Λ) DA Dψ We now have two choices. If we integrate over all gauge functions Λ(x) in both the numerator and the denominator of this ratio (17.175), then apart from over-all constants that cancel, the mean value in the vacuum of the time-ordered product is the ratio Z O1 . . . On eiS DA Dψ Z (17.176) hΩ|T [O1 . . . On ] |Ωi = iS e DA Dψ in which we integrate over all matter fields, gauge fields, and gauges. That is, we do not fix the gauge. The analogous formula for the euclidean time-ordered product is Z O1 . . . On e−Se DA Dψ Z hΩ|Te [O1 . . . On ] |Ωi = (17.177) e−Se DA Dψ in which the euclidean action Se is the space-time integral of the energy density. This formula is quite general; it holds in non-abelian gauge theories and is important in lattice gauge theory. Our second choice is to multiply the numeratorR and the denominator of the ratio (17.175) by the exponential exp[−iα/2 ( − 4Λ)2 d4 x] and then integrate over Λ(x) separately in the numerator and denominator. This operation just multiplies the numerator and denominator by the same constant factor, which cancels. But if before integrating over all gauge transformations Λ(x), we shift RΛ(x) so that 4Λ decreases by A˙ 0 , then the exponential factor is exp[−iα/2 (A˙ 0 − 4Λ)2 d4 x]. Now when we integrate over Λ(x), the delta-function δ(∇ · A + 4Λ) replaces R 4Λ by − ∇ · A in the inserted exponential, converting it to exp[−iα/2 (A˙ 0 + ∇ · A)2 d4 x]. The result is to replace the gauge-invariant action S (17.171) by the gauge-fixed action Z 1 α Sα = − Fµν F µν − (∂µ Aµ )2 − Aµ jµ + Lm d4 x. (17.178) 4 2 By following steps analogous to those the led to (17.153), one may show
618
Path Integrals
(problem 12) that in Feynman’s gauge, α = 1, the photon propagator is Z 4 ηµν iq·(x−y) d q h0|T [Aµ (x)Aν (y)] |0i = − i4µν (x − y) = − i e . q 2 − i (2π)4 (17.179)
17.13 Fermionic Path Integrals In our brief introduction (1.11–1.14) to Grassmann variables, we learned that because θ2 = 0
(17.180)
the most general function f (θ) of a single Grassmann variable θ is linear f (θ) = a + b θ.
(17.181)
So a complete integral table consists of a single equation for the integral of this most general function f (θ) = a + b θ of a single Grassmann variable: Z Z Z Z f (θ) dθ = a + b θ dθ = a dθ + b θ dθ. (17.182) R RThis equation has two unknowns, the integral dθ of unity and the integral θ dθ of θ. We choose them so that the integral of f (θ + ζ) Z Z Z Z f (θ + ζ) dθ = a + b (θ + ζ) dθ = (a + b ζ) dθ + b θ dθ (17.183) R is the same as the integral (17.183) of f (θ). Thus the integral dθ of unity must vanish, while that of θ can be any constant, which we choose to be unity. Our complete table of integrals is then Z Z dθ = 0 and θ dθ = 1. (17.184) The anti-commutation relations for a single fermionic degree of freedom ψ are {ψ, ψ † } ≡ ψ ψ † + ψ † ψ = 1,
{ψ, ψ} = 0,
and {ψ † , ψ † } = 0.
(17.185)
Because ψ has ψ † , it is conventional to introduce a variable θ∗ = θ† that anti-commutes with itself and with θ θ∗2 = 0
and {θ, θ∗ } = 0.
(17.186)
17.13 Fermionic Path Integrals
Imitating (17.184), we define integration over θ∗ as for θ Z Z ∗ dθ = 0 and θ∗ dθ∗ = 1.
619
(17.187)
We define the ground state of the system by |0i = ψ|gi for any state |gi that is not annihilated by ψ. Since ψ 2 = 0, the operator ψ annihilates the ground state ψ|0i = ψ 2 |gi = 0. The effect of the operator ψ on the state 1 ∗ 1 ∗ † † |θi = exp ψ θ − θ θ |0i = 1 + ψ θ − θ θ |0i 2 2
(17.188)
(17.189)
is ψ|θi = ψ(1 + ψ † θ − 12 θ∗ θ)|0i = ψψ † θ|0i = (1 − ψ † ψ)θ|0i = θ|0i (17.190) while that of θ on |θi is θ|θi = θ(1 + ψ † θ − 21 θ∗ θ)|0i = θ|0i.
(17.191)
The state |θi is therefore an eigenstate of ψ with eigenvalue θ ψ|θi = θ|θi. The bra corresponding to the ket |ζi is 1 ∗ ∗ hζ| = h0| 1 + ζ ψ − ζ ζ 2 and the inner product hζ|θi is (problem 13) 1 ∗ 1 ∗ ∗ † hζ|θi = h0| 1 + ζ ψ − ζ ζ 1 + ψ θ − θ θ |0i 2 2 1 1 1 = h0|1 + ζ ∗ ψψ † θ − ζ ∗ ζ − θ∗ θ + ζ ∗ ζθ∗ θ|0i 2 2 4 1 1 1 = h0|1 + ζ ∗ θ − ζ ∗ ζ − θ∗ θ + ζ ∗ ζθ∗ θ|0i 2 2 4 1 = exp ζ ∗ θ − (ζ ∗ ζ + θ∗ θ) . 2
(17.192)
(17.193)
(17.194)
One may show (problem 14) that the identity operator for the space of states c|0i + d|1i ≡ c|0i + dψ † |0i
(17.195)
620
Path Integrals
is the integral Z I=
|θihθ| dθ∗ dθ = |0ih0| + |1ih1|
(17.196)
in which the differentials anti-commute with each other and with other fermionic variables, {dθ, dθ∗ } = 0, {dθ, θ} = 0, {ψ, dθ} = 0, etc. The case of several Grassmann variables θ1 , θ2 , . . . , θn and several Fermi operators ψ1 , ψ2 , . . . , ψn is similar. The θk anti-commute among themselves {θi , θj } = 0
(17.197)
while the ψk satisfy {ψk , ψ`† } = δk` ,
{ψk , ψl } = 0,
and {ψk† , ψ`† } = 0.
(17.198)
The ground state |0i is |0i =
n Y
! ψk
|gi
(17.199)
k=1
in which |gi is any state such that the resulting |0i is not zero. The directproduct state ! " n # n Y X 1 ∗ 1 ∗ † † 1 + ψk θk − θk θk |0i |θi ≡ exp ψk θk − θk θk |0i = 2 2 k=1 k=1 (17.200) is (problem 15) a simultaneous eigenstate of each of the ψk ψk |θi = θk |θi.
(17.201)
It follows that ψ` ψk |θi = ψ` θk |θi = −θk ψ` |θi = −θk θ` |θi = θ` θk |θi
(17.202)
and so too ψk ψ` |θi = θk θ` |θi. Since the ψ’s anti-commute, their eigenvalues must also θ` θk |θi = ψ` ψk |θi = −ψk ψ` |θ = −θk θ` |θi.
(17.203)
This is why the eigenvalues of fermionic operators must be Grassmann variables.
17.13 Fermionic Path Integrals
621
The inner product hζ|θi is #" n # " n Y Y 1 ∗ 1 ∗ † ∗ hζ|θi = h0| (1 + ψ` θ` − θ` θ` ) |0i (1 + ζk ψk − ζk ζk ) 2 2 `=1 k=1 # " n X 1 † † † = exp ζk∗ θk − (ζk∗ ζk + θk∗ θk ) = eζ θ−(ζ ζ+θ θ)/2 . (17.204) 2 k=1
The identity operator is Z I=
|θihθ|
n Y
dθk∗ dθk .
(17.205)
k=1
Example 17.1 (Gaussian Grassmann Integral) For any 2 × 2 matrix A, we may compute the gaussian integral Z † (17.206) g(A) = e−θ Aθ dθ1∗ dθ1 dθ2∗ dθ2 by expanding the exponential. The only terms that survive are the ones that have exactly one of each of the four variables θ1 , θ2 , θ1∗ , and θ2∗ . Thus, the integral is the determinant of the matrix A Z 1 † 2 ∗ g(A) = θ Aθ dθ1 dθ1 dθ2∗ dθ2 2 Z = (θ1∗ A11 θ1 θ2∗ A22 θ2 + θ1∗ A12 θ2 θ2∗ A21 θ1 ) dθ1∗ dθ1 dθ2∗ dθ2 = A11 A22 − A12 A21 = det A.
(17.207)
The natural generalization to n dimensions Z n Y † e−θ Aθ dθk∗ dθk = det A
(17.208)
k=1
is true for any n × n matrix A, and if A is hermitian and invertible, then Z n Y † −1 −θ† Aθ+θ† ζ+ζ † θ e dθk∗ dθk = det A eζ A ζ . (17.209) k=1
The value of θ that makes the argument −θ† Aθ+θ† ζ +ζ † θ of the exponential stationary is θ = A−1 ζ. So a gaussian Grassmann integral is equal to its exponential evaluated at its stationary point, apart from a prefactor involving the determinant det A. This result is a fermionic echo of the bosonic relations (17.13–17.15).
622
Path Integrals
One may further extend these definitions to a Grassmann field χm (x) and an associated Dirac field ψm (x). The χm (x)’s anti-commute among themselves {χm (x), χn (x0 )} = 0
(17.210)
and the Dirac field ψm (x) satisfies the equal-time anti-commutation relations {ψm (x, t), ψn† (x0 , t)} = δmn δ(x − x0 ) {ψm (x, t), ψn (x0 , t)} = 0 † {ψm (x, t), ψn† (x0 , t)} = 0.
(17.211)
As in (17.19 & 17.103), we use eigenstates of the field ψ at time t = 0. If |0i is defined in terms of a state |gi (not annihilated by any ψm (x, 0)) as " # Y |0i = ψm (x, 0) |gi (17.212) m,x
then (problem 16) the state ! Z X 1 ∗ 3 † |χi = exp ψm (x, 0) χm (x) − χm (x)χm (x) d x |0i 2 m Z = exp ψ † χ − 12 χ† χ d3 x |0i is an eigenstate of the operator ψm (x, 0) with eigenvalue χm (x) ψm (x, 0)|χi = χm (x)|χi. The inner product of two such states is Z 1 0† 0 1 † 3 0 0† hχ |χi = exp χ χ − χ χ − χ χd x . 2 2 The identity operator is the integral Z I = |χihχ| Dχ∗ Dχ
(17.213)
(17.214)
(17.215)
in which Dχ∗ Dχ ≡
Y
dχ∗m (x)dχm (x).
(17.216)
m,x
The hamiltonian for a free Dirac field ψ of mass m is the spatial integral Z H0 = ψ (γ · ∇ + m) ψ d3 x (17.217)
623
17.13 Fermionic Path Integrals
in which ψ ≡ iψ † γ 0 , the gamma matrices satisfy {γ a , γ b } = 2 η ab
(17.218)
and the metric η is the 4×4 diagonal matrix with main diagonal (−1, 1, 1, 1). Since ψ|χi = χ|χi and hχ0 |ψ † = hχ0 |χ0† , the quantity hχ0 | exp( − iH0 )|χi is Z Z 0 −iH0 0 0 3 hχ |e |χi = hχ |χi exp − i χ (γ · ∇ + m) χ d x (17.219) Z Z 0 3 1 † 1 0† = exp 2 χ˙ χ − 2 χ χ˙ − iχ (γ · ∇ + m) χ d x in which we set χ0† − χ† = χ˙ † and χ0 − χ = χ. ˙ To first order in , we may 0 ignore the difference between χ and χ, which is of order . So dropping the primes, we have Z Z 0 −iH0 3 1 † 1 † hχ |e |χi = exp 2 χ˙ χ − 2 χ χ˙ − iχ (γ · ∇ + m) χ d x (17.220) in which the dependence upon χ0 is through the time derivatives. Putting together n = 2T / such matrix elements, integrating over all intermediate-state dyadics |χihχ| and using our formula (17.215), we find Z Z hχT |e−2iT H0 |χ−T i = exp 12 χ˙ † χ − 12 χ† χ˙ − iχ (γ ·∇ + m) χd4 x Dχ∗Dχ. (17.221) Integrating
χ˙ † χ
by parts and dropping the surface term, we have Z Z −2iT H0 † 4 hχT |e |χ−T i = exp − χ χ˙ − iχ (γ ·∇ + m) χ d x Dχ∗Dχ.
(17.222) ˙ the argument of the exponential is Since − χ† χ˙ = − iχγ 0 χ, Z Z − χ† χ˙ − iχ (γ · ∇ + m) χ d4 x = i − χγ 0 χ˙ − χ (γ · ∇ + m) χ d4 x Z = i − χ (γ µ ∂µ + m) χ d4 x. (17.223) We then have hχT |e
−2iT H0
Z |χ−T i =
Z 4 exp i L0 (χ) d x Dχ∗ Dχ
(17.224)
in which L0 (χ) = − χ (γ µ ∂µ + m) χ is the action density for a free Dirac field. Thus the amplitude is a path integral with phases given by the classical action S0 [χ] Z R Z −2iT H0 i L0 (χ) d4 x ∗ hχT |e |χ−T i = e Dχ Dχ = eiS0 [χ] Dχ∗ Dχ (17.225)
624
Path Integrals
and the integral is over all fields that go from χ(x, −T ) = χ−T (x) to χ(x, T ) = χT (x). Any normalization factor will cancel in ratios of such integrals. The logic behind our formulas (17.126) and (17.132) for the time-ordered product of bosonic fields leads here to an expression for the time-ordered product of 2n Dirac fields Z h0|χ00 i χ(x1 ) . . . χ(x2n ) eiS0 [χ] hχ0 |0i Dχ∗ Dχ Z h0|T ψ(x1 ) . . . ψ(x2n ) |0i = . 00 iS0 [χ] 0 ∗ h0|χ i e hχ |0i Dχ Dχ h0|χ00 i
hχ0 |0i
(17.226) is to insert
As in (17.143), the effect of the inner products and -terms which modify the Dirac propagators Z χ(x1 ) . . . χ(x2n ) eiS0 [χ,] Dχ∗ Dχ Z . (17.227) h0|T ψ(x1 ) . . . ψ(x2n ) |0i = eiS0 [χ,] Dχ∗ Dχ We now as in (17.144) introduce a Grassmann external current ζ(x) and define a fermionic analog of Z0 [j] Z R 4 ei ζχ+χζ d x eiS0 [χ,] Dχ∗ Dχ h R i 4 Z . (17.228) Z0 [ζ] ≡ h0| T ei ζψ+ψζ d x |0i = eiS0 [χ,] Dχ∗ Dχ
17.14 Application to Non-Abelian Gauge Theories The action of a (fairly) generic non-abelian gauge theory is Z 1 S = − Fbµν Fbµν − ψ (γ µ Dµ + m) ψ d4 x 4
(17.229)
in which the Maxwell field is Fbµν ≡ ∂µ Abν − ∂ν Abµ + g fbcd Acµ Adν
(17.230)
and the covariant derivative is Dµ ψ ≡ ∂µ ψ − ig tb Abµ ψ.
(17.231)
Here γ µ is a gamma matrix (10.284), g is a coupling constant, fbcd is a structure constant (10.62), and tb is a generator (10.56) of the Lie algebra (10.15) of the gauge group.
625
17.15 The Faddeev-Popov Trick
One may show (Weinberg, 1996, pp. 14–18) that the analog of equation (17.172) for quantum electrodynamics is Z O1 . . . On eiS δ[Ab3 ] DA Dψ Z (17.232) hΩ|T [O1 . . . On ] |Ωi = iS e δ[Ab3 ] DA Dψ in which the functional delta-function Y δ(Ab3 (x)) δ[Ab3 ] ≡
(17.233)
x,b
enforces the axial-gauge condition, and Dψ stands for Dψ ∗ Dψ. Initially, physicists had trouble computing non-abelian amplitudes beyond the lowest order of perturbation theory. Then DeWitt showed how to compute to second order (DeWitt, 1967), and Faddeev and Popov, using path integrals, showed how to compute to all orders (Faddeev and Popov, 1967). 17.15 The Faddeev-Popov Trick The path-integral tricks of Faddeev and Popov are described in (Weinberg, 1996, pp. 19–27). We start with some gauge-fixing functions fb (x) which depend upon a set of non-abelian gauge fields Abµ (x). One might have fb (x) = A3b (x) in an axial gauge or fb (x) = i∂µ Aµb (x) in a Lorentz-invariant gauge. Under an infinitesimal gauge transformation Aλbµ = Abµ + ∂µ λb + g fbcd λd Acµ
(17.234)
the gauge fields change, and so the gauge-fixing functions fb (x), which depend upon them, also change. The jacobian J of that change is λ δfb (x) Df λ J = det ≡ (17.235) δλc (y) λ=0 Dλ λ=0 which typically involves the delta-function δ 4 (x − y). Let B[f ] denote any functional of the gauge-fixing functions fb (x) such as B[f ] =
Y x,b
δ(fb (x)) =
Y
δ(A3b (x))
(17.236)
x,b
in an axial gauge, or Z Z i i 2 4 B[f ] = exp (fb (x)) d x = exp − 2 2
2 ∂µ Aµb (x)
4
d x
(17.237)
626
Path Integrals
in a Lorentz-invariant gauge. Now let’s consider a functional integral like (17.232) Z O1 . . . On eiS B[f ] J DA Dψ Z hΩ|T [O1 . . . On ] |Ωi = eiS B[f ] J DA Dψ
(17.238)
in which the operators Ok , the action functional S[A] and the differentials DA and Dψ are gauge invariant. The axial-gauge formula (17.232) is a simple example in which B[f ] = δ[Ab3 ] enforces the axial-gauge condition Ab3 (x) = 0 and the determinant J = det (δbc ∂µ δ(x − y)) is a constant, which cancels. If we translate the gauge fields by a gauge transformation Λ, then the ratio (17.244) does not change Z Λ O1Λ . . . OnΛ eiS B[f Λ ] J Λ DAΛ Dψ Λ Z hΩ|T [O1 . . . On ] |Ωi = (17.239) Λ eiS B[f Λ ] J Λ DAΛ Dψ Λ R R any more than f (y) dy is different from f (x) dx. Since the operators Ok , the action functional S[A], and the differentials DA and Dψ are gauge invariant, most of the Λ-dependence goes away Z O1 . . . On eiS B[f Λ ] J Λ DA Dψ Z hΩ|T [O1 . . . On ] |Ωi = . (17.240) eiS B[f Λ ] J Λ DA Dψ If Λλ is the gauge transformation Λ followed by the gauge transformation λ, then the jacobian J Λ is a determinant of a product of matrices which is the product of their determinants Z δfbΛλ (x) δfbΛλ (x) δΛλd (z) 4 = det d z δλc (y) λ=0 δΛλd (z) δλc (y) λ=0 Λλ δfb (x) δΛλd (z) = det det δΛλd (z) λ=0 δλc (y) λ=0 Λ δfb (x) δΛλd (z) Df Λ DΛλ = det det ≡ . (17.241) δΛd (z) δλc (y) λ=0 DΛ Dλ λ=0
J Λ = det
Now we integrate over the gauge transformation Λ with weight function
627
17.16 Ghosts
ρ(Λ) = ( DΛλ/Dλ|λ=0 )−1 and find, since the ratio (17.240 is Λ-independent Z
Df Λ DΛ DA Dψ O1 . . . On eiS B[f Λ ] DΛ Z Df Λ eiS B[f Λ ] DΛ DA Dψ DΛ
Z
O1 . . . On eiS B[f Λ ] Df Λ DA Dψ Z eiS B[f Λ ] Df Λ DA Dψ
Z
O1 . . . On eiS DA Dψ Z . iS e DA Dψ
hΩ|T [O1 . . . On ] |Ωi =
=
=
(17.242)
Thus, the mean-value in the vacuum of a time-ordered product of gaugeinvariant operators is a ratio of path integrals over all gauge fields without any gauge fixing. No matter what gauge condition f or gauge-fixing functional B[f ] we use, the resulting gauge-fixed ratio (17.244) is equal to the ratio (17.242) of path integrals over all gauge fields without any gauge fixing. All gauge-fixed ratios (17.244) give the same time-ordered products, and so we can use whatever gauge condition f or gauge-fixing functional B[f ] is most convenient. The analogous formula for the euclidean time-ordered product is Z O1 . . . On e−Se DA Dψ Z (17.243) hΩ|Te [O1 . . . On ] |Ωi = e−Se DA Dψ where the euclidean action Se is the space-time integral of the energy density. This formula is the basis for lattice gauge theory. The path-integral formulas (17.176 & 17.243) derived for quantum electrodynamics therefore also apply to non-abelian gauge theories.
17.16 Ghosts Faddeev and Popov were interested in showing how to do perturbative computations in which one does fix the gauge. To continue our description of their tricks, we return to gauge-fixed expression (17.232) for the time-ordered
628
Path Integrals
product Z hΩ|T [O1 . . . On ] |Ωi =
O1 . . . On eiS B[f ] J DA Dψ Z eiS B[f ] J DA Dψ
(17.244)
set fb (x) = i∂µ Aµb (x)
(17.245)
and use (17.237) as the gauge-fixing functional B[f ] Z Z 2 4 i i µ 2 4 (fb (x)) d x = exp − ∂µ Ab (x) d x . (17.246) B[f ] = exp 2 2 This functional adds to the action density the term −(∂µ Aµb )2 /2 which leads to a gauge-field propagator like the photon’s (17.179) Z h i ηµν δbc iq·(x−y) d4 q b c h0|T Aµ (x)Aν (y) |0i = − i4µν (x − y) = − i e . q 2 − i (2π)4 (17.247) What about the determinant J? Under an infinitesimal gauge transformation, the gauge field becomes Aλbµ = Abµ + ∂µ λb + g fbcd λd Acµ
(17.248)
and so fbλ is fbλ = i∂ µ Aλbµ = i∂ µ (Abµ + ∂µ λb + g fbdc λc Adµ ) .
(17.249)
The jacobian J then is the determinant (17.235) of the matrix δfb (x) ∂ = iδbc 2 δ 4 (x − y) + ig fbdc µ Aµd (x)δ 4 (x − y) (17.250) δλc (y) λ=0 ∂x that is ∂ µ 4 4 J = det iδbc 2 δ (x − y) + ig fbdc µ Ad (x)δ (x − y) . ∂x
(17.251)
But we’ve seen (17.208) that a determinant can be written as a fermionic path integral Z n Y † det A = e−θ A θ dθk∗ dθk . (17.252) k=1
Thus we can write the jacobian J as Z J = exp − iωb∗ 2 ωb + i∂µ ωb∗ g fbdc Aµd ωc d4 x Dω ∗ Dω
(17.253)
17.17 Problems
629
which contributes the terms − ∂µ ωb∗ ∂ µ ωb and ∂µ ωb∗ g fbdc Aµd ωc to the action density. Thus, we can do perturbation theory by using the modified action density L0 = − 41 Fbµν Fbµν − 12 ∂µ Aµb
2
− ∂µ ωb∗ ∂ µ ωb + ∂µ ωb∗ g fbdc Aµd ωc − ψ (6D + m) ψ (17.254) µ µ in which D 6 ≡ γ Dµ = γ (∂µ − igfbcd Abµ ). The ghost field ω is a mathematical device, not a physical field describing real particles, which would be spinless fermions violating the spin-statistics theorem (example 10.3).
17.17 Problems 1. Derive the multiple gaussian integral (17.8) from (5.184). 2. Derive the multiple gaussian integral (17.12) from (5.183). 3. Show that the vector Y that makes the argument of the multiple gaussian integral (17.12) stationary is given by (17.13), and that the multiple gaussian integral (17.12) is equal to its exponential evaluated at its stationary point Y apart from a prefactor involving the determinant det iS. 4. Repeat the previous problem for the multiple gaussian integral (17.11). 5. Insert a complete set of momentum dyadics |pihp|, use the inner product (17.19), do the resulting Fourier transform, and so verify the free-particle path integral (17.59). 6. Show that for the hamiltonian (17.63) of the simple harmonic oscillator the action S[qc , T, 0] of the classical path is (17.72). 7. Derive formula (17.135) for the action S0 [φ] from (17.133 & 17.134). 8. Derive identity (17.139). Split the time integral at t = 0 into two halves, use d e±t = ± e±t (17.255) dt and then integrate each half by parts. 9. Derive equation (17.150) from equations (17.147, 17.150, & 17.149). 10. Derive equations (17.152 & 17.153) from formula (17.151). 11. Derive equation (17.157) from the formula (17.152) for Z0 [j]. 12. By following steps analogous to those the led to (17.153), derive the formula (17.179) for the photon propagator in Feynman’s gauge. 13. Derive expression (17.194) for the inner product hζ|θi. 14. Derive the representation (17.196) of the identity operator I for a single fermionic degree of freedom from the rules (17.184 & 17.187) for Grassmann integration and the anti-commutation relations (17.180 & 17.186).
630
Path Integrals
15. Derive the eigenvalue equation (17.201) from the definition (17.199 & 17.200) of the eigenstate |θi and the anti-commutation relations (17.197 & 17.198). 16. Derive the eigenvalue relation (17.213) for the Fermi field ψm (x, t) from the anti-commutation relations (17.210 & 17.211) and the definitions (17.212 & 17.213).
18 The Renormalization Group
The renormalization group is collection of techniques that systematize the use of different distance and energy scales to study quantum fields, lattice fields, condensed matter, and other topics, including finance.
18.1 The Renormalization Group in Quantum Field Theory Most quantum field theories are non-linear with infinitely many degrees of freedom, and because they describe point particles, they are rife with infinities. But short-distance effects, probably the finite sizes of the fundamental constituents of matter, mitigate these infinities so that we can cope with them by ignoring what happens at very short distances and very high energies. This procedure is called renormalization. For instance, in the theory described by the Lagrange density 1 1 g L = − ∂ν φ ∂ ν φ − m2 φ2 − φ4 2 2 24
(18.1)
we can cut-off divergent integrals at some high energy Λ. The amplitude for the elastic scattering of two bosons of initial four-momenta p1 and p2 and final momenta p01 and p02 to one-loop order (Weinberg, 1996, chap. 18) then takes the simple form (Zee, 2010, chaps. III & VI) 6 g2 Λ A=g− ln + iπ + 3 (18.2) 32π 2 stu as long as the absolute values of the Mandelstam variables s = −(p1 + p2 )2 , t = −(p1 − p01 )2 , and u = −(p1 − p02 )2 , which satisfy stu > 0 and s + t + u = 4m2 , are all much larger than m2 . We define the physical coupling constant gµ , as opposed to the “bare” one g that came with L, to be the real part of
632
The Renormalization Group
the amplitude A at s = −t = −u = µ2 2 Λ 3g 2 ln + 1 . (18.3) gµ = g − 32π 2 µ2 Thus, the bare coupling constant is g = gµ + 3g 2 ln(Λ2 /µ2 ) + 1 , and using this formula in our expression (18.2) for the amplitude A, we find that the amplitude no longer involves the cut-off Λ 6 µ g2 ln + iπ . (18.4) A = gµ − 32π 2 stu This is the magic of renormalization. The physical coupling “constant” gµ is the ideal coupling at energy µ because when the Mandelstam variables are all near the renormalization point stu = µ6 , the one-loop correction is tiny, and A ≈ gµ . How does the physical coupling gµ depend upon the energy µ? The amplitude A must be independent of the renormalization energy µ, and so dgµ dA g2 6 = − =0 dµ dµ 32π 2 µ
(18.5)
which is a version of the Callan-Symanzik equation. We assume that when the cut-off Λ is big but finite, the bare and running coupling constants g and gµ are so tiny that they differ by terms of order g 2 or gµ2 . Then to lowest order in g and gµ , we can replace g 2 by gµ2 in (18.5) and arrive at the simple differential equation µ
3 gµ2 dgµ ≡ β(gµ ) = dµ 16π 2
(18.6)
which we can integrate Z E Z gE Z dgµ E dµ 16π 2 gE dgµ 16π 2 1 1 ln = = = = − 2 M 3 3 gM gE M µ gM β(gµ ) gM gµ (18.7) to find the “running” physical coupling constant at energy E gE =
1 − 3 gM
gM . ln(E/M )/16π 2
(18.8)
√ As the energy E = s rises above M , while staying below the singular value E = M exp(16π 2 /3gM ), the running coupling gE slowly increases. And so does the scattering amplitude, A ≈ gE .
18.1 The Renormalization Group in Quantum Field Theory
633
Example 18.1 (Quantum Electrodynamics) Vacuum polarization makes the amplitude for the scattering of two electrons proportional to A(q 2 ) = e2 1 + π(q 2 ) (18.9) rather than to e2 . Here e is the renormalized charge, q = p01 − p1 is the four-momentum transferred to the first electron, and Z 1 e2 q 2 x(1 − x) 2 π(q ) = 2 x(1 − x) ln 1 + dx (18.10) 2π 0 m2 represents the polarization of the vacuum. We define the running coupling constant eµ to be A(µ2 ) e2µ = A(µ2 ) = e2 1 + π(µ2 ) . (18.11) For µ2 m2 , the vacuum polarization term π(µ2 ) is (problem 1) e2 µ 5 2 π(µ ) ≈ 2 ln − . (18.12) 6π m 6 The amplitude (18.9) A(q 2 ) = e2µ
1 + π(q 2 ) 1 + π(µ2 )
(18.13)
must be independent of µ, and so 0=
e2µ d A(q 2 ) d d 2 = ≈ eµ 1 − π(µ2 ) . 2 2 dµ 1 + π(q ) dµ 1 + π(µ ) dµ
(18.14)
Thus we find 2 dπ(µ2 ) deµ deµ e2 2 1 − π(µ ) − eµ = 2eµ 1 − π(µ2 ) − e2µ 2 . 0 = 2eµ dµ dµ dµ 6π µ (18.15) Thus, since by (18.10 & 18.11) π(µ2 ) = O(e2 ) and e2µ = e2 + O(e4 ), we find to lowest order in eµ µ
e3µ deµ ≡ β(µ) = . dµ 12π 2
(18.16)
We can integrate this differential equation Z E Z eE Z eE deµ deµ 1 1 E dµ 2 2 ln = = = 12π = 6π − 2 3 M e2M eE M µ eM β(eµ ) eM eµ (18.17) and so find for the running coupling constant the formula e2E =
1−
e2M
e2M ln(E/M )/6π 2
(18.18)
634
The Renormalization Group
which shows that it slowly increases with the energy E. Thus, the finestructure constant e2µ /4π rises from α = 1/137.036 at me to e2 (45.5GeV) α 1 = = 4π 1 − 2α ln(45.5/0.00051)/3π 134.6
(18.19)
√ at s = 91 GeV. When all light charged particles are included, one finds that the fine-structure constant rises to α = 1/128.87 at E = 91 GeV. Example 18.2 (Quantum Chromodynamics) Because of the cubic interaction of the gauge fields of a non-abelian gauge theory, the running coupling constant gµ can slowly decrease with rising energy. If the gauge group is SU (3), then due to this cubic interaction and that of the ghost fields (17.254), the running coupling constant gµ is to one-loop µ 2 11gM gµ = gM 1 − . (18.20) ln 16π 2 M 3 and so satisfies the differential It differs from gM only by terms of order gM equation 3 11gµ3 11gM dgµ µ ≡ β(gµ ) = − = − (18.21) dµ 16π 2 16π 2
in which the beta-function is negative. Integrating Z E Z gG Z dgµ dµ 1 E 16π 2 gE dgµ 8π 2 1 ln = = =− = 2 − g2 M 11 gM gµ3 11 gM M µ gM β(gµ ) E (18.22) we find −1 2 11gM E 2 2 gE = gM 1 + ln (18.23) 8π 2 M which shows that as the energy E of a scattering process increases, the running coupling slowly decreases, going to zero at infinite energy, an effect called asymptotic freedom. If the gauge group is SU (N ), and the theory has nf flavors of quarks with masses below µ, then the beta function is gµ3 11N nf β(gµ ) = − 2 − (18.24) 4π 12 6 which remains negative as long as nf < 11N/2. Using this beta-function with N = 3 and again integrating, we find instead of (18.23) −1 2 (11 − 2nf /3)gM E2 2 2 gE = gM 1 + ln 2 (18.25) 16π 2 M
635
Figure 18.1 The strong-structure constant αs(E) as given by the one-loop formula (18.27) (thin curve) and by a three-loop formula (thick curve) with Λ = 230 MeV and nf = 5 are plotted for mb ≤ E ≤ mt. The vertical axis is the coupling strength αs(E), which runs from about 0.10 to 0.22 over this range; the horizontal axis is the energy E in GeV, from 20 to 140.
or, with
\[ \Lambda^2 \equiv M^2 \exp\!\left[ -\frac{16\pi^2}{(11 - 2 n_f/3)\, g_M^2} \right] \tag{18.26} \]
we find (problem 2)
\[ \alpha_s(E) \equiv \frac{g^2(E)}{4\pi} = \frac{12\pi}{(33 - 2 n_f) \ln(E^2/\Lambda^2)}. \tag{18.27} \]
This formula expresses the dimensionless strong-structure constant αs(E) appropriate to energy E in terms of the energy parameter Λ, which has the dimension of energy. Some call this dimensional transmutation. For Λ = 230 MeV and nf = 5, Fig. 18.1 displays αs(E) in the range 4.19 GeV = mb ≤ E ≤ mt = 172 GeV as given by the one-loop formula (18.27) (thin curve) and a three-loop formula (Weinberg, 1996, p. 156) (thick curve).
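The one-loop formula (18.27) is easy to evaluate. The lines below are a minimal sketch in Python that computes αs at E = 91 GeV for the values Λ = 230 MeV and nf = 5 used in Fig. 18.1; the choice of 91 GeV as the sample energy is an illustrative assumption, not a value singled out by the text.

    import math

    def alpha_s(E, Lam=0.230, nf=5):
        """One-loop strong-structure constant, eq. (18.27); energies in GeV."""
        return 12 * math.pi / ((33 - 2 * nf) * math.log(E**2 / Lam**2))

    print(alpha_s(91.0))   # about 0.137 at the Z mass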
18.2 The Renormalization Group in Lattice Field Theory

Let us consider a quantum field theory on a lattice (Gattringer and Lang, 2010, chap. 3) in which the strength of the non-linear interactions depends upon a single dimensionless coupling constant g. The spacing a of the lattice regulates the infinities, which return as a → 0. The value of an observable P computed on this lattice will depend upon the lattice spacing a and on the coupling constant g, and so will be a function P(a, g) of these two parameters. The "right" value of the coupling constant is the value that makes the result of the computation as close as possible to the physical value P. Thus the "right" coupling constant is not a constant at all, but rather a function g(a) that varies with the lattice spacing or cut-off a. So as we vary the lattice spacing and go to the continuum limit in which a → 0, we must adjust the coupling function g(a) so that what we compute, P(a, g(a)), is equal to the physical value P. That is, g(a) must vary with a so as to keep P(a, g(a)) = P. But then P(a, g(a)) must remain constant as a varies, so
\[ \frac{dP(a, g(a))}{da} = 0. \tag{18.28} \]
Writing this condition as a dimensionless derivative
\[ a\, \frac{dP(a, g(a))}{da} = \frac{da}{d\ln a}\, \frac{dP(a, g(a))}{da} = \frac{dP(a, g(a))}{d\ln a} = 0 \tag{18.29} \]
we arrive at the Callan-Symanzik equation
\[ 0 = \frac{dP(a, g(a))}{d\ln a} = \left( \frac{\partial}{\partial \ln a} + \frac{dg}{d\ln a} \frac{\partial}{\partial g} \right) P(a, g(a)). \tag{18.30} \]
The coefficient of the second partial derivative (with a minus sign)
\[ \beta_L(g) \equiv -\frac{dg}{d\ln a} \tag{18.31} \]
is the lattice β-function. Since the lattice spacing a and the energy scale µ are inversely related, the lattice β-function differs from the continuum beta-function by a minus sign. In SU(N) gauge theory, the first two terms of the lattice β-function for small g are
\[ \beta_L(g) = -\beta_0\, g^3 - \beta_1\, g^5 \tag{18.32} \]
where for nf flavors of light quarks
\[ \beta_0 = \frac{1}{(4\pi)^2} \left( \frac{11}{3} N - \frac{2}{3} n_f \right), \qquad \beta_1 = \frac{1}{(4\pi)^4} \left( \frac{34}{3} N^2 - \frac{10}{3} N n_f - \frac{N^2 - 1}{N} n_f \right). \tag{18.33} \]
In quantum chromodynamics, N = 3. Combining the definition (18.31) of the β-function with its expansion (18.32) for small g, one arrives at the differential equation
\[ \frac{dg}{d\ln a} = \beta_0\, g^3 + \beta_1\, g^5 \tag{18.34} \]
which one may integrate
\[ \int d\ln a = \ln a + c = \int \frac{dg}{\beta_0 g^3 + \beta_1 g^5} = -\frac{1}{2\beta_0 g^2} + \frac{\beta_1}{2\beta_0^2} \ln\frac{\beta_0 + \beta_1 g^2}{g^2} \tag{18.35} \]
to find
\[ a(g) = d \left( \frac{\beta_0 + \beta_1 g^2}{g^2} \right)^{\beta_1/2\beta_0^2} e^{-1/2\beta_0 g^2} \tag{18.36} \]
in which d is a constant of integration. The term β₁g² is of higher order in g, and if one drops it and absorbs the resulting factor of β₀ into a new constant of integration Λ, then one gets
\[ a(g) = \frac{1}{\Lambda} \left( \beta_0\, g^2 \right)^{-\beta_1/2\beta_0^2} e^{-1/2\beta_0 g^2}. \tag{18.37} \]
As g → 0, the lattice spacing a(g) goes to zero very fast (as long as nf < 17 for N = 3). The inverse of this relation (18.37) is
\[ g(a) \approx \left[ \beta_0 \ln(a^{-2}\Lambda^{-2}) + (\beta_1/\beta_0) \ln\ln(a^{-2}\Lambda^{-2}) \right]^{-1/2}. \tag{18.38} \]
It shows that the coupling constant slowly goes to zero with a, which is a lattice version of asymptotic freedom.
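A short numerical sketch in Python makes this lattice version of asymptotic freedom concrete: it evaluates β₀ and β₁ from (18.33) and then the coupling g(a) of (18.38) as the spacing a shrinks. The choices N = 3 with nf = 0 (pure gauge theory) and the measurement of a in units of 1/Λ are illustrative assumptions, not values from the text.

    import math

    N, nf = 3, 0                                   # pure SU(3) gauge theory
    beta0 = (11 * N / 3 - 2 * nf / 3) / (4 * math.pi) ** 2   # eq. (18.33)
    beta1 = (34 * N**2 / 3 - 10 * N * nf / 3
             - (N**2 - 1) / N * nf) / (4 * math.pi) ** 4

    def g(a):
        """Two-term inverse relation (18.38); a is in units of 1/Lambda."""
        L = math.log(a ** -2)                      # ln(1/(a Lambda)^2)
        return (beta0 * L + (beta1 / beta0) * math.log(L)) ** -0.5

    for a in (0.1, 0.01, 0.001):                   # g shrinks slowly as a -> 0
        print(a, g(a))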
18.3 The Renormalization Group in Condensed-Matter Physics The study of condensed matter is concerned mainly with properties that emerge in the bulk, such as the melting point, the boiling point, or the conductivity. So we want to see what happens to the physics when we increase the distance scale many orders of magnitude beyond the size a of an individual molecule or the distance between nearest neighbors.
As a simple example, let's consider a euclidean action in d dimensions
\[ S = \int d^dx \left( \frac{1}{2} (\partial\phi)^2 + \sum_n g_n \phi^n \right) \tag{18.39} \]
in which g₂φ² = m²φ²/2 is a mass term and g₄φ⁴ = λφ⁴/24 is a quartic self-interaction. In terms of an ultra-violet cut-off Λ = 1/a, we may define a partition function
\[ Z(\Lambda) = \int_\Lambda e^{-S}\, D\phi \tag{18.40} \]
to be one in which the field
\[ \phi(x) = \int_\Lambda e^{ikx}\, \phi(k)\, \frac{d^dk}{(2\pi)^d} \tag{18.41} \]
only has Fourier coefficients φ(k) with k² < Λ². Corresponding to each such field φ(x), we introduce a "stretched" field
\[ \phi_L(x) = A(L)\, \phi(x/L) \qquad \text{for } L \ge 1 \tag{18.42} \]
in which A(L) is a scale factor that we will choose to keep the kinetic part of the action invariant. Since
\[ \phi_L(x) = A(L)\, \phi(x/L) = A(L) \int_\Lambda \exp\!\left( i\, \frac{kx}{L} \right) \phi(k)\, \frac{d^dk}{(2\pi)^d} \tag{18.43} \]
the momenta of the stretched field are reduced by 1/L. We may define a new partition function in which we integrate over the stretched fields φ_L(x)
\[ Z(\Lambda/L) = \int_{\Lambda/L} e^{-S}\, D\phi \equiv \int_\Lambda e^{-S}\, D\phi_L. \tag{18.44} \]
The kinetic action of a stretched field is
\[ S_k = \int d^dx\, \frac{A^2(L)}{2} \left( \frac{\partial \phi(x/L)}{\partial x} \right)^2 = \int d^d(x/L)\, L^d\, \frac{A^2(L)}{2} \left( \frac{\partial \phi(x/L)}{L\, \partial(x/L)} \right)^2 \tag{18.45} \]
and so if we choose
\[ A(L) = L^{-(d-2)/2} \tag{18.46} \]
then letting x′ = x/L, we find that the kinetic action S_k is invariant
\[ S_k = \int d^dx'\, \frac{1}{2} \left( \frac{\partial \phi(x')}{\partial x'} \right)^2. \tag{18.47} \]
The full action of a stretched field is
\[ S(\phi_L) = \int d^dx \left( \frac{1}{2} (\partial\phi)^2 + \sum_n g_n(L)\, \phi^n \right) \tag{18.48} \]
in which
\[ g_{d,n}(L) = L^d A^n(L)\, g_n = L^{d - n(d-2)/2}\, g_n. \tag{18.49} \]
The beta-function
\[ \beta(g_n) \equiv \frac{L\, dg_{d,n}(L)}{dL} = \left[ d - n(d-2)/2 \right] g_{d,n}(L) \tag{18.50} \]
is just the exponent of the coupling "constant" g_{d,n}(L). If it is positive, then the coupling constant g_{d,n}(L) gets stronger as L → ∞; such couplings are called relevant. Couplings with vanishing exponents are insensitive to changes in L and are marginal. Those with negative exponents shrink with increasing L; they are irrelevant. The coupling constant g_{d,n,p} of a term with p derivatives and n powers of the field φ in a space of d dimensions would vary as
\[ g_{d,n,p}(L) = L^d A^n(L)\, L^{-p}\, g_{n,p} = L^{d - n(d-2)/2 - p}\, g_{n,p}. \tag{18.51} \]
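The exponent in (18.51) sorts couplings at a glance. Here is a small sketch in Python that computes d − n(d−2)/2 − p and labels the coupling relevant, marginal, or irrelevant; the sample couplings are the mass and quartic terms of (18.39) in d = 4 and the cubic term of Example 18.3.

    def exponent(d, n, p=0):
        """Power of L multiplying a coupling with n fields and p derivatives,
        eq. (18.51)."""
        return d - n * (d - 2) / 2 - p

    def classify(d, n, p=0):
        e = exponent(d, n, p)
        kind = "relevant" if e > 0 else "marginal" if e == 0 else "irrelevant"
        return f"g(d={d}, n={n}, p={p}) scales as L^{e:g}: {kind}"

    print(classify(4, 2))   # mass term: L^2, relevant
    print(classify(4, 4))   # quartic term: L^0, marginal
    print(classify(3, 3))   # cubic with space-only stretching: L^1.5, relevant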
Example 18.3 (QCD) In quantum chromodynamics, there is a cubic term g f_abc A^a_0 A^b_i ∂^0 A^c_i which in effect looks like g f_abc φ^a φ^b φ̇^c. Is it relevant? Well, if we stretch space but not time, then the time derivative has no effect, and d = 3. So the cubic, n = 3, grows as L^{3/2}
\[ g_{3,3,0}(L) = L^{d - n(d-2)/2}\, g_{3,3,0} = L^{3/2}\, g_{3,3,0}. \tag{18.52} \]
Since this cubic term drives asymptotic freedom, its strengthening as space is stretched by the dimensionless factor L may point to a qualitative explanation of confinement. For if g_{3,3,0}(L) grows with distance as L^{3/2}, then αs(L) = g²_{3,3,0}(L)/4π grows as L³, and so the strength αs(Lr₀)/(Lr₀)² of the force between two quarks separated by a distance Lr₀ grows linearly with the distance
\[ F(Lr_0) = \frac{\alpha_s(Lr_0)}{(Lr_0)^2} = \frac{L^3\, \alpha_s(r_0)}{(Lr_0)^2} = Lr_0\, \frac{\alpha_s(r_0)}{r_0^3} \tag{18.53} \]
which is more than enough for quark confinement. On the other hand, if we stretch both space and time, then the cubic g_{4,3,1}(L) and quartic g_{4,4,0}(L) couplings are marginal.
18.4 Problems

1. Show that for µ² ≫ m², the vacuum polarization term (18.10) reduces to (18.12). Hint: Use ln(ab) = ln a + ln b when integrating.
2. Show that by choosing the energy scale Λ according to (18.26), one can derive (18.27) from (18.25).
3. Show that if we stretch both space and time, then in the notation of (18.51), the cubic g_{4,3,1}(L) and quartic g_{4,4,0}(L) couplings are marginal, that is, are independent of L.
19 Finance
19.1 Stocks

A share of common stock confers ownership of a fraction of the corporation that issued the stock. A round lot is 100 shares. Corporations that are profitable are assumed to eventually pay regular dividends to their shareholders. The dividends usually are paid quarterly. The annual dividend divided by the price of a share is the rate of return. This rate generally is less than the rate that bonds pay, but while the rate of a bond is fixed, corporations that make increasing amounts of money usually increase their dividends.

The worth of a share is the current value of the estimated total future stream of dividends. Of course, no one knows what the future stream of dividends is for any stock. So the worth of a share always is uncertain. Hence prices fluctuate.

One way to estimate the value of a share of stock is to divide its price by how much the fraction of the corporation represented by the share earns per year, its earnings per share (EPS); this is called the P/E ratio. Stocks that rapidly increase their earnings from year to year (or are thought to do so soon) tend to have P/E ratios of 20, 30, 50, or higher. Those in sunset industries have lower P/E's. The Standard & Poor's 500-stock index has a P/E averaged over the past 60 years of about 16 when measured against the past 12 months' reported earnings. When the earnings standard is switched to those forecast for the current year, the S&P average P/E drops to roughly 12. If the earnings are an average over the most recent decade, then the S&P's 60-year average P/E is about 21. These S&P P/E's are from Mark Gongloff, The Wall Street Journal, 19 May 2008.

Another measure of the worth of a share is the ratio of its P/E to its
annual rate of growth of earnings expressed as a percent. This is called the PEG ratio. A PEG ratio less than unity is a good sign. But since the future value of any stock depends upon future earnings, which are not knowable, no ratio or index is a reliable guide.

The best rule I know of is to restrict one's activity to a field in which one is a professional. A clothes designer, for example, can be good at estimating the future performance of corporations that make or sell clothes. A computer scientist can be good at judging the potential of companies that make or sell computer software or hardware. A physicist can have an edge in fields of his or her expertise. Biochemists and medical doctors can do well investing in biotech and pharmaceutical companies.

People buy and sell shares on the New York Stock Exchange, the NASDAQ, and other exchanges, some of which are mainly computers connected to the internet. In 2008, daily volume was about a billion shares on the NYSE and twice that on the NASDAQ. Unless you have a seat on an exchange that makes a market in the stock you want to buy or sell, you must pay a broker a commission to execute your transaction for you. These commissions are unregulated, so some firms charge much more than others. It is not a trivial matter to switch from one brokerage firm to another, so it makes sense to check out many firms before settling on one. Obviously, honesty is an important criterion in the choice of a broker and a firm. Service is another criterion. You should seek an honest, well-informed broker at a firm that charges reasonable commissions or an honest broker at a firm with minimal commissions.

An open limit order is an order to buy (or sell) a given number of shares at a fixed price or lower (higher). Limit orders allow one to take advantage of the inevitable fluctuations in the prices of stocks as time goes by. But if the stock you want is trading at $25 per share, and you put in a buy order at $23, you may get it at that price within a week or two, or the stock may rise to $35 and beyond without ever crossing $23. Another problem with limit orders to buy is that one can lose interest in the stock and forget about an open buy order, until a note arrives in the mail announcing the purchase of 200 shares of the stock at a price that once looked cheap.

Limit orders to sell also can be dangerous. Last week, for instance, the corporation Energy Conversion Devices (symbol ENER), which had suffered innumerable quarters of losses, suddenly reported earnings that were much better than expected. My open order to sell 100 shares at $40 or better executed hours before I got the good news. Fortunately, ENER's price rose to $49 so fast that my shares didn't get sold until the price was $44, and even more fortunately, my broker passed on to me the full $44 minus the commission. But since I bought the
shares with an open limit order of $20, I don’t feel so bad about missing the extra $5. However, ENER closed today at $55.20.
19.2 Mutual Funds

A mutual fund is a company that uses money invested by the owners of its shares to buy the shares of many corporations. There are many private companies that sell many kinds of mutual funds. A traditional mutual fund has a person or group of people who study a variety of corporations and pick those most likely to rise in price over a given span of time. If they are right or lucky, then the mutual fund and its shareholders do very well. If they are wrong, then the mutual fund still does pretty well, but the shareholders suffer. Still, because a mutual fund invests in a wide variety of stocks, its performance is moderated by the diversity of its investments. So people rarely lose their shirts in a mutual fund. But because the research staff and the gurus who pick the stocks and decide when to buy and sell also decide how much they themselves are paid, their salaries tend to be handsome and reduce the return to the shareholders. If the gurus are good or lucky, then their salaries don't matter. The problem for investors is picking the right mutual fund.

Because most traditional mutual funds don't do any better than the Dow Jones Industrial Average, the Standard & Poor's average of 500 corporations, or any other index, John Bogle, while a senior at Princeton University, invented index funds in his senior thesis. An index fund tries to do exactly as well as the index that it seeks to approximate. It buys shares in enough corporations that are included in the index to follow the index up and down. Since it needs minimal research and no gurus, its costs are very low. And it will do almost exactly as well as the index it tracks. So an investor who can't pick a good mutual fund should buy an index fund.

In recent years, index funds have arisen that follow narrow sections of the stock market. Vanguard, the firm Bogle started but since left, offers at least 22 index funds, including ones focusing on developed markets, emerging markets, Pacific markets, REITs, and small-cap growth, among others. Some index funds are traded on stock exchanges like stocks; these are called exchange-traded funds (ETFs). They don't charge annual fees; one only pays commissions like those for buying and selling stocks.
19.3 Bonds

A bond is a contract to pay the bondholder a specified sum each year, the interest, and to repay in full the cost of the bond when its term ends. The ratio of the annual sum to the cost of the bond is the nominal interest rate of the bond. This is the actual rate that a person who bought the bond at the beginning of its term is paid. People also buy and sell bonds throughout their terms at prices that fluctuate.

If the institution that issued a given bond falls on hard times, then the price of that bond will drop to a value that would tempt buyers to assume the additional risk that the institution might default by not paying the interest or the principal. A person buying that bond at the reduced price would be promised a higher actual rate of interest because his cost would be less than the initial price of the bond. The price of a bond issued by a very sound corporation also will fall when general interest rates rise, and will rise when they fall. In each case, the new price will bring the actual interest rate close to the rates of other similar bonds.

Many institutions issue bonds. Corporations issue corporate bonds. They are taxable. A corporation may call, that is, may redeem, any of its bonds at any time by repaying the bondholder the initial cost of the bond. Some corporate bonds carry the right of the bondholder to convert the bond into shares of the corporation's stock within a specified period of time at a specified price or ratio; these are called convertible bonds. The U.S. government issues various kinds of bonds. These bonds are exempt from state and local taxes, and they used to be considered the safest investments. They are not callable. States, cities, and localities issue municipal bonds; their interest payments are exempt from federal taxes and also from state and local taxes, if the holder lives in the state and locality. They are not callable. In 2008, the total value of all U.S. municipal bonds was $2.5 trillion.
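The inverse relation between a bond's price and its actual rate of interest is simple arithmetic. Here is a minimal sketch in Python; the $1,000 face value, $50 annual payment, and $800 distressed price are invented numbers for illustration, not figures from the text.

    def current_rate(annual_payment, price):
        """Actual rate of interest for a buyer who pays `price` for the bond."""
        return annual_payment / price

    coupon = 50.0                        # the bond pays $50 each year
    print(current_rate(coupon, 1000.0))  # 0.05: the nominal rate at issue
    print(current_rate(coupon, 800.0))   # 0.0625: higher after the price falls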
19.4 Derivatives

An investment whose value depends on a different asset is called a derivative. In recent years, they have sprouted like weeds in wet, warm weather. The simplest derivatives are options to buy (a call) or sell (a put) a given asset at a given price within a specified period of time. Since the hardest part of trading stocks is knowing when to buy or sell, calls and puts should be used with skill and care. Many prudent long-term investors sell calls on stock they own; they thus earn an extra return on their shares even when
the shares are going nowhere or down. Of course, if the price of the shares on which one has sold a call should rise dramatically during the period of the call, then one loses out on that gain.

Professional investors often get word of financial events before the rest of us. Suppose company X, trading at $20 per share, is about to buy company Y at a bid price of $30 per share, thus offering Y's shareholders an inducement of $10 per share over Y's current price of $20. A smart play might be to sell 10,000 shares of X while buying 10,000 shares of Y in the expectation that the price of a share of Y would rise to X's bid price of $30 per share, and that X's shares would hold steady or fall. Such trading is called arbitrage. If X's purchase of Y is a sure thing, such arbitrage will drive Y's price to within a fraction of the bid price of $30 per share. Such arbitrage would naturally involve selling shares of X to generate cash for the purchase of Y, and so the price of X would tend to drop. The arbitrageurs would win on both sides of their deals.

Other options involve currencies and commodities. One should trade only in assets one knows very well. Extremely complicated derivatives have been invented and traded over the past decade. Most MBAs have no chance of understanding them, but they do provide jobs for mathematicians and theoretical physicists.
19.5 Inequality

For a billion human beings, just eating every day is a challenge. Even in the USA, inequality is striking. In 2005, the wealthiest 1% earned 21.2% of all income, while the bottom 50% earned 12.8% (Greg Ip, The Wall Street Journal, 12 October 2007). The US system of taxation is said to be progressive, but in fact there are many loopholes that only the wealthy can use. Even worse, the FICA tax is applied only to about the first $100,000 of income. The average working stiff pays more in FICA tax than in income tax. And it is applied twice: one's wages are taxed, and one's employer is taxed an equal amount. A person earning $1,000,000 per year pays exactly the same FICA tax as one earning $100,000.

This growing inequity in wages and in taxation has robbed US consumers of their purchasing power. US consumers are broke. That is the cause of the current financial crisis.
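The capped FICA tax is easy to model. In the sketch below, in Python, the $100,000 cap comes from the text; the 6.2% employee rate is my assumption for illustration, since the text does not quote a rate.

    def fica_tax(income, cap=100_000.0, rate=0.062):
        """Employee share of FICA: the rate applies only up to the wage cap.
        The 6.2% rate is an assumed illustrative value."""
        return rate * min(income, cap)

    print(fica_tax(100_000))    # 6200.0
    print(fica_tax(1_000_000))  # 6200.0: the same tax on ten times the income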
19.6 Credit

Suppose a man deposits $100 in a bank. He then has a credit of $100. Suppose banks are required to retain as reserves 10% of their deposits and are free to lend the other 90%. The bank getting the $100 deposit could then lend out $90 to a borrower. That borrower could then deposit his or her $90 in a bank. That bank, which could be the same as the first bank, could then lend $81 to another borrower. So far, three people have credits of $100 + $90 + $81 = $271. This multiplication of money is the miracle of credit. It is one reason why there are so many banks.

More exactly and generally, if P is the original sum deposited and r is the fraction of deposits that banks must retain as reserves, then the total credit is
\[ P + P(1-r) + P(1-r)^2 + \cdots = P \sum_{n=0}^{\infty} (1-r)^n = \frac{P}{1 - (1-r)} = \frac{P}{r}. \tag{19.1} \]
So the total credit produced by an initial deposit P is P/r. In the numerical example, an initial deposit of P = $100 with r = 10% can produce total credits of P/r = $1000. The lower the reserve requirement, the more credit banks can extend and the more they get as interest. A reserve requirement of 1% would lead to total credits of $10,000! Since banks can charge a higher rate of interest on money they lend than they must pay to their depositors, bank profits soar as r → 0. This prospect of soaring profits is why banks hate regulations.

Incidentally, the sum of all the funds held in reserve in the various banks due to an initial deposit P is
\[ Pr + (1-r)Pr + (1-r)^2 Pr + \cdots = Pr \sum_{n=0}^{\infty} (1-r)^n = P \tag{19.2} \]
the original deposit P itself.
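A few lines of Python confirm the geometric series (19.1) and (19.2) by iterating the lending round by round; the 10% reserve requirement is the one from the text's example.

    P, r = 100.0, 0.10          # initial deposit and reserve fraction
    credit = reserves = 0.0
    deposit = P
    for _ in range(500):        # enough rounds for the series to converge
        credit += deposit       # each round's deposit is a new credit
        reserves += r * deposit # the bank holds back the reserve fraction
        deposit *= (1 - r)      # and lends out the rest, to be redeposited
    print(credit, P / r)        # both about 1000, as in eq. (19.1)
    print(reserves, P)          # both about 100, as in eq. (19.2)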
The work of Didier Sornette is of particular interest. He applies some of the techniques of field theory to economics; he used the renormalization group to predict the tops of the recent oil and real-estate bubbles. Most of his papers are on the arXiv.
20 Strings
20.1 The Infinities of Quantum Field Theory

Quantum field theory is plagued with infinities. Even exactly soluble theories of free fields have ground-state energies that diverge quartically, as +Λ⁴ for a theory of bosons and as −Λ⁴ for a theory of fermions, as Λ → ∞. Theories with the same number of boson and fermion fields are less divergent, and theories with unbroken supersymmetry actually have ground-state energies that vanish. But nobody knows how to make a field theory that is both realistic and finite.

The reason for these infinities probably lies in the point-like nature of the elementary particles whose interactions they seek to describe. In any case, products φⁿ(x) of quantum fields at the same point x of space-time are mathematically meaningless for n ≥ 2. One might think that the answer would be to smooth out the space-time point x, but it is hard to blur the point x in a way that is small and that also preserves the Lorentz invariance of the theory. For instance, one might try
\[ \int \phi(x)^2\, d^4x \to \iint \phi(x)\, e^{-|m^2 (x-y)^2|}\, \phi(y)\, d^4x\, d^4y \tag{20.1} \]
where m > 0 is some mass. But there are points vastly separated in space and in time for which the Lorentz inner product (x − y)² = 0. So this recipe blurs too much.
20.2 The Nambu-Goto String Action

What may work is to give up the idea of a point particle and to step up to the next simplest choice, a one-dimensional string. Let's use 0 ≤ σ ≤ σ₁ and τᵢ ≤ τ ≤ τ_f to parametrize the space-time coordinates X^µ(σ, τ) of the
string. Nambu and Goto suggested using as the action the area
\[ S = -\frac{T_0}{c} \int_{\tau_i}^{\tau_f} \int_0^{\sigma_1} \sqrt{\left( \dot{X} \cdot X' \right)^2 - \dot{X}^2 (X')^2}\; d\sigma\, d\tau \tag{20.2} \]
in which
\[ \dot{X}^\mu = \frac{\partial X^\mu}{\partial \tau} \qquad \text{and} \qquad X'^\mu = \frac{\partial X^\mu}{\partial \sigma} \tag{20.3} \]
and a Lorentz metric η_{µν} = diag(−1, 1, 1, …) is used to form the inner products
\[ \dot{X} \cdot X' = \dot{X}^\mu\, \eta_{\mu\nu}\, X'^\nu \qquad \text{etc.} \tag{20.4} \]
This action is the area swept out by a string of length σ₁ in time τ_f − τᵢ. If Ẋ dτ = dt points in the time direction and X′ dσ = dr points in a spatial direction, then it is easy to see that Ẋ · X′ = 0, that −(Ẋ dτ)² = dt², and that (X′ dσ)² = dr². So in this simple case, the action (20.2)
\[ S = -\frac{T_0}{c} \int_{t_i}^{t_f} \int_0^{r_1} dt\, dr = -\frac{T_0}{c}\, (t_f - t_i)\, r_1 \tag{20.5} \]
is the area the string sweeps out. The other term within the square root ensures that the action is the area for all Ẋ and X′ and that it is invariant under reparametrizations σ → σ′ and τ → τ′. The equation of motion for the relativistic string follows from the requirement that the action (20.2) be stationary, δS = 0. Since
\[ \delta \dot{X}^\mu = \delta\, \frac{\partial X^\mu}{\partial \tau} = \frac{\partial (X^\mu + \delta X^\mu)}{\partial \tau} - \frac{\partial X^\mu}{\partial \tau} = \frac{\partial\, \delta X^\mu}{\partial \tau} \tag{20.6} \]
and similarly
\[ \delta X'^\mu = \frac{\partial\, \delta X^\mu}{\partial \sigma} \tag{20.7} \]
we may express the change in the action in terms of derivatives of the Lagrange density
\[ L = -\frac{T_0}{c} \sqrt{\left( \dot{X} \cdot X' \right)^2 - \dot{X}^2 (X')^2} \tag{20.8} \]
as
\[ \delta S = \int_{\tau_i}^{\tau_f} \int_0^{\sigma_1} \left( \frac{\partial L}{\partial \dot{X}^\mu}\, \frac{\partial\, \delta X^\mu}{\partial \tau} + \frac{\partial L}{\partial X'^\mu}\, \frac{\partial\, \delta X^\mu}{\partial \sigma} \right) d\tau\, d\sigma. \tag{20.9} \]
Its derivatives, which we'll call P^τ_µ and P^σ_µ, are
\[ P^\tau_\mu = \frac{\partial L}{\partial \dot{X}^\mu} = -\frac{T_0}{c}\, \frac{(\dot{X} \cdot X')\, X'_\mu - (X')^2\, \dot{X}_\mu}{\sqrt{\left( \dot{X} \cdot X' \right)^2 - \dot{X}^2 (X')^2}} \tag{20.10} \]
and
\[ P^\sigma_\mu = \frac{\partial L}{\partial X'^\mu} = -\frac{T_0}{c}\, \frac{(\dot{X} \cdot X')\, \dot{X}_\mu - \dot{X}^2\, X'_\mu}{\sqrt{\left( \dot{X} \cdot X' \right)^2 - \dot{X}^2 (X')^2}}. \tag{20.11} \]
In terms of them, the change in the action is
\[ \delta S = \int_{\tau_i}^{\tau_f} \int_0^{\sigma_1} \left[ \frac{\partial}{\partial \tau} \left( \delta X^\mu P^\tau_\mu \right) + \frac{\partial}{\partial \sigma} \left( \delta X^\mu P^\sigma_\mu \right) - \delta X^\mu \left( \frac{\partial P^\tau_\mu}{\partial \tau} + \frac{\partial P^\sigma_\mu}{\partial \sigma} \right) \right] d\tau\, d\sigma. \tag{20.12} \]
The total τ-derivative integrates to a term involving the variation δX^µ, which we require to vanish at the initial and final values of τ. So we drop that term and find that the net change in the action is
\[ \delta S = \int_{\tau_i}^{\tau_f} \Big[ \delta X^\mu P^\sigma_\mu \Big]_0^{\sigma_1}\, d\tau - \int_{\tau_i}^{\tau_f} \int_0^{\sigma_1} \delta X^\mu \left( \frac{\partial P^\tau_\mu}{\partial \tau} + \frac{\partial P^\sigma_\mu}{\partial \sigma} \right) d\tau\, d\sigma. \tag{20.13} \]
Thus the equations of motion for the string are
\[ \frac{\partial P^\tau_\mu}{\partial \tau} + \frac{\partial P^\sigma_\mu}{\partial \sigma} = 0 \tag{20.14} \]
but the action is stationary only if the boundary condition
\[ \delta X^\mu(\tau, \sigma_1)\, P^\sigma_\mu(\tau, \sigma_1) - \delta X^\mu(\tau, 0)\, P^\sigma_\mu(\tau, 0) = 0 \tag{20.15} \]
is satisfied for all τ. In effect, this is a condition on the ends of open strings; closed strings satisfy it automatically. Usually the boundary condition (20.15) is interpreted as 2D = 2(d + 1) conditions, one for each end σ* of the string and each dimension µ of space-time:
\[ \delta X^\mu(\tau, \sigma_*)\, P^\sigma_\mu(\tau, \sigma_*) = 0 \qquad \text{no sum over } \mu. \tag{20.16} \]
A Dirichlet boundary condition fixes a spatial component at an end of the string by
\[ \dot{X}^i(\tau, \sigma_*) = 0 \tag{20.17} \]
or equivalently by δX^i(τ, σ*) = 0. The time component X⁰ cannot have a
vanishing τ derivative, so it must obey a free-endpoint boundary condition
\[ P^\sigma_\mu(\tau, \sigma_*) = 0 \tag{20.18} \]
which also may apply to any dimension and any end.

20.3 Regge Trajectories
The quantity P^τ_µ(τ, σ) defined as the derivative (20.10) turns out to be the momentum density of the string. The angular momentum M₁₂ of a string rigidly rotating in the x, y plane is
\[ M_{12}(\tau) = \int_0^{\sigma_1} \left[ X_1 P^\tau_2(\tau, \sigma) - X_2 P^\tau_1(\tau, \sigma) \right] d\sigma. \tag{20.19} \]
In a parametrization of the string with τ = t and dσ proportional to the energy density dE of the string, the x, y coordinates of the string are
\[ \vec{X}(t, \sigma) = \frac{\sigma_1}{\pi}\, \cos\frac{\pi\sigma}{\sigma_1} \left( \cos\frac{\pi c t}{\sigma_1},\ \sin\frac{\pi c t}{\sigma_1} \right). \tag{20.20} \]
The x, y components of the momentum density are
\[ \vec{P}^\tau(t, \sigma) = \frac{T_0}{c^2}\, \frac{\partial \vec{X}}{\partial t} = \frac{T_0}{c}\, \cos\frac{\pi\sigma}{\sigma_1} \left( -\sin\frac{\pi c t}{\sigma_1},\ \cos\frac{\pi c t}{\sigma_1} \right). \tag{20.21} \]
The angular momentum (20.19) is then given by the integral
\[ M_{12} = \frac{T_0\, \sigma_1}{\pi c} \int_0^{\sigma_1} \cos^2\frac{\pi\sigma}{\sigma_1}\; d\sigma = \frac{\sigma_1^2\, T_0}{2\pi c}. \tag{20.22} \]
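One can verify the integral (20.22) numerically. The following lines are a minimal sketch in Python: they build the angular-momentum density X₁P₂^τ − X₂P₁^τ from (20.20) and (20.21), integrate it over σ with the midpoint rule, and compare with σ₁²T₀/2πc; the values σ₁ = T₀ = c = 1 are arbitrary illustrative units.

    import math

    sigma1 = T0 = c = 1.0                    # illustrative units
    n = 10_000
    h = sigma1 / n

    def density(s, t=0.0):
        """Angular-momentum density X1*P2 - X2*P1 from (20.20) and (20.21)."""
        amp = math.cos(math.pi * s / sigma1)
        ph = math.pi * c * t / sigma1
        X1 = (sigma1 / math.pi) * amp * math.cos(ph)
        X2 = (sigma1 / math.pi) * amp * math.sin(ph)
        P1 = -(T0 / c) * amp * math.sin(ph)
        P2 = (T0 / c) * amp * math.cos(ph)
        return X1 * P2 - X2 * P1

    M12 = sum(density((k + 0.5) * h) for k in range(n)) * h   # midpoint rule
    print(M12, sigma1**2 * T0 / (2 * math.pi * c))            # both ~0.15915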
Now the parametrization dσ ∝ dE implies that σ₁ ∝ E, and in fact the energy of the string is E = T₀σ₁. Thus the angular momentum J = |M₁₂| of a classical relativistic string is proportional to the square of its total energy
\[ J = \frac{E^2}{2\pi T_0\, c}. \tag{20.23} \]
This rule is obeyed by many meson and baryon resonances. The nucleon and five baryon resonances fit it with nearly the same value of the string tension
\[ T_0 \approx 0.92\ \mathrm{GeV/fm} \tag{20.24} \]
as shown by Fig. 20.1, which displays the Regge trajectories of the N and ∆ resonances on a single curve. Other N and ∆ resonances, however, do not fall on this curve.
Figure 20.1 The angular momentum and energy of the nucleon and five baryon resonances, N(938), ∆(1232), N(1680), ∆(1950), N(2220), and ∆(2420), approximately fit the curve J = E²/(2πT₀c) with string tension T₀ = 0.92 GeV/fm. The vertical axis is the angular momentum J/ħ, from 0 to 6; the horizontal axis is the energy E = mc² (GeV) of the baryon resonance, from 1 to 2.5.
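A rough numerical check of this fit, sketched in Python below: it converts the string tension (20.24) into the Regge slope 1/(2πT₀ħc) and evaluates J/ħ from (20.23) for the six resonances named in Fig. 20.1. The masses in GeV are read off from the resonance labels, and the agreement is only approximate, as the text says.

    import math

    hbar_c = 0.19733                    # GeV * fm
    T0 = 0.92                           # GeV / fm, the string tension (20.24)
    slope = 1.0 / (2 * math.pi * T0 * hbar_c)   # Regge slope in GeV^-2

    # resonance masses in GeV and spins J/hbar, from the labels in Fig. 20.1
    resonances = {"N(938)": (0.938, 0.5), "Delta(1232)": (1.232, 1.5),
                  "N(1680)": (1.680, 2.5), "Delta(1950)": (1.950, 3.5),
                  "N(2220)": (2.220, 4.5), "Delta(2420)": (2.420, 5.5)}

    for name, (E, J) in resonances.items():
        print(name, "J =", J, " string model:", round(slope * E**2, 2))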
A string theory of hadrons took off in 1968 when Gabriele Veneziano published his amplitude for π + π scattering as a sum of three Euler beta functions (Veneziano, 1968). But after eight years of intense work, this effort was largely abandoned with the discovery of quarks at SLAC and the promise of QCD as a theory of the strong interactions. In 1974, Joël Scherk and John H. Schwarz proposed increasing the string tension by 38 orders of magnitude so as to use strings to make a quantum theory that included gravity (Scherk and Schwarz, 1974). They identified the graviton as an excitation of the closed string.

20.4 Quantized Strings

The coordinates X^µ may be quantized with commutation relations, most easily in light-cone coordinates. The resulting relativistic bosonic string must live in a space-time of exactly D = d + 1 = 26 dimensions, with a tachyon. But if one adds fermionic variables ψ₁^µ(τ, σ) and ψ₂^µ(τ, σ) in a supersymmetric way, then the tachyon goes away, and the number of space-time dimensions drops to exactly 10. There are five distinct superstring theories: types I, IIA, and IIB; E8 ⊗ E8 heterotic; and SO(32) heterotic. All five may be related to a single theory in 11 dimensions called M-theory,
which is not a string theory. M-theory contains membranes (2-branes) and 5-branes, which are not D-branes.

20.5 D-Branes

Dirichlet boundary conditions (20.17) require the ends of a string to be attached to spatial manifolds, which are called D-branes after Dirichlet. If the manifold to which the string is stuck has p dimensions, then it's called a Dp-brane. Figure 20.2 shows a string whose ends are free to move only within a D2-brane.

Figure 20.2 A string stuck on a D2-brane.

Dp-branes offer a natural way to explain the extra six dimensions required in a universe of superstrings. One imagines that the ends of all the strings are free to move only in our four-dimensional space-time; the strings are stuck on a D3-brane, which is the three-dimensional space of our physical universe. The tension of the superstring then keeps it from wandering far enough into the extra six spatial dimensions for us ever to have noticed.

This explanation conflicts, however, with the generally accepted interpretation of gravity in string theory. In that standard view, the graviton and the spin-3/2 gravitino are modes of the closed string, are not attached to any D-brane, and propagate in all d space dimensions. Thus if the extra space dimensions were huge, then the gravitational force, which is mainly due to the graviton, would fall off with the distance r as 1/r^{d−1} = 1/r⁸ for superstrings (1/r²⁴ for bosonic strings), both much faster than 1/r².

20.6 String-String Scattering

Strings interact by joining and by breaking. Figure 20.3 shows two open strings joining to form one open string and then breaking into three open strings. Figure 20.4 shows two closed strings joining to form one closed string and then breaking into two closed strings. Because the fundamental objects of string theory are extended, string theory is intrinsically free of ultraviolet divergences. It provides a finite theory of quantum gravity. Readers may learn much more about strings by reading Barton Zwiebach's excellent textbook A First Course in String Theory (Zwiebach, 2004).

20.7 Riemann Surfaces and Moduli

A homeomorphism is a map that is one to one and continuous with a continuous inverse. A Riemann surface is a two-dimensional real manifold
whose open sets Uα are mapped onto open sets of the complex plane C by homeomorphisms zα whose transition functions zα ◦ zβ⁻¹ are analytic on
the intersections Uα ∩ Uβ . Two Riemann surfaces are equivalent if they are related by a continuous analytic map that is one to one and onto. A parameter that distinguishes a Riemann surface from other, inequivalent Riemann surfaces is called a modulus. Some Riemann surfaces have several moduli; others have one modulus; others none at all. Some moduli are continuous parameters, and others are discrete.
Figure 20.3 A space-time diagram of 2 → 3 string-string scattering for open strings.
Figure 20.4 A space-time diagram of 2 → 2 string-string scattering for closed strings.
References
Aitken, A. C. 1959. Determinants and Matrices. Edinburgh and London: Oliver and Boyd.
Alberts, Bruce, Johnson, Alexander, Lewis, Julian, Raff, Martin, Roberts, Keith, and Walter, Peter. 2008. Molecular Biology of the Cell. 5th edn. New York City: Garland Science. Page 246.
Arnold, V. I. 1989. Mathematical Methods of Classical Mechanics. 2d edn. New York: Springer. 5th printing. Chap. 7.
Autonne, L. 1915. Sur les Matrices Hypohermitiennes et sur les Matrices Unitaires. Ann. Univ. Lyon, Nouvelle Série I, Fasc. 38, 1–77.
Bigelow, Matthew S., Lepeshkin, Nick N., and Boyd, Robert W. 2003. Superluminal and Slow Light Propagation in a Room-Temperature Solid. Science, 301(5630), 200–202.
Bordag, Michael, Klimchitskaya, Galina Leonidovna, Mohideen, Umar, and Mostepanenko, Vladimir Mikhaylovich. 2009. Advances in the Casimir Effect. Oxford, UK: Oxford University Press.
Bouchaud, Jean-Philippe, and Potters, Marc. 2003. Theory of Financial Risk and Derivative Pricing. 2d edn. Cambridge, UK: Cambridge University Press.
Boyd, Robert W. 2000. Nonlinear Optics. 2d edn. Academic Press.
Cahill, Peter, and Cahill, Kevin. 2006. Learning about Spin-One-Half Fields. Eur. J. Phys., 27, 29–47.
Cantelli, F. P. 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 221–424.
Carroll, Sean. 2003. Spacetime and Geometry: An Introduction to General Relativity. Benjamin Cummings.
Cohen-Tannoudji, Claude, Diu, Bernard, and Laloë, Frank. 1977. Quantum Mechanics. Hermann & John Wiley.
Courant, Richard. 1937. Differential and Integral Calculus, Vol. I. New York: Interscience.
Courant, Richard, and Hilbert, David. 1955. Methods of Mathematical Physics, Vol. I. New York: Interscience.
Creutz, Michael. 1983. Quarks, Gluons, and Lattices. Cambridge University Press.
DeWitt, Bryce S. 1967. Quantum Theory of Gravity. II. The Manifestly Covariant Theory. Phys. Rev., 162(5), 1195–1239.
Faddeev, L. D., and Popov, V. N. 1967. Feynman diagrams for the Yang-Mills field. Phys. Lett. B, 25(1), 29–30.
Feller, William. 1966. An Introduction to Probability Theory and Its Applications. Vol. II. Wiley.
Feller, William. 1968. An Introduction to Probability Theory and Its Applications. 3d edn. Vol. I. Wiley.
Feynman, Richard P., and Hibbs, A. R. 1965. Quantum Mechanics and Path Integrals. New York: McGraw-Hill.
Frieman, Joshua A., Turner, Michael S., and Huterer, Dragan. 2008. Dark Energy and the Accelerating Universe. Ann. Rev. Astron. Astrophys., 46, 385–432. arXiv:0803.0982v1 [astro-ph].
Gattringer, Christof, and Lang, Christian B. 2010. Quantum Chromodynamics on the Lattice: An Introductory Presentation. Springer (Lecture Notes in Physics).
Gehring, G. M., Schweinsberg, A., Barsi, C., Kostinski, N., and Boyd, R. W. 2006. Observation of Backwards Pulse Propagation Through a Medium with a Negative Group Velocity. Science, 312(5775), 895–897.
Gelfand, Israel M. 1961. Lectures on Linear Algebra. New York: Interscience.
Gell-Mann, Murray. 1994. The Quark and the Jaguar. New York: W. H. Freeman.
Gell-Mann, Murray. 2008. Plectics. Lectures at the University of New Mexico.
Georgi, H. 1999. Lie Algebras in Particle Physics. 2d edn. Reading, MA: Perseus Books.
Glauber, Roy J. 1963a. Coherent and Incoherent States of the Radiation Field. Phys. Rev., 131(6), 2766–2788.
Glauber, Roy J. 1963b. The Quantum Theory of Optical Coherence. Phys. Rev., 130(6), 2529–2539.
Glivenko, V. 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 92–99.
Gnedenko, B. V. 1968. The Theory of Probability. New York, NY: Chelsea Publishing Co.
Holland, John H. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press.
Ince, E. L. 1956. Integration of Ordinary Differential Equations. 7th edn. Edinburgh: Oliver and Boyd, Ltd. Chap. 1.
James, F. 1994. RANLUX: A Fortran implementation of the high-quality pseudorandom number generator of Lüscher. Comp. Phys. Comm., 79, 110.
Knuth, Donald E. 1981. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 2d edn. Reading, MA: Addison-Wesley.
Kolmogorov, Andrei Nikolaevich. 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 83–91.
Langevin, Paul. 1908. Sur la théorie du mouvement brownien. Comptes Rend. Acad. Sci. Paris, 146, 530–533.
Lifshitz, E. M. 1956. The Theory of Molecular Attractive Forces between Solids. Sov. Phys. JETP, 2.
Lüscher, M. 1994. A portable high-quality random number generator for lattice field theory simulations. Comp. Phys. Comm., 79, 100.
Metropolis, Nicholas, Rosenbluth, Arianna W., Rosenbluth, Marshall N., Teller, Augusta H., and Teller, Edward. 1953. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys., 21(6), 1087–1092.
Milonni, Peter W., and Shih, M.-L. 1992. Source theory of the Casimir force. Phys. Rev. A, 45(7), 4241–4253.
Morse, Philip M., and Feshbach, Herman. 1953. Methods of Theoretical Physics. Vol. I. New York: McGraw-Hill.
Parsegian, Adrian. 1969. Energy of an Ion crossing a Low Dielectric Membrane: Solutions to Four Relevant Electrostatic Problems. Nature, 221, 844–846.
Pathria, R. K. 1972. Statistical Mechanics. Oxford: Pergamon. Ch. 13.
Pearson, Karl. 1900. On the Criterion that a Given System of Deviations from the Probable in the Case of Correlated System of Variables is such that it can be Reasonably Supposed to have Arisen from Random Sampling. Phil. Mag., 50(5), 157–175.
Riley, Ken, Hobson, Mike, and Bence, Stephen. 2006. Mathematical Methods for Physics and Engineering. 3d edn. Cambridge: Cambridge University Press.
Roe, Byron P. 2001. Probability and Statistics in Experimental Physics. New York: Springer.
Saito, Mutsuo, and Matsumoto, Makoto. 2007. http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html.
Sakurai, J. J. 1982. Advanced Quantum Mechanics. 1st edn. Addison Wesley. 9th printing. Pages 62–63.
Scherk, Joël, and Schwarz, John H. 1974. Dual Models for Non-Hadrons. Nucl. Phys., B81, 118.
Schmitt, Lothar M. 2001. Theory of Genetic Algorithms. Theoretical Computer Science, 259, 1–61.
Schutz, Bernard. 1980. Geometrical Methods of Mathematical Physics. Cambridge, UK: Cambridge University Press.
Schwinger, Julian, Deraad, Lester L., Jr., Milton, Kimball A., and Tsai, Wu-yang. 1998. Classical Electrodynamics. Westview Press.
Smirnov, N. V. 1939. Estimation of the deviation between empirical distribution curves for two independent random samples. Bull. Moscow State Univ., 2(2), 3–14.
Stakgold, Ivar. 1967. Boundary Value Problems of Mathematical Physics, Vol. I. New York: Macmillan.
Titulaer, U. M., and Glauber, R. J. 1965. Correlation Functions for Coherent Fields. Phys. Rev., 140(3B), B676–682.
Veneziano, Gabriele. 1968. Construction of a Crossing-Symmetric Regge-Behaved Amplitude for Linearly Rising Regge Trajectories. Nuovo Cim., 57A, 190.
Vose, Michael D. 1999. The Simple Genetic Algorithm: Foundations and Theory. Cambridge, MA: MIT Press.
Weinberg, S. 1995. The Quantum Theory of Fields. Vol. I: Foundations. Cambridge, UK: Cambridge University Press.
Weinberg, S. 1996. The Quantum Theory of Fields. Vol. II: Modern Applications. Cambridge, UK: Cambridge University Press.
Weinberg, Steven. 1972. Gravitation and Cosmology. John Wiley & Sons.
Weinberg, Steven. 1988. The First Three Minutes. New York City: Basic Books.
Whittaker, E. T., and Watson, G. N. 1927. A Course of Modern Analysis. 4th edn. Cambridge, UK: Cambridge University Press.
Wright, Ned. 2006. A Cosmology Calculator for the World Wide Web. Publications of the Astronomical Society of the Pacific, 118(850), 1711–1715. www.astro.ucla.edu/~wright/CosmoCalc.html.
Zee, Anthony. 2010. Quantum Field Theory in a Nutshell. 2d edn. Princeton University Press.
Zwiebach, Barton. 2004. A First Course in String Theory. Cambridge, UK: Cambridge University Press.