Encyclopedia of Mathematical Physics Vol.2 D-H Ed. Fran Oise Et Al
February 18, 2017 | Author: David Iveković | Category: N/A
D

Deformation Quantization
A C Hirshfeld, Universität Dortmund, Dortmund, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction

Deformation quantization is an alternative way of looking at quantum mechanics. Some of its techniques were introduced by the pioneers of quantum mechanics, but it was first proposed as an autonomous theory in a paper in Annals of Physics (Bayen et al. 1978). More recent reviews treat modern developments (Hirshfeld and Henselder 2002a, Dito and Sternheimer 2002, Zachos 2001). Deformation quantization concentrates on the central physical concepts of quantum theory: the algebra of observables and their dynamical evolution. Because it deals exclusively with functions of phase-space variables, its conceptual break with classical mechanics is less severe than in other approaches. It also gives a precise formulation of the correspondence principle, which played such an important role in the historical development.

Although this article deals mainly with nonrelativistic bosonic systems, deformation quantization is much more general. For the inclusion of fermions and the Dirac equation, see Hirshfeld and Henselder (2002b). The fermionic degrees of freedom may, in special cases, be obtained from the bosonic ones by supersymmetric extension (Hirshfeld et al. 2004). For applications to field theory, see Hirshfeld and Henselder (2002c). For the relation to Hopf algebras, see Hirshfeld and Henselder (2003), and for the relation to geometric algebra, see Hirshfeld et al. (2005).

The observables of a physical system, such as the Hamilton function, are smooth real-valued functions on phase space. Physical quantities of the system at some time, such as the energy, are calculated by evaluating the Hamilton function at the point $x_0 = (q_0, p_0)$ in phase space that characterizes the state of the system at this time (we assume, for the moment, a one-particle system). The mathematical expression for this operation is
\[
E = \int H(q,p)\,\delta^{(2)}(q - q_0,\, p - p_0)\,dq\,dp \tag*{[1]}
\]
where $\delta^{(2)}$ is the two-dimensional Dirac delta function. The observables of the dynamical system are functions on the phase space, the states of the system are positive functionals on the observables (here the Dirac delta functions), and we obtain the value of the observable in a definite state by the operation shown in eqn [1].

In general, functions on a manifold are multiplied by each other in a pointwise manner, that is, given two functions $f$ and $g$, their product $fg$ is the function
\[
(fg)(x) = f(x)\,g(x) \tag*{[2]}
\]
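In the context of classical mechanics, the observables build a commutative algebra, called the commutative "classical algebra of observables." In Hamiltonian mechanics there is another way to combine two functions on phase space in such a way that the result is again a function on the phase space, namely by using the Poisson bracket
\[
\{f, g\}(q,p) = \sum_{i=1}^{n}\left(\frac{\partial f}{\partial q_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q_i}\right)\Bigg|_{q,p}
= f\left(\overleftarrow{\partial_q}\,\overrightarrow{\partial_p} - \overleftarrow{\partial_p}\,\overrightarrow{\partial_q}\right) g \tag*{[3]}
\]
in an abbreviated notation. The notation can be further abbreviated by using $x$ to represent points of the phase-space manifold, $x = (x^1, \ldots, x^{2n})$, and introducing the Poisson tensor $\alpha^{ij}$, where the indices $i, j$ run from 1 to $2n$. In canonical coordinates, with $x = (q_1, \ldots, q_n, p_1, \ldots, p_n)$, $\alpha^{ij}$ is represented by the matrix
\[
\alpha = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix} \tag*{[4]}
\]
where $I_n$ is the $n \times n$ identity matrix. Then eqn [3] becomes
\[
\{f, g\}(x) = \alpha^{ij}\,\partial_i f(x)\,\partial_j g(x) \tag*{[5]}
\]
where $\partial_i = \partial/\partial x^i$. For a general observable $f$, the time evolution is given by
\[
\dot f = \{f, H\} \tag*{[6]}
\]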
Because $\alpha^{ij}$ transforms like a tensor with respect to coordinate transformations, eqn [5] may also be written in noncanonical coordinates. In this case the components of $\alpha^{ij}$ need not be constants, and may depend on the point of the manifold at which they are evaluated. But in Hamiltonian mechanics, $\alpha^{ij}$ is still required to be invertible. A manifold equipped with a Poisson tensor of this kind is called a symplectic manifold. In general, the tensor $\alpha^{ij}$ is no longer required to be invertible, but it nevertheless suffices to define Poisson brackets via eqn [5], and these brackets are required to have the properties

1. $\{f, g\} = -\{g, f\}$,
2. $\{f, gh\} = \{f, g\}h + g\{f, h\}$, and
3. $\{f, \{g, h\}\} + \{g, \{h, f\}\} + \{h, \{f, g\}\} = 0$.

Property (1) states that the Poisson bracket is antisymmetric, property (2) is referred to as the Leibniz rule, and property (3) is called the Jacobi identity. The Poisson bracket used in Hamiltonian mechanics satisfies all these properties, but we now abstract these properties from the concrete prescription of eqn [3], and a Poisson manifold $(M, \alpha)$ is defined as a smooth manifold $M$ equipped with a Poisson tensor $\alpha$, whose components are no longer necessarily constant, such that the bracket defined by eqn [5] has the above properties. It turns out that such manifolds provide a better context for treating dynamical systems with symmetries. In fact, they are essential for treating gauge field theories, which govern the fundamental interactions of elementary particles.
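The three bracket properties can be checked mechanically for the canonical bracket of eqn [3] with one degree of freedom. The following is only a minimal sketch: the dictionary representation of polynomials, the helper names (`add`, `mul`, `poisson`, `jacobi`) and the sample functions `f`, `g`, `h` are illustrative assumptions, not part of the article.

```python
from fractions import Fraction

# A polynomial in (q, p) is stored as {(i, j): c}, meaning the sum of c * q**i * p**j.

def clean(poly):
    """Drop vanishing coefficients so that equal polynomials compare equal."""
    return {key: c for key, c in poly.items() if c != 0}

def add(f, g, sign=1):
    """Return f + sign * g."""
    out = dict(f)
    for key, c in g.items():
        out[key] = out.get(key, 0) + sign * c
    return clean(out)

def mul(f, g):
    """Pointwise product of eqn [2]."""
    out = {}
    for (i, j), a in f.items():
        for (k, l), b in g.items():
            out[(i + k, j + l)] = out.get((i + k, j + l), 0) + a * b
    return clean(out)

def d_q(f):
    return {(i - 1, j): i * c for (i, j), c in f.items() if i > 0}

def d_p(f):
    return {(i, j - 1): j * c for (i, j), c in f.items() if j > 0}

def poisson(f, g):
    """Canonical Poisson bracket of eqn [3] for n = 1."""
    return add(mul(d_q(f), d_p(g)), mul(d_p(f), d_q(g)), sign=-1)

def jacobi(f, g, h):
    """Left-hand side of property (3); it should be the zero polynomial."""
    total = poisson(f, poisson(g, h))
    total = add(total, poisson(g, poisson(h, f)))
    return add(total, poisson(h, poisson(f, g)))

q = {(1, 0): Fraction(1)}
p = {(0, 1): Fraction(1)}
H = {(0, 2): Fraction(1, 2), (2, 0): Fraction(1, 2)}   # oscillator with m = omega = 1 (assumed units)

f = {(2, 1): Fraction(1)}                        # q^2 p
g = {(0, 3): Fraction(1), (1, 0): Fraction(1)}   # p^3 + q
h = {(1, 2): Fraction(1)}                        # q p^2

print(poisson(q, H) == p)                                     # Hamilton's equation for q, eqn [6]
print(add(poisson(f, g), poisson(g, f)) == {})                # antisymmetry
print(poisson(f, mul(g, h)) == add(mul(poisson(f, g), h), mul(g, poisson(f, h))))  # Leibniz rule
print(jacobi(f, g, h) == {})                                  # Jacobi identity
```

Exact rational arithmetic is used so that the identities hold as equalities of polynomials rather than up to rounding.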
Quantum Mechanics and Star Products The essential difference between classical and quantum mechanics is Heisenberg’s uncertainty relation, which implies that in the latter, states can no longer be represented as points in phase space. The uncertainty is a consequence of the noncommutativity of the quantum mechanical observables. That is, the commutative classical algebra of observables must be replaced by a noncommutative quantum algebra of observables. In the conventional approach to quantum mechanics, this noncommutativity is implemented by representing the quantum mechanical observables by linear operators in Hilbert space. Physical quantities are then represented by eigenvalues of these operators, and physical states are related to the operator eigenfunctions. Although these entities are somehow related to their classical counterparts, to which they are supposed to reduce in an appropriate limit, the precise relationship has remained obscure, one hundred years after the beginnings of quantum
mechanics. Textbooks refer to the correspondence principle, which guided the pioneers of the subject. Attempts to give this idea a precise formulation by postulating a specific relation between the classical Poisson brackets of observables and the commutators of the corresponding quantum mechanical operators, as undertaken, for example, by Dirac and von Neumann, encountered insurmountable difficulties, as pointed out by Groenewold in an unjustly neglected paper (Groenewold 1946). In the same paper Groenewold also wrote down the first explicit representation of a "star product" (see eqn [11]), without however realizing the potential of this concept for overcoming the difficulties that he wanted to resolve.

In the deformation quantization approach, there is no such break when going from the classical system to the corresponding quantum system; we describe the quantum system by using the same entities that are used to describe the classical system. The observables of the system are described by the same functions on phase space as their classical counterparts. Uncertainty is realized by describing physical states as distributions on phase space that are not sharply localized, in contrast to the Dirac delta functions which occur in the classical case. When we evaluate an observable in some definite state according to the quantum analog of eqn [1] (see eqn [24]), values of the observable in a whole region contribute to the number that is obtained, which is thus an average value of the observable in the given state. Noncommutativity is incorporated by introducing a noncommutative product for functions on phase space, so that we get a new noncommutative quantum algebra of observables. The systematic work on deformation quantization stems from Gerstenhaber's seminal paper, where he introduced the concept of a star product of smooth functions on a manifold (Gerstenhaber 1964).
For applications to quantum mechanics, we consider smooth complex-valued functions on a Poisson manifold. A star product $f \star g$ of two such functions is a new smooth function, which, in general, is described by an infinite power series:
\[
f \star g = fg + (i\hbar)\,C_1(f, g) + O(\hbar^2) = \sum_{n=0}^{\infty} (i\hbar)^n\, C_n(f, g) \tag*{[7]}
\]
The first term in the series is the pointwise product given in eqn [2], and $(i\hbar)$ is the deformation parameter, which is assumed to vary continuously. If $\hbar$ is identified with Planck's constant, then what varies is really the magnitude of the action of the dynamical system considered in units of $\hbar$: the classical limit holds for systems with large action. In this limit, which we express here as $\hbar \to 0$, the star product reduces to the usual product. In general, the coefficients $C_n$ will be such that the new product is noncommutative, and we consider the noncommutative algebra formed from the functions with this new multiplication law as a deformation of the original commutative algebra, which uses pointwise multiplication of the functions.

The expressions $C_n(f, g)$ denote functions made up of the derivatives of the functions $f$ and $g$. Without further restrictions on these coefficients, the star product is too arbitrary to be of any use. Gerstenhaber's discovery was that the simple requirement that the new product be associative imposes such strong requirements on the coefficients $C_n$ that they are essentially unique in the most important cases (up to an equivalence relation, as discussed below). Formally, Gerstenhaber required that the coefficients satisfy the following properties:

1. $\sum_{j+k=n} C_j(C_k(f, g), h) = \sum_{j+k=n} C_j(f, C_k(g, h))$,
2. $C_0(f, g) = fg$, and
3. $C_1(f, g) - C_1(g, f) = \{f, g\}$.

Property (1) guarantees that the star product is associative: $(f \star g) \star h = f \star (g \star h)$. Property (2) means that in the limit $\hbar \to 0$, the star product $f \star g$ agrees with the pointwise product $fg$. Property (3) has at least two aspects: (i) mathematically, it anchors the new product to the given structure of the Poisson manifold and (ii) physically, it provides the connection between the classical and quantum behavior of the dynamical system. Define a commutator by using the new product:
\[
[f, g]_\star = f \star g - g \star f \tag*{[8]}
\]
Property (3) may then be written as
\[
\lim_{\hbar \to 0} \frac{1}{i\hbar}\,[f, g]_\star = \{f, g\} \tag*{[9]}
\]
Equation [9] is the correct form of the correspondence principle. In general, the quantity on the left-hand side of eqn [9] reduces to the Poisson bracket only in the classical limit. The mathematical difficulties encountered by previous attempts to formulate the correspondence principle arose from trying to enforce equality between the Poisson bracket and the corresponding expression involving the quantum mechanical commutator. Equation [9] shows that such a relation in general only holds up to corrections of higher order in $\hbar$.

For physical applications we usually require the star product to be Hermitean: $\overline{f \star g} = \bar g \star \bar f$, where $\bar f$ denotes the complex conjugate of $f$. The star products considered in this article have this property.

For a given Poisson manifold, it is not clear a priori if a star product for the smooth functions on the manifold actually exists, that is, whether it is at all possible to find coefficients $C_n$ that satisfy the above list of properties. Even if we find such coefficients, it is still not clear that the series they define through eqn [7] yields a smooth function. Mathematicians have worked hard to answer these questions in the general case. For flat Euclidean spaces, $M = \mathbb{R}^{2n}$, a specific star product has long been known. In this case, the components of the Poisson tensor $\alpha^{ij}$ can be taken to be constants. The coefficient $C_1$ can then be chosen antisymmetric, so that
\[
C_1(f, g) = \tfrac{1}{2}\,\alpha^{ij}\,(\partial_i f)(\partial_j g) = \tfrac{1}{2}\{f, g\} \tag*{[10]}
\]
by property (3) above. The higher-order coefficients may be obtained by exponentiation of $C_1$. This procedure yields the Moyal star product (Moyal 1949):
\[
f \star_M g = f \exp\left(\frac{i\hbar}{2}\,\alpha^{ij}\,\overleftarrow{\partial_i}\,\overrightarrow{\partial_j}\right) g \tag*{[11]}
\]
In canonical coordinates, eqn [11] becomes
\[
(f \star_M g)(q, p) = f(q, p)\,\exp\left(\frac{i\hbar}{2}\left(\overleftarrow{\partial_q}\,\overrightarrow{\partial_p} - \overleftarrow{\partial_p}\,\overrightarrow{\partial_q}\right)\right) g(q, p) \tag*{[12]}
\]
\[
= \sum_{m,n=0}^{\infty} \left(\frac{i\hbar}{2}\right)^{m+n} \frac{(-1)^m}{m!\,n!}\,(\partial_p^m \partial_q^n f)(\partial_p^n \partial_q^m g) \tag*{[13]}
\]
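For polynomial observables the double series of eqn [13] terminates, so the Moyal product can be evaluated exactly. The sketch below is an illustration under stated assumptions: polynomials are stored as dictionaries, the formal symbol `eps` stands for $i\hbar/2$, and the helper names (`deriv`, `moyal`, `commutator`, `classical_part`) and the sample functions are hypothetical, not taken from the article.

```python
from fractions import Fraction
from math import factorial

# {(i, j, k): c} stands for c * q**i * p**j * eps**k, with eps := i*hbar/2 kept as a formal symbol.

def clean(poly):
    return {key: c for key, c in poly.items() if c != 0}

def add(f, g, sign=1):
    out = dict(f)
    for key, c in g.items():
        out[key] = out.get(key, 0) + sign * c
    return clean(out)

def deriv(f, nq, n_p):
    """Apply (d/dq)**nq (d/dp)**n_p to a polynomial."""
    out = {}
    for (i, j, k), c in f.items():
        if i >= nq and j >= n_p:
            w = (factorial(i) // factorial(i - nq)) * (factorial(j) // factorial(j - n_p))
            out[(i - nq, j - n_p, k)] = c * w
    return out

def degree(f):
    return max((i + j for (i, j, _) in f), default=0)

def moyal(f, g):
    """Moyal product via eqn [13]; terms with m + n above the lower degree vanish."""
    out = {}
    top = min(degree(f), degree(g))
    for m in range(top + 1):
        for n in range(top + 1 - m):
            df = deriv(f, n, m)          # d_p^m d_q^n f
            dg = deriv(g, m, n)          # d_p^n d_q^m g
            w = Fraction((-1) ** m, factorial(m) * factorial(n))
            for (i, j, k), a in df.items():
                for (r, s, t), b in dg.items():
                    key = (i + r, j + s, k + t + m + n)
                    out[key] = out.get(key, 0) + w * a * b
    return clean(out)

def commutator(f, g):
    """Star commutator of eqn [8]."""
    return add(moyal(f, g), moyal(g, f), sign=-1)

def classical_part(comm):
    """Lowest-order part of [f, g] / (i*hbar) = [f, g] / (2*eps), as a polynomial in q, p."""
    return {(i, j): c / 2 for (i, j, k), c in comm.items() if k == 1}

q = {(1, 0, 0): Fraction(1)}
p = {(0, 1, 0): Fraction(1)}
f = {(2, 1, 0): Fraction(1)}                           # q^2 p
g = {(0, 2, 0): Fraction(1), (1, 0, 0): Fraction(1)}   # p^2 + q
h = {(1, 1, 0): Fraction(1)}                           # q p

print(commutator(q, p))                                 # 2*eps, i.e. i*hbar
print(moyal(moyal(f, g), h) == moyal(f, moyal(g, h)))   # associativity, property (1)
print(classical_part(commutator(f, g)))                 # Poisson bracket {f, g} = 4 q p^2 - q^2
```

The last line illustrates eqn [9]: dividing the star commutator by $i\hbar$ and keeping the lowest order reproduces the Poisson bracket.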
We now come to the question of uniqueness of the star product on a given Poisson manifold. Two star products $\star$ and $\star'$ are said to be "c-equivalent" if there exists an invertible transition operator
\[
T = 1 + \hbar T_1 + \cdots = \sum_{n=0}^{\infty} \hbar^n\, T_n \tag*{[14]}
\]
where the $T_n$ are differential operators that satisfy
\[
f \star' g = T^{-1}\big((Tf) \star (Tg)\big) \tag*{[15]}
\]
It is known that for $M = \mathbb{R}^{2n}$ all admissible star products are c-equivalent to the Moyal product. The concept of c-equivalence is a mathematical one (c stands for cohomology (Gerstenhaber 1964)); it does not by itself imply any kind of physical equivalence, as shown below.
Another expression for the Moyal product is a kind of Fourier representation:
\[
(f \star_M g)(q, p) = \frac{1}{\hbar^2 \pi^2} \int dq_1\,dq_2\,dp_1\,dp_2\; f(q_1, p_1)\,g(q_2, p_2)\,
\exp\left[\frac{2}{i\hbar}\big(p(q_1 - q_2) + q(p_2 - p_1) + (q_2 p_1 - q_1 p_2)\big)\right] \tag*{[16]}
\]
Equation [16] has an interesting geometrical interpretation. Denote points in phase space by vectors, for example, in two dimensions:
\[
r = \begin{pmatrix} q \\ p \end{pmatrix}, \quad r_1 = \begin{pmatrix} q_1 \\ p_1 \end{pmatrix}, \quad r_2 = \begin{pmatrix} q_2 \\ p_2 \end{pmatrix} \tag*{[17]}
\]
Now, consider the triangle in phase space spanned by the vectors $r - r_1$ and $r - r_2$. Its area (symplectic volume) is
\[
A(r, r_1, r_2) = \tfrac{1}{2}(r - r_1) \wedge (r - r_2) = \tfrac{1}{2}\big[p(q_2 - q_1) + q(p_1 - p_2) + (q_1 p_2 - q_2 p_1)\big] \tag*{[18]}
\]
which is proportional to the exponent in eqn [16]. Hence, we may rewrite eqn [16] as
\[
(f \star g)(r) = \frac{1}{\hbar^2 \pi^2} \int dr_1\,dr_2\; f(r_1)\,g(r_2)\,\exp\left[\frac{4i}{\hbar}\,A(r, r_1, r_2)\right] \tag*{[19]}
\]
with $dr_i = dq_i\,dp_i$.

Deformation Quantization

The properties of the star product are well adapted for describing the noncommutative quantum algebra of observables. We have already discussed the associativity and the incorporation of the classical and semiclassical limits. Note that the characteristic nonlocality feature of quantum mechanics is also explicit. In the expression for the Moyal product given in eqn [13], the star product of the functions $f$ and $g$ at the point $x = (q, p)$ involves not only the values of the functions $f$ and $g$ at this point, but also all higher derivatives of these functions at $x$. But for a smooth function, knowledge of all the derivatives at a given point is equivalent to the knowledge of the function on the entire space. In the integral expression of eqn [16], we also see that knowledge of the functions $f$ and $g$ on the whole phase space is necessary to determine the value of the star product at the point $x$.

The c-equivalent star products correspond to different quantization schemes. Having chosen a quantization scheme, the quantities of interest for the quantum system may be calculated. It turns out that different quantization schemes lead to different spectra for the observables. The choice of a specific quantization scheme can only be motivated by further physical requirements. In the simple example we discuss below, the classical system is completely specified by its Hamilton function. In more general cases, one may have to decide what constitutes a sufficiently large set of good observables for a complete specification of the system (Bayen et al. 1978).

A state is characterized by its energy $E$; the set of all possible values for the energy is called the spectrum of the system. The states are described by distributions on phase space called projectors. The state corresponding to the energy $E$ is denoted by $\pi_E(q, p)$. These distributions are normalized:
\[
\frac{1}{2\pi\hbar} \int \pi_E(q, p)\,dq\,dp = 1 \tag*{[20]}
\]
and idempotent:
\[
(\pi_E \star \pi_{E'})(q, p) = \delta_{E,E'}\,\pi_E(q, p) \tag*{[21]}
\]
The fact that the Hamilton function takes the value $E$ when the system is in the state corresponding to this energy is expressed by the equation
\[
(H \star \pi_E)(q, p) = E\,\pi_E(q, p) \tag*{[22]}
\]
Equation [22] corresponds to the time-independent Schrödinger equation, and is sometimes called the "$\star$-genvalue equation." The spectral decomposition of the Hamilton function is given by
\[
H(q, p) = \sum_E E\,\pi_E(q, p) \tag*{[23]}
\]
where the summation sign may indicate an integration if the spectrum is continuous. The quantum mechanical version of eqn [1] is
\[
E = \frac{1}{2\pi\hbar} \int (H \star \pi_E)(q, p)\,dq\,dp = \frac{1}{2\pi\hbar} \int H(q, p)\,\pi_E(q, p)\,dq\,dp \tag*{[24]}
\]
where the last expression may be obtained by using eqn [16] for the star product.

The time-evolution function for a time-independent Hamilton function is denoted by $\mathrm{Exp}(Ht)$, and the fact that the Hamilton function is the generator of the time evolution of the system is expressed by
\[
i\hbar\,\frac{d}{dt}\,\mathrm{Exp}(Ht) = H \star \mathrm{Exp}(Ht) \tag*{[25]}
\]
This equation corresponds to the time-dependent Schrödinger equation. It is solved by the star exponential:
\[
\mathrm{Exp}(Ht) = \sum_{n=0}^{\infty} \frac{1}{n!}\left(\frac{-it}{\hbar}\right)^n (H\star)^n \tag*{[26]}
\]
where $(H\star)^n = H \star H \star \cdots \star H$ ($n$ times). Because each state of definite energy $E$ has a time evolution $\exp(-iEt/\hbar)$, the complete time-evolution function may be written in the form
\[
\mathrm{Exp}(Ht) = \sum_E \pi_E\,e^{-iEt/\hbar} \tag*{[27]}
\]
This expression is called the "Fourier–Dirichlet expansion" for the time-evolution function.

Questions concerning the existence and uniqueness of the star exponential as a $C^\infty$ function and the nature of the spectrum and the projectors again require careful mathematical analysis. The problem of finding general conditions on the Hamilton function $H$ which ensure a reasonable physical spectrum is analogous to the problem of showing, in the conventional approach, that the symmetric operator $\hat H$ is self-adjoint and finding its spectral projections.
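The Simple Harmonic Oscillator

As an example of the above procedure, we treat the simple one-dimensional harmonic oscillator characterized by the classical Hamilton function
\[
H(q, p) = \frac{p^2}{2m} + \frac{m\omega^2}{2}\,q^2 \tag*{[28]}
\]
In terms of the holomorphic variables
\[
a = \sqrt{\frac{m\omega}{2}}\left(q + i\,\frac{p}{m\omega}\right), \qquad \bar a = \sqrt{\frac{m\omega}{2}}\left(q - i\,\frac{p}{m\omega}\right) \tag*{[29]}
\]
the Hamilton function becomes
\[
H = \omega\,a\bar a \tag*{[30]}
\]
Our aim is to calculate the time-evolution function. We first choose a quantization scheme characterized by the normal star product
\[
f \star_N g = f\,e^{\hbar\,\overleftarrow{\partial_a}\,\overrightarrow{\partial_{\bar a}}}\,g \tag*{[31]}
\]
We then have
\[
\bar a \star_N a = \bar a a, \qquad a \star_N \bar a = a\bar a + \hbar \tag*{[32]}
\]
so that
\[
[a, \bar a]_{\star_N} = \hbar \tag*{[33]}
\]
Equation [25] for this case is
\[
i\hbar\,\frac{d}{dt}\,\mathrm{Exp}_N(Ht) = (H + \hbar\omega\,\bar a\,\partial_{\bar a})\,\mathrm{Exp}_N(Ht) \tag*{[34]}
\]
with the solution
\[
\mathrm{Exp}_N(Ht) = e^{-\bar a a/\hbar}\,\exp\left(e^{-i\omega t}\,\bar a a/\hbar\right) \tag*{[35]}
\]
By expanding the last exponential in eqn [35], we obtain the Fourier–Dirichlet expansion
\[
\mathrm{Exp}_N(Ht) = e^{-\bar a a/\hbar} \sum_{n=0}^{\infty} \frac{1}{\hbar^n n!}\,\bar a^n a^n\,e^{-in\omega t} \tag*{[36]}
\]
From here, we can read off the energy eigenvalues and the projectors describing the states by comparing coefficients in eqns [27] and [36]:
\[
\pi_0^{(N)} = e^{-\bar a a/\hbar} \tag*{[37]}
\]
\[
\pi_n^{(N)} = \frac{1}{\hbar^n n!}\,\pi_0^{(N)}\,\bar a^n a^n = \frac{1}{\hbar^n n!}\,\bar a^n \star_N \pi_0^{(N)} \star_N a^n \tag*{[38]}
\]
\[
E_n = n\hbar\omega \tag*{[39]}
\]
Note that the spectrum obtained in eqn [39] does not include the zero-point energy. The projector onto the ground state $\pi_0^{(N)}$ satisfies
\[
a \star_N \pi_0^{(N)} = 0 \tag*{[40]}
\]
The spectral decomposition of the Hamilton function (eqn [23]) is in this case
\[
H = \sum_{n=0}^{\infty} n\hbar\omega\,\frac{1}{\hbar^n n!}\,e^{-\bar a a/\hbar}\,\bar a^n a^n = \omega\,\bar a a \tag*{[41]}
\]
We now consider the Moyal quantization scheme. If we write eqn [12] in terms of holomorphic coordinates, we obtain
\[
f \star_M g = f \exp\left(\frac{\hbar}{2}\left(\overleftarrow{\partial_a}\,\overrightarrow{\partial_{\bar a}} - \overleftarrow{\partial_{\bar a}}\,\overrightarrow{\partial_a}\right)\right) g \tag*{[42]}
\]
Here, we have
\[
a \star_M \bar a = a\bar a + \frac{\hbar}{2}, \qquad \bar a \star_M a = a\bar a - \frac{\hbar}{2} \tag*{[43]}
\]
and again
\[
[a, \bar a]_{\star_M} = \hbar \tag*{[44]}
\]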
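The normal-product projectors of eqn [38] can be checked numerically against the $\star$-genvalue equation [22]. The sketch below assumes that the projector depends only on $r = \bar a a$, so that the operator on the right-hand side of eqn [34] acts as $H \star_N F = \omega r F + \hbar\omega\,r\,F'(r)$; the numerical values of `HBAR` and `OMEGA`, the finite-difference step and the function names are arbitrary illustrative choices.

```python
import math

HBAR = 1.0    # assumed units
OMEGA = 2.0   # arbitrary test frequency

def pi_n(r, n):
    """Normal-product projector of eqn [38] as a function of r = abar * a."""
    return r ** n * math.exp(-r / HBAR) / (HBAR ** n * math.factorial(n))

def h_star_n(F, r, step=1e-5):
    """(H *_N F)(r) = omega*r*F(r) + hbar*omega*r*F'(r), with F' from a central difference."""
    dF = (F(r + step) - F(r - step)) / (2.0 * step)
    return OMEGA * r * F(r) + HBAR * OMEGA * r * dF

def residual(n, r):
    """Difference between the two sides of H *_N pi_n = n*hbar*omega*pi_n."""
    return h_star_n(lambda x: pi_n(x, n), r) - n * HBAR * OMEGA * pi_n(r, n)

for n in range(4):
    worst = max(abs(residual(n, r)) for r in (0.3, 1.0, 2.5))
    print(n, worst)   # residuals should sit at the level of the finite-difference error
```

The vanishing residual for each $n$ reflects the spectrum $E_n = n\hbar\omega$ of eqn [39], without zero-point energy.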
The value of the commutator of two phase-space variables is fixed by property (3) of the star product, and cannot change when one goes to a c-equivalent star product. The Moyal star product is c-equivalent to the normal star product with the transition operator
\[
T = e^{-(\hbar/2)\,\partial_a \partial_{\bar a}} \tag*{[45]}
\]
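On polynomials in $a$ and $\bar a$ all the exponential series terminate, so eqns [32], [43] and the c-equivalence relation [15] can be verified exactly. The sketch below assumes the reading $T(f \star_N g) = (Tf) \star_M (Tg)$ of eqn [15], with $T$ as in eqn [45]; the dictionary representation and the helper names (`star_normal`, `star_moyal`, `transition`) are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

# {(i, j, k): c} stands for c * a**i * abar**j * hbar**k.

def clean(poly):
    return {key: c for key, c in poly.items() if c != 0}

def deriv(f, na, nb):
    """Apply (d/da)**na (d/dabar)**nb to a polynomial."""
    out = {}
    for (i, j, k), c in f.items():
        if i >= na and j >= nb:
            w = (factorial(i) // factorial(i - na)) * (factorial(j) // factorial(j - nb))
            out[(i - na, j - nb, k)] = c * w
    return out

def degree(f):
    return max((i + j for (i, j, _) in f), default=0)

def accumulate(out, f, g, weight, hbar_power):
    """Add weight * hbar**hbar_power * f * g to out."""
    for (i, j, k), x in f.items():
        for (r, s, t), y in g.items():
            key = (i + r, j + s, k + t + hbar_power)
            out[key] = out.get(key, 0) + weight * x * y

def star_normal(f, g):
    """Normal star product, eqn [31]."""
    out = {}
    for n in range(min(degree(f), degree(g)) + 1):
        accumulate(out, deriv(f, n, 0), deriv(g, 0, n), Fraction(1, factorial(n)), n)
    return clean(out)

def star_moyal(f, g):
    """Moyal star product in holomorphic coordinates, eqn [42]."""
    out = {}
    top = min(degree(f), degree(g))
    for m in range(top + 1):
        for n in range(top + 1 - m):
            w = Fraction((-1) ** m, 2 ** (m + n) * factorial(m) * factorial(n))
            accumulate(out, deriv(f, n, m), deriv(g, m, n), w, m + n)
    return clean(out)

def transition(f):
    """T = exp(-(hbar/2) d_a d_abar), eqn [45]."""
    out = {}
    for n in range(degree(f) // 2 + 1):
        w = Fraction((-1) ** n, 2 ** n * factorial(n))
        for (i, j, k), c in deriv(f, n, n).items():
            out[(i, j, k + n)] = out.get((i, j, k + n), 0) + w * c
    return clean(out)

a = {(1, 0, 0): Fraction(1)}
abar = {(0, 1, 0): Fraction(1)}
f = {(2, 1, 0): Fraction(1)}                           # a^2 abar
g = {(1, 2, 0): Fraction(1), (1, 0, 0): Fraction(1)}   # a abar^2 + a

print(star_normal(a, abar))   # a*abar + hbar, eqn [32]
print(star_moyal(a, abar))    # a*abar + hbar/2, eqn [43]
print(transition(star_normal(f, g)) == star_moyal(transition(f), transition(g)))
```

The final comparison is the polynomial version of the statement that the two products are c-equivalent, while the first two lines show that they nevertheless assign different values to the same product of observables.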
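We can use this operator to transform the normal product version of the $\star$-genvalue equation, eqn [22], into the corresponding Moyal product version according to eqn [15]. The result is
\[
H \star_M \pi_n^{(M)} = \omega\left(\bar a \star_M a + \frac{\hbar}{2}\right) \star_M \pi_n^{(M)} = \hbar\omega\left(n + \tfrac{1}{2}\right)\pi_n^{(M)} \tag*{[46]}
\]
with
\[
\pi_0^{(M)} = T\pi_0^{(N)} = 2\,e^{-2\bar a a/\hbar} \tag*{[47]}
\]
\[
\pi_n^{(M)} = T\pi_n^{(N)} = \frac{1}{\hbar^n n!}\,\bar a^n \star_M \pi_0^{(M)} \star_M a^n \tag*{[48]}
\]
The projector onto the ground state $\pi_0^{(M)}$ satisfies
\[
a \star_M \pi_0^{(M)} = 0 \tag*{[49]}
\]
We now have, for the spectrum,
\[
E_n = \left(n + \tfrac{1}{2}\right)\hbar\omega \tag*{[50]}
\]
which is the textbook result. We conclude that for this problem, the Moyal quantization scheme is the correct one.

The use of the Moyal product in eqn [25] for the star exponential of the harmonic oscillator leads to the following differential equation for the time-evolution function:
\[
i\hbar\,\frac{d}{dt}\,\mathrm{Exp}_M(Ht) = \left(H - \frac{(\hbar\omega)^2}{4}\,\partial_H - \frac{(\hbar\omega)^2}{4}\,H\,\partial_H^2\right)\mathrm{Exp}_M(Ht) \tag*{[51]}
\]
The solution is
\[
\mathrm{Exp}_M(Ht) = \frac{1}{\cos(\omega t/2)}\,\exp\left[\frac{2H}{i\hbar\omega}\,\tan\left(\frac{\omega t}{2}\right)\right] \tag*{[52]}
\]
This expression can be brought into the form of the Fourier–Dirichlet expansion of eqn [27] by using the generating function for the Laguerre polynomials:
\[
\frac{1}{1+s}\,\exp\left[\frac{zs}{1+s}\right] = \sum_{n=0}^{\infty} s^n (-1)^n L_n(z) \tag*{[53]}
\]
with $s = e^{-i\omega t}$. The projectors then become
\[
\pi_n^{(M)} = 2(-1)^n\,e^{-2H/\hbar\omega}\,L_n\!\left(\frac{4H}{\hbar\omega}\right) \tag*{[54]}
\]
which is equivalent to the expression already found in eqn [48].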
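The Laguerre form of the Moyal projectors can be tested numerically against the normalization condition [20] and the energy formula [24]. The sketch below relies on the fact that, for the oscillator, the phase-space area enclosed by the curve $H(q,p) = E$ is $2\pi E/\omega$, so that $dq\,dp$ may be replaced by $(2\pi/\omega)\,dH$ for functions of $H$ alone. The values of `HBAR` and `OMEGA`, the integration cutoff and the function names are arbitrary illustrative assumptions.

```python
import math

HBAR = 1.0    # assumed units
OMEGA = 1.5   # arbitrary test frequency

def laguerre(n, z):
    """L_n(z) from the recurrence (k+1) L_{k+1} = (2k+1-z) L_k - k L_{k-1}."""
    prev, cur = 1.0, 1.0 - z
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1 - z) * cur - k * prev) / (k + 1)
    return cur

def projector(n, energy):
    """Moyal projector 2 (-1)^n exp(-2H/(hbar w)) L_n(4H/(hbar w)) as a function of H."""
    z = 4.0 * energy / (HBAR * OMEGA)
    return 2.0 * (-1) ** n * math.exp(-z / 2.0) * laguerre(n, z)

def phase_space_average(func, n, h_max=60.0, steps=60000):
    """(1/(2 pi hbar)) * integral of func(H) * pi_n over phase space, via dq dp = (2 pi/omega) dH."""
    dh = h_max / steps
    total = 0.0
    for i in range(steps):
        h = (i + 0.5) * dh            # midpoint rule
        total += func(h) * projector(n, h)
    return total * dh / (HBAR * OMEGA)

for n in range(4):
    norm = phase_space_average(lambda h: 1.0, n)
    energy = phase_space_average(lambda h: h, n)
    print(n, round(norm, 6), round(energy, 6))   # compare with 1 and (n + 1/2) * HBAR * OMEGA
```

The second column checks eqn [20], and the third column reproduces the spectrum of eqn [50] through the phase-space average of eqn [24].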
Conventional Quantization

One usually finds the observables characterizing some quantum mechanical system by starting from the corresponding classical system and then, either by guessing or by using some more or less systematic method, finding the corresponding representations of the classical quantities in the quantum system. The guiding principle is the correspondence principle: the quantum mechanical relations are supposed to reduce somehow to the classical relations in an appropriate limit. Early attempts to systematize this procedure involved finding an assignment rule that associates to each phase-space function $f$ a linear operator in Hilbert space $\hat f = \Theta(f)$ in such a way that in the limit $\hbar \to 0$, the quantum mechanical equations of motion go over to the classical equations. Such an assignment cannot be unique: even though an operator that is a function of the basic operators $\hat Q$ and $\hat P$ reduces to a unique phase-space function in the limit $\hbar \to 0$, there are many ways to assign an operator to a given phase-space function, because different orderings of the operators $\hat Q$ and $\hat P$ all reduce to the original phase-space function. Different ordering procedures correspond to different quantization schemes.

It turns out that, for systems with observables that depend on the coordinates or the momenta to a higher power than quadratic, there is no quantization scheme which leads to a correspondence between the quantum mechanical and the classical equations of motion and which simultaneously strictly maintains the Dirac–von Neumann requirement that $(1/i\hbar)[\hat f, \hat g] \leftrightarrow \{f, g\}$. Only within the framework of deformation quantization does the correspondence principle acquire a precise meaning.

A general scheme for associating phase-space functions and Hilbert space operators, which includes all of the usual orderings, is given as follows: the operator $\Theta(f)$ corresponding to a given phase-space function $f$ is
\[
\Theta(f) = \int \tilde f(\xi, \eta)\,e^{-i(\xi \hat Q + \eta \hat P)}\,e^{\lambda(\xi, \eta)}\,d\xi\,d\eta \tag*{[55]}
\]
where $\tilde f$ is the Fourier transform of $f$, and $(\hat Q, \hat P)$ are the Schrödinger operators that correspond to the phase-space variables $(q, p)$; $\lambda(\xi, \eta)$ is a quadratic form:
\[
\lambda(\xi, \eta) = \frac{\hbar}{4}\left(\alpha\eta^2 + \beta\xi^2 + 2i\gamma\,\xi\eta\right) \tag*{[56]}
\]
Different choices for the constants $(\alpha, \beta, \gamma)$ yield different operator ordering schemes.
The relation between operator algebras and star products is given by
\[
\Theta(f)\,\Theta(g) = \Theta(f \star g) \tag*{[57]}
\]
where $\Theta$ is a linear assignment of the kind discussed above. Different assignments, which correspond to different operator orderings, correspond to c-equivalent star products. Equation [57] demonstrates that the quantum mechanical algebra of observables is a representation of the star product algebra. Because in the algebraic approach to quantum theory all the information concerning the quantum system may be extracted from the algebra of observables, specifying the star product completely determines the quantum system.

The inverse procedure of finding the phase-space function that corresponds to a given operator $\hat f$ is, for the special case of Weyl ordering, given by
\[
f(q, p) = \int \langle q + \tfrac{1}{2}\xi\,|\,\hat f\,|\,q - \tfrac{1}{2}\xi \rangle\,e^{-i\xi p/\hbar}\,d\xi \tag*{[58]}
\]
When using holomorphic coordinates, it is convenient to work with the coherent states
\[
\hat a\,|a\rangle = a\,|a\rangle, \qquad \langle \bar a|\,\hat a^\dagger = \langle \bar a|\,\bar a \tag*{[59]}
\]
These states are related to the energy eigenstates of the harmonic oscillator
\[
|n\rangle = \frac{1}{\sqrt{n!}}\,(\hat a^\dagger)^n\,|0\rangle \tag*{[60]}
\]
by
\[
|a\rangle = e^{-\frac{1}{2}\bar a a/\hbar} \sum_{n=0}^{\infty} \frac{a^n}{\sqrt{n!}}\,|n\rangle, \qquad
\langle \bar a| = e^{-\frac{1}{2}\bar a a/\hbar} \sum_{n=0}^{\infty} \frac{\bar a^n}{\sqrt{n!}}\,\langle n| \tag*{[61]}
\]
In normal ordering, we obtain the phase-space function $f(a, \bar a)$ corresponding to the operator $\hat f$ by just taking the matrix element between coherent states:
\[
f(a, \bar a) = \langle \bar a|\,f(\hat a, \hat a^\dagger)\,|a\rangle \tag*{[62]}
\]
For holomorphic coordinates, it is easy to show
\[
\pi_n^{(N)}(a, \bar a) = \frac{1}{\hbar^n}\,\langle \bar a|n\rangle\langle n|a\rangle = \frac{1}{\hbar^n n!}\,(\bar a a)^n\,e^{-\bar a a/\hbar} \tag*{[63]}
\]
in agreement with eqn [38] for the normal star product projectors.

The star exponential $\mathrm{Exp}(Ht)$ and the projectors $\pi_n$ are the phase-space representations of the time-evolution operator $\exp(-i\hat H t/\hbar)$ and the projection operators $\hat\rho_n = |n\rangle\langle n|$, respectively. Weyl ordering corresponds to the use of the Moyal star product for quantization and normal ordering to the use of the normal star product. In the density matrix formalism, we say that the projection operator is that of a pure state, which is characterized by the property of being idempotent: $\hat\rho_n^2 = \hat\rho_n$ (compare eqn [21]). The integral of the projector over the momentum gives the probability distribution in position space:
\[
\frac{1}{2\pi\hbar} \int \pi_n^{(M)}(q, p)\,dp
= \frac{1}{2\pi\hbar} \int \langle q + \xi/2\,|n\rangle\langle n|\,q - \xi/2 \rangle\,e^{-i\xi p/\hbar}\,d\xi\,dp
= \langle q|n\rangle\langle n|q\rangle = |\psi_n(q)|^2 \tag*{[64]}
\]
and the integral over the position gives the probability distribution in momentum space:
\[
\frac{1}{2\pi\hbar} \int \pi_n^{(M)}(q, p)\,dq = \langle p|n\rangle\langle n|p\rangle = |\tilde\psi_n(p)|^2 \tag*{[65]}
\]
The normalization is
\[
\frac{1}{2\pi\hbar} \int \pi_n^{(M)}(q, p)\,dq\,dp = 1 \tag*{[66]}
\]
which is the same as eqn [20]. Applying these relations to the ground-state projector of the harmonic oscillator, eqn [47] shows that this is a minimum-uncertainty state. In the classical limit $\hbar \to 0$, it goes to a Dirac $\delta$-function. The expectation value of the Hamiltonian operator is
\[
\frac{1}{2\pi\hbar} \int (H \star_M \pi_n^{(M)})(q, p)\,dq\,dp = \int \langle q|\,\hat H \hat\rho_n\,|q\rangle\,dq = \mathrm{tr}(\hat H \hat\rho_n) \tag*{[67]}
\]
which should be compared to eqn [24].

Quantum Field Theory
A real scalar field is given in terms of the coefficients $a(k)$, $\bar a(k)$ by
\[
\phi(x) = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2\omega_k}}\left[a(k)\,e^{-ikx} + \bar a(k)\,e^{ikx}\right] \tag*{[68]}
\]
where $\hbar\omega_k = \sqrt{\hbar^2 k^2 + m^2}$ is the energy of a single quantum of the field. The corresponding quantum field operator is
\[
\hat\phi(x) = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2\omega_k}}\left[\hat a(k)\,e^{-ikx} + \hat a^\dagger(k)\,e^{ikx}\right] \tag*{[69]}
\]
where $\hat a(k)$, $\hat a^\dagger(k)$ are the annihilation and creation operators for a quantum of the field with momentum $\hbar k$. The Hamiltonian is
\[
H = \int d^3k\;\hbar\omega_k\,\hat a^\dagger(k)\,\hat a(k) \tag*{[70]}
\]
$\hat N(k) = \hat a^\dagger(k)\,\hat a(k)$ is interpreted as the number operator, and eqn [70] is then just the generalization of eqn [39], the expression for the energy of the harmonic oscillator in the normal ordering scheme, for an infinite number of degrees of freedom. Had we chosen the Weyl-ordering scheme, the result would have been (by the generalization of eqn [50]) an infinite vacuum energy. Hence, requiring the vacuum energy to vanish implies the choice of the normal ordering scheme in free field theory. In the framework of deformation quantization, this requirement leads to the choice of the normal star product for treating free scalar fields: only for this choice is the star product well defined.

Currently, in realistic physical field theories involving interacting relativistic fields we are limited to perturbative calculations. The objects of interest are products of the fields. The analog of the Moyal product of eqn [11] for systems with an infinite number of degrees of freedom is
\[
\phi(x_1) \star \phi(x_2) \star \cdots \star \phi(x_n) = \exp\left[\frac{1}{2} \sum_{i<j} \int d^4x\,d^4y\;\cdots\;\Delta(x - y)\;\cdots\right] \cdots \tag*{[71]}
\]
[…]
\[
\Delta_F(x) = \begin{cases} \Delta^+(x) & \text{for } x_0 > 0 \\ \Delta^-(x) & \text{for } x_0 < 0 \end{cases} \tag*{[74]}
\]
where $\Delta^+(x)$, $\Delta^-(x)$ are the propagators for the positive and negative frequency components of the field, respectively. In operator language
\[
\Delta_F(x - y) = T\big(\hat\phi(x)\hat\phi(y)\big) - N\big(\hat\phi(x)\hat\phi(y)\big) \tag*{[75]}
\]
where $T$ indicates the time-ordered product of the fields and $N$ the normal-ordered product. Because the second term in eqn [75] is a normal-ordered product with vanishing vacuum expectation value, the Feynman propagator may be simply characterized as the vacuum expectation value of the time-ordered product of the fields. The antisymmetric part of the positive frequency propagator is the Schwinger function:
\[
\Delta^+(x) - \Delta^+(-x) = \Delta^+(x) + \Delta^-(x) = \Delta(x) \tag*{[76]}
\]
The fact that going over to a c-equivalent product leaves the antisymmetric part of the differential operator in the exponent of eqn [71] invariant suggests that the use of the positive frequency propagator instead of the Schwinger function merely involves the passage to a c-equivalent star product. This is indeed easy to verify. The time-ordered product of the operators is obtained by replacing the Schwinger function $\Delta(x - y)$ in eqn [72] by the c-equivalent positive frequency propagator $\Delta^+(x - y)$, restricting the time integration to $x_0 > y_0$, as in eqn [74], and symmetrizing the integral in the variables $x$ and $y$, which brings in the negative frequency propagator $\Delta^-(x - y)$ for times $x_0 < y_0$. Then eqn [71] becomes Wick's theorem, which is the basic tool of relativistic perturbation theory. In operator language
\[
T\big(\hat\phi(x_1), \ldots, \hat\phi(x_n)\big) = \exp\left[\frac{1}{2} \int d^4x\,d^4y\;\frac{\delta}{\delta\phi(x)}\,\Delta_F(x - y)\,\frac{\delta}{\delta\phi(y)}\right] N\big(\hat\phi(x_1), \ldots, \hat\phi(x_n)\big) \tag*{[77]}
\]
Another interesting relation between deformation quantization and quantum field theory has been uncovered by studies of the Poisson–sigma model. This model involves a set of scalar fields $X^i$ which map a two-dimensional manifold $\Sigma_2$ onto a Poisson space $M$, as well as generalized gauge fields $A_i$, which are 1-forms on $\Sigma_2$ mapping to 1-forms on $M$. The action is given by
\[
S_{PS} = \int_{\Sigma_2} \left(A_i\,dX^i + \alpha^{ij} A_i A_j\right) \tag*{[78]}
\]
where $\alpha^{ij}$ is the Poisson structure of $M$. A remarkable formula was found (Cattaneo and Felder 2000):
\[
(f \star g)(x) = \int DX\,DA\; f(X(1))\,g(X(2))\,e^{iS_{PS}/\hbar} \tag*{[79]}
\]
where $f$, $g$ are functions on $M$, $\star$ is Kontsevich's star product (Kontsevich 1997), and the functional integration is over all fields $X$ that satisfy the boundary condition $X(\infty) = x$. Here $\Sigma_2$ is taken to be a disk in $\mathbb{R}^2$; 1, 2, and $\infty$ are three points on its circumference. By expanding the functional integral in eqn [79] according to the usual rules of perturbation theory, one finds that the coefficients of the powers of $\hbar$ reproduce the graphs and weights that characterize Kontsevich's star product. For the case in which the Poisson tensor is invertible, we can perform the Gaussian integration in eqn [79] involving the fields $A_i$. The result is
\[
(f \star g)(x) = \int DX\; f(X(1))\,g(X(2))\,\exp\left[\frac{i}{\hbar} \int \Omega_{ij}\,dX^i\,dX^j\right] \tag*{[80]}
\]
Equation [80] is formally similar to eqn [16] for the Moyal product, to which the Kontsevich product reduces in the symplectic case. Here $\Omega_{ij} = (\alpha^{ij})^{-1}$ is the symplectic 2-form, and $\int \Omega_{ij}\,dX^i\,dX^j$ is the symplectic volume of the manifold $M$. To make this relationship exact, one must integrate out the gauge degrees of freedom in the functional integral in eqn [79]. Since the Poisson–sigma model represents a topological field theory, there remains only a finite-dimensional integral, which coincides with the integral in eqn [80].

See also: Deformations of the Poisson Bracket on a Symplectic Manifold; Deformation Quantization and Representation Theory; Deformation Theory; Fedosov Quantization; Noncommutative Geometry from Strings; Operads; Quantum Field Theory: A Brief Introduction; Schrödinger Operators.
Further Reading

Bayen F, Flato M, Fronsdal C, Lichnerowicz A, and Sternheimer D (1978) Deformation theory and quantization I, II. Annals of Physics (NY) 111: 61–110, 111–151.
Cattaneo AS and Felder G (2000) A path integral approach to the Kontsevich quantization formula. Communications in Mathematical Physics 212: 591–611.
Dito G and Sternheimer D (2002) Deformation quantization: genesis, developments, and metamorphoses. In: Halbout G (ed.) Deformation Quantization, IRMA Lectures in Mathematical Physics, vol. I, pp. 9–54. Berlin: Walter de Gruyter, math.QA/0201168.
Gerstenhaber M (1964) On the deformation of rings and algebras. Annals of Mathematics 79: 59–103.
Groenewold HJ (1946) On the principles of elementary quantum mechanics. Physica 12: 405–460.
Hirshfeld A and Henselder P (2002a) Deformation quantization in the teaching of quantum mechanics. American Journal of Physics 70(5): 537–547.
Hirshfeld A and Henselder P (2002b) Deformation quantization for systems with fermions. Annals of Physics 302: 59–77.
Hirshfeld A and Henselder P (2002c) Star products and perturbative field theory. Annals of Physics (NY) 298: 382–393.
Hirshfeld A and Henselder P (2003) Star products and quantum groups in quantum mechanics and field theory. Annals of Physics 308: 311–328.
Hirshfeld A, Henselder P, and Spernat T (2004) Cliffordization, spin, and fermionic star products. Annals of Physics (NY) 314: 75–98.
Hirshfeld A, Henselder P, and Spernat T (2005) Star products and geometric algebra. Annals of Physics (NY) 317: 107–129.
Kontsevich M (1997) Deformation quantization of Poisson manifolds, q-alg/9709040.
Moyal JE (1949) Quantum mechanics as a statistical theory. Proceedings of the Cambridge Philosophical Society 45: 99–124.
Zachos C (2001) Deformation quantization: quantum mechanics lives and works in phase space, hep-th/0110114.
Deformation Quantization and Representation Theory
S Waldmann, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
© 2006 Elsevier Ltd. All rights reserved.
The Quantization Problem
Though quantum theory for the classical phase space $\mathbb{R}^{2n}$ is well established by means of what is usually called canonical quantization, physics demands that one go beyond $\mathbb{R}^{2n}$. On the one hand, systems with constraints lead by phase-space reduction to classical phase spaces different from $\mathbb{R}^{2n}$; in general one ends up with a symplectic or even Poisson manifold. Thus, one needs to quantize geometrically nontrivial phase spaces. On the other hand, field theories and thermodynamical systems require passing from $\mathbb{R}^{2n}$ to infinitely many degrees of freedom, where one faces additional analytical difficulties. Both types of difficulties combine for gauge field theories and gravity, whence it is clear that quantization is still one of the most important issues in mathematical physics.
One possibility (among many others) is to use the structural similarity between the classical and quantum observable algebras. In both cases the observables constitute a complex $^*$-algebra: in the classical case it is commutative with the additional structure of a Poisson bracket, whereas in the quantum case the algebra is noncommutative. In deformation quantization, one tries to pass from the classical observables to the quantum observables by a deformation of the algebraic structures.
From Canonical Quantization to Star Products
Let us briefly recall canonical quantization and the ordering problem. In order to "quantize" classical observables like the polynomials on $\mathbb{R}^{2n}$ in $q^k$, $p_l$, one assigns the operators

$$q^k \mapsto \varrho(q^k) = Q^k\colon\ \psi \mapsto \big(q \mapsto q^k \psi(q)\big) \qquad [1]$$

$$p_l \mapsto \varrho(p_l) = P_l\colon\ \psi \mapsto \Big(q \mapsto \frac{\hbar}{i} \frac{\partial \psi}{\partial q^l}(q)\Big) \qquad [2]$$

for $k, l = 1, \ldots, n$, defined on a suitable domain in $L^2(\mathbb{R}^n, d^n q)$. For simplicity, we choose $C_0^\infty(\mathbb{R}^n)$ as domain. The well-known ordering problem is encountered if one wants to also quantize higher polynomials. One convenient (although not the only) possibility is Weyl's total symmetrization rule, that is, for a monomial like $q^2 p$ we take the quantization

$$\varrho_{\mathrm{Weyl}}(q^2 p) = \tfrac{1}{3}\big(Q^2 P + QPQ + PQ^2\big) = -i\hbar\, q^2 \frac{\partial}{\partial q} - i\hbar\, q \qquad [3]$$

This can be written in the more explicit form:

$$\varrho_{\mathrm{Weyl}}(f) = \sum_{r=0}^{\infty} \frac{1}{r!} \Big(\frac{\hbar}{i}\Big)^{r} \left.\frac{\partial^r (Nf)}{\partial p_{i_1} \cdots \partial p_{i_r}}\right|_{p=0} \frac{\partial^r}{\partial q^{i_1} \cdots \partial q^{i_r}} \qquad [4]$$

with

$$N = \exp\Big(\frac{\hbar}{2i}\Delta\Big) \quad\text{and}\quad \Delta = \frac{\partial^2}{\partial q^i\, \partial p_i}$$

Using [4] one can easily extend $\varrho_{\mathrm{Weyl}}$ to all functions $f \in C^\infty(\mathbb{R}^{2n})$ which are polynomial in the momentum variables only and have an arbitrary smooth dependence on the position variables. This Poisson subalgebra of $C^\infty(\mathbb{R}^{2n})$ certainly covers all classical observables of physical interest. Denoting these observables by $\mathrm{Pol}(T^*\mathbb{R}^n)$, one obtains a linear isomorphism

$$\varrho_{\mathrm{Weyl}} : \mathrm{Pol}(T^*\mathbb{R}^n) \xrightarrow{\ \cong\ } \mathrm{Diffop}(\mathbb{R}^n) \qquad [5]$$

into the differential operators with smooth coefficients, called Weyl symbol calculus. Other orderings would result in a different linear isomorphism like [5]; for example, the standard ordering is obtained by simply omitting the operator $N$ in [4]. Using [5], one can pull back the operator product of $\mathrm{Diffop}(\mathbb{R}^n)$ to obtain a new product $\star_{\mathrm{Weyl}}$ for $\mathrm{Pol}(T^*\mathbb{R}^n)$, that is

$$f \star_{\mathrm{Weyl}} g = \varrho_{\mathrm{Weyl}}^{-1}\big(\varrho_{\mathrm{Weyl}}(f)\, \varrho_{\mathrm{Weyl}}(g)\big) \qquad [6]$$

which is called the Weyl–Moyal star product. Explicitly, one has

$$f \star_{\mathrm{Weyl}} g = \mu \circ \exp\left( \frac{i\hbar}{2} \Big( \frac{\partial}{\partial q^k} \otimes \frac{\partial}{\partial p_k} - \frac{\partial}{\partial p_k} \otimes \frac{\partial}{\partial q^k} \Big) \right) (f \otimes g) \qquad [7]$$

where $\mu(f \otimes g) = fg$ is the commutative product. Clearly, for $f, g \in \mathrm{Pol}(T^*\mathbb{R}^n)$ the exponential series terminates after finitely many terms. If one now wants to extend further to all smooth functions, then [7] is only a formal power series in $\hbar$. Since on a manifold one does not have a priori a nice distinguished class of functions like $\mathrm{Pol}(T^*\mathbb{R}^n)$, one indeed has to generalize in this direction if a geometric framework is desired. This observation and the simple fact that $\star_{\mathrm{Weyl}}$ satisfies all the following properties lead to the definition of a formal star product by Bayen et al. (1978):

Definition 1 A formal star product on a Poisson manifold $(M, \pi)$ is an associative $\mathbb{C}[[\lambda]]$-bilinear product

$$f \star g = \sum_{r=0}^{\infty} \lambda^r C_r(f, g) \qquad [8]$$

for $f, g \in C^\infty(M)[[\lambda]]$ such that
1. $C_0(f, g) = fg$ and $C_1(f, g) - C_1(g, f) = i\{f, g\}$,
2. $1 \star f = f = f \star 1$, and
3. $C_r$ is a bidifferential operator.
If in addition $\overline{f \star g} = \overline{g} \star \overline{f}$, then $\star$ is called Hermitian.

Clearly, $\star_{\mathrm{Weyl}}$ defines a Hermitian star product for $\mathbb{R}^{2n}$. The first condition is called the correspondence principle in deformation quantization and the formal parameter $\lambda$ corresponds to Planck's constant $\hbar$ once a convergence scheme is established.
If $S = \mathrm{id} + \sum_{r=1}^{\infty} \lambda^r S_r$ is a formal series of differential operators with $S_r 1 = 0$ for $r \ge 1$, then it is easy to see that

$$f \star' g = S^{-1}(Sf \star Sg) \qquad [9]$$

defines again a star product which is Hermitian if $\star$ is Hermitian and if in addition $\overline{Sf} = S\overline{f}$. In particular, the operator $N$, as before, serves for the transition from $\star_{\mathrm{Weyl}}$ to the standard-ordered star product $\star_{\mathrm{Std}}$ obtained the same way from the standard-ordered quantization. Thus, [9] can be seen as the abstract notion of changing the ordering prescription, even if no operator representation has been specified. Two star products related by such an equivalence transformation are called equivalent and $^*$-equivalent in the Hermitian case.
One main advantage of formal deformation quantization is that one has very strong existence and classification results:

Theorem 2 On every Poisson manifold there exists a star product.
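The explicit formula [7] and the symmetrization rule [3] lend themselves to a quick mechanical check. The following is a minimal sketch for one degree of freedom ($n = 1$), not part of the original article: polynomials are stored as dictionaries, and the bookkeeping variable `nu` (an assumption of this sketch) stands for $i\hbar$, so that all coefficients stay rational. The helper names `moyal`, `diff`, `add`, and `degree` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

# A polynomial in q, p and nu = i*hbar is a dict {(a, b, c): coeff}
# meaning the sum of coeff * q**a * p**b * nu**c (exact rational coeffs).

def clean(poly):
    """Drop monomials whose coefficient is zero."""
    return {k: v for k, v in poly.items() if v != 0}

def add(f, g, sign=1):
    """Return f + sign * g."""
    out = dict(f)
    for k, v in g.items():
        out[k] = out.get(k, 0) + sign * v
    return clean(out)

def diff(poly, var, times):
    """Differentiate `times` times w.r.t. q (var=0) or p (var=1)."""
    out = {}
    for key, coeff in poly.items():
        e = key[var]
        if e < times:
            continue
        for j in range(times):
            coeff *= e - j
        new = list(key)
        new[var] = e - times
        out[tuple(new)] = out.get(tuple(new), 0) + coeff
    return out

def degree(poly):
    """Total degree in q and p (powers of nu are not counted)."""
    return max((a + b for (a, b, _) in poly), default=0)

def moyal(f, g):
    """Weyl-Moyal product [7] for n = 1, with i*hbar written as nu."""
    result = {}
    # Order r needs r derivatives on each factor, so the series terminates.
    for r in range(min(degree(f), degree(g)) + 1):
        for k in range(r + 1):
            df = diff(diff(f, 0, r - k), 1, k)   # d_q^(r-k) d_p^k f
            dg = diff(diff(g, 1, r - k), 0, k)   # d_p^(r-k) d_q^k g
            scale = Fraction((-1) ** k * comb(r, k), factorial(r) * 2 ** r)
            for (a, b, c), x in df.items():
                for (d, e, h), y in dg.items():
                    key = (a + d, b + e, c + h + r)
                    result[key] = result.get(key, 0) + scale * x * y
    return clean(result)

q = {(1, 0, 0): Fraction(1)}
p = {(0, 1, 0): Fraction(1)}
nu = {(0, 0, 1): Fraction(1)}

# Canonical commutation relation: q * p - p * q = nu = i*hbar.
commutator = add(moyal(q, p), moyal(p, q), sign=-1)
print(commutator == nu)
```

In this encoding, the totally symmetrized star monomial $\tfrac13(q \star q \star p + q \star p \star q + p \star q \star q)$ comes out as the classical function $q^2 p$, which is the symbol-level counterpart of [3].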
Theorem 2 was first shown by DeWilde and Lecomte (1983) for the symplectic case and independently by Fedosov (1985) and Omori, Maeda, and Yoshioka (1991). In 1997, Kontsevich was able to prove the general Poisson case by showing his profound formality theorem. The full classification of star products up to equivalence was first obtained for the symplectic case by Nest and Tsygan (1995) and independently by Deligne (1995), Bertelson, Cahen, and Gutt (1997), and Weinstein and Xu (1997). The general Poisson case again follows from Kontsevich's formality. In particular, in the symplectic case, star products are classified by their characteristic class

$$c : \star \mapsto c(\star) \in \frac{[\omega]}{i\lambda} + H^2_{\mathrm{deRham}}(M, \mathbb{C})[[\lambda]] \qquad [10]$$

In conclusion, one can state that for the price of formal power series in $\hbar$ one obtains in formal deformation quantization a very general and well-understood picture of the observable algebra for the quantum version of any classical system described kinematically by a Poisson manifold. It turns out that already in this framework one can discuss dynamics as well by use of a Heisenberg equation formulated with $\star$. Moreover, the quantization of symmetries described by Hamiltonian Lie group or Lie algebra actions has been extensively studied.
For a physical theory of quantization, however, there are still at least two ingredients missing. On the one hand, one has to overcome the formal power series expansion in $\hbar$. This problem is, in principle, on the same footing as any perturbative approach to quantum theory and thus no easy answer can be expected to hold in general. In particular examples, however, such as the Weyl–Moyal star product, it can easily be solved. These issues together with the corresponding questions about a spectral calculus are best studied in the framework of Rieffel's strict deformation quantization based on a more $C^*$-algebraic formulation of the deformation problem.
On the other hand, the observable algebra is not enough to describe a quantum system: one also needs to have a notion for the states. It turns out that already in the formal framework one has a physically reasonable notion of states as discussed by Bordemann and Waldmann (1998).

States and Representations
The notion of states in deformation quantization is adapted from the $C^*$-algebraic world and based on the notion of positive functionals. Recall that for a $^*$-algebra $\mathcal{A}$ over $\mathbb{C}$ a linear functional $\omega : \mathcal{A} \to \mathbb{C}$ is called positive if $\omega(a^* a) \ge 0$. For formal deformation quantization, things are slightly more subtle as now one has to consider $\mathbb{C}[[\lambda]]$-linear functionals

$$\omega : (C^\infty(M)[[\lambda]], \star) \to \mathbb{C}[[\lambda]] \qquad [11]$$

where $\star$ is assumed to be a Hermitian star product in the following. Then the positivity is understood in the sense of formal power series, where $a \in \mathbb{R}[[\lambda]]$ is called positive if $a = \sum_{r=r_0}^{\infty} \lambda^r a_r$ with $a_{r_0} > 0$. Thus, we can make sense out of the following requirement:

Definition 3 Let $\star$ be a Hermitian star product on $M$. A $\mathbb{C}[[\lambda]]$-linear functional $\omega : C^\infty(M)[[\lambda]] \to \mathbb{C}[[\lambda]]$ is called positive with respect to $\star$ if

$$\omega(\overline{f} \star f) \ge 0 \qquad [12]$$

and it is called a state if, in addition, $\omega(1) = 1$.

In fact, $\omega(f)$ is interpreted as the expectation value of the observable $f$ in the state $\omega$. The positivity [12] ensures that the usual uncertainty relations between expectation values hold. Sometimes it is convenient to consider positive functionals only defined on a (proper) $^*$-ideal in $C^\infty(M)[[\lambda]]$, for instance, $C_0^\infty(M)[[\lambda]]$.
Since in some situations one wants more general formal series than just power series, it is convenient to embed the above definition of states into a larger and more algebraic context: consider an ordered ring $\mathsf{R}$, that is, a commutative, associative, unital ring $\mathsf{R}$ together with a distinguished subset $\mathsf{P} \subset \mathsf{R}$ (the positive elements) such that $\mathsf{R}$ is the disjoint union $\mathsf{R} = -\mathsf{P} \,\dot\cup\, \{0\} \,\dot\cup\, \mathsf{P}$ and we have $\mathsf{P} \cdot \mathsf{P} \subset \mathsf{P}$ and $\mathsf{P} + \mathsf{P} \subset \mathsf{P}$. Then $\mathsf{C} = \mathsf{R}(i)$ denotes the ring extension by a square root $i$ of $-1$, and one considers $^*$-algebras $\mathcal{A}$ over $\mathsf{C}$. Clearly, this generalizes the cases $\mathsf{R} = \mathbb{R}$, where $\mathsf{C} = \mathbb{C}$, as well as $\mathsf{R} = \mathbb{R}[[\lambda]]$, where $\mathsf{C} = \mathbb{C}[[\lambda]]$. In this way, one provides a framework where $C^*$-algebras, $^*$-algebras over $\mathbb{C}$, and formal Hermitian star products can be treated on the same footing. It is clear that the definition of a positive functional immediately extends to $\omega : \mathcal{A} \to \mathsf{C}$ for such a ring $\mathsf{C}$.

Example 4
(i) For the Wick star product on $\mathbb{R}^{2n} \cong \mathbb{C}^n$, defined by

$$f \star_{\mathrm{Wick}} g = \sum_{r=0}^{\infty} \frac{(2\lambda)^r}{r!} \frac{\partial^r f}{\partial z^{i_1} \cdots \partial z^{i_r}} \frac{\partial^r g}{\partial \bar z^{i_1} \cdots \partial \bar z^{i_r}} \qquad [13]$$

the $\delta$-functional $\delta : f \mapsto f(0)$ is positive. Note, however, that $\delta$ is not positive for $\star_{\mathrm{Weyl}}$.
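(ii) For the Weyl–Moyal star product $\star_{\mathrm{Weyl}}$ the Schrödinger functional

$$\omega(f) = \int_{\mathbb{R}^n} f(q, p = 0)\, d^n q \qquad [14]$$

defined on the $^*$-ideal $C_0^\infty(\mathbb{R}^{2n})[[\lambda]]$, is positive.
(iii) For any connected symplectic manifold $(M, \omega)$ and any Hermitian star product $\star$, there exists a unique normalized trace functional

$$\mathrm{tr} : C_0^\infty(M)[[\lambda]] \to \mathbb{C}[[\lambda]], \qquad \mathrm{tr}(f \star g) = \mathrm{tr}(g \star f) \qquad [15]$$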
with zeroth order equal to the integration over $M$ with respect to the Liouville measure $\Omega = \omega^n$. Then this trace is positive as well, $\mathrm{tr}(\overline{f} \star f) \ge 0$.

Having a notion for states as expectation-value functionals is still not enough to formulate quantum theory. One main feature of quantum states, the superposition principle, is not yet implemented. In particular, forming convex combinations like $\omega = c_1 \omega_1 + c_2 \omega_2$, with $c_1, c_2 \ge 0$ and $c_1 + c_2 = 1$, does not give a superposition of $\omega_1$ and $\omega_2$ but a mixed state. Hence, one needs an additional linear structure on the states, whence we look for a $^*$-representation $\pi$ of the observable algebra $\mathcal{A}$ on a pre-Hilbert space $\mathcal{H}$ over $\mathsf{C}$ such that the states $\omega_1, \omega_2$ can be written as vector states $\omega_i(a) = \langle \phi_i, \pi(a) \phi_i \rangle$ for some unit vectors $\phi_1, \phi_2 \in \mathcal{H}$. Then one can build superpositions of the vectors $\phi_1, \phi_2$ in the usual way. While this is the well-known argument in any quantum theory based on the observable algebras, for deformation quantization one first has to make sense out of the above notions, since now $\mathsf{R} = \mathbb{R}[[\lambda]]$ is only an ordered ring. This can actually be done in a consistent way as demonstrated and exemplified by Bordemann, Bursztyn, Waldmann, and others. We recall the basic results:
A pre-Hilbert space $\mathcal{H}$ over $\mathsf{C}$ is a $\mathsf{C}$-module with a $\mathsf{C}$-sesquilinear inner product $\langle \cdot, \cdot \rangle : \mathcal{H} \times \mathcal{H} \to \mathsf{C}$ such that $\langle \phi, \psi \rangle = \overline{\langle \psi, \phi \rangle}$ and $\langle \phi, \phi \rangle > 0$ for $\phi \ne 0$. This makes sense since $\mathsf{R}$ is ordered. An operator $A : \mathcal{H}_1 \to \mathcal{H}_2$ is called adjointable if there exists an operator $A^* : \mathcal{H}_2 \to \mathcal{H}_1$ such that $\langle A\phi, \psi \rangle_2 = \langle \phi, A^*\psi \rangle_1$ for all $\phi \in \mathcal{H}_1$, $\psi \in \mathcal{H}_2$. The set of adjointable operators is denoted by $\mathfrak{B}(\mathcal{H}_1, \mathcal{H}_2)$, and $\mathfrak{B}(\mathcal{H}) = \mathfrak{B}(\mathcal{H}, \mathcal{H})$ turns out to be a $^*$-algebra over $\mathsf{C}$. This allows one to define a $^*$-representation $\pi$ of $\mathcal{A}$ on $\mathcal{H}$ to be a $^*$-homomorphism $\pi : \mathcal{A} \to \mathfrak{B}(\mathcal{H})$. An intertwiner $T$ between two $^*$-representations $(\mathcal{H}_1, \pi_1)$ and $(\mathcal{H}_2, \pi_2)$ is an operator $T \in \mathfrak{B}(\mathcal{H}_1, \mathcal{H}_2)$ with $T\pi_1(a) = \pi_2(a)T$ for all $a \in \mathcal{A}$. This defines the category ${}^*\text{-Rep}(\mathcal{A})$ of $^*$-representations of $\mathcal{A}$.
Let us now recall that a positive linear functional $\omega$ can be written as an expectation value for a vector state in some representation. This is the well-known Gelfand–Naimark–Segal (GNS) construction from operator algebra theory which can be transferred to this purely algebraic context (Bordemann and Waldmann 1998). First recall that any positive linear functional $\omega : \mathcal{A} \to \mathsf{C}$ satisfies the Cauchy–Schwarz inequality

$$\omega(a^* b)\, \overline{\omega(a^* b)} \le \omega(a^* a)\, \omega(b^* b) \qquad [16]$$

and $\omega(a^* b) = \overline{\omega(b^* a)}$. If $\mathcal{A}$ is unital, which will always be assumed for simplicity, then $\omega(a^*) = \overline{\omega(a)}$ follows. Then

$$\mathcal{J}_\omega = \{ a \in \mathcal{A} \mid \omega(a^* a) = 0 \} \qquad [17]$$

is a left ideal in $\mathcal{A}$, the so-called Gel'fand ideal, and hence $\mathcal{H}_\omega = \mathcal{A}/\mathcal{J}_\omega$ is a left $\mathcal{A}$-module with module structure denoted by $\pi_\omega(a)\psi_b = \psi_{ab}$, where $\psi_b \in \mathcal{H}_\omega$ denotes the equivalence class of $b \in \mathcal{A}$. Finally, $\langle \psi_b, \psi_c \rangle = \omega(b^* c)$ turns $\mathcal{H}_\omega$ into a pre-Hilbert space and $\pi_\omega$ becomes a $^*$-representation, the GNS representation with respect to $\omega$. Moreover, $\psi_1 \in \mathcal{H}_\omega$ is a cyclic vector, $\psi_b = \pi_\omega(b)\psi_1$, with the property

$$\omega(a) = \langle \psi_1, \pi_\omega(a) \psi_1 \rangle \qquad [18]$$
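These properties characterize the GNS representation $(\mathcal{H}_\omega, \pi_\omega, \psi_1)$ up to unitary equivalence.

Example 5 We can now apply this construction to the three basic examples and obtain the following well-known representations as GNS representations:
(i) The GNS representation corresponding to the $\delta$-functional and the Wick star product is (unitarily equivalent to) the formal Bargmann–Fock representation. Here $\mathcal{H} = \mathbb{C}[[\bar y^1, \ldots, \bar y^n]][[\lambda]]$ with inner product

$$\langle \phi, \psi \rangle = \sum_{r=0}^{\infty} \frac{(2\lambda)^r}{r!} \overline{\frac{\partial^r \phi}{\partial \bar y^{i_1} \cdots \partial \bar y^{i_r}}(0)}\; \frac{\partial^r \psi}{\partial \bar y^{i_1} \cdots \partial \bar y^{i_r}}(0) \qquad [19]$$

and $\pi$ is explicitly given by

$$\pi(f) = \sum_{r,s=0}^{\infty} \frac{(2\lambda)^r}{r!\, s!} \frac{\partial^{r+s} f}{\partial z^{i_1} \cdots \partial z^{i_r}\, \partial \bar z^{j_1} \cdots \partial \bar z^{j_s}}(0)\; \bar y^{j_1} \cdots \bar y^{j_s} \frac{\partial^r}{\partial \bar y^{i_1} \cdots \partial \bar y^{i_r}} \qquad [20]$$

In particular, $\pi(z^i) = 2\lambda\, \partial/\partial \bar y^i$ and $\pi(\bar z^i) = \bar y^i$ are the annihilation and creation operators and [20] gives the Wick (or normal) ordering. This basic example has been extended to arbitrary Kähler manifolds by Bordemann and Waldmann (1998).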
(ii) The Weyl–Moyal star product $\star_{\mathrm{Weyl}}$ and the Schrödinger functional $\omega$ as in [14] give the usual Schrödinger representation as GNS representation. We obtain $\mathcal{H}_\omega = C_0^\infty(\mathbb{R}^n)[[\lambda]]$ with inner product

$$\langle \phi, \psi \rangle = \int_{\mathbb{R}^n} \overline{\phi(q)}\, \psi(q)\, d^n q \qquad [21]$$

and $\pi_\omega(f) = \varrho_{\mathrm{Weyl}}(f)$ as in [4] with $\hbar$ replaced by $\lambda$. The Schrödinger representation as a particular case of a GNS representation has been generalized to arbitrary cotangent bundles including representations on sections of line bundles over the configuration space (Dirac's representation for magnetic monopoles) by Bordemann, Neumaier, Pflaum, and Waldmann (1999, 2003). In this context, the WKB expansion can also be formulated.
(iii) For the positive trace $\mathrm{tr}$, the GNS pre-Hilbert space is simply the space $\mathcal{H}_{\mathrm{tr}} = C_0^\infty(M)[[\lambda]]$ with inner product $\langle f, g \rangle = \mathrm{tr}(\overline{f} \star g)$. The corresponding GNS representation is the left regular representation $\pi_{\mathrm{tr}}(f)g = f \star g$. Note that in this case the commutant of the representation is (anti-)isomorphic to the observable algebra and given by all the right multiplications. Thus, $\pi_{\mathrm{tr}}$ is highly reducible and the size of the commutant indicates a "thermodynamical" interpretation of this representation. Indeed, one can take this GNS representation, and more generally the GNS representations for arbitrary KMS functionals, as a starting point of a preliminary version of a Tomita–Takesaki theory for deformation quantization as shown by Waldmann (1999).

After these fundamental examples, we now reconsider the question of superpositions: in general, two (pure) states $\omega_1, \omega_2$ cannot be realized as vector states inside a single irreducible representation. One encounters superselection rules. Usually, for instance, in algebraic quantum field theory, the existence of superselection rules indicates the presence of charges. In particular, it is not sufficient to consider one single representation of the observable algebra $\mathcal{A}$. Instead, one has to investigate (as well as possible) all superselection sectors of the representation theory ${}^*\text{-Rep}(\mathcal{A})$ of $\mathcal{A}$ and find physically motivated criteria to select distinguished representations.
In usual quantum mechanics on $\mathbb{R}^{2n}$, this turns out to be rather simple, thanks to the (nontrivial) uniqueness theorem of von Neumann: one has a unique irreducible representation of the Weyl algebra up to unitary equivalence. In infinite dimensions or in topologically nontrivial situations, however, von Neumann's theorem does not apply and one indeed has superselection rules.
In deformation quantization, some parts of these superselection rules have been understood well: again, for cotangent bundles $T^*Q$, one can classify the unitary equivalence classes of Schrödinger-like representations on $C_0^\infty(Q)[[\lambda]]$ by topological classes of nontrivial vector potentials. Thus, one arrives at the interpretation of the Aharonov–Bohm effect as superselection rule where the classification is essentially given by $H^1_{\mathrm{deRham}}(Q, \mathbb{C}) / 2\pi i\, H^1_{\mathrm{deRham}}(Q, \mathbb{Z})$.
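The remark in Example 4(i), that the $\delta$-functional is positive for the Wick but not for the Weyl–Moyal star product, can be made concrete by a short computation. The following is a sketch for $n = 1$ with the illustrative choice $H = \tfrac12(q^2 + p^2)$, which is an assumption of this example and not taken from the text; it uses [7] with $\hbar$ replaced by $\lambda$, and [13].

```latex
% Weyl-Moyal case: the first-order term vanishes because \{H, H\} = 0,
% and the series stops at second order since H is quadratic.
H \star_{\mathrm{Weyl}} H
  = H^2 + \frac{1}{2!}\Big(\frac{i\lambda}{2}\Big)^{2}
    \Big( \partial_q^2 H\,\partial_p^2 H
          - 2\,\partial_q\partial_p H\,\partial_p\partial_q H
          + \partial_p^2 H\,\partial_q^2 H \Big)
  = H^2 - \frac{\lambda^2}{4},
\qquad
\delta\big(\overline{H} \star_{\mathrm{Weyl}} H\big) = -\frac{\lambda^2}{4} < 0 .

% Wick case: since \overline{\partial_z^r \bar f} = \partial_{\bar z}^r f,
% each term of [13] evaluated at 0 is a modulus squared (sum over i_1..i_r implied).
\delta\big(\overline{f} \star_{\mathrm{Wick}} f\big)
  = \sum_{r=0}^{\infty} \frac{(2\lambda)^r}{r!}
    \left| \frac{\partial^r f}{\partial \bar z^{i_1} \cdots \partial \bar z^{i_r}}(0) \right|^{2}
  \ge 0 .
```

Here $-\lambda^2/4$ is negative in the sense of the ordered ring $\mathbb{R}[[\lambda]]$, because its lowest nonvanishing coefficient is negative.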
General Representation Theory
Although it is very much desirable to determine the structure and the superselection sectors in ${}^*\text{-Rep}(\mathcal{A})$ completely, this is only achievable in the very simplest examples. Moreover, for formal star products, many artifacts due to the purely algebraic nature have to be expected: the Bargmann–Fock and Schrödinger representations in Example 5 are unitarily inequivalent and thus define a superselection rule; even the pre-Hilbert spaces are nonisomorphic. However, these artifacts vanish immediately when one imposes suitable convergence conditions together with appropriate topological completions (von Neumann's theorem). Given such problems, it is very difficult to find "hard" superselection rules which indeed have physical significance already at the formal level. Nevertheless, the example of the Aharonov–Bohm effect shows that this is possible.
In any case, new techniques for investigating ${}^*\text{-Rep}(\mathcal{A})$ have to be developed. It turns out that comparing ${}^*\text{-Rep}(\mathcal{A})$ with some other ${}^*\text{-Rep}(\mathcal{B})$ is much simpler but still gives some nontrivial insight into the structure of the representation theory. Here the Morita theory provides a highly sophisticated tool. The classical notion of Morita equivalence as well as Rieffel's more specialized strong Morita equivalence for $C^*$-algebras have been transferred to deformation quantization and, more generally, to $^*$-algebras $\mathcal{A}$ over $\mathsf{C} = \mathsf{R}(i)$ by Bursztyn and Waldmann (2001). The aim is to construct functors

$$F : {}^*\text{-Rep}(\mathcal{A}) \to {}^*\text{-Rep}(\mathcal{B}) \qquad [22]$$

which allow us to compare these categories and determine whether they are equivalent. But even if they are not equivalent, functors such as [22] are interesting. As an example, one considers the situation of classical phase space reduction $M \rightsquigarrow M_{\mathrm{red}}$ as it is present in every constrained system or gauge theory. Suppose one succeeded with the (highly nontrivial) problem of quantizing both classical phase spaces in a reasonable way, whence one has quantum observable algebras $\mathcal{A}$ and $\mathcal{A}_{\mathrm{red}}$. Then, of course, a relation between ${}^*\text{-Rep}(\mathcal{A})$ and ${}^*\text{-Rep}(\mathcal{A}_{\mathrm{red}})$ is of
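particular physical interest although one cannot expect both representation theories to be equivalent: $\mathcal{A}$ contains additional but physically irrelevant structure leading to possibly "more" representations.
To get a clear picture of the Morita theory, one has to extend the notion of $^*$-representations to the following framework: for an auxiliary $^*$-algebra $\mathcal{D}$ over $\mathsf{C}$, one defines a pre-Hilbert right $\mathcal{D}$-module to be a right $\mathcal{D}$-module $\mathcal{H}$ together with a $\mathsf{C}$-sesquilinear $\mathcal{D}$-valued inner product $\langle \cdot, \cdot \rangle : \mathcal{H} \times \mathcal{H} \to \mathcal{D}$ such that $\langle \phi, \psi \rangle = \langle \psi, \phi \rangle^*$ and $\langle \phi, \psi \cdot d \rangle = \langle \phi, \psi \rangle d$ for $d \in \mathcal{D}$ and such that $\langle \cdot, \cdot \rangle$ is completely positive. This means $(\langle \phi_i, \phi_j \rangle) \in M_n(\mathcal{D})^+$ for all $\phi_1, \ldots, \phi_n \in \mathcal{H}$, where, in general, an algebra element $a \in \mathcal{A}$ is called positive, $a \in \mathcal{A}^+$, if $\omega(a) \ge 0$ for all positive linear functionals $\omega : \mathcal{A} \to \mathsf{C}$. Then one defines $\mathfrak{B}(\mathcal{H})$ analogously as for pre-Hilbert spaces, leading to a definition of a $^*$-representation $\pi$ of $\mathcal{A}$ on a pre-Hilbert right $\mathcal{D}$-module $\mathcal{H}$. The corresponding category of $^*$-representations is denoted by ${}^*\text{-Rep}_{\mathcal{D}}(\mathcal{A})$. Clearly, elements in ${}^*\text{-Rep}_{\mathcal{D}}(\mathcal{A})$ are in particular $(\mathcal{A}, \mathcal{D})$-bimodules. The advantage is that now one has a tensor product $\hat\otimes$ taking care of the inner products as well. For $^*$-algebras $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, one has a functor

$$\hat\otimes : {}^*\text{-Rep}_{\mathcal{B}}(\mathcal{C}) \times {}^*\text{-Rep}_{\mathcal{A}}(\mathcal{B}) \to {}^*\text{-Rep}_{\mathcal{A}}(\mathcal{C}) \qquad [23]$$

which, on objects, is essentially given by $\otimes_{\mathcal{B}}$. In fact, for $\mathcal{F} \in {}^*\text{-Rep}_{\mathcal{B}}(\mathcal{C})$ and $\mathcal{E} \in {}^*\text{-Rep}_{\mathcal{A}}(\mathcal{B})$, one defines on the $(\mathcal{C}, \mathcal{A})$-bimodule $\mathcal{F} \otimes_{\mathcal{B}} \mathcal{E}$ an $\mathcal{A}$-valued inner product by $\langle x \otimes \phi, y \otimes \psi \rangle = \langle \phi, \langle x, y \rangle \cdot \psi \rangle$, which turns out to be well defined and completely positive again. Then $\mathcal{F} \hat\otimes \mathcal{E}$ is $\mathcal{F} \otimes_{\mathcal{B}} \mathcal{E}$ equipped with this inner product modulo its possibly nonempty degeneracy space.
By fixing one of the arguments of $\hat\otimes$, one obtains the functor of Rieffel induction of $^*$-representations

$$R_{\mathcal{E}} : {}^*\text{-Rep}_{\mathcal{D}}(\mathcal{A}) \to {}^*\text{-Rep}_{\mathcal{D}}(\mathcal{B}) \qquad [24]$$

where $\mathcal{E} \in {}^*\text{-Rep}_{\mathcal{A}}(\mathcal{B})$ is fixed and $R_{\mathcal{E}}(\mathcal{H}) = \mathcal{E} \hat\otimes \mathcal{H}$ for $\mathcal{H} \in {}^*\text{-Rep}_{\mathcal{D}}(\mathcal{A})$. The idea of strong Morita equivalence is then to search for such bimodules $\mathcal{E}$ where $R_{\mathcal{E}}$ gives an equivalence of categories. In detail, this is accomplished by the following definition, where, for simplicity, only unital $^*$-algebras are considered.

Definition 6 A $(\mathcal{B}, \mathcal{A})$-bimodule $\mathcal{E}$ is called a strong Morita equivalence bimodule if it is equipped with completely positive inner products $\langle \cdot, \cdot \rangle_{\mathcal{A}}$ and $\langle \cdot, \cdot \rangle_{\mathcal{B}}$ such that both inner products are full, in the sense that

$$\mathsf{C}\text{-span}\{ \langle x, y \rangle_{\mathcal{A}} \mid x, y \in \mathcal{E} \} = \mathcal{A} \qquad [25]$$

and analogously for $\langle \cdot, \cdot \rangle_{\mathcal{B}}$, and compatible, in the sense that

$$\langle b \cdot x, y \rangle_{\mathcal{A}} = \langle x, b^* \cdot y \rangle_{\mathcal{A}}, \qquad \langle x \cdot a, y \rangle_{\mathcal{B}} = \langle x, y \cdot a^* \rangle_{\mathcal{B}} \qquad [26]$$

$$\langle x, y \rangle_{\mathcal{B}} \cdot z = x \cdot \langle y, z \rangle_{\mathcal{A}} \qquad [27]$$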
In this case, $\mathcal{A}$ and $\mathcal{B}$ are called strongly Morita equivalent. It turns out that this is indeed an equivalence relation and that strong Morita equivalence implies the equivalence of the representation theories:

Theorem 7 For unital $^*$-algebras over $\mathsf{C}$, strong Morita equivalence is an equivalence relation.

Theorem 8 If $\mathcal{E}$ is a strong Morita equivalence bimodule, then $R_{\mathcal{E}}$ as in [24] is an equivalence of categories.

Example 9 The fundamental example in Morita theory is that a unital $^*$-algebra $\mathcal{A}$ is strongly Morita equivalent to the matrices $M_n(\mathcal{A})$ via the $(M_n(\mathcal{A}), \mathcal{A})$-bimodule $\mathcal{A}^n$ where the inner product is $\langle x, y \rangle_{\mathcal{A}} = \sum_{i=1}^{n} x_i^* y_i$ and $\langle \cdot, \cdot \rangle_{M_n(\mathcal{A})}$ is uniquely determined by the compatibility condition [27].

An efficient way to encode the whole Morita theory of unital $^*$-algebras over $\mathsf{C}$ is to collect all strong Morita equivalence bimodules modulo isometric isomorphisms of bimodules. Then the tensor product $\hat\otimes$ makes this into a "large" groupoid whose units are the $^*$-algebras themselves. This so-called Picard groupoid $\mathrm{Pic}$ then encodes everything one can say about strong Morita equivalence. In particular, the orbits of this groupoid are precisely the strong Morita equivalence classes of $^*$-algebras. The isotropy groups are the Picard groups $\mathrm{Pic}(\mathcal{A})$ which generalize the (outer) automorphism groups.
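Example 9 can be checked numerically in its simplest instance. The sketch below assumes $\mathcal{A} = \mathbb{C}$ and $n = 2$, so the bimodule is $\mathbb{C}^2$ with $M_2(\mathbb{C})$ acting from the left; the $M_n$-valued inner product is taken to be $\langle x, y \rangle_{M_n} = x\, y^*$, which is what [27] forces in this case. The function names (`inner_A`, `inner_Mn`, and so on) are hypothetical and chosen only for this illustration.

```python
# Simplest instance of Example 9: A = C (complex numbers), bimodule C^n.
# Vectors are lists of complex numbers, matrices are lists of rows.

def inner_A(x, y):
    """A-valued inner product <x, y>_A = sum_i conj(x_i) * y_i."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def inner_Mn(x, y):
    """M_n(A)-valued inner product <x, y>_Mn = x y^*, entries x_i * conj(y_j)."""
    return [[xi * yj.conjugate() for yj in y] for xi in x]

def mat_vec(m, v):
    """Left action of a matrix on a column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def scale(v, a):
    """Right action of a scalar a in A on a vector."""
    return [vi * a for vi in v]

def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

# Sample data with small integer parts, so floating-point arithmetic is exact.
x = [1 + 2j, 3j]
y = [2 - 1j, 1 + 1j]
z = [-1j, 4 + 0j]
b = [[1 + 1j, 2 + 0j], [0j, 3 - 2j]]

# Compatibility [27]: <x, y>_Mn . z == x . <y, z>_A
lhs_27 = mat_vec(inner_Mn(x, y), z)
rhs_27 = scale(x, inner_A(y, z))

# First half of [26]: <b.x, y>_A == <x, b^*.y>_A
lhs_26 = inner_A(mat_vec(b, x), y)
rhs_26 = inner_A(x, mat_vec(adjoint(b), y))

print(lhs_27 == rhs_27, lhs_26 == rhs_26)
```

Fullness [25] is visible here as well: applying `inner_Mn` to pairs of standard basis vectors produces the matrix units, which span $M_n(\mathbb{C})$.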
Strong Morita Equivalence of Star Products
This section considers star products from the viewpoint of Morita equivalence. Here one can show that for $\mathcal{A} = (C^\infty(M)[[\lambda]], \star)$, the possible candidates for equivalence bimodules are formal power series of sections $\Gamma^\infty(E)[[\lambda]]$ of vector bundles $E \to M$. This follows as, on the one hand, strong Morita equivalence is compatible with the classical limit $\lambda = 0$ in the sense that it implies strong Morita equivalence of the classical limits. On the other hand, any (classical or quantum) equivalence bimodule is finitely generated and projective as right $\mathcal{A}$-module. Thus, by the Serre–Swan theorem one obtains the sections of a vector bundle in the classical limit. Now one can show that every vector bundle can uniquely (up to equivalence) be deformed such that $\Gamma^\infty(E)[[\lambda]]$ becomes a right $\mathcal{A}$-module. Thus, the only thing to be computed is which deformation $\star'$ is induced by this deformation of $E$ for the endomorphisms $\Gamma^\infty(\mathrm{End}(E))[[\lambda]]$, since one can show that then the result will always be a strong Morita equivalence bimodule. The inner products come from deformations of a Hermitian fiber metric on $E$.
Since every vector bundle $E \to M$ can be deformed in this manner in an essentially unique way, we arrive at a general global construction of a noncommutative field theory where the fields are sections of $E$ endowed with a deformed bimodule structure. In the case where $M$ is even a symplectic manifold, a simple extension of Fedosov's construction of a star product $\star$ gives a rather explicit formula for the deformed bimodule structure of $\Gamma^\infty(E)[[\lambda]]$ including a construction of the deformation $(\Gamma^\infty(\mathrm{End}(E))[[\lambda]], \star')$ which acts from the left. As usual in Fedosov's approach, the construction depends functorially on the choice of a connection $\nabla^E$ for $E$.
Returning to the question of strong Morita equivalence of star products, we see that the vector bundle $E$ has to be a line bundle $L$ since only in this case we have $\Gamma^\infty(\mathrm{End}(E)) \cong C^\infty(M)$. Since the deformation of the Hermitian fiber metric is always possible and since two equivalent Hermitian star products are always $^*$-equivalent, one can show that strong Morita equivalence is already implied by ring-theoretic Morita equivalence (the converse is true in general).

Theorem 10 Star products are strongly Morita equivalent if and only if they are Morita equivalent.

An analogous statement holds for $C^*$-algebras, known as Beer's theorem (1982). In the symplectic case, the characteristic class $c(\star')$ of the induced star product $\star'$ can be computed explicitly leading to the following classification by Bursztyn and Waldmann (2002):

Theorem 11 Let $\star$, $\star'$ be star products on a symplectic manifold $M$. Then $\star'$ is (strongly) Morita equivalent to $\star$ if and only if there exists a symplectomorphism $\psi$ such that

$$\psi^* c(\star') - c(\star) \in 2\pi i\, H^2_{\mathrm{deRham}}(M, \mathbb{Z}) \qquad [28]$$

A similar result in the general Poisson case was given by Jurčo, Schupp, and Wess (2002) based on Kontsevich's formality theorem. This approach is motivated by a careful investigation of noncommutative (scalar) field theories.
Finally, it is worth mentioning that [28] has a very simple physical interpretation. Consider again a cotangent bundle $T^*Q$ with a topologically nontrivial configuration space $Q$, for example, $\mathbb{R}^3 \setminus \{0\}$. Then there is a canonical Weyl-type star product $\star_{\mathrm{Weyl}}$ depending on the choice of a connection $\nabla$ and an integration density $\mu > 0$, generalizing [7] to a curved situation. Now let $B$ be a magnetic field, modeled as a closed 2-form on $Q$. Minimal coupling leads to a new star product $\star^B_{\mathrm{Weyl}}$ describing an electrically charged particle moving in $Q$ in the external field $B$. Then the two star products $\star_{\mathrm{Weyl}}$ and $\star^B_{\mathrm{Weyl}}$ are (strongly) Morita equivalent if and only if the magnetic field satisfies Dirac's integrality condition for the (possibly nontrivial) magnetic charges described by $B$. Thus, Dirac's condition is responsible for the very strong statement that the quantizations with and without magnetic field are Morita equivalent. In particular, the $^*$-representation theories of $\star_{\mathrm{Weyl}}$ and $\star^B_{\mathrm{Weyl}}$ are equivalent. Even more specifically, using $B$ to construct a line bundle $L \to Q$ one obtains the result that Dirac's $^*$-representation of $\star^B_{\mathrm{Weyl}}$ on $\Gamma_0^\infty(L)[[\lambda]]$ is precisely the Rieffel induction of the Schrödinger representation of $\star_{\mathrm{Weyl}}$ on $C_0^\infty(Q)[[\lambda]]$.

See also: Aharonov–Bohm Effect; Algebraic Approach to Quantum Field Theory; Deformation Quantization; Deformation Theory; Deformations of the Poisson Bracket on a Symplectic Manifold; Fedosov Quantization.
Further Reading
Bayen F, Flato M, Frønsdal C, Lichnerowicz A, and Sternheimer D (1978) Deformation theory and quantization. Annals of Physics 111: 61–151.
Bertelson M, Cahen M, and Gutt S (1997) Equivalence of star products. Class. Quant. Grav. 14: A93–A107.
Bieliavsky P, Dito G, Maeda Y, and Waldmann S (2002) The deformation quantization homepage. http://idefix.physik.unifreiburg.de/ star/en/index.html (regularly updated online bibliography and short introductory articles).
Bordemann M and Waldmann S (1998) Formal GNS construction and states in deformation quantization. Communications in Mathematical Physics 195: 549–583.
Bordemann M, Neumaier N, Pflaum MJ, and Waldmann S (2003) On representations of star product algebras over cotangent spaces on Hermitian line bundles. Journal of Functional Analysis 199: 1–47.
Bursztyn H and Waldmann S (2001) Algebraic Rieffel induction, formal Morita equivalence and applications to deformation quantization. Journal of Geometrical Physics 37: 307–364.
Bursztyn H and Waldmann S (2002) The characteristic classes of Morita equivalent star products on symplectic manifolds. Communications in Mathematical Physics 228: 103–121.
Deligne P (1995) Déformations de l'algèbre des fonctions d'une variété symplectique: comparaison entre Fedosov et DeWilde, Lecomte. Sel. Math. New Series 1(4): 667–697.
DeWilde M and Lecomte PBA (1983) Existence of star-products and of formal deformations of the Poisson Lie algebra of arbitrary symplectic manifolds. Letters in Mathematical Physics 7: 487–496.
Dito G and Sternheimer D (2002) Deformation quantization: genesis, developments and metamorphoses. In: Halbout G (ed.) Deformation Quantization, IRMA Lectures in Mathematics and Theoretical Physics, vol. 1, pp. 9–54. Berlin: Walter de Gruyter.
Fedosov BV (1986) Quantization and the index. Soviet Physics Doklady 31(11): 877–878.
Gutt S (2000) Variations on deformation quantization. In: Dito G and Sternheimer D (eds.) Conférence Moshé Flato 1999. Quantization, Deformations, and Symmetries, Mathematical Physics Studies, vol. 21, pp. 217–254. Dordrecht: Kluwer Academic.
Jurco B, Schupp P, and Wess J (2002) Noncommutative line bundles and Morita equivalence. Letters of Mathematical Physics 61: 171–186.
Kontsevich M (2003) Deformation quantization of Poisson manifolds. Letters in Mathematical Physics 66: 157–216.
Nest R and Tsygan B (1995) Algebraic index theorem. Communications in Mathematical Physics 172: 223–262.
Omori H, Maeda Y, and Yoshioka A (1991) Weyl manifolds and deformation quantization. Advanced Mathematics 85: 224–255.
Waldmann S (2002) On the representation theory of deformation quantization. In: Halbout G (ed.) Deformation Quantization, IRMA Lectures in Mathematics and Theoretical Physics, vol. 1, pp. 107–133. Berlin: Walter de Gruyter.
Waldmann S (2005) States and representation theory in deformation quantization. Reviews of Mathematical Physics 17: 15–75.
Weinstein A and Xu P (1998) Hochschild cohomology and characteristic classes for star-products. In: Khovanskij A, Varchenko A, and Vassiliev V (eds.) Geometry of Differential Equations, pp. 177–194. Dedicated to VI Arnol'd on the occasion of his 60th birthday. Providence, RI: American Mathematical Society.
Deformation Theory
M J Pflaum, Johann Wolfgang Goethe-Universität, Frankfurt, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction and Historical Remarks
In mathematical deformation theory one studies how an object in a certain category of spaces can be varied as a function of the points of a parameter space. In other words, deformation theory deals with the structure of families of objects such as varieties, singularities, vector bundles, coherent sheaves, algebras, or differentiable maps. Deformation problems appear in various areas of mathematics, in particular in algebra, algebraic and analytic geometry, and mathematical physics. According to Deligne, there is a common philosophy behind all deformation problems in characteristic zero. It is the goal of this survey to explain this point of view. Moreover, we will provide several examples with relevance for mathematical physics.
Historically, modern deformation theory has its roots in the work of Grothendieck, Artin, Quillen, Schlessinger, Kodaira–Spencer, Kuranishi, Deligne, Grauert, Gerstenhaber, and Arnol'd. The application of deformation methods to quantization theory goes back to Bayen–Flato–Fronsdal–Lichnerowicz–Sternheimer, and has led to the concept of a star product on symplectic and Poisson manifolds. The existence of such star products has been proved by de Wilde–Lecomte and Fedosov for symplectic manifolds and by Kontsevich for Poisson manifolds. Recently, Fukaya and Kontsevich have found a far-reaching connection between general deformation theory, the theory of moduli, and mirror symmetry. Thus, deformation theory comes back to its origins, which lie in the desire to construct moduli spaces. Briefly, a moduli problem can be described as the attempt to collect all isomorphism classes of spaces of a certain type into one single object, the moduli space, and then to study its geometric and analytic properties. The observations by Fukaya and Kontsevich have led to new insight into the algebraic geometry of mirror varieties and their application to string theory.
Basic Definitions and Examples
Deformation theory is based on the notion of a ringed space, so we briefly recall its definition.
Definition 1 Let $k$ be a field. By a $k$-ringed space one understands a topological space $X$ together with a sheaf $\mathcal{A}$ of unital $k$-algebras on $X$. The sheaf $\mathcal{A}$ will be called the structure sheaf of the ringed space. In case each of the stalks $\mathcal{A}_x$, $x \in X$, is a local algebra, that is, has a unique maximal ideal $\mathfrak{m}_x$, one calls $(X, \mathcal{A})$ a locally $k$-ringed space. Likewise, one defines a commutative $k$-ringed space as a ringed space such that the stalks of the structure sheaf are all commutative.
Given two $k$-ringed spaces $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, a morphism from $(X, \mathcal{A})$ to $(Y, \mathcal{B})$ is a pair $(f, \varphi)$, where $f : X \to Y$ is a continuous mapping and $\varphi : f^{-1}\mathcal{B} \to \mathcal{A}$ a morphism of sheaves of algebras. This means in particular that for every point $x \in X$ there is a homomorphism of algebras $\varphi_x : \mathcal{B}_{f(x)} \to \mathcal{A}_x$ induced by $\varphi$. Under the assumption that both ringed spaces are local, $(f, \varphi)$ is called a morphism of locally ringed spaces if each $\varphi_x$ is a homomorphism of local $k$-algebras, that is, maps the maximal ideal of $\mathcal{B}_{f(x)}$ to the one of $\mathcal{A}_x$. Clearly, $k$-ringed spaces (resp. locally or commutative $k$-ringed spaces) together with their morphisms form a category.
The following is a list of examples of ringed spaces, in particular of those which will be needed later.
Example 2
(i) Denote by $\mathcal{C}^\infty$ the sheaf of smooth functions on $\mathbb{R}^n$, by $\mathcal{C}^\omega$ the sheaf of real analytic functions, and let $\mathcal{O}$ be the sheaf of holomorphic functions on $\mathbb{C}^n$. Then $(\mathbb{R}^n, \mathcal{C}^\infty)$, $(\mathbb{R}^n, \mathcal{C}^\omega)$, and $(\mathbb{C}^n, \mathcal{O})$ are ringed spaces over $\mathbb{R}$ resp. $\mathbb{C}$.
(ii) A differentiable manifold of dimension $n$ can be understood as a locally $\mathbb{R}$-ringed space $(M, \mathcal{C}^\infty_M)$ which locally is isomorphic to $(\mathbb{R}^n, \mathcal{C}^\infty)$. Likewise, a real analytic manifold is a ringed space $(M, \mathcal{C}^\omega_M)$ which locally can be modeled by $(\mathbb{R}^n, \mathcal{C}^\omega)$, and a complex manifold is an $(M, \mathcal{O}_M)$ which locally looks like $(\mathbb{C}^n, \mathcal{O})$.
(iii) Let $D$ be a domain in $\mathbb{C}^n$, and $\mathcal{J}$ an ideal sheaf in $\mathcal{O}_D$ of finite type, which means that $\mathcal{J}$ is locally finitely generated over $\mathcal{O}_D$. Let $Y$ be the support of the quotient sheaf $\mathcal{O}_D/\mathcal{J}$. The pair $(Y, \mathcal{O}_Y)$, where $\mathcal{O}_Y$ denotes the restriction of $\mathcal{O}_D/\mathcal{J}$ to $Y$, then is a ringed space, called a complex model space. A complex space now is a ringed space $(X, \mathcal{O}_X)$ which locally looks like a complex model space (cf. Grauert and Remmert 1984).
(iv) Let $k$ be an algebraically closed field, and $\mathbb{A}^n$ the affine space over $k$ of dimension $n$. Then $\mathbb{A}^n$, together with the sheaf of regular functions, is a ringed space.
(v) Given a ring $A$, its spectrum $\operatorname{Spec} A$ together with the sheaf of regular functions $\mathcal{O}_A$ forms a ringed space (cf. Hartshorne (1977, section II.2)). One calls $(\operatorname{Spec} A, \mathcal{O}_A)$ an affine scheme. More generally, a scheme is a ringed space $(X, \mathcal{O}_X)$ which locally can be modeled by affine schemes.
(vi) Finally, if $A$ is a local $k$-algebra, the pair $(*, A)$, with $*$ a one-point space, can be understood as a locally ringed space. With $A$ the algebra of formal power series $k[[t]]$ in one variable $t$, this example plays an important role in the theory of formal deformations of algebras.
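Figure 1 A fibered space. (Figure omitted; it shows the map $f$ from the total space $Y$ to the base $P$, with the fiber $Y_p$ lying over a point $p \in P$.)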
Definition 3 A morphism $(f, \varphi) : (Y, \mathcal{B}) \to (P, \mathcal{S})$ of ringed spaces is called fibered if the following conditions are fulfilled:
(i) $(P, \mathcal{S})$ is a commutative locally ringed space;
(ii) $f : Y \to P$ is surjective; and
(iii) $\varphi_y : \mathcal{S}_{f(y)} \to \mathcal{B}_y$ maps $\mathcal{S}_{f(y)}$ into the center of $\mathcal{B}_y$ for each $y \in Y$.
The fiber of $(f, \varphi)$ over a point $p \in P$ then is the ringed space $(Y_p, \mathcal{B}_p)$ defined by
$$Y_p = f^{-1}(p), \qquad \mathcal{B}_p = \mathcal{B}|_{f^{-1}(p)} \big/ \mathfrak{m}_p \mathcal{B}|_{f^{-1}(p)}$$
where $\mathfrak{m}_p$ is the maximal ideal of $\mathcal{S}_p$, which acts on $\mathcal{B}|_{f^{-1}(p)}$ via $\varphi$.
A fibered morphism of ringed spaces can be pictured as in Figure 1. In addition to this intuitive picture, conditions (i)–(iii) imply that the stalks $\mathcal{B}_y$ are central extensions of $\mathcal{B}_y/\mathfrak{m}_{f(y)}\mathcal{B}_y$ by $\mathcal{S}_{f(y)}$.
Definition 4 Let $(P, \mathcal{S})$ be a commutative locally ringed space over a field $k$ with $P$ connected, let $*$ be a fixed point in $P$, and $(X, \mathcal{A})$ a $k$-ringed space. A deformation of $(X, \mathcal{A})$ over the parameter space $(P, \mathcal{S})$ with distinguished point $*$ then is a fibered morphism $(f, \varphi) : (Y, \mathcal{B}) \to (P, \mathcal{S})$ over $k$ together with an isomorphism $(i, \iota) : (X, \mathcal{A}) \to (Y_*, \mathcal{B}_*)$ such that for all $p \in P$ and $y \in f^{-1}(p)$ the homomorphism $\varphi_y : \mathcal{S}_p \to \mathcal{B}_y$ is flat.
The condition of flatness in the definition of a deformation serves as a substitute for "local triviality" and works also in the presence of singularities; see Palamodov (1990, section 3) for a discussion of this point.
In the remainder of this section, we provide a list of some of the most important deformation problems in mathematics, and show how these can be formulated within the above language.
Products of k-Ringed Spaces
Let $(X, \mathcal{A})$ be any $k$-ringed space and $(P, \mathcal{S})$ a $k$-scheme. For any closed point $* \in P$, the product $(X \times P, \mathcal{B}) = (X, \mathcal{A}) \times_k (P, \mathcal{S})$ then is a flat deformation of $(X, \mathcal{A})$ with distinguished point $*$. This can be seen easily from the fact that $\mathcal{B}_{(x,p)} = \mathcal{A}_x \otimes_k \mathcal{S}_p$ for every $x \in X$ and $p \in P$.
Families of Matrices as Deformations
Let $(P, \mathcal{O}_P)$ be a complex space with distinguished point $*$ and $A_P : P \to \mathrm{Mat}(n \times n, \mathbb{C})$ a holomorphic family of complex $n \times n$ matrices over $P$. By the following construction, $A_P$ can be understood as a deformation, more precisely as a deformation of the matrix $A := A_P(*)$. Let $Y$ be the graph of $A_P$ in the product space $P \times \mathrm{Mat}(n \times n, \mathbb{C})$ and $f : Y \to P$ the restriction of the projection onto the first coordinate. Define the sheaf $\mathcal{B}$ as the inverse image sheaf $f^{-1}\mathcal{S}$, where $\mathcal{S} = \mathcal{O}_P$, and let $\varphi$ be the sheaf morphism which for every $y \in Y$ is induced by the identity map $\varphi_y : \mathcal{S}_{f(y)} \to \mathcal{B}_y := \mathcal{S}_{f(y)}$. It is then immediately clear that $(f, \varphi)$ is a deformation of the fiber $f^{-1}(*)$ and that this fiber coincides with the matrix $A$.
Now let $A$ be an arbitrary complex $n \times n$ matrix, and choose a $\mathrm{GL}(n, \mathbb{C})$-slice through $A$, that is, a submanifold $P$ containing $A$ which is transversal to the $\mathrm{GL}(n, \mathbb{C})$-orbit through $A$. Here it is assumed that $\mathrm{GL}(n, \mathbb{C})$ acts by the adjoint action on $\mathrm{Mat}(n \times n, \mathbb{C})$. The family $A_P$ given by the canonical embedding $P \hookrightarrow \mathrm{Mat}(n \times n, \mathbb{C})$ now is a deformation of $A$. The germ of this deformation at $*$ is versal in the sense defined in the next section.
Deformation of a Scheme à la Grothendieck
Assume that $(P, \mathcal{S})$ is a connected scheme over $k$. A deformation of a scheme $(X, \mathcal{A})$ then is a deformation $(f, \varphi) : (Y, \mathcal{B}) \to (P, \mathcal{S})$ in the sense defined above, together with the requirement that $f : Y \to P$ is a proper map, that is, $f^{-1}(K)$ is compact for every compact $K \subset P$. As a particular example, consider the $k$-scheme $Y = \operatorname{Spec} k[x, y, t]/(xy - t)$. It gives rise to a fibration $Y \to \operatorname{Spec} k[t]$, whose fibers $Y_a$ with $a \in k$ are hyperbolas $xy = a$ when $a \neq 0$, and consist of the two axes $x = 0$ and $y = 0$ when $a = 0$. For $k = \mathbb{R}$, this deformation can be illustrated as in Figure 2. For further information on this and similar examples, see Hartshorne (1977), in particular example 3.3.2.
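The flatness required in Definition 4 can be checked by hand for the family $xy = t$. The following short derivation is only a sketch; the symbol $B$ for the coordinate ring of $Y$ is introduced here for illustration and is not notation from the text.

```latex
% Coordinate ring of Y, viewed as an algebra over k[t]:
B \;=\; k[x,y,t]/(xy - t) \;\cong\; k[x,y], \qquad t \longmapsto xy .
% A polynomial g(t) acts on B as multiplication by g(xy).
% Since k[x,y] is an integral domain and g(xy) \neq 0 whenever g \neq 0,
% B is a torsion-free k[t]-module.  Over the principal ideal domain k[t],
% torsion-free modules are flat, so k[t] \to B is flat.
% The fiber over the point t = a is
B/(t - a)B \;\cong\; k[x,y]/(xy - a),
% the hyperbola xy = a for a \neq 0 and the union of the two axes for a = 0.
```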
Deformation of a Complex Space
According to Grothendieck, one understands by a deformation of a complex space $(X, \mathcal{A})$ a morphism of complex spaces $(f, \varphi) : (Y, \mathcal{B}) \to (P, \mathcal{S})$ which is both a proper flat morphism of complex spaces and a deformation of $(X, \mathcal{A})$ as a ringed space. In case $(X, \mathcal{A})$ and $(P, \mathcal{S})$ are complex manifolds and if $P$ is connected, each of the fibers $Y_p$ is a compact complex manifold. Moreover, the family $(Y_p)_{p \in P}$ then is a family of compact complex manifolds in the sense of Kodaira–Spencer (cf. Palamodov (1990)).
Figure 2 Deformation of the coordinate axes. (Figure omitted; it shows the fibers $xy = a$ in the $(x, y)$-plane for $a = 0$, $a = 0.5$, and $a = 1$.)
Deformation of Singularities
Let $x$ be a point of some $\mathbb{C}^n$. Two complex spaces $(X, \mathcal{O}_X) \subset (\mathbb{C}^n, \mathcal{O})$ and $(X', \mathcal{O}_{X'}) \subset (\mathbb{C}^n, \mathcal{O})$ with $x \in X \cap X'$ are then called germ equivalent at $x$ if there exists an open neighborhood $U \subset \mathbb{C}^n$ of $x$ such that $X \cap U = X' \cap U$. Obviously, germ equivalence at $x$ is indeed an equivalence relation. We denote the equivalence class of $X$ by $[X]_x$. Clearly, if $[X]_x = [X']_x$, then one has $\mathcal{O}_{X,x} = \mathcal{O}_{X',x}$ for the stalks at $x$. By a singularity one understands a pair $([X]_x, \mathcal{O}_{X,x})$. In the literature, such a singularity is often denoted by $(X, x)$. The singularity $(X, x)$ is called nonsingular or regular if $\mathcal{O}_{X,x}$ is isomorphic to an algebra of convergent power series $\mathbb{C}\{z_1, \ldots, z_d\}$. A deformation of a complex singularity $(X, x)$ over a complex germ $(P, *)$ is a morphism of ringed spaces $([Y]_x, \mathcal{O}_{Y,x}) \to ([P]_*, \mathcal{O}_{P,*})$ which is induced by a holomorphic map and which is a deformation of $([X]_x, \mathcal{O}_{X,x})$ as a ringed space. See Artin (1976) and the overview article by Greuel (1992) for further details and a variety of examples.
First-Order Deformation of Algebras
Consider a $k$-algebra $A$ and the truncated polynomial algebra $S = k[\varepsilon]/\varepsilon^2 k[\varepsilon]$. Furthermore, let $\alpha : A \times A \to A$ be a Hochschild 2-cocycle of $A$; in other words, assume that the relation
$$a_1\,\alpha(a_2, a_3) - \alpha(a_1 a_2, a_3) + \alpha(a_1, a_2 a_3) - \alpha(a_1, a_2)\,a_3 = 0$$ [1]
holds for all $a_1, a_2, a_3 \in A$. Then one can define a new $k$-algebra $B$, whose underlying linear structure is isomorphic to $A \otimes_k S$ and whose product is given by the following construction: any element $b \in B$ can be written uniquely in the form $b = a_0 + a_1 \varepsilon$, with $a_0, a_1 \in A$. Then the product of $b = a_0 + a_1 \varepsilon \in B$ and $b' = a_0' + a_1' \varepsilon \in B$ is given by
$$b \cdot b' = a_0 a_0' + \big(\alpha(a_0, a_0') + a_0 a_1' + a_1 a_0'\big)\,\varepsilon$$ [2]
By condition [1], this product is associative. One thus obtains a flat deformation $S \to B$ of the algebra $A$ and calls it the first-order or infinitesimal deformation of $A$ along the Hochschild cocycle $\alpha$. For further information on this and the connection between deformation theory and Hochschild cohomology, see the overview article by Gerstenhaber and Schack (1986).
Formal Deformation of an Algebra
Let us generalize the preceding example and explain the concept of a formal deformation of an algebra by Gerstenhaber. Assume again $A$ to be an arbitrary $k$-algebra and choose bilinear maps $\mu_n : A \times A \to A$ for $n \in \mathbb{N}$ such that $\mu_0$ is the product on $A$ and $\mu_1$ is a Hochschild cocycle. Furthermore, let $S$ be the algebra $k[[t]]$ of formal power series in one variable over $k$. Then define on the linear space $B = A[[t]]$ of formal power series in one variable with coefficients in $A$ the following bilinear map:
$$\star : B \times B \to B, \qquad \Big(\sum_{n \in \mathbb{N}} a_n t^n,\ \sum_{n \in \mathbb{N}} b_n t^n\Big) \mapsto \sum_{n \in \mathbb{N}}\ \sum_{\substack{k, l, m \in \mathbb{N} \\ k + l + m = n}} \mu_m(a_k, b_l)\, t^n$$ [3]
If $B$ together with $\star$ becomes a $k$-algebra or, in other words, if $\star$ is associative, one can easily see that it gives a flat deformation of $A$ over $S = k[[t]]$. In that case, one says that $B$ is a formal deformation of $A$ by the family $(\mu_n)_{n \in \mathbb{N}}$. In contrast to the preceding example, there might not exist for every Hochschild cocycle $\alpha$ on $A$ a formal deformation $B$ of $A$ defined by a family $(\mu_n)_{n \in \mathbb{N}}$ such that $\mu_1 = \alpha$. In case it exists, we will say that the deformation $B$ of $A$ is in the direction of $\alpha$. If the third Hochschild cohomology group $H^3(A, A)$ vanishes, there exists for every Hochschild cocycle $\alpha$ on $A$ a deformation $B$ of $A$ in the direction of $\alpha$ (see again Gerstenhaber and Schack (1986) for further details).
Formal Deformation Quantization of Symplectic and Poisson Manifolds
Let us consider the last two examples for the case where $A$ is the algebra $\mathcal{C}^\infty(M)$ of smooth functions on a symplectic or Poisson manifold $M$. Then the Poisson bracket $\{\ ,\ \}$ gives a Hochschild cocycle on $\mathcal{C}^\infty(M)$. There exists a first-order deformation of $\mathcal{C}^\infty(M)$ along $(1/2i)\{\ ,\ \}$ and, even though $H^3(A, A)$ might not always vanish, a deformation quantization of $M$, that is, a formal deformation of $\mathcal{C}^\infty(M)$ in the direction of the Poisson bracket $(1/2i)\{\ ,\ \}$. For the symplectic case, this fact was first proved by de Wilde–Lecomte using methods from Hochschild cohomology theory. A more geometric and intuitive proof has been given by Fedosov (1996). The Poisson case has been settled in the work of Kontsevich (2003) (see also the section "Deformation quantization of Poisson manifolds").
Quantized Universal Enveloping Algebras According to Drinfeld
A quantized universal enveloping algebra for a complex Lie algebra $\mathfrak{g}$ is a Hopf algebra $A$ over $\mathbb{C}[[t]]$ such that $A$ is a topologically free $\mathbb{C}[[t]]$-module (i.e., $A = (A/tA)[[t]]$ as left $\mathbb{C}[[t]]$-module) and $A/tA$ is the universal enveloping algebra $U\mathfrak{g}$ of $\mathfrak{g}$. Because $A$ is a topologically free $\mathbb{C}[[t]]$-module, $A$ is a flat $\mathbb{C}[[t]]$-module and thus a deformation of $U\mathfrak{g}$ over $\mathbb{C}[[t]]$. See Drinfel'd (1986) and the monograph by Kassel (1995) for further details and examples of quantized universal enveloping algebras.
Quantum Plane
Consider the tensor algebra $T = \bigoplus_{n \in \mathbb{N}} (\mathbb{R}^2)^{\otimes n}$ of the two-dimensional real vector space $\mathbb{R}^2$, and let $(x, y)$ be the canonical basis of $\mathbb{R}^2$. Then form the tensor product sheaf $\mathcal{T}_{\mathbb{C}^*} = T \otimes_{\mathbb{R}} \mathcal{O}_{\mathbb{C}^*}$ and let $\mathcal{I}_{\mathbb{C}^*}$ be the ideal sheaf in $\mathcal{T}_{\mathbb{C}^*}$ generated by the relation
$$x \otimes y - z\, y \otimes x = 0$$ [4]
where $z : \mathbb{C}^* \to \mathbb{C}$ is the identity function. The quotient sheaf $\mathcal{B} = \mathcal{B}_{\mathbb{C}^*} = \mathcal{T}_{\mathbb{C}^*}/\mathcal{I}_{\mathbb{C}^*}$ then is a sheaf of $\mathbb{C}$-algebras and an $\mathcal{O}_{\mathbb{C}^*}$-module. Using eqn [4], now move all occurrences of $x$ in an element of $\mathcal{B}_{\mathbb{C}^*}$ to the right of all $y$'s. Since $1/z$ is an element of $\mathcal{O}(\mathbb{C}^*)$, one can thus show that $\mathcal{B}_{\mathbb{C}^*}$ is a free $\mathcal{O}_{\mathbb{C}^*}$-module. Hence, $\mathcal{B}_{\mathbb{C}^*}$ is flat over $\mathcal{O}_{\mathbb{C}^*}$. Further, it is easy to see that for every $q \in \mathbb{C}^*$ the $\mathbb{C}$-algebra $A_q = \mathcal{B}_q/\mathfrak{m}_q \mathcal{B}_q$ is generated by elements $x, y$ subject to the relation
$$x \cdot y - q\, y \cdot x = 0$$ [5]
We call $A_q$ the $q$-deformed quantum plane and $B = \mathcal{B}(\mathbb{C}^*)$ the universally deformed quantum plane over $\mathbb{C}^*$. Altogether, one can interpret $B$ as a deformation of $A_q$ over $\mathbb{C}^*$, in particular as a deformation of $A_1 = \mathbb{C}[x, y]$, the algebra of complex polynomials in two generators.
In the same way, one can deform function algebras on higher-dimensional vector spaces as well as function algebras on certain Lie groups. In this manner, one obtains the quantum group $SU_q(2)$ as a deformation of a Hopf algebra of functions on $SU(2)$. See, for example, the work of Faddeev–Reshetikhin–Takhtajan (1990), Manin (1988), and Wess–Zumino (1990) for more information on $q$-deformations of vector spaces, Lie groups, differential calculi, etc.
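The normal-ordering argument used for the quantum plane can be tried out numerically. The sketch below is an illustration under stated assumptions, not a construction taken from the text: it takes $q$ to be a nonzero rational number, stores an element of $A_q$ as a dictionary mapping an exponent pair $(a, b)$ to the coefficient of the ordered monomial $y^a x^b$, and uses the rule $x^b y^c = q^{bc}\, y^c x^b$ that follows from relation [5]. The names `qp_mul`, `scale`, `X`, and `Y` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def qp_mul(u, v, q):
    """Multiply two elements of A_q written in the ordered basis y^a x^b.

    Uses (y^a x^b)(y^c x^d) = q^(b*c) y^(a+c) x^(b+d), which follows from
    the relation x y = q y x.
    """
    out = {}
    for ((a, b), cu), ((c, d), cv) in product(u.items(), v.items()):
        key = (a + c, b + d)
        out[key] = out.get(key, 0) + cu * cv * q ** (b * c)
    # Drop monomials whose coefficient cancelled to zero.
    return {k: c for k, c in out.items() if c != 0}


def scale(u, s):
    """Multiply an element of A_q by the scalar s."""
    return {k: s * c for k, c in u.items() if s * c != 0}


q = Fraction(3, 2)            # an arbitrary nonzero sample value (assumption)
X = {(0, 1): Fraction(1)}     # the generator x = y^0 x^1
Y = {(1, 0): Fraction(1)}     # the generator y = y^1 x^0

# Relation [5]: x y equals q times y x.
print(qp_mul(X, Y, q) == scale(qp_mul(Y, X, q), q))
```

For $q = 1$ the same function reproduces the commutative product of $\mathbb{C}[x, y]$, in line with the interpretation of $A_q$ as a deformation of $A_1$.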
Versal Deformations
In this section, and the ones that follow, we consider only germs of deformations, that is, deformations over parameter spaces of the form $(*, S)$. This means in particular that the structure sheaf only consists of its stalk $S$ at $*$, a commutative local $k$-algebra. Let us now suppose that the sheaf morphism $\varphi : (Y, \mathcal{B}) \to (*, S)$ (over the canonical map $Y \to *$) is a deformation of the ringed space $(X, \mathcal{A})$ and that $\tau : T \to S$ is a homomorphism of commutative local $k$-algebras. Then the sheaf morphism $\tau^*\varphi : \mathcal{B} \otimes_S T \to T$ with $(\tau^*\varphi)_y(t) = 1 \otimes t$ for $y \in Y$ and $t \in T$ is a deformation of $(X, \mathcal{A})$ over the parameter space $(*, T)$. One says that the deformation $\tau^*\varphi$ is induced by the homomorphism $\tau$.
Definition 5 A deformation $\varphi : (Y, \mathcal{B}) \to (*, S)$ of $(X, \mathcal{A})$ is called versal if every (germ of a) deformation of $(X, \mathcal{A})$ is isomorphic to a deformation germ induced by a homomorphism of $k$-algebras $\tau : T \to S$. A versal deformation is called universal if the inducing homomorphism $\tau : T \to S$ is unique, and miniversal if $S$ is of minimal dimension.
Example 6
(i) In the section "Families of matrices as deformations," the construction of a versal deformation of a complex matrix $A$ has been sketched.
(ii) According to Kuranishi, every compact complex manifold has a versal deformation by an analytic germ. See Kuranishi (1971) for a detailed exposition and the section "The Kodaira–Spencer algebra controlling deformations of compact complex manifolds" for a description of the principal ideas.
(iii) Grauert has shown that for isolated singularities there exists a versal analytic deformation.
(iv) By the work of Douady–Verdier, Grauert, and Palamodov one knows that for every compact complex space there exists a miniversal analytic deformation. One of the essential methods in the existence proof is Palamodov's construction of the cotangent complex (see Palamodov (1990)).
(v) Bingener (1987) has further developed Palamodov's approach and thus has provided a unified and quite general method for constructing versal deformations in analytic geometry.
(vi) Fialowski–Fuchs have constructed miniversal deformations of Lie algebras.
Schlessinger's Theorem
According to Grothendieck, spaces in algebraic geometry are represented by functors from a category of commutative rings to the category of sets. In this picture, an affine algebraic variety $X$ over the base field $k$ and with coordinate ring $A$ is equivalently described by the functor $\mathrm{Hom}_{\mathrm{alg}}(A, -)$ defined on the category of commutative $k$-algebras. As will be shown by examples in the next section, versal deformations are often encoded by functors representing spaces. More precisely, a deformation problem leads to a so-called functor of Artin rings, which means a covariant functor $F$ from the category of (local) Artinian $k$-algebras to the category of sets such that the set $F(k)$ has exactly one element. The question now arises as to under which conditions the functor $F$ is representable, that is, whether there exists a commutative $k$-algebra $A$ such that $F \cong \mathrm{Hom}_{\mathrm{alg}}(A, -)$. In the work of Schlessinger (1968), the structure of functors of Artin rings has been studied in detail. Moreover, criteria have been established for when such a functor is pro-representable, which means that it can be represented by a complete local algebra $\hat{A}$, where "completeness" is understood with respect to the $\mathfrak{m}$-adic topology. Because of its importance for deformation theory, we will state Schlessinger's theorem in this section. Before we come to its details, let us recall some notation.
Definition 7 By an Artinian $k$-algebra over a field $k$ one understands a commutative $k$-algebra $R$ which satisfies the following descending chain condition:
(Dec) Every descending chain $I_1 \supset \cdots \supset I_k \supset I_{k+1} \supset \cdots$ of ideals in $R$ becomes stationary.
Among others, an Artinian algebra $R$ has the following properties:
1. $R$ is Noetherian, that is, it satisfies the ascending chain condition.
2. Every prime ideal in $R$ is maximal.
3. (Chinese remainder theorem) $R$ is isomorphic to a finite product $\prod_{i=1}^{n} R_i$, where each $R_i$ is a local Artinian algebra.
4. Every maximal ideal $\mathfrak{m}$ of $R$ is nilpotent, that is, $\mathfrak{m}^k = 0$ for some $k \in \mathbb{N}$.
5. Every quotient $R/\mathfrak{m}^k$ with $\mathfrak{m}$ maximal is finite dimensional.
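A standard local example may help to fix ideas. The truncated polynomial algebra below is chosen here purely for illustration; the integer $N$ is a symbol introduced for this sketch.

```latex
% The truncated polynomial algebra, for a fixed integer N \ge 1:
R = k[t]/(t^N), \qquad \dim_k R = N .
% Its ideals are exactly the powers of (t), so every chain of ideals is finite:
R = (t^0) \supset (t) \supset (t^2) \supset \cdots \supset (t^{N-1}) \supset (t^N) = 0 .
% R is local with maximal ideal \mathfrak{m} = (t), and \mathfrak{m} is nilpotent:
\mathfrak{m}^N = (t^N) = 0 .
% For N = 2 one recovers the algebra S = k[\varepsilon]/\varepsilon^2 k[\varepsilon]
% used for first-order deformations.
```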
Definition 8 Assume that $f : B \to A$ is a surjective homomorphism in the category $k\text{-}\mathrm{Alg}_{l,\mathrm{Art}}$ of local Artinian $k$-algebras. Then $f$ is called a small extension if $\ker f$ is a nonzero principal ideal $(b)$ in $B$ such that $\mathfrak{m} b = (0)$, where $\mathfrak{m}$ is the maximal ideal of $B$.
Theorem 9 (Schlessinger (1968, theorem 2.11)). Let $F$ be a functor of Artin rings (over the base field $k$). Assume that $A' \to A$ and $A'' \to A$ are morphisms in $k\text{-}\mathrm{Alg}_{l,\mathrm{Art}}$, and consider the map
$$F(A' \times_A A'') \to F(A') \times_{F(A)} F(A'')$$ [6]
Then $F$ is pro-representable if and only if $F$ has the following properties:
(H1) The map [6] is a surjection whenever $A'' \to A$ is a small extension.
(H2) The map [6] is a bijection when $A = k$ and $A'' = k[\varepsilon]$.
(H3) One has $\dim_k(t_F) < \infty$ for the tangent space $t_F := F(k[\varepsilon])$.
(H4) For every small extension $A' \to A$, the map $F(A' \times_A A') \to F(A') \times_{F(A)} F(A')$ is an isomorphism.
Suppose that the functor $F$ satisfies conditions (H1)–(H4), and let $\hat{A}$ be an arbitrary complete local $k$-algebra with maximal ideal $\mathfrak{m}$, so that $\hat{A} = \operatorname{proj\,lim}_{n \in \mathbb{N}} \hat{A}/\mathfrak{m}^n$. By Yoneda's lemma, every element
$$\xi = \operatorname{proj\,lim}_{n \in \mathbb{N}} \xi_n \in \operatorname{proj\,lim}_{n \in \mathbb{N}} F(\hat{A}/\mathfrak{m}^n)$$
induces a natural transformation
$$\mathrm{Hom}_{\mathrm{alg}}(\hat{A}, -) \to F, \qquad (u : \hat{A} \to R) \mapsto F(u_n)(\xi_n)$$ [7]
where $n \in \mathbb{N}$ is chosen large enough such that the homomorphism $u : \hat{A} \to R$ factors through some $u_n : \hat{A}/\mathfrak{m}^n \to R$. This is indeed possible, since $R$ is Artinian. In the course of the proof of Schlessinger's theorem, $\hat{A}$ and the element $\xi$ are now constructed in such a way that [7] is an isomorphism.
Differential Graded Lie Algebras and Deformation Problems
According to a philosophy going back to Deligne, "every deformation problem in characteristic zero is controlled by a differential graded Lie algebra, with quasi-isomorphic differential graded Lie algebras giving the same deformation theory" (cf. Goldman and Millson (1988), p. 48). In the following, we will explain the main idea of this concept and apply it to two particular examples.
Differential Graded Lie Algebras
Definition 10 By a graded algebra over a field $k$ one understands a graded $k$-vector space $A^\bullet = \bigoplus_{k \in \mathbb{Z}} A^k$ together with a bilinear map
$$\mu : A^\bullet \times A^\bullet \to A^\bullet, \qquad (a, b) \mapsto a \cdot b = \mu(a, b)$$
such that $A^k \cdot A^l \subset A^{k+l}$ for all $k, l \in \mathbb{Z}$. The graded algebra $A^\bullet$ is called associative if $(ab)c = a(bc)$ for all $a, b, c \in A^\bullet$. A graded subalgebra of $A^\bullet$ is a graded subspace $B^\bullet = \bigoplus_{k \in \mathbb{Z}} B^k \subset A^\bullet$ which is closed under $\mu$; a graded ideal is a graded subalgebra $I^\bullet \subset A^\bullet$ such that $I^\bullet \cdot A^\bullet \subset I^\bullet$ and $A^\bullet \cdot I^\bullet \subset I^\bullet$. A homomorphism between graded algebras $A^\bullet$ and $B^\bullet$ is a homogeneous map $f : A^\bullet \to B^\bullet$ of degree 0 such that $f(a \cdot b) = f(a) \cdot f(b)$ for all $a, b \in A^\bullet$.
From now on, assume that $k$ has characteristic $\neq 2, 3$. A graded Lie algebra then is a graded $k$-vector space $\mathfrak{g}^\bullet = \bigoplus_{k \in \mathbb{Z}} \mathfrak{g}^k$ together with a bilinear map
$$[\ ,\ ] : \mathfrak{g}^\bullet \times \mathfrak{g}^\bullet \to \mathfrak{g}^\bullet, \qquad (a, b) \mapsto [a, b]$$
such that the following axioms hold true:
1. $[\mathfrak{g}^k, \mathfrak{g}^l] \subset \mathfrak{g}^{k+l}$ for all $k, l \in \mathbb{Z}$.
2. $[\xi, \zeta] = -(-1)^{kl} [\zeta, \xi]$ for all $\xi \in \mathfrak{g}^k$, $\zeta \in \mathfrak{g}^l$.
3. $(-1)^{k_1 k_3} [[\xi_1, \xi_2], \xi_3] + (-1)^{k_2 k_1} [[\xi_2, \xi_3], \xi_1] + (-1)^{k_3 k_2} [[\xi_3, \xi_1], \xi_2] = 0$ for all $\xi_i \in \mathfrak{g}^{k_i}$ with $i = 1, 2, 3$.
By axiom (1), it is clear that a graded Lie algebra is in particular a graded algebra. So the above-defined notions of a graded ideal, homomorphism, etc., apply as well to graded Lie algebras.
Example 11 Let $A^\bullet = \bigoplus_{k \in \mathbb{Z}} A^k$ be a graded associative algebra. Then $A^\bullet$ becomes a graded Lie algebra with the bracket
$$[a, b] = ab - (-1)^{kl}\, ba \qquad \text{for } a \in A^k \text{ and } b \in A^l$$
The space $A^\bullet$ regarded as a graded Lie algebra is often denoted by $\mathrm{lie}(A^\bullet)$.
Definition 12 A linear map $D : A^\bullet \to A^\bullet$ defined on a graded algebra $A^\bullet$ is called a derivation of degree $l$ if
$$D(ab) = (Da)b + (-1)^{kl}\, a(Db) \qquad \text{for all } a \in A^k \text{ and } b \in A^\bullet$$
A graded (Lie) algebra $A^\bullet$ together with a derivation $d$ of degree 1 is called a differential graded (Lie) algebra if $d \circ d = 0$. Then $(A^\bullet, d)$ becomes a cochain complex. Since $\ker d$ is a graded subalgebra of $A^\bullet$ and $\operatorname{im} d$ a graded ideal in $\ker d$, the cohomology space
$$H^\bullet(A^\bullet, d) = \ker d / \operatorname{im} d$$
inherits the structure of a graded (Lie) algebra from $A^\bullet$.
Let $f : A^\bullet \to B^\bullet$ be a homomorphism of differential graded (Lie) algebras $(A^\bullet, d)$ and $(B^\bullet, \partial)$. Assume further that $f$ is a cochain map, that is, that $f \circ d = \partial \circ f$. Then one says that $f$ is a quasi-isomorphism, or that the differential graded (Lie) algebras $A^\bullet$ and $B^\bullet$ are quasi-isomorphic, if the homomorphism $H^\bullet(A^\bullet, d) \to H^\bullet(B^\bullet, \partial)$ induced by $f$ on the cohomology level is an isomorphism. Finally, a differential graded (Lie) algebra $(A^\bullet, d)$ is called formal if it is quasi-isomorphic to its cohomology $(H^\bullet(A^\bullet, d), 0)$.
Maurer–Cartan Equation
Assume that $(\mathfrak{g}^\bullet, [\ ,\ ], d)$ is a differential graded Lie algebra over $\mathbb{C}$. Define the space $\mathrm{MC}(\mathfrak{g}^\bullet)$ of solutions of the Maurer–Cartan equation by
$$\mathrm{MC}(\mathfrak{g}^\bullet) := \{\omega \in \mathfrak{g}^1 \mid d\omega - \tfrac{1}{2}[\omega, \omega] = 0\}$$ [8]
In case the differential graded Lie algebra $\mathfrak{g}^\bullet$ is nilpotent, this space naturally possesses a groupoid structure or, in other words, a set of arrows which are all invertible. The reason for this is that, under the assumption of nilpotency, the space $\mathfrak{g}^0$ is equipped with the Campbell–Hausdorff multiplication
$$\mathfrak{g}^0 \times \mathfrak{g}^0 \to \mathfrak{g}^0, \qquad (X, Y) \mapsto \log(\exp X \cdot \exp Y)$$
and the group $\mathfrak{g}^0$ acts on $\mathfrak{g}^1$ by the exponential function. More precisely, in this situation one can define for two objects $\alpha, \beta \in \mathrm{MC}(\mathfrak{g}^\bullet)$ the space of arrows $\alpha \to \beta$ as the set of all $\lambda \in \mathfrak{g}^0$ such that $\exp \lambda \cdot \alpha = \beta$.
We have now the means to define for every complex differential graded Lie algebra $\mathfrak{g}^\bullet$ its deformation functor $\mathrm{Def}_{\mathfrak{g}}$. This functor maps the category of local Artinian $\mathbb{C}$-algebras to the category of groupoids and is defined on objects as follows:
$$\mathrm{Def}_{\mathfrak{g}}(R) := \mathrm{MC}(\mathfrak{g}^\bullet \otimes \mathfrak{m})$$ [9]
Here, $R$ is a complex local Artinian algebra, and $\mathfrak{m}$ its maximal ideal. Note that since $R$ is Artinian, $\mathfrak{g}^\bullet \otimes \mathfrak{m}$ is a nilpotent differential graded Lie algebra, hence $\mathrm{Def}_{\mathfrak{g}}(R)$ carries a groupoid structure as constructed above. Clearly, $\mathrm{Def}_{\mathfrak{g}}$ is also a functor of Artin rings as defined in the previous section.
With appropriate choices of the differential graded Lie algebra $\mathfrak{g}^\bullet$, essentially all deformation problems from the section "Basic definitions and examples" can be recovered via a functor of the form $\mathrm{Def}_{\mathfrak{g}}$. Below, we will show in some detail how this works for two examples, namely the deformation theory of complex manifolds and the deformation quantization of Poisson manifolds. But before we come to this, let us state a result which shows how the deformation functor behaves under quasi-isomorphisms of the underlying differential graded Lie algebra. This result is crucial in the sense that it allows one to describe a deformation problem with controlling differential graded Lie algebra $\mathfrak{g}^\bullet$ equivalently by any other differential graded Lie algebra within the quasi-isomorphism class of $\mathfrak{g}^\bullet$. So, in particular in the case where the differential graded Lie algebra is formal, one often obtains a direct solution of the deformation problem.
Theorem 13 (Deligne, Goldman–Millson). Assume that $f : \mathfrak{g}^\bullet \to \mathfrak{h}^\bullet$ is a quasi-isomorphism of differential graded Lie algebras. For every local Artinian $\mathbb{C}$-algebra $R$ the induced functor $f_* : \mathrm{Def}_{\mathfrak{g}}(R) \to \mathrm{Def}_{\mathfrak{h}}(R)$ then is an equivalence of groupoids.
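Example 11 and the Maurer–Cartan equation [8] can be illustrated on a small matrix algebra. The sketch below rests on assumptions that are not part of the text: it grades the $n \times n$ integer matrices by letting the matrix unit $E_{ij}$ have degree $j - i$, which is additive under multiplication, it takes the differential to be $d = 0$ so that [8] reduces to $[\omega, \omega] = 0$, and all function names are hypothetical.

```python
def sign(e):
    """Return (-1)**e as an int for any integer exponent e."""
    return -1 if e % 2 else 1


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def lin(a, b, s):
    """Entrywise a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def diagonal(n, k, entries):
    """Homogeneous n x n matrix of degree k: entries at positions (i, i + k)."""
    m = [[0] * n for _ in range(n)]
    rows = [i for i in range(n) if 0 <= i + k < n]
    for i, v in zip(rows, entries):
        m[i][i + k] = v
    return m


def bracket(a, k, b, l):
    """Graded commutator [a, b] = ab - (-1)^(kl) ba for deg a = k, deg b = l."""
    return lin(matmul(a, b), matmul(b, a), -sign(k * l))


def jacobiator(x1, k1, x2, k2, x3, k3):
    """Left-hand side of axiom 3 for homogeneous x1, x2, x3."""
    n = len(x1)
    out = [[0] * n for _ in range(n)]
    out = lin(out, bracket(bracket(x1, k1, x2, k2), k1 + k2, x3, k3), sign(k1 * k3))
    out = lin(out, bracket(bracket(x2, k2, x3, k3), k2 + k3, x1, k1), sign(k2 * k1))
    out = lin(out, bracket(bracket(x3, k3, x1, k1), k3 + k1, x2, k2), sign(k3 * k2))
    return out


def is_zero(a):
    return all(v == 0 for row in a for v in row)


x1 = diagonal(3, 1, [2, 5])       # degree 1
x2 = diagonal(3, -1, [3, 7])      # degree -1
x3 = diagonal(3, 1, [1, 4])       # degree 1
print(is_zero(jacobiator(x1, 1, x2, -1, x3, 1)))

# With d = 0, a degree-1 element solves [8] exactly when [w, w] = 2 w^2 = 0.
omega_a = diagonal(3, 1, [1, 0])  # the matrix unit E_12
omega_b = diagonal(3, 1, [1, 1])  # E_12 + E_23, whose square is E_13
print(is_zero(bracket(omega_a, 1, omega_a, 1)),
      is_zero(bracket(omega_b, 1, omega_b, 1)))
```

The choice $d = 0$ keeps the example short; in the applications discussed in the text the differential is nonzero and the quadratic term has to balance $d\omega$.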
The Kodaira–Spencer Algebra Controlling Deformations of Compact Complex Manifolds
Let $M$ be a compact complex $n$-dimensional manifold. Recall that then the complexified tangent bundle $T_{\mathbb{C}}M$ has a decomposition into a holomorphic tangent bundle $T^{1,0}M$ and an antiholomorphic tangent bundle $T^{0,1}M$. This leads to a decomposition of the space of complex $n$-forms into the spaces $\Omega^{p,q}M$ of forms on $M$ of type $(p, q)$. More generally, a smooth subbundle $J^{0,1} \subset T_{\mathbb{C}}M$ which induces a decomposition of the form $T_{\mathbb{C}}M = J^{1,0} \oplus J^{0,1}$, where $J^{1,0} := \overline{J^{0,1}}$, is called an almost complex structure on $M$. Clearly, the decomposition of $T_{\mathbb{C}}M$ into the holomorphic and antiholomorphic part is an almost complex structure, and an almost complex structure which is induced by a complex structure is called integrable.
Assume that an almost complex structure $J^{0,1}$ is given on $M$ and that it has finite distance to the complex structure on $M$. The latter means that the restriction $\varrho^{0,1}_J$ of the projection $\varrho : T_{\mathbb{C}}M \to T^{0,1}M$ along $T^{1,0}M$ to the subbundle $J^{0,1}$ is an isomorphism. Denote by $\iota$ the inverse of $\varrho^{0,1}_J$, and let $\omega \in \Omega^{0,1}(M, T^{1,0}M)$ be the composition of $\iota$ with the projection onto $T^{1,0}M$ along $T^{0,1}M$. One checks immediately that every almost complex structure with finite distance to the complex structure on $M$ is uniquely characterized by a section $\omega \in \Omega^{0,1}(M, T^{1,0}M)$ and that every element of $\Omega^{0,1}(M, T^{1,0}M)$ comes from an almost complex structure on $M$. As a consequence of the Newlander–Nirenberg theorem, one can now show that the almost complex structure $J^{0,1}$ resp. $\omega$ is integrable if and only if the equation
$$\bar\partial\omega - \tfrac{1}{2}[\omega, \omega] = 0$$ [10]
is fulfilled. But this is nothing else than the Maurer–Cartan equation in the Kodaira–Spencer differential graded Lie algebra
$$(L^\bullet, \bar\partial, [\ ,\ ]) = \Big(\bigoplus_{p \in \mathbb{N}} \Omega^{0,p}(M, T^{1,0}M),\ \bar\partial,\ [\ ,\ ]\Big)$$
Here, $\Omega^{0,p}(M, T^{1,0}M)$ denotes the $T^{1,0}M$-valued differential forms on $M$ of type $(0, p)$, $\bar\partial : \Omega^{0,p}(M, T^{1,0}M) \to \Omega^{0,p+1}(M, T^{1,0}M)$ the Dolbeault operator, and $[\ ,\ ]$ is induced by the Lie bracket of holomorphic vector fields. As a consequence of these considerations, deformations of the complex manifold $M$ can equivalently be described by families $(\omega_p)_{p \in P} \subset L^1$ which satisfy eqn [10] and $\omega_* = 0$. Thus, it remains to determine the associated deformation functor $\mathrm{Def}_L$. According to Schlessinger's theorem, the functor $\mathrm{Def}_L$ is pro-representable. Hence, there exists a local $\mathbb{C}$-algebra $R_L$, complete with respect to the $\mathfrak{m}$-adic topology, such that
$$\mathrm{Def}_L(R) = \mathrm{Hom}_{\mathrm{alg}}(R_L, R)$$ [11]
for every local Artinian $\mathbb{C}$-algebra $R$. Moreover, by Artin's theorem, there exists a "convergent" solution of the Maurer–Cartan equation, that is, $R_L$ can be replaced in eqn [11] by a ring representing an analytic germ, again denoted by $R_L$.
Theorem 14 (Kodaira–Spencer, Kuranishi). The ringed space $(R_L, (0))$ is a miniversal deformation of the complex structure on $M$.
Deformation Quantization of Poisson Manifolds
Let $A$ be an associative $k$-algebra with $\operatorname{char} k = 0$. Put for every integer $k \geq -1$
$$\mathfrak{g}^k := \mathrm{Hom}_k(A^{\otimes (k+1)}, A)$$
Then $\mathfrak{g}^\bullet$ becomes a graded vector space. Let us impose a differential and a bracket on $\mathfrak{g}^\bullet$. The differential is the usual Hochschild coboundary $b : \mathfrak{g}^k \to \mathfrak{g}^{k+1}$,
$$bf(a_0 \otimes \cdots \otimes a_{k+1}) := a_0 f(a_1 \otimes \cdots \otimes a_{k+1}) + \sum_{i=0}^{k} (-1)^{i+1} f(a_0 \otimes \cdots \otimes a_i a_{i+1} \otimes \cdots \otimes a_{k+1}) + (-1)^{k} f(a_0 \otimes \cdots \otimes a_k)\, a_{k+1}$$
The bracket is the Gerstenhaber bracket
$$[\ ,\ ] : \mathfrak{g}^{k_1} \times \mathfrak{g}^{k_2} \to \mathfrak{g}^{k_1 + k_2}, \qquad [f_1, f_2] := f_1 \circ f_2 - (-1)^{k_1 k_2} f_2 \circ f_1$$
where
$$f_1 \circ f_2(a_0 \otimes \cdots \otimes a_{k_1 + k_2}) := \sum_{i=0}^{k_1} (-1)^{i k_2} f_1\big(a_0 \otimes \cdots \otimes a_{i-1} \otimes f_2(a_i \otimes \cdots \otimes a_{i+k_2}) \otimes a_{i+k_2+1} \otimes \cdots \otimes a_{k_1+k_2}\big)$$
The triple $(\mathfrak{g}^\bullet, b, [\ ,\ ])$ then is a differential graded Lie algebra. Consider the Maurer–Cartan equation $b\gamma - (1/2)[\gamma, \gamma] = 0$ in $\mathfrak{g}^1$. Obviously, it is equivalent to the equality
$$a_0\,\gamma(a_1, a_2) - \gamma(a_0 a_1, a_2) + \gamma(a_0, a_1 a_2) - \gamma(a_0, a_1)\,a_2 = \gamma(\gamma(a_0, a_1), a_2) - \gamma(a_0, \gamma(a_1, a_2)) \qquad \text{for } a_0, a_1, a_2 \in A$$ [12]
If one defines now for some $\gamma \in \mathfrak{g}^1$ the bilinear map $m : A \times A \to A$ by $m(a, b) = ab + \gamma(a, b)$, then [12] implies that $m$ is associative if and only if $\gamma$ satisfies the Maurer–Cartan equation.
Let us apply these observations to the case where $A$ is the algebra $\mathcal{C}^\infty(M)[[t]]$ of formal power series in one variable with coefficients in the space of smooth functions on a Poisson manifold $M$. By (a variant of) the theorem of Hochschild–Kostant–Rosenberg and Connes, one knows that in this case the cohomology of $(\mathfrak{g}^\bullet, b)$ is given by formal power series with coefficients in the space $\Gamma^\infty(\Lambda^\bullet TM)$ of antisymmetric multivector fields. Now, $\Gamma^\infty(\Lambda^\bullet TM)$ carries a natural Lie algebra bracket as well, namely the Schouten bracket. Thus, one obtains a second differential graded Lie algebra $(\Gamma^\infty(\Lambda^\bullet TM)[[t]], 0, [\ ,\ ])$. Unfortunately, the projection onto cohomology $(\mathfrak{g}^\bullet, b) \to \Gamma^\infty(\Lambda^\bullet TM)[[t]]$ does not preserve the natural brackets, hence is not a quasi-isomorphism in the category of differential graded Lie algebras. It has been the fundamental observation by Kontsevich that this defect can be cured as follows.
Theorem 15 (Kontsevich 2003). For every Poisson manifold $M$ the differential graded Lie algebra $(\mathfrak{g}^\bullet, b, [\ ,\ ])$ is formal in the sense that there exists a quasi-isomorphism $(\mathfrak{g}^\bullet, b, [\ ,\ ]) \to (\Gamma^\infty(\Lambda^\bullet TM)[[t]], 0, [\ ,\ ])$ in the category of $L_\infty$-algebras.
Note that the theorem only claims the existence of a quasi-isomorphism in the category of $L_\infty$-algebras or, in other words, in the category of homotopy Lie algebras. This is a notion somewhat weaker than a differential graded Lie algebra, but Theorem 13 also holds in the context of $L_\infty$-algebras.
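The link between eqn [12] and associativity of $m(a, b) = ab + \gamma(a, b)$ can be tested on a toy example. The sketch below is an illustration chosen here, not a construction from the text: it takes $A$ to be polynomials in one variable with integer coefficients (stored as coefficient lists), transports the usual product along the invertible linear map $\varphi = \mathrm{id} + d/dx$ to obtain an associative product $m$, and sets $\gamma = m - (\text{usual product})$. All function names are hypothetical.

```python
def trim(p):
    """Remove trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q, s=1):
    """Return p + s*q for polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + s * (q[i] if i < len(q) else 0)
                 for i in range(n)])


def mul(p, q):
    """Usual polynomial product."""
    out = [0] * (len(p) + len(q) - 1) if p and q else []
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def deriv(p):
    return trim([i * c for i, c in enumerate(p)][1:])


def phi(p):
    """phi = id + d/dx."""
    return add(p, deriv(p))


def phi_inv(p):
    """Inverse of phi on polynomials: id - d/dx + d^2/dx^2 - ... (finite sum)."""
    out, term, s = [], trim(p), 1
    while term:
        out = add(out, term, s)
        term, s = deriv(term), -s
    return out


def m(a, b):
    """Transported product, associative by construction."""
    return phi_inv(mul(phi(a), phi(b)))


def gamma(a, b):
    return add(m(a, b), mul(a, b), -1)


def lhs12(a0, a1, a2):
    """Left-hand side of [12], i.e. the Hochschild coboundary of gamma."""
    out = mul(a0, gamma(a1, a2))
    out = add(out, gamma(mul(a0, a1), a2), -1)
    out = add(out, gamma(a0, mul(a1, a2)))
    return add(out, mul(gamma(a0, a1), a2), -1)


def rhs12(a0, a1, a2):
    """Right-hand side of [12]."""
    return add(gamma(gamma(a0, a1), a2), gamma(a0, gamma(a1, a2)), -1)


x, x2 = [0, 1], [0, 0, 1]   # the polynomials x and x^2
print(lhs12(x, x, x2) == rhs12(x, x, x2))
```

Since $m$ is obtained from the original product by a linear change of basis, this $\gamma$ describes a deformation that is equivalent to the undeformed algebra; the example only checks the algebraic identity, not any nontrivial quantization.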
Since the solutions of the Maurer–Cartan equation in $(\Gamma^\infty(\Lambda^\bullet TM)[[t]], 0, [\ ,\ ])$ are exactly the formal paths of Poisson bivector fields on $M$, Kontsevich's formality theorem entails:
Corollary 16 Every Poisson manifold has a formal deformation quantization.
See also: Deformation Quantization; Deformation Quantization and Representation Theory; Deformations of the Poisson Bracket on a Symplectic Manifold; Fedosov Quantization; Holonomic Quantum Fields; Operads.
Further Reading
Artin M (1976) Lectures on Deformations of Singularities. Lectures on Mathematics and Physics, vol. 54. Bombay: Tata Institute of Fundamental Research.
Bayen F, Flato M, Fronsdal C, Lichnerowicz A, and Sternheimer D (1978) Deformation theory and quantization, I and II. Annals of Physics 111: 61–151.
Bingener J (1987) Lokale Modulräume in der analytischen Geometrie, Band 1 und 2. Aspekte der Mathematik. Braunschweig: Vieweg.
Drinfel'd VG (1986) Quantum Groups, Proc. of the ICM, 1986, pp. 798–820.
Faddeev LD, Reshetikhin NYu, and Takhtajan LA (1990) Quantization of Lie groups and Lie algebras. Leningrad Mathematical Journal 1(1): 193–225.
Fedosov B (1996) Deformation Quantization and Index Theory. Berlin: Akademie-Verlag.
Gerstenhaber M and Schack S (1986) Algebraic cohomology and deformation theory. Deformation theory of algebras and structures and applications, Nato Advanced Study Institute, Castelvecchio-Pascoli/Italy. NATO ASI series, C247: 11–264.
Goldman WM and Millson JJ (1988) The Deformation Theory of Representations of Fundamental Groups of Compact Kähler Manifolds, Publ. Math. Inst. Hautes Étud. Sci., vol. 67, pp. 43–96.
Grauert H and Remmert R (1984) Coherent Analytic Sheaves. Grundlehren der Mathematischen Wissenschaften, vol. 265. Berlin: Springer.
Greuel G-M (1992) Deformation und Klassifikation von Singularitäten und Moduln, Jahresbericht der DMV, Jubiläumstag., 100 Jahre DMV, Bremen/Dtschl. 1990, 177–238.
Hartshorne R (1977) Algebraic Geometry. Graduate Texts in Mathematics, vol. 52. Berlin: Springer.
Kassel Ch (1995) Quantum Groups, Graduate Texts in Mathematics, vol. 155. New York: Springer.
Kontsevich M (2003) Deformation quantization of Poisson manifolds. Letters in Mathematical Physics 66(3): 157–216.
Kuranishi M (1971) Deformations of Compact Complex Manifolds, Séminaire de Mathématiques Supérieures. Été 1969. Montreal: Les Presses de l'Université de Montreal.
Manin YuI (1988) Quantum Groups and Non-Commutative Geometry. Montreal: Les Publications CRM.
Palamodov VP (1990) Deformation of complex spaces. In: Gindikin SG and Khenkin GM (eds.) Several Complex Variables IV, Encyclopaedia of Mathematical Sciences, vol. 10. Berlin: Springer.
Schlessinger M (1968) Functors of Artin rings. Transactions of the American Mathematical Society 130: 208–222.
Wess J and Zumino B (1990) Covariant differential calculus on the quantum hyperplane. Nuclear Physics B, Proceedings Supplement 18B: 302–312.
Deformations of the Poisson Bracket on a Symplectic Manifold
S Gutt, Université Libre de Bruxelles, Brussels, Belgium
S Waldmann, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction to Deformation Quantization

The framework of classical mechanics, in its Hamiltonian formulation on the motion space, employs a symplectic manifold (or more generally a Poisson manifold). Observables are families of smooth functions on that manifold $M$. The dynamics is defined in terms of a Hamiltonian $H \in C^\infty(M)$, and the time evolution of an observable $f_t \in C^\infty(M \times \mathbb{R})$ is governed by the equation $(d/dt) f_t = -\{H, f_t\}$.

The quantum-mechanical framework, in its usual Heisenberg formulation, employs a Hilbert space (states are rays in that space). Observables are families of self-adjoint operators on the Hilbert space. The dynamics is defined in terms of a Hamiltonian $H$, which is a self-adjoint operator, and the time evolution of an observable $A_t$ is governed by the equation $dA_t/dt = (i/\hbar)[H, A_t]$.

Quantization of a classical system is a way to pass from classical to quantum results. A first idea for quantization is to define a correspondence $Q : f \mapsto Q(f)$ mapping a function $f$ to a self-adjoint operator $Q(f)$ on a Hilbert space $\mathcal{H}$ in such a way that $Q(1) = \mathrm{Id}$ and $[Q(f), Q(g)] = i\hbar\, Q(\{f, g\})$. Unfortunately, there is no such correspondence defined on all smooth functions on $M$ once one imposes an irreducibility requirement (which is necessary in order not to violate Heisenberg's principle). Different mathematical treatments of quantization have appeared:
Geometric quantization of Kostant and Souriau: first, prequantization of a symplectic manifold $(M, \omega)$, where one builds a Hilbert space and a correspondence $Q$ defined on all smooth functions on $M$ but with no irreducibility; second, polarization to "cut down the number of variables."

Berezin's quantization, where one builds on a particular class of symplectic manifolds (some Kähler manifolds) a family of associative algebras using a symbolic calculus, that is, a dequantization procedure.

Deformation quantization, introduced by Flato, Lichnerowicz, and Sternheimer in 1976, where they "suggest that quantization be understood as a deformation of the structure of the algebra of classical observables rather than a radical change in the nature of the observables."

This deformation approach to quantization is part of a general deformation approach to physics (a seminal idea stressed by Flato): one looks at some level of a theory in physics as a deformation of another level. Deformation quantization is defined in terms of a star product, which is a formal deformation of the algebraic structure of the space of smooth functions on a Poisson manifold. The associative structure given by the usual product of functions and the Lie structure given by the Poisson bracket are simultaneously deformed.

In this article we concentrate on some mathematical results concerning deformations of the Poisson bracket on a symplectic manifold, classification of star products on symplectic manifolds, group actions on star products, convergence properties of some star products, and star products on cotangent bundles.
Deformations of the Poisson Bracket on a Symplectic Manifold

Definition 1 A Poisson bracket defined on the space of smooth functions on a manifold $M$ is an $\mathbb{R}$-bilinear map on $C^\infty(M)$, $(u, v) \mapsto \{u, v\}$, such that for any $u, v, w \in C^\infty(M)$:
(i) $\{u, v\} = -\{v, u\}$;
(ii) $\{\{u, v\}, w\} + \{\{v, w\}, u\} + \{\{w, u\}, v\} = 0$;
(iii) $\{u, vw\} = \{u, v\}w + \{u, w\}v$.

A Poisson bracket is given in terms of a contravariant skew-symmetric 2-tensor $P$ on $M$ (called the Poisson tensor) by $\{u, v\} = P(du \wedge dv)$. The Jacobi identity for the Poisson bracket is equivalent to the vanishing of the Schouten bracket, $[P, P] = 0$. (The Schouten bracket is the extension, as a graded derivation for the exterior product, of the bracket of vector fields to skew-symmetric contravariant tensor fields.) A Poisson manifold $(M, P)$ is a manifold $M$ with a Poisson bracket defined by $P$.

A particular class of Poisson manifolds, essential in classical mechanics, is the class of "symplectic manifolds." If $(M, \omega)$ is a symplectic manifold (i.e., $\omega$ is a closed nondegenerate 2-form on $M$) and if $u, v \in C^\infty(M)$, the Poisson bracket of $u$ and $v$ is
\[ \{u, v\} := X_u(v) = \omega(X_v, X_u), \]
where $X_u$ denotes the Hamiltonian vector field corresponding to the function $u$, that is, such that $i(X_u)\omega = du$. In coordinates the components $P^{ij}$ of the Poisson tensor form the inverse matrix of the components $\omega_{ij}$ of $\omega$.

Duals of Lie algebras form the class of linear Poisson manifolds. If $\mathfrak{g}$ is a Lie algebra, then its dual $\mathfrak{g}^*$ is endowed with the Poisson tensor $P$ defined by $P_\xi(X, Y) := \xi([X, Y])$ for $X, Y \in \mathfrak{g} = (\mathfrak{g}^*)^* \simeq (T_\xi \mathfrak{g}^*)^*$.

Definition 2 A Poisson deformation of the Poisson bracket on a Poisson manifold $(M, P)$ is a Lie algebra deformation of $(C^\infty(M), \{\ ,\ \})$ which is a derivation in each argument, that is, of the form $\{u, v\}_\nu = P_\nu(du, dv)$, where $P_\nu = P + \sum_k \nu^k P_k$ is a series of skew-symmetric contravariant 2-tensors on $M$ (such that $[P_\nu, P_\nu] = 0$).

Two Poisson deformations $P_\nu$ and $P'_\nu$ of the Poisson bracket $P$ on a Poisson manifold $(M, P)$ are equivalent if there exists a formal path in the diffeomorphism group of $M$, starting at the identity, that is, a series $T = \exp D = \mathrm{Id} + \sum_j (1/j!) D^j$ for $D = \sum_{r \ge 1} \nu^r D_r$, where the $D_r$ are vector fields on $M$, such that
\[ T\{u, v\}_\nu = \{Tu, Tv\}'_\nu \]
where $\{u, v\}_\nu = P_\nu(du, dv)$ and $\{u, v\}'_\nu = P'_\nu(du, dv)$.

Proposition 3 (Flato et al. 1975, Lecomte 1987). On a symplectic manifold $(M, \omega)$, any Poisson deformation of the Poisson bracket corresponds to a series of closed 2-forms on $M$, $\Omega = \omega + \sum_{r>0} \nu^r \omega_r$, and is given by
\[ \{u, v\}_\nu = P^\Omega(du, dv) = -\Omega(X^\Omega_u, X^\Omega_v) \quad \text{with } i(X^\Omega_u)\Omega = du. \]
The equivalence classes of Poisson deformations of the Poisson bracket $P$ are parametrized by $H^2(M; \mathbb{R})[[\nu]]$.

Poisson deformations are used in classical mechanics to express some constraints on the system. To deal with quantum mechanics, Flato et al. (1976) introduced star products. These give, by skew-symmetrization, Lie deformations of the Poisson bracket.

Definition 4 A "star product" on $(M, P)$ is an $\mathbb{R}[[\nu]]$-bilinear associative product $*$ on $C^\infty(M)[[\nu]]$ given by
\[ u * v = u *_\nu v := \sum_{r \ge 0} \nu^r C_r(u, v) \]
for $u, v \in C^\infty(M)$ (we consider here real-valued functions; the results for complex-valued functions are similar), such that $C_0(u, v) = uv$, $C_1(u, v) - C_1(v, u) = \{u, v\}$, and $1 * u = u * 1 = u$.

When the $C_r$ are bidifferential operators on $M$, one speaks of a differential star product. When each $C_r$ is a bidifferential operator of order at most $r$ in each argument, one speaks of a natural star product.

One finds in the literature other normalizations for the skew-symmetric part of $C_1$, such as $(i/2)\{\ ,\ \}$; these amount to a rescaling of the parameter $\nu$. For physical applications, in the above convention for the formal parameter, $\nu$ corresponds to $i\hbar$, where $\hbar$ is Planck's constant. In the case of complex-valued functions, one can add the further requirement that complex conjugation is a $*$-involution for $*$, that is, $\overline{f * g} = \bar{g} * \bar{f}$. According to the interpretation of $\nu$ as being $i\hbar$, we have to require $\bar{\nu} = -\nu$. Star products satisfying this additional property are called symmetric or Hermitian.

A star product can also be defined not on the whole of $C^\infty(M)$ but on a subspace $N$ which is stable under pointwise multiplication and Poisson bracket.

The simplest example of a deformation quantization is the Moyal product for the Poisson structure $P$ on a vector space $V = \mathbb{R}^m$ with constant coefficients:
\[ P = \sum_{i,j} P^{ij}\, \partial_i \wedge \partial_j, \qquad P^{ij} = -P^{ji} \in \mathbb{R}, \]
where $\partial_i = \partial/\partial y^i$ is the partial derivative in the direction of the coordinate $y^i$, $i = 1, \ldots, m$. The formula for the Moyal product is
\[ (u *_M v)(z) = \exp\Bigl(\frac{\nu}{2} P^{rs}\, \partial_{y^r} \partial_{y'^s}\Bigr) \bigl(u(y) v(y')\bigr)\Big|_{y = y' = z} \tag{1} \]
When $P$ is nondegenerate (so $V = \mathbb{R}^{2n}$), the space of formal power series of polynomials on $V$ with the Moyal product is called the formal Weyl algebra $W = (S(V)[[\nu]], *_M)$.

Let $\mathfrak{g}^*$ be the dual of a Lie algebra $\mathfrak{g}$. The algebra of polynomials on $\mathfrak{g}^*$ is identified with the symmetric algebra $S(\mathfrak{g})$. One defines a new associative law on this algebra by a transfer of the product $\cdot$ in the universal enveloping algebra $U(\mathfrak{g})$, via the bijection between $S(\mathfrak{g})$ and $U(\mathfrak{g})$ given by the total symmetrization $\sigma$:
\[ \sigma : S(\mathfrak{g}) \to U(\mathfrak{g}) : X_1 \cdots X_k \mapsto \frac{1}{k!} \sum_{\rho \in S_k} X_{\rho(1)} \cdots X_{\rho(k)} \]
Then $U(\mathfrak{g}) = \bigoplus_{n \ge 0} U_n$, where $U_n := \sigma(S^n(\mathfrak{g}))$, and we decompose an element $u \in U(\mathfrak{g})$ accordingly, $u = \sum u_n$. We define, for $P \in S^p(\mathfrak{g})$ and $Q \in S^q(\mathfrak{g})$,
\[ P * Q = \sum_{n \ge 0} \nu^n\, \sigma^{-1}\bigl((\sigma(P) \cdot \sigma(Q))_{p+q-n}\bigr) \tag{2} \]
This yields a differential star product on $\mathfrak{g}^*$ (Gutt 1983). This star product can be written with an integral formula (for $\nu = 2\pi i$) (Drinfeld 1987):
\[ (u * v)(\xi) = \int_{\mathfrak{g} \times \mathfrak{g}} \hat{u}(X)\, \hat{v}(Y)\, e^{2i\pi \langle \xi, \mathrm{CBH}(X, Y) \rangle}\, dX\, dY \]
where $\hat{u}(X) = \int_{\mathfrak{g}^*} u(\eta)\, e^{-2i\pi \langle \eta, X \rangle}\, d\eta$ and CBH denotes the Campbell–Baker–Hausdorff formula for the product of elements in the group in a logarithmic chart ($\exp X \exp Y = \exp \mathrm{CBH}(X, Y)$ for all $X, Y \in \mathfrak{g}$). We call this the standard (or CBH) star product on the dual of a Lie algebra.

De Wilde and Lecomte (1983) proved that on any symplectic manifold there exists a differential star product. Fedosov (1994) gave a recursive construction of a star product on a symplectic manifold $(M, \omega)$ by constructing flat connections on the Weyl bundle. Omori et al. (1991) gave an alternative proof of existence of a differential star product on a symplectic manifold, gluing local Moyal star products. In 1997, Kontsevich gave a proof of the existence of a star product on any Poisson manifold and gave an explicit formula for a star product for any Poisson structure on $V = \mathbb{R}^m$. This appeared as a consequence of the proof of his formality theorem.

Fedosov's Construction of Star Products
Fedosov's construction gives a star product on a symplectic manifold $(M, \omega)$, when one has chosen a symplectic connection and a sequence of closed 2-forms on $M$. The star product is obtained by identifying the space $C^\infty(M)[[\nu]]$ with an algebra of flat sections of the so-called Weyl bundle endowed with a flat connection whose construction is related to the choice of the sequence of closed 2-forms on $M$.

Definition 5 The symplectic group $Sp(n, \mathbb{R})$ acts by automorphisms on the formal Weyl algebra $W$. If $(M, \omega)$ is a symplectic manifold, we can form its bundle $F(M)$ of symplectic frames, which is a principal $Sp(n, \mathbb{R})$-bundle over $M$. The associated bundle $\mathcal{W} = F(M) \times_{Sp(n, \mathbb{R})} W$ is a bundle of associative algebras on $M$ called the Weyl bundle.

Sections of the Weyl bundle have the form of formal series
\[ a(x, y, \nu) = \sum_{2k + l \ge 0} \nu^k\, a_{(k)\, i_1 \ldots i_l}(x)\, y^{i_1} \cdots y^{i_l} \]
where the coefficients $a_{(k)}$ are symmetric covariant $l$-tensor fields on $M$. The product of two sections taken pointwise makes the space of sections into an algebra, and in terms of the above representation of sections the multiplication has the form
\[ (a \circ b)(x, y, \nu) = \exp\Bigl(\frac{\nu}{2} P^{ij} \frac{\partial}{\partial y^i} \frac{\partial}{\partial z^j}\Bigr) a(x, y, \nu)\, b(x, z, \nu)\Big|_{y = z} \]
Note that the center of this algebra coincides with $C^\infty(M)[[\nu]]$.

A symplectic connection on $M$ is a linear torsion-free connection $\nabla$ such that $\nabla\omega = 0$.

Remark 6 It is well known that such connections always exist but, unlike the Riemannian case, are not unique. To see the existence, take any torsion-free connection $\nabla^0$ and set $T(X, Y, Z) = (\nabla^0_X \omega)(Y, Z)$. Define $S$ by $\omega(S(X, Y), Z) = (1/3)\bigl(T(X, Y, Z) + T(Y, X, Z)\bigr)$; then $\nabla_X Y = \nabla^0_X Y + S(X, Y)$ defines a symplectic connection.

The connection $\nabla$ induces a covariant derivative on sections of the Weyl bundle, denoted $\partial$. The idea is to try to modify it to have zero curvature. Consider
\[ Da = \partial a - \delta(a) - \frac{1}{\nu}[r, a], \]
where $r$ is a 1-form with values in $\mathcal{W}$, with $[a, a'\,dx] = (a \circ a' - a' \circ a)\,dx$ and $\delta(a) = (1/\nu)\bigl[\sum_{ij} -\omega_{ij}\, y^i\, dx^j, a\bigr]$.

Theorem 7 (Fedosov 1994). For a given series $\Omega = \sum_{i \ge 1} \nu^i \omega_i$ of closed 2-forms on $M$, there is a unique $r \in \Gamma(\mathcal{W} \otimes \Lambda^1)$ satisfying some normalization condition, so that $Da = \partial a - \delta(a) - (1/\nu)[r, a]$ is flat. For any $a_0 \in C^\infty(M)[[\nu]]$, there is a unique $a$ in the subspace $\mathcal{W}_D$ of flat sections of $\mathcal{W}$, such that $a(x, 0, \nu) = a_0(x, \nu)$. The use of this linear isomorphism to transport the algebra structure of $\mathcal{W}_D$ to $C^\infty(M)[[\nu]]$ defines the star product of Fedosov $*_{\nabla, \Omega}$.

Writing $*_{\nabla, \Omega} = \sum_{r \ge 0} \nu^r C^\Omega_r$, the cochain $C^\Omega_r$ only depends on $\omega_i$ for $i < r$, and $C^\Omega_{r+1}(u, v) = c\, \omega_r(X_u, X_v) + \tilde{C}_{r+1}(u, v)$, where $c \in \mathbb{R}$ and the last term does not depend on $\omega_r$.
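In each fiber, the product $\circ$ on sections of the Weyl bundle is the Moyal product [1] in the variables $y$, and on polynomials the exponential series terminates. The following is a minimal sketch of that fiberwise product, under the assumption of one degree of freedom with $P^{12} = -P^{21} = 1$; the dictionary representation and the helper names `falling`, `moyal`, and `sub` are hypothetical choices made for illustration only.

```python
from fractions import Fraction
from math import comb, factorial

# A polynomial in (y1, y2, nu) is a dict {(i, j, n): coeff},
# standing for the sum of coeff * y1^i * y2^j * nu^n.
# Assumption of this sketch: P^{12} = -P^{21} = 1 (one degree of freedom).


def falling(n, k):
    """Falling factorial n(n-1)...(n-k+1): the factor produced by (d/dy)^k on y^n."""
    out = 1
    for i in range(k):
        out *= n - i
    return out


def moyal(u, v):
    """Apply exp((nu/2) P^{rs} d_r (x) d_s) to u (x) v and restrict to the diagonal."""
    out = {}
    for (a, b, m), cu in u.items():
        for (c, d, n), cv in v.items():
            for k in range(min(a + b, c + d) + 1):
                for j in range(k + 1):
                    # Binomial expansion of (d1 (x) d2 - d2 (x) d1)^k;
                    # j counts how often the second term is chosen.
                    w = falling(a, k - j) * falling(b, j) * falling(c, j) * falling(d, k - j)
                    if w == 0:
                        continue
                    coeff = Fraction((-1) ** j * comb(k, j) * w, 2 ** k * factorial(k))
                    key = (a + c - k, b + d - k, m + n + k)
                    out[key] = out.get(key, 0) + coeff * cu * cv
    return {key: c for key, c in out.items() if c != 0}


def sub(u, v):
    """Difference u - v of two polynomials."""
    out = dict(u)
    for key, c in v.items():
        out[key] = out.get(key, 0) - c
    return {key: c for key, c in out.items() if c != 0}


y1 = {(1, 0, 0): Fraction(1)}
y2 = {(0, 1, 0): Fraction(1)}
commutator = sub(moyal(y1, y2), moyal(y2, y1))  # nu * P^{12}, with no higher-order terms
```

Exact rational arithmetic is used so that associativity can be checked by comparing dictionaries directly, for instance `moyal(moyal(u, v), w) == moyal(u, moyal(v, w))` for polynomial inputs.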
Classification of Star Products on a Symplectic Manifold

Star products on a manifold $M$ are examples of deformations of associative algebras (in the sense of Gerstenhaber). Their study uses the Hochschild cohomology of the algebra (here $C^\infty(M)$ with values in $C^\infty(M)$), where $p$-cochains are $p$-linear maps from $(C^\infty(M))^p$ to $C^\infty(M)$ and where the Hochschild coboundary operator maps the $p$-cochain $C$ to the $(p + 1)$-cochain
\[ (\partial C)(u_0, \ldots, u_p) = u_0\, C(u_1, \ldots, u_p) + \sum_{r=1}^{p} (-1)^r C(u_0, \ldots, u_{r-1} u_r, \ldots, u_p) + (-1)^{p+1} C(u_0, \ldots, u_{p-1})\, u_p \]
For differential star products, we consider differential cochains given by differential operators on each argument. The associativity condition for a star product at order $k$ in the parameter $\nu$ reads
\[ (\partial C_k)(u, v, w) = \sum_{r+s=k;\ r,s>0} \bigl(C_r(C_s(u, v), w) - C_r(u, C_s(v, w))\bigr) \]
If one has cochains $C_j$, $j < k$, such that the star product they define is associative to order $k - 1$, then the right-hand side above is a cocycle ($\partial(\mathrm{RHS}) = 0$) and one can extend the star product to order $k$ if it is a coboundary ($\mathrm{RHS} = \partial(C_k)$).

Denoting by $m$ the usual multiplication of functions, and writing $* = m + C$, where $C$ is a formal series of multidifferential operators, the associativity also reads $\partial C = [C, C]$, where the bracket on the right-hand side is the graded Lie algebra bracket on $D_{\mathrm{poly}}(M)[[\nu]] = \{\text{multidifferential operators}\}$.

Theorem 8 (Vey 1975). Every differential $p$-cocycle $C$ on a manifold $M$ is the sum of the coboundary of a differential $(p - 1)$-cochain and a 1-differential skew-symmetric $p$-cocycle $A$: $C = \partial B + A$. In particular, a cocycle is a coboundary if and only if its total skew-symmetrization, which is automatically 1-differential in each argument, vanishes. Given a connection $\nabla$ on $M$, $B$ can be defined from $C$ by universal formulas (Cahen and Gutt 1982). Also
\[ H^p_{\mathrm{diff}}(C^\infty(M), C^\infty(M)) = \Gamma(\Lambda^p TM) \]
The similar result about continuous cochains is due to Connes (1985). In the somewhat pathological case of completely general cochains, the full cohomology is not known.

Definition 9 Two star products $*$ and $*'$ on $(M, P)$ are said to be equivalent if there is a series of linear operators on $C^\infty(M)$, $T = \mathrm{Id} + \sum_{r=1}^{\infty} \nu^r T_r$, such that
\[ T(f * g) = Tf *' Tg \tag{3} \]
Note that the $T_r$ automatically vanish on constants, since 1 is a unit for $*$ and for $*'$.

If $*$ and $*'$ are equivalent differential star products, then the equivalence is given by differential operators $T_r$; if they are natural, the equivalence is given by $T = \operatorname{Exp} E$ with $E = \sum_{r=1}^{\infty} \nu^r E_r$, where the $E_r$ are differential operators of order at most $r + 1$.

Nest and Tsygan (1995), then Deligne (1995) and Bertelson et al. (1995, 1997), proved that any differential star product on a symplectic manifold $(M, \omega)$ is equivalent to a Fedosov star product and that its equivalence class is parametrized by the corresponding element in $H^2(M; \mathbb{R})[[\nu]]$.
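The order-$k$ associativity condition stated earlier in this section can be recovered by collecting the coefficient of $\nu^k$ in $(u * v) * w = u * (v * w)$. The following short derivation is a sketch that uses only $C_0(u, v) = uv$ and the Hochschild coboundary formula for $p = 2$:

```latex
\begin{align*}
0 &= \sum_{r+s=k} \bigl( C_r(C_s(u,v),w) - C_r(u,C_s(v,w)) \bigr) \\
  &= \underbrace{C_k(u,v)\,w - u\,C_k(v,w) + C_k(uv,w) - C_k(u,vw)}_{=\,-(\partial C_k)(u,v,w)}
     + \sum_{\substack{r+s=k \\ r,s>0}} \bigl( C_r(C_s(u,v),w) - C_r(u,C_s(v,w)) \bigr).
\end{align*}
% k = 1:  (\partial C_1)(u,v,w) = 0, so C_1 is a Hochschild 2-cocycle.
% k = 2:  (\partial C_2)(u,v,w) = C_1(C_1(u,v),w) - C_1(u,C_1(v,w)).
```

The underbraced terms are those with $r = 0$ or $s = 0$; moving them to the other side gives the stated condition, and the cases $k = 1, 2$ show how each new cochain is constrained by the lower-order ones.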
Kontsevich (IHES preprint 97) proved that the coincidence of the set of equivalence classes of star and Poisson deformations is true for general Poisson manifolds:

Theorem 10 (Kontsevich). The set of equivalence classes of differential star products on a Poisson manifold $(M, P)$ can be naturally identified with the set of equivalence classes of Poisson deformations of $P$: $P_\nu = P\nu + P_2 \nu^2 + \cdots \in \nu\,\Gamma(X, \wedge^2 T_X)[[\nu]]$, $[P_\nu, P_\nu] = 0$.

Deligne (1995) defines cohomological classes associated to differential star products on a symplectic manifold; this leads to an intrinsic way to parametrize the equivalence class of such a differential star product. The characteristic class $c(*)$ is given in terms of the skew-symmetric part of the term of order 2 in $\nu$ in the star product and in terms of local ("$\nu$-Euler") derivations of the form $D = \nu(\partial/\partial\nu) + X + \sum_{r \ge 1} \nu^r D'_r$. This characteristic class has the following properties:

The map $C$ from equivalence classes of star products on $(M, \omega)$ to the affine space $[\omega]/\nu + H^2(M; \mathbb{R})[[\nu]]$, mapping $[*]$ to $c(*)$, is a bijection.

The characteristic class is natural relative to diffeomorphisms and is equivariant under a change of parameter (Gutt and Rawnsley 1995).

The characteristic class $c(*)$ coincides (cf. Deligne (1995) and Neumaier (1999)) for Fedosov-type star products with their characteristic class introduced by Fedosov as the de Rham class of the curvature of the generalized connection used to build them (up to a sign and factors of 2).

Index theory has been introduced in the framework of deformation quantization by Fedosov (1996) and by Nest and Tsygan (1995, 1996). We refer to the papers of Bressler, Nest, and Tsygan for further developments in that subject. A first tool in that theory is the existence of a trace for the deformed algebra; this trace is essentially unique in the framework of symplectic manifolds (an elementary proof is given in Karabegov (1998) and Gutt and Rawnsley (2003)); the trace is not unique for more general Poisson manifolds.

Definition 11 A homomorphism from a differential star product $*$ on $(M, P)$ to a differential star product $*'$ on $(M', P')$ is an $\mathbb{R}$-linear map $A : C^\infty(M)[[\nu]] \to C^\infty(M')[[\nu]]$, continuous in the $\nu$-adic topology, such that
\[ A(u * v) = Au *' Av \]
It is an isomorphism if the map is bijective.
Any isomorphism between two differential star products on symplectic manifolds is the combination of a change of parameter and a $\nu$-linear isomorphism. Any $\nu$-linear isomorphism between two star products $*$ on $(M, \omega)$ and $*'$ on $(M', \omega')$ is the combination of the action on functions of a symplectomorphism $\psi : M' \to M$ and an equivalence between $*$ and the pullback via $\psi$ of $*'$. It exists if and only if those two star products are equivalent, that is, if and only if $(\psi^{-1})^* c(*') = c(*)$, where $(\psi^{-1})^*$ denotes the action of $\psi^{-1}$ on the second de Rham cohomology space. In particular, a symplectomorphism $\psi$ of a symplectic manifold can be extended to a $\nu$-linear automorphism of a given differential star product on $(M, \omega)$ if and only if $\psi^* c(*) = c(*)$ (Gutt and Rawnsley 1999).

The notion of homomorphism and its relation to modules has been studied by Bordemann (2004).

The link between the notion of star product on a symplectic manifold and symplectic connections already appears in the seminal paper of Bayen et al. (1978), and was further developed by Lichnerowicz (1982), who showed that any Vey star product (i.e., a star product defined by bidifferential operators whose principal symbols at each order coincide with those of the Moyal star product) determines a unique symplectic connection. Fedosov's construction yields a Vey star product on any symplectic manifold starting from a symplectic connection and a formal series of closed 2-forms on the manifold. Furthermore, any star product is equivalent to a Fedosov star product, and the de Rham class of the formal 2-form determines the equivalence class of the star product. On the other hand, many star products which appear in natural contexts (e.g., cotangent bundles or Kähler manifolds) are not Vey star products but are natural star products.
Theorem 12 (Gutt and Rawnsley 2004). Any natural star product $*$ on a symplectic manifold $(M, \omega)$ determines uniquely
(i) A symplectic connection $\nabla = \nabla(*)$.
(ii) A formal series of closed 2-forms $\Omega = \Omega(*) \in \nu \Lambda^2(M)[[\nu]]$.
(iii) A formal series $E = \sum_{r \ge 1} \nu^r E_r$ of differential operators of order at most $r + 1$ ($E_2$ of order at most 2), with $E_r u = \sum_{k=2}^{r+1} (E_r^{(k)})^{i_1 \ldots i_k} \nabla^k_{i_1 \ldots i_k} u$, where the $E_r^{(k)}$ are symmetric contravariant $k$-tensor fields,
such that
\[ u * v = \exp(-E)\bigl((\exp E\, u) *_{\nabla, \Omega} (\exp E\, v)\bigr) \tag{4} \]
We denote $* = *_{\nabla, \Omega, E}$. If $\psi$ is a diffeomorphism of $M$, then the data for $\psi \cdot *$ is $\psi \cdot \nabla$, $\psi \cdot \Omega$, and $\psi \cdot E$. In particular, a vector field $X$ is a derivation of a natural star product $*$ if and only if $\mathcal{L}_X \omega = 0$, $\mathcal{L}_X \Omega = 0$, $\mathcal{L}_X \nabla = 0$, and $\mathcal{L}_X E = 0$.
Group Actions on Star Products

Symmetries in quantum theories are automorphisms of an algebra of observables. In the framework where quantization is defined in terms of a star product, a symmetry $\sigma$ of a star product $*$ is an automorphism of the $\mathbb{R}[[\nu]]$-algebra $C^\infty(M)[[\nu]]$ with multiplication given by $*$:
\[ \sigma(u * v) = \sigma(u) * \sigma(v), \qquad \sigma(1) = 1 \]
where $\sigma$, being determined by what it does on $C^\infty(M)$, will be a formal series $\sigma(u) = \sum_{r \ge 0} \nu^r \sigma_r(u)$ of linear maps $\sigma_r : C^\infty(M) \to C^\infty(M)$. We denote by $\mathrm{Aut}_{\mathbb{R}[[\nu]]}(M, *)$ the set of those symmetries.

Any such automorphism $\sigma$ of $*$ can then be written as $\sigma(u) = T(u \circ \tau^{-1})$, where $\tau$ is a Poisson diffeomorphism of $(M, P)$ and $T = \mathrm{Id} + \sum_{r \ge 1} \nu^r T_r$ is a formal series of linear maps. If $*$ is differential, then the $T_r$ are differential operators; if $*$ is natural, then $T = \operatorname{Exp} E$ with $E = \sum_{r \ge 1} \nu^r E_r$, and $E_r$ is a differential operator of order at most $r + 1$.

If $\sigma_t$ is a one-parameter group of symmetries of the star product $*$, then its generator $D$ will be a derivation of $*$. Denote the Lie algebra of $\nu$-linear derivations of $*$ by $\mathrm{Der}_{\mathbb{R}[[\nu]]}(M, *)$.

An action of a Lie group $G$ on a star product $*$ on a Poisson manifold $(M, P)$ is a homomorphism $\sigma : G \to \mathrm{Aut}_{\mathbb{R}[[\nu]]}(M, *)$; then $\sigma_g = (\tau_g^{-1})^* + O(\nu)$ and there is an induced Poisson action $\tau$ of $G$ on $(M, P)$. Given a Poisson action $\tau$ of $G$ on $(M, P)$, a star product is said to be "invariant" under $G$ if all the $(\tau_g^{-1})^*$ are automorphisms of $*$.

An action of a Lie group $G$ on $*$ induces a homomorphism of Lie algebras $D : \mathfrak{g} \to \mathrm{Der}_{\mathbb{R}[[\nu]]}(M, *)$. For each $\xi \in \mathfrak{g}$, $D_\xi = \xi^* + \sum_{r \ge 1} \nu^r D^r_\xi$, where $\xi^*$ is the fundamental vector field on $M$ defined by $\tau$; hence,
\[ \xi^*(x) = \frac{d}{dt}\Big|_0\, \tau(\exp(-t\xi))\, x \]
Such a homomorphism $D : \mathfrak{g} \to \mathrm{Der}_{\mathbb{R}[[\nu]]}(M, *)$ is called an action of the Lie algebra $\mathfrak{g}$ on $*$.

Proposition 13 (Arnal et al. 1983). Given a homomorphism $D : \mathfrak{g} \to \mathrm{Der}_{\mathbb{R}[[\nu]]}(M, *)$ so that for each $\xi \in \mathfrak{g}$, $D_\xi = \xi^* + \sum_{r \ge 1} \nu^r D^r_\xi$, where the $\xi^*$ are the fundamental vector fields on $M$ defined by an action $\tau$ of $G$ on $M$ and the $D^r_\xi$ are differential operators, there exists a local homomorphism $\sigma : U \subset G \to \mathrm{Aut}_{\mathbb{R}[[\nu]]}(M, *)$ so that $\sigma_* = D$.
If we want the analog in our framework of the requirement that operators should correspond to the infinitesimal actions of a Lie algebra, we should ask the derivations to be inner, so that functions are associated to the elements of the Lie algebra. A derivation $D \in \mathrm{Der}_{\mathbb{R}[[\nu]]}(M, *)$ is said to be essentially inner or Hamiltonian if $D = (1/\nu)\, \mathrm{ad}_* u$ for some $u \in C^\infty(M)[[\nu]]$. We call an action of a Lie group almost $*$-Hamiltonian if each $D_\xi$ is essentially inner; this is equivalent to the knowledge of a linear map $\lambda : \mathfrak{g} \to C^\infty(M)[[\nu]]$, $\xi \mapsto \lambda_\xi$, so that $\mathrm{ad}_* \bigl((1/\nu)[\lambda_\xi, \lambda_\eta]_*\bigr) = \mathrm{ad}_* \lambda_{[\xi, \eta]}$. We say the action is $*$-Hamiltonian if $\lambda$ can be chosen to make
\[ \mathfrak{g} \to C^\infty(M)[[\nu]], \qquad \xi \mapsto \lambda_\xi \]
a homomorphism of Lie algebras, where $C^\infty(M)[[\nu]]$ is endowed with the bracket $(1/\nu)[\ ,\ ]_*$. Such a homomorphism is called a quantization in Arnal et al. (1983) and is called a generalized moment map in Bordemann et al. (1998).

When a map $\lambda_0 : \mathfrak{g} \to C^\infty(M)$ is a generalized moment map, that is,
\[ \frac{1}{\nu}\bigl(\lambda_{0\xi} * \lambda_{0\eta} - \lambda_{0\eta} * \lambda_{0\xi}\bigr) = \lambda_{0[\xi, \eta]}, \]
the star product is said to be covariant under $\mathfrak{g}$.

When a map $\lambda : \mathfrak{g} \to C^\infty(M)[[\nu]]$ is a generalized moment map such that $D_\xi$ has no terms in $\nu$ of degree $> 0$, thus $D_\xi = \xi^*$, this map is called a quantum moment map (Xu 1998). Clearly in that situation, the star product is invariant under the action of $\mathfrak{g}$ on $M$.

Covariant star products have been considered to study the representation theory of some classes of Lie groups in terms of star products. In particular, an autonomous star formulation of the theory of representations of nilpotent Lie groups has been given by Arnal and Cortet (1984, 1985).

Consider a differential star product $*$ on a symplectic manifold, admitting an algebra $\mathfrak{g}$ of vector fields on $M$ consisting of derivations of $*$, and assume there is a symplectic connection $\nabla$ which is invariant under $\mathfrak{g}$; then $*$ is equivalent, through an equivariant equivalence ($T$ with $\mathcal{L}_X T = 0$), to a Fedosov star product $*_{\nabla, \Omega}$; this yields a classification of such invariant star products (Bertelson et al. 1998).

Proposition 14 (Kravchenko, Gutt and Rawnsley, Müller-Bahns, Neumaier, and Hamachi). Consider a Fedosov star product $*_{\nabla, \Omega}$ on a symplectic manifold. A vector field $X$ is a derivation of $*_{\nabla, \Omega}$ if and only if $\mathcal{L}_X \omega = 0$, $\mathcal{L}_X \Omega = 0$, and $\mathcal{L}_X \nabla = 0$. A vector field $X$ is an inner derivation of $* = *_{\nabla, \Omega}$ if
and only if $\mathcal{L}_X \nabla = 0$ and there exists a series of functions $\lambda_X$ such that
\[ i(X)\omega - i(X)\Omega = d\lambda_X \]
In this case $X(u) = (1/\nu)(\mathrm{ad}_* \lambda_X)(u)$.

On a symplectic manifold $(M, \omega)$, a vector field $X$ is an inner derivation of the natural star product $* = *_{\nabla, \Omega, E}$ if and only if $\mathcal{L}_X \nabla = 0$, $\mathcal{L}_X E = 0$, and there exists a series of functions $\lambda_X$ such that
\[ i(X)\omega - i(X)\Omega = d\lambda_X \]
Then $X = (1/\nu)\, \mathrm{ad}_* \mu_X$ with $\mu_X = \operatorname{Exp}(E^{-1})\, \lambda_X$.

Let $G$ be a compact Lie group of symplectomorphisms of $(M, \omega)$ and $\mathfrak{g}$ the corresponding Lie algebra of symplectic vector fields on $M$. Consider a star product $*$ on $M$ which is invariant under $G$. The Lie algebra $\mathfrak{g}$ consists of inner derivations for $*$ if and only if there exists a series of functions $\lambda_X$ and a representative $(1/\nu)(\omega - \Omega)$ of the characteristic class of $*$ such that $i(X)\omega - i(X)\Omega = d\lambda_X$.

Star products which are invariant and covariant are used in the problem of reduction: this is a device in symplectic geometry which allows one to reduce the number of variables. An important issue in quantization is to know if and how quantization commutes with reduction. This problem has been studied by Fedosov for the action of a compact group on the particular star products constructed by him with trivial characteristic class ($*_{\nabla, 0}$). Here, one indeed obtains some "quantization commutes with reduction" statements. More generally, Bordemann, Herbig, and Waldmann considered covariant star products. In this case, one can construct a classical and a quantum BRST complex whose cohomology describes the algebra of observables for the reduced system. While this is well known classically, at least under some regularity assumptions on the group action, for the quantized situation the nontrivial question is whether the quantum BRST cohomology is "as large as" the classical one. Clearly, from the physical point of view, this is crucial. It turns out that whereas for strongly invariant star products one indeed obtains a quantization of the reduced phase space, in general the quantum BRST cohomology might be too small.
More general situations of reduction have also been discussed by, for example, Bordemann as well as Cattaneo and Felder, when a coisotropic (i.e., first class) constraint manifold is given.
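As a concrete illustration of the covariance and quantum moment map conditions of this section, consider (as an assumption made only for this example) the Moyal product [1] on $V = \mathbb{R}^{2n}$ and a polynomial $f$ of degree at most 2. The computation below is a sketch, using the normalization $\{f, g\} = C_1(f, g) - C_1(g, f) = P^{rs} \partial_r f\, \partial_s g$ fixed by Definition 4:

```latex
\begin{align*}
f *_M g - g *_M f
  &= \sum_{k \ge 0} \frac{1}{k!} \Bigl(\frac{\nu}{2}\Bigr)^{k} \bigl(1 - (-1)^k\bigr)\,
     P^{r_1 s_1} \cdots P^{r_k s_k}\,
     (\partial_{r_1} \cdots \partial_{r_k} f)\,(\partial_{s_1} \cdots \partial_{s_k} g) \\
  &= \nu\, P^{rs}\, \partial_r f\, \partial_s g \;+\; (\text{terms with } k \ge 3 \text{ odd})
   \;=\; \nu\, \{f, g\}
  \qquad \text{when } \deg f \le 2 .
\end{align*}
```

The factor $1 - (-1)^k$ comes from the skew-symmetry of $P$, and all terms with $k \ge 3$ contain third derivatives of $f$ and therefore vanish. Hence $(1/\nu)\, \mathrm{ad}_{*_M} f = \{f, \cdot\}$ has no terms of positive degree in $\nu$, and for $f, g$ both of degree at most 2 one gets $(1/\nu)[f, g]_{*_M} = \{f, g\}$, so the inclusion of this Lie algebra of polynomials into $C^\infty(V)$ satisfies the conditions stated above for a quantum moment map.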
Convergence of Some Star Products on a Subclass of Functions

Let $(M, P)$ be a Poisson manifold and let $*$ be a differential star product on it with 1 acting as the identity. Observe that if there exists a value $k$ of $\nu$ such that
\[ u * v = \sum_{r=0}^{\infty} \nu^r C_r(u, v) \]
converges (for the pointwise convergence of functions), for all $u, v \in C^\infty(M)$, to $F_k(u, v)$ in such a way that $F_k$ is associative, then $F_k(u, v) = uv$. This is easy to see, as the order of differentiation in the $C_r$ necessarily is at least $r$ in each argument, and thus the Borel lemma immediately gives the result. So assuming "too much" convergence kills all deformations.

On the other hand, in any physical situation, one needs some convergence properties to be able to compute the spectrum of quantum observables in terms of a star product (as in Bayen et al. 1978).

In the example of the Moyal star product on the symplectic vector space $(\mathbb{R}^{2n}, \omega)$, the formal formula
\[ (u *_M v)(z) = \exp\Bigl(\frac{\nu}{2} P^{rs}\, \partial_{x^r} \partial_{y^s}\Bigr) \bigl(u(x) v(y)\bigr)\Big|_{x = y = z} \]
obviously converges when $u$ and $v$ are polynomials. On the other hand, there is an integral formula for the Moyal star product given by
\[ (u * v)(\xi) = (\pi\hbar)^{-2n} \int u(\xi')\, v(\xi'') \exp\Bigl(\frac{2i}{\hbar}\bigl(\omega(\xi, \xi'') + \omega(\xi'', \xi') + \omega(\xi', \xi)\bigr)\Bigr)\, d\xi'\, d\xi'' \]
and this product gives a structure of associative algebra on the space of rapidly decreasing functions $\mathcal{S}(\mathbb{R}^{2n})$. The formal formula converges (for $\nu = i\hbar$) in the topology of $\mathcal{S}'$ for $u$ and $v$ with compactly supported Fourier transform.

Several lines of work address the convergence of star products:
The method of quantization of Kähler manifolds due to Berezin, as the inverse of taking symbols of operators, has been used to construct on Hermitian symmetric spaces star products which are convergent on a large class of functions on the manifold (Moreno, Cahen, Gutt, and Rawnsley, Karabegov, Schlichenmaier).

The constructions of operator representations of star products (Fedosov, Bordemann, Neumaier, and Waldmann).

The work of Rieffel and the notion of strict deformation quantization. Examples of strict (Fréchet) quantization have been given by Omori, Maeda, Miyazaki, and Yoshioka, and by Bieliavsky.

Convergence of Berezin-Type Star Products on Hermitian Symmetric Spaces
The method to construct a star product involves making a correspondence between operators and functions using coherent states, transferring the operator composition to the symbols, introducing a suitable parameter into this Berezin composition of symbols, taking the asymptotic expansion in this parameter on a large algebra of functions, and then showing that the coefficients of this expansion satisfy the cocycle conditions to define a star product on the smooth functions (Cahen et al. 1995). The idea of an asymptotic expansion appears in Berezin (1975) and in Moreno and Ortega-Navarro (1983, 1986).

This asymptotic expansion exists for compact $M$, and defines an associative multiplication on formal power series in $k^{-1}$ with coefficients in $C^\infty(M)$ for compact coadjoint orbits. For $M$ a Hermitian symmetric space of compact type, and more generally for compact coadjoint orbits (i.e., flag manifolds), this formal power series converges on the space of symbols (Karabegov 1998). For general Hermitian symmetric spaces of noncompact type, using their realization as bounded domains, one defines an analogous algebra of symbols of polynomial differential operators.

Reshetikhin and Takhtajan have constructed an associative formal star product given by an asymptotic expansion on any Kähler manifold. This they do in two steps, first building an associative product for which 1 is not a unit element, then passing to a star product.

We denote by $(L, \nabla, h)$ a quantization bundle for the Kähler manifold $(M, \omega, J)$ (i.e., a holomorphic line bundle $L$ with connection $\nabla$ admitting an invariant Hermitian structure $h$, such that the curvature is $\mathrm{curv}(\nabla) = -2i\pi\omega$). We denote by $\mathcal{H}$ the Hilbert space of square-integrable holomorphic sections of $L$, which we assume to be nontrivial. The coherent states are vectors $e_q \in \mathcal{H}$ such that
\[ s(x) = \langle s, e_q \rangle\, q, \qquad \forall q \in \mathcal{L}_x,\ x \in M,\ s \in \mathcal{H} \]
($\mathcal{L}$ is the complement of the zero section in $L$). The function $\epsilon(x) = |q|^2 \|e_q\|^2$, $q \in \mathcal{L}_x$, is well defined and real analytic.
Let $A : \mathcal{H} \to \mathcal{H}$ be a bounded linear operator and let
\[ \hat{A}(x) = \frac{\langle A e_q, e_q \rangle}{\langle e_q, e_q \rangle}, \qquad q \in \mathcal{L}_x,\ x \in M \]
be its symbol. The function $\hat{A}$ has an analytic continuation to an open neighborhood of the diagonal in $M \times \bar{M}$ given by
\[ \hat{A}(x, y) = \frac{\langle A e_{q'}, e_q \rangle}{\langle e_{q'}, e_q \rangle}, \qquad q \in \mathcal{L}_x,\ q' \in \mathcal{L}_y \]
which is holomorphic in $x$ and antiholomorphic in $y$. We denote by $\hat{E}(L)$ the space of symbols of bounded operators on $\mathcal{H}$. We can extend this definition of symbols to some unbounded operators provided everything is well defined. The composition of operators on $\mathcal{H}$ gives rise to an associative product $*$ for the corresponding symbols:
\[ (\hat{A} * \hat{B})(x) = \int_M \hat{A}(x, y)\, \hat{B}(y, x)\, \psi(x, y)\, \epsilon(y)\, \frac{\omega^n(y)}{n!} \]
where
\[ \psi(x, y) = \frac{|\langle e_{q'}, e_q \rangle|^2}{\|e_{q'}\|^2 \|e_q\|^2}, \qquad q \in \mathcal{L}_x,\ q' \in \mathcal{L}_y \]
is a globally defined real analytic function on $M \times M$ provided $\epsilon$ has no zeros ($\psi(x, y) \le 1$ everywhere, with equality where the lines spanned by $e_q$ and $e_{q'}$ coincide). Let $k$ be a positive integer. The bundle $(L^k = \otimes^k L, \nabla^k, h^k)$ is a quantization bundle for $(M, k\omega, J)$ and we denote by $\mathcal{H}^k$ the corresponding space of holomorphic sections and by $\hat{E}(L^k)$ the space of symbols of linear operators on $\mathcal{H}^k$. We let $\epsilon^{(k)}$ be the corresponding function. We say that the quantization is regular if $\epsilon^{(k)}$ is a nonzero constant for all non-negative $k$ and if $\psi(x, y) = 1$ implies $x = y$. (Note that if the quantization is homogeneous, all $\epsilon^{(k)}$ are constants.)

Theorem 15 (Cahen et al.). Let $(M, \omega, J)$ be a Kähler manifold and $(L, \nabla, h)$ be a regular quantization bundle over $M$. Let $\hat{A}, \hat{B}$ be in $\mathcal{B}$, where $\mathcal{B} \subset C^\infty(M)$ consists of functions $f$ which have an analytic continuation in $M \times \bar{M}$ so that $f(x, y)\,\psi(x, y)^l$ is globally defined, smooth and bounded on $K \times M$ and $M \times K$ for each compact subset $K$ of $M$ for some positive power $l$. Then
\[ (\hat{A} *_k \hat{B})(x) = \int_M \hat{A}(x, y)\, \hat{B}(y, x)\, \psi^k(x, y)\, \epsilon^{(k)}\, k^n\, \frac{\omega^n(y)}{n!} \]
defined for $k$ sufficiently large, admits an asymptotic expansion in $k^{-1}$ as $k \to \infty$
\[ (\hat{A} *_k \hat{B})(x) \sim \sum_{r \ge 0} k^{-r}\, C_r(\hat{A}, \hat{B})(x) \]
and the cochains $C_r$ are smooth bidifferential operators, invariant under the automorphisms of the quantization and determined by the geometry alone. Furthermore, $C_0(\hat{A}, \hat{B}) = \hat{A}\hat{B}$ and $C_1(\hat{A}, \hat{B}) - C_1(\hat{B}, \hat{A}) = (i/\pi)\{\hat{A}, \hat{B}\}$.

If $M$ is a flag manifold, this defines a star product on $C^\infty(M)$ and the $*_k$ product of two symbols is convergent (it is a rational function of $k$ without pole at infinity) (cf. Karabegov in that generality). If $\mathcal{D}$ is a bounded symmetric domain and $\mathcal{E}$ the algebra of symbols of polynomial differential operators on a homogeneous holomorphic line bundle $L$ over $\mathcal{D}$ which gives a realization of a holomorphic discrete series representation of $G_0$, then for $f$ and $g$ in $\mathcal{E}$ the Berezin product $f *_k g$ has an asymptotic expansion in powers of $k^{-1}$ which converges to a rational function of $k$. The coefficients of the asymptotic expansion are bidifferential operators which define an invariant and covariant star product on $C^\infty(\mathcal{D})$.
Star Products on Cotangent Bundles

Since, from the physical point of view, cotangent bundles $\pi : T^*Q \to Q$ over some configuration space $Q$, endowed with their canonical symplectic structure $\omega_0$, are among the most important phase spaces, any quantization scheme should be tested and exemplified for this class of classical mechanical systems. We first recall that on $T^*Q$ there is a canonical vector field $\xi$, the Euler or Liouville vector field, which is locally given by $\xi = p_k\, \partial/\partial p_k$. Here and in the following, we use local bundle coordinates $(q^k, p_k)$ induced by local coordinates $x^k$ on $Q$. Using $\xi$ we can characterize those functions $f \in C^\infty(T^*Q)$ which are polynomial in the fibers of degree $k$ by $\mathcal{L}_\xi f = k f$. They are denoted by $\mathrm{Pol}^k(T^*Q)$, whereas $\mathrm{Pol}^\bullet(T^*Q)$ denotes the subalgebra of all functions which are polynomial in the fibers. Clearly, most of the physically relevant observables such as the kinetic energy, potentials, and generators of point transformations are in $\mathrm{Pol}^\bullet(T^*Q)$. Moreover, $\mathrm{Pol}^\bullet(T^*Q)$ is a Poisson subalgebra with
\[ \{\mathrm{Pol}^k(T^*Q), \mathrm{Pol}^\ell(T^*Q)\} \subseteq \mathrm{Pol}^{k+\ell-1}(T^*Q) \qquad [5] \]
since $\mathcal{L}_\xi \omega_0 = \omega_0$, that is, $\xi$ is conformally symplectic.
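The degree rule [5] can be checked by hand in the flat case $T^*\mathbb{R}$. The following is a minimal sketch, assuming the sign convention $\{q, p\} = 1$ and a hypothetical dictionary representation of polynomials (neither is fixed by the text); it computes canonical Poisson brackets of a kinetic energy, a potential, and a generator of a point transformation.

```python
from fractions import Fraction

# Illustrative representation (an assumption of this sketch): a polynomial in
# (q, p) is a dict {(a, b): coeff} standing for sum coeff * q**a * p**b.

def poisson(f, g):
    """Canonical bracket {f, g} = df/dq * dg/dp - df/dp * dg/dq on T*R."""
    out = {}
    for (a, b), cf in f.items():
        for (c, d), cg in g.items():
            # Both terms are multiples of q^(a+c-1) p^(b+d-1):
            # df/dq * dg/dp gives a*d, df/dp * dg/dq gives b*c.
            coeff = cf * cg * (a * d - b * c)
            if coeff != 0:
                key = (a + c - 1, b + d - 1)
                out[key] = out.get(key, 0) + coeff
    return {k: v for k, v in out.items() if v != 0}

def fiber_degrees(f):
    """Set of degrees in the momentum p occurring in f."""
    return {b for (_, b) in f}

kinetic = {(0, 2): Fraction(1, 2)}    # p^2 / 2, fiber degree 2
potential = {(3, 0): Fraction(1)}     # q^3, fiber degree 0
dilation = {(1, 1): Fraction(1)}      # q p, fiber degree 1

bracket_tv = poisson(kinetic, potential)   # degree 2 + 0 - 1 = 1
bracket_td = poisson(kinetic, dilation)    # degree 2 + 1 - 1 = 2
```

Each bracket lowers the total fiber degree by exactly one, which is the content of [5] in this toy setting.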
All this suggests that for a quantization of $T^*Q$, the polynomials $\mathrm{Pol}^\bullet(T^*Q)$ should play a crucial role. In deformation quantization this is accomplished by the notion of a homogeneous star product (De Wilde and Lecomte 1983). If the operator
\[ \mathsf{H} = \nu \frac{\partial}{\partial \nu} + \mathcal{L}_\xi \qquad [6] \]
is a derivation of a formal star product $\star$, then $\star$ is called homogeneous. It immediately follows that $\mathrm{Pol}^\bullet(T^*Q)[\nu] \subseteq C^\infty(T^*Q)[[\nu]]$ is a subalgebra over the ring $\mathbb{C}[\nu]$ of polynomials in $\nu$. Hence for homogeneous star products, the question of convergence (in general quite delicate) has a simple answer.

Let us now describe a simple construction of a homogeneous star product (following Bordemann et al. (1998)). We choose a torsion-free connection $\nabla$ on $Q$ and consider the operator of the symmetrized covariant derivative, locally given by
\[ D = dx^k \vee \nabla_{\partial/\partial x^k} : \Gamma^\infty(S^k T^*Q) \to \Gamma^\infty(S^{k+1} T^*Q) \qquad [7] \]
Clearly, $D$ is a global object and a derivation of the symmetric algebra $\bigoplus_{k=0}^\infty \Gamma^\infty(S^k T^*Q)$. Let now $f \in \mathrm{Pol}^\bullet(T^*Q)$ and $\psi \in C^\infty(Q)$ be given. Then one defines the standard-ordered quantization $\varrho_{\mathrm{Std}}(f)$ of $f$ with respect to $\nabla$ to be the differential operator $\varrho_{\mathrm{Std}}(f) : C^\infty(Q) \to C^\infty(Q)$ locally given by
\[ \varrho_{\mathrm{Std}}(f)\psi = \sum_{r=0}^\infty \frac{(-\nu)^r}{r!} \left.\frac{\partial^r f}{\partial p_{k_1} \cdots \partial p_{k_r}}\right|_{p=0} i_s\!\left(\frac{\partial}{\partial x^{k_1}}\right) \cdots i_s\!\left(\frac{\partial}{\partial x^{k_r}}\right) \frac{1}{r!} D^r \psi \qquad [8] \]
where $i_s$ denotes the symmetric insertion of vector fields in symmetric forms. Again, this is independent of the coordinate system $x^k$. The infinite sum is actually finite as long as $f \in \mathrm{Pol}^\bullet(T^*Q)$, whence we can safely set $\nu = i\hbar$ in this case. Indeed, [8] is the well-known symbol calculus for differential operators and it establishes a linear bijection
\[ \varrho_{\mathrm{Std}} : \mathrm{Pol}^\bullet(T^*Q) \to \mathrm{DiffOp}(C^\infty(Q)) \qquad [9] \]
which generalizes the usual canonical quantization in the flat case of $T^*Q = T^*\mathbb{R}^n = \mathbb{R}^{2n}$. Using this linear bijection, we can define a new product $\star_{\mathrm{Std}}$ for $\mathrm{Pol}^\bullet(T^*Q)$ by
\[ f \star_{\mathrm{Std}} g = \varrho_{\mathrm{Std}}^{-1}\bigl(\varrho_{\mathrm{Std}}(f)\, \varrho_{\mathrm{Std}}(g)\bigr) = \sum_{r=0}^\infty \nu^r C_r(f, g) \qquad [10] \]
It is now easy to see that $\star_{\mathrm{Std}}$ fulfills all requirements of a homogeneous star product except for the fact that the $C_r(\cdot, \cdot)$ are bidifferential. In this approach
this is far from being obvious as we only worked with functions polynomial in the fibers so far. Nevertheless, it is true, whence $\star_{\mathrm{Std}}$ indeed defines a star product for $C^\infty(T^*Q)[[\nu]]$.

In fact, there is a different characterization of $\star_{\mathrm{Std}}$ using a slightly modified Fedosov construction: first one uses $\nabla$ to define a torsion-free symplectic connection on $T^*Q$ by a fairly standard lifting. Moreover, using $\nabla$ one can define a standard-ordered fiberwise product $\circ_{\mathrm{Std}}$ for the formal Weyl algebra bundle over $T^*Q$, which is the starting point of the Fedosov construction of star products. With these two ingredients one finally obtains $\star_{\mathrm{Std}}$ from the Fedosov construction, with the big advantage that now the order of differentiation in the $C_r$ can easily be determined to be $r$ in each argument, whence $\star_{\mathrm{Std}}$ is even a natural star product. Moreover, $C_r$ differentiates the first argument only in momentum directions, which reflects the standard ordering.

Already in the flat situation the standard ordering is not an appropriate quantization scheme from the physical point of view, as it maps real-valued functions to differential operators which are not symmetric in general. To pose this question in a geometric framework, we have to specify a positive density $\mu \in \Gamma^\infty(|\Lambda^n| T^*Q)$ on the configuration space $Q$ first, as for functions there is no invariant meaning of integration. Specifying $\mu$, we can consider the pre-Hilbert space $C_0^\infty(Q)$ with inner product
\[ \langle \phi, \psi \rangle = \int_Q \bar{\phi}\, \psi\, \mu \qquad [11] \]
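In the flat case $T^*\mathbb{R}$ the standard-ordered product reduces to an explicit formula that is easy to experiment with. The sketch below assumes the normalization in which $p$ is quantized as $-\nu\,\partial/\partial q$ with $\nu = i\hbar$, so that $f \star_{\mathrm{Std}} g = \sum_r \frac{(-\nu)^r}{r!}\, \partial_p^r f\, \partial_q^r g$; the dictionary representation and all function names are illustrative assumptions. It checks the canonical commutator, associativity on sample polynomials, and homogeneity with respect to the degree counted by $\nu\,\partial_\nu + \mathcal{L}_\xi$.

```python
from fractions import Fraction
from math import factorial

# A polynomial in (q, p, nu) is a dict {(a, b, c): coeff} for q**a * p**b * nu**c.

def mono(a, b, c=0):
    return {(a, b, c): Fraction(1)}

def falling(n, r):
    """n*(n-1)*...*(n-r+1), the factor produced by differentiating x**n r times."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

def add(f, g, sign=1):
    out = dict(f)
    for k, v in g.items():
        out[k] = out.get(k, 0) + sign * v
    return {k: v for k, v in out.items() if v != 0}

def star_std(f, g):
    """Flat standard-ordered product: sum_r (-nu)^r / r! * d_p^r f * d_q^r g."""
    out = {}
    for (a, b, c), cf in f.items():
        for (a2, b2, c2), cg in g.items():
            for r in range(min(b, a2) + 1):
                coeff = (cf * cg * Fraction((-1) ** r, factorial(r))
                         * falling(b, r) * falling(a2, r))
                key = (a + a2 - r, b + b2 - r, c + c2 + r)
                out[key] = out.get(key, 0) + coeff
    return {k: v for k, v in out.items() if v != 0}

def homogeneity_degrees(f):
    """Eigenvalues of nu*d/dnu + p*d/dp occurring in f (p-degree plus nu-degree)."""
    return {b + c for (_, b, c) in f}

q, p = mono(1, 0), mono(0, 1)
commutator = add(star_std(q, p), star_std(p, q), sign=-1)   # equals nu

f = add(mono(2, 1), mono(0, 2))
g = mono(1, 2)
h = add(mono(3, 0), mono(1, 1))
left = star_std(star_std(f, g), h)
right = star_std(f, star_std(g, h))
```

Because every term of $\star_{\mathrm{Std}}$ trades one power of $p$ for one power of $\nu$, the product of homogeneous elements is again homogeneous, which is why polynomials in the fibers form a subalgebra without any convergence issue.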
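Now the adjoint with respect to [11] of $\varrho_{\mathrm{Std}}(f)$ can be computed explicitly. We first consider the second-order differential operator
\[ \Delta = \frac{\partial^2}{\partial q^k\, \partial p_k} + p_k \Gamma^k_{\ell m} \frac{\partial^2}{\partial p_\ell\, \partial p_m} + \Gamma^k_{k\ell} \frac{\partial}{\partial p_\ell} \qquad [12] \]
where $\Gamma^k_{\ell m}$ are the Christoffel symbols of $\nabla$. In fact, $\Delta$ is defined independently of the coordinates and coincides with the Laplacian of the pseudo-Riemannian metric on $T^*Q$ which is obtained from the natural pairing of vertical and horizontal spaces defined by using $\nabla$. Moreover, we need the 1-form $\alpha$ defined by $\nabla_X \mu = \alpha(X)\mu$ and the corresponding vertical vector field $\alpha^v \in \Gamma^\infty(T(T^*Q))$ locally given by $\alpha^v = \alpha_k\, \partial/\partial p_k$. Then
\[ \varrho_{\mathrm{Std}}(f)^\dagger = \varrho_{\mathrm{Std}}(N^2 \bar{f}), \qquad N = e^{-(\nu/2)(\Delta + \alpha^v)} \qquad [13] \]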
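The role of $N$ is easiest to see in the flat case $T^*\mathbb{R}$ with the Lebesgue density, where $\alpha = 0$ and $\Delta = \partial^2/\partial q\,\partial p$. The following sketch rests on assumptions not fixed by the text: the sign convention $N = \exp(-(\nu/2)\Delta)$ paired with $p \mapsto -\nu\,\partial/\partial q$, $\nu = i\hbar$, and hypothetical helper names. It verifies at the level of symbols that $N^2$ applied to a real monomial $q^a p^b$ gives the symbol of the adjoint operator, namely $p^b \star_{\mathrm{Std}} q^a$, and that conjugating $\star_{\mathrm{Std}}$ by $N$ symmetrizes the product of $q$ and $p$.

```python
from fractions import Fraction
from math import factorial

def mono(a, b, c=0):
    """Monomial q**a * p**b * nu**c as a dict {(a, b, c): coeff}."""
    return {(a, b, c): Fraction(1)}

def falling(n, r):
    out = 1
    for i in range(r):
        out *= n - i
    return out

def star_std(f, g):
    """Flat standard-ordered product: sum_r (-nu)^r / r! * d_p^r f * d_q^r g."""
    out = {}
    for (a, b, c), cf in f.items():
        for (a2, b2, c2), cg in g.items():
            for r in range(min(b, a2) + 1):
                coeff = (cf * cg * Fraction((-1) ** r, factorial(r))
                         * falling(b, r) * falling(a2, r))
                key = (a + a2 - r, b + b2 - r, c + c2 + r)
                out[key] = out.get(key, 0) + coeff
    return {k: v for k, v in out.items() if v != 0}

def exp_delta(f, s):
    """Apply exp(s * nu * d^2/(dq dp)) to f; s = -1/2 gives N, s = -1 gives N^2."""
    out = {}
    for (a, b, c), cf in f.items():
        for r in range(min(a, b) + 1):
            coeff = cf * Fraction(s) ** r / factorial(r) * falling(a, r) * falling(b, r)
            key = (a - r, b - r, c + r)
            out[key] = out.get(key, 0) + coeff
    return {k: v for k, v in out.items() if v != 0}

def star_weyl(f, g):
    """N^{-1}(N f *_Std N g), the flat Weyl-ordered product."""
    half = Fraction(1, 2)
    return exp_delta(star_std(exp_delta(f, -half), exp_delta(g, -half)), half)

q, p = mono(1, 0), mono(0, 1)
qp_weyl = star_weyl(q, p)   # q p + nu / 2
pq_weyl = star_weyl(p, q)   # q p - nu / 2
```

With $\nu = i\hbar$ the two products differ from $qp$ by $\pm i\hbar/2$, the familiar Moyal values, so the symmetric ordering is recovered from the standard one by the single operator $N$.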
Note that due to the curvature contributions, this statement is a highly nontrivial partial integration compared to the flat case. Note also that for $f \in \mathrm{Pol}^\bullet(T^*Q)[\nu]$, we have $Nf \in \mathrm{Pol}^\bullet(T^*Q)[\nu]$ as well, and $N$ commutes with $\mathsf{H}$. As in the flat case this allows one to define a Weyl-ordered quantization by
\[ \varrho_{\mathrm{Weyl}}(f) = \varrho_{\mathrm{Std}}(Nf) \qquad [14] \]
together with a so-called Weyl-ordered star product
\[ f \star_{\mathrm{Weyl}} g = N^{-1}(Nf \star_{\mathrm{Std}} Ng) \qquad [15] \]
which is now a Hermitian and homogeneous star product such that $\varrho_{\mathrm{Weyl}}$ becomes a $*$-representation of $\star_{\mathrm{Weyl}}$, that is, we have $\varrho_{\mathrm{Weyl}}(f \star_{\mathrm{Weyl}} g) = \varrho_{\mathrm{Weyl}}(f)\, \varrho_{\mathrm{Weyl}}(g)$ and $\varrho_{\mathrm{Weyl}}(f)^\dagger = \varrho_{\mathrm{Weyl}}(\bar{f})$. Note that in the flat case this is precisely the Moyal star product $\star_M$ from [1].

The star products $\star_{\mathrm{Std}}$ and $\star_{\mathrm{Weyl}}$ have been extensively studied by Bordemann, Neumaier, Pflaum, and Waldmann and now provide a well-understood quantization on cotangent bundles. We summarize a few highlights of this theory:

1. In the particular case of a Levi-Civita connection $\nabla$ for some Riemannian metric $g$ and the corresponding volume density $\mu_g$, the 1-form $\alpha$ vanishes. This simplifies the operator $N$ and describes the physically most interesting situation.
2. If the configuration space is a Lie group $G$, then its cotangent bundle $T^*G \cong G \times \mathfrak{g}^*$ is trivial by using, for example, left-invariant 1-forms. In this case the star products $\star_{\mathrm{Weyl}}$ and $\star_{\mathrm{Std}}$ restrict to the CBH star product on $\mathfrak{g}^*$. Moreover, $\star_{\mathrm{Weyl}}$ coincides with the star product found by Gutt (1983) on $T^*G$.
3. Using the operator $N$ one can interpolate between the two different ordering descriptions $\varrho_{\mathrm{Std}}$ and $\varrho_{\mathrm{Weyl}}$ by inserting an additional ordering parameter $\kappa$ in the exponent, that is, $N_\kappa = \exp(-\kappa\nu(\Delta + \alpha^v))$. Thus, one obtains $\kappa$-ordered representations $\varrho_\kappa$ together with corresponding $\kappa$-ordered star products $\star_\kappa$, where $\kappa = 0$ corresponds to standard ordering and $\kappa = 1/2$ corresponds to Weyl ordering. For $\kappa = 1$, one obtains antistandard ordering and in general one has the relation $\overline{f \star_\kappa g} = \bar{g} \star_{1-\kappa} \bar{f}$ as well as $\varrho_\kappa(f)^\dagger = \varrho_{1-\kappa}(\bar{f})$.
4. One can also describe the quantization of an electrically charged particle moving in a magnetic background field $B$. This is modeled by a closed 2-form $B \in \Gamma^\infty(\Lambda^2 T^*Q)$ on $Q$. Using local vector potentials $A \in \Gamma^\infty(T^*Q)$ with $B = dA$ locally, and by minimal coupling, one obtains a star product $\star_B$ which depends only on $B$ and not on the local potentials $A$. It will be equivalent to $\star_{\mathrm{Weyl}}$ if and only if $B$ is exact. In general, its characteristic class is, up to a factor, given by the class $[B]$ of the magnetic field $B$. While the observable
algebra always exists, a Schrödinger-like representation of $\star_B$ only exists if $B$ satisfies the usual integrality condition. In this case, there exists a representation on sections of a line bundle whose first Chern class is given by $[B]$. This manifests Dirac's quantization condition for magnetic charges in deformation quantization. Another equivalent interpretation of this result is obtained by Morita theory: the star products $\star_{\mathrm{Weyl}}$ and $\star_B$ are Morita equivalent if and only if $B$ satisfies Dirac's integrality condition.
5. Analogously, one can determine the unitary equivalence classes of representations for a fixed, exact magnetic field $B$. It turns out that the representations depend on the choice of the global vector potential $A$ and are unitarily equivalent if the difference between the two vector potentials satisfies an integrality condition known from the Aharonov–Bohm effect. This way, the Aharonov–Bohm effect can be formulated within the representation theory of deformation quantization.
6. There are several variations of the representations $\varrho_{\mathrm{Std}}$ and $\varrho_{\mathrm{Weyl}}$. In particular, one can construct a representation on half-forms instead of functions, thereby avoiding the choice of the integration density $\mu$. Moreover, all the Weyl-ordered representations can be understood as GNS representations coming from a particular positive functional, the Schrödinger functional. For $\varrho_{\mathrm{Weyl}}$ this functional is just the integration over the configuration space $Q$.
7. All the (formal) star products and their representations can be understood as coming from formal asymptotic expansions of integral formulas. From this point of view, the formal representations and star products are a particular kind of global symbol calculus.
8. At least for a projectible Lagrangian submanifold $L$ of $T^*Q$, one finds representations of the star product algebras on the functions on $L$. This leads to explicit formulas for the WKB expansion corresponding to this Lagrangian submanifold.
9. The relation between configuration space symmetries, the corresponding phase-space reduction, and the reduced star products has been analyzed extensively by Kowalzig, Neumaier, and Pflaum.

See also: Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Deformation Quantization; Deformation Quantization and Representation Theory; Deformation Theory; Fedosov Quantization; Operads.
Further Reading

Bayen F, Flato M, Frønsdal C, Lichnerowicz A, and Sternheimer D (1978) Deformation theory and quantization. Annals of Physics 111: 61–151.
Cattaneo A (Notes by Indelicato D) Formality and star products. In: Gutt S, Rawnsley J, and Sternheimer D (eds.) Poisson Geometry, Deformation Quantisation and Group Representations. LMS Lecture Note Series 323, pp. 79–144. Cambridge: Cambridge University Press.
Dito G and Sternheimer D (2002) Deformation quantization: genesis, developments and metamorphoses. In: Halbout G (ed.) Deformation Quantization, IRMA Lectures in Mathematics and Theoretical Physics, vol. 1, pp. 9–54. Berlin: Walter de Gruyter.
Gutt S (2000) Variations on deformation quantization. In: Dito G and Sternheimer D (eds.) Conférence Moshé Flato 1999. Quantization, Deformations, and Symmetries, Mathematical Physics Studies, vol. 21, pp. 217–254. Dordrecht: Kluwer Academic.
Waldmann S (2005) States and representations in deformation quantization. Reviews in Mathematical Physics 17: 15–75.
$\bar\partial$-Approach to Integrable Systems
P G Grinevich, L D Landau Institute for Theoretical Physics, Moscow, Russia
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The $\bar\partial$-approach is one of the most generic methods for constructing solutions of completely integrable systems. Taking into account that most soliton systems are represented as compatibility conditions for a set of linear differential operators (Lax pairs, zero-curvature representations, L–A–B Manakov triples), it is sufficient to construct these operators.

Such compatible families can be defined by presenting their common eigenfunctions. If it is possible to show that some analytic constraints imply that a function is a common eigenfunction of a family of operators, solutions of the original nonlinear system are also generated. The main idea of the $\bar\partial$ method is to impose the following analytic constraints: if $\lambda$ denotes the spectral parameter and $x$ the physical variables, then, for arbitrary fixed values $x$, the $\bar\partial$ derivative of the wave function is expressed as a linear combination of the wave functions at other values of $\lambda$ with $x$-independent coefficients. In specific examples, this property is either derived from the
direct spectral transform or imposed a priori. Of course, the specific realization of this scheme depends critically on the nonlinear system.

The origin of the $\bar\partial$-method came from the following observation. A solution of the one-dimensional inverse-scattering problem (the problem of reconstructing the potential from the discrete spectrum and the scattering amplitude at positive energies) for the one-dimensional time-independent Schrödinger operator
\[ L = -\partial_x^2 + u(x) \qquad [1] \]
was obtained by Gelfand, Levitan, and Marchenko in the 1950s. It essentially used analytic continuation of the wave function from the real momenta to the complex ones. If the potential $u(x)$ decays sufficiently fast as $|x| \to \infty$, then the eigenfunction equation
\[ L\Psi(k, x) = k^2 \Psi(k, x) \qquad [2] \]
has two solutions $\Psi_+(k, x)$ and $\Psi_-(k, x)$ such that

1. $\Psi_\pm(k, x) = (1 + o(1))\, e^{ikx}$ as $x \to \pm\infty$.
2. The functions $\Psi_+(k, x)$, $\Psi_-(k, x)$ are holomorphic in $k$ in the upper half-plane and the lower half-plane, respectively.

Existence of analytic continuation to complex momenta is typical for one-dimensional systems. But in the multidimensional case the situation is different. For example, wave functions for the multidimensional Schrödinger operator constructed by Faddeev are well defined for all complex momenta $k$, but they are nonholomorphic in $k$, and they become holomorphic only after restriction to some special one-dimensional subspaces. The last property was one of the key points in the Faddeev approach. Beals and Coifman (1981–82) and Ablowitz et al. (1983) discovered that departure from holomorphicity for multidimensional wave functions can be interpreted as spectral data. Such spectral transforms proved to be very natural and suit perfectly the purposes of the soliton theory. Some other famous methods, including the Riemann–Hilbert problem, can be interpreted as special reductions of the $\bar\partial$ method.
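The "departure from holomorphicity" mentioned above is measured by the derivative $\partial/\partial\bar\lambda = \tfrac{1}{2}(\partial_{\mathrm{Re}\,\lambda} + i\,\partial_{\mathrm{Im}\,\lambda})$. As a purely numerical illustration, the sketch below estimates this derivative by central differences; the test functions, the sample point, and the step size are arbitrary choices of this example and are not taken from any specific inverse-scattering problem.

```python
import cmath

def dbar(f, lam, h=1e-5):
    """Central-difference estimate of df/d(conj lambda) = (d/dx + i d/dy) f / 2."""
    dfdx = (f(lam + h) - f(lam - h)) / (2 * h)
    dfdy = (f(lam + 1j * h) - f(lam - 1j * h)) / (2 * h)
    return 0.5 * (dfdx + 1j * dfdy)

x = 0.7                    # a fixed value of the "physical" variable
lam0 = 0.3 + 0.4j          # a sample value of the spectral parameter

def holomorphic(lam):
    """exp(i*lambda*x): depends on lambda only, so its dbar derivative vanishes."""
    return cmath.exp(1j * lam * x)

def nonholomorphic(lam):
    """exp(i*(lambda + conj(lambda))*x): depends on conj(lambda) as well."""
    return cmath.exp(1j * (lam + lam.conjugate()) * x)

dbar_hol = dbar(holomorphic, lam0)
dbar_nonhol = dbar(nonholomorphic, lam0)
expected_nonhol = 1j * x * nonholomorphic(lam0)   # exact d/d(conj lambda)
```

For the second function the $\bar\partial$ derivative is a multiple of the function itself with an $x$-dependent coefficient; the constraint described in the text is stronger, since it requires the coefficients to be independent of $x$.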
2. An $n \times n$ matrix-valued function $g(\lambda, x)$ (it describes the dynamics) such that $g(\lambda, x)$ depends on the spectral parameter $\lambda \in \mathbb{C}$ and "physical" variables $x = (x_1, \ldots, x_N)$; the physical variables $x_k$ are either continuous ($x_k$ belongs to a domain in $\mathbb{R}$ or in $\mathbb{C}$) or discrete ($x_k$ takes integer values); $g(\lambda, x)$ is analytic in $\lambda$, defined for all $\lambda \in \mathbb{C}$, except for a finite number of singular points, and is single valued; and $\det g(\lambda, x)$ has only a finite number of zeros. For problems with continuous physical variables the typical form of $g(\lambda, x)$ is $g(\lambda, x) = \exp\bigl(i \sum_j x_j K_j(\lambda)\bigr)$, where $K_j(\lambda)$ are meromorphic matrices, mutually commuting for all $\lambda$. The discrete variables are usually encoded in orders of poles and zeros.
3. An $n \times n$ matrix-valued function $R(\lambda, \mu)$, the "generalized spectral data." Usually, it is a regular function of four real variables

$M^\theta$ is called the Poissonization of the measures $M_n$. The analysis of the asymptotic properties of $M_n$ and $M^\theta$ has been important in connection to the famous Ulam problem and related questions in representation theory. It was shown by Borodin and Okounkov (2000), and, independently, Johansson (2001) that $M^\theta$ is a determinantal random point field. The corresponding correlation kernel $K$ (in the so-called modified Frobenius coordinates) is a so-called discrete Bessel kernel on $\mathbb{Z}^1$,
\[ K(x, y) = \begin{cases} \sqrt{\theta}\, \dfrac{J_{|x|-1/2}(2\sqrt{\theta})\, J_{|y|+1/2}(2\sqrt{\theta}) - J_{|x|+1/2}(2\sqrt{\theta})\, J_{|y|-1/2}(2\sqrt{\theta})}{|x| - |y|} & \text{if } xy > 0 \\[2ex] \sqrt{\theta}\, \dfrac{J_{|x|-1/2}(2\sqrt{\theta})\, J_{|y|-1/2}(2\sqrt{\theta}) - J_{|x|+1/2}(2\sqrt{\theta})\, J_{|y|+1/2}(2\sqrt{\theta})}{x - y} & \text{if } xy < 0 \end{cases} \qquad [18] \]
where $J_x(\cdot)$ is the Bessel function of order $x$. One can observe that the kernel $K(x, y)$ is not Hermitian, but the restriction of this kernel to the positive and negative semiaxis is Hermitian. $M^\theta$ is a special case of an infinite parameter family of probability measures on Par, called the Schur measures, and defined as
\[ M(\lambda) = \frac{1}{Z}\, s_\lambda(x)\, s_\lambda(y) \qquad [19] \]
where $s_\lambda$ are the Schur functions, $x = (x_1, x_2, \ldots)$ and $y = (y_1, y_2, \ldots)$ are parameters such that
\[ Z = \sum_{\lambda \in \mathrm{Par}} s_\lambda(x)\, s_\lambda(y) = \prod_{i,j} (1 - x_i y_j)^{-1} \qquad [20] \]
is finite and $\{x_i\}_{i=1}^\infty = \overline{\{y_i\}}_{i=1}^\infty$. It was shown by Okounkov (2001) that the Schur measures belong to the class of the determinantal random point fields.

Non-Intersecting Paths of a Markov Process

Let $p_{t,s}(x, y)$ be the transition probability of a Markov process $\xi(t)$ on $\mathbb{R}$ with continuous trajectories and let $(\xi_1(t), \xi_2(t), \ldots, \xi_n(t))$ be $n$ independent copies of the process. A classical result of Karlin and McGregor (1959) states that if $n$ particles start at the positions $x_1^{(0)} < x_2^{(0)} < \cdots < x_n^{(0)}$, then the probability density of their joint distribution at time $t_1 > 0$, given that their paths have not intersected for all $0 \le t \le t_1$, is equal to
\[ \pi_{t_1}(x_1^{(1)}, \ldots, x_n^{(1)}) = \frac{1}{Z} \det\bigl(p_{0,t_1}(x_i^{(0)}, x_j^{(1)})\bigr)_{i,j=1}^n \]
provided the process $(\xi_1(t), \xi_2(t), \ldots, \xi_n(t))$ in $\mathbb{R}^n$ has a strong Markovian property.

Let $0 < t_1 < t_2 < \cdots < t_{M+1}$. The conditional probability density that the particles are in the positions $x_1^{(1)} < x_2^{(1)} < \cdots < x_n^{(1)}$ at time $t_1$, at the positions $x_1^{(2)} < x_2^{(2)} < \cdots < x_n^{(2)}$ at time $t_2$, ..., at the positions $x_1^{(M)} < x_2^{(M)} < \cdots < x_n^{(M)}$ at time $t_M$, given that at time $t_{M+1}$ they are at the positions $x_1^{(M+1)} < x_2^{(M+1)} < \cdots < x_n^{(M+1)}$ and their paths have not intersected, is then equal to
\[ \pi_{t_1, t_2, \ldots, t_M}(x_1^{(1)}, \ldots, x_n^{(M)}) = \frac{1}{Z_{n,M}} \prod_{l=0}^{M} \det\bigl(p_{t_l, t_{l+1}}(x_i^{(l)}, x_j^{(l+1)})\bigr)_{i,j=1}^n \qquad [21] \]
where $t_0 = 0$. It is not difficult to show that [21] can be viewed as a determinantal random point process (see, e.g., Johansson (2003)). Formulas of a similar type also appeared in the papers by Johansson, Prähofer, Spohn, Ferrari, Forrester, Nagao, Katori, and Tanemura in the analysis of polynuclear growth models, random walks on a discrete circle, and related problems.
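The determinantal structure behind the Karlin–McGregor formula can be tested by brute force in a discrete-time analogue (the Lindström–Gessel–Viennot setting), which is a substitute for, not an instance of, the continuous-trajectory case in the text. The sketch below assumes two simple $\pm 1$ random walkers on $\mathbb{Z}$ that step simultaneously and start at sites of the same parity, so that they cannot exchange order without meeting; all function names are illustrative.

```python
from itertools import product
from math import comb

def walks(a, b, t):
    """Number of t-step walks on Z with steps +1/-1 from a to b."""
    ups, rem = divmod(t + b - a, 2)
    return comb(t, ups) if rem == 0 and 0 <= ups <= t else 0

def km_determinant(starts, ends, t):
    """2x2 determinant of one-particle path counts."""
    (a1, a2), (b1, b2) = starts, ends
    return walks(a1, b1, t) * walks(a2, b2, t) - walks(a1, b2, t) * walks(a2, b1, t)

def brute_force(starts, ends, t):
    """Count pairs of walks that stay strictly ordered at every time 0..t."""
    count = 0
    for steps1 in product((-1, 1), repeat=t):
        for steps2 in product((-1, 1), repeat=t):
            x1, x2 = starts
            ordered = x1 < x2
            for d1, d2 in zip(steps1, steps2):
                x1, x2 = x1 + d1, x2 + d2
                ordered = ordered and x1 < x2
            if ordered and (x1, x2) == ends:
                count += 1
    return count

starts = (0, 2)
two_step_count = brute_force(starts, (0, 2), 2)
```

Dividing these counts by $4^t$ turns them into probabilities, and the determinant of one-particle counts plays the role of $\det(p_{0,t_1}(x_i^{(0)}, x_j^{(1)}))$ before normalization by $Z$.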
Ergodic Properties

As before, let $(X, \mathcal{B}, P)$ be a random point field with a one-particle space $E$. Hence, $X$ is a space of the locally finite configurations of particles in $E$, $\mathcal{B}$ a Borel $\sigma$-algebra of measurable subsets of $X$, and $P$ a probability measure on $(X, \mathcal{B})$. Throughout this section, we assume $E = \mathbb{R}^d$ or $\mathbb{Z}^d$. We define an action $\{T^t\}_{t \in E}$ of the additive group $E$ on $X$ in the following natural way:
\[ T^t : X \to X, \qquad (T^t \xi)_i = (\xi)_i + t \qquad [22] \]
We recall that a random point field $(X, \mathcal{B}, P)$ is called translation invariant if, for any $A \in \mathcal{B}$ and any $t \in E$, $P(T^t A) = P(A)$. The translation invariance of the correlation kernel $K(x, y) = K(x - y, 0) =: K(x - y)$ implies the translation invariance of $k$-point correlation functions
\[ \rho_k(x_1 + t, \ldots, x_k + t) = \rho_k(x_1, \ldots, x_k) \quad \text{a.e.}, \qquad k = 1, 2, \ldots, \ t \in E \qquad [23] \]
which, in turn, implies the translation invariance of the random point field. The ergodic properties
of such point fields were studied by several mathematicians (Soshnikov 2000, Shirai and Takahashi 2003, Lyons and Steif 2003). The first general result in this direction was obtained by Soshnikov (2000).

Theorem 2 Let $(X, \mathcal{B}, P)$ be a determinantal random point field with a translation-invariant correlation kernel. Then the dynamical system $(X, \mathcal{B}, P, \{T^t\})$ is ergodic, has the mixing property of any multiplicity, and its spectrum is absolutely continuous.

We refer the reader to the article on ergodic theory for the definitions of ergodicity, mixing property, absolutely continuous spectrum of the dynamical system, etc. In the discrete case [15], $E = \mathbb{Z}^d$, more is known. Lyons and Steif (2003) proved that the shift dynamical system is Bernoulli, that is, it is isomorphic (in the ergodic theory sense) to an i.i.d. process. Under the additional conditions $\mathrm{Spec}(K) \subset (0, 1)$ and $\sum_n |n|\, |K(n)|^2 < \infty$, Shirai and Takahashi (2003a) proved the uniform mixing property.
Gibbsian Properties

Costin and Lebowitz (1995) were the first to question the Gibbsian nature of the determinantal random point fields; they studied the continuous determinantal random point process on $\mathbb{R}^1$ with a so-called sine correlation kernel
\[ K(x, y) = \frac{\sin(\pi(x - y))}{\pi(x - y)} \]
The first rigorous result (in the discrete case) was established by Shirai and Takahashi (2003b).

Theorem 3 Let $E$ be a countable discrete space and $K$ a symmetric bounded operator on $l^2(E)$. Assume that $\mathrm{Spec}(K) \subset (0, 1)$. Then $(X, \mathcal{B}, P)$ is a Gibbs measure with the potential $U$ given by $U(x|\xi) = -\log\bigl(J(x, x) - \langle J_\xi^{-1} j_\xi^x, j_\xi^x \rangle\bigr)$, where $x \in E$, $\xi \in X$, $\{x\} \cap \xi = \emptyset$. Here $J(x, y)$ stands for the kernel of the operator $J = (\mathrm{Id} - K)^{-1} K$, and we set $J_\xi = (J(y, z))_{y,z \in \xi}$ and $j_\xi^x = (J(x, y))_{y \in \xi}$.

We recall that the Gibbsian property of the probability measure $P$ on $(X, \mathcal{B})$ means that
\[ E[F | \mathcal{B}_{\Lambda^c}](\xi) = \frac{1}{Z_{\Lambda,\xi}} \sum_{\eta \subset \Lambda} e^{-U(\eta | \xi_{\Lambda^c})}\, F(\eta \cup \xi_{\Lambda^c}) \]
where $\Lambda$ is a finite subset of $E$, $\mathcal{B}_{\Lambda^c}$ is the $\sigma$-algebra generated by the $\mathcal{B}$-measurable functions with the support outside of $\Lambda$, and $E[F | \mathcal{B}_{\Lambda^c}]$ is the conditional mathematical expectation of the integrable function $F$ on $(X, \mathcal{B}, P)$ with respect to the $\sigma$-algebra $\mathcal{B}_{\Lambda^c}$. The potential $U$ is uniquely defined by the values of $U(x|\xi)$, as follows from the recursive relation
\[ U(\{x_1, \ldots, x_n\} | \xi) = U(x_n | \{x_1, \ldots, x_{n-1}\} \cup \xi) + U(x_{n-1} | \{x_1, \ldots, x_{n-2}\} \cup \xi) + \cdots + U(x_1 | \xi) \]
For additional information about the Gibbsian property, see Introductory Articles: Equilibrium Statistical Mechanics. Much less is known in the continuous case. Some generalized form of Gibbsianness, under quite restrictive conditions, was recently established by Georgii and Yoo (2004).

Central Limit Theorem for Counting Function

In this section, we discuss the central-limit theorem type results for the linear statistics. The first important result in this direction was established by Costin and Lebowitz in 1995, who proved the central-limit theorem for the number of particles in the growing box, $\#_{[-L, L]}$, $L \to \infty$, in the case of the determinantal random point process on $\mathbb{R}^1$ with the sine correlation kernel
\[ K(x, y) = \frac{\sin(\pi(x - y))}{\pi(x - y)} \]
Below we formulate the Costin–Lebowitz theorem in its general form due to Soshnikov (1999, 2000).

Theorem 4 Let $E$ be a one-particle space, $\{0 \le K_t \le 1\}$ a family of locally trace-class operators in $L^2(E)$, $\{(X, \mathcal{B}, P_t)\}$ a family of the corresponding determinantal random point fields in $E$, and $\{I_t\}$ a family of measurable subsets in $E$ such that
\[ \mathrm{Var}\, \#_{I_t} = \mathrm{Tr}\bigl(K_t \chi_{I_t} - (K_t \chi_{I_t})^2\bigr) \to \infty \quad \text{as } t \to \infty \qquad [24] \]
Then the distribution of the normalized number of particles in $I_t$ (with respect to $P_t$) converges to the normal law, that is,
\[ \frac{\#_{I_t} - E\, \#_{I_t}}{\sqrt{\mathrm{Var}\, \#_{I_t}}} \xrightarrow{\ w\ } N(0, 1) \]
An analogous result holds for the joint distribution of the counting functions $\{\#_{I_t^1}, \ldots, \#_{I_t^k}\}$, where $I_t^1, \ldots, I_t^k$ are disjoint measurable subsets in $E$.
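The variance formula in [24] can be checked exactly on a very small discrete example. The sketch below assumes a three-point space $E$, a hand-picked symmetric kernel with spectrum inside $(0, 1)$, and $I = E$; it recovers the probability of each configuration from the correlation functions $\det(K(x_i, x_j))$ by inclusion–exclusion and compares the variance of the particle number with $\mathrm{Tr}(K - K^2)$. The kernel and all helper names are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def minor_det(K, S):
    """det(K(x, y)) for x, y in S: probability that all points of S are occupied."""
    return det([[K[i][j] for j in S] for i in S])

def configuration_probabilities(K):
    """P(configuration == S) = sum over T containing S of (-1)^{|T minus S|} det K_T."""
    points = range(len(K))
    probs = {}
    for r in range(len(K) + 1):
        for S in combinations(points, r):
            rest = [x for x in points if x not in S]
            p = Fraction(0)
            for m in range(len(rest) + 1):
                for extra in combinations(rest, m):
                    T = tuple(sorted(S + extra))
                    p += (-1) ** m * minor_det(K, T)
            probs[S] = p
    return probs

h, qtr, z = Fraction(1, 2), Fraction(1, 4), Fraction(0)
K = [[h, qtr, z],
     [qtr, h, qtr],
     [z, qtr, h]]          # eigenvalues 1/2 and 1/2 +- sqrt(2)/4, all in (0, 1)

probs = configuration_probabilities(K)
mean = sum(len(S) * p for S, p in probs.items())
variance = sum(len(S) ** 2 * p for S, p in probs.items()) - mean ** 2
trace_formula = sum(K[i][i] - sum(K[i][j] * K[j][i] for j in range(3))
                    for i in range(3))
```

Exact rational arithmetic is used so that the comparison with $\mathrm{Tr}(K - K^2)$ is an identity rather than a floating-point approximation.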
The proof of the Costin–Lebowitz theorem uses the $k$-point cluster functions. In the determinantal case, the cluster functions have a simple form
\[ r_k(x_1, \ldots, x_k) = (-1)^{k-1}\, \frac{1}{k} \sum_{\sigma \in S_k} K(x_{\sigma(1)}, x_{\sigma(2)})\, K(x_{\sigma(2)}, x_{\sigma(3)}) \cdots K(x_{\sigma(k)}, x_{\sigma(1)}) \qquad [25] \]
The importance of the cluster function stems from the fact that the integrals of the $k$-point cluster function over the $k$-cube with a side $I$ can be expressed as a linear combination of the first $k$ cumulants of the counting random variable $\#_I$. In other words,
\[ \int_{I \times \cdots \times I} r_k(x_1, \ldots, x_k)\, dx_1 \cdots dx_k = \sum_{l=1}^{k} \beta_{kl}\, C_l(\#_I) \qquad [26] \]
It follows from [25] that the integral on the LHS of [26] equals, up to a factor $(-1)^{k-1} (k - 1)!$, the trace of the $k$th power of the restriction of $K$ to $I$. This allows one to estimate the cumulants of the counting random variable $\#_I$. For details, we refer the reader to Soshnikov (2000). The central-limit theorem for a general class of linear statistics, under some technical assumptions on the correlation kernel, was proved in Soshnikov (2002). Finally, we refer the reader to Soshnikov (2000) for the functional central-limit theorem for the empirical distribution function of the nearest spacings.

Generalizations: Immanantal and Pfaffian Point Processes

In this section, we discuss two important generalizations of the determinantal point processes.

Immanantal Processes

Immanantal random point processes were introduced by P Diaconis and S N Evans in 2000. Let $\lambda$ be a partition of $n$. Denote by $\chi^\lambda$ the character of the corresponding irreducible representation of the symmetric group $S_n$. Let $K(x, y)$ be a non-negative-definite, Hermitian kernel. An immanantal random point process is defined through the correlation functions
\[ \rho_n(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \chi^\lambda(\sigma) \prod_{i=1}^{n} K(x_i, x_{\sigma(i)}) \qquad [27] \]
In other words, the correlation functions are given by the immanants of the matrix with the entries $K(x_i, x_j)$. We will denote the RHS of [27] by $K_\lambda[x_1, \ldots, x_n]$.

In the special case $\lambda = (1^n)$ (i.e., $\lambda$ consists of $n$ parts, all of which equal 1), one obtains that $\chi^\lambda(\sigma) = (-1)^\sigma$, and $K_\lambda[x_1, \ldots, x_n] = \det(K(x_i, x_j))$. Therefore, in the case $\lambda = (1^n)$ the random point process with the correlation functions [27] is a determinantal random point process. When $\lambda = (n)$ (i.e., the partition has only one part, namely $n$) we have $\chi^\lambda = 1$ identically, and $K_\lambda[x_1, \ldots, x_n] = \mathrm{per}(K(x_i, x_j))$, the permanent of the matrix $K(x_i, x_j)$. The corresponding random point process is known as the boson random point process.

Pfaffian Processes

Let
\[ K(x, y) = \begin{pmatrix} K_{11}(x, y) & K_{12}(x, y) \\ K_{21}(x, y) & K_{22}(x, y) \end{pmatrix} \]
be an antisymmetric $2 \times 2$ matrix-valued kernel, that is, $K_{ij}(x, y) = -K_{ji}(y, x)$, $i, j = 1, 2$. The kernel defines an integral operator acting on $L^2(E) \oplus L^2(E)$, which we assume to be locally trace class. A random point process on $E$ is called Pfaffian if its point correlation functions have a Pfaffian form
\[ \rho_k(x_1, \ldots, x_k) = \mathrm{pf}\bigl(K(x_i, x_j)\bigr)_{i,j=1,\ldots,k}, \qquad k \ge 1 \qquad [28] \]
The RHS of [28] is the Pfaffian of the $2k \times 2k$ antisymmetric matrix (since each entry $K(x_i, x_j)$ is a $2 \times 2$ block). Determinantal random point processes are a special case of the Pfaffian processes, corresponding to the matrix kernel of the form
\[ K(x, y) = \begin{pmatrix} 0 & \tilde{K}(x, y) \\ -\tilde{K}(y, x) & 0 \end{pmatrix} \]
where $\tilde{K}$ is a scalar kernel. The most well-known examples of the Pfaffian random point processes that cannot be reduced to determinantal form are $\beta = 1$ and $\beta = 4$ polynomial ensembles of random matrices and their limits (in the bulk and at the edge of the spectrum), as the size of a matrix goes to infinity.
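Both generalizations can be tried out on a small matrix of kernel values. The following sketch uses an arbitrary symmetric $3 \times 3$ integer matrix standing in for $(K(x_i, x_j))$ (an assumption of this example, as are all helper names). It evaluates the immanants for the sign character and the trivial character, which give the determinant and the permanent, and checks that the Pfaffian of the $6 \times 6$ matrix built from blocks $\bigl(\begin{smallmatrix} 0 & \tilde{K}(x_i, x_j) \\ -\tilde{K}(x_j, x_i) & 0 \end{smallmatrix}\bigr)$ reproduces the determinant.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of images of 0..n-1."""
    s, seen = 1, set()
    for i in range(len(perm)):
        j, length = i, 0
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        if length and length % 2 == 0:   # even-length cycles are odd permutations
            s = -s
    return s

def immanant(K, character):
    """sum over permutations of character(perm) * prod_i K[i][perm(i)]."""
    n = len(K)
    total = 0
    for perm in permutations(range(n)):
        term = character(perm)
        for i in range(n):
            term *= K[i][perm[i]]
        total += term
    return total

def pfaffian(A):
    """Pfaffian of an even-size antisymmetric matrix, expanded along the first row."""
    n = len(A)
    if n == 0:
        return 1
    total = 0
    for j in range(1, n):
        keep = [k for k in range(1, n) if k != j]
        minor = [[A[r][c] for c in keep] for r in keep]
        total += (-1) ** (j + 1) * A[0][j] * pfaffian(minor)
    return total

def block_kernel_matrix(Kt):
    """2k x 2k matrix with 2x2 blocks [[0, Kt[i][j]], [-Kt[j][i], 0]]."""
    k = len(Kt)
    A = [[0] * (2 * k) for _ in range(2 * k)]
    for i in range(k):
        for j in range(k):
            A[2 * i][2 * j + 1] = Kt[i][j]
            A[2 * i + 1][2 * j] = -Kt[j][i]
    return A

K = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]

determinant = immanant(K, sign)                  # partition (1^n)
permanent = immanant(K, lambda perm: 1)          # partition (n)
pfaffian_value = pfaffian(block_kernel_matrix(K))
```

The sum over all $n!$ permutations is only practical for very small $n$; it is used here because it mirrors formula [27] term by term.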
Acknowledgment The research of A Soshnikov was supported in part by the NSF grant DMS-0405864. See also: Dimer Problems; Ergodic Theory; Growth Processes in Random Matrix Theory; Integrable Systems in Random Matrix Theory; Percolation Theory; Quantum Ergodicity and Mixing of Eigenfunctions; Random Matrix Theory in Physics; Random Partitions; Statistical Mechanics and Combinatorial Problems; Symmetry Classes in Random Matrix Theory.
Further Reading

Borodin A and Olshanski G (2000) Distribution on partitions, point processes, and the hypergeometric kernel. Communications in Mathematical Physics 211: 335–358.
Daley DJ and Vere-Jones D (1988) An Introduction to the Theory of Point Processes. New York: Springer.
Diaconis P and Evans SN (2000) Immanants and finite point processes. Journal of Combinatorial Theory Series A 91(1–2): 305–321.
Georgii H-O and Yoo HJ (2005) Conditional intensity and Gibbsianness of determinantal point processes. Journal of Statistical Physics 118(91/92): 55–84.
Johansson K (2003) Discrete polynuclear growth and determinantal processes. Communications in Mathematical Physics 242(1–2): 277–329.
Lyons R (2003) Determinantal probability measures. Publications Mathématiques Institut de Hautes Études Scientifiques 98: 167–212.
Lyons R and Steif J (2003) Stationary determinantal processes: phase multiplicity, Bernoullicity, entropy, and domination. Duke Mathematical Journal 120(3): 515–575.
Macchi O (1975) The coincidence approach to stochastic point processes. Advances in Applied Probability 7: 83–122.
Okounkov A (2001) Infinite wedge and random partitions. Selecta Mathematica. New Series 7(1): 57–81.
Shirai T and Takahashi Y (2003a) Random point fields associated with certain Fredholm determinants, I. Fermion, Poisson and boson point processes. Journal of Functional Analysis 205(2): 414–463.
Shirai T and Takahashi Y (2003b) Random point fields associated with certain Fredholm determinants, II. Fermion shifts and their ergodic and Gibbs properties. The Annals of Probability 31(3): 1533–1564.
Soshnikov A (2000) Determinantal random point fields. Russian Mathematical Surveys 55: 923–975.
Soshnikov A (2002) Gaussian limit for determinantal random point fields. The Annals of Probability 30(1): 171–187.
Soshnikov A (2004) Janossy densities of coupled random matrices. Communications in Mathematical Physics 251: 447–471.
Spohn H (1987) Interacting Brownian particles: a study of Dyson's model. In: Papanicolaou G (ed.) Hydrodynamic Behavior and Interacting Particle Systems, IMA Vol. Math. Appl. vol. 9, pp. 157–179. New York: Springer.
Tracy CA and Widom H (1998) Correlation functions, cluster functions, and spacing distributions for random matrices. Journal of Statistical Physics 92(5/6): 809–835.
Diagrammatic Techniques in Perturbation Theory G Gentile, Universita` degli Studi ‘‘Roma Tre’’, Rome, Italy ª 2006 Elsevier Ltd. All rights reserved.
Introduction

Consider the dynamical system on $\mathbb{R}^d$ described by the equation

$$\dot u = \frac{du}{dt} = G(u) + \varepsilon F(u) \qquad [1]$$

where $F, G : S \subset \mathbb{R}^d \to \mathbb{R}^d$ are analytic functions and $\varepsilon$ is a real (small) parameter. Suppose also that for $\varepsilon = 0$ a solution $u_0 : \mathbb{R} \to S$ (for some initial condition $u_0(0) = \bar u$) is known. We look for a solution of [1] which is a perturbation of $u_0$, that is, for a solution $u$ which can be written in the form $u = u_0 + U$, with $U = O(\varepsilon)$ and $U(0) = \bar U \equiv u(0) - \bar u$. Then we consider the variational equation

$$\dot U = M(t)\,U + \Phi(t), \qquad M_{ij}(t) = \partial_{u_i} G_j(u_0(t)) \qquad [2]$$

where $\Phi(t) = \tilde\Phi(u_0(t), U)$, with $\tilde\Phi(u_0, U) = G(u_0 + U) - G(u_0) - \partial_u G(u_0)\,U + \varepsilon F(u_0 + U)$. By defining the Wronskian matrix $W$ as the solution of the matrix equation $\dot W = M(t)\,W$ such that $W(0) = 1$ (the columns of $W$ are given by $d$ independent solutions of the linear equation $\dot u = M(t)\,u$), we can write

$$U(t) = W(t)\,\bar U + W(t) \int_0^t d\tau\, W^{-1}(\tau)\,\Phi(\tau) \qquad [3]$$

If we expect the solution $U$ to be of order $\varepsilon$, we can try to write it as a Taylor series in $\varepsilon$, that is,

$$U(t) = \sum_{k=1}^{\infty} \varepsilon^k U^{(k)}(t) \qquad [4]$$

and, by inserting [4] into [3] and equating the coefficients with the same Taylor order, we obtain

$$U^{(k)}(t) = W(t)\,\bar U^{(k)} + W(t) \int_0^t d\tau\, W^{-1}(\tau)\,\Phi^{(k)}(\tau) \qquad [5]$$

where $\Phi^{(k)}(t)$ is defined as $\Phi^{(1)}(t) = F(u_0(t))$ and, for $k \ge 2$,

$$\Phi^{(k)}(t) = \sum_{p=2}^{\infty} \frac{1}{p!} \frac{\partial^p G}{\partial u^p}(u_0(t)) \sum_{k_1+\dots+k_p=k} U^{(k_1)} \cdots U^{(k_p)} + \sum_{p=1}^{\infty} \frac{1}{p!} \frac{\partial^p F}{\partial u^p}(u_0(t)) \sum_{k_1+\dots+k_p=k-1} U^{(k_1)} \cdots U^{(k_p)} \qquad [6]$$
Hence $\Phi^{(k)}(t)$ depends only on coefficients of orders strictly less than $k$. In this way, we obtain an algorithm useful for constructing the solution recursively, so that the problem is solved, up to (substantial) convergence problems.
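As a minimal sketch of the recursion [5]–[6], consider the scalar toy problem $\dot u = \varepsilon u^2$. This example is an illustrative assumption, not taken from the text: it corresponds to $G = 0$, $F(u) = u^2$, $u_0(t) = c$, so that $M = 0$ and $W = 1$, and we take $\bar U^{(k)} = 0$. The helper names are hypothetical. Each $U^{(k)}$ is a polynomial in $t$, stored as a list of exact coefficients.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials in t given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two polynomials, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_integrate(p):
    """Integral from 0 to t of a polynomial in t."""
    return [Fraction(0)] + [a / (i + 1) for i, a in enumerate(p)]

def taylor_coefficients(c, max_order):
    """U[k] for du/dt = eps * u**2 around u0(t) = c, following [5]-[6]."""
    c = Fraction(c)
    U = {}
    for k in range(1, max_order + 1):
        if k == 1:
            phi = [c * c]                         # Phi^(1) = F(u0)
        else:
            phi = [2 * c * a for a in U[k - 1]]   # p = 1 term: F'(u0) U^(k-1)
            for k1 in range(1, k - 1):            # p = 2 term: (1/2!) F''(u0) U^(k1) U^(k2)
                phi = poly_add(phi, poly_mul(U[k1], U[k - 1 - k1]))
        U[k] = poly_integrate(phi)                # [5] with W = 1 and U-bar^(k) = 0
    return U
```

In this toy case the recursion gives $U^{(k)}(t) = c^{k+1} t^k$, which agrees with the expansion of the exact solution $c/(1 - \varepsilon c t)$. The coefficients grow like $t^k$, so the series converges only for $|\varepsilon c t| < 1$.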
Historical Excursus

The study of a system like [1] by following the strategy outlined above can be hopeless if we do not make some further assumptions on the types of motions we are looking for. We shall see later, in a concrete example, that the coefficients $U^{(k)}(t)$ can increase in time, in a $k$-dependent way, thus preventing the convergence of the series for large $t$. This is a general feature of this class of problems: if no care is taken in the choice of the initial datum, the algorithm can provide a reliable description of the dynamics only for a very short time.

However, if one looks for solutions having a special dependence on time, things can work better. This happens, for instance, if one looks for quasiperiodic solutions, that is, functions which depend on time through the variable $\psi = \omega t$, with $\omega \in \mathbb{R}^N$ a vector with rationally independent components, that is, such that $\omega \cdot \nu \neq 0$ for all $\nu \in \mathbb{Z}^N \setminus \{0\}$ (the dot denotes the standard inner product, $\omega \cdot \nu = \omega_1 \nu_1 + \dots + \omega_N \nu_N$). A typical problem of interest is: what happens to a quasiperiodic solution $u_0(t)$ when a perturbation $\varepsilon F$ is added to the unperturbed vector field $G$, as in [1]? Situations of this type arise when considering perturbations of integrable systems: a classical example is provided by planetary motion in celestial mechanics.

Perturbation series such as [4] have been extensively studied by astronomers in order to obtain a more accurate description of the celestial motions compared to that following from Kepler's theory (in which all interactions between planets are neglected and the planets themselves are considered as points). In particular, we recall the works of Newcomb and Lindstedt (series such as [4] are now known as Lindstedt series). At the end of the nineteenth century, Poincaré showed that the series describing quasiperiodic motions are well defined up to any perturbation order $k$ (at least if the perturbation is a trigonometric polynomial), provided that the components of $\omega$ are assumed to be rationally independent: this means that, under this condition, the coefficients $U^{(k)}(t)$ are defined for all $k \in \mathbb{N}$. However, Poincaré also showed that, in general, the series are divergent; this is due to the fact that, as seen later, small divisors $\omega \cdot \nu$ appear in the perturbation series, and these, even if they do not vanish, can be arbitrarily close to zero.

The convergence of the series can indeed be proved (more generally for analytic perturbations, or even those that are differentiably smooth enough) by assuming on $\omega$ a stronger nonresonance condition, such as the Diophantine condition

$$|\omega \cdot \nu| > \frac{C_0}{|\nu|^{\tau}} \qquad \forall \nu \in \mathbb{Z}^N \setminus \{0\} \qquad [7]$$

where $|\nu| = |\nu_1| + \dots + |\nu_N|$, and $C_0$ and $\tau$ are positive constants. We note that the set of vectors satisfying [7] for some positive constant $C_0$ has full measure in $\mathbb{R}^N$ provided one takes $\tau > N - 1$. Such a result is part of the Kolmogorov–Arnold–Moser (KAM) theorem, and it was first proved by Kolmogorov in 1954, following an approach quite different from the one described here. New proofs were given in 1962 by Arnol'd and by Moser, but only very recently, in 1988, Eliasson gave a proof in which a bound $C^k$ is explicitly derived for the coefficients $U^{(k)}(t)$, again implying convergence for $\varepsilon$ small enough. Eliasson's work was not immediately known widely, and only after publication of papers by Gallavotti and by Chierchia and Falcolini, in which Eliasson's ideas were revisited, did his work become fully appreciated.

The study of perturbation series [4] employs techniques very similar to those typical of a very different field of mathematical physics, quantum field theory, even if such an analogy was stressed and used to full extent only in subsequent papers. The techniques have so far been applied to a wide class of problems of dynamical systems: a list of original results is given at the end.
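The Diophantine condition [7] can be probed numerically. The sketch below is an illustration only: the function name is hypothetical, the search is truncated at a finite $|\nu|$, and floating-point arithmetic is used, so it can suggest but never prove that a vector is Diophantine. It computes the smallest value of $|\omega \cdot \nu|\,|\nu|^{\tau}$ over the nonzero integer vectors in the search window, which is the largest admissible $C_0$ on that window.

```python
from itertools import product
from math import sqrt

def diophantine_constant(omega, tau, max_norm):
    """Smallest |omega.nu| * |nu|**tau over nonzero integer nu with |nu| <= max_norm."""
    n = len(omega)
    best = float("inf")
    for nu in product(range(-max_norm, max_norm + 1), repeat=n):
        norm = sum(abs(x) for x in nu)          # |nu| = |nu_1| + ... + |nu_N|
        if norm == 0 or norm > max_norm:
            continue
        divisor = abs(sum(w * x for w, x in zip(omega, nu)))
        best = min(best, divisor * norm ** tau)
    return best

# Golden-mean rotation vector (N = 2, tau = 1) versus a resonant vector.
golden = (1.0, (sqrt(5) - 1) / 2)
resonant = (1.0, 0.5)                           # omega.nu = 0 for nu = (1, -2)
```

For the golden-mean vector the minimum stays bounded away from zero as the window grows, whereas for the resonant vector it vanishes as soon as the window contains $\nu = (1, -2)$.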
A Paradigmatic Example

Consider the case $S = \mathcal{A} \times \mathbb{T}^N$, with $\mathcal{A}$ an open subset of $\mathbb{R}^N$, and let $H_0 : \mathcal{A} \to \mathbb{R}$ and $f : \mathcal{A} \times \mathbb{T}^N \to \mathbb{R}$ be two analytic functions. Then consider the Hamiltonian system with Hamiltonian $H(A, \alpha) = H_0(A) + \varepsilon f(A, \alpha)$. The corresponding equations describe a dynamical system of the form [1], with $u = (A, \alpha)$, which can be written explicitly:

$$\begin{cases} \dot A = -\varepsilon\, \partial_\alpha f(A, \alpha) \\ \dot\alpha = \partial_A H_0(A) + \varepsilon\, \partial_A f(A, \alpha) \end{cases} \qquad [8]$$

Suppose, for simplicity, $H_0(A) = A^2/2$ and $f(A, \alpha) = f(\alpha)$, where $A^2 = A \cdot A$. Then, we obtain for $\alpha$ the following closed equation:

$$\ddot\alpha = -\varepsilon\, \partial_\alpha f(\alpha) \qquad [9]$$

while $A$ can be obtained by direct integration once [9] has been solved. For $\varepsilon = 0$, [9] gives trivially $\alpha = \alpha_0(t) \equiv \alpha_0 + \omega t$, where $\omega = \partial_A H_0(A_0) = A_0$ is called the rotation (or frequency) vector. Hence, for $\varepsilon = 0$ all solutions are quasiperiodic. We are interested in the preservation of quasiperiodic solutions when $\varepsilon \neq 0$. For $\varepsilon \neq 0$, we can write, as in [3],

$$\alpha = \alpha_0(t) + a(t), \qquad a(t) = \sum_{k=1}^{\infty} \varepsilon^k a^{(k)}(t) \qquad [10]$$

where $a^{(k)}$ is determined as the solution of the equation

$$a^{(k)}(t) = \bar a^{(k)} + t\,\bar A^{(k)} - \int_0^t d\tau \int_0^{\tau} d\tau'\, \left[\partial_\alpha f(\alpha(\tau'))\right]^{(k-1)} \qquad [11]$$

with $[\partial_\alpha f(\alpha(\tau'))]^{(k-1)}$ expressed as in [6]. The quasiperiodic solutions with rotation vector $\omega$ could be written as a Fourier series, by expanding

$$a^{(k)}(t) = \sum_{\nu \in \mathbb{Z}^N} e^{i \nu \cdot \omega t}\, a^{(k)}_{\nu} \qquad [12]$$

with $\omega$ as before. If the series [10], with the Taylor coefficients as in [12], exists, it will describe a quasiperiodic solution analytic in $\varepsilon$, and in such a case we say that it is obtained by continuation of the unperturbed one with rotation vector $\omega$, that is, $\alpha_0(t)$.

Suppose that the integrand $[\partial_\alpha f(\alpha(\tau'))]^{(k-1)}$ in [11] has vanishing average. Then the integral over $\tau'$ in [11] produces a quasiperiodic function, which in general has a nonvanishing average, so that the integral over $\tau$ produces a quasiperiodic function plus a term linear in $t$. If we choose $\bar A^{(k)}$ in [11] so as to cancel out exactly the term linear in time, we end up with a quasiperiodic function. In Fourier space, an explicit calculation gives, for all $\nu \neq 0$,

$$a^{(1)}_{\nu} = \frac{1}{(\omega \cdot \nu)^2}\,(i\nu) f_{\nu}, \qquad a^{(k)}_{\nu} = \frac{1}{(\omega \cdot \nu)^2} \sum_{p=1}^{\infty} \sum_{\substack{k_1+\dots+k_p=k-1 \\ \nu_0+\nu_1+\dots+\nu_p=\nu}} \frac{(i\nu_0)^{p+1}}{p!}\, f_{\nu_0}\, a^{(k_1)}_{\nu_1} \cdots a^{(k_p)}_{\nu_p}, \quad k \ge 2 \qquad [13]$$

which again is suitable for an iterative construction of the solution. The coefficients $a^{(k)}_0$ are left undetermined, and we can fix them (arbitrarily) as identically vanishing.

Of course, the property that the integrand in [11] has zero average is fundamental; otherwise, terms increasing as powers of $t$ would appear (the so-called secular terms). Indeed, it is easy to realize that, if this happened, to order $k$ terms proportional to $t^{2k}$ could be present, thus requiring, at best, $|\varepsilon| < |t|^{-2}$ for convergence up to time $t$. This would exclude a fortiori the possibility of quasiperiodic solutions.

The aforementioned property of zero average can be verified only if the rotation vector is nonresonant, that is, if its components are rationally independent or, more particularly, if the Diophantine condition [7] is satisfied. Such a result was first proved by Poincaré, and it holds irrespective of how the parameters $\bar a^{(k)}$ appearing in [11] are fixed. This reflects the fact that quasiperiodic motions take place on invariant surfaces (KAM tori), which can be parameterized in terms of the angle variables $\alpha(t)$, so that the values $\bar a^{(k)}$ contribute to the initial phases, and the latter can be arbitrarily fixed.

The recursive equations [13] can be suitably studied by introducing a diagrammatic representation, as explained below.

Graphs and Trees
A (connected) graph $G$ is a collection of points, called vertices, and lines connecting all of them. We denote with $V(G)$ and $L(G)$ the set of vertices and the set of lines, respectively. A path between two vertices is a minimal subset of $L(G)$ connecting the two vertices. A graph is planar if it can be drawn in a plane without graph lines crossing.

A tree is a planar graph $G$ containing no closed loops (cycles); in other words, it is a connected acyclic graph. One can consider a tree $G$ with a single special vertex $v_0$: this introduces a natural partial ordering on the set of lines and vertices, and one can imagine that each line carries an arrow pointing toward the vertex $v_0$. We can add an extra oriented line $\ell_0$ connecting the special vertex $v_0$ to another point which will be called the root of the tree; the added line will be called the root line. In this way, we obtain a rooted tree $\theta$ defined by $V(\theta) = V(G)$ and $L(\theta) = L(G) \cup \ell_0$. A labeled tree is a rooted tree $\theta$ together with a label function defined on the sets $V(\theta)$ and $L(\theta)$.

Two rooted trees which can be transformed into each other by continuously deforming the lines in the plane in such a way that the latter do not cross each other (i.e., without destroying the graph structure) will be said to be equivalent. This notion of equivalence can also be extended to labeled trees, simply by considering two labeled trees equivalent if they can be transformed into each other in such a way that the labels also match.

Given two vertices $v, w \in V(\theta)$, we say that $w \preceq v$ if $v$ is on the path connecting $w$ to the root line. One can identify a line with the vertices it connects; given a line $\ell = (v, w)$, one says that $\ell$ enters $v$ and exits $w$. For each vertex $v$, we define the branching number as the number $p_v$ of lines entering $v$. The number of unlabeled trees with $k$ vertices can be bounded by the number of random walks with $2k$ steps, that is, by $4^k$.

The labels are as follows: with each vertex $v$ we associate a mode label $\nu_v \in \mathbb{Z}^N$, and with each line we associate a momentum $\nu_\ell \in \mathbb{Z}^N$, such that the momentum of the line leaving the vertex $v$ is given by the sum of the mode labels of all vertices preceding $v$ (with $v$ being included): if $\ell = (v', v)$ then $\nu_\ell = \sum_{w \preceq v} \nu_w$. Note that for a fixed unlabeled tree the branching labels are uniquely determined, and, for a given assignment of the mode labels, the momenta of the lines are also uniquely determined.

Figure 1 Graphical representation of $a^{(k)}_{\nu}$.

Figure 2 Graphical representation of the recursive equation [13].

Define

$$V_v = \frac{(i\nu_v)^{p_v+1}}{p_v!}\, f_{\nu_v}, \qquad g_\ell = \frac{1}{(\omega \cdot \nu_\ell)^2} \qquad [14]$$
where the tensor $V_v$ is referred to as the node factor of $v$ and the scalar $g_\ell$ as the propagator of the line $\ell$. One has $|f_\nu| \le F e^{-\kappa|\nu|}$, for suitable positive constants $F$ and $\kappa$, by the analyticity assumption. Then one can check that the coefficients $a^{(k)}_\nu$, defined in [12], for $\nu \neq 0$, can be expressed in terms of trees as

$$a^{(k)}_{\nu} = \sum_{\theta \in \Theta^{(k)}_{\nu}} \mathrm{Val}(\theta), \qquad \mathrm{Val}(\theta) = \Biggl( \prod_{v \in V(\theta)} V_v \Biggr) \Biggl( \prod_{\ell \in L(\theta)} g_\ell \Biggr) \qquad [15]$$

where $\Theta^{(k)}_{\nu}$ denotes the set of all inequivalent trees with $k$ vertices and with momentum $\nu$ associated with the root line, while the coefficients $a^{(k)}_0$ can be fixed as $a^{(k)}_0 = 0$ for all $k \ge 1$, by the arbitrariness of the initial phases previously remarked. The property that $[\partial_\alpha f(\alpha(\tau'))]^{(k-1)}$ in [11] has zero average for all $k \ge 1$ implies that for all lines $\ell \in L(\theta)$ one has $g_\ell = (\omega \cdot \nu_\ell)^{-2}$ only for $\nu_\ell \neq 0$, whereas $g_\ell = 1$ for $\nu_\ell = 0$, so that the numerical values $\mathrm{Val}(\theta)$ are well defined for all trees $\theta$. If $a^{(k)}_0 = 0$ for all $k \ge 1$, then $\nu_\ell \neq 0$ for all $\ell \in L(\theta)$.

The proof of [15] can be performed by induction on $k$. Alternatively, we can start from the recursive definition [13], whereby the trees naturally arise in the following way. Represent graphically the coefficient $a^{(k)}_{\nu}$ as in Figure 1; to keep track of the labels $k$ and $\nu$, we assign $k$ to the black bullet and $\nu$ to the line. For $k = 1$, the black bullet is meant as a grey vertex (like the ones appearing in Figure 3).
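The rules [14]–[15] can be illustrated by evaluating $\mathrm{Val}(\theta)$ for small trees. The sketch below makes several assumptions that are not in the text: $N = 1$ (so node factors are scalars rather than tensors), $f(\alpha) = \cos\alpha$ (so $f_{\pm 1} = 1/2$), and $\omega = 1$; a tree is encoded as a nested pair `(mode, [subtrees])`, and the function name is hypothetical. All line momenta are assumed to be nonzero.

```python
from math import factorial

def tree_value(tree, f_modes, omega):
    """Return (momentum of the exiting line, Val) for a tree (mode, [subtrees]), N = 1."""
    mode, children = tree
    momentum, value = mode, 1.0 + 0j
    for child in children:
        child_momentum, child_value = tree_value(child, f_modes, omega)
        momentum += child_momentum          # momentum = sum of modes of preceding vertices
        value *= child_value
    p = len(children)                       # branching number p_v
    node_factor = (1j * mode) ** (p + 1) / factorial(p) * f_modes[mode]
    propagator = 1.0 / (omega * momentum) ** 2
    return momentum, value * node_factor * propagator

f_cos = {1: 0.5, -1: 0.5}                   # Fourier coefficients of cos(alpha)
two_chain = (1, [(1, [])])                  # k = 2, root momentum 2
three_chain = (1, [(1, [(1, [])])])         # k = 3, root momentum 3
fork = (1, [(1, []), (1, [])])              # k = 3, root momentum 3
```

Under these assumptions the single tree with $k = 2$ and root momentum $2$ has value $-i/16$, while the two inequivalent trees with $k = 3$ and all mode labels equal to $1$ have values $i/288$ and $i/144$, summing to $a^{(3)}_3 = i/96$.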
Figure 3 An example of tree to be summed over in [15] for $k = 39$. The labels are not explicitly shown. The momentum of the root line is $\nu$, so that the mode labels satisfy the constraint $\sum_{v \in V(\theta)} \nu_v = \nu$.
Then recursive equation [13] can be graphically represented as the diagram in Figure 2, provided that we associate with the (grey) vertex $v_0$ the node factor $V_{v_0}$, with $\nu_{v_0} = \nu_0$ and $p_{v_0} = p$ denoting the number of lines entering $v_0$, and with the lines $\ell_i$, $i = 1, \dots, p$, entering $v_0$ the momenta $\nu_{\ell_i} = \nu_i$, respectively. Of course, the sums over $p$ and over the possible assignments of the labels $\{k_i\}_{i=1}^{p}$ and $\{\nu_i\}_{i=0}^{p}$ are understood. Each black bullet on the right-hand side of Figure 2, together with its exiting line, looks like the diagram on the left-hand side, so that it represents $a^{(k_i)}_{\nu_i}$, $i = 1, \dots, p$. Note that Figure 2 has to be interpreted in the following way: if one associates with the diagram as drawn on the right-hand side a numerical value (as described above) and one sums all the values over the assignments of the labels, then the resulting quantity is precisely $a^{(k)}_{\nu}$.

The (fundamental) difference between the black bullets on the right- and left-hand sides is that the labels $k_i$ of the former are strictly less than $k$, hence we can iterate the diagrammatic decomposition simply by expressing again each $a^{(k_i)}_{\nu_i}$ as $a^{(k)}_{\nu}$ in [13], and so on, until one obtains a tree with $k$ grey vertices and no black bullets; see Figure 3, where the labels are not explicitly written. This corresponds to the tree expansion [15].

Any tree appearing in [15] is an example of what physicists call a Feynman graph, while the diagrammatic rules one has to follow in order to associate to the tree its right numerical value $\mathrm{Val}(\theta)$ are usually called the Feynman rules for the model under consideration. Such a terminology is borrowed from quantum field theory.
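The iteration of [13] that generates the trees can also be carried out directly on the Fourier coefficients. The following is a sketch under stated assumptions, not the author's procedure: it takes $N = 1$, a trigonometric polynomial $f$ given by a dictionary of Fourier coefficients, and $a^{(k)}_0 = 0$; the function name and data layout are hypothetical. It also records the $\nu = 0$ component of the forcing at each order, which must vanish if no secular terms are to appear.

```python
from math import factorial

def lindstedt_coefficients(f_modes, omega, max_order):
    """Return (a, zero_mode): a[k][nu] from recursion [13] for N = 1."""
    a, zero_mode, cache = {}, {}, {}

    def power(p, j):
        """Fourier modes of the eps**j coefficient of a(t)**p."""
        if (p, j) in cache:
            return cache[(p, j)]
        if p == 0:
            res = {0: 1.0 + 0j} if j == 0 else {}
        else:
            res = {}
            for j1 in range(1, j - p + 2):          # order carried by one factor
                for n1, c1 in a[j1].items():
                    for n2, c2 in power(p - 1, j - j1).items():
                        res[n1 + n2] = res.get(n1 + n2, 0) + c1 * c2
        cache[(p, j)] = res
        return res

    for k in range(1, max_order + 1):
        forcing = {}
        for p in range(0, k):                        # p = 0 contributes only for k = 1
            for n0, f0 in f_modes.items():
                vertex = (1j * n0) ** (p + 1) * f0 / factorial(p)
                for n, c in power(p, k - 1).items():
                    forcing[n0 + n] = forcing.get(n0 + n, 0) + vertex * c
        zero_mode[k] = forcing.pop(0, 0)             # must vanish: no secular terms
        a[k] = {n: c / (omega * n) ** 2 for n, c in forcing.items() if c != 0}
    return a, zero_mode
```

For $f(\alpha) = \cos\alpha$ and $\omega = 1$, that is, `lindstedt_coefficients({1: 0.5, -1: 0.5}, 1.0, 4)`, the recursion gives $a^{(1)}_1 = i/2$, $a^{(2)}_2 = -i/16$ and $a^{(3)}_3 = i/96$.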
Multiscale Analysis and Clusters

Suppose we replace [9] with $\alpha = \varepsilon\, \partial_\alpha f(\alpha)$, so that no small divisors appear (that is, $g_\ell = 1$ in [14]). Then convergence is easily proved for $\varepsilon$ small enough, since (by using the identity $\sum_{v \in V(\theta)} p_v = k - 1$ and the inequality $e^{-x} x^k / k! \le 1$ for all $x \in \mathbb{R}_+$ and all $k \in \mathbb{N}$), one finds

$$\prod_{v \in V(\theta)} |V_v| \le \left( \frac{4^2 F}{\kappa^2} \right)^{k} e^{-\kappa|\nu|/4} \Biggl( \prod_{v \in V(\theta)} e^{-\kappa|\nu_v|/4} \Biggr) \qquad [16]$$

and the sum over the mode labels can be performed by using the exponential decay factors $e^{-\kappa|\nu_v|/4}$, while the sum over all possible unlabeled trees gives $4^k$. In particular, analyticity in $t$ follows.

Of course, the interesting case is when the propagators are present. In such a case, even if no division by zero occurs, as $\omega \cdot \nu_\ell \neq 0$ (by the assumed Diophantine condition [7] and the absence of secular terms discussed previously), the quantities $\omega \cdot \nu_\ell$ in [14] can be very small. Then we can introduce a scale $h$ characterizing the size of each propagator: we say that a line $\ell$ has scale $h_\ell = h \ge 0$ if $\omega \cdot \nu_\ell$ is of order $2^{-h} C_0$ and scale $h_\ell = -1$ if $\omega \cdot \nu_\ell$ is greater than $C_0$ (of course, a more formal definition can be easily envisaged, for which the reader is referred to the original papers). Then, we can bound $|\omega \cdot \nu_\ell| \ge 2^{-h} C_0$ for any $\ell \in L(\theta)$, and write

$$\prod_{\ell \in L(\theta)} |g_\ell| \le C_0^{-2k} \prod_{h=0}^{\infty} 2^{2hN_h(\theta)} \le C_0^{-2k}\, 2^{2h_0 k} \exp\Biggl( 2 \log 2 \sum_{h=h_0}^{\infty} h N_h(\theta) \Biggr) \qquad [17]$$

where $N_h(\theta)$ is the number of lines in $L(\theta)$ with scale $h$ and $h_0$ is a (so far arbitrary) positive integer. The problem is then reduced to that of finding an estimate for $N_h(\theta)$.

To identify which kinds of tree are the source of problems, we introduce the notion of a cluster and a self-energy graph. A cluster $T$ with scale $h_T$ is a connected set of nodes linked by a continuous path of lines with the same scale label $h_T$ or a lower one and which is maximal, namely all the lines not belonging to $T$ but connected to it have scales higher than $h_T$ and at least one line in $T$ has scale $h_T$. An inclusion relation is established between clusters, in such a way that the innermost clusters are the clusters with lowest scale, and so on. Each cluster $T$ can have an arbitrary number of lines coming into it (entering lines), but only one or zero lines coming out from it (exiting line): lines of $T$ which either enter or exit $T$ are called external lines. A cluster $T$ with only one entering line $\ell_T^2$ and with one exiting line $\ell_T^1$ such that one has $\nu_{\ell_T^1} = \nu_{\ell_T^2}$ will be called a self-energy graph (SEG) or resonance. In such a case, the line $\ell_T^1$ is called a resonant line. Examples of clusters and SEGs are suggested by the bubbles in Figure 4; the mode labels are not represented, whereas the scales of the lines are explicitly written.

Figure 4 Examples of clusters and SEGs. Note that the tree itself is a cluster (with scale 6), and each of the two clusters with one entering and one exiting line is a SEG only if the momenta of its external lines are equal to each other.

If $S_h(\theta)$ is the number of SEGs whose resonant lines have scale $h$, then $N_h^*(\theta) = N_h(\theta) - S_h(\theta)$ will denote the number of nonresonant lines with scale $h$. A fundamental result, known as the Siegel–Bryuno lemma, shows that, for some positive constant $c$, one has

$$N_h^*(\theta) \le 2^{-h/\tau}\, c \sum_{v \in V(\theta)} |\nu_v| \qquad [18]$$

which, if inserted into [17] instead of $N_h(\theta)$, would give a convergent series; then $h_0$ should be chosen in such a way that the sum of the series in [17] is less than, say, $\kappa \sum_{v \in V(\theta)} |\nu_v| / 8$.

The bound [18] is a very deep one, and was originally proved by Siegel for a related problem (Siegel's problem), in which, in the formalism followed here, SEGs do not occur; such a bound essentially shows that accumulation of small divisors is possible only in the presence of SEGs. A possible tree with $k$ vertices whose value can be proportional to some power of $k!$ is represented in Figure 5, where a chain of $(k - 1)/2$ SEGs, $k$ odd, is drawn with external lines carrying a momentum $\nu$ such that $\omega \cdot \nu \approx C_0 |\nu|^{-\tau}$.

Figure 5 Example of tree whose value grows like a factorial.

In order to take into account the resonant lines, we have to add a factor $(\omega \cdot \nu_\ell)^{-2}$ for each resonant line $\ell$. It is a remarkable fact that, even if there are trees whose value cannot be bounded as a constant to the power $k$, there are compensations (that is, partial cancellations) between the values of all trees with the same number of vertices, such that the sum of all such trees admits a bound of this kind.

The cancellations can be described graphically as follows. Consider a tree $\theta$ with a SEG $T$. Then take all trees which can be obtained by shifting the external lines of $T$, that is, by attaching such lines to all possible vertices internal to $T$, and sum together the values of all such trees. An example is given in Figure 6. The corresponding sum turns out to be proportional to $(\omega \cdot \nu)^2$, if $\nu$ is the momentum of the resonant line of $T$, and such a factor compensates exactly the propagator of this line. The argument above can be repeated for all SEGs: this requires a little care because there are SEGs which are inside some other SEGs. Again, for details and a more formal discussion, the reader is referred to the original papers.
The conclusion is that we can take into account the resonant lines: this simply adds an extra constant raised to the power $k$, so that an overall estimate $C^k$, for some $C > 0$, holds for $U^{(k)}(t)$, and the convergence of the series follows.
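The scale decomposition behind [17] can be sketched numerically. The text leaves the formal definition of the scales to the original papers, so the dyadic convention used below is an assumption: a divisor $x = \omega \cdot \nu_\ell$ gets scale $-1$ if $|x| > C_0$ and otherwise the integer $h \ge 0$ with $2^{-(h+1)} C_0 < |x| \le 2^{-h} C_0$. With this convention the bound on the product of propagators carries exponents $2(h+1)N_h$ rather than the $2hN_h$ of [17]. All function names are hypothetical.

```python
import math

def scale(divisor, c0):
    """Scale of a line with divisor omega . nu (assumed dyadic convention)."""
    x = abs(divisor)
    if x > c0:
        return -1
    # h >= 0 such that 2**-(h+1) * c0 < x <= 2**-h * c0
    return math.floor(math.log2(c0 / x))

def scale_counts(divisors, c0):
    """Return {h: N_h}, the number of lines on each scale."""
    counts = {}
    for d in divisors:
        h = scale(d, c0)
        counts[h] = counts.get(h, 0) + 1
    return counts

def propagator_product_bound(divisors, c0):
    """Return (prod |g_l|, c0**(-2k) * prod 2**(2*(h+1)*N_h)) for k lines."""
    exact = math.prod(1.0 / d ** 2 for d in divisors)
    bound = c0 ** (-2 * len(divisors))
    for h, n in scale_counts(divisors, c0).items():
        bound *= 2.0 ** (2 * (h + 1) * n)
    return exact, bound
```

For instance, the divisors `[1.5, 0.3, 0.3, 0.04]` with `c0 = 1.0` fall on scales $-1$, $1$, $1$ and $4$, and the exact product of propagators stays below the scale-wise bound.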
Figure 6 Example of SEGs whose values have to be summed together in order to produce the cancellation discussed in the text. The mode labels are all fixed.

Other Examples and Applications

The discussion carried out so far proves a version of the KAM theorem, for the system described by [9], and it is inspired by the original papers by Eliasson (1996) and, mostly, by Gallavotti (1994). Here we list some problems in which original results have been proved by means of the diagrammatic techniques described above, or by some variants of them. These are discussed in the following.

The first generalization one can think of is the problem of conservation in quasi-integrable systems of resonant tori (that is, invariant tori whose frequency vectors have rationally dependent components). Even if most of such tori disappear as an effect of the perturbation, some of them are conserved as lower-dimensional tori, which, generically, become of either elliptic or hyperbolic or mixed type according to the sign of $\varepsilon$ and the perturbation. With techniques extending those described here (introducing also, in particular, a suitable resummation procedure for divergent series), this has been done by Gallavotti and Gentile; see Gallavotti et al. (2004) and Gallavotti and Gentile (2005) for an account.

An expansion like the one considered so far can be envisaged also for the motions occurring on the stable and unstable manifolds of hyperbolic lower-dimensional tori for perturbations of Hamiltonians describing a system of rotators (as in the previous case) plus $n$ pendulum-like systems. In such a case, the function $G(u)$ has a less simple form. For $n = 1$, one can look for solutions which depend on time through two variables, $\psi = \omega t$ and $x = e^{-gt}$, with $(\omega, g) \in \mathbb{R}^{N+1}$, and $\omega$ Diophantine as before and $g$ related to the timescale of the pendulum. This has been worked out by Gallavotti (1994), and then used by Gallavotti et al. (1999) to study a class of three-timescale systems, in order to obtain a lower bound on the homoclinic angles (i.e., the angles between the stable and unstable manifolds of hyperbolic tori which are preserved by the perturbation). The formalism becomes a little more involved, essentially because of the entries of the Wronskian matrix appearing in [5]. In such a case, the unperturbed solution $u_0(t)$ corresponds to the rotators moving linearly with rotation vector $\omega$ and the pendulum moving along its separatrix; a nontrivial fact is that if $g_0$ denotes the Lyapunov exponent of the pendulum in the absence of the perturbation, then one has to look for an expansion in $x = e^{-gt}$ with $g = g_0 + O(\varepsilon)$, because the perturbation changes the value of such an exponent.

The same techniques have also been applied to study the relation of the radius of convergence of the standard map (an area-preserving diffeomorphism from the cylinder to itself, which has been widely studied in the literature since the original papers by Greene and by Chirikov, both of which appeared in 1979) with the arithmetical properties of the rotation vector (which is, in this case, just a number). In particular, it has been proved that the radius of convergence is naturally interpolated through a function of the rotation number known as the Bryuno function (which has been introduced by Yoccoz as the solution of a suitable functional equation completely independent of the dynamics); see Berretti and Gentile (2001) for a review of results of this and related problems.

Also the generalized Riccati equation $\dot u - i u^2 - 2 i f(\omega t) + i \varepsilon^2 = 0$, where $\omega \in \mathbb{T}^d$ is Diophantine and $f$ is an analytic periodic function of $\psi = \omega t$, has been studied with the diagrammatic technique by Gentile (2003). Such an equation is related to two-level quantum systems (as first used by Barata), and existence of quasiperiodic solutions of the generalized Riccati equation for a large measure set $E$ of values of $\varepsilon$ can be exploited to prove that the spectrum of the corresponding two-level system is pure point for those values of $\varepsilon$; analogously, one can prove that, for fixed $\varepsilon$, one can impose some further nonresonance conditions on $\omega$, still leaving a full measure set, in such a way that the spectrum is pure point. (We note, in addition, that, technically, such a problem is very similar to that of studying conservation of elliptic lower-dimensional tori with one normal frequency.)

Finally we mention a problem of partial differential equations, where, of course, the scheme described above has to be suitably adapted: this is the study of periodic solutions for the nonlinear wave equation $u_{tt} - u_{xx} + m u = \varphi(u)$, with Dirichlet boundary conditions, where $m$ is a real parameter (mass) and $\varphi(u)$ is a strictly nonlinear analytic odd function. Gentile and Mastropietro (2004) reproduced the result of Craig and Wayne for the existence of periodic solutions for a large measure set of periods, and, in a subsequent paper by the same authors with Procesi (2005), an analogous result was proved in the case $m = 0$, which had previously remained an open problem in the literature.
See also: Averaging Methods; Integrable Systems and Discrete Geometry; KAM Theory and Celestial Mechanics; Stability Theory and KAM.
Further Reading

Berretti A and Gentile G (2001) Renormalization group and field theoretic techniques for the analysis of the Lindstedt series. Regular and Chaotic Dynamics 6: 389–420.
Chierchia L and Falcolini C (1994) A direct proof of a theorem by Kolmogorov in Hamiltonian systems. Annali della Scuola Normale Superiore di Pisa 21: 541–593.
Eliasson LH (1996) Absolutely convergent series expansions for quasi periodic motions. Mathematical Physics Electronic Journal 2, paper 4 (electronic), Preprint 1988.
Gallavotti G (1994) Twistless KAM tori, quasi flat homoclinic intersections, and other cancellations in the perturbation series of certain completely integrable Hamiltonian systems. A review. Reviews in Mathematical Physics 6: 343–411.
Gallavotti G, Bonetti F, and Gentile G (2004) Aspects of the Ergodic, Qualitative and Statistical Theory of Motion. Berlin: Springer.
Gallavotti G and Gentile G (2005) Degenerate elliptic tori. Communications in Mathematical Physics 257: 319–362.
Gallavotti G, Gentile G, and Mastropietro V (1999) Separatrix splitting for systems with three time scales. Communications in Mathematical Physics 202: 197–236.
Gentile G (2003) Quasi-periodic solutions for two-level systems. Communications in Mathematical Physics 242: 221–250.
Gentile G and Mastropietro V (2004) Construction of periodic solutions of the nonlinear wave equation with Dirichlet boundary conditions by the Lindstedt series method. Journal de Mathématiques Pures et Appliquées 83: 1019–1065.
Gentile G, Mastropietro V, and Procesi M (2005) Periodic solutions for completely resonant nonlinear wave equations with Dirichlet boundary conditions. Communications in Mathematical Physics 256: 437–490.
Harary F and Palmer EM (1973) Graphical Enumeration. New York: Academic Press.
Poincaré H (1892–99) Les méthodes nouvelles de la mécanique céleste, vol. I–III. Paris: Gauthier-Villars.
Dimer Problems
R Kenyon, University of British Columbia, Vancouver, BC, Canada
© 2006 Elsevier Ltd. All rights reserved.
Definitions

The dimer model arose in the mid-twentieth century as an example of an exactly solvable statistical mechanical model in two dimensions with a phase transition. It is used to model a number of physical processes: free fermions in one dimension, the two-dimensional Ising model, and various other two-dimensional statistical-mechanical models at restricted parameter values, such as the 6- and 8-vertex models and O(n) models. A number of observable quantities such as the "height function" and densities of motifs have been shown to have conformal invariance properties in the scaling limit (when the lattice spacing tends to zero). More recently, the model has also been used as an elementary model of crystalline surfaces in $\mathbb{R}^3$.

A dimer covering, or perfect matching, of a graph is a set of edges ("dimers") which covers every vertex exactly once. In other words, it is a pairing of adjacent vertices (see Figure 1a, which is a dimer covering of an $8 \times 8$ grid). Dimer coverings of a grid are sometimes represented as domino tilings, that is, tilings with $2 \times 1$ rectangles (Figure 1b).

The dimer model is the study of the set of dimer coverings of a graph. Typically, the underlying graph is taken to be a regular lattice in two dimensions, for example, the square grid or the honeycomb lattice, or a finite part of such a lattice.

Dimer coverings of the honeycomb graph are in bijection with tilings of plane regions with $60^\circ$ rhombi, also known as lozenges (see Figure 2). These tilings in turn are projections of piecewise-linear surfaces in $\mathbb{R}^3$ composed of unit squares in the 2-skeleton of $\mathbb{Z}^3$. So one can think of honeycomb dimer coverings as modeling discrete surfaces in $\mathbb{R}^3$. These surfaces are monotone in the sense that the orthogonal projection to the plane $P_{111} = \{(x, y, z) \mid x + y + z = 0\}$ is injective.
Figure 1 (a) A dimer covering of a grid and (b) the corresponding domino tiling.
Figure 2 Honeycomb dimers (solid) and the corresponding ‘‘lozenge’’ tilings (gray).
Other models related to the dimer model are:

The spanning tree model on planar graphs. The set of spanning trees on a planar graph is in bijection with the set of dimer coverings on an associated bipartite planar graph. Conversely, dimer coverings of a bipartite planar graph are in bijection with directed spanning trees on an associated graph.

The Ising model on a planar graph with zero external field, which can be modeled with dimers on an associated planar graph.

Plane partitions (three-dimensional versions of integer partitions). Viewing a plane partition along the (1, 1, 1)-direction, one sees a lozenge tiling of the plane.

Annihilating random walks in one dimension, which can be modeled with dimers on an associated planar graph.

The monomer-dimer model, where one allows a certain density of holes (monomers) in a dimer covering. This model is unsolved at present, although some partial results have been obtained.
Gibbs Measures
The most general setting in which the dimer model can be solved is that of an arbitrary planar graph with energies on the edges. We define here the corresponding measure. Let $G = (V, E)$ be a graph and $M(G)$ the set of dimer coverings of $G$. Let $\mathcal{E}$ be a real-valued function on the edges of $G$, with $\mathcal{E}(e)$ representing the energy associated to a dimer on the bond $e$. One defines the energy of a dimer covering as the sum of the energies of those bonds covered with dimers.
The partition function of the model on $(G, \mathcal{E})$ is then the sum

$$Z = \sum_{C \in M(G)} e^{-\mathcal{E}(C)/kT}$$
where the sum is over dimer coverings. In what follows we will take kT = 1 for simplicity. Note that Z depends on both G and E. The partition function is well defined for a finite graph and defines the Gibbs measure, which is by definition the probability measure $\mu = \mu_E$ on the set M(G) of dimer coverings satisfying $\mu(C) = (1/Z)e^{-E(C)}$ for a covering C. For an infinite graph G with fixed energy function E, a Gibbs measure on M(G) is by definition any measure which is a limit of the Gibbs measures on a sequence of finite subgraphs which fill out G. There may be many Gibbs measures on an infinite graph, since this limit typically depends on the sequence of finite graphs. When G is an infinite periodic graph (and E is periodic as well), it is natural to consider translation-invariant Gibbs measures; one can show that in the case of a bipartite, periodic planar graph the translation-invariant and ergodic Gibbs measures form a two-parameter family (see Theorem 3 below). For a translation-invariant Gibbs measure $\nu$ which is a limit of Gibbs measures on an increasing sequence of finite graphs $G_n$, one can define the partition function per vertex of $\nu$ to be the limit
$Z = \lim_{n\to\infty} Z(G_n)^{1/|G_n|}$
where $|G_n|$ is the number of vertices of $G_n$. The free energy, or surface tension, of $\nu$ is $-\log Z$.
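The definitions above can be checked by brute force on a small graph. The following is a minimal sketch, assuming a rectangular grid graph and kT = 1; the function names (`grid_edges`, `dimer_coverings`, `gibbs_measure`) and the `energy` callback are illustrative choices, not notation from the article. It enumerates every dimer covering, sums the Boltzmann weights to obtain Z, and normalizes to obtain the Gibbs measure.

```python
import math

def grid_edges(rows, cols):
    """Edges of the rows x cols grid graph; vertices are (row, col) pairs."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append(((r, c), (r, c + 1)))   # horizontal edge
            if r + 1 < rows:
                edges.append(((r, c), (r + 1, c)))   # vertical edge
    return edges

def dimer_coverings(vertices, edges):
    """Yield every dimer covering of the given vertex set as a tuple of edges."""
    if not vertices:
        yield ()
        return
    v = min(vertices)                  # the smallest uncovered vertex must be matched
    for e in edges:
        if v in e:
            other = e[1] if e[0] == v else e[0]
            if other in vertices:
                for tail in dimer_coverings(vertices - {v, other}, edges):
                    yield (e,) + tail

def gibbs_measure(rows, cols, energy=lambda e: 0.0):
    """Return (Z, {covering: probability}) for edge energies energy(e), with kT = 1."""
    edges = grid_edges(rows, cols)
    vertices = frozenset((r, c) for r in range(rows) for c in range(cols))
    weights = {cover: math.exp(-sum(energy(e) for e in cover))
               for cover in dimer_coverings(vertices, edges)}
    Z = sum(weights.values())
    return Z, {cover: w / Z for cover, w in weights.items()}
```

With zero energies Z is simply the number of coverings. Enumeration is exponential in the size of the graph, so this is only useful as a cross-check for the determinantal formulas.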
Combinatorics
Partition Function
One can compute the partition function for dimer coverings on a finite planar graph G as the Pfaffian (square root of the determinant) of a certain antisymmetric matrix, the Kasteleyn matrix. The Kasteleyn matrix is an oriented adjacency matrix of G, indexed by the vertices V: orient the edges of a graph embedded in the plane so that each face has an odd number of clockwise oriented edges. Then define $K = (K_{vv'})$ with $K_{vv'} = e^{-E(vv')}$ if G has an edge vv', with a sign according to the orientation of that edge, and $K_{vv'} = 0$ if v, v' are not adjacent. We then have the following result of Kasteleyn:
Theorem 1 $Z = |\mathrm{Pf}(K)| = \sqrt{|\det K|}$.
Here Pf(K) denotes the Pfaffian of K. Such an orientation of edges (which always exists for planar graphs) is called a Kasteleyn orientation; any two such orientations can be obtained from one another by a sequence of operations consisting of reversing the orientations of all edges at a vertex. If G is a bipartite graph, that is, the vertices can be colored black and white with no neighbors having the same color, then the Pfaffian of K is the determinant of the submatrix whose rows index the white vertices and columns index the black vertices. For bipartite graphs, instead of orienting the edges one can alternatively multiply the edge weights by a complex number of modulus 1, with the condition that the alternating product around each face (the first, divided by the second, times the third, and so on) is real and negative. For nonplanar graphs, one can compute the partition function as a sum of Pfaffians; for a graph embedded on a surface of Euler characteristic $\chi$, this requires in general $2^{2-\chi}$ Pfaffians.
Local Statistics
The inverse of the Kasteleyn matrix can be used to compute the local statistics, that is, the probability that a given set of edges occurs in a random dimer covering (random with respect to the Gibbs measure $\mu$).
Theorem 2 Let $S = \{(v_1, v_2), \ldots, (v_{2k-1}, v_{2k})\}$ be a set of edges of G. The probability that all these edges occur in a $\mu$-random covering is
$\Pr(S) = \left(\prod_{i=1}^{k} K_{v_{2i-1},v_{2i}}\right) \mathrm{Pf}_{2k\times 2k}\left((K^{-1})_{v_i,v_j}\right)$
Again, for bipartite graphs the Pfaffian can be made into a determinant.
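As an illustration of Theorems 1 and 2 in the bipartite case, here is a minimal sketch for the rectangular grid with zero energies. It assumes the complex-weight variant of the Kasteleyn matrix (weight 1 on horizontal edges, i on vertical edges, so the alternating product around each square face is -1), an even number of vertices, and a hand-written determinant routine; all function names are hypothetical. The single-edge probability uses the fact that an entry of the inverse matrix is a cofactor divided by the determinant.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (complex entries)."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0 + 0.0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result              # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def kasteleyn_matrix(rows, cols):
    """White-by-black Kasteleyn matrix of the grid (assumes rows * cols is even)."""
    whites = [(r, c) for r in range(rows) for c in range(cols) if (r + c) % 2 == 0]
    blacks = [(r, c) for r in range(rows) for c in range(cols) if (r + c) % 2 == 1]
    col_of = {b: j for j, b in enumerate(blacks)}
    K = [[0j] * len(blacks) for _ in whites]
    for i, (r, c) in enumerate(whites):
        for dr, dc, weight in ((0, 1, 1), (0, -1, 1), (1, 0, 1j), (-1, 0, 1j)):
            neighbor = (r + dr, c + dc)
            if neighbor in col_of:
                K[i][col_of[neighbor]] = weight
    return whites, blacks, K

def count_coverings(rows, cols):
    """Z = |det K| for zero energies, i.e. the number of dimer coverings."""
    _, _, K = kasteleyn_matrix(rows, cols)
    return round(abs(det(K)))

def edge_probability(rows, cols, white, black):
    """Probability that the edge (white, black) is covered: |K_wb| |minor_wb| / |det K|."""
    whites, blacks, K = kasteleyn_matrix(rows, cols)
    i, j = whites.index(white), blacks.index(black)
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(K) if k != i]
    return abs(K[i][j]) * abs(det(minor)) / abs(det(K))
```

For example, `count_coverings(8, 8)` reproduces the classical count of domino tilings of the chessboard, and the edge probabilities at any vertex sum to 1.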
Heights
Bipartite graphs Suppose G is a bipartite planar graph. A 1-form on G is simply a function on the set of oriented edges which is antisymmetric with respect to reversing the edge orientation: $f(-e) = -f(e)$ for an edge e. A 1-form can be identified with a flow: just flow by f(e) along oriented edge e. The divergence of the flow f is then $d^*f$. Let $\Omega$ be the space of flows on edges of G, with divergence 1 at each white vertex and divergence $-1$ at each black vertex, and such that the flow along each edge from white to black is in [0, 1]. From a dimer covering M one can construct such a flow $\omega(M) \in \Omega$: just flow one unit along each dimer, and zero on the remaining edges. The set $\Omega$ is a convex polyhedron in $\mathbb{R}^E$ and its vertices can be seen to be exactly the dimer coverings. Given any two flows $\omega_1, \omega_2 \in \Omega$, their difference is a divergence-free flow. Its dual $(\omega_1 - \omega_2)^*$ (or conjugate flow) defined on the planar dual of G is therefore the gradient of a function h on the faces of G, that is, $(\omega_1 - \omega_2)^* = dh$, where h is well defined up to an additive constant. When $\omega_1$ and $\omega_2$ come from dimer coverings, h is integer valued, and is called the height difference of the coverings. The level sets of the function h are just the cycles formed by the union of the two matchings. If we fix a "base point" covering $\omega_0$ and a face $f_0$ of G, we can then define the height function of any dimer covering (with flow $\omega$) to be the function h with value zero at $f_0$ and which satisfies $dh = (\omega - \omega_0)^*$.
Nonbipartite graphs On a nonbipartite planar graph the height function can be similarly defined modulo 2. Fix a base covering $\omega_0$; for any other covering $\omega$, the superposition of $\omega_0$ and $\omega$ is a set of cycles and doubled edges of G; the function h is constant on the complementary components of these cycles and changes by 1 mod 2 across each cycle. We can think of the height modulo 2 as taking two values, or spins, on the faces of G, and the dimer chains are the spin-domain boundaries. In particular, dimers on a nonbipartite graph can in this way model the Ising model on an associated dual planar graph.
Thermodynamic Limit
By periodic planar graph we mean a graph G, with energy function on edges, for which translations by elements of $\mathbb{Z}^2$ or some other rank-2 lattice $\Lambda \subset \mathbb{R}^2$ are isomorphisms of G preserving the edge energies, and such that the quotient $G/\mathbb{Z}^2$ is a finite graph. Without loss of generality we can take $\Lambda = \mathbb{Z}^2$. The standard example is $G = \mathbb{Z}^2$ with $E \equiv 0$, which we refer to as "dimers on the grid." However, other examples display different global behaviors and so it is worthwhile to remain in this generality. For a periodic planar graph G, an ergodic probability measure on M(G) is one which is translation invariant (the measure of a set is the same as any $\mathbb{Z}^2$-translate of that set) and whose invariant subsets have measure 0 or 1. We will be interested in probability measures which are both ergodic and Gibbs (we refer to them as ergodic Gibbs measures, dropping the term "probability"). When G is bipartite, there are multiple ergodic Gibbs measures (see Theorem 3 below). When G is nonbipartite, it is conjectured that there is a single ergodic Gibbs measure. In the remainder of this section we assume that G is bipartite, and assume also that the $\mathbb{Z}^2$-action preserves the coloring of the vertices as black and white (simply pass to an index-2 sublattice if not). For integer n > 0 let $G_n = G/n\mathbb{Z}^2$, a finite graph on a torus (in other words, with periodic boundary conditions). For a dimer covering M of $G_n$, we define $(h_x, h_y) \in \mathbb{Z}^2$ to be the horizontal and vertical height change of M around the torus, that is, the net flux of $\omega(M) - \omega_0$ across a horizontal, respectively vertical, cut around the torus (in other words, $h_x, h_y$ are the horizontal and vertical periods around the torus of the 1-form $\omega(M) - \omega_0$). The characteristic polynomial P(z, w) of G is by definition
$P(z, w) = \sum_{M \in M(G_1)} e^{-E(M)} z^{h_x} w^{h_y} (-1)^{h_x h_y}$
Here the sum is over dimer coverings M of $G_1 = G/\mathbb{Z}^2$, and $h_x, h_y$ depend on M. The polynomial P depends on the base point $\omega_0$ only by a multiplicative factor involving a power of z and w. From this polynomial most of the large-scale behavior of the ergodic Gibbs measures can be extracted. The Gibbs measure on $G_n$ converges as $n \to \infty$ to the (unique) ergodic Gibbs measure with smallest free energy $F = -\log Z$. The uniqueness of this measure follows from the strict concavity of the free energy of ergodic Gibbs measures as a function of the slope, see below. The free energy F of the minimal free energy measure is
$F = -\frac{1}{(2\pi i)^2} \int_{S^1 \times S^1} \log P(z, w) \, \frac{dz}{z} \frac{dw}{w}$
that is, minus the Mahler measure of P. For any translation-invariant measure $\nu$ on M(G), the average slope (s, t) of the height function for almost every tiling is by definition the expected horizontal and vertical height change over one fundamental domain, that is, $s = E[h(f + (1, 0)) - h(f)]$ and $t = E[h(f + (0, 1)) - h(f)]$ where f is any face. This quantity (s, t) lies in the Newton polygon N(P) of P(z, w) (the convex hull in $\mathbb{R}^2$ of the set of exponents of monomials of P). In fact, the points in the Newton polygon are in bijection with the ergodic Gibbs measures on M(G):
Theorem 3 When G is a periodic bipartite planar graph, any ergodic Gibbs measure has average slope (s, t) lying in N(P). Moreover, for every point $(s, t) \in N(P)$ there is a unique ergodic Gibbs measure $\mu(s, t)$ with that average slope.
In particular, this gives a complete description of the set of all ergodic Gibbs measures. The ergodic Gibbs measure $\mu(s, t)$ of slope (s, t) can be obtained as the limit of the Gibbs measures on $G_n$, when one conditions the configurations to have a particular slope approximating (s, t).
Ronkin Function and Surface Tension
The Ronkin function of P is a map $R : \mathbb{R}^2 \to \mathbb{R}$ defined for $(B_x, B_y) \in \mathbb{R}^2$ by
$R(B_x, B_y) = \frac{1}{(2\pi i)^2} \int_{S^1 \times S^1} \log P(z e^{B_x}, w e^{B_y}) \, \frac{dz}{z} \frac{dw}{w}$
Figure 4 Minus the Ronkin function of $P(z, w) = 5 + z + 1/z + w + 1/w$.
The Ronkin function is convex and its graph is piecewise linear on the complement of the amoeba A(P) of P, which is the image of the zero set $\{(z, w) \in \mathbb{C}^2 \mid P(z, w) = 0\}$ under the map $(z, w) \mapsto (\log|z|, \log|w|)$ (see Figures 3 and 4 for an example). The free energy $F(\mu(s, t))$ of $\mu(s, t)$, as a function of $(s, t) \in N(P)$, is the Legendre dual of the Ronkin function of P(z, w): we have
$F(\mu(s, t)) = R(B_x, B_y) - sB_x - tB_y$
where
$s = \frac{\partial R(B_x, B_y)}{\partial B_x}, \qquad t = \frac{\partial R(B_x, B_y)}{\partial B_y}$
Figure 5 (Negative of) the free energy for dimers on the square-octagon lattice.
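A minimal numerical sketch of the Ronkin function for the example polynomial $P(z, w) = 5 + z + 1/z + w + 1/w$ of Figures 3 and 4 follows. It assumes that the logarithm is taken as $\log|P|$ and uses the substitution $z = e^{i\theta}$, $w = e^{i\phi}$, under which the contour integral becomes a plain average over the torus; the function names and the grid size `n` are illustrative. The midpoint rule is adequate here only because the sample points chosen in the checks lie outside the amoeba, where the integrand is smooth.

```python
import cmath
import math

def P(z, w):
    """Characteristic polynomial of the square-octagon example."""
    return 5 + z + 1 / z + w + 1 / w

def ronkin(Bx, By, n=64):
    """Average of log|P(e^{Bx + i*theta}, e^{By + i*phi})| over an n x n midpoint grid."""
    total = 0.0
    for j in range(n):
        z = cmath.exp(complex(Bx, 2 * math.pi * (j + 0.5) / n))
        for k in range(n):
            w = cmath.exp(complex(By, 2 * math.pi * (k + 0.5) / n))
            total += math.log(abs(P(z, w)))
    return total / (n * n)

# Flat facet: on the bounded complementary component around the origin the
# gradient of R is (0, 0), so R takes the same value at nearby points.
flat_gap = abs(ronkin(0.1, 0.0) - ronkin(0.0, 0.0))

# Linear facet: far out along the B_x axis the monomial z dominates,
# so R(B_x, 0) = B_x there.
linear_gap = abs(ronkin(3.0, 0.0) - 3.0)
```

Both gaps come out at the level of rounding error, which is the piecewise linearity of R on the complement of the amoeba; the slopes (0, 0) and (1, 0) of these two facets are integer points of the Newton polygon.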
The continuous map $\nabla R : \mathbb{R}^2 \to N(P)$ which takes $(B_x, B_y)$ to (s, t) is injective on the interior of A(P), collapses each bounded complementary component of A(P) to an integer point in the interior of N(P), and collapses each unbounded complementary component of A(P) to an integer point on the boundary of N(P). Under the Legendre duality, the facets in the graph of the Ronkin function (i.e., maximal regions on which R is linear) give points of nondifferentiability of the free energy F, as defined on N(P). We refer to these points of nondifferentiability as "cusps." Cusps occur only at integer slopes (s, t) (see Figure 5 for the free energy associated to the Ronkin function in Figure 4). By Theorem 3, the coordinates $(B_x, B_y)$ can also be used to parametrize the set of Gibbs measures $\mu(s, t)$ (but only those with slope (s, t) in the interior of N(P) or on the corners of N(P) and boundary integer points). This parametrization is not one-to-one since when $(B_x, B_y)$ varies in a complementary component of the amoeba, the measure $\mu(s, t)$ does not change. On the interior of the amoeba the parametrization is one-to-one. The remaining Gibbs measures, whose slopes are on the boundary of N(P), can be obtained by taking limits of $(B_x, B_y)$ along the "tentacles" of the amoeba.
Figure 3 The amoeba of $P(z, w) = 5 + z + 1/z + w + 1/w$, which is the characteristic polynomial for dimers on the periodic "square-octagon" lattice.
Phases
The Gibbs measures $\mu(s, t)$ can be partitioned into three classes, or phases, according to the behavior of the fluctuations of the height function. If we measure the height at two distant points $x_1$ and $x_2$ in G, the average height difference, $E[h(x_1) - h(x_2)]$, is a linear function of $x_1 - x_2$ determined by the average slope of the measure. The height fluctuation is defined to be the random variable $h(x_1) - h(x_2) - E[h(x_1) - h(x_2)]$. This random variable depends on the two points and we are interested in its behavior when $x_1$ and $x_2$ are far apart. We say $\mu(s, t)$ is
1. "Frozen" if the height fluctuations are bounded almost surely.
2. "Rough" (or "liquid") if the covariance in the height function $E[h(x_1)h(x_2)] - E[h(x_1)]E[h(x_2)]$ is unbounded as $|x_1 - x_2| \to \infty$.
3. "Smooth" (or "gaseous") if the covariance of the height function is bounded but the height fluctuations are unbounded.
The height fluctuations can be related to the decay of the entries of $K^{-1}$, which are in turn related to the decay of the Fourier coefficients of 1/P. In particular, we have
Theorem 4 The measure $\mu(s, t)$ is respectively frozen, rough, or smooth according to whether $(B_x, B_y) = (\nabla R)^{-1}(s, t)$ is in the closure of an unbounded complementary component of A(P), in the interior of A(P), or in the closure of a bounded complementary component of A(P).
The characteristic polynomials P which occur in the dimer model are not arbitrary: their algebraic curves {P = 0} are all of a special type known as Harnack curves, which are characterized by the fact that the map from the zero-set of P in $\mathbb{C}^2$ to its amoeba in $\mathbb{R}^2$ is at most two-to-one. In fact:
Theorem 5 By varying the edge energies all Harnack curves can be obtained as the characteristic polynomial of a planar dimer model.
Local Statistics
In the thermodynamic limit (on a periodic planar graph), local statistics of dimer coverings for the Gibbs measure of minimal free energy can be obtained from the limit of the inverse of the Kasteleyn matrix on the finite toroidal graphs $G_n$. This in turn can be computed from the Fourier coefficients of 1/P. As an example, let G be the square grid $\mathbb{Z}^2$ and take E = 0 (which corresponds to the uniform measure on configurations for finite graphs). An appropriate choice of signs for the Kasteleyn matrix is to put weights $1, -1$ on alternate horizontal edges and $i, -i$ on alternate vertical edges in such a way that around each white vertex the weights are cyclically $1, i, -1, -i$. For this choice of signs we have
$K^{-1}_{(0,0),(x,y)} = \frac{1}{(2\pi)^2} \int_0^{2\pi} \int_0^{2\pi} \frac{e^{i(x\theta + y\phi)}}{2\sin\theta + 2i\sin\phi} \, d\theta \, d\phi$
This integral can be evaluated explicitly (see Figure 6 for values of $K^{-1}_{(0,0),(x,y)}$ near the origin; by translation invariance $K^{-1}_{(x_0,y_0),(x,y)} = K^{-1}_{(0,0),(x-x_0,y-y_0)}$, and values in other quadrants can be obtained by $K^{-1}_{(0,0),(x,y)} = iK^{-1}_{(0,0),(-y,x)}$). As a sample computation, using Theorem 2, the probability that the dimer covering the origin points to the right and, simultaneously, the one covering (0, 1) points upwards is
$K_{(0,0),(1,0)} K_{(0,2),(0,1)} \det \begin{pmatrix} K^{-1}_{(0,0),(1,0)} & K^{-1}_{(0,0),(0,1)} \\ K^{-1}_{(0,2),(1,0)} & K^{-1}_{(0,2),(0,1)} \end{pmatrix} = 1 \cdot (-i) \det \begin{pmatrix} \frac{1}{4} & -\frac{i}{4} \\ -\frac{1}{4} + \frac{1}{\pi} & \frac{i}{4} \end{pmatrix} = \frac{1}{4\pi}$
Figure 6 Values of $K^{-1}$ on $\mathbb{Z}^2$ with zero energies.
Another computation which follows is the decay of the edge covariances. If $e_1, e_2$ are two edges at distance d, then $\Pr(e_1 \,\&\, e_2) - \Pr(e_1)\Pr(e_2)$ decays quadratically in 1/d, since $K^{-1}((0, 0), (x, y))$ decays like $1/(|x| + |y|)$.
Scaling Limits
The scaling limit of the dimer model is the limit when the lattice spacing tends to zero. Let us define the scaling limit in the following way. Let $\epsilon\mathbb{Z}^2$ be the square grid scaled by $\epsilon$, so the lattice mesh size is $\epsilon$. Fix a Jordan domain $U \subset \mathbb{R}^2$ and consider for each $\epsilon$ a subgraph $U_\epsilon$ of $\epsilon\mathbb{Z}^2$, bounded by a simple polygon, which tends to U as $\epsilon \to 0$. We are interested in limiting properties of random dimer coverings of $U_\epsilon$, in the limit as $\epsilon \to 0$, for example, the fluctuations of the height function and edge densities. The limit depends on the (sequence of) boundary conditions, that is, on the exact choice of approximating regions $U_\epsilon$. By changing $U_\epsilon$ one can change the limiting rescaled height function along the boundary. It is conjectured that the limit of the height function along the boundary of $U_\epsilon$ (scaled by . . . and assuming this limit exists) determines essentially all of the limiting behavior in the interior, in particular the limiting local statistics. Therefore, let u be a real-valued continuous function on the boundary of U. Consider a sequence of subgraphs $U_\epsilon$ of $\epsilon\mathbb{Z}^2$, as $\epsilon \to 0$ as above, whose height function along the boundary, when scaled by $\epsilon$, approximates u. We discuss the limit of the model in this setting.
Crystalline Surfaces
The height function allows us to view dimer coverings as random surfaces in $\mathbb{R}^3$: to a dimer covering of G, one associates the graph of its height function, extended in a piecewise linear fashion over the edges and faces of the dual $G^*$. These surfaces are then piecewise linear random surfaces, which resemble crystal surfaces in the sense that microscopically (on the scale of the lattice) they are rough, whereas their long-range behavior is smooth and facetted, as we now describe. In the scaling limit, boundary conditions as described in the last paragraph of the previous section are referred to as "wire-frame" boundary conditions, since the graph of the height function can be thought of as a (random) surface spanning the wire frame defined by its boundary values. In the scaling limit, there is a law of large numbers which says that the Gibbs measure on random surfaces (which is unique since we are dealing with a finite graph) concentrates, for fixed wire-frame boundary conditions, on a single surface $S_0$. That is, as the lattice spacing tends to zero, with probability tending to 1 the random surface lies close to a limiting surface $S_0$. The surface $S_0$ is the unique surface which minimizes the total surface tension, or free energy, for its fixed boundary values, that is, minimizes the integral over the surface of $F(\mu(s, t))$, where (s, t) is the slope of the surface at the point being integrated over. Existence and uniqueness of the minimizer follow from the strict convexity of the free energy/surface tension as a function of the slope. At a point where the free energy has a cusp, the crystal surface $S_0$ will in general have a facet, that is, a region on which it is linear. Outside of the facets, one expects that $S_0$ is analytic, since the free energy is analytic outside the cusps.
Fluctuations
While the scaled height function $\epsilon h$ in the scaling limit converges to its mean value $h_0$ (whose graph is the surface $S_0$), the fluctuations of the unrescaled height function $h - (1/\epsilon)h_0$ will converge in law to a random process on U. In the simplest setting, that of honeycomb dimers with $E \equiv 0$, and in the absence of facets, the height fluctuations converge to a continuous Gaussian process, the image of the Gaussian free field on the unit disk D under a certain diffeomorphism $\Phi$ (depending on $h_0$) of D to U. In the particular case $h_0 = 0$, $\Phi$ is the Riemann map from D to U and the law of the height fluctuations is just the Gaussian free field on U (defined to be the Gaussian process whose covariance kernel is the Dirichlet Green's function). The conformal invariance of the Gaussian free field is the basis for a number of conformal invariance properties of the honeycomb dimer model.
Densities of Motifs
Another observable of interest is the density field of a motif. A motif is a finite collection of edges, taken up to translation. For example, consider, for the square grid, the "L" motif consisting of a horizontal domino and a vertical domino aligned to form an "L," which we showed above to have a density $1/(4\pi)$ in the thermodynamic limit. The probability of seeing this motif at any given place is $1/(4\pi)$. However, in the scaling limit one can ask about the fluctuations of the occurrences of this motif: in a large ball around a point x, what is the distribution of $N_L - A/(4\pi)$, where $N_L$ is the number of occurrences of the motif, and A is the area of the ball? These fluctuations form a random field, since there is a long-range correlation between occurrences of the motif. It is known that on $\mathbb{Z}^2$, for the minimal free energy ergodic Gibbs measure, the rescaled density field
$\frac{1}{\sqrt{A}} \left( N_L - \frac{A}{4\pi} \right)$
converges as $\epsilon \to 0$ weakly to a Gaussian random field which is a linear combination of a directional derivative of the Gaussian free field and an independent white noise. A similar result holds for other motifs. The joint distribution of densities of several motifs can also be shown to be Gaussian.
See also: Combinatorics: Overview; Determinantal Random Fields; Growth Processes in Random Matrix Theory; Statistical Mechanics and Combinatorial Problems; Statistical Mechanics of Interfaces.
Dirac Fields in Gravitation and Nonabelian Gauge Theory 67
Further Reading
Cohn H, Kenyon R, and Propp J (2001) A variational principle for domino tilings. Journal of the American Mathematical Society 14(2): 297–346.
Kasteleyn P (1967) Graph theory and crystal physics. In: Graph Theory and Theoretical Physics, pp. 43–110. London: Academic Press.
Kenyon R, Okounkov A, and Sheffield S (2005) Dimers and Amoebae. Annals of Mathematics, math-ph/0311005.
Dirac Fields in Gravitation and Nonabelian Gauge Theory
J A Smoller, University of Michigan, Ann Arbor, MI, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
In this article we describe some recent results (Finster et al. 1999a,b, 2000a–c, 2002a) concerning the existence of both particle-like and black hole solutions of the coupled Einstein–Dirac–Yang–Mills (EDYM) equations. We show that there are stable globally defined static, spherically symmetric solutions. We also show that for static black hole solutions, the Dirac wave function must vanish identically outside the event horizon. The latter result indicates that the Dirac particle (fermion) must either enter the black hole or tend to infinity. The plan of the article is as follows. The next section describes the background material. It is followed by a discussion of the coupled EDYM equations for static, spherically symmetric particle-like and black hole solutions. The final section of the article is devoted to a discussion of these results.
Background Material
Einstein's Equations
We begin by describing the Einstein equation for the gravitational field (for more details, see, e.g., Adler et al. (1975)). We first note Einstein's hypotheses of general relativity (GR):
(E1) The gravitational field is the metric $g_{ij}$ in 3 + 1 spacetime dimensions. The metric is assumed to be symmetric.
(E2) At each point in spacetime, the metric can be diagonalized as diag(−1, 1, 1, 1).
(E3) The equations which describe the gravitational field should be covariant; that is, independent of the choice of coordinate system.
The hypothesis (E1) is Einstein's brilliant insight, whereby he "geometrizes" the gravitational field. (E2) means that there are inertial frames at each point (but not globally), and guarantees that special relativity (SR) is included in GR, while (E3) implies that the gravitational field equations must be tensor equations; that is, coordinates are an artifact, and physics should not depend on the choice of coordinates.
Einstein's Equations of GR
The metric $g_{ij} = g_{ij}(x)$, $i, j = 0, 1, 2, 3$, $x = (x^0, x^1, x^2, x^3)$, $x^0 = ct$ (c = speed of light, t = time), is the metric tensor defined on four-dimensional spacetime. Einstein's equations are ten (tensor) equations for the unknown metric $g_{ij}$ (gravitational field), and take the form
$R_{ij} - \tfrac{1}{2} R g_{ij} = \kappa T_{ij}$   [1]
where the left-hand side $G_{ij} = R_{ij} - \tfrac{1}{2} R g_{ij}$ is the Einstein tensor and depends only on the geometry, $\kappa = 8\pi G/c^4$, where G is Newton's gravitational constant, while $T_{ij}$, the energy–momentum tensor, represents the source of the gravitational field, and encodes the distribution of matter. (The word "matter" in GR refers to everything which can produce a gravitational field, including elementary particles, electromagnetic or Yang–Mills (YM) fields.) From the Bianchi identities in geometry (cf. Adler et al. (1975)), the (covariant) divergence of the Einstein tensor, $G_{ij}$, vanishes identically, namely
$G^{j}_{i;j} = 0$
so, on solutions of Einstein's equations,
$T^{j}_{i;j} = 0$
and this in turn expresses the conservation of energy and momentum. The quantities which comprise the Einstein tensor are given as follows: first, from the metric tensor $g_{ij}$, we form the Levi-Civita connection $\Gamma^k_{ij}$ defined by:
$\Gamma^k_{ij} = \tfrac{1}{2} g^{k\ell} \left( \frac{\partial g_{\ell j}}{\partial x^i} + \frac{\partial g_{i\ell}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^\ell} \right)$
where (4 × 4 matrix) $[g^{k\ell}] = [g_{k\ell}]^{-1}$, and summation convention is employed; namely, an index which appears as both a subscript and a superscript is to be summed from 0 to 3. With the aid of $\Gamma^k_{ij}$, we can construct the celebrated Riemann curvature tensor $R^i_{qk\ell}$:
$R^i_{qk\ell} = \frac{\partial \Gamma^i_{q\ell}}{\partial x^k} - \frac{\partial \Gamma^i_{qk}}{\partial x^\ell} + \Gamma^i_{pk}\Gamma^p_{q\ell} - \Gamma^i_{p\ell}\Gamma^p_{qk}$
Finally, the terms $R_{ij}$ and R which appear in the Einstein tensor $G_{ij}$ are given by
$R_{ij} = R^s_{isj}$
(the Ricci tensor), and
$R = g^{ij} R_{ij}$
is the scalar curvature. From the above definitions, one sees at once the enormous complexity of the Einstein equations. For this reason, one usually seeks solutions which have a high degree of symmetry, and in what follows, in this section, we shall only consider static, spherically symmetric solutions; that is, solutions which depend only on $r = |x| = \sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}$. In this case, the metric $g_{ij}$ takes the form
$ds^2 = -T(r)^{-2} dt^2 + A(r)^{-1} dr^2 + r^2 d\Omega^2$   [2]
where $d\Omega^2 = d\theta^2 + \sin^2\theta \, d\varphi^2$ is the standard metric on the unit 2-sphere, $r, \theta, \varphi$ are the usual spherical coordinates, and t denotes time.
Black Hole Solutions
Consider the problem of finding the gravitational field outside a ball of mass M in $\mathbb{R}^3$; that is, there is no matter exterior to the ball. Solving Einstein's equations $G_{ij} = 0$ gives the famous Schwarzschild solution (1916):
$ds^2 = -\left(1 - \frac{2m}{r}\right) c^2 dt^2 + \left(1 - \frac{2m}{r}\right)^{-1} dr^2 + r^2 d\Omega^2$   [3]
where $m = GM/c^2$. Since 2m has the dimensions of length, it is called the Schwarzschild radius. Observe that when r = 2m, the metric is singular; namely, $g_{tt} = 0$ and $g_{rr} = \infty$. By transforming the metric [3] to the so-called Kruskal coordinates (cf. Adler et al. (1975)), one observes that the Schwarzschild sphere r = 2m has the physical characteristics of a black hole: light and nearby particles can enter the region r < 2m, nothing can exit this region, and there is an intrinsic (nonremovable) singularity at the center r = 0. For the general metric [2], we define a black hole solution of Einstein's equations to be a solution which satisfies, for some $\rho > 0$,
$A(\rho) = 0, \qquad A(r) > 0 \text{ if } r > \rho$
Here $\rho$ is called the radius of the black hole, or the event horizon.
Yang–Mills Equations
The YM equations generalize Maxwell's equations. To see how this comes about, we first write Maxwell's equations in an invariant way. Thus, let A denote a scalar-valued 1-form:
$A = A_i \, dx^i, \qquad A_i \in \mathbb{R}$
which is called the electromagnetic potential (by physicists), or a connection (by geometers). The electromagnetic field (curvature) is the 2-form
$F = dA$
In local coordinates,
$F = F_{ij} \, dx^i \wedge dx^j, \qquad F_{ij} = \frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j}$
In this framework, Maxwell's equations are given by
$d{*F} = 0, \qquad dF = 0$   [4]
where $*$ is the Hodge star operator, mapping 2-forms to 2-forms (in $\mathbb{R}^4$), and is defined by
$(*F)_{k\ell} = \tfrac{1}{2} \sqrt{|g|} \, \varepsilon_{ijk\ell} F^{ij}$
where $g = \det(g_{ij})$ and $\varepsilon_{ijk\ell}$ is the completely antisymmetric symbol defined by $\varepsilon_{ijk\ell} = \mathrm{sgn}(ijk\ell)$. As usual, indices are raised (or lowered) via the metric, so that, for example,
$F^{ij} = g^{\ell i} g^{mj} F_{\ell m}$
It is important to notice that $*F$ depends on the metric. Note also that Maxwell's equations are linear equations for the $A_i$'s. The YM equations generalize Maxwell's equations and can be described as follows. With each YM field (described below) is associated a compact Lie group G called the gauge group. For such G, we denote its Lie algebra by $\mathfrak{g}$, defined to be the tangent space at the identity of G. Now let A be a $\mathfrak{g}$-valued 1-form
$A = A_i \, dx^i$
where each $A_i$ is in $\mathfrak{g}$. In this case, the curvature 2-form is defined by
$F = dA + A \wedge A$
or, in local coordinates,
$F_{ij} = \frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j} + [A_i, A_j]$
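The role of the commutator term in the curvature formula can be seen in a small computation. The sketch below assumes constant connection coefficients, so that the derivative terms drop out and $F_{ij} = [A_i, A_j]$, and takes the su(2) basis $\tau_k = -(i/2)\sigma_k$ built from the Pauli matrices; this basis and the helper names are illustrative choices, not notation from the article. For an abelian algebra such as u(1) the same computation gives zero.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def scale(factor, a):
    return [[factor * entry for entry in row] for row in a]

# Pauli matrices
sigma1 = [[0, 1], [1, 0]]
sigma2 = [[0, -1j], [1j, 0]]
sigma3 = [[1, 0], [0, -1]]

# Anti-Hermitian basis of su(2), chosen so that [tau1, tau2] = tau3
tau1, tau2, tau3 = (scale(-0.5j, s) for s in (sigma1, sigma2, sigma3))

# Constant connection A_1 = tau1, A_2 = tau2: the curvature component F_12
# reduces to the commutator, which is nonzero for this nonabelian algebra.
F12_su2 = commutator(tau1, tau2)

# u(1): the coefficients are 1 x 1 imaginary matrices and always commute.
F12_u1 = commutator([[0.3j]], [[1.7j]])
```

Here `F12_su2` equals `tau3`, while `F12_u1` is the zero matrix, which is the distinction between the nonlinear YM case and the linear Maxwell case.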
The commutator $[A_i, A_j] = 0$ if G is an abelian group, but is generally nonzero if G is a matrix group. In this framework, the YM equations can be written in the form $d{*F} = 0$, where now d is an appropriately defined covariant exterior derivative. For Maxwell's equations, the gauge group G = U(1) (the circle group $\{e^{i\theta} : \theta \in \mathbb{R}\}$) so $\mathfrak{g}$ is abelian and we recover Maxwell's equations from the YM equations. Observe that if G is nonabelian, then the YM equations $d{*F} = 0$ are nonlinear equations for the connection coefficients $A_i$.
The Dirac Equation in Curved Spacetime
The Dirac equation is a generalization of Schrödinger's equation, in a relativistic setting (Bjorken and Drell 1964). It thus combines quantum mechanics with the theory of relativity. In addition, the Dirac equation also describes the intrinsic "spin" of fermions and, for this reason, solutions of the Dirac equation are often called spinors. The Dirac equation can be written as
$(G - m)\Psi = 0$   [5]
where G is the Dirac operator, m is the mass of the Dirac particle (fermion), and $\Psi$ is a complex-valued 4-vector called the wave function, or spinor. The Dirac operator G is of the form
$G = iG^j(x) \frac{\partial}{\partial x^j} + B(x)$   [6]
where $G^j$ as well as B are 4 × 4 matrices, m is the (rest) mass of the fermion, and $i = \sqrt{-1}$. The Dirac equation is thus a linear equation for the spinors. The $G^j$ (called Dirac matrices) and the Lorentzian metric $g_{ij}$ are related by
$g^{jk} I = \tfrac{1}{2} \{G^j, G^k\}$   [7]
where $\{G^j, G^k\}$ is the anticommutator
$\{G^j, G^k\} = G^j G^k + G^k G^j$
Thus, the Dirac matrices depend on the underlying metric in four-dimensional spacetime. Suppose that H is a spacelike hypersurface in $\mathbb{R}^4$, with future-directed normal vector $\nu = \nu(x)$, and let $d\mu$ be the invariant measure on H induced by the metric $g_{ij}$. We define a scalar product on solutions $\Psi, \Phi$ of the Dirac equation by
$\langle \Psi | \Phi \rangle = \int_H \bar{\Psi} G^j \Phi \, \nu_j \, d\mu$   [8]
This scalar product is positive definite, and because of current conservation (cf. Finster (1988))
$\nabla_j \bar{\Psi} G^j \Phi = 0$
it is also independent of H. By generalizing the expression (due to Dirac), $\bar{\Psi} \gamma^0 \Psi = |\Psi|^2$, in Minkowski space, where $\gamma^0$ and $\bar{\Psi}$, the adjoint spinor, are defined by
$\gamma^0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad \bar{\Psi} = \Psi^* \gamma^0$
where $*$ denotes complex conjugation, and 1 is the 2 × 2 identity matrix, the quantity $\bar{\Psi} G^j \Psi \nu_j$ is interpreted as the probability density of the Dirac particle. We normalize solutions of the Dirac equation by requiring
$\langle \Psi | \Psi \rangle = 1$   [9]
Spherically Symmetric EDYM Equations
In the remainder of this article we assume that all fields are spherically symmetric, so they depend only on the variable r = |x|. In this case, the Lorentzian metric in polar coordinates $(t, r, \theta, \varphi)$ takes the form [2]. The Dirac wave function can be (Finster et al. 2000b) described by two real functions, $(\alpha(r), \beta(r))$, and the potential w(r) corresponds to the magnetic component of an SU(2) YM field. As shown in Finster et al. (2000b), the EDYM equations are
$\sqrt{A} \, \alpha' = \frac{w}{r} \alpha - (m + \omega T) \beta$   [10]
$\sqrt{A} \, \beta' = (-m + \omega T) \alpha - \frac{w}{r} \beta$   [11]
$r A' = 1 - A - \frac{1}{e^2} \frac{(1 - w^2)^2}{r^2} - 2\omega T^2 (\alpha^2 + \beta^2) - \frac{2}{e^2} A w'^2$   [12]
$2rA \frac{T'}{T} = -1 + A + \frac{1}{e^2} \frac{(1 - w^2)^2}{r^2} + 2mT(\alpha^2 - \beta^2) - 2\omega T^2 (\alpha^2 + \beta^2) + 4 \frac{T}{r} w \alpha\beta - \frac{2}{e^2} A w'^2$   [13]
$r^2 A w'' = -(1 - w^2) w + e^2 r T \alpha\beta - r^2 \frac{A'T - 2AT'}{2T} w'$   [14]
Equations [10] and [11] are the Dirac equations, [12] and [13] are the Einstein equations, and [14] is the YM equation. The constants m, $\omega$, and e denote, respectively, the rest mass of the Dirac particle, its energy, and the YM coupling constant.
Nonexistence of Black Hole Solutions
Let the surface $r = \rho > 0$ represent a black hole event horizon:
$A(\rho) = 0, \qquad A(r) > 0 \text{ if } r > \rho$   [15]
In this case, the normalization condition [9] is replaced by
$\int_{r_0}^{\infty} (\alpha^2 + \beta^2) \frac{\sqrt{T}}{A} \, dr < \infty, \quad \text{for every } r_0 > \rho$   [16]
In addition, we assume that the following global conditions hold:
$\lim_{r\to\infty} r(1 - A(r)) = M < \infty$   [17]
(finite mass),
$\lim_{r\to\infty} T(r) = 1$   [18]
(gravitational field is asymptotically flat Minkowskian), and
$\lim_{r\to\infty} (w(r)^2, w'(r)) = (1, 0)$   [19]
(the YM field is well behaved). Concerning the event horizon $r = \rho$, we make the following regularity assumptions:
1. The volume element $\sqrt{|\det g_{ij}|} = |\sin\theta| \, r^2 A^{-1} T^{-2}$ is smooth and nonzero on the horizon; that is, $T^{-2} A^{-1}, \; T^2 A \in C^1([\rho, \infty))$
2. The strength of the YM field $F_{ij}$ is given by
$\mathrm{tr}(F_{ij} F^{ij}) = \frac{2Aw'^2}{r^4} + \frac{(1 - w^2)^2}{r^4}$
(cf. Bartnik and McKinnon 1988). We assume that this scalar is bounded near the horizon; that is, outside the event horizon and near $r = \rho$, assume that
w and $Aw'^2$ are bounded   [20]
3. The function A(r) is monotone increasing outside of and near the event horizon.
As discussed in Finster et al. (1999a), if assumption 1 or 2 were violated, then an observer freely falling into the black hole would feel strong forces when crossing the horizon. Assumption 3 is considerably weaker than the corresponding assumption in Finster et al. (1999b), where, indeed, it was assumed that the function A(r) obeyed a power law $A(r) = c(r - \rho)^s + O((r - \rho)^{s+1})$, with positive constants c and s, for $r > \rho$. The main result in this subsection is the following theorem:
Theorem 1 Every black hole solution of the EDYM equations [10]–[14] satisfying the regularity conditions 1–3 cannot be normalized and coincides with a Bartnik–McKinnon (BM) black hole of the corresponding Einstein–Yang–Mills (EYM) equations; that is, the spinors $\alpha$ and $\beta$ must vanish identically outside the event horizon.
Remark Smoller and Wasserman (1998) proved that any black hole solution of the EYM equations that has finite mass (i.e., that satisfies [17]) must be one of the BM black hole solutions (Bartnik and McKinnon 1988) whose existence was first demonstrated in Smoller et al. (1993). Thus, amending the EYM equations by taking quantum-mechanical effects into account, in the sense that both the gravitational and YM fields can interact with Dirac particles, does not yield any new types of black hole solutions.
The present strategy in proving this theorem is to assume that we have a black hole solution of the EDYM equations [10]–[18] satisfying assumptions 1–3, where the spinors do not vanish identically outside of the black hole. We shall show that this leads to a contradiction. The proof is broken up into two cases: either $A^{-1/2}$ is integrable or nonintegrable near the event horizon. We shall only discuss the proof for the case when $A^{-1/2}$ is integrable near the event horizon, leaving the alternate case for the reader to view in Finster et al. (2000a). If $A^{-1/2}$ is integrable, then one shows that there are positive constants c, $\varepsilon$ such that
1 c 2 ðrÞ þ 2 ðrÞ ; c
if
0
Historically, the first relations of that type to be obtained were the Kramers–Kronig relations (1926), which concern the propagation of light in a dielectric medium. In this basic example, $F(\omega)$ represents the complex refractive index of the medium $n'(\omega) = n(\omega) + i\kappa(\omega)$ for a monochromatic wave with frequency $\omega$. The dispersive part $D(\omega)$ is the real refractive index $n(\omega)$, which is the inverse ratio of the phase velocity of the wave in the medium to its velocity $c$ in the vacuum: the fact that it depends on the frequency $\omega$ corresponds precisely to the phenomenon of dispersion of light in a dielectric medium. A slab of the latter thus appears as a prototype of a macroscopic scatterer. The absorptive part $A(\omega)$ is the rate of exponential damping $\kappa(\omega)$ of the wave, caused by the absorption of energy in the medium.

It became clear much later that for many scattering phenomena, dispersion relations can be derived from an appropriate set of general physical principles. This means that inside a certain axiomatic framework these relations are model independent with respect to the detailed structure of the scatterer or to the detailed type of particle interaction in the quantum case. In a very short and oversimplified form, the following logical scheme holds. At first, one can say that any mathematical formulation of a physical principle of causality results in support-type properties with respect to a time variable $t$ of an appropriate “causal structural function” $R(t)$ of the physical system considered: typically, such a causal function should vanish for negative values of $t$. It follows that its Fourier transform $\tilde R$ admits an analytic continuation $\tilde R^{(c)}$ in the upper half-plane of the corresponding conjugate variable, interpreted as a frequency (or an energy in the quantum case): here is the general reason for the occurrence of complex frequencies and of holomorphic functions of such variables. In fact, the relevant holomorphic scattering function $\hat F(\omega^{(c)})$ always appears as generated by $\tilde R^{(c)}$ via some (more or less sophisticated) procedure: in the simplest case, $\hat F$ coincides with $\tilde R^{(c)}$ itself, but this is not so in general. Finally, the derivation of suitable analyticity and boundedness properties of $\hat F(\omega^{(c)})$ in a domain whose typical form is the upper half-plane allows one to apply a Cauchy-type integral representation to this function; the dispersion relations directly follow from the latter.

The first part of this article aims to describe the most typical dispersion relations and their link with the Cauchy integral.
It then presents two basic illustrations of these relations, which are: (1) in classical physics, the Kramers–Kronig relations mentioned above, and (2) in quantum physics, the dispersion relations for the forward scattering of equal-mass particles. The aim of the subsequent parts is to give accounts, as complete as possible, of the derivation of the relevant analyticity domains inside appropriate axiomatic frameworks which, respectively, contain the previous two examples. The simplest axiomatic framework is the one which governs all the phenomena of linear response: in the latter, the proof of analyticity and dispersion relations most easily follows the logical line sketched above. It will be presented together with its application to the derivation of
Dispersion Relations
the Kramers–Kronig relations. The rest of the article is devoted to the derivation of the so-called crossing analyticity domains which are the relevant background of dispersion relations for the two-particle scattering (or collision) amplitudes in particle physics. This derivation relies on the general axiomatic framework of relativistic quantum field theory (QFT) (see Axiomatic Quantum Field Theory) and more specifically on the “analytic program in complex momentum space” of the latter. This framework, whose rigorous mathematical form was settled around 1960, represents the safest conceptual approach for describing the particle collision processes in a range of energies which covers by far all those that can be produced and will be produced in the accelerators for several decades. A simple account of the field-theoretical axiomatic framework and of the logical line of the derivation of dispersion relations will be presented here for the simplest kinematical situations. A broader presentation of the analytic program including an extended class of analyticity properties for the general structure functions and (two-particle and multiparticle) collision amplitudes in QFT can also be found in this encyclopedia (see Scattering in Relativistic Quantum Field Theory: The Analytic Program).

For brevity, we shall not treat here the derivation of dispersion relations in the framework of nonrelativistic potential theory. Concerning the latter, the interested reader can refer to the book by Nussenzveig (1972). A collection of old basic papers on field-theoretical dispersion relations can be found in the review book edited by Klein (1961). For a recent and well-documented review of the multiplicity of versions and applications of dispersion relations and their experimental checking, the reader can consult the article by Vernov (1996).
Typical Dispersion Relations

The possibility of defining the scattering function $\hat F(\omega^{(c)})$ in the full upper half-plane and of exploiting the corresponding boundary value $F$ of $\hat F$ on the negative part as well as on the positive part of the real axis will depend on the framework of considered phenomena. For the moment, we do not consider the more general situations which also occur in particle physics and will be described later (“crossing domains” and “quasi-dispersion-relations”). In the simplest cases, the real and imaginary parts $D$ and $A$ of $F$ are extended to negative values of the variable $\omega$ via additional symmetry relations resulting from appropriate “reality conditions.” As a typical and basic example, there occurs the symmetry relation $\hat F(\omega^{(c)}) = \overline{\hat F(-\bar\omega^{(c)})}$ (with $\omega^{(c)}$ and $-\bar\omega^{(c)}$ in the upper half-plane) and correspondingly $D(\omega) = D(-\omega)$, $A(\omega) = -A(-\omega)$ on the reals; we shall call (S) this symmetry relation. The simplest case of dispersion relations is then obtained when $D$ and $A$ are linked by the reciprocal Hilbert transformations:
$$ D(\omega) = \frac{1}{\pi}\, P \int_{-\infty}^{+\infty} d\omega'\, A(\omega')\, \frac{1}{\omega' - \omega} \qquad [1a] $$
$$ A(\omega) = -\frac{1}{\pi}\, P \int_{-\infty}^{+\infty} D(\omega')\, \frac{1}{\omega' - \omega}\, d\omega' \qquad [1b] $$
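where $P$ denotes Cauchy's principal value, defined for any differentiable function $\varphi(x)$ (sufficiently regular at infinity) by
$$ P \int_{-\infty}^{+\infty} \varphi(x)\, \frac{dx}{x} = \lim_{\varepsilon \to 0} \left[ \int_{-\infty}^{-\varepsilon} \varphi(x)\, \frac{dx}{x} + \int_{\varepsilon}^{+\infty} \varphi(x)\, \frac{dx}{x} \right] \qquad [2] $$
As a matter of fact, the pair of equations [1a], [1b] is equivalent to the following relation for $F := D + iA$:
$$ F(\omega) = \frac{1}{2\pi i} \int_{-\infty}^{+\infty} d\omega'\, F(\omega') \lim_{\varepsilon \to 0} \frac{1}{\omega' - \omega - i\varepsilon} \qquad [3] $$
The latter is obtained as a limiting case of the Cauchy formula
$$ \hat F(\omega^{(c)}) = \frac{1}{2\pi i} \int_{-\infty}^{+\infty} \frac{F(\omega')}{\omega' - \omega^{(c)}}\, d\omega' \qquad [4] $$
expressing the fact that $\hat F$ is holomorphic and sufficiently decreasing at infinity in the upper half-plane $I^+$ of the complex variable $\omega^{(c)}$ and that $F(\omega)$ is the boundary value of $\hat F(\omega^{(c)})$ on all the reals. Finally, one checks that in view of the symmetry relation (S), the Hilbert integral relations between $D$ and $A$ given above reduce to the following dispersion relations:
$$ D(\omega) = \frac{2}{\pi}\, P \int_{0}^{+\infty} A(\omega')\, \frac{\omega'}{\omega'^2 - \omega^2}\, d\omega' \qquad [5a] $$
$$ A(\omega) = -\frac{2\omega}{\pi}\, P \int_{0}^{+\infty} D(\omega')\, \frac{1}{\omega'^2 - \omega^2}\, d\omega' \qquad [5b] $$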
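As a numerical illustration of eqns [1a] and [1b], the sketch below applies the two principal-value integrals to a simple test function that is holomorphic and decreasing in the upper half-plane and satisfies (S). The test function (a damped-oscillator response whose poles lie in the lower half-plane), the helper names `response`, `hilbert_pv` and `check_hilbert_pair`, and the grid parameters are illustrative assumptions, not part of the original text. The principal value is approximated by a midpoint grid placed symmetrically about the singular point, so that the singular contributions cancel in pairs.

```python
import numpy as np

def response(omega, m=1.0, omega0=1.0, gamma=0.2):
    """Assumed test function F = D + iA, holomorphic for Im(omega) > 0."""
    return -1.0 / (m * (omega**2 + 2j * gamma * omega - omega0**2))

def hilbert_pv(func, omega, half_width=400.0, step=0.01):
    """Approximate (1/pi) * P-integral of func(w') / (w' - omega) dw'.

    The midpoint grid is symmetric about omega, so the 1/(w' - omega)
    singularity cancels pairwise (a simple principal-value rule).
    """
    n = int(round(half_width / step))
    offsets = (np.arange(-n, n) + 0.5) * step  # values of w' - omega, never zero
    return np.sum(func(omega + offsets) / offsets) * step / np.pi

def check_hilbert_pair(omega):
    """Return (D, D rebuilt from A via [1a], A, A rebuilt from D via [1b])."""
    f = response(omega)
    d_rebuilt = hilbert_pv(lambda w: response(w).imag, omega)
    a_rebuilt = -hilbert_pv(lambda w: response(w).real, omega)
    return f.real, d_rebuilt, f.imag, a_rebuilt

if __name__ == "__main__":
    for omega in (0.3, 1.0, 2.5):
        d, d_rec, a, a_rec = check_hilbert_pair(omega)
        print(f"omega={omega}: D={d:.6f} vs {d_rec:.6f}, A={a:.6f} vs {a_rec:.6f}")
```

For this test function the rebuilt dispersive and absorptive parts should agree with the direct values up to the truncation and discretization error of the grid; a function with a pole in the upper half-plane would not pass the same check.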
Two Basic Examples
1. The Kramers–Kronig relation in classical optics

It will be shown in the next part that the complex refractive index $n'(\omega) = n(\omega) + i\kappa(\omega)$ of a dielectric medium is the boundary value of a holomorphic function $\hat n'(\omega^{(c)})$ in $I^+$ satisfying the symmetry relation (S), and such that the integral $\int_{-\infty}^{+\infty} |\hat n'(\omega + i\eta) - 1|^2\, d\omega$ is uniformly bounded for all $\eta > 0$. It follows that all the previous relations are satisfied by the function $\hat F(\omega^{(c)}) = \hat n'(\omega^{(c)}) - 1$. In particular, the real refractive index $n(\omega)$ and the “extinction coefficient” $\beta(\omega) := 2\omega\kappa(\omega)/c$ ($c$ being the velocity of light in the vacuum) are linked by the following Kramers–Kronig dispersion relation (corresponding to eqn [5a]):
$$ n(\omega) - 1 = \frac{c}{\pi}\, P \int_{0}^{+\infty} \frac{\beta(\omega')}{\omega'^2 - \omega^2}\, d\omega' \qquad [6] $$
2. Dispersion relation for the forward two-particle scattering amplitude in relativistic quantum physics

One considers the following collision phenomenon in particle physics. A particle 2 with mass $m$, called the target and sitting at rest in the laboratory, is struck by an identical particle 1 with relativistic energy $\omega$ larger than $m$ ($= mc^2$; in high-energy physics, one usually chooses units such that $c = 1$). After the collision, the particle 1 is scattered in all possible directions, $\theta$, of space, according to a certain quantum scattering amplitude $T_\theta(\omega)$, whose modulus is essentially the rate of probability for detecting 1 in the direction $\theta$. The forward scattering amplitude $T_0(\omega)$ corresponds to the detection of 1 in the forward longitudinal direction with respect to its incidence direction towards the target. Let us also assume that the particles carry no charge of any kind, so that each particle coincides with its “antiparticle.” In that case, $T_0(\omega)$ is shown to be the boundary value of a scattering function $\hat T_0(\omega^{(c)})$ enjoying the following properties:

1. it is a holomorphic function in $I^+$ satisfying the symmetry relation (S);
2. its behavior at infinity in $I^+$ is such that the integral
$$ \int_{-\infty}^{+\infty} \left| \frac{\hat T_0(\omega + i\eta)}{(\omega + i\eta)^2} \right|^2 d\omega $$
is uniformly bounded for all $\eta > 0$; and
3. under more specific assumptions on the mass spectrum of the underlying theory, the “absorptive part” $A(\omega) := \mathrm{Im}\, T_0(\omega)$ vanishes for $|\omega| < m$.

Then by applying eqn [5a] to the function $D(\omega) := \mathrm{Re}[(T_0(\omega) - T_0(0))/\omega^2]$ (regular at $\omega = 0$), one obtains the following dispersion relation:
$$ \mathrm{Re}\, T_0(\omega) = T_0(0) + \frac{2\omega^2}{\pi}\, P \int_{m}^{+\infty} A(\omega')\, \frac{1}{\omega'(\omega'^2 - \omega^2)}\, d\omega' \qquad [7] $$
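The intermediate step between eqn [5a] and eqn [7] can be written out as follows. This is only a sketch, under the assumption that $T_0(0)$ is real (which is consistent with property 3, since $\mathrm{Im}\, T_0$ vanishes for $|\omega| < m$), so that the absorptive part associated with $D(\omega)$ is $A(\omega)/\omega^2$:

$$ \mathrm{Re}\, \frac{T_0(\omega) - T_0(0)}{\omega^2} = \frac{2}{\pi}\, P \int_{0}^{+\infty} \frac{A(\omega')}{\omega'^2}\, \frac{\omega'}{\omega'^2 - \omega^2}\, d\omega' = \frac{2}{\pi}\, P \int_{m}^{+\infty} \frac{A(\omega')}{\omega'(\omega'^2 - \omega^2)}\, d\omega' $$

The lower limit becomes $m$ because $A(\omega')$ vanishes for $\omega' < m$; multiplying both sides by $\omega^2$ gives the stated relation.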
Remark In view of (3), the scattering function $\hat T_0(\omega^{(c)})$ admits an analytic continuation as an even function of $\omega^{(c)}$ (still called $\hat T_0$) in the cut-plane $\mathbb{C}_m^{(\mathrm{cut})} := \mathbb{C} \setminus \{\omega \in \mathbb{R};\ |\omega| \ge m\}$. In fact, in view of (S) and (3), the boundary value $T_0$ of $\hat T_0$ satisfies the relation $T_0(\omega) = T_0(-\omega)$ in the real interval $\delta_m := \{\omega \in \mathbb{R};\ -m < \omega < m\}$. Let us then introduce the function $\hat T_0^-(\omega^{(c)}) := \hat T_0(-\omega^{(c)})$ as a holomorphic function of $\omega^{(c)}$ in $I^-$: one sees that the boundary values of $\hat T_0$ and $\hat T_0^-$ from the respective domains $I^+$ and $I^-$ coincide on $\delta_m$ and therefore admit a common analytic continuation throughout this real interval (in view of “Painlevé's lemma” or the “one-dimensional edge-of-the-wedge theorem”). One also notes that in view of (S) the extended function $\hat T_0$ satisfies the “reality condition” $\hat T_0(\bar\omega^{(c)}) = \overline{\hat T_0(\omega^{(c)})}$ in $\mathbb{C}_m^{(\mathrm{cut})}$. The fact that $\hat T_0$ is well defined as an even holomorphic function in the cut-plane $\mathbb{C}_m^{(\mathrm{cut})}$ has been established in the general framework of QFT, as explained in the last part of this article.
Phenomena of Linear Response: Causality and Dispersion Relations in the Classical Domain

The subsequent axiomatic framework and results (due to J S Toll (1952, 1956)) concern any physical system which exhibits the following type of phenomena: whenever it receives some excitation signal, called the input and represented by a real-valued function of time $f_{\mathrm{in}}(t)$ with compact support, the system emits a response signal, called the output and represented by a corresponding real-valued function $f_{\mathrm{out}}(t)$, in such a way that the following postulates are satisfied:

(P1) Linearity. To every linear combination of inputs $a_1 f_{\mathrm{in},1} + a_2 f_{\mathrm{in},2}$, there corresponds the output $a_1 f_{\mathrm{out},1} + a_2 f_{\mathrm{out},2}$.
(P2) Reproducibility or time-translation invariance. Let $\tau$ be a time-translation parameter taking arbitrary real values; to every “time-translated input” $f_{\mathrm{in}}^{(\tau)}(t) := f_{\mathrm{in}}(t - \tau)$, there corresponds the output $f_{\mathrm{out}}^{(\tau)}(t) := f_{\mathrm{out}}(t - \tau)$.
(P3) Causality. The effect cannot precede the cause, namely if $t_{\mathrm{in}}$ and $t_{\mathrm{out}}$ denote respectively the lower bounds of the supports of $f_{\mathrm{in}}(t)$ and $f_{\mathrm{out}}(t)$, then there always holds the inequality $t_{\mathrm{in}} \le t_{\mathrm{out}}$.
(P4) Continuity of the response. There exists some continuity inequality which expresses the fact that a certain norm of the output is majorized by a corresponding norm of the input. The case of an $L^2$-norm inequality of the form $|f_{\mathrm{out}}| \le |f_{\mathrm{in}}|$ is particularly significant: when the norm $|f| := [\int |f(t)|^2\, dt]^{1/2}$ is interpretable as an energy (for the output as well as for the input), it acquires the meaning of a “dissipation” property of the system.

The postulate of linear dependence (P1) of $f_{\mathrm{out}}$ with respect to $f_{\mathrm{in}}$ is obviously satisfied if the response is described by any general kernel $K(t, t')$ such that the following formula makes sense:
$$ f_{\mathrm{out}}(t) = \int_{-\infty}^{+\infty} K(t, t')\, f_{\mathrm{in}}(t')\, dt' \qquad [8] $$
Conversely, the existence of a distribution kernel $K$ can be established rigorously under the continuity assumption postulated in (P4) by using the Schwartz nuclear theorem. In full generality (see our comment in the next paragraph), the kernel $K(t, t')$ appears to be a tempered distribution in the pair of variables $(t, t')$ and the previous integral formula holds in the sense of distributions, which means that both sides of eqn [8] must be considered as tempered distributions (in $t$) acting on any smooth test-function $g(t)$ in the Schwartz space $\mathcal{S}$. (Note, for instance, that the trivial linear application $f_{\mathrm{out}} = f_{\mathrm{in}}$ is represented by the kernel $K(t, t') = \delta(t - t')$.) From the reproducibility postulate (P2), it follows that the distribution $K$ can be identified with a distribution of the single variable $\tau = t - t'$, namely $K(t, t') := R(t - t')$. Moreover, the real-valuedness condition imposed on the pairs $(f_{\mathrm{in}}, f_{\mathrm{out}})$ entails that $R$ is real. Finally, the causality postulate (P3) implies that the support of the distribution $R$ is contained in the positive real axis, so that one can write, in the sense of distributions,
$$ f_{\mathrm{out}}(t) = \int_{-\infty}^{t} R(t - t')\, f_{\mathrm{in}}(t')\, dt' \qquad [9] $$
The convolution kernel $R(t - t')$ is typically what one calls in physics a “retarded kernel.” If we now introduce the frequency variable $\omega$, which is the conjugate of the time variable $t$, by the Fourier transformation
$$ \tilde f(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{i\omega t}\, dt $$
we see that the convolution equation [9] is equivalent to the following one:
$$ \tilde f_{\mathrm{out}}(\omega) = \tilde R(\omega)\, \tilde f_{\mathrm{in}}(\omega) \qquad [10] $$
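The equivalence between the causal convolution [9] and the multiplicative form [10] can be illustrated on sampled signals. The sketch below is a discrete analogue only: the sampled kernel, the input pulse, and all names are illustrative assumptions. NumPy's FFT uses the opposite sign in the exponent from the Fourier transformation defined above, which does not affect the product rule being checked.

```python
import numpy as np

def causal_response(kernel, f_in):
    """Discrete analogue of eqn [9]: f_out[n] = sum over k <= n of R[n-k] * f_in[k]."""
    return np.convolve(kernel, f_in)  # full linear convolution

def product_in_frequency(kernel, f_in):
    """Discrete analogue of eqn [10], using zero-padded FFTs."""
    size = len(kernel) + len(f_in) - 1
    spectrum = np.fft.fft(kernel, size) * np.fft.fft(f_in, size)
    return np.fft.ifft(spectrum).real

dt = 0.01
t = np.arange(0.0, 20.0, dt)

# Assumed retarded kernel, sampled only for t >= 0 (it vanishes for t < 0).
kernel = np.exp(-0.2 * t) * np.sin(t) * dt

# Assumed input: a rectangular pulse whose support starts at index 500 (t_in = 5.0).
f_in = np.zeros_like(t)
f_in[500:600] = 1.0

f_out = causal_response(kernel, f_in)
f_out_fft = product_in_frequency(kernel, f_in)

print("max difference between [9] and [10]:", np.max(np.abs(f_out - f_out_fft)))
print("output before the input starts:", np.max(np.abs(f_out[:500])))
```

Because the kernel has no samples at negative times, the output is identically zero before the input pulse begins, which is the discrete counterpart of postulate (P3).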
In eqn [10], the Fourier transform $\tilde R(\omega)$ of $R$ is a tempered distribution, which is the boundary value from the upper half-plane $I^+$ of a holomorphic function $\tilde R^{(c)}(\omega^{(c)})$, called the Fourier–Laplace transform of $R$. $\tilde R^{(c)}$ is defined for all $\omega^{(c)} = \omega + i\eta$, with $\eta > 0$, by the following formula in which the exponential is a good test-function for the distribution $R$ (since it is exponentially decreasing for $t \to +\infty$):
$$ \tilde R^{(c)}(\omega^{(c)}) = \int_{0}^{+\infty} R(t)\, e^{i\omega^{(c)} t}\, dt \qquad [11] $$
More precisely, the tempered-distribution character of $R$ is strictly equivalent to the fact that $\tilde R^{(c)}$ is of moderate growth both at infinity and near the reals in $I^+$, namely that it satisfies a majorization of the following form for some real positive numbers $p$ and $q$:
$$ |\tilde R^{(c)}(\omega + i\eta)| \le C\, \frac{(1 + |\omega|^2 + \eta^2)^q}{\eta^p} \qquad [12] $$
We thus conclude from eqn [10] that each phenomenon of linear response is represented very simply in the frequency variable by the multiplicative operator $\tilde R(\omega)$, whose analytic continuation $\tilde R^{(c)}(\omega^{(c)})$ is called the (causal) response function.

A Typical Illustration: The Damped Harmonic Oscillator
We consider the motion $x = x(t)$ of a damped harmonic oscillator of mass $m$ subjected to an external force $F(t)$. The force is the input ($f_{\mathrm{in}} = F$) and the resulting motion is the output, namely $f_{\mathrm{out}}(t) = x(t)$. All the previous general postulates (P1)–(P4) are then satisfied, but this particular model is, of course, governed by its dynamical equation
$$ x''(t) + 2\gamma\, x'(t) + \omega_0^2\, x(t) = \frac{F(t)}{m} \qquad [13] $$
where $\omega_0$ is the eigenfrequency of the oscillator and $\gamma$ is the damping constant ($\gamma > 0$). The relevant solution of this second-order differential equation with constant coefficients is readily obtained in terms of the Fourier transforms $\tilde x(\omega)$ of $x(t)$ and $\tilde F(\omega)$ of $F(t)$. One can in fact replace eqn [13] by the equivalent equation
$$ (-\omega^2 - 2i\gamma\omega + \omega_0^2)\, \tilde x(\omega) = \frac{\tilde F(\omega)}{m} \qquad [14] $$
whose solution is of the form [10], namely $\tilde x(\omega) = \tilde R(\omega)\, \tilde F(\omega)$, with
$$ \tilde R(\omega)\, \tilde F(\omega) = -\frac{\tilde F(\omega)}{m(\omega^2 + 2i\gamma\omega - \omega_0^2)} = -\frac{\tilde F(\omega)}{m(\omega - \omega_1)(\omega - \omega_2)} \qquad [15a] $$
$$ \omega_{1,2} = \pm(\omega_0^2 - \gamma^2)^{1/2} - i\gamma \qquad [15b] $$
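The closed form [15] can be checked against the Fourier–Laplace formula [11] numerically. The sketch below assumes an underdamped oscillator (eigenfrequency larger than the damping constant) and uses the standard time-domain retarded solution of eqn [13] for a unit impulse as the kernel $R(t)$; that kernel, the parameter values, and the function names are assumptions introduced for illustration.

```python
import numpy as np

M, OMEGA0, GAMMA = 1.0, 1.0, 0.2  # assumed parameters, with OMEGA0 > GAMMA

def response_closed_form(omega):
    """Response function read off from eqn [15a]."""
    return -1.0 / (M * (omega**2 + 2j * GAMMA * omega - OMEGA0**2))

def poles():
    """The two poles of eqn [15b]."""
    root = np.sqrt(OMEGA0**2 - GAMMA**2)
    return root - 1j * GAMMA, -root - 1j * GAMMA

def retarded_kernel(t):
    """Assumed kernel R(t) for t >= 0: the underdamped impulse response."""
    root = np.sqrt(OMEGA0**2 - GAMMA**2)
    return np.exp(-GAMMA * t) * np.sin(root * t) / (M * root)

def response_from_kernel(omega, t_max=150.0, dt=1e-3):
    """Trapezoidal estimate of the integral of R(t) exp(i omega t) over t >= 0."""
    t = np.arange(0.0, t_max + dt, dt)
    y = retarded_kernel(t) * np.exp(1j * omega * t)
    return np.sum(0.5 * (y[1:] + y[:-1])) * dt

if __name__ == "__main__":
    print("poles:", poles())
    for omega in (0.5, 1.0, 2.0, 0.7 + 0.3j):
        print(omega, response_closed_form(omega), response_from_kernel(omega))
```

The last sample point is taken inside the upper half-plane, where the integral [11] converges even faster; both poles have negative imaginary part for any positive damping constant.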
It is clear that the rational function defined by eqns [15] admits an analytic continuation in the full complex plane of $\omega^{(c)}$ minus the pair of simple poles $(\omega_1, \omega_2)$ which lie in the lower half-plane. In particular, it is holomorphic (and decreasing at infinity) in $I^+$, as expected from the previous general result. Moreover, this example suggests that for any particular phenomenon of linear response, the details of the dynamics are encoded in the singularities of the holomorphic scattering function $\tilde R^{(c)}(\omega^{(c)})$, which all lie in the lower half-plane. The validity of a dispersion relation only expresses the analyticity (and decrease at infinity) of that function in the upper half-plane, which is model independent.

Remark The same mathematical analysis applies to any electric oscillatory circuit, in which the capacitance, inductance, and resistance are involved in place of the parameters $m$, $\omega_0$ and $\gamma$: $f_{\mathrm{in}}$ and $f_{\mathrm{out}}$ correspond respectively to an external electric potential and to the current induced in the circuit; the response function is the admittance of the circuit.

Application to the Kramers–Kronig Relation
The background of the Kramers–Kronig relation [6], namely the analyticity and boundedness properties of the complex refractive index $\hat n'(\omega^{(c)})$ in $I^+$, is provided by the previous axiomatic framework. However, it is not the quantity $\hat n'(\omega^{(c)})$ itself but appropriate functions of the latter which play the role of causal response functions; two phenomena can in fact be exhibited, which both contribute to proving the relevant properties of $\hat n'(\omega^{(c)})$.

1. Propagation of light in a dielectric slab with thickness $\delta$. One considers the wave front $f_{\mathrm{in}}(t)$ of an incoming wave normally incident upon the slab, with Fourier decomposition
$$ f_{\mathrm{in}}(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde f(\omega)\, e^{-i\omega t}\, d\omega \qquad [16] $$
After having traveled through the medium, it gives rise to an outgoing wave $f_{\mathrm{out}}(t)$ on the exit face of the slab, whose Fourier decomposition can be written as follows (provided the thickness $\delta$ of the slab is very small):
$$ f_{\mathrm{out}}(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde f(\omega)\, e^{-i\omega (t - n'(\omega)\delta/c)}\, d\omega \qquad [17] $$
In the latter, the real part of $n'(\omega)/c$ is the inverse of the light velocity in the medium, while its imaginary part takes into account the exponential damping of the wave. The output $f_{\mathrm{out}}$ thus appears as a causal linear response with respect to $f_{\mathrm{in}}$ (since $f_{\mathrm{out}}$ “starts after” $f_{\mathrm{in}}$). According to the general formula [10], the corresponding response function $\tilde R^{(c)}$ can be directly computed from eqns [16] and [17], which yields:
$$ \tilde R^{(c)}(\omega^{(c)}) = e^{i\omega^{(c)} \hat n'(\omega^{(c)})\, \delta/c} \qquad [18] $$
In view of the previous axiomatic analysis, $\tilde R^{(c)}$ has to be holomorphic and of moderate growth in $I^+$, and since this holds for all sufficiently small $\delta$, it can be shown that the function $\hat n'(\omega^{(c)})$ itself is holomorphic and of moderate growth in $I^+$ (no logarithmic singularity can be produced).

2. Polarization of the medium produced by an electric field. The dielectric polarization signal $P(t)$ produced at a point of a medium by an external electric field $E(t)$ is also a phenomenon of linear response which obeys the postulates (P1)–(P4); the corresponding formula [10] reads
$$ \tilde P(\omega) = \tilde\chi'(\omega)\, \tilde E(\omega) \qquad [19a] $$
where $\tilde\chi'$ is the complex dielectric susceptibility of the medium, which is related to $n'$ by Maxwell's relation
$$ \tilde\chi'(\omega) = \frac{n'^2(\omega) - 1}{4\pi} \qquad [19b] $$
One thus recovers the fact that $\tilde\chi'$ admits an analytic continuation in $I^+$; one can also show by a physical argument that $\tilde\chi'(\omega)$, and thereby $n'(\omega) - 1$, tends to zero as a constant divided by $\omega^2$ when $\omega$ tends to infinity. This behavior at infinity extends to $\hat n'(\omega^{(c)}) - 1$ in $I^+$ in view of the Phragmén–Lindelöf theorem, since $\hat n'$ is known (from (1)) to be of moderate growth. This justifies the analytic background of the Kramers–Kronig relation.
From Relativistic QFT to the Dispersion Relations of Particle Physics: Historical Considerations and General Survey

In the quantum domain, the derivation of dispersion relations for the two-particle scattering (or collision) amplitudes of particle physics represented, from 1956 and throughout the 1960s, an important conceptual advance in the theoretical treatment of that branch of physics. These phenomena are described in a quantum-theoretical framework in which the basic kinematical variables are the energies and momenta of the particles involved. These variables play the role of the frequency of light in the optical scattering phenomena. Moreover, since large energies and momenta are involved, which allow the occurrence of particle creation
according to the conservation laws of special relativity, it is necessary to use a relativistic quantum-mechanical framework. Around 1950, the success of the quantum electrodynamics formalism for computing the electron–photon, electron–electron, and electron–positron scattering amplitudes revealed the importance of the concept of relativistic quantum field for the understanding of particle physics. However, the methods of perturbation theory, which had ensured the success of quantum electrodynamics in view of the small value of the coupling parameter of that theory (namely the electric charge of the electron), were at that time inapplicable to the strong nuclear interaction phenomena of high-energy physics. This failure motivated an important school of mathematical physicists to work out a model-independent axiomatic approach to relativistic QFT (e.g., Lehmann, Symanzik, Zimmermann (1954), Wightman (1956), and Bogoliubov (1960); see Axiomatic Quantum Field Theory). Their main purpose was to provide a conceptually satisfactory treatment of relativistic quantum collisions, at least for the case of massive particles. Among various postulates expressing the invariance of the theory under the Poincaré group in an appropriate quantum-mechanical Hilbert-space framework, the approach basically includes a certain formulation of the principle of causality, called microcausality or local commutativity. This axiomatic approach to QFT was followed by a conceptually important variant, namely the algebraic approach to QFT (Haag, Kastler, Araki 1960), whose most important developments are presented in the book by Haag (1992) (see Algebraic Approach to Quantum Field Theory). From the historical viewpoint, and in view of the analyticity properties that they also generate, one can say that all these (closely related) approaches parallel the axiomatic approach of linear response phenomena with, of course, a much higher degree of complexity.
In particular, the characterization of scattering (or collision) amplitudes in terms of appropriate structure functions of the basic quantum fields of the theory is a nontrivial preliminary step which was taken at an early stage of the theory under the name of “asymptotic theory and reduction formulae” (Lehmann, Symanzik, Zimmermann 1954–57, Haag–Ruelle 1962, Hepp 1965). There again, in the field-theoretical axiomatic framework, causality generates analyticity through Fourier–Laplace transformation, but several complex variables now play the role which was played by the complex frequency in the axiomatics of linear response phenomena: they are obtained by complexifying the relativistic energy–momentum variables of the (Fourier transforms of the) quantum fields involved in the high-energy
collision processes. In fact, the holomorphic functions which play the role of the causal response function $\tilde R(\omega)$ are the QFT structure functions or “Green functions in energy–momentum space.” The study of all possible analyticity properties of these functions resulting from the QFT axiomatic framework is called the analytic program (see Scattering in Relativistic Quantum Field Theory: The Analytic Program). The primary basic scope of the latter concerns the derivation of analyticity properties for the scattering functions of two-particle collision processes, which appears to be a genuine challenge for the following reason. The basic Einstein relation $E = mc^2$, which applies to all the incoming and outgoing particles of the collisions, operates as a geometrical constraint on the corresponding physical energy–momentum vectors: according to the Minkowskian geometry, the latter have to belong to mass hyperboloids, which define the so-called “mass shell” of the collision considered. It is on the corresponding complexified mass-shell manifold that the scattering functions are required to be defined as holomorphic functions. In the analytic program of QFT, the derivation of such analyticity domains and of corresponding dispersion relations in the complex plane of the squared total energy variable, $s$, of each given collision process then relies on techniques of complex geometry in several variables. As a matter of fact, the scattering amplitude is a function (or distribution) of two variables $F(s, t)$, where $t$ is a second important variable, called the squared momentum transfer, which plays the role of a fixed parameter for the derivation of dispersion relations in the variable $s$. The value $t = 0$ corresponds to the special kinematical situation which has been described above (for the case of equal-mass particles 1 and 2) under the name of forward scattering and the variable $s$ is a simple affine function of the energy $\omega$
of the colliding particle 1 in the laboratory Lorentz frame (namely $s = 2m^2 + 2m\omega$ in the equal-mass case). It is for the corresponding scattering amplitude $T_0(\omega) := F_0(s) := F(s, t)|_{t=0}$ that a dispersion relation such as eqn [7] can be derived, although this derivation is far from being as simple as for the phenomena of linear response in classical physics: even in that simplest case, it already necessitates the use of analytic completion techniques in several complex variables. The first proof of this dispersion relation was performed by K Symanzik in 1956. In the case of general kinematical situations of measurements, the direction of observation of the scattered particle includes a nonzero angle with the incidence direction, which always corresponds to a negative value of $t$. The derivation of dispersion relations at fixed $t = t_0 < 0$, namely for the scattering amplitude $F_{t_0}(s) := F(s, t)|_{t=t_0}$, requires further arguments of complex geometry, and it is subject to subtle limitations of the form $t_1 < t_0 \le 0$, where $t_1$ depends on the mass spectrum of the particles involved in the theory. The first rigorous proof of dispersion relations at $t < 0$ was performed by N N Bogoliubov in 1960.

Three conceptually important features of the dispersion relations in particle physics deserve to be pointed out.

1. In comparison with the dispersion relations of classical optics, a feature which appears to be new is the so-called “crossing property,” which is characteristic of high-energy physics since it relies basically on the relativistic kinematics. According to that property, the boundary values of the analytic scattering function $\hat F_t(s)$ at positive and negative values of $s$ from the respective half-planes $\mathrm{Im}\, s > 0$ and $\mathrm{Im}\, s < 0$ are interpreted, respectively, as the scattering amplitudes of two physically different collision processes, which are deduced from each other by replacing the incident particle by the corresponding antiparticle; one also says that “these two collision processes are related by crossing.” A typical example is provided by the proton–proton and proton–antiproton collisions, whose scattering amplitudes are therefore mutually related by the property of analytic continuation. This type of relationship between the values of the scattering function at positive and negative values of $s$ generalizes in a nontrivial way the symmetry relation (S) satisfied by the forward scattering function $\hat T_0(\omega^{(c)})$ when each particle coincides with its antiparticle (see the second basic example above). No nontrivial crossing property holds in that special case and the fact that $\hat T_0$ is an even function of $\omega^{(c)}$ precisely expresses the identity of the two collision processes related by crossing.
In the general case, for $t = 0$ as well as for $t = t_0 < 0$ for any value of $t_0$, the analyticity domain that one obtains for the scattering function is not the full cut-plane of $s$: in its general form, a “crossing domain” may exclude some bounded region $B_{t_0}$ from the cut-plane, but it always contains an infinite region which is the exterior of a circle minus cuts along the two infinite parts of the real $s$-axis (Bros, Epstein, Glaser 1965): these cuts are along the physical regions of the two collision processes related by crossing. In that general case, the scattering function $\hat F_t(s)$ still satisfies what can be called a quasi-dispersion-relation, in which the right-hand side contains an additional Cauchy integral, taken along the boundary of $B_t$.

2. A second important feature concerns the behavior at large values of $s$ of the scattering functions $\hat F_t(s)$ in their analyticity domain. As indicated in the presentation of the second basic example, a “precise-increase” property was expected to be satisfied by the forward scattering amplitude $T_0(\omega)$ for $\omega$ (or $s$) tending to infinity. This “precise-increase” property implied the necessity of writing the corresponding dispersion relation [7] for the function $(T_0(\omega) - T_0(0))/\omega^2$: this is what one calls a “dispersion relation with a subtraction.” As a matter of fact, the existence of such restrictive bounds on the total cross sections at high energies had been discovered in 1961 by M Froissart: his derivation relied basically on the use of the unitarity of the scattering operator (expressing the quantum principle of conservation of probabilities), but also on a strong analyticity postulate for the scattering function not implied by the general field-theoretical approach (namely the Mandelstam domain of “double dispersion relations”). In the general framework of QFT, Froissart-type bounds appeared to be closely linked to a further nontrivial extension of the range of “admissible” values of $t$ for which $\hat F_t(s)$ can be analytically continued in a cut-plane or crossing domain. In fact, the extension of this range to positive (i.e., “unphysical”) and even complex values of $t$, and as a second step the proof of Froissart-type bounds in $s(\log s)^2$ for $F_t(s)$ at all these admissible values of $t$, were performed in 1966 by A Martin. They rely on a subtle conspiracy of the analyticity properties deduced from the QFT axiomatic framework and of positivity and unitarity properties expressing the basic Hilbertian structure of the quantum collision theory. The consequence of these bounds on the exact form of the dispersion relations is that, as in formula [7] of the case $t = 0$, it is justified to write a (so-called “subtracted”) dispersion relation for $(F_t(s) - F_t(0) - sF_t'(0))/s^2$: for the general case when the crossing property replaces the symmetry (S), such a dispersion relation involves two subtractions (since $F_t'(0) \ne 0$).
Detailed information concerning the interplay of analyticity and unitarity on the mass shell and the derivation of refined forms of dispersion relations and various boundedness properties for the scattering functions is given in the book by Martin (1969).
3. Constraints imposed by dispersion relations and experimental checks. The conceptual importance of dispersion relations incorporating the above features (1) and (2) is displayed by such spectacular applications as the relationship between the high-energy behaviors of proton–proton and proton–antiproton cross sections. Even though the closest forms of relationship between these cross sections (e.g., the existence of equal high-energy limits) necessitate for their proof some extra assumption concerning, for instance, the behavior
of the ratio between the dispersive and absorptive parts of the forward scattering amplitude, one can speak of an actual model-independent implication of general QFT that imposes nontrivial constraints on phenomena. Otherwise stated, checking experimentally the previous type of relationship up to the limits of high energies imposed by the present technology of accelerators constitutes an indirect, but important test of the validity of the general principles of QFT. As a matter of fact, the Froissart bound by itself has frequently been treated in the literature of high-energy physics during the last 40 years as a key criterion to be satisfied by any sensible phenomenological model in particle physics. As already stated above, the Froissart bound is one of the deepest consequences of the analytic program of general QFT, since its derivation also incorporates in the most subtle way the quantum principle of probability conservation. If only for the previous basic results, the derivation of dispersion relations (and, more generally, the results of the analytic program) in QFT appears as an important conceptual bridge between a fundamental theoretical framework of relativistic quantum physics and the phenomenology of high-energy particle physics.
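The role of a subtraction, discussed under feature (2) above, can be illustrated numerically on a toy function. The following is a minimal sketch and not the scattering function of the article: it assumes the function $f(z) = \log(1 - z)$, which has a single cut along $[1, +\infty)$ with $\operatorname{Im} f(x + i0) = -\pi$ there and grows too fast for an unsubtracted relation, so one subtraction at $z = 0$ is used. The name `log_via_dispersion` and the quadrature settings are hypothetical choices made for the illustration.

```python
import cmath


def log_via_dispersion(z: complex, n: int = 20000) -> complex:
    """Rebuild f(z) = log(1 - z) from its discontinuity on the cut [1, +inf).

    Once-subtracted dispersion relation (subtraction at z = 0, f(0) = 0):
        f(z) = (z / pi) * Integral_1^inf  Im f(x + i0) / (x * (x - z)) dx,
    with Im f(x + i0) = -pi on the cut. The substitution x = 1/u turns the
    integral into Integral_0^1 du / (1 - z*u), evaluated by the midpoint rule.
    """
    h = 1.0 / n
    integral = sum(h / (1.0 - z * (j + 0.5) * h) for j in range(n))
    return -z * integral


if __name__ == "__main__":
    # Compare the dispersive reconstruction with the closed form off the cut.
    for z in (-2.0, 0.5 + 0.5j, 2j):
        print(z, log_via_dispersion(z), cmath.log(1 - z))
```

The extra factor $z/x$ produced by the subtraction is what makes the integral over the cut converge; the scattering functions of the text require two such subtractions, and a second cut, which this toy case does not attempt to reproduce.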
Basic Concepts and Main Steps in the QFT Derivation of Dispersion Relations
The rest of this article outlines the derivation of the analytic background of dispersion relations for the forward scattering amplitudes in the framework of axiomatic QFT. After a brief introduction on relativistic scattering processes and the problematics of causality in particle physics, it gives an account of the Wightman axioms and the simplest reduction formula which relates the forward scattering amplitude to a retarded product of the field operators. Then it describes how the latter can be used for justifying a certain type of analyticity domain for the forward scattering functions, namely a crossing domain or in the best cases a cut-plane in the squared energy variable $s$. This is the basic result that allows one to write dispersion relations (or quasi-dispersion-relations) at $t = 0$; the exact form of the latter, including at most two subtractions, relies on the use of Hilbertian positivity and of the unitarity of the scattering operator.
Relativistic Quantum Scattering as a Phenomenon of Linear Response
Collisions of quantum particles may be seen as phenomena of linear response, but in a way which
differs greatly from what has been previously described.
Particles in Minkowskian geometry
Each state of a relativistic classical particle with mass $m$ is characterized by its energy–momentum vector or 4-momentum $p = (p_0, \mathbf{p})$ satisfying the mass-shell condition $p^2 := p_0^2 - \mathbf{p}^2 = m^2$ (in units such that $c = 1$). In view of the condition of positivity of the energy $p_0 > 0$, the “physical mass shell” thus coincides with the positive sheet $H_m^+$ of the mass hyperboloid $H_m$ with equation $p^2 = m^2$. The set of all energy–momentum configurations characterizing the collisions of two relativistic classical particles with initial (resp. final) 4-momenta $p_1, p_2$ (resp. $p_1', p_2'$) is the mass-shell manifold $M$ defined by the conditions
$$p_i^2 = m^2, \quad p_i'^2 = m^2, \quad p_{i,0} > 0, \quad p'_{i,0} > 0, \quad i = 1, 2; \qquad p_1 + p_2 = p_1' + p_2'$$
where the latter equation expresses the relativistic law of total energy–momentum conservation. $M$ is an eight-dimensional manifold, invariant under the (six-dimensional) Lorentz group: the orbits of this group that constitute a foliation of $M$ are parametrized by two variables, namely the squared total energy $s = (p_1 + p_2)^2 = (p_1' + p_2')^2$ and the squared momentum transfer $t = (p_1 - p_1')^2 = (p_2 - p_2')^2$ (or $u = (p_1 - p_2')^2 = 4m^2 - s - t$). In these variables, called the Mandelstam variables, the “physical region” of the collision is represented by the set of pairs $(s, t)$ (or triplets $(s, t, u)$ with $s + t + u = 4m^2$) such that $t \le 0$, $u \le 0$, and therefore $s \ge 4m^2$.
Correspondingly, each state of a relativistic quantum particle with mass $m$ is characterized by a wave packet $\hat f(p)$ on $H_m^+$, which is an element of unit norm of $L^2(H_m^+; \mu_m(p))$, with $\mu_m(p) = d\mathbf{p}/(\mathbf{p}^2 + m^2)^{1/2}$. In Minkowskian spacetime with coordinates $x = (x_0, \mathbf{x})$, any such state is represented by a wave function $f(x)$ whose Fourier transform is the tempered distribution (with support in $H_m^+$) $\hat f(p)\,\delta(p^2 - m^2)$: $f(x)$ is a positive-energy solution of the Klein–Gordon equation $(\partial^2/\partial x_0^2 - \Delta_{\mathbf{x}} + m^2) f(x) = 0$. A free two-particle state is a symmetric wave packet $\hat f(p_1, p_2)$ on $H_m^+ \times H_m^+$ in the Hilbert space $L^2(H_m^+ \times H_m^+; \mu_m \otimes \mu_m)$.
Scattering kernels as response kernels: distribution character
While the input to be considered is a free wave packet $\hat f_{in}(p_1, p_2)$ on $H_m^+ \times H_m^+$, representing the preparation of an initial two-particle state, the output corresponds to the detection of a final two-particle state also characterized by a wave packet $\hat g_{out}(p_1', p_2')$ on $H_m^+ \times H_m^+$. In quantum mechanics, linearity is linked to the “superposition principle” of states,
which allows one to state that collisions are described by a certain bilinear form $(\hat f_{in}, \hat g_{out}) \to S(\hat f_{in}, \hat g_{out})$, called the “scattering matrix.” This bilinear form is bicontinuous with respect to the Hilbertian norms of the wave packets, and it then results from the Schwartz nuclear theorem that it is represented by a distribution kernel $S(p_1, p_2; p_1', p_2')$, namely a tempered distribution with support contained in $M$, in such a way that (formally)
$$S(\hat f_{in}, \hat g_{out}) = \int \hat f_{in}(p_1, p_2)\, \hat g_{out}(p_1', p_2')\, S(p_1, p_2; p_1', p_2')\, \mu_m(p_1)\, \mu_m(p_2)\, \mu_m(p_1')\, \mu_m(p_2') \qquad [20]$$
If there were no interaction, $S(\hat f_{in}, \hat g_{out})$ would reduce to the Hilbertian scalar product in $L^2(H_m^+ \times H_m^+; \mu_m \otimes \mu_m)$ and the corresponding kernel $S$ would be the identity kernel
$$I(p_1, p_2; p_1', p_2') = \tfrac{1}{2}\left[\delta(p_1 - p_1')\,\delta(p_2 - p_2') + \delta(p_1 - p_2')\,\delta(p_2 - p_1')\right]$$
In the general case, the interaction is therefore described by the scattering kernel $T(p_1, p_2; p_1', p_2') := S(p_1, p_2; p_1', p_2') - I(p_1, p_2; p_1', p_2')$. The action of $T$ as a bilinear form (defined in the same way as the action of $S$ in eqn [20]) may be seen as the quantum analog of the classical response formula [10]. Note, however, the difference in the mathematical treatment of the output: instead of being considered as the direct response ($\tilde f_{out}$) to the input, it is now explored by Hilbertian duality in terms of detection wave packets $\hat g_{out}$, in conformity with the principles of quantum theory. Finally, in view of the invariance of the collision process under the Lorentz group, the scattering kernel $T$ is constant along the orbits of this group in $M$ and it then defines a distribution $F(s, t) := T(p_1, p_2; p_1', p_2')$ with support in the physical region: this is what is called the scattering amplitude.
What becomes of causality?
One can show that the positive-energy solutions of the Klein–Gordon equation cannot vanish in any open set of Minkowski spacetime; they necessarily spread out in the whole spacetime. This makes it impossible to formulate a causality condition comparable to eqn [9] in terms of the spacetime wave functions $f_{in}$ and $g_{out}$ corresponding to the input and output wave packets $\hat f_{in}$, $\hat g_{out}$. In this connection, it is, however, appropriate to note that (after various attempts of “weak causality conditions”) a certain condition called “macrocausality” (Iagolnitzer and Stapp 1969; see the book by Iagolnitzer (1992)) has been shown to be equivalent to some local properties of analyticity
of the scattering kernel $T$; but it is not our purpose to develop that point here for two reasons: (1) the interpretation of that condition is rather involved, because it integrates a very weak form of causality together with the spatial short-range character of the strong nuclear interactions between the elementary particles; (2) the domains of analyticity obtained are by far too small with respect to those necessary for writing dispersion relations. The reason for this failure is that the scattering kernel only represents an asymptotic quantum observable, in the sense that it is intended to describe observations far apart from the extremely small spacetime region where the particles strongly interact, namely in regions where this interaction is asymptotically small. Although well adapted to what is actually observed in the detection experiments, the concept of scattering kernel is not sufficient for describing the fundamental interactions of physics: it must be enriched by other theoretical concepts which might explicitly take into account the microscopic interactions in spacetime. This motivates the introduction of quantum fields as basic quantities in particle physics.
Relativistic Quantum Fields: Microcausality and the Retarded and Advanced Kernels; Analyticity in Complex Energy–Momentum Space
By an idealization of the concept of quantum electromagnetic field and a generalization to all types of microscopic interactions of matter, one considers that all the phenomena involving such interactions can be described by fields $\phi_i(x)$, whose amplitude can, in principle, be measured in arbitrarily small regions of Minkowski spacetime. In the quantum framework, one is thus led to the notion of local observable $O$ (emphasized as a basic concept in the axiomatic approach of Araki, Haag, and Kastler). In the Wightman field-theoretical framework, a local observable corresponds to the measuring process of a weighted average of a field $\phi_i(x)$ of the form $O := \phi_i[f] = \int \phi_i(x) f(x)\, dx$. In the latter, $f(x)$ denotes a smooth real-valued test-function with (arbitrary) compact support $K$ in spacetime; the observable $O$ is then said to be localized in $K$. Each observable $O = \phi_i[f]$ has to be a self-adjoint (unbounded) operator acting in (a dense domain of) the Hilbert space $\mathcal{H}$ generated by all the states of the system of fundamental fields $\{\phi_i\}$; therefore, the correct mathematical concept of relativistic quantum field $\phi(x)$ is an “operator-valued tempered distribution on Minkowski spacetime.” Here the additional “temperateness assumption” is a convenient technical assumption which in particular allows the passage to the energy–momentum space by making use of the Fourier transformation.
In this QFT framework, it is natural to express a certain form of causality by assuming that two observables $\phi(f)$ and $\phi(f')$ commute if the supports of $f$ and $f'$ are spacelike-separated regions in spacetime, which means that no signal with velocity smaller than or equal to the velocity of light can propagate from either one of these regions to the other. This expresses the idea that these two observables should be independent, that is, “compatible as quantum observables.” This postulate is equivalent to the following condition, called microcausality or local commutativity, and understood in the sense of operator-valued tempered distributions:
$$[\phi(x_1), \phi(x_1')] = 0 \quad \text{for } (x_1 - x_1')^2 < 0 \qquad [21]$$
where $(x_1 - x_1')^2$ is the squared Minkowskian pseudonorm of $x = x_1 - x_1' = (x_0, \mathbf{x})$, namely $x^2 = x_0^2 - \mathbf{x}^2$. It follows that for every admissible pair of states $\Psi, \Psi'$ in $\mathcal{H}$, the tempered distribution
$$C_{\Psi,\Psi'}(x_1, x_1') := \langle \Psi, [\phi(x_1), \phi(x_1')]\, \Psi' \rangle \qquad [22]$$
has its support contained in the union of the sets $x_1 - x_1' \in \bar V^+$ and $x_1 - x_1' \in \bar V^-$, where $\bar V^+$ and $\bar V^-$ are, respectively, the closures of the forward and backward cones $V^+ := \{x = (x_0, \mathbf{x});\ x_0 > |\mathbf{x}|\}$, $V^- := -V^+$ in Minkowski spacetime. It is always possible to decompose the previous distribution as
$$C_{\Psi,\Psi'}(x_1, x_1') = R_{\Psi,\Psi'}(x_1, x_1') - A_{\Psi,\Psi'}(x_1, x_1') \qquad [23]$$
in such a way that the supports of the distributions $R_{\Psi,\Psi'}(x_1, x_1')$ and $A_{\Psi,\Psi'}(x_1, x_1')$ belong, respectively, to $\bar V^+$ and $\bar V^-$. $R_{\Psi,\Psi'}$ and $A_{\Psi,\Psi'}$ are called, respectively, retarded and advanced kernels and they are often formally expressed (for convenience) as follows:
$$R_{\Psi,\Psi'}(x_1, x_1') = \theta(x_{1,0} - x_{1,0}')\, C_{\Psi,\Psi'}(x_1, x_1'), \qquad A_{\Psi,\Psi'}(x_1, x_1') = -\theta\big(-(x_{1,0} - x_{1,0}')\big)\, C_{\Psi,\Psi'}(x_1, x_1')$$
in terms of the Heaviside step function $\theta(t)$ of the time-coordinate difference $t = x_{1,0} - x_{1,0}'$. For every pair $(\Psi, \Psi')$, $R_{\Psi,\Psi'}(x_1, x_1')$ appears as a relativistic generalization of the retarded kernel $R(t - t')$ of eqn [10]: its support property in spacetime, similar to the support property of $R$ in time, expresses a relativistic form of causality, or “Einstein causality.”
There exists a several-variable extension of the theory of Fourier–Laplace transforms of tempered distributions which is based on a formula similar to eqn [11]. We introduce the vector variables $X = (x_1 + x_1')/2$, $x = x_1 - x_1'$ and a complex 4-momentum $k = p + iq = (k_0, \mathbf{k})$ as the conjugate vector variable of $x$ with respect to the Minkowskian scalar product $k \cdot x := k_0 x_0 - \mathbf{k} \cdot \mathbf{x}$, and we define
$$\tilde R^{(c)}_{\Psi,\Psi'}(k, X) = \int_{\bar V^+} R_{\Psi,\Psi'}\!\left(X + \frac{x}{2},\, X - \frac{x}{2}\right) e^{i k \cdot x}\, dx \qquad [24]$$
Since $q \cdot x > 0$ for all pairs $(q, x)$ such that $q \in V^+$, $x \in \bar V^+$, it follows that $\tilde R^{(c)}_{\Psi,\Psi'}(k, X)$ is holomorphic with respect to $k$ in the domain $T^+$ containing all $k = p + iq$ such that $q$ belongs to $V^+$. Moreover, in the limit $q \to 0$ this holomorphic function tends (in the sense of distributions) to the Fourier transform $\tilde R_{\Psi,\Psi'}(p, X)$ of $R_{\Psi,\Psi'}(X + x/2, X - x/2)$ with respect to $x$. The domain $T^+$, which is called the “forward tube,” is the analog of the domain $I^+$ of the $\omega$-plane; bounds of moderate type comparable to those of [12] apply to the holomorphic function $\tilde R^{(c)}_{\Psi,\Psi'}$ in $T^+$. Similarly, the advanced kernel $A_{\Psi,\Psi'}(X + x/2, X - x/2)$ admits a Fourier–Laplace transform $\tilde A^{(c)}_{\Psi,\Psi'}(k, X)$, which is holomorphic and of moderate growth in the “backward tube” $T^-$ containing all $k = p + iq$ such that $q$ belongs to $V^-$. In view of [23], the Fourier transform $\tilde C_{\Psi,\Psi'}(p, X)$ of $C_{\Psi,\Psi'}(X + x/2, X - x/2)$ then appears as the difference between the boundary values of $\tilde R^{(c)}_{\Psi,\Psi'}$ and $\tilde A^{(c)}_{\Psi,\Psi'}$ on the reals (from the respective domains $T^+$ and $T^-$).
The Field-Theoretical Axiomatic Framework and the Passage from the Structure Functions of QFT to the Scattering Kernels (Case of Forward Scattering)
The postulates (Wightman axioms)
Apart from the causality postulate, which we have already presented above in view of its distinguished role for generating analyticity properties in complex energy–momentum space, the field-theoretical axiomatic approach to collision theory is based on the following postulates (for all the fundamental developments of axiomatic field theory, the interested reader may consult the books by Streater and Wightman (1980) and by Jost (1965); see Axiomatic Quantum Field Theory).
1. There exists a unitary representation $g \to U(g)$ of the Poincaré group $G$ in the Hilbert space of states $\mathcal{H}$; in this representation, the abelian subgroup of translations of space and time has a Lie algebra whose generators are interpreted as the four self-adjoint (commuting) operators $P_\mu$ of total energy–momentum of the system.
2. The quantum field operators $\phi(x)$ transform covariantly under that representation; in the simplest case of scalar fields (considered here), $\phi(gx) = U(g)\,\phi(x)\,U(g^{-1})$.
3. There exists a unique state $\Omega$, called the vacuum, such that the action of all polynomials of field operators on $\Omega$ generates a dense subset of $\mathcal{H}$; moreover, $\Omega$ is assumed to be invariant under the representation $U$ of $G$, and thereby such that $P_\mu \Omega = 0$.
4. Spectral condition or positivity of energy in all physical states. The joint spectrum of the operators $P_\mu$ is contained in the closed forward cone $\bar V^+$ of energy–momentum space.
In order to perform the collision theory of massive particles, one needs a more detailed “mass-gap assumption”: this spectrum is the union of the origin $O$, of one or several positive sheets of hyperboloid $H_{m_i}^+$ and of a region $\bar V_M^+$ defined by the conditions $p^2 \ge M^2$, $p_0 > 0$, with $M$ larger than all the $m_i$. The Hilbert space $\mathcal{H}$ is correspondingly decomposed as the direct sum of the vacuum subspace (or zero-particle subspace) generated by $\Omega$, of subspaces of stable one-particle states with masses $m_i$ isomorphic to $L^2(H_{m_i}^+, \mu_{m_i})$, and of a remaining subspace $\mathcal{H}'$. As a result of the construction of “asymptotic states,” $\mathcal{H}'$ can be shown to contain two subspaces $\mathcal{H}'_{in}$ and $\mathcal{H}'_{out}$, generated, respectively, by $N$-particle incoming states (with $N$ arbitrary and $\ge 2$) and by $N$-particle outgoing states. The collision operator $S$ is then defined as the partially isometric operator from $\mathcal{H}'_{out}$ onto $\mathcal{H}'_{in}$, which maps a reference basis of outgoing states onto the corresponding basis of incoming states.
An independent postulate: asymptotic completeness (see Scattering, Asymptotic Completeness and Bound States and Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools)
The theory is said to satisfy the property of asymptotic completeness if all the states of $\mathcal{H}$ can be interpreted as superpositions of various $N$-particle states (either in the incoming or in the outgoing state basis), namely if one has $\mathcal{H}' = \mathcal{H}'_{in} = \mathcal{H}'_{out}$. This property is not implied by the previous postulates on quantum fields, but its physical interpretation and its role in the analytic program are of primary importance (see Scattering in Relativistic Quantum Field Theory: The Analytic Program).
Let us simply note here that asymptotic completeness implies as a by-product the unitarity property of the collision operator $S$ on the full Hilbert space $\mathcal{H}'$ (i.e., $SS^* = S^*S = I$).
Connection between retarded kernel and scattering kernel for the forward scattering case; a simple “reduction formula”
We consider the scattering of a particle 1 with mass $m_1$ on a target consisting of a particle 2 with mass $m_2$ and denote by $T(p_1, p_2; p_1', p_2')$ the corresponding scattering kernel (defined similarly as for the case of equal-mass
particles considered earlier). Equations [22]–[24] are then applied to the case when $\Psi$ and $\Psi'$ coincide with a one-particle state of particle 2 at rest, namely with 4-momentum $p_2 = p_2'$ along the time axis: $p_2 = ((p_2)_0, \mathbf{0})$, $(p_2)_0 = m_2$. This describes in a simple way the case of forward scattering, since in view of the energy–momentum conservation law $p_1 + p_2 = p_1' + p_2'$, the choice $p_2 = p_2'$ also implies that $p_1 = p_1'$. (The possibility of restricting the distribution $T(p_1, p_2; p_1', p_2')$ to such fixed values of the energy–momenta is shown to be mathematically well justified.) The advantage of this simple case is that the corresponding kernels [22], [23] of $(x_1, x_1')$ are invariant under spacetime translations and therefore depend only on $x$ (and not on $X$). We can thus rewrite eqns [22], [23] with simplified notations as follows:
$$C_{p_2}(x) := \left\langle p_2 \left| \left[\phi\!\left(\frac{x}{2}\right), \phi\!\left(-\frac{x}{2}\right)\right] \right| p_2 \right\rangle = R_{p_2}(x) - A_{p_2}(x) \qquad [25]$$
which can be shown to give correspondingly by Fourier transformation
$$\tilde C_{p_2}(p) = \langle p_2 |\, [\tilde\phi(p), \phi(0)]\, | p_2 \rangle = \tilde R_{p_2}(p) - \tilde A_{p_2}(p) \qquad [26]$$
If the particle 1 appears in the asymptotic states of the field $\phi$, the scattering kernel $T(p_1, p_2; p_1', p_2')$ is then given in the forward configurations $p_1 = p_1' \in H_{m_1}^+$, $p_2 = p_2' \in H_{m_2}^+$, by the following reduction formula in which $s = (p_1 + p_2)^2$:
$$F_0(s) = T(p_1, p_2; p_1, p_2) = \left.(p_1^2 - m_1^2)\, \tilde R_{p_2}(p_1)\right|_{p_1 \in H_{m_1}^+} \qquad [27]$$
Analyticity Domains in Energy–Momentum Space: From the ‘‘Primitive Off-Shell Domains’’ of QFT to the Crossing Manifolds on the Mass Shell
For simplicity, we shall restrict ourselves to the consideration of forward scattering amplitudes, namely to the derivation of crossing analyticity domains and (quasi-)dispersion relations at $t = 0$ for two-particle collision processes of the form $1 + 2 \to 1 + 2$, 1 and 2 being given massive particles with arbitrary spins and charges.
The holomorphic function $H_{p_2}(k)$ and its primitive domain $D$. Nontriviality of dispersion relations for the scattering amplitudes
As suggested by eqn [24], we can exploit the analyticity properties of the Fourier–Laplace transforms of the retarded and advanced kernels $R_{p_2}$ and $A_{p_2}$: in fact, $\tilde R_{p_2}(p)$ and
$\tilde A_{p_2}(p)$ are, respectively, the boundary values of the holomorphic functions
$$\tilde R^{(c)}_{p_2}(k) = \int_{\bar V^+} R_{p_2}(x)\, e^{i k \cdot x}\, dx, \qquad \tilde A^{(c)}_{p_2}(k) = \int_{\bar V^-} A_{p_2}(x)\, e^{i k \cdot x}\, dx \qquad [28]$$
from the corresponding domains $T^+$ and $T^-$. According to the reduction formula [27], it is appropriate to consider correspondingly the functions $H^+_{p_2}(k) := (k^2 - m_1^2)\, \tilde R^{(c)}_{p_2}(k)$ and $H^-_{p_2}(k) := (k^2 - m_1^2)\, \tilde A^{(c)}_{p_2}(k)$, which are also, respectively, holomorphic in $T^+$ and $T^-$. Then the forward scattering amplitude $F_0(s) = F(s, 0) = T(p_1, p_2; p_1, p_2)$ appears as the restriction to the hyperboloid sheet $p \in H_{m_1}^+$ of the boundary value $H^+_{p_2}(p)$ of $H^+_{p_2}(k)$ on the reals. Moreover, it can be seen that the two boundary values $H^+_{p_2}(p) = (p^2 - m_1^2)\, \tilde R_{p_2}(p)$ and $H^-_{p_2}(p) = (p^2 - m_1^2)\, \tilde A_{p_2}(p)$ coincide as distributions in the region
$$\mathcal{R} = \{p \in \mathbb{R}^4;\ (p + p_2)^2 < (m_1 + m_2)^2,\ (p - p_2)^2 < (m_1 + m_2)^2\} \qquad [29]$$
This follows from the intermediate expression in eqn [26] and from the fact that a state of the form $(p^2 - m_1^2)\, \tilde\phi(p)\, |p_2\rangle$ is a state of energy–momentum $p + p_2$ and therefore vanishes (in view of the spectral condition) if $(p + p_2)^2 < (m_1 + m_2)^2$ (here we also use a simplifying assumption according to which no one-particle bound state is present in this channel). The situation obtained concerning the holomorphic functions $H^+_{p_2}(k)$ and $H^-_{p_2}(k)$ parallels (in complex dimension four) the case of a pair of holomorphic functions in the upper and lower half-planes whose boundary values on the reals coincide on a certain interval playing the role of $\mathcal{R}$. As in this one-dimensional case, there is a theorem, called the “edge-of-the-wedge theorem” (see below), which implies that $H^+_{p_2}(k)$ and $H^-_{p_2}(k)$ have a common analytic continuation $H_{p_2}(k)$: this function is holomorphic in a domain $D$ which is the union of $T^+$, $T^-$ and of a complex neighborhood of $\mathcal{R}$; $D$ is called the primitive domain of $H_{p_2}(k)$. Moreover, it follows from the postulate of invariance of the field $\phi(x)$ under the action of the Poincaré group (see postulate (2)) that the holomorphic function $H_{p_2}(k)$ depends only on the two complex variables $\zeta = k^2$ ($= k_0^2 - \mathbf{k}^2$) and $k \cdot p_2$ or equivalently $s = (k + p_2)^2 = \zeta + m_2^2 + 2 k \cdot p_2$; it thus defines a corresponding holomorphic function $\hat H_{p_2}(\zeta, s) := H_{p_2}(k)$ in the image of $D$ in these variables. In view of the reduction formula [27], the scattering function $\hat F_0(s)$ should appear as the
restriction of the holomorphic function $\hat H_{p_2}(\zeta, s)$ to the physical mass-shell value $\zeta = m_1^2$. However, it turns out that the section of $D$ by the complex mass-shell manifold $M^{(c)}$ with equation $k^2 = m_1^2$ is empty: this geometrical fact is responsible for the nontriviality of the proof of dispersion relations for the physical quantity $\hat F_0(s)$ on the mass shell. In fact, the tube $T^+ \cup T^-$, which constitutes the basic part of the domain $D$ and is given by the field-theoretical microcausality postulate, is a “purely off-shell” complex domain, as can be easily checked: if a complex point $k = p + iq$ is such that $q^2 > 0$, the corresponding squared mass $\zeta = k^2 = p^2 - q^2 + 2i\, p \cdot q$ is real if and only if $p \cdot q = 0$, which implies $p^2 < 0$ (i.e., $p$ spacelike) and therefore $\zeta = p^2 - q^2 < 0$.
“Off-shell dispersion relations” as a first step
The starting point, which is easy to obtain from the domain $D$, is the analyticity of the holomorphic function $\hat H_{p_2}(\zeta, s)$ in a cut-plane $\Pi_\zeta$ of the variable $s$ for all negative values of the squared mass variable $\zeta$. This cut-plane $\Pi_\zeta$ is always the complement in $\mathbb{C}$ (i.e., the complex $s$-plane) of the union of the $s$-cut ($s$ real $\ge (m_1 + m_2)^2$) and of the $u$-cut ($u = 2\zeta + 2m_2^2 - s$ real $\ge (m_1 + m_2)^2$). This analyticity property thus justifies “off-shell dispersion relations” at fixed negative values of $\zeta$ for the field-theoretical structure function $\hat H_{p_2}(\zeta, s)$.
The latter property and the subsequent analysis concerning the process of analytic continuation of $\hat H_{p_2}$ to positive values of $\zeta$ will be more easily understood geometrically if one reduces the complex space of $k$ to a two-dimensional complex space, which is legitimate in view of the equality $H_{p_2}(k) = \hat H_{p_2}(\zeta, s)$. Having chosen the $k_0$-axis along $p_2$, we reduce the orthogonal space coordinates $\mathbf{k}$ of $k$ to the radial variable $k_r$. One thus gets the following expressions of the variables $\zeta$ and $s$ (resp. $u$):
$$\zeta = k_0^2 - k_r^2; \qquad s = \zeta + m_2^2 + 2 m_2 k_0 \quad (\text{resp. } u = \zeta + m_2^2 - 2 m_2 k_0)$$
Then we can write $\hat H_{p_2}(\zeta, s) := H_{p_2}(k_0, k_r) = H_{p_2}(k_0, -k_r)$, and describe the image $D_r$ of the domain $D$ in the variables $k = (k_0, k_r) = p + iq$ as $T_r^+ \cup T_r^- \cup \mathcal{N}(\mathcal{R}_r)$, where:
1. $T_r^\pm$ is defined by the condition $q^2 := q_0^2 - q_r^2 > 0$, $q_0 > 0$ or $q_0 < 0$,
2. $\mathcal{N}$ is a complex neighborhood of the real region $\mathcal{R}_r$ defined as follows. Let $h_s^+$, $h_u^-$ be the two branches of hyperbolae with respective equations:
$$h_s^+:\ (p_0 + m_2)^2 - p_r^2 = (m_1 + m_2)^2,\ p_0 + m_2 > 0; \qquad h_u^-:\ (p_0 - m_2)^2 - p_r^2 = (m_1 + m_2)^2,\ p_0 - m_2 < 0$$
Then $\mathcal{R}_r$ is the intersection of the region situated below $h_s^+$ and of the region situated above $h_u^-$.
Let us now consider any complex hyperbola $h^{(c)}[\zeta]$ with equation $k^2 := k_0^2 - k_r^2 = \zeta$. On such a complex curve either one of the variables $k_0$ or $s$ or $u$ is a good parameter for holomorphic functions which are even in $k_r$, like $H_{p_2}(k_0, k_r)$. If $\zeta$ is real, any complex point $k = p + iq$ of $h^{(c)}[\zeta]$ is such that $p^2$ and $q^2$ have opposite signs (since $p \cdot q = 0$). Therefore, the sign of $q^2$ is always opposite to the sign of $\zeta$ ($= p^2 - q^2$): if $\zeta$ is negative, all the complex points of $h^{(c)}[\zeta]$ thus belong to $T_r^+ \cup T_r^-$; the union of all these points with the real points of $h^{(c)}[\zeta]$ in $\mathcal{R}_r$ is therefore a subset of $D_r$, which is represented in the complex plane of $s$ by the cut-plane $\Pi_\zeta$. The function $\hat H_{p_2}(\zeta, s)$ is therefore analytic (and univalent) in $\Pi_\zeta$ for each $\zeta < 0$. Moreover, the existence of moderate bounds of type [12] on $H_{p_2}$ in $D$ (resulting from the temperateness assumption) then implies the validity of dispersion relations (with subtractions) for $\hat H_{p_2}(\zeta, s)$ in $\Pi_\zeta$.
The problem of analytic completion to the complex mass-shell hyperbola $h^{(c)}[m_1^2]$: what is provided by the Jost–Lehmann–Dyson domain
A basic fact in complex geometry in $n$ variables, with $n \ge 2$, is the existence of a distinguished class of domains, called holomorphy domains: for each domain $U$ in this class, there exists at least one function which is holomorphic in $U$ and cannot be analytically continued at any point of the boundary of $U$. In one dimension, every domain is a holomorphy domain. In dimension larger than one, a general domain $U$ is not a holomorphy domain, but it admits a holomorphy envelope $\hat U$, which is a holomorphy domain containing $U$, such that every function holomorphic in $U$ admits an analytic continuation in $\hat U$.
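The sign rule used above for the complex hyperbolae can be probed numerically. The following is a minimal sketch, writing $\zeta$ for the real squared-mass variable $k^2$: it samples non-real points $k_0$, solves $k_0^2 - k_r^2 = \zeta$ for $k_r$, and evaluates $q^2 = (\operatorname{Im} k_0)^2 - (\operatorname{Im} k_r)^2$ for $k = p + iq$. The function names and the sample grid are hypothetical choices made for the illustration.

```python
import cmath


def q_squared_on_hyperbola(k0: complex, zeta: float) -> float:
    """For a point (k0, kr) of the complex hyperbola k0^2 - kr^2 = zeta,
    return q^2 = (Im k0)^2 - (Im kr)^2, where k = p + i q."""
    kr = cmath.sqrt(k0 * k0 - zeta)  # the other root -kr gives the same (Im kr)^2
    return k0.imag ** 2 - kr.imag ** 2


def sample_points():
    """A small grid of non-real values of k0 (nonzero imaginary part)."""
    return [complex(a, b)
            for a in (-3.0, -0.7, 0.0, 1.2, 4.0)
            for b in (-2.0, -0.3, 0.5, 1.5)]


if __name__ == "__main__":
    for zeta in (-1.0, 1.0):
        values = [q_squared_on_hyperbola(k0, zeta) for k0 in sample_points()]
        print(zeta, min(values), max(values))
```

For the negative value of $\zeta$ every sampled point has $q^2 > 0$, that is, it lies in one of the two tubes, while for the positive value every sampled point has $q^2 < 0$ and lies outside them, which is the reason why positive squared masses require analytic completion.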
It turns out that the domain $D_r$ (considered above in the last subsection) is not a holomorphy domain; its holomorphy envelope $\hat D_r$ (obtained geometrically by Bros, Messiah, and Stora in 1961) coincides with a domain introduced by Jost–Lehmann (1957) and Dyson (1958) by methods of wave equations. This domain can be characterized as the union of $D_r$ with all the complex points of all the hyperbolae with equations $(k_0 - a)^2 - (k_r - b)^2 = c^2$ (for all $a$, $b$, $c$ real, including the complex straight lines for which $c = 0$) both of whose branches have a nonempty intersection with the real region $\mathcal{R}_r$. In particular, one easily sees that all the hyperbolae $h^{(c)}[\zeta]$ with $0 \le \zeta < m_1^2$ belong to the previous class. It follows that for any $\zeta$ in this positive interval, the function $\hat H_{p_2}(\zeta, s)$ can still be
analytically continued as a holomorphic function of $s$ in the cut-plane $\Pi_\zeta$ and thereby satisfies the corresponding dispersion relations. The physical mass shell hyperbola $h^{(c)}[m_1^2]$ thus appears as a limiting case of the previous family (for $\zeta$ tending to $m_1^2$ from below). The analyticity of $\hat H_{p_2}(m_1^2, s)$ in $\Pi_{m_1^2}$ can then be justified provided one knows that this function is analytic at at least one point of $\Pi_{m_1^2}$: but this additional information results from a more thorough exploitation of the analyticity properties resulting from the QFT postulates. This is briefly outlined below.
Further information coming from the four-point function in complex momentum space
It is possible to obtain further analyticity properties of $\hat H_{p_2}(\zeta, s) := H_{p_2}(k)$ by considering the latter as the restriction to the submanifold $k_1 = -k_3 = k$; $k_2 = -k_4 = p_2$ of a master analytic function $H_4(k_1, k_2, k_3, k_4)$, called the four-point function of the field $\phi$ in complex energy–momentum space (see Scattering in Relativistic Quantum Field Theory: The Analytic Program). This function is holomorphic in a well-defined primitive domain $D_4$ of the linear submanifold $k_1 + k_2 + k_3 + k_4 = 0$. It is then possible to compute some local parts situated near the reals of the holomorphy envelope of $D_4$, which implies, as a by-product, that the function $\hat H_{p_2}(\zeta, s)$ can be analytically continued in a set of the form
$$\Delta = \{(\zeta, s);\ \zeta \in \delta;\ s \in V_{s_1}(\zeta)\} \cup \{(\zeta, s);\ \zeta \in \delta;\ u = 2\zeta + 2m_2^2 - s \in V_{u_1}(\zeta)\} \qquad [30]$$
with the following specifications:
1. $\delta$ is a domain in the $\zeta$-plane, which is a complex neighborhood of a real interval of the form $a < \zeta < M_1^2$; here $M_1$ denotes a spectral mass threshold in the theory such that $M_1 > m_1$;
2. for each $\zeta$, $V_{s_1}(\zeta)$ (resp. $V_{u_1}(\zeta)$) is a cut-neighborhood in the $s$-plane of the real half-line $s > s_1$ (resp. of the half-line $u = 2\zeta + 2m_2^2 - s$ real $> u_1$); $s_1$ and $u_1$ denote appropriate real numbers independent of $\zeta$.
The final analytic completion: crossing domains on $h^{(c)}[m_1^2]$. Dispersion relations for $\pi^0$–$\pi^0$ meson scattering and “quasi-dispersion-relations” for proton–proton scattering
We now wish to describe briefly the final step of analytic completion, which displays the existence of a “quasi-cut-plane domain” in $s$ for the function $\hat H_{p_2}(m_1^2, s)$, even in the more general case when the $s$-cut and $u$-cut are associated with different scattering channels, whose respective mass thresholds $s = M_{12}^2$ and $u = M_{12}'^2$ are unequal.
This general situation may occur as soon as one charged particle 1 of the $s$-channel is replaced by the corresponding antiparticle $\bar 1$ in the $u$-channel, in contrast with the case of neutral particles (like the $\pi^0$ meson) which coincide with their own antiparticles. Here it is important to note that the two real branches $h^+[m_1^2]$ and $h^-[m_1^2]$ of the mass shell hyperbola $h^{(c)}[m_1^2]$ correspond, respectively, to the physical region of the “direct scattering channel” of the reaction $1 + 2 \to 1 + 2$ with squared total energy $s$, and to the physical region of the “crossed scattering channel” of the reaction $\bar 1 + 2 \to \bar 1 + 2$ with squared total energy $u$. A typical and important example is the case of proton–proton scattering in the $s$-channel, where $M_{12}$ equals twice the mass $m$ ($= m_1 = m_2$) of the proton, while the corresponding $u$-channel refers to the proton–antiproton scattering, whose threshold $M_{12}'$ equals twice the mass of the $\pi$ meson. In that general case, the analysis of the subsection “‘Off-shell dispersion relations’ as a first step” still applies, so that the function $\hat H_{p_2}(\zeta, s)$ is always analytic in a set of the form
$$S_0 = \{(\zeta, s);\ a < \zeta < 0;\ s \in \Pi_\zeta\} \qquad [31]$$
Then, the additional information described above in the last subsection allows one to use the following crucial property of analytic completion, which we call
Crossing lemma
If a function $G(\zeta, s)$ is holomorphic in a domain which contains the union of the sets $\Delta$ and $S_0$ (see eqns [30] and [31]), then it admits an analytic continuation in a set of the following form:
$$\{(\zeta, s);\ \zeta \in \delta;\ s \in \Pi_\zeta;\ |s - \zeta - m_2^2| = |u - \zeta - m_2^2| > R(\zeta)\}$$
By applying this property to the function $\hat H_{p_2}(\zeta, s)$ and restricting to the mass-shell value $\zeta = m_1^2$, which belongs to $\delta$, one obtains the analyticity of the scattering function $\hat F_0(s) := \hat H_{p_2}(m_1^2, s)$ in a crossing domain of the complex mass shell hyperbola $h^{(c)}[m_1^2]$: the crossing between the two physical regions $h^+[m_1^2]$ ($s \ge M_{12}^2$) and $h^-[m_1^2]$ ($u \ge M_{12}'^2$) is ensured by a complex domain of $h^{(c)}[m_1^2]$ whose image in the $s$-plane is the “cut-neighborhood of infinity” $\{s;\ s \in \Pi_{m_1^2},\ |s - m_1^2 - m_2^2| = |u - m_1^2 - m_2^2| > R(m_1^2)\}$. Note that the relevant boundary values of $\hat F_0$ for obtaining the scattering amplitudes of the two collision processes with respective physical regions $h^+[m_1^2]$ and $h^-[m_1^2]$ have to be taken from the respective sides $\operatorname{Im} s > 0$ and $\operatorname{Im} u = -\operatorname{Im} s > 0$ of the corresponding $s$- and $u$-cuts.
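The two equalities appearing in the crossing lemma and in the choice of boundary values can be made explicit with a short worked computation. It is a sketch that uses only the reduced variables introduced earlier for a target at rest, with $\zeta$ denoting the squared-mass variable $k^2$ and $k_0$ the complex energy component along $p_2$:

```latex
\begin{align*}
s &= \zeta + m_2^2 + 2 m_2 k_0, &
u &= \zeta + m_2^2 - 2 m_2 k_0, \\
s - \zeta - m_2^2 &= 2 m_2 k_0, &
u - \zeta - m_2^2 &= -2 m_2 k_0, \\
\lvert s - \zeta - m_2^2 \rvert &= \lvert u - \zeta - m_2^2 \rvert = 2 m_2 \lvert k_0 \rvert, &
s + u &= 2\zeta + 2 m_2^2 .
\end{align*}
```

For real $\zeta$, and in particular on the mass shell $\zeta = m_1^2$, the last relation gives $\operatorname{Im} u = -\operatorname{Im} s$, so that the upper side of the $u$-cut corresponds to the lower half-plane in $s$; the condition $|s - \zeta - m_2^2| > R(\zeta)$ is the exterior of a circle centered at $s = \zeta + m_2^2$, the point exchanged with itself by the crossing map $k_0 \to -k_0$.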
It is only for the neutral case, where M12 = M012 = m1 þ m2 , that a more favorable scenario occurs, as explained earlier: in this case, the interval { 2 ] a, 0[} of the set S0 is replaced by { 2 ] a, m21 [}, so that the whole cut-plane domain m2 is obtained in the result of the previous crossing 1 lemma. The scattering amplitudes of 0 –0 meson scattering and of meson–proton scattering enjoy this property and, therefore, satisfy genuine dispersion relations in which the scattering function is even (see the second basic example described at the beginning of this article). In the general case of crossing domains obtained above, corresponding Cauchy integral relations have been written and used under the name of ‘‘quasi-dispersion-relations.’’ Complementary results Some comments can now be added concerning the passage from the purely geometrical results (i.e., analyticity domains) described above to the writing of precise (quasi-) dispersion relations with two subtractions: Polynomial bounds and dispersion relations with N subtractions The previous methods of analytic completion also allow one to control the bounds at infinity in the relevant complex domains. As it has been noticed after eqn [24], the Fourier–Laplace transforms of the retarded and advanced kernels, and thereby the holomorphic functions Hp2 (k) discussed at the start of this section are bounded at most by a power of a suitable norm of k in their respective tubes T . Correspondingly, the holomorphic function Hp2 (k) (resp. Hp2 (k0 , kr )) admits the same type of bound in its primitive analyticity domain D (resp. Dr ). These bounds are a consequence of the tempered distribution character of the structure functions of the fields which is built-in in the Wightman fieldtheoretical framework. 
Then it can be checked that in the holomorphy envelope $\hat{D}_r$ of $D_r$, and thereby in the cut-plane (or crossing) domains obtained in the intersection of $\hat{D}_r$ and of the complex mass shell $h^{(c)}[m_1^2]$, the same type of power bound is still valid: $\hat{F}_0(s)$ is therefore bounded by some power $|s|^{N-1}$ of $|s|$ and thus satisfies a (quasi-)dispersion relation with N subtractions. The same type of argument holds for all the similar cut-domains (or crossing domains) in s obtained for $\hat{F}_t(s)$ for all negative values of t. It is also worthwhile to mention that a similar remarkable (since not at all predictable) result was also obtained in the Haag, Kastler, and Araki framework of algebraic QFT (Epstein, Glaser, Martin, 1969; see Scattering in Relativistic Quantum Field Theory: The Analytic Program for further comments). In this connection, one can also mention a more recent result. In the Buchholz–Fredenhagen axiomatic approach of charged fields (1982), in which
locality is replaced by the more general notion of ‘‘stringlike locality’’ (see Algebraic Approach to Quantum Field Theory, Axiomatic Quantum Field Theory, and Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools), a proof of forward dispersion relations has again been obtained (Bros, Epstein, 1994).
The extension of the analyticity domains by positivity and the derivation of bounds by unitarity (Martin 1966; see the book by Martin (1969)). The following ingredients have been used:
1. Positivity conditions on the absorptive part of F(s, t), which are expressed by the infinite set of inequalities $(d/dt)^n \,\mathrm{Im}\, F(s, t)|_{t=0} \ge 0$ (for all integers n).
2. The existence of a two-dimensional complex neighborhood of some point (s = s_0, t = 0) in the analyticity domain resulting from QFT.
The following results have then been obtained:
(a) It is justified to differentiate the forward (subtracted) dispersion relations with respect to t at any order.
(b) $\hat{F}(s, t)$ can be analytically continued in a fixed circle $|t| < t_{\max}$ for all values of s. The latter implies the extension of dispersion relations in s to positive (and complex) values of t.
(c) In a last step, the use of unitarity conditions for the ‘‘partial waves’’ $f_\ell(s)$ of F(s, t) (see Scattering in Relativistic Quantum Field Theory: The Analytic Program) allows one to obtain Froissart-type bounds on the scattering amplitudes and thereby to justify the writing of (quasi-)dispersion relations with at most two subtractions for all the admissible values of t.
See also: Algebraic Approach to Quantum Field Theory; Axiomatic Quantum Field Theory; Perturbation Theory and its Techniques; Scattering in Relativistic Quantum Field Theory: The Analytic Program; Scattering, Asymptotic Completeness and Bound States; Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools.
Further Reading
Haag R (1992) Local Quantum Physics. Berlin: Springer.
Iagolnitzer D (1992) Scattering in Quantum Field Theories: The Axiomatic and Constructive Approaches. Princeton Series in Physics. Princeton: Princeton University Press.
Jost R (1965) The General Theory of Quantized Fields. Providence, RI: American Mathematical Society.
Klein L (ed.) (1961) Dispersion Relations and the Abstract Approach to Field Theory. New York: Gordon and Breach.
Martin A (1969) Scattering Theory: Unitarity, Analyticity and Crossing, Lecture Notes in Physics. Berlin: Springer.
Nussenzveig HM (1972) Causality and Dispersion Relations. New York: Academic Press.
Streater RF and Wightman AS (1964, 1980) PCT, Spin and Statistics, and all that. Princeton: Princeton University Press.
Vernov YuS (1996) Dispersion Relations in the Historical Aspect, IHEP Publications, Protvino Conf.: Fundamental Problems of High Energy Physics and Field Theory.
Dissipative Dynamical Systems of Infinite Dimension
M Efendiev and S Zelik, Universität Stuttgart, Stuttgart, Germany
A Miranville, Université de Poitiers, Chasseneuil, France
© 2006 Elsevier Ltd. All rights reserved.
Introduction
A dynamical system (DS) is a system which evolves with respect to time. To be more precise, a DS $(S(t), \Phi)$ is determined by a phase space $\Phi$, which consists of all possible values of the parameters describing the state of the system, and an evolution map $S(t) : \Phi \to \Phi$ that allows one to find the state of the system at time t > 0 if the initial state at t = 0 is known. Very often, in mechanics and physics, the evolution of the system is governed by systems of differential equations. If the system is described by ordinary differential equations (ODEs),
\[ \frac{d}{dt} y(t) = F(t, y(t)), \quad y(0) = y_0, \quad y(t) := (y_1(t), \ldots, y_N(t)) \tag*{[1]} \]
for some nonlinear function $F : \mathbb{R}_+ \times \mathbb{R}^N \to \mathbb{R}^N$, we have a so-called finite-dimensional DS. In that case, the phase space $\Phi$ is some (invariant) subset of $\mathbb{R}^N$ and the evolution operator S(t) is defined by
\[ S(t) y_0 := y(t), \quad y(t) \text{ solves [1]} \tag*{[2]} \]
We also recall that, in the case where eqn [1] is autonomous (i.e., does not depend explicitly on the time), the evolution operators S(t) generate a semigroup on the phase space $\Phi$, that is,
\[ S(t_1 + t_2) = S(t_1) \circ S(t_2), \quad t_1, t_2 \in \mathbb{R}_+ \tag*{[3]} \]
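The semigroup identity [3] can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the logistic equation y' = y(1 - y) is chosen here (it does not appear in the article) because its solution operator is known in closed form, and the nonautonomous equation y' = t is included as a contrasting case in which the maps y_0 -> y(t) fail to satisfy [3]. The function names are hypothetical.

```python
import math


def flow_autonomous(t, y0):
    """Exact solution operator S(t) of the autonomous ODE y' = y (1 - y)."""
    growth = math.exp(t)
    return y0 * growth / (1.0 - y0 + y0 * growth)


def flow_nonautonomous(t, y0):
    """Map y0 -> y(t) for the nonautonomous ODE y' = t with y(0) = y0."""
    return y0 + 0.5 * t * t


def semigroup_defect(flow, t1, t2, y0):
    """Size of the violation of S(t1 + t2) = S(t1) o S(t2) at the point y0."""
    return abs(flow(t1 + t2, y0) - flow(t1, flow(t2, y0)))


if __name__ == "__main__":
    # Autonomous case: the defect is zero up to rounding.
    print(semigroup_defect(flow_autonomous, 0.7, 1.3, 0.2))
    # Nonautonomous case: the defect equals t1 * t2, so [3] fails.
    print(semigroup_defect(flow_nonautonomous, 0.7, 1.3, 0.2))
```

In the nonautonomous case one would need a two-parameter family of maps depending on both the initial and the final time, which is why the semigroup property is stated only for autonomous equations.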
Now, in the case of a distributed system whose initial state is described by functions $u_0 = u_0(x)$ depending on the spatial variable x, the evolution is usually governed by partial differential equations (PDEs) and the corresponding phase space $\Phi$ is some infinite-dimensional function space (e.g., $\Phi := L^2(\Omega)$ or $\Phi := L^\infty(\Omega)$ for some domain $\Omega \subset \mathbb{R}^N$). Such DSs are usually called infinite dimensional.
The qualitative study of DSs of finite dimensions goes back to the beginning of the twentieth century, with the pioneering works of Poincaré on the N-body problem (one should also acknowledge the contributions of Lyapunov on stability and of Birkhoff on minimal sets and the ergodic theorem). One of the most surprising and significant facts discovered at the very beginning of the theory is that even relatively simple equations can generate very complicated chaotic behaviors. Moreover, these types of systems are extremely sensitive to initial conditions (trajectories with close but different initial data diverge exponentially). Thus, in spite of the deterministic nature of the system (we recall that it is generated by a system of ODEs, for which we usually have the unique solvability theorem), its temporal evolution is unpredictable on timescales larger than some critical time $T_0$ (which obviously depends on the error of approximation and on the rate of divergence of close trajectories) and can show typical stochastic behaviors. To the best of our knowledge, one of the first ODEs for which such types of behaviors were established is the physical pendulum parametrically perturbed by time-periodic external forces,
\[ y''(t) + \sin(y(t))\,(1 + \varepsilon \sin(\omega t)) = 0 \tag*{[4]} \]
where $\omega$ and $\varepsilon > 0$ are physical parameters. We also mention the more recent (and more relevant for our topic) famous example of the Lorenz system, which is defined by the following system of ODEs in $\mathbb{R}^3$:
\[ \begin{cases} x' = \sigma (y - x) \\ y' = -xz + rx - y \\ z' = xy - bz \end{cases} \tag*{[5]} \]
where $\sigma$, r, and b are some parameters. These equations are obtained by truncation of the Navier–Stokes equations and give an approximate description of a horizontal fluid layer heated from below. The warmer fluid formed at the bottom tends to rise, creating convection currents. This is similar to what happens in the Earth’s atmosphere. For a sufficiently intense heating, the time evolution has a sensitive dependence on the initial conditions, thus representing a very irregular and chaotic convection. This fact was used by Lorenz to justify the so-called ‘‘butterfly effect,’’ a metaphor for the imprecision of weather forecasts.
The theory of DSs in finite dimensions was extensively developed during the twentieth century, due to the efforts of many famous mathematicians (such as Anosov, Arnold, LaSalle, Sinai, Smale, etc.) and, nowadays, much is known about the chaotic behaviors in such systems, at least in low dimensions. In particular, it is known that, very often, the trajectories of a chaotic system are localized, up to a transient process, in some subset of the phase space having a very complicated fractal geometric structure (e.g., locally homeomorphic to the Cartesian product of $\mathbb{R}^m$ and some Cantor set) which, thus, accumulates the nontrivial dynamics of the system (the so-called strange attractor). The chaotic dynamics on such sets are usually described by symbolic dynamics generated by Bernoulli shifts on the space of sequences. We also note that, nowadays, a mathematician has a wide range of concepts and methods available for the extensive study of concrete chaotic DSs in finite dimensions. In particular, we mention here different types of bifurcation theories (including the KAM theory and the homoclinic bifurcation theory with related Shilnikov chaos), the theory of hyperbolic sets, stochastic description of deterministic processes, Lyapunov exponents and entropy theory, dynamical analysis of time series, etc.
We now turn to infinite-dimensional DSs generated by PDEs. A first important difficulty which arises here is related to the fact that the analytic structure of a PDE is essentially more complicated than that of an ODE and, in particular, we do not have in general a unique solvability theorem as for ODEs, so that even finding the proper phase space and the rigorous construction of the associated DS can be a highly nontrivial problem.
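Returning briefly to the Lorenz system [5], the sensitive dependence on initial conditions mentioned above can be observed in a few lines of code. The following is a minimal sketch, not a reference implementation: the parameter values sigma = 10, r = 28, b = 8/3 are the classical choice and are an assumption here (the article does not fix them), and the fourth-order Runge–Kutta integrator, the step size and the initial data are likewise illustrative choices.

```python
import math


def lorenz(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz system [5] (classical parameters assumed)."""
    x, y, z = state
    return (sigma * (y - x), -x * z + r * x - y, x * y - b * z)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for state' = f(state)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b2 + 2.0 * c + d)
        for s, a, b2, c, d in zip(state, k1, k2, k3, k4)
    )


def final_separation(delta0=1e-8, t_end=30.0, dt=0.01):
    """Distance at time t_end between two trajectories started delta0 apart."""
    u = (1.0, 1.0, 1.0)
    v = (1.0 + delta0, 1.0, 1.0)
    for _ in range(int(round(t_end / dt))):
        u = rk4_step(lorenz, u, dt)
        v = rk4_step(lorenz, v, dt)
    return math.dist(u, v)


if __name__ == "__main__":
    # The initial gap of 1e-8 is amplified by many orders of magnitude.
    print(final_separation())
```

The two trajectories remain bounded, so the separation cannot grow indefinitely; it saturates at the size of the attracting region, which is the practical meaning of the critical time beyond which prediction fails.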
In order to indicate the level of difficulties arising here, it suffices to recall that, for the three-dimensional Navier–Stokes system (which is one of the most important equations of mathematical physics), the required associated DS has not been constructed yet. Nevertheless, there exists a large number of equations for which the problem of the global existence and uniqueness of a solution has been solved. Thus, the question of extending the highly developed finite-dimensional DS theory to infinite dimensions arises naturally. One of the first and most significant results in that direction was the development of the theory of integrable Hamiltonian systems in infinite dimensions and the explicit resolution (by inverse-scattering methods) of several important conservative equations
of mathematical physics (such as the Korteweg–de Vries equation (and the generalized Kadomtsev–Petviashvili hierarchy), the sine-Gordon equation, and the nonlinear Schrödinger equation). Nevertheless, it is worth noting that integrability is a very rare phenomenon, even among ODEs, and this theory is clearly insufficient to understand the dynamics arising in PDEs. In particular, there exist many important equations which are essentially out of reach of this theory.
One of the most important classes of such equations consists of the so-called dissipative PDEs, which are the main subject of our study. As hinted by this denomination, these systems exhibit some energy dissipation process (in contrast to conservative systems, for which the energy is preserved) and, of course, in order to have nontrivial dynamics, these models should also account for the energy income. Roughly speaking, the complicated chaotic behaviors in such systems usually arise from the interaction of the following mechanisms:
1. energy dissipation in the higher part of the Fourier spectrum;
2. external energy income in its lower part;
3. energy flux from lower to higher Fourier modes provided by the nonlinear terms of the equation.
We chose not to give a rigorous definition of a dissipative system here (although the concepts of energy dissipation and related dissipative systems are more or less obvious from the physical point of view, they seem too general to have an adequate mathematical definition). Instead, we only indicate several basic classes of equations of mathematical physics which usually exhibit the above behaviors.
The first example is, of course, the Navier–Stokes system, which describes the motion of a viscous incompressible fluid in a bounded domain $\Omega$ (we will only consider here the two-dimensional case $\Omega \subset \mathbb{R}^2$, since the adequate formulation in three dimensions is still an open problem):
\[ \begin{cases} \partial_t u + (u, \nabla_x) u = \nu \Delta_x u + \nabla_x p + g(x) \\ \operatorname{div} u = 0, \quad u|_{t=0} = u_0, \quad u|_{\partial\Omega} = 0 \end{cases} \tag*{[6]} \]
Here, $u(t, x) = (u_1(t, x), u_2(t, x))$ is the unknown velocity vector, p = p(t, x) is the unknown pressure, $\Delta_x$ is the Laplacian with respect to x, $\nu > 0$ and g are the given kinematic viscosity and external forces, respectively, and $(u, \nabla_x) u$ is the inertial term ($[(u, \nabla_x) u]_i = \sum_{j=1}^{2} u_j \partial_{x_j} u_i$, i = 1, 2). The unique global solvability of [6] has been proved by Ladyzhenskaya. Thus, this equation generates an infinite-dimensional DS in the phase space $\Phi$ of divergence-free square-integrable vector fields.
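The second example is the damped nonlinear wave equation in $\Omega \subset \mathbb{R}^n$:
\[ \partial_t^2 u + \partial_t u - \Delta_x u + f(u) = 0, \quad u|_{\partial\Omega} = 0, \quad u|_{t=0} = u_0, \quad \partial_t u|_{t=0} = u_0' \tag*{[7]} \]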
which models, for example, the dynamics of a Josephson junction driven by a current source (sine-Gordon equation). It is known that, under natural sign and growth assumptions on the nonlinear interaction function f, this equation generates a DS in the energy phase space E of pairs of functions $(u, \partial_t u)$ such that $\partial_t u$ and $\nabla_x u$ are square integrable. The last class of equations that we will consider here consists of reaction–diffusion systems in a domain $\Omega \subset \mathbb{R}^n$:
\[ \partial_t u = a \Delta_x u - f(u), \quad u|_{t=0} = u_0 \tag*{[8]} \]
(endowed with Dirichlet ($u|_{\partial\Omega} = 0$) or Neumann ($\partial_n u|_{\partial\Omega} = 0$) boundary conditions), which describe some chemical reaction in $\Omega$. Here, $u = (u^1, \ldots, u^N)$ is an unknown vector-valued function which describes the concentrations of the reactants, f(u) is a given interaction function, and a is a diffusion matrix. It is known that, under natural assumptions on f and a, these equations also generate an infinite-dimensional DS, for example, in the phase space $\Phi := [L^\infty(\Omega)]^N$.
We emphasize once more that the phase spaces $\Phi$ in all these examples are appropriate infinite-dimensional function spaces. Nevertheless, it was observed in experiments that, up to a transient process, the trajectories of the DS considered are localized inside a ‘‘very thin’’ invariant subset of the phase space having a complicated geometric structure which, thus, accumulates all the nontrivial dynamics of the system. It was conjectured a little later that these invariant sets are, in some proper sense, finite dimensional and that the dynamics restricted to these sets can be effectively described by a finite number of parameters. Thus (when this conjecture is true), in spite of the infinite-dimensional initial phase space, the effective dynamics (reduced to this invariant set) is finite dimensional and can be studied by using the algorithms and concepts of the classical finite-dimensional DS theory. In particular, this means that the infinite dimensionality plays here only the role of (possibly essential) technical difficulties, which cannot, however, produce any new dynamical phenomena which are not observed in the finite-dimensional theory.
The above finite-dimensional reduction principle of dissipative PDEs in bounded domains has been given solid mathematical grounds (based on the concept of the so-called global attractor) over the
last three decades, starting from the pioneering papers of Ladyzhenskaya. This theory is considered in more detail here.
The finite-dimensional reduction theory has some limitations. Of course, the first and most obvious restriction of this principle is the effective dimension of the reduced finite-dimensional DS. Indeed, it is known that, typically, this dimension grows at least linearly with respect to the volume $\mathrm{vol}(\Omega)$ of the spatial domain $\Omega$ of the DS considered (and the growth of the size of $\Omega$ is the same (up to a rescaling) as the decay of the viscosity coefficient $\nu$ or the diffusion matrix a, see eqns [6]–[8]). So, for sufficiently large domains $\Omega$, the reduced DS can be too large for reasonable investigations. The next, less obvious, but much more essential, restriction is the growing spatial complexity of the DS. Indeed, as shown by Babin–Bunimovich, the spatial complexity of the system (e.g., the number of topologically different equilibria) grows exponentially with respect to $\mathrm{vol}(\Omega)$. Thus, even in the case of relatively small dimensions, the reduced system can be beyond the reach of reasonable investigation, due to its extremely complicated structure. Therefore, the approach based on the finite-dimensional reduction does not look so attractive for large domains. It seems, instead, more natural, at least from the physical point of view, to replace large bounded domains by their limit unbounded ones (e.g., $\Omega = \mathbb{R}^n$ or cylindrical domains). Of course, this approach requires a systematic study of dissipative DSs associated with PDEs in unbounded domains.
The dynamical study of PDEs in unbounded domains started from the pioneering paper of Kolmogorov–Petrovskij–Piskunov, in which the traveling wave solutions of reaction–diffusion equations in a strip were constructed and the convergence of the trajectories (for specific initial data) to these traveling wave solutions was established. Starting from this, many results on the dynamics of PDEs in unbounded domains have been obtained.
However, for a long period, the general features of such dynamics remained completely unclear. The main problems arising here are:
1. the essential infinite dimensionality of the DS considered (absence of any finite-dimensional reduction), which leads to essentially new dynamical effects that are not observed in finite-dimensional theories;
2. the additional spatial ‘‘unbounded’’ directions lead to the so-called spatial chaos and the interaction between spatial and temporal chaotic modes generates the spatio-temporal chaos, which also has no analog in finite dimensions.
Nevertheless, several ideas are mentioned in the following which (from the authors’ point of view) were the most important for the development of these topics. The first one is the pioneering paper of Kirchgässner, in which dynamical methods were applied to the study of the spatial structure of solutions of elliptic equations in cylinders (which can be considered as equilibria equations for evolution PDEs in unbounded cylindrical domains). The second is the Sinai–Bunimovich model of space-time chaos in discrete lattice DSs. Finally, the third is the adaptation of the concept of a global attractor to unbounded domains by Abergel and Babin–Vishik. We note that the understanding of the general features of the dynamics in unbounded domains seems, however, to have changed in the last several years, due to the works of Collet–Eckmann and Zelik. This is the reason why a section of this review is devoted to a more detailed discussion of this topic.
Other important questions are the object of current studies and we only briefly mention some of them. We mention, for instance, the study of attractors for nonautonomous systems (i.e., systems in which the time appears explicitly). This situation is much more delicate and is not completely understood; notions of attractors for such systems have been proposed by Chepyzhov–Vishik, Haraux, and Kloeden–Schmalfuss. We also mention that theories of (global) attractors for non-well-posed problems have been proposed by Babin–Vishik, Ball, Chepyzhov–Vishik, Melnik–Valero, and Sell.
Global Attractors and Finite-Dimensional Reduction
Global Attractors: The Abstract Setting
As already mentioned, one of the main concepts of the modern theory of DSs in infinite dimensions is that of the global attractor. We give below its definition for an abstract semigroup S(t) acting on a metric space $\Phi$, although, without loss of generality, the reader may think that $(S(t), \Phi)$ is just a DS associated with one of the PDEs ([6]–[8]) described in the introduction. To this end, we first recall that a subset K of the phase space $\Phi$ is an attracting set of the semigroup S(t) if it attracts the images of all the bounded subsets of $\Phi$, that is, for every bounded set B and every $\varepsilon > 0$, there exists a time T (depending in general on B and $\varepsilon$) such that the image S(t)B belongs to the $\varepsilon$-neighborhood of K if $t \ge T$. This property can be rewritten in the equivalent form
\[ \lim_{t \to \infty} \mathrm{dist}_H(S(t)B, K) = 0 \tag*{[9]} \]
where $\mathrm{dist}_H(X, Y) := \sup_{x \in X} \inf_{y \in Y} d(x, y)$ is the nonsymmetric Hausdorff distance between subsets of $\Phi$. The following definition of a global attractor is due to Babin–Vishik.
Definition 1 A set $A \subset \Phi$ is a global attractor for the semigroup S(t) if (i) A is compact in $\Phi$; (ii) A is strictly invariant: S(t)A = A, for all $t \ge 0$; (iii) A is an attracting set for the semigroup S(t).
Thus, the second and third properties guarantee that a global attractor, if it exists, is unique and that the DS reduced to the attractor contains all the nontrivial dynamics of the initial system. Furthermore, the first property indicates that the reduced phase space A is indeed ‘‘thinner’’ than the initial phase space $\Phi$ (we recall that, in infinite dimensions, a compact set cannot contain, e.g., balls and should thus be nowhere dense). In most applications, one can use the following attractor’s existence theorem.
Theorem 1 Let a DS $(S(t), \Phi)$ possess a compact attracting set and the operators $S(t) : \Phi \to \Phi$ be continuous for every fixed t. Then, this system possesses the global attractor A, which is generated by all the trajectories of S(t) which are defined for all $t \in \mathbb{R}$ and are globally bounded.
The strategy for applying this theorem to concrete equations of mathematical physics is the following. In a first step, one verifies a so-called dissipative estimate, which usually has the form
\[ \|S(t) u_0\|_{\Phi} \le Q(\|u_0\|_{\Phi})\, e^{-\alpha t} + C_*, \quad u_0 \in \Phi \tag*{[10]} \]
where $\|\cdot\|_{\Phi}$ is a norm in the function space $\Phi$ and the positive constants $\alpha$ and $C_*$ and the monotonic function Q are independent of t and $u_0 \in \Phi$ (usually, this estimate follows from energy estimates and is sometimes even used in order to ‘‘define’’ a dissipative system). This estimate obviously gives the existence of an attracting set for S(t) (e.g., the ball of radius $2C_*$ in $\Phi$), which is, however, noncompact in $\Phi$. In order to overcome this problem, one usually derives, in a second step, a smoothing property for the solutions, which can be formulated as follows:
\[ \|S(1) u_0\|_{\Phi_1} \le Q_1(\|u_0\|_{\Phi}), \quad u_0 \in \Phi \tag*{[11]} \]
where $\Phi_1$ is another function space which is compactly embedded into $\Phi$. In applications, $\Phi$ is usually the space $L^2(\Omega)$ of square integrable functions, $\Phi_1$ is the Sobolev space $H^1(\Omega)$ of the functions u such that u and $\nabla_x u$ belong to $L^2(\Omega)$, and estimate [11] is a classical smoothing property for solutions of parabolic equations (for hyperbolic equations, a slightly more complicated asymptotic smoothing property should be used instead of [11]). Since the continuity of the operators S(t) usually poses no difficulty (if the uniqueness is proven), the above scheme indeed gives the existence of the global attractor for most of the PDEs of mathematical physics in bounded domains.
Dimension of the Global Attractor
In this subsection, we start by discussing one of the basic questions of the theory: in which sense is the dynamics on the global attractor finite dimensional? As already mentioned, the global attractor is usually not a manifold, but has a rather complicated geometric structure. So, it is natural to use here the definitions of dimensions adopted for the study of fractal sets. We restrict ourselves to the so-called fractal (or box-counting, entropy) dimension, although other dimensions (e.g., Hausdorff, Lyapunov, etc.) are also used in the theory of attractors. In order to define the fractal dimension, we first recall the concept of Kolmogorov’s $\varepsilon$-entropy, which comes from information theory and plays a fundamental role in the theory of DSs in unbounded domains considered in the next section.
Definition 2 Let A be a compact subset of a metric space $\Phi$. For every $\varepsilon > 0$, we define $N_\varepsilon(A)$ as the minimal number of $\varepsilon$-balls which are necessary to cover A. Then, Kolmogorov’s $\varepsilon$-entropy $H_\varepsilon(A) = H_\varepsilon(A, \Phi)$ of A is the binary logarithm of this number, $H_\varepsilon(A) := \log_2 N_\varepsilon(A)$. We recall that $H_\varepsilon(A)$ is finite for every $\varepsilon > 0$, due to the Hausdorff criterion. The fractal dimension $d_f(A) \in [0, \infty]$ of A is then defined by
\[ d_f(A) := \limsup_{\varepsilon \to 0} \frac{H_\varepsilon(A)}{\log_2 (1/\varepsilon)} \tag*{[12]} \]
We also recall that, although this dimension coincides with the usual dimension of the manifold for Lipschitz manifolds, it can be noninteger for more complicated sets. For instance, the fractal dimension of the standard ternary Cantor set in [0, 1] is ln 2 / ln 3. The so-called Mañé theorem (which can be considered as a generalization of the classical Whitney embedding theorem for fractal sets) plays an important role in the finite-dimensional reduction theory.
Theorem 2 Let $\Phi$ be a Banach space and A be a compact set such that $d_f(A) < N$ for some $N \in \mathbb{N}$. Then, for ‘‘almost all’’ (2N + 1)-dimensional planes L in $\Phi$, the corresponding projector $\Pi_L : \Phi \to L$ restricted to the set A is a Hölder continuous homeomorphism.
Thus, if the finite fractal dimensionality of the attractor is established, then, fixing a plane L satisfying the assumptions of the Mañé theorem and projecting the attractor A and the DS S(t) restricted to A onto this plane ($\bar{A} := \Pi_L A$ and $\bar{S}(t) := \Pi_L \circ S(t) \circ \Pi_L^{-1}$), we obtain, indeed, a reduced DS $(\bar{S}(t), \bar{A})$ which is defined on a finite-dimensional set $\bar{A} \subset L \sim \mathbb{R}^{2N+1}$. Moreover, this DS will be Hölder continuous with respect to the initial data.
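As a numerical illustration of the fractal dimension [12] that enters the assumption of Theorem 2, the sketch below estimates the dimension of the standard ternary Cantor set. It rests on two simplifying assumptions that are not part of Definition 2: the set is replaced by its level-n approximation (stored exactly through integer numerators), and the minimal number of covering balls is replaced by the number of occupied grid boxes of side 3^-k, which changes the count by at most a bounded factor and hence not the limit. Function names are hypothetical.

```python
import math


def cantor_left_endpoints(level):
    """Integers p such that p / 3**level is the left endpoint of an interval
    in the level-`level` approximation of the ternary Cantor set."""
    points = [0]
    for _ in range(level):
        points = [3 * p + digit for p in points for digit in (0, 2)]
    return points


def box_count(points, level, k):
    """Number of grid boxes of side 3**-k (k <= level) containing a point."""
    scale = 3 ** (level - k)
    return len({p // scale for p in points})


def dimension_estimates(level=10):
    """Ratios H_eps / log2(1/eps) for eps = 3**-k, k = 1, ..., level."""
    points = cantor_left_endpoints(level)
    estimates = []
    for k in range(1, level + 1):
        entropy = math.log2(box_count(points, level, k))  # stand-in for H_eps
        estimates.append(entropy / math.log2(3 ** k))
    return estimates


if __name__ == "__main__":
    print(dimension_estimates()[-1], math.log(2) / math.log(3))
```

For this self-similar set the ratio is the same at every scale, so the limit in [12] is reached immediately; for a typical attractor one would instead look at the trend of the ratio as the box size decreases.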
Estimates on the Fractal Dimension
Obviously, good estimates on the dimension of the attractors in terms of the physical parameters are crucial for the finite-dimensional reduction described above, and (consequently) there exists a highly developed machinery for obtaining such estimates. The best-known upper estimates are usually obtained by the so-called volume contraction method, which is based on the study of the evolution of infinitesimal k-dimensional volumes in the neighborhood of the attractor (and, if the DS considered contracts the k-dimensional volumes, then the fractal dimension of the attractor is less than k). Lower bounds on the dimension are usually based on the observation that the global attractor always contains the unstable manifolds of the (hyperbolic) equilibria. Thus, the instability index of a properly constructed equilibrium gives a lower bound on the dimension of the attractor. In the following, several estimates for the classes of equations given in the introduction are formulated, beginning with the most-studied case of the reaction–diffusion system [8]. For this system, sharp upper and lower bounds are known, namely
\[ C_1 \,\mathrm{vol}(\Omega) \le d_f(A) \le C_2 \,\mathrm{vol}(\Omega) \tag*{[13]} \]
where the constants $C_1$ and $C_2$ depend on a and f (and, possibly, on the shape of $\Omega$), but are independent of its size. The same types of estimates also hold for the hyperbolic equation [7]. Concerning the Navier–Stokes system [6] in general two-dimensional domains $\Omega$, the asymptotics of the fractal dimension as $\nu \to 0$ is not known. The best-known upper bound has the form $d_f(A) \le C \nu^{-2}$ and was obtained by Foias–Temam by using the so-called Lieb–Thirring inequalities. Nevertheless, for periodic boundary conditions, Constantin–Foias–Temam and Liu obtained upper and lower bounds of the same order (up to a logarithmic correction):
\[ C_1 \nu^{-4/3} \le d_f(A) \le C_2 \nu^{-4/3} \left(1 + \ln(\nu^{-1})\right)^{1/3} \tag*{[14]} \]
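To get a feeling for how close the two sides of [14] are, one can tabulate the logarithmic correction separately from the power law. The following sketch is only arithmetic on the formula: the constants C_1 and C_2 are not specified in the text, so the ratio of the bounds is reported in units of C_2 / C_1, and the function names are hypothetical.

```python
import math


def power_law_part(nu):
    """Common factor nu**(-4/3) appearing on both sides of [14]."""
    return nu ** (-4.0 / 3.0)


def bound_ratio(nu):
    """Upper bound divided by lower bound in [14], in units of C_2 / C_1."""
    return (1.0 + math.log(1.0 / nu)) ** (1.0 / 3.0)


if __name__ == "__main__":
    for nu in (1e-2, 1e-4, 1e-6):
        print(nu, power_law_part(nu), bound_ratio(nu))
```

Over these values of the viscosity the power-law factor grows by several orders of magnitude while the ratio of the bounds stays of order one, which is the sense in which the two bounds are ‘‘of the same order’’.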
Global Lyapunov Functions and the Structure of Global Attractors
Although the global attractor usually has a very complicated geometric structure, there exists one exceptional class of DS for which the global attractor has a relatively simple structure which is completely understood, namely the DS having a global Lyapunov function. We recall that a continuous function $L : \Phi \to \mathbb{R}$ is a global Lyapunov function if
1. L is nonincreasing along the trajectories, that is, $L(S(t) u_0) \le L(u_0)$, for all $t \ge 0$;
2. L is strictly decreasing along all nonequilibrium solutions, that is, $L(S(t) u_0) = L(u_0)$ for some t > 0 and $u_0$ implies that $u_0$ is an equilibrium of S(t).
For instance, in the scalar case N = 1, the reaction–diffusion equations [8] possess the global Lyapunov function $L(u_0) := \int_\Omega \left[ a |\nabla_x u_0(x)|^2 + 2 F(u_0(x)) \right] dx$, where $F(v) := \int_0^v f(u)\, du$. Indeed, multiplying eqn [8] by $\partial_t u$ and integrating over $\Omega$, we have
\[ \frac{d}{dt} L(u(t)) = -2 \|\partial_t u(t)\|^2_{L^2(\Omega)} \le 0 \tag*{[15]} \]
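The monotonicity expressed by [15] can be observed on a discretized example. The sketch below is an illustration under several assumptions not made in the article: the scalar equation is taken on the interval (0, 1) with Dirichlet boundary conditions, a = 0.01 and f(u) = u^3 - u (so F(v) = v^4/4 - v^2/2), and it is integrated by finite differences with an explicit Euler scheme whose time step is small enough for the discrete functional to decrease. The functional is normalized as the integral of a|u_x|^2 + 2F(u), so that its time derivative carries the factor -2 of [15].

```python
import math


def energy(u, a, dx):
    """Discrete analog of L(u) = int [a |u_x|^2 + 2 F(u)] dx on (0, 1)."""
    padded = [0.0] + u + [0.0]  # Dirichlet boundary values
    gradient_part = sum(
        (padded[i + 1] - padded[i]) ** 2 for i in range(len(padded) - 1)
    ) / dx
    potential_part = sum(2.0 * (v ** 4 / 4.0 - v ** 2 / 2.0) for v in u) * dx
    return a * gradient_part + potential_part


def euler_step(u, a, dx, dt):
    """One explicit Euler step for u_t = a u_xx - (u^3 - u)."""
    padded = [0.0] + u + [0.0]
    return [
        padded[i]
        + dt * (
            a * (padded[i + 1] - 2.0 * padded[i] + padded[i - 1]) / dx ** 2
            - (padded[i] ** 3 - padded[i])
        )
        for i in range(1, len(padded) - 1)
    ]


def lyapunov_history(n=49, a=0.01, dt=0.002, steps=2000):
    """Values of the discrete Lyapunov function along one trajectory."""
    dx = 1.0 / (n + 1)
    u = [0.5 * math.sin(3.0 * math.pi * (i + 1) * dx) for i in range(n)]
    history = [energy(u, a, dx)]
    for _ in range(steps):
        u = euler_step(u, a, dx, dt)
        history.append(energy(u, a, dx))
    return history


if __name__ == "__main__":
    values = lyapunov_history()
    print(values[0], values[-1])
```

The recorded values form a nonincreasing sequence, in agreement with the first property of a global Lyapunov function; with a larger time step the explicit scheme may lose this property even though the continuous equation retains it.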
Analogously, in the scalar case N = 1, multiplying the hyperbolic equation [7] by $\partial_t u(t)$ and integrating over $\Omega$, we obtain the standard global Lyapunov function for this equation. It is well known that, if a DS possesses a global Lyapunov function, then, at least under the generic assumption that the set $\mathcal{R}$ of equilibria is finite, every trajectory u(t) stabilizes to one of these equilibria as $t \to +\infty$. Moreover, every complete bounded trajectory u(t), $t \in \mathbb{R}$, belonging to the attractor is a heteroclinic orbit joining two equilibria. Thus, the global attractor A can be described as follows:
\[ A = \bigcup_{u_0 \in \mathcal{R}} M^+(u_0) \tag*{[16]} \]
where $M^+(u_0)$ is the so-called unstable set of the equilibrium $u_0$ (which is generated by all heteroclinic orbits of the DS which start from the given equilibrium $u_0 \in A$). It is also known that, if the equilibrium $u_0$ is hyperbolic (generic assumption), then the set $M^+(u_0)$ is a finite-dimensional submanifold of $\Phi$ whose dimension is the instability index of $u_0$. Thus, under the generic hyperbolicity assumption on the equilibria, the
attractor A of a DS having a global Lyapunov function is a finite union of smooth finite-dimensional submanifolds of the phase space $\Phi$. These attractors are called regular (following Babin–Vishik). It is also worth emphasizing that, in contrast to general global attractors, regular attractors are robust under perturbations. Moreover, in some cases, it is also possible to verify the so-called transversality conditions (for the intersection of stable and unstable manifolds of the equilibria) and, thus, verify that the DS considered is a Morse–Smale system. In particular, this means that the dynamics restricted to the regular attractor A is also preserved (up to homeomorphisms) under perturbations. A disadvantage of the approach of using a regular attractor is the fact that, except for scalar parabolic equations in one space dimension, it is usually extremely difficult to verify the ‘‘generic’’ hyperbolicity and transversality assumptions for concrete values of the physical parameters, and the associated hyperbolicity constants, as a rule, cannot be expressed in terms of these parameters.
Inertial Manifolds
It should be noted that the scheme for the finite-dimensional reduction described above has essential drawbacks. Indeed, the reduced system $(\bar S(t), \bar{\mathcal A})$ is only Hölder continuous and, consequently, cannot be realized as a DS generated by a system of ODEs (and reasonable conditions on the attractor $\mathcal A$ which guarantee the Lipschitz continuity of the Mañé projections are not known). On the other hand, the complicated geometric structure of the attractor $\mathcal A$ (or $\bar{\mathcal A}$) makes the use of this finite-dimensional reduction in computations hazardous (in fact, only heuristic information on the number of unknowns necessary to capture all the dynamical effects in approximations can be extracted). In order to overcome these problems, the concept of an inertial manifold (which allows one to embed the global attractor into a smooth manifold) was suggested by Foias–Sell–Temam. To be more precise, a Lipschitz finite-dimensional manifold $\mathcal M$ is an inertial manifold for the DS $(S(t), \Phi)$ if
1. $\mathcal M$ is semi-invariant, that is, $S(t)\mathcal M \subset \mathcal M$, for all $t \ge 0$;
2. $\mathcal M$ satisfies the following asymptotic completeness property: for every $u_0 \in \Phi$, there exists $v_0 \in \mathcal M$ such that
$$\|S(t)u_0 - S(t)v_0\|_\Phi \le Q(\|u_0\|_\Phi)\, e^{-\alpha t} \qquad [17]$$
where the positive constant $\alpha$ and the monotonic function $Q$ are independent of $u_0$.
We can see that an inertial manifold, if it exists, confirms in a perfect way the heuristic conjecture on the finite dimensionality formulated in the introduction. Indeed, the dynamics of $S(t)$ restricted to an inertial manifold can obviously be described by a system of ODEs (which is called the inertial form of the initial PDE). On the other hand, the asymptotic completeness gives (in a very strong form) the equivalence of the initial DS $(S(t), \Phi)$ with its inertial form $(S(t), \mathcal M)$. Moreover, in turbulence, the existence of an inertial manifold would yield an exact interaction law between the small and large structures of the flow. Unfortunately, all the known constructions of inertial manifolds are based on a very restrictive condition, the so-called spectral gap condition, which requires arbitrarily large gaps in the spectrum of the linearization of the initial PDE and which can usually be verified only in one space dimension. So, the existence of an inertial manifold is still an open problem for many important equations of mathematical physics (including in particular the two-dimensional Navier–Stokes equations; some nonexistence results have also been proven by Mallet-Paret).

Exponential Attractors
We first recall that Definition 1 of a global attractor only guarantees that the images S(t)B of all the bounded subsets converge to the attractor, without saying anything on the rate of convergence (in contrast to inertial manifolds, for which this rate of convergence can be controlled). Furthermore, as elementary examples show, this convergence can be arbitrarily slow, so that, until now, we have no effective way for estimating this rate of convergence in terms of the physical parameters of the system (an exception is given by the regular attractors described earlier for which the rate of convergence can be estimated in terms of the hyperbolicity constants of the equilibria. However, even in this situation, it is usually very difficult to estimate these constants for concrete equations). Furthermore, there exist many physically relevant systems (e.g., the so-called slightly dissipative gradient systems) which have trivial global attractors, but very rich and physically relevant transient dynamics which are automatically forgotten under the global-attractor approach. Another important problem is the robustness of the global attractor under perturbations. In fact, global attractors are usually only upper semicontinuous under
perturbations (which means that they cannot explode) and the lower semicontinuity (which means that they also cannot implode) is much more delicate to prove and requires some hyperbolicity assumptions (which are usually impossible to verify for concrete equations). In order to overcome these difficulties, Eden–Foias–Nicolaenko–Temam have introduced an intermediate object (between inertial manifolds and global attractors), namely an exponential attractor (also called an inertial set).
Definition 3 A compact set $\mathcal M$ is an exponential attractor for the DS $(S(t), \Phi)$ if
(i) $\mathcal M$ has finite fractal dimension: $d_f(\mathcal M) < \infty$;
(ii) $\mathcal M$ is semi-invariant: $S(t)\mathcal M \subset \mathcal M$, for all $t \ge 0$;
(iii) $\mathcal M$ attracts exponentially the images of all the bounded sets $B \subset \Phi$:
$$\mathrm{dist}_H(S(t)B, \mathcal M) \le Q(\|B\|_\Phi)\, e^{-\alpha t} \qquad [18]$$
where the positive constant $\alpha$ and the monotonic function $Q$ are independent of $B$.
Thus, on the one hand, an exponential attractor remains finite dimensional (like the global attractor) and, on the other hand, estimate [18] allows one to control the rate of attraction (like an inertial manifold). We note, however, that the relaxation of strict invariance to semi-invariance allows this object to be nonunique. So, we have here the problem of the “best choice” of the exponential attractor. We also mention that an exponential attractor, if it exists, always contains the global attractor. Although the initial construction of exponential attractors is based on the so-called squeezing property (and requires Zorn’s lemma), we formulate below a simpler construction, due to Efendiev–Miranville–Zelik, which is similar to the method proposed by Ladyzhenskaya to verify the finite dimensionality of global attractors. This is done for discrete times and for a DS generated by iterations of some map $S : \Phi \to \Phi$, since the passage from discrete to continuous times usually raises no difficulty (without loss of generality, the reader may think that $S = S(1)$ and $(S(t), \Phi)$ is one of the DS mentioned in the introduction).
Theorem 3 Let the phase space $\Phi_0$ be a closed bounded subset of some Banach space $H$ and let $H_1$ be another Banach space compactly embedded into $H$. Assume also that the map $S : \Phi_0 \to \Phi_0$ satisfies the following “smoothing” property:
$$\|Su_1 - Su_2\|_{H_1} \le K \|u_1 - u_2\|_H, \quad u_1, u_2 \in \Phi_0 \qquad [19]$$
for some constant $K$ independent of $u_i$. Then, the DS $(S, \Phi_0)$ possesses an exponential attractor.
In applications, $\Phi_0$ is usually a bounded absorbing/attracting set whose existence is guaranteed by the dissipative estimate [10], $H := L^2(\Omega)$ and $H_1 := H^1(\Omega)$. Furthermore, estimate [19] simply follows from the classical parabolic smoothing property, but now applied to the equation of variations (as in [11], hyperbolic equations require a slightly more complicated analogue of [19]). These simple arguments show that exponential attractors are as general as global attractors and, to the best of our knowledge, exponential attractors do exist for all the equations of mathematical physics for which the finite dimensionality of the global attractor can be established. Moreover, since $\mathcal A \subset \mathcal M$, this scheme can also be used to prove the finite dimensionality of global attractors. It is finally worth emphasizing that the control on the rate of convergence provided by [18] makes exponential attractors much more robust than global attractors. In particular, they are upper and lower semicontinuous under perturbations (of course, up to the “best choice,” since they are not unique), as shown by Efendiev–Miranville–Zelik.
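As a purely illustrative toy case (not taken from the article), the exponential attraction estimate [18] can be checked numerically for a discrete DS on the real line. The map `S`, the sampled bounded set `B`, the candidate attractor `M = {0}`, and the constants `Q` and `alpha` below are all assumptions chosen for the sketch.

```python
import math

def hausdorff_semidistance(X, Y):
    """One-sided Hausdorff distance sup_{x in X} inf_{y in Y} |x - y| for finite sets of reals."""
    return max(min(abs(x - y) for y in Y) for x in X)

def S(u):
    # Toy contraction used only for illustration (hypothetical, not from the article).
    return 0.5 * u

# Finite sample of the bounded set B = [-1, 1] and the candidate attractor M = {0}.
B = [-1.0 + 0.1 * i for i in range(21)]
M = [0.0]

# Assumed constants in the estimate dist_H(S^n B, M) <= Q * exp(-alpha * n).
alpha = math.log(2.0)
Q = 1.0

def distances(n_steps):
    """Return dist_H(S^n B, M) for n = 0, ..., n_steps."""
    points = list(B)
    result = []
    for _ in range(n_steps + 1):
        result.append(hausdorff_semidistance(points, M))
        points = [S(u) for u in points]
    return result

dists = distances(10)
# Check the exponential bound at every step (small tolerance for rounding).
bound_holds = all(d <= Q * math.exp(-alpha * n) + 1e-12 for n, d in enumerate(dists))
```

In this toy case the distance halves at every step, so the rate $\alpha = \ln 2$ is explicit; for a genuine PDE the point of [18] is that such a rate is controlled even though the attractor itself is not known explicitly.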
Essentially Infinite-Dimensional Dynamical Systems – The Case of Unbounded Domains

As already mentioned in the introduction, the theory of dissipative DS in unbounded domains is only now being developed and the results given here are not as complete as for bounded domains. Nevertheless, we indicate below several of the most interesting (from our point of view) results concerning the general description of the dynamics generated by such problems by considering a system of reaction–diffusion equations [8] in $\mathbb R^n$ with phase space $\Phi = L^\infty(\mathbb R^n)$ as a model example (although all the results formulated below are general and depend weakly on the choice of equation).

Generalization of the Global Attractor and Kolmogorov’s $\varepsilon$-Entropy

We first note that Definition 1 of a global attractor is too strong for equations in unbounded domains. Indeed, as seen earlier, the compactness of the attractor is usually based on the compactness of the embedding $H^1(\Omega) \subset L^2(\Omega)$, which does not hold in unbounded domains. Furthermore, an attractor, in the sense of Definition 1, does not exist for most of the interesting examples of eqns [8] in $\mathbb R^n$.
It is natural to use instead the concept of the so-called locally compact global attractor which is well adapted to unbounded domains. This attractor $\mathcal A$ is only bounded in the phase space $\Phi = L^\infty(\mathbb R^n)$, but its restrictions $\mathcal A|_\Omega$ to all bounded domains $\Omega$ are compact in $L^\infty(\Omega)$. Moreover, the attraction property should also be understood in the sense of a local topology in $L^\infty(\mathbb R^n)$. It is known that this generalized global attractor $\mathcal A$ does exist for problem [8] in $\mathbb R^n$ (of course, under some “natural” assumptions on the nonlinearity $f$ and the diffusion matrix $a$). As for bounded domains, its existence is based on the dissipative estimate [10], the smoothing property [11], and the compactness of the embedding $H^1_{loc}(\mathbb R^n) \subset L^2_{loc}(\mathbb R^n)$ (we need to use the local topology only to have this compactness). The next natural question that arises here is how to control the “size” of the attractor $\mathcal A$ if its fractal dimension is infinite (which is usually the case in unbounded domains). One of the most natural ways to handle this problem (which was first suggested by Chepyzhov–Vishik in the different context of uniform attractors associated with nonautonomous equations in bounded domains and has proved extremely fruitful for the theory of dissipative PDE in unbounded domains) is to study the asymptotics of Kolmogorov’s $\varepsilon$-entropy of the attractor. Actually, since the attractor $\mathcal A$ is compact only in a local topology, it is natural to study the entropy of its restrictions, say, to balls $B^R_{x_0}$ of $\mathbb R^n$ of radius $R$ centered at $x_0$ with respect to the three parameters $R$, $x_0$, and $\varepsilon$. A more or less complete answer to this question is given by the following estimate:
$$H_\varepsilon\big(\mathcal A|_{B^R_{x_0}}\big) \le C\,(R + \log_2 1/\varepsilon)^n \log_2 1/\varepsilon \qquad [20]$$
where the constant $C$ is independent of $\varepsilon \le 1$, $R$, and $x_0$. Moreover, it can be shown that this estimate is sharp for all $R$ and $\varepsilon$ under the very weak additional assumption that eqn [8] possesses at least one exponentially unstable spatially homogeneous equilibrium. Thus, formula [20] (whose proof is also based on a smoothing property for the equation of variations) can be interpreted as a natural generalization of the heuristic principle of finite dimensionality of global attractors to unbounded domains. It is also worth recalling that the entropy of the embedding of a ball $B_k$ of the space $C^k(B^R_{x_0})$ into $C(B^R_{x_0})$ has the asymptotics $H_\varepsilon(B_k) \sim C R^n (1/\varepsilon)^{n/k}$, which is essentially worse than [20]. So, [20] is not based on the smoothness of the attractor $\mathcal A$ and, therefore, reflects deeper properties of the equation.
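To make the comparison between the two growth rates concrete, the following sketch evaluates the right-hand side of [20] and the $C^k$-ball asymptotics $C R^n (1/\varepsilon)^{n/k}$ for sample values. The constants `C = 1.0` and the sample parameters are placeholders assumed only for illustration; the article does not give numerical values.

```python
import math

def attractor_entropy_bound(R, eps, n, C=1.0):
    """Right-hand side of estimate [20]: C * (R + log2(1/eps))^n * log2(1/eps)."""
    L = math.log2(1.0 / eps)
    return C * (R + L) ** n * L

def ck_ball_entropy(R, eps, n, k, C=1.0):
    """Asymptotics of the entropy of a C^k ball in C: C * R^n * (1/eps)^(n/k)."""
    return C * R ** n * (1.0 / eps) ** (n / k)

# Sample values (assumed): one space dimension, smoothness k = 2, radius R = 10.
n, k, R = 1, 2, 10.0
comparison = [
    (eps, attractor_entropy_bound(R, eps, n), ck_ball_entropy(R, eps, n, k))
    for eps in (1e-3, 1e-6, 1e-12)
]
```

As $\varepsilon \to 0$ the first quantity grows only like a power of $\log_2(1/\varepsilon)$, whereas the second grows like a power of $1/\varepsilon$, which is the sense in which [20] is better than any bound based on smoothness alone.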
Spatial Dynamics and Spatial Chaos
The next main difference with bounded domains is the existence of unbounded spatial directions which can generate the so-called spatial chaos (in addition to the “usual” temporal chaos arising under the evolution). In order to describe this phenomenon, it is natural to consider the group $\{T_h, h \in \mathbb R^n\}$ of spatial translations acting on the attractor $\mathcal A$:
$$(T_h u_0)(x) := u_0(x + h), \quad T_h : \mathcal A \to \mathcal A \qquad [21]$$
as a DS (with multidimensional “times” if $n > 1$) acting on the phase space $\mathcal A$ and to study its dynamical properties. In particular, it is worth noting that the lower bounds on the $\varepsilon$-entropy that one can derive imply that the topological entropy of this spatial DS is infinite and, consequently, the classical symbolic dynamics with a finite number of symbols is not adequate to clarify the nature of chaos in [21]. In order to overcome this difficulty, it was suggested by Zelik to use Bernoulli shifts with an infinite number of symbols, belonging to the whole interval $\omega \in [0, 1]$. To be more precise, let us consider the Cartesian product $M_n := [0, 1]^{\mathbb Z^n}$ endowed with the Tikhonov topology. Then, this set can be interpreted as the space of all the functions $v : \mathbb Z^n \to [0, 1]$, endowed with the standard local topology. We define a DS $\{\mathcal T^l, l \in \mathbb Z^n\}$ on $M_n$ by
$$(\mathcal T^l v)(m) := v(m + l), \quad v \in M_n, \quad l, m \in \mathbb Z^n \qquad [22]$$
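The shift [22] can be sketched in a few lines by representing an element $v$ of $[0,1]^{\mathbb Z^n}$ as a function on integer tuples. The names `shift` and `example_config` and the particular configuration are hypothetical, chosen only to illustrate the definition and the group property $\mathcal T^l \mathcal T^k = \mathcal T^{l+k}$.

```python
from typing import Callable, Tuple

Index = Tuple[int, ...]            # a point m of Z^n
Config = Callable[[Index], float]  # a function v : Z^n -> [0, 1]

def shift(l: Index, v: Config) -> Config:
    """Return the shifted configuration T^l v, defined by (T^l v)(m) = v(m + l)."""
    def shifted(m: Index) -> float:
        return v(tuple(mi + li for mi, li in zip(m, l)))
    return shifted

def example_config(m: Index) -> float:
    # Hypothetical element of [0, 1]^{Z^2}, used only for illustration.
    return ((3 * m[0] + 5 * m[1]) % 7) / 7.0

# Group property: applying T^(3,-1) and then T^(1,2) equals applying T^(4,1).
composed = shift((1, 2), shift((3, -1), example_config))
direct = shift((4, 1), example_config)
same_at_origin = composed((0, 0)) == direct((0, 0))
```

Representing configurations as functions rather than finite arrays keeps the index set genuinely infinite, which matches the definition of $M_n$ as a full Cartesian product.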
Based on this model, the following description of spatial chaos was obtained.
Theorem 4 Let eqn [8] in $\Omega = \mathbb R^n$ possess at least one exponentially unstable spatially homogeneous equilibrium. Then, there exist $\lambda > 0$ and a homeomorphic embedding $\tau : M_n \to \mathcal A$ such that
$$T_{\lambda l} \circ \tau(v) = \tau \circ \mathcal T^l(v), \quad \forall l \in \mathbb Z^n, \ v \in M_n \qquad [23]$$
Thus, the spatial dynamics, restricted to the set $\tau(M_n)$, is conjugate to the symbolic dynamics on $M_n$. Moreover, there exists a dynamical invariant (the so-called mean topological dimension) which is always finite for the spatial DS [21] and strictly positive for the Bernoulli scheme $M_n$. So, the embedding [23] does clarify the nature of chaos arising in the spatial DS [21].

Spatio-Temporal Chaos
To conclude, we briefly discuss an extension of Theorem 4, which takes into account the temporal modes and, thus, gives a description of the spatiotemporal chaos. In order to do so, we first note
that the spatial DS [21] obviously commutes with the temporal evolution operators $S(t)$ and, consequently, an extended $(n + 1)$-parametric semigroup $\{S(t, h), (t, h) \in \mathbb R_+ \times \mathbb R^n\}$ acts on the attractor:
$$S(t, h) := S(t) \circ T_h, \quad t \in \mathbb R_+, \ h \in \mathbb R^n, \quad S(t, h) : \mathcal A \to \mathcal A \qquad [24]$$
Then, this semigroup (interpreted as a DS with multidimensional times) is responsible for all the spatio-temporal dynamical phenomena in the initial PDE [8] and, consequently, the question of finding adequate dynamical characteristics is of great interest. Moreover, it is also natural to consider the subsemigroups $S_{V_k}(t, h)$ associated with the $k$-dimensional planes $V_k$ of the spacetime $\mathbb R_+ \times \mathbb R^n$, $k < n + 1$. Although finding an adequate description of the dynamics of [24] seems to be an extremely difficult task, some particular results in this direction have already been obtained. Thus, it has been proved by Zelik that the semigroup [24] has finite topological entropy and the entropy of its subsemigroups $S_{V_k}(t, h)$ is usually infinite if $k < n + 1$. Moreover (adding a natural transport term of the form $(L, \nabla_x)u$ to eqn [8]), it was proved that the analog of Theorem 4 holds for the subsemigroups $S_{V_n}(t, h)$ associated with the $n$-dimensional hyperplanes $V_n$ of the spacetime. Thus, the infinite-dimensional Bernoulli shifts introduced in the previous subsection can be used to describe the temporal evolution in unbounded domains as well. In particular, as a consequence of this embedding, the topological entropy of the initial purely temporal evolution semigroup $S(t)$ is also infinite, which indicates that (even without considering the spatial directions) we have here essentially new levels of dynamical complexity which are not observed in the classical DS theory of ODEs.

See also: Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Ergodic Theory; Evolution Equations: Linear and Nonlinear; Fractal Dimensions in Dynamics; Inviscid flows; Lyapunov Exponents and Strange Attractors.
Further Reading

Babin AV and Vishik MI (1992) Attractors of Evolution Equations. Amsterdam: North-Holland.
Chepyzhov VV and Vishik MI (2002) Attractors for Equations of Mathematical Physics. American Mathematical Society Colloquium Publications, vol. 49. Providence, RI: American Mathematical Society.
Faddeev LD and Takhtajan LA (1987) Hamiltonian Methods in the Theory of Solitons. Springer Series in Soviet Mathematics. Berlin: Springer.
Hale JK (1988) Asymptotic Behavior of Dissipative Systems. Mathematical Surveys and Monographs, vol. 25. Providence, RI: American Mathematical Society.
Katok A and Hasselblatt B (1995) Introduction to the Modern Theory of Dynamical Systems. Encyclopedia of Mathematics and its Applications, vol. 54. Cambridge: Cambridge University Press.
Ladyzhenskaya OA (1991) Attractors for Semigroups and Evolution Equations. Cambridge: Cambridge University Press.
Temam R (1997) Infinite-Dimensional Dynamical Systems in Mechanics and Physics. Applied Mathematical Sciences, vol. 68, 2nd edn. New York: Springer.
Zelik S (2004) Multiparametrical semigroups and attractors of reaction–diffusion equations in $\mathbb R^n$. Proceedings of the Moscow Mathematical Society 65: 69–130.
Donaldson Invariants see Gauge Theoretic Invariants of 4-Manifolds
Donaldson–Witten Theory
M Mariño, CERN, Geneva, Switzerland
© 2006 Elsevier Ltd. All rights reserved.
Introduction

Since they were introduced by Witten in 1988, topological quantum field theories (TQFTs) have had a tremendous impact in mathematical physics (see Birmingham et al. (1991) and Cordes et al. for a review). These quantum field theories are constructed in such a way that the correlation functions of certain operators provide topological invariants of the spacetime manifold where the theory is defined. This means that one can use the methods and insights of quantum field theory in order to obtain information about topological invariants of low-dimensional manifolds. Historically, the first TQFT was Donaldson–Witten theory, also called topological Yang–Mills theory. This theory was constructed by Witten (1988) starting from $N = 2$ super Yang–Mills by a procedure called “topological twisting.” The resulting model is topological and the famous Donaldson invariants of 4-manifolds are then recovered as certain correlation functions in the topological theory. The analysis of Witten (1988) did not indicate any new method to compute the invariants, but in 1994 the progress in understanding the nonperturbative dynamics of $N = 2$ theories (Seiberg and Witten 1994a, b) led to an alternative way of computing correlation functions in Donaldson–Witten theory. As Witten (1994) showed, Donaldson–Witten theory can be reduced to another, simpler topological theory consisting of a twisted abelian gauge theory coupled to spinor fields. This theory leads to a different set of 4-manifold invariants, the so-called “Seiberg–Witten invariants,” and Donaldson invariants can be expressed in terms of these invariants through Witten’s “magic formula.” The connection between Seiberg–Witten and Donaldson invariants was streamlined and extended by Moore and Witten by using the so-called u-plane integral (Moore and Witten 1998). This has led to a rather complete understanding of Donaldson–Witten theory from a physical point of view.
In this article we provide a brief review of Donaldson–Witten theory. First, we describe the construction of the model, from both a mathematical and a physical point of view, and state the main results for the Donaldson–Witten generating functional. In the next section, we present the basic results of the u-plane integral of Moore and Witten and sketch how it can be used to solve Donaldson–Witten theory. In the final section, we mention some generalizations of the basic framework. For a complete exposition of Donaldson–Witten theory, the reader is referred to the book by Labastida and Mariño (2005). A short review of the u-plane integral can be found in Mariño and Moore (1998a).
Donaldson–Witten Theory: Basic Construction and Results

Donaldson–Witten Theory According to Donaldson

Donaldson theory as formulated in Donaldson (1990), Donaldson and Kronheimer (1990), and Friedman and Morgan (1991) starts with a principal $G = SO(3)$ bundle $V \to X$ over a compact, oriented, Riemannian 4-manifold $X$, with fixed instanton number $k$ and Stiefel–Whitney class $w_2(V)$ ($SO(3)$ bundles on a 4-manifold are classified up to isomorphism by these topological data). The moduli space of anti-self-dual (ASD) connections is then defined as
$$\mathcal M_{ASD} = \{A : F^+(A) = 0\}/\mathcal G \qquad [1]$$
where $F^+(A)$ is the self-dual part of the curvature, and $\mathcal G$ is the group of gauge transformations. To construct the Donaldson polynomials, one considers the universal bundle
$$\mathbb P = (V \times \mathcal A^*)/(\mathcal G \times G) \qquad [2]$$
where $\mathcal A^*$ is the space of irreducible $G$-connections on $V$. This is a $G$-bundle over $\mathcal B^* \times X$, where $\mathcal B^* = \mathcal A^*/\mathcal G$ is the space of irreducible connections modulo gauge transformations, and as such has a Pontrjagin class
$$p_1(\mathbb P) \in H^*(\mathcal B^*) \otimes H^*(X) \qquad [3]$$
One can then obtain differential forms on $\mathcal B^*$ by taking the slant product of $p_1(\mathbb P)$ with homology classes in $X$. In this way we obtain the Donaldson map:
$$\mu : H_i(X) \to H^{4-i}(\mathcal B^*) \qquad [4]$$
After restriction to $\mathcal M_{ASD}$, we obtain the following differential forms on the moduli space of ASD connections:
$$x \in H_0(X) \to \mathcal O(x) \in H^4(\mathcal M_{ASD}), \qquad S \in H_2(X) \to I_2(S) \in H^2(\mathcal M_{ASD}) \qquad [5]$$
If the manifold $X$ has $b_1(X) \ne 0$, there are also cohomology classes associated to 1-cycles and 3-cycles, but we will not consider them here. We can now formally define the Donaldson invariants as follows. Consider the space
$$\mathbf A(X) = \mathrm{Sym}(H_0(X) \oplus H_2(X)) \qquad [6]$$
with a typical element written as $x^\ell S_{i_1} \cdots S_{i_p}$. The Donaldson invariant corresponding to this element of $\mathbf A(X)$ is the following intersection number:
$$\mathcal D_X^{w_2(V),k}(x^\ell S_{i_1} \cdots S_{i_p}) = \int_{\mathcal M_{ASD}} \mathcal O^\ell \wedge I_2(S_{i_1}) \wedge \cdots \wedge I_2(S_{i_p}) \qquad [7]$$
where $\mathcal M_{ASD}$ is the moduli space of ASD connections with second Stiefel–Whitney class $w_2(V)$ and instanton number $k$. The integral in [7] will be different from zero only if the degrees of the forms add up to $\dim(\mathcal M_{ASD})$. It is very convenient to pack all Donaldson invariants in a generating functional. Let $\{S_i\}_{i = 1, \dots, b_2}$ be a basis of 2-cycles. We introduce the formal sum
$$S = \sum_{i=1}^{b_2} v_i S_i \qquad [8]$$
where $v_i$ are complex numbers. We then define the Donaldson–Witten generating functional as
$$Z_{DW}^{w_2(V)}(p, v_i) = \sum_{k=0}^{\infty} \mathcal D_X^{w_2(V),k}(e^{px + S}) \qquad [9]$$
where on the right-hand side we are summing over all instanton numbers, that is, we are summing over all topological configurations of the $SO(3)$ gauge field with a fixed $w_2(V)$. This gives a formal power series in $p$ and $v_i$. For $b_2^+(X) > 1$, the generating functional [9] is a diffeomorphism invariant of $X$; therefore, it is potentially a powerful tool in four-dimensional topology. When $b_2^+(X) = 1$, Donaldson invariants are metric dependent. The metric dependence can be described in more detail as follows. Define the period point $\omega$ as the harmonic 2-form satisfying
$$*\omega = \omega, \qquad \omega^2 = 1 \qquad [10]$$
which depends on the conformal class of the metric. As the conformal class of the metric varies, $\omega$ describes a curve in the cone
$$V_+ = \{\omega \in H^2(X, \mathbb R) : \omega^2 > 0\} \qquad [11]$$
Let $\zeta \in H^2(X)$ satisfy
$$\zeta \equiv w_2(V) \bmod 2, \qquad \zeta^2 < 0, \qquad (\zeta, \omega) = 0 \qquad [12]$$
Such an element defines a “wall” in $V_+$:
$$W_\zeta = \{\omega : (\zeta, \omega) = 0\} \qquad [13]$$
The complements of these walls are called “chambers,” and the cone $V_+$ is then divided in chambers separated by walls. A class $\zeta$ satisfying [12] is the first Chern class associated to a reducible solution of the ASD equations, and it causes a singularity in moduli space: the Donaldson invariants jump when we pass through such a wall. Therefore, when $b_2^+(X) = 1$, Donaldson invariants are metric independent in each chamber. A basic problem in Donaldson–Witten theory is to determine the jump in the generating function as we cross a wall,
$$Z_+^\zeta(p, S) - Z_-^\zeta(p, S) = WC_\zeta(p, S) \qquad [14]$$
The jump term $WC_\zeta(p, S)$ is usually called the “wall-crossing” term. The basic goal of Donaldson theory is to study the properties of the generating functional [9] and to compute it for different 4-manifolds $X$. On the mathematical side, many results have been obtained on $Z_{DW}$, and some of them can be found in Donaldson and Kronheimer (1990), Friedman and Morgan (1991), Stern (1998), and Göttsche (1996). On the other hand, Donaldson theory can be formulated as a topological field theory, and many of these results can be obtained by using quantum field theory techniques. This will be our main focus for the rest of the article.

Donaldson–Witten Theory According to Witten

Witten (1988) constructed a twisted version of $N = 2$ super Yang–Mills theory which has a nilpotent Becchi–Rouet–Stora–Tyutin (BRST) charge (modulo gauge transformations)
$$\overline{\mathcal Q} = \epsilon^{\dot\alpha A}\, \overline Q_{\dot\alpha A} \qquad [15]$$
where $\overline Q_{\dot\alpha A}$ are the supersymmetric (SUSY) charges. Here $\dot\alpha$ is a chiral spinor index and $A$ has its origin in the $SU(2)$ R-symmetry. The field content of the theory is the standard twisted $N = 2$ vector multiplet:
$$A_\mu, \quad \psi_\mu = \psi_{\alpha\dot\alpha}, \quad \phi; \qquad D^+_{\mu\nu}, \quad \chi^+_{\mu\nu} = \overline\psi_{\dot\alpha\dot\beta}; \qquad \overline\phi, \quad \eta = \overline\psi^{\dot\alpha}{}_{\dot\alpha} \qquad [16]$$
where $(1/2) D^+_{\mu\nu}\, dx^\mu dx^\nu$ is a self-dual 2-form derived from the auxiliary fields, etc. All fields are valued in the adjoint representation of the gauge group. After twisting, the theory is well defined on any Riemannian 4-manifold, since the fields are naturally interpreted as differential forms and the $\overline{\mathcal Q}$ charge is a scalar (Witten 1988). The observables of the theory are $\overline{\mathcal Q}$ cohomology classes of operators, and they can be constructed from 0-form observables $\mathcal O^{(0)}$ using the descent procedure. This amounts to solving the equations
$$d\mathcal O^{(i)} = \{\overline{\mathcal Q}, \mathcal O^{(i+1)}\}, \qquad i = 0, \dots, 3 \qquad [17]$$
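The integration over $i$-cycles $\gamma^{(i)}$ in $X$ of the operators $\mathcal O^{(i)}$ is then an observable. These descent equations have a canonical solution: the 1-form-valued operator $K_{\alpha\dot\alpha} = -i\,\delta^A_\alpha\, \overline Q_{\dot\alpha A}/4$ verifies
$$d = \{\overline{\mathcal Q}, K\} \qquad [18]$$
as a consequence of the supersymmetry algebra. The operators $\mathcal O^{(i)} = K^i \mathcal O^{(0)}$ solve the descent equations [17] and are canonical representatives. When the gauge group is $SU(2)$, the observables are obtained by the descent procedure from the operator
$$\mathcal O = \mathrm{tr}(\phi^2) \qquad [19]$$
The topological descendant $\mathcal O^{(2)}$ is given by
$$\mathcal O^{(2)} = -\frac{1}{2}\, \mathrm{tr}\Big( \frac{1}{\sqrt 2}\, \phi\, (F^-_{\mu\nu} + D^+_{\mu\nu}) - \frac{1}{4}\, \psi_\mu \psi_\nu \Big)\, dx^\mu \wedge dx^\nu \qquad [20]$$
and the resulting observable is
$$I_2(S) = \int_S \mathcal O^{(2)} \qquad [21]$$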
$\mathcal O$ and $I_2(S)$ correspond to the cohomology classes in [5]. One of the main results of Witten (1988) is that the semiclassical approximation in the twisted $N = 2$ Yang–Mills theory is exact. The semiclassical evaluation of correlation functions of the observables above leads directly to the definition of Donaldson invariants, and the generating functional [9] can be written as a correlation function of the twisted theory. One then has
$$Z_{DW}^{w_2(V)}(p, S) = \big\langle \exp\big(p\,\mathcal O + I_2(S)\big) \big\rangle \qquad [22]$$

Results for the Donaldson–Witten Generating Function

The basic results that have emerged from the physical approach to Donaldson–Witten theory are the following.
1. The Donaldson–Witten generating functional is in general the sum of two terms,
$$Z_{DW} = Z_u + Z_{SW} \qquad [23]$$
(We have omitted the Stiefel–Whitney class for convenience.) The first term, $Z_u$, is called the “u-plane integral.” It is given by a complicated integral over $\mathbb C$ which can be written, in turn, as an integral over a fundamental domain of the congruence subgroup $\Gamma^0(4)$ of $SL(2, \mathbb Z)$. $Z_u$ depends only on the cohomology ring of $X$, and therefore does not contain any information beyond that provided by classical topology. Finally, $Z_u$ vanishes if $b_2^+(X) > 1$, and it is responsible for the wall-crossing behavior of $Z_{DW}$ when $b_2^+(X) = 1$.
2. The second term of [23], $Z_{SW}$, is called the Seiberg–Witten contribution. This contribution involves the Seiberg–Witten invariants of $X$, which are obtained by considering the moduli problem defined by the Seiberg–Witten monopole equations (Witten 1994b):
$$F^+_{\dot\alpha\dot\beta} + 4i\, \overline M_{(\dot\alpha} M_{\dot\beta)} = 0, \qquad D_L^{\alpha\dot\alpha} M_{\dot\alpha} = 0 \qquad [24]$$
In these equations, $M_{\dot\alpha}$ is a section of the spinor bundle $S^+ \otimes L^{1/2}$, $L$ is the determinant line bundle of a Spin$^c$ structure on $X$, $F^+_{\dot\alpha\dot\beta} = \sigma^{\mu\nu}_{\dot\alpha\dot\beta} F^+_{\mu\nu}$ is the self-dual part of the curvature of a $U(1)$ connection on $L$, and $D_L$ is the Dirac operator for the bundle $S^+ \otimes L^{1/2}$. The solutions of these equations modulo gauge equivalence form the moduli space $\mathcal M_{SW}$, and the Seiberg–Witten invariants are defined by integrating suitable differential forms on this moduli space. We will label Spin$^c$ structures by the class $\lambda = c_1(L^{1/2}) \in H^2(X, \mathbb Z) + w_2(X)/2$. We say that $\lambda$ is a Seiberg–Witten basic class if the corresponding Seiberg–Witten invariants are not all zero. If $\mathcal M_{SW}$ is zero dimensional, the Seiberg–Witten invariant depends only on the Spin$^c$ structure associated to $\lambda = c_1(L^{1/2})$, and is denoted by $SW(\lambda)$.
3. A manifold $X$ is said to be of Seiberg–Witten simple type if all the Seiberg–Witten basic classes have a zero-dimensional moduli space. For simply connected 4-manifolds of Seiberg–Witten simple type and with $b_2^+(X) > 1$, Witten determined the Seiberg–Witten contribution and proposed the following “magic formula” for $Z_{DW}$ (Witten 1994b):
$$Z_{DW} = 2^{1 + 7\chi/4 + 11\sigma/4} \sum_\lambda e^{2\pi i(\lambda_0^2 + \lambda_0 \cdot \lambda)} \Big[ e^{2p + S^2/2}\, e^{2(S, \lambda)} + i^{\chi_h - w_2(V)^2}\, e^{-2p - S^2/2}\, e^{-2i(S, \lambda)} \Big]\, SW(\lambda) \qquad [25]$$
In this equation, $\chi$, $\sigma$ are the Euler characteristic and signature of $X$, respectively, $\chi_h = (\chi + \sigma)/4$ is the holomorphic Euler characteristic of $X$, and $\lambda_0$ is an integer lifting of $w_2(V)$. This formula generalizes previous results by Witten (1994a) for Kähler manifolds. It also follows from this formula that the Donaldson–Witten generating function of simply connected 4-manifolds of Seiberg–Witten simple type and with $b_2^+(X) > 1$ satisfies
$$\Big( \frac{\partial^2}{\partial p^2} - 4 \Big) Z_{DW} = 0$$
which is the Donaldson simple type condition introduced by Kronheimer and Mrowka (1994).
4. Using the u-plane integral, one can find explicit expressions for $Z_{DW}$ in more general situations (like non-simply-connected manifolds or manifolds which are not of Seiberg–Witten simple type).
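The simple type condition in item 3 can be made plausible with a short numerical check. In the magic formula the variable $p$ enters only through the factors $e^{2p}$ and $e^{-2p}$, so any function of the form $a\,e^{2p} + b\,e^{-2p}$ should be annihilated by $\partial^2/\partial p^2 - 4$. The sketch below tests this with a central finite difference; the coefficients `a`, `b`, the step `h`, and the function names are assumptions made for illustration and stand in for the $p$-independent factors of the formula.

```python
import math

def z_simple_type(p, a, b):
    """Model p-dependence a*exp(2p) + b*exp(-2p); a and b are placeholder coefficients."""
    return a * math.exp(2.0 * p) + b * math.exp(-2.0 * p)

def second_derivative(f, p, h=1e-4):
    """Central finite-difference approximation of f''(p)."""
    return (f(p + h) - 2.0 * f(p) + f(p - h)) / h ** 2

def simple_type_residual(p, a=1.3, b=-0.7):
    """Approximate value of (d^2/dp^2 - 4) applied to the model function at p."""
    f = lambda q: z_simple_type(q, a, b)
    return second_derivative(f, p) - 4.0 * f(p)

# The residual should vanish up to discretization and rounding error.
residuals = [simple_type_residual(p) for p in (-0.5, 0.0, 0.3)]
```

The check only concerns the $p$-dependence; it says nothing about the Seiberg–Witten invariants themselves, which enter the formula as $p$-independent coefficients.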
In the next section we explain the formalism of the u-plane integral introduced by Moore and Witten (1998), which makes possible a detailed derivation of the above results.
The u-Plane Integral

Definition of the u-Plane Integral

The evaluation of the Donaldson–Witten generating function can be made by using the results of Seiberg and Witten (1994a, b) on the low-energy dynamics of $SU(2)$, $N = 2$ Yang–Mills theory. In their work, Seiberg and Witten determined the exact low-energy effective action of the model up to two derivatives.
From a physical point of view, there are certainly corrections to this effective action which are difficult to evaluate. Fortunately, the computation in the twisted version of the theory can be done by just considering the Seiberg–Witten effective action. This is because the correlation functions in the twisted theory are invariant under rescalings of the metric, so we can evaluate them in the limit of large distances or equivalently of very low energies. The effective action up to two derivatives is sufficient for that purpose. One way of describing the main result of the work of Seiberg and Witten is that the moduli space of Q-fixed points of the twisted SO(3) N = 2 theory on a compact 4-manifold has two branches, which we refer to as the Coulomb and Seiberg–Witten branches. On the Coulomb branch the expectation value
simply by twisting the physical theory. It can be written as o i 4 1 n 00 K F ðaÞ þ Q; F ðD þ Fþ Þ 6 pffiffiffi 16 pffiffiffi o n i 2 2i 0 Q; F d 5 32 3 n opffiffiffi 2 000 Q; F gd 4 x ~ þ AðuÞtrR ^ R þ BðuÞtrR ^ R
where A(u), B(u) describe the coupling to gravity, and after integration of the corresponding differential forms we obtain terms proportional to the signature and Euler characteristic of X. The data of the low-energy effective action can be encoded in an elliptic curve of the form y2 ¼ x3 ux2 þ 14 x
2
u¼
htr i 162
breaks $SO(3) \to U(1)$ via the standard Higgs mechanism. The Coulomb branch is simply a copy of the complex u-plane, and the low-energy effective theory on this branch is the abelian N = 2 gauge theory. However, at the two points $u = \pm 1$ there is a singularity where the moduli space meets a second branch, the Seiberg–Witten branch. At these points, the effective action is given by the magnetic dual of the U(1), N = 2 gauge theory coupled to a monopole matter hypermultiplet. Therefore, this branch consists of solutions to the Seiberg–Witten equations [24]. Since the manifold X is compact, the partition function of the twisted theory is a sum over "all" vacuum states. Equation [23] then follows. In this equation, $Z_u$ comes from "integrating over the u-plane," while $Z_{SW}$ corresponds to the points $u = \pm 1$. As we stated before, $Z_u$ vanishes for manifolds with $b_2^+(X) > 1$, but once this piece has been determined, an argument originally presented in Moore and Witten (1998) allows one to derive the form of $Z_{SW}$ as well, for arbitrary $b_2^+(X) \ge 1$. The computation of $Z_u$ is presented in detail in Moore and Witten (1998). The starting point of the computation is the untwisted low-energy theory, which has been described in detail in Seiberg and Witten (1994a, b) and Witten (1995). It is an N = 2 theory characterized by a prepotential $\mathcal{F}$ which depends on an N = 2 vector multiplet. The effective gauge coupling is given by $\tau(a) = \mathcal{F}''(a)$, where $a$ is the scalar component of the vector multiplet. The Euclidean Lagrange density for the u-plane theory can be obtained
[27]
and $\tau$ is the modulus of the curve. The monodromy group of this curve is $\Gamma^0(4)$. All the quantities involved in the action can be obtained by integrating a certain meromorphic differential on the curve, and they can be expressed in terms of modular forms. As for the operators, we have $u = O(P)$ by definition. We may then obtain the 2-observables from the descent procedure. The result is that $I(S) \to \tilde{I}(S) = \int_S K^2 u = \int_S \frac{du}{da}(D_+ + F_-) + \cdots$. Here $D_+$ is the auxiliary field. Although one has $I(S) \to \tilde{I}(S)$ in going from the microscopic theory to the effective theory, it does not necessarily follow that $I(S_1) I(S_2) \to \tilde{I}(S_1)\tilde{I}(S_2)$, because there can be contact terms. If $S_1$ and $S_2$ intersect, then in passing to the low-energy theory we integrate out massive modes. This can induce delta-function corrections to the operator product expansion, modifying the mapping to the low-energy theory as follows:
$$\exp(I(S)) \to \exp\left(\tilde{I}(S) + S^2 T(u)\right) \qquad [28]$$
where T(u) is the contact term. Such contact terms were observed in Witten (1994a) and studied in detail in Losev et al. (1998). It can be shown that
[26]
$$T(u) = -\frac{1}{24} E_2(\tau)\left(\frac{du}{da}\right)^2 + \frac{1}{3} u \qquad [29]$$
where $E_2(\tau)$ is Eisenstein's series and $da/du$ is one of the periods of the elliptic curve [27]. The final result of Moore and Witten is the following expression:
$$Z_u(p, S) = \int_{\mathbb{C}} \frac{du\, d\bar{u}}{y^{1/2}}\, \mu(\tau)\, e^{2pu + S^2 \hat{T}(u)}\, \Psi \qquad [30]$$
Here,
$$\mu(\tau) = \frac{d\bar{\tau}}{d\bar{u}} \left(\frac{da}{du}\right)^{1 - \frac{1}{2}\chi} \Delta^{\sigma/8}, \qquad \hat{T}(u) = T(u) + \frac{(du/da)^2}{8\pi y} \qquad [31]$$
where $y = \mathrm{Im}\,\tau$ and $\Delta$ is the discriminant of the curve [27]. The quantity $\Psi$ is essentially a Narain–Siegel theta function associated to the lattice $H^2(X, \mathbb{Z})$. Notice that this lattice is Lorentzian and has signature $(1, (-1)^{b_2^-(X)})$ (since $b_2^+(X) = 1$). The self-dual projection of a 2-form $\lambda$ can be done with the period point $\omega$ as $\lambda_+ = (\lambda, \omega)\omega$. The lattice is shifted by half the second Stiefel–Whitney class of the bundle, $w_2(V)$, that is, $\Gamma = H^2(X, \mathbb{Z}) + \frac{1}{2} w_2(V)$, and
$$\Psi = \exp\left[-\frac{1}{8\pi y}\left(\frac{du}{da}\right)^2 S_-^2\right] e^{2\pi i \lambda_0^2} \sum_{\lambda \in \Gamma} (-1)^{(\lambda - \lambda_0)\cdot w_2(X)} \left[(\lambda, \omega) + \frac{i}{4\pi y}\frac{du}{da}(S, \omega)\right] \exp\left[-i\pi\bar{\tau}(\lambda_+)^2 - i\pi\tau(\lambda_-)^2 - i\frac{du}{da}(S, \lambda_-)\right] \qquad [32]$$
Here, $w_2(X)$ is the second Stiefel–Whitney class of X, and $\lambda_0$ is a choice of lifting of $w_2(V)$ to $H^2(X, \mathbb{Z})$. This expression can be extended to the non-simply-connected case (see Mariño and Moore (1999) and Moore and Witten (1998)). The study of the u-plane integral leads to a systematic derivation of many important results in Donaldson–Witten theory. We will discuss in detail two such applications, Göttsche's wall-crossing formula and Witten's "magic formula."

Wall-Crossing Formula
As shown by Moore and Witten, the u-plane integral is well defined and does not depend on the period point (hence on the metric on X) except for discontinuous behavior at walls. There are two kinds of walls, associated, respectively, to the singularities at $u = \infty$ (the semiclassical region of the underlying Yang–Mills theory) and at $u = \pm 1$, given by
$$u = \infty: \quad \lambda_+ = 0, \quad \lambda \in H^2(X, \mathbb{Z}) + \tfrac{1}{2} w_2(V)$$
$$u = \pm 1: \quad \lambda_+ = 0, \quad \lambda \in H^2(X, \mathbb{Z}) + \tfrac{1}{2} w_2(X) \qquad [33]$$
The first type of walls is precisely the one that appears in Donaldson theory on manifolds of $b_2^+(X) = 1$. The discontinuity of the u-plane integral at these walls can be easily computed from eqn [33]:
$$WC_{\xi = 2\lambda}(p, S) = -\frac{i}{2} (-1)^{(\lambda - \lambda_0,\, w_2(X))} e^{2\pi i \lambda_0^2} \left[ q^{-\lambda^2/2}\, h_\infty(\tau)^{-2}\, \vartheta_4^{\sigma}\, f_{2\infty}^{-1} \exp\left\{ 2p u_\infty + S^2 T_\infty - i(\lambda, S)/h_\infty \right\} \right]_{q^0} \qquad [34]$$
This expression involves the modular forms $h_\infty$, $f_{2\infty}$, $u_\infty$, and $T_\infty$ (the subscript $\infty$ refers to the fact that they are computed in the "electric" frame, which is appropriate for the Seiberg–Witten curve at $u \to \infty$). They can be written in terms of Jacobi theta functions $\vartheta_i(q)$, with $q = e^{2\pi i\tau}$, and their explicit expressions are
$$h_\infty(q) = \tfrac{1}{2}\vartheta_2(q)\vartheta_3(q), \qquad f_{2\infty}(q) = \frac{\vartheta_2(q)\vartheta_3(q)}{2\vartheta_4^8(q)}$$
$$u_\infty(q) = \frac{\vartheta_2^4(q) + \vartheta_3^4(q)}{2(\vartheta_2(q)\vartheta_3(q))^2}, \qquad T_\infty(q) = -\frac{1}{24}\frac{E_2(q)}{h_\infty^2(q)} + \frac{1}{3} u_\infty(q) \qquad [35]$$
The subindex $q^0$ means that in the expansion in $q$ of the modular forms, we pick the constant term. The formula [34] agrees with the formula of Göttsche (1996) for the wall crossing of the Donaldson–Witten generating functional.
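The theta-function expressions for the u-plane coordinate can be checked numerically. The following is a minimal sketch, not part of the original article. It assumes the convention $\vartheta_2(q) = \sum_n q^{(n+1/2)^2/2}$, $\vartheta_3(q) = \sum_n q^{n^2/2}$, $\vartheta_4(q) = \sum_n (-1)^n q^{n^2/2}$ with real $0 < q < 1$, and the function names are chosen here for illustration. Under that assumption it evaluates the electric-frame combination $(\vartheta_2^4 + \vartheta_3^4)/2(\vartheta_2\vartheta_3)^2$ and the analogous magnetic-frame combination $(\vartheta_3^4 + \vartheta_4^4)/2(\vartheta_3\vartheta_4)^2$, which should approach $u = \infty$ and $u = 1$ respectively as the argument tends to zero.

```python
def theta2(q, nmax=40):
    """theta_2(q) = sum over n of q^((n + 1/2)^2 / 2), for real 0 < q < 1 (assumed convention)."""
    return sum(q ** ((n + 0.5) ** 2 / 2) for n in range(-nmax, nmax))


def theta3(q, nmax=40):
    """theta_3(q) = sum over n of q^(n^2 / 2)."""
    return sum(q ** (n * n / 2) for n in range(-nmax, nmax + 1))


def theta4(q, nmax=40):
    """theta_4(q) = sum over n of (-1)^n q^(n^2 / 2)."""
    return sum((-1) ** n * q ** (n * n / 2) for n in range(-nmax, nmax + 1))


def u_electric(q):
    """(theta_2^4 + theta_3^4) / (2 (theta_2 theta_3)^2): u in the frame at infinity."""
    t2, t3 = theta2(q), theta3(q)
    return (t2 ** 4 + t3 ** 4) / (2 * (t2 * t3) ** 2)


def u_monopole(q_dual):
    """(theta_3^4 + theta_4^4) / (2 (theta_3 theta_4)^2): u in the monopole frame."""
    t3, t4 = theta3(q_dual), theta4(q_dual)
    return (t3 ** 4 + t4 ** 4) / (2 * (t3 * t4) ** 2)


if __name__ == "__main__":
    for q in (1e-2, 1e-4, 1e-8):
        # The electric-frame value grows like 1 / (8 q^(1/4)); the monopole-frame value tends to 1.
        print(q, u_electric(q), u_monopole(q))
```

In this sketch the electric-frame value diverges as $q \to 0$, consistent with the semiclassical region sitting at $u \to \infty$, while the magnetic-frame value tends to the monopole point $u = 1$.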
The Seiberg–Witten Contribution and Witten’s Magic Formula
At $u = \pm 1$, $Z_u$ jumps at the second type of walls [33], which are called Seiberg–Witten (SW) walls. In fact, these walls are labeled by classes $\lambda \in H^2(X; \mathbb{Z}) + \frac{1}{2} w_2(X)$, which correspond to Spin$^c$ structures on X. At these walls, the Seiberg–Witten invariants have wall-crossing behavior. Since the Donaldson polynomials do not jump at SW walls, the change of $Z_u$ at $u = \pm 1$ must be canceled by the change of $Z_{SW}$. As shown by Moore and Witten, this actually allows one to obtain a precise expression for $Z_{SW}$ for general 4-manifolds of $b_2^+(X) \ge 1$. On general grounds, $Z_{SW}$ is given by the sum of the generating functionals at $u = \pm 1$. These involve a magnetic U(1), N = 2 vector multiplet coupled to a hypermultiplet (the monopole field). The twisted Lagrangian for such a system involves the magnetic prepotential $\tilde{\mathcal{F}}_D(a_D)$, and it can be written as
$$\{Q, W\} + \frac{i}{16\pi}\tilde{\tau}_D\, F\wedge F + p(u)\,\mathrm{tr}\, R\wedge R^* + \ell(u)\,\mathrm{tr}\, R\wedge\tilde{R} - \frac{i\sqrt{2}}{32\pi}\frac{d\tilde{\tau}_D}{da_D}(\psi\wedge\psi)\wedge F + \frac{i}{3\cdot 2^7\pi}\frac{d^2\tilde{\tau}_D}{da_D^2}\,\psi\wedge\psi\wedge\psi\wedge\psi \qquad [36]$$
where $\tilde{\tau}_D = \tilde{\mathcal{F}}_D''(a_D)$. Using the cancellation of wall crossings, one can actually compute the functions $\tilde{\mathcal{F}}_D(a_D)$, $p(u)$, $\ell(u)$ and determine the precise form of the Seiberg–Witten contributions. One finds that a Spin$^c$ structure $\lambda$ at $u = 1$ gives the following contribution to the Donaldson–Witten generating functional:
$$Z^{SW}_{u=1,\lambda} = \frac{SW(\lambda)}{16}\, e^{2\pi i(\lambda_0^2 - \lambda_0\cdot\lambda)} \left[ q_D^{-\lambda^2/2}\, \frac{\vartheta_2^{8+\sigma}}{a_D h_M} \left(\frac{2i a_D}{h_M^2}\right)^{\chi_h} \exp\left\{ 2p u_M + i(\lambda, S)/h_M + S^2 T_M \right\} \right]_{q_D^0} \qquad [37]$$
Here, $a_D$, $h_M$, $u_M$, and $T_M$ are modular forms that can be expressed as well in terms of Jacobi theta functions $\vartheta_i(q_D)$, where $q_D = \exp(2\pi i\tau_D)$. The subscript M refers to the monopole point, and they are related by an S-transformation to the quantities obtained in the "electric" frame at $u \to \infty$. Their explicit expressions are
$$a_D(q_D) = -\frac{i}{6}\,\frac{2E_2(q_D) - \vartheta_3^4(q_D) - \vartheta_4^4(q_D)}{\vartheta_3(q_D)\vartheta_4(q_D)}, \qquad h_M(q_D) = \frac{1}{2i}\vartheta_3(q_D)\vartheta_4(q_D)$$
$$u_M(q_D) = \frac{1}{2}\,\frac{\vartheta_3^4(q_D) + \vartheta_4^4(q_D)}{(\vartheta_3(q_D)\vartheta_4(q_D))^2}, \qquad T_M(q_D) = -\frac{1}{24}\frac{E_2(q_D)}{h_M^2(q_D)} + \frac{1}{3} u_M(q_D) \qquad [38]$$
The contribution at $u = -1$ is related to the contribution at $u = 1$ by a $u \to -u$ symmetry:
$$Z_{u=-1}(p, S) = e^{-2\pi i\lambda_0^2}\, i^{(\chi+\sigma)/4}\, Z_{u=1}(-p, iS) \qquad [39]$$
If the manifold has $b_2^+(X) > 1$ and is of Seiberg–Witten simple type, [37] reduces to
$$(-1)^{\chi_h}\, 2^{1 + 7\chi/4 + 11\sigma/4}\, e^{2p + S^2/2}\, e^{2\pi i(\lambda_0^2 - \lambda_0\cdot\lambda)}\, e^{-2(S, \lambda)}\, SW(\lambda) \qquad [40]$$
This leads to Witten's "magic formula" [25], which expresses the Donaldson invariants in terms of Seiberg–Witten invariants.

Other Applications of the u-Plane Integral
The u-plane integral makes it possible to derive other results on the Donaldson–Witten generating functional.

The blow-up formula. This relates the function $Z_{DW}$ on X to $Z_{DW}$ on the blown-up manifold $\hat{X}$. The u-plane integral leads directly to the general blow-up formula of Fintushel and Stern (1996).

Direct evaluations. The u-plane integral can be evaluated directly in many cases, and this leads to explicit formulas for the Donaldson–Witten generating functional of certain 4-manifolds with $b_2^+(X) = 1$, on certain chambers, and in terms of modular forms. For example, there are explicit formulas for the Donaldson–Witten generating functional of product ruled surfaces of the form $S^2 \times \Sigma_g$ in the limiting chambers in which $S^2$ or $\Sigma_g$ are very small (Moore and Witten 1998, Mariño and Moore 1999). Moore and Witten (1998) have also derived an explicit formula for the Donaldson invariants of $\mathbb{CP}^2$ in terms of Hurwitz class numbers.

Extensions of Donaldson–Witten Theory
Donaldson–Witten theory is a twisted version of SU(2), N = 2 Yang–Mills theory. The twisting of more general N = 2 gauge theories, involving other gauge groups and/or matter content, leads to other topological field theories that give interesting generalizations of Donaldson–Witten theory. We now briefly list some of these extensions and their most important properties.

Higher-rank theories. The extension of Donaldson–Witten theory to other gauge groups has been studied in detail in Mariño and Moore (1998b) and Losev et al. (1998). One can study the higher-rank generalization of the u-plane integral, and as shown in Mariño and Moore (1998b), this leads to a fairly explicit formula for the Donaldson–Witten generating function in the SU(N) case, for manifolds with $b_2^+ > 1$ and of Seiberg–Witten simple type. Mathematically, higher-rank generalizations of Donaldson theory turn out to be much more complicated, but they can be studied. In particular, higher-rank generalizations of the Donaldson invariants can be defined and computed (Kronheimer 2004), and the results so far agree with the predictions of Mariño and Moore (1998b). Unfortunately, it seems that these higher-rank generalizations do not contain new topological information beyond that encoded in the Seiberg–Witten invariants.

Theories with matter. Twisted SU(2), N = 2 theories with hypermultiplets lead to generalizations of Donaldson–Witten theory involving nonabelian monopole equations (see Mariño (1997) and Labastida and Mariño (2005) for a review of these models and some of their properties). The u-plane integral leads to explicit formulas for the generating functionals of these theories, which for manifolds of $b_2^+ > 1$ can be written in terms of Seiberg–Witten invariants. Again, no new topological information seems to be encoded in these theories. One can, however, exploit new physical phenomena arising in the theories with hypermultiplets (in particular, the presence of superconformal points) to obtain new information about the Seiberg–Witten invariants (see Mariño et al. (1999) for these developments).

Vafa–Witten theory. The so-called Vafa–Witten theory is a close cousin of Donaldson–Witten theory, and was introduced by Vafa and Witten (1994) as a topological twist of N = 4 Yang–Mills theory. In some cases, the partition function of this theory counts the Euler characteristic of the moduli space of instantons on the 4-manifold X. For a review of some properties of this theory, see Lozano (1999).
See also: Duality in Topological Quantum Field Theory; Mathai–Quillen Formalism; Seiberg–Witten Theory; Topological Quantum Field Theory: Overview.
Further Reading
Birmingham D, Blau M, Rakowski M, and Thompson G (1991) Topological field theories. Physics Reports 209: 129.
Cordes S, Moore G, and Ramgoolam S (1994) Lectures on 2D Yang–Mills theory, equivariant cohomology and topological field theory (arXiv:hep-th/9411210).
Donaldson SK (1990) Polynomial invariants for smooth four-manifolds. Topology 29: 257.
Donaldson SK and Kronheimer PB (1990) The Geometry of Four-Manifolds. Oxford: Clarendon.
Fintushel R and Stern RJ (1996) The blowup formula for Donaldson invariants. Annals of Mathematics 143: 529 (arXiv:alg-geom/9405002).
Friedman R and Morgan JW (1991) Smooth Four-Manifolds and Complex Surfaces. New York: Springer.
Göttsche L (1996) Modular forms and Donaldson invariants for four-manifolds with $b_+ = 1$. Journal of the American Mathematical Society 9: 827 (arXiv:alg-geom/9506018).
Kronheimer P (2004) Four-manifold invariants from higher-rank bundles (arXiv:math.GT/0407518).
Kronheimer P and Mrowka T (1994) Recurrence relations and asymptotics for four-manifold invariants. Bulletin of the American Mathematical Society 30: 215.
Kronheimer P and Mrowka T (1995) Embedded surfaces and the structure of Donaldson's polynomial invariants. Journal of Differential Geometry 33: 573.
Labastida J and Mariño M (2005) Topological Quantum Field Theory and Four-Manifolds. New York: Springer.
Losev A, Nekrasov N, and Shatashvili S (1998) Issues in topological gauge theory. Nuclear Physics B 534: 549 (arXiv:hep-th/9711108).
Lozano C (1999) Duality in topological quantum field theories (arXiv:hep-th/9907123).
Mariño M (1997) The geometry of supersymmetric gauge theories in four dimensions (arXiv:hep-th/9701128).
Mariño M and Moore G (1998a) Integrating over the Coulomb branch in N = 2 gauge theory. Nuclear Physics Proceedings Supplement 68: 336 (arXiv:hep-th/9712062).
Mariño M and Moore GW (1998b) The Donaldson–Witten function for gauge groups of rank larger than one. Communications in Mathematical Physics 199: 25 (arXiv:hep-th/9802185).
Mariño M and Moore GW (1999) Donaldson invariants for non-simply connected manifolds. Communications in Mathematical Physics 203: 249 (arXiv:hep-th/9804104).
Mariño M, Moore GW, and Peradze G (1999) Superconformal invariance and the geography of four-manifolds. Communications in Mathematical Physics 205: 691 (arXiv:hep-th/9812055).
Moore G and Witten E (1998) Integration over the u-plane in Donaldson theory. Advances in Theoretical and Mathematical Physics 1: 298 (arXiv:hep-th/9709193).
Seiberg N and Witten E (1994a) Electric–magnetic duality, monopole condensation, and confinement in N = 2 supersymmetric Yang–Mills theory. Nuclear Physics B 426: 19.
Seiberg N and Witten E (1994b) Electric–magnetic duality, monopole condensation, and confinement in N = 2 supersymmetric Yang–Mills theory – erratum. Nuclear Physics B 430: 485 (arXiv:hep-th/9407087).
Stern R (1998) Computing Donaldson invariants. In: Gauge Theory and the Topology of Four-Manifolds, IAS/Park City Mathematical Series. Providence, RI: American Mathematical Society.
Vafa C and Witten E (1994) A strong coupling test of S duality. Nuclear Physics B 431: 3 (arXiv:hep-th/9408074).
Witten E (1988) Topological quantum field theory. Communications in Mathematical Physics 117: 353.
Witten E (1994a) Supersymmetric Yang–Mills theory on a four manifold. Journal of Mathematical Physics 35: 5101 (arXiv:hep-th/9403195).
Witten E (1994b) Monopoles and four manifolds. Mathematical Research Letters 1: 769 (arXiv:hep-th/9411102).
Witten E (1995) On S duality in Abelian gauge theory. Selecta Mathematica 1: 383 (arXiv:hep-th/9505186).
Duality in Topological Quantum Field Theory
C Lozano, INTA, Madrid, Spain
J M F Labastida, CSIC, Madrid, Spain
© 2006 Elsevier Ltd. All rights reserved.
Introduction
There have been many exciting interactions between physics and mathematics in the past few decades. A prominent role in these interactions has been played by certain field theories, known as topological quantum field theories (TQFTs). These are quantum field theories whose correlation functions are metric independent and, in fact, compute certain mathematical invariants (Birmingham et al. 1991, Cordes et al. 1996, Labastida and Lozano 1998). Well-known examples of TQFTs are, in two dimensions, the topological sigma models (Witten 1988a), which are related to Gromov–Witten invariants and enumerative geometry; in three dimensions, Chern–Simons theory (Witten 1989), which is related to knot and link invariants; and in four dimensions, topological Yang–Mills theory (or Donaldson–Witten theory) (Witten 1988b), which is related to the Donaldson invariants.

The two- and four-dimensional theories above are examples of cohomological (also Witten-type) TQFTs. As such, they are related to an underlying supersymmetric quantum field theory (the N = 2 nonlinear sigma model, and the N = 2 supersymmetric Yang–Mills theory, respectively) and there is no difference between the topological and the standard version on flat space. However, when one considers curved spaces, the topological version differs from the supersymmetric theory on flat space in that some of the fields have modified Lorentz transformation properties (spins). This unconventional spin assignment is also known as twisting, and it comes about basically to preserve supersymmetry on curved space. In fact, the twisting gives rise to at least one nilpotent scalar supercharge Q, which is a certain linear combination of the original (spinor) supersymmetry generators. In these theories the energy–momentum tensor is Q-exact, that is,
$$T_{\mu\nu} = \{Q, \Lambda_{\mu\nu}\}$$
for some $\Lambda_{\mu\nu}$, which (barring potential anomalies) leads to the statement that the correlation functions of operators in the cohomology of Q are all metric independent. Furthermore, the corresponding path integrals are localized to field configurations that are annihilated by Q, and this typically leads to some moduli problem related to the computation of certain mathematical invariants.

On the other hand, in Chern–Simons theory, as a representative of the so-called Schwarz-type topological theories, the topological character is manifest: one starts with an action which is explicitly independent of the metric on the 3-manifold, and thus correlation functions of metric-independent operators are topological invariants as long as quantization does not introduce any undesired metric dependence.

Even though the primary motivation for introducing TQFTs may be to shed light onto awkward mathematical problems, they have proved to be a valuable tool to gain insight into many questions of interest in physics as well. One such question where TQFTs can (and in fact do) play a role is duality. In what follows, an overview of the manifestations of duality is provided in the context of TQFTs.
Duality
The notion of duality is at the heart of some of the most striking recent breakthroughs in physics and mathematics. In broad terms, a duality (in physics) is an equivalence between different (and often complementary) descriptions of the same physical system. The prototypical example is electric–magnetic (abelian) duality. Other, more sophisticated, examples are the various string-theory dualities, such as T-duality (and its more specialized realization, mirror symmetry) and strong/weak coupling S-duality, as well as field theory dualities such as Montonen–Olive duality and Seiberg–Witten effective duality. Also, the original 't Hooft conjecture, stating that SU(N) gauge theories are equivalent (or dual), at large N, to string theories, has recently been revived by Maldacena (1998) by explicitly identifying the string-theory duals of certain (supersymmetric) gauge theories.

One could wonder whether similar duality symmetries work for TQFTs as well. As noted in the following, this is indeed the case. In two dimensions, topological sigma models come in two different versions, known as types A and B, respectively, which correspond to the two different ways in which N = 2 supersymmetry can be twisted in two dimensions. Computations in each model localize on different moduli spaces and, for a given target manifold, give different results, but it turns out that if one considers mirror pairs of Calabi–Yau manifolds, computations in one manifold with the A-model are equivalent to computations in the mirror manifold with the B-model. Also, in three dimensions, a program has been initiated to explore the duality between large N Chern–Simons gauge theory and topological strings, thereby establishing a link between enumerative geometry and knot and link invariants (Gopakumar and Vafa 1998). Perhaps the most impressive consequences of the interplay between duality and TQFTs have come out in four dimensions, on which we will focus in what follows.

Duality in Twisted N = 2 Theories
As mentioned above, topological Yang–Mills theory (or Donaldson–Witten theory) can be constructed by twisting the pure N = 2 supersymmetric Yang–Mills theory with gauge group SU(2). This theory contains a gauge field $A$, a pair of chiral spinors $\lambda_1$, $\lambda_2$, and a complex scalar field $B$. The twisted theory contains a gauge field $A$, bosonic scalars $\phi$, $\lambda$, a Grassmann-odd scalar $\eta$, a Grassmann-odd vector $\psi$, and a Grassmann-odd self-dual 2-form $\chi$. On a 4-manifold X, and for gauge group G, the twisted action has the form
$$S = \int_X d^4x\, \sqrt{g}\; \mathrm{tr}\Big( F^{+2} - i\chi^{\mu\nu} D_\mu\psi_\nu + i\eta D_\mu\psi^\mu + \tfrac{1}{4}\phi\{\chi_{\mu\nu}, \chi^{\mu\nu}\} + \tfrac{i}{4}\lambda\{\psi_\mu, \psi^\mu\} - \lambda D_\mu D^\mu\phi + \tfrac{i}{2}\phi\{\eta, \eta\} + \tfrac{1}{8}[\lambda, \phi]^2 \Big) \qquad [1]$$
where $F^+$ is the self-dual part of the Yang–Mills field strength F. The action [1] is invariant under the transformations generated by the scalar supercharge Q:
$$\{Q, A_\mu\} = \psi_\mu, \qquad \{Q, \psi\} = d_A\phi, \qquad \{Q, \phi\} = 0,$$
$$\{Q, \chi_{\mu\nu}\} = F^+_{\mu\nu}, \qquad \{Q, \eta\} = i[\lambda, \phi], \qquad \{Q, \lambda\} = \eta \qquad [2]$$
In these transformations, $Q^2$ is a gauge transformation with gauge parameter $\phi$, modulo field equations. Observables are, therefore, related to the G-equivariant cohomology of Q (i.e., the cohomology of Q restricted to gauge-invariant operators). Auxiliary fields can be introduced so that the action [1] is Q-exact, that is,
$$S = \{Q, \Psi\} \qquad [3]$$
for a certain functional $\Psi$ of the fields of the theory which goes under the name of gauge fermion, a BRST-inspired terminology which reflects the formal resemblance of topological cohomological field theories with some aspects of the BRST approach to the quantization of gauge theories.

Before constructing the topological observables of the theory, we begin by pointing out that for each independent Casimir of the gauge group G it is possible to construct an operator $W_0$, from which operators $W_i$ can be defined recursively through the descent equations $\{Q, W_i\} = dW_{i-1}$. For example, for the quadratic Casimir,
$$W_0 = \frac{1}{8\pi^2}\,\mathrm{tr}(\phi^2) \qquad [4]$$
which generates the following family of operators:
$$W_1 = \frac{1}{4\pi^2}\,\mathrm{tr}(\phi\psi), \qquad W_2 = \frac{1}{4\pi^2}\,\mathrm{tr}\left(\tfrac{1}{2}\psi\wedge\psi + \phi\wedge F\right), \qquad W_3 = \frac{1}{4\pi^2}\,\mathrm{tr}(\psi\wedge F) \qquad [5]$$
Using these one defines the following observables:
$$O^{(k)} = \int_{\gamma_k} W_k \qquad [6]$$
where $\gamma_k \in H_k(X)$ is a k-cycle on the 4-manifold X. The descent equations imply that they are Q-closed and depend only on the homology class of $\gamma_k$. Topological invariants are constructed by taking vacuum expectation values of products of the operators $O^{(k)}$:
$$\left\langle O^{(k_1)} O^{(k_2)} \cdots O^{(k_p)} \right\rangle = \int O^{(k_1)} O^{(k_2)} \cdots O^{(k_p)}\, e^{-S/e^2} \qquad [7]$$
where the integration has to be understood on the space of field configurations modulo gauge transformations, and $e$ is a coupling constant. Standard arguments show that, due to the Q-exactness of the action S, the quantities obtained in [7] are independent of $e$. This implies that the observables of the theory can be obtained either in the weak-coupling limit $e \to 0$ (also short-distance or ultraviolet regime, since the N = 2 theory is asymptotically free), where perturbative methods apply, or in the strong-coupling (also long-distance or infrared) limit $e \to \infty$, where one is forced to consider a nonperturbative approach. In the weak-coupling limit one proves that the correlation functions [7] descend to polynomials in the product cohomology of the moduli space of anti-self-dual (ASD) instantons $H^{k_1}(\mathcal{M}_{ASD}) \times H^{k_2}(\mathcal{M}_{ASD}) \times \cdots \times H^{k_p}(\mathcal{M}_{ASD})$, which are precisely the Donaldson polynomial invariants of X. However, the weak-coupling analysis does not add any new ingredient to the problem of the actual computation of the invariants. The difficulties that one has to face in the field theory representation are similar to those in ordinary Donaldson theory. Nevertheless, the field theory connection is very important since in this theory the strong- and weak-coupling limits are exact, and therefore the door is open to find a strong-coupling description which could lead to a new, simpler representation for the Donaldson invariants.

This alternative strategy was pursued by Witten (1994a), who found the strong-coupling realization of the Donaldson–Witten theory after using the results on the strong-coupling behavior of N = 2 supersymmetric gauge theories which he and Seiberg (Seiberg and Witten 1994a–c) had discovered. The key ingredient in Witten's derivation was to assume that the strong-coupling limit of Donaldson–Witten theory is equivalent to the "sum" over the twisted effective low-energy descriptions of the corresponding N = 2 physical theory. This "sum" is not entirely a sum, as in general it has a part which contains a continuous integral. The "sum" is now known as integration over the u-plane after the work of Moore and Witten (1998). Witten's (1994a) assumption can be simply stated as saying that the weak-/strong-coupling limit and the twist commute. In other words, to study the strong-coupling limit of the topological theory, one first untwists, then works out the strong-coupling limit of the physical theory and, finally, one twists back. From such a viewpoint, the twisted effective (strong-coupling) theory can be regarded as a TQFT dual to the original one. In addition, one could ask for the dual moduli problem associated to this dual TQFT.

It turns out that in many interesting situations ($b_2^+(X) > 1$) the dual moduli space is an abelian system corresponding to the Seiberg–Witten or monopole equations (Witten 1994a). The topological invariants associated with this new moduli space are the celebrated Seiberg–Witten invariants. Generalizations of Donaldson–Witten theory, with either different gauge groups and/or additional matter content (such as, e.g., twisted N = 2 Yang–Mills multiplets coupled to twisted N = 2 matter multiplets) are possible, and some of the possibilities have in fact been explored (see Moore and Witten (1998) and references therein). The main conclusion that emerges from these analyses is that, in all known cases, the relevant topological information is captured by the Seiberg–Witten invariants, irrespective of the gauge group and matter content of the theory under consideration. These cases are not reviewed here, but rather the attention is turned to the twisted theories which emerge from N = 4 supersymmetric gauge theories.
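The independence of the correlation functions [7] from the coupling $e$, used in the discussion above, follows from a short standard argument. The following is a sketch only, assuming that the action is Q-exact as in [3], that the inserted operators are Q-closed, and that the path-integral measure is Q-invariant. The gauge fermion is written as $\Psi$ and $\mathcal{O}$ stands for the product of observables; both are notation chosen for this sketch, and signs arising from Grassmann parity are suppressed.

```latex
\frac{\partial}{\partial (1/e^{2})} \langle \mathcal{O} \rangle
  = -\int \mathcal{O}\, \{Q, \Psi\}\, e^{-S/e^{2}}
  = -\int \{Q, \mathcal{O}\,\Psi\}\, e^{-S/e^{2}}
  = -\langle \{Q, \mathcal{O}\,\Psi\} \rangle
  = 0 .
```

The second equality uses $\{Q, \mathcal{O}\} = 0$, and the last one uses the fact that expectation values of Q-exact operators vanish when the measure and the action are Q-invariant. The same reasoning, applied to a variation of the metric, underlies the metric independence that follows from a Q-exact energy–momentum tensor.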
Duality in Twisted N = 4 Theories
Unlike the N = 2 supersymmetric case, the N = 4 supersymmetric Yang–Mills theory in four dimensions is unique once the gauge group G is fixed. The microscopic theory contains a gauge or gluon field, four chiral spinors (the gluinos) and six real scalars. All these fields are massless and take values in the adjoint representation of the gauge group. The theory is finite and conformally invariant, and is conjectured to have a duality symmetry exchanging strong and weak coupling and exchanging electric and magnetic fields, which extends to a full $SL(2, \mathbb{Z})$ symmetry acting on the microscopic complexified coupling (Montonen and Olive 1977)
$$\tau = \frac{\theta}{2\pi} + \frac{4\pi i}{e^2} \qquad [8]$$
As in the N = 2 case, the N = 4 theory can be twisted to obtain a topological model, except that, in this case, the topological twist can be performed in three inequivalent ways, giving rise to three different TQFTs (Vafa and Witten 1994). A natural question to answer is whether the duality properties of the N = 4 theory are shared by its twisted counterparts and, if so, whether one can take advantage of the calculability of topological theories to shed some light on the behavior and properties of duality. The answer is affirmative, but it is instructive to clarify a few points. First, as mentioned above, the topological observables in twisted N = 2 theories are independent of the coupling constant $e$, so the question arises as to how the twisted N = 4 theories come to depend on the coupling constant. As it turns out, twisted N = 2 supersymmetric gauge theories have an off-shell formulation such that the TQFT action can be expressed as a Q-exact expression, where Q is the generator of the topological symmetry. Actually, this is true only up to a topological $\theta$-term $\int_X \mathrm{tr}(F\wedge F)$,
$$S = \frac{1}{2e^2}\int_X \sqrt{g}\, d^4x\, \{Q, \Psi\} - 2\pi i\tau\, \frac{1}{16\pi^2}\int_X \mathrm{tr}(F\wedge F) \qquad [9]$$
for some $\Psi$. However, the N = 2 supersymmetric gauge theories possess a global U(1) chiral symmetry which is generically anomalous, so one can actually get rid of the $\theta$-term with a chiral rotation. As a result of this, the observables in the topological theory are insensitive to $\theta$-terms (and hence to $\tau$ and $e$) up to a rescaling. On the other hand, in N = 4 supersymmetric gauge theories $\theta$-terms are observable. There is no chiral anomaly and these terms cannot be shifted away as in the N = 2 case. This means that in the twisted theories one might have a dependence on the coupling constant $\tau$, and that, up to anomalies, this dependence should be holomorphic (resp. antiholomorphic if one reverses the orientation of the 4-manifold). In fact, on general grounds, one would expect the partition functions of the twisted theories on a 4-manifold X and for gauge group G to take the generic form
$$Z_X(G) = q^{-c(X, G)} \sum_k q^k\, \chi(\mathcal{M}_k) \qquad [10]$$
where $q = e^{2\pi i\tau}$, $c$ is a universal constant (depending on X and G), $k = (1/16\pi^2)\int_X \mathrm{tr}(F\wedge F)$ is the instanton number, and $\chi(\mathcal{M}_k)$ encodes the topological information corresponding to a sector of the moduli space of the theory with instanton number $k$.

Now we can be more precise as to how we expect to see the Montonen–Olive duality in the twisted N = 4 theories. First, under $\tau \to -1/\tau$ the gauge group G gets exchanged with its dual group $\hat{G}$. Correspondingly, the partition functions should behave as modular forms
$$Z_G(-1/\tau) = \kappa(X, G)\, \tau^w Z_{\hat{G}}(\tau) \qquad [11]$$
where $\kappa$ is a constant (depending on X and G), and the modular weight $w$ should depend on X in such a way that it vanishes on flat space. In addition to this, in the N = 4 theory all the fields take values in the adjoint representation of G. Hence, if $H^2(X, \pi_1(G)) \neq 0$, it is possible to consider nontrivial G/Center(G) gauge configurations with discrete magnetic 't Hooft flux through the 2-cycles of X. In fact, G/Center(G) bundles on X are classified by the instanton number and a characteristic class $v \in H^2(X, \pi_1(G))$. For example, if G = SU(2), we have $\hat{G} = SU(2)/\mathbb{Z}_2 = SO(3)$ and $v$ is the second Stiefel–Whitney class $w_2(E)$ of the gauge bundle E. This Stiefel–Whitney class can be represented in de Rham cohomology by a class in $H^2(X, \mathbb{Z})$ defined modulo 2, that is, $w_2(E)$ and $w_2(E) + 2\omega$, with $\omega \in H^2(X, \mathbb{Z})$, represent the same 't Hooft flux, so if $w_2(E) = 2\xi$ for some $\xi \in H^2(X, \mathbb{Z})$, then the gauge configuration is trivial in SO(3) (it has no 't Hooft flux).

Similarly, for G = SU(N) (for which $\hat{G} = SU(N)/\mathbb{Z}_N$), one can fix fluxes in $H^2(X, \mathbb{Z}_N)$ (the corresponding Stiefel–Whitney class is defined modulo N). One has, therefore, a family of partition functions $Z_v(\tau)$, one for each magnetic flux $v$. The SU(N) partition function is obtained by considering the zero-flux partition function (up to a constant factor), while the (dual) $SU(N)/\mathbb{Z}_N$ partition function is obtained by summing over all $v$, and both are to be exchanged under $\tau \to -1/\tau$. The action of $SL(2, \mathbb{Z})$ on the $Z_v$ should be compatible with this exchange, and thus the $\tau \to -1/\tau$ operation mixes the $Z_v$ by a discrete Fourier transform which, for G = SU(N), reads
$$Z_v(-1/\tau) = \kappa(X, G)\, \tau^w \sum_u e^{2\pi i u\cdot v/N} Z_u(\tau) \qquad [12]$$
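The flux mixing in [12] has the structure of a discrete Fourier transform, which can be illustrated numerically. The sketch below is a toy model and not part of the original article: it assumes a single flux label $v \in \mathbb{Z}_N$ (as if $H^2(X, \mathbb{Z}_N)$ had rank one, with $u\cdot v$ given by ordinary multiplication), drops the overall prefactor and the modular weight factor, and uses made-up sample values. The function name `s_transform` is hypothetical.

```python
import cmath


def s_transform(z):
    """Toy flux mixing: Z_v -> sum over u of exp(2*pi*i*u*v/N) * Z_u.

    Assumptions: a single flux label in Z_N; the overall constant and the
    modular weight factor of the full transformation law are omitted.
    """
    n = len(z)
    return [
        sum(cmath.exp(2j * cmath.pi * u * v / n) * z[u] for u in range(n))
        for v in range(n)
    ]


# Hypothetical sample values standing in for Z_0, ..., Z_{N-1} with N = 5.
z = [1.0, 0.5, -2.0, 3.0, 0.25]
n = len(z)

once = s_transform(z)
twice = s_transform(once)

# Applying the transform twice returns N times the family with v -> -v.
flux_reversed = [n * z[(-v) % n] for v in range(n)]

if __name__ == "__main__":
    print("zero-flux component after one transform:", once[0])
    print("sum over all fluxes:", sum(z))
    print("max deviation after two transforms:",
          max(abs(a - b) for a, b in zip(twice, flux_reversed)))
```

In this toy setting the zero-flux component of the transformed family equals the sum over all fluxes, which mirrors the exchange between the SU(N) partition function (zero flux) and the $SU(N)/\mathbb{Z}_N$ partition function (sum over all $v$) described above.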
We are now in a position to examine the (three) twisted theories in some detail. For further details and references, the reader is referred to Lozano (1999). The first twisted theory considered here possesses only one scalar supercharge (and hence goes under the name of "half-twisted theory"). It is a nonabelian generalization of the Seiberg–Witten abelian monopole theory, but with the monopole multiplets taking values in the adjoint representation of the gauge group. The theory can be perturbed by giving masses to the monopole multiplets while still retaining its topological character. The resulting theory is the twisted version of the mass-deformed N = 4 theory, which preserves N = 2 supersymmetry and whose low-energy effective description is known. This connection with N = 2 theories, and its topological character, makes it possible to go to the long-distance limit and compute in terms of the twisted version of the low-energy effective description of the supersymmetric theory. Below, we review how the u-plane approach works for gauge group SU(2). The twisted theory for gauge group SU(2) has a U(1) global symmetry (the ghost number) which has an anomaly $-3(2\chi + 3\sigma)/4$ on gravitational backgrounds (i.e., on curved manifolds). Nontrivial topological invariants are thus obtained by considering the vacuum expectation value of products of observables with ghost numbers adding up to $-3(2\chi + 3\sigma)/4$. The relevant observables for this theory and gauge group SU(2) or SO(3) are precisely the same as in the Donaldson–Witten theory (eqns [4] and [5]). In addition to this, it is possible to enrich the theory by including sectors with nontrivial nonabelian electric and magnetic 't Hooft fluxes which, as pointed out above, should behave under $SL(2, \mathbb{Z})$ duality in a well-defined fashion.
The generating function for these correlation functions is given as an integration over the moduli space of vacua of the physical theory (the u-plane), which, for generic values of the mass parameter, forms a one-dimensional complex compact manifold (described by a complex variable customarily denoted by $u$, hence the name), which parametrizes a family of elliptic curves that encodes all the relevant information about the low-energy effective description of the theory. At a generic point in the moduli space of vacua, the only contribution to the topological correlation functions comes from a twisted N = 2 abelian vector multiplet. Additional contributions come from points in the moduli space where the low-energy effective description is singular (i.e., where the associated elliptic curve degenerates). The total contribution to the generating function thus consists of an integration over the moduli space with the singularities removed (which is nonvanishing for $b_2^+(X) = 1$ only; Moore and Witten 1998) plus a discrete sum over the contributions of the twisted effective theories at each of the three singularities of the low-energy effective description (Seiberg and Witten 1994a, b, c). The effective theory at a given singularity contains, together with the appropriate dual photon multiplet, one charged hypermultiplet, which corresponds to the state becoming massless at the singularity. The complete effective action for these massless states also contains certain measure factors and contact terms among the observables, which reproduce the effect of the massive states that have been integrated out and incorporate the coupling to gravity (i.e., explicit nonminimal couplings to the metric of the 4-manifold). How to determine these a priori unknown functions was explained in Moore and Witten (1998). The idea is as follows.
At points on the u-plane where the (imaginary part of the) effective coupling diverges, the integral is discontinuous at anti-self-dual abelian gauge configurations. This is commonly referred to as “wall crossing.” Wall crossing can take place at the singularities of the moduli space (the appropriate local effective coupling $\tau_{\rm eff}$ diverges there) and, in the case of the asymptotically free theories, at the point at infinity (the effective electric coupling diverges owing to asymptotic freedom). On the other hand, the final expression for the invariants can exhibit a wall-crossing behavior at most at $u \to \infty$, so the contribution to wall crossing from the integral at the singularities at finite values of $u$ must cancel against the contributions coming from the effective theories there, which also display wall-crossing discontinuities. Imposing this cancellation fixes almost completely the unknown functions in the contributions to the topological correlation functions from the singularities. The final result for the contributions from the singularities (which give the complete answer for the correlation functions when $b_2^+(X) > 1$) is written explicitly and completely in terms of the fundamental periods $da/du$ (written in the appropriate local variables) and the discriminant of the elliptic curve comprising the Seiberg–Witten solution for the physical theory. For simply connected spin 4-manifolds of simple type the generating function is given by

$$\begin{aligned}
\left\langle e^{pO+I(S)}\right\rangle_v ={}& 2^{-\left(\nu/2+(2\chi+3\sigma)/8\right)}\, m^{-(3\chi+7\sigma)/8}\, (\eta(\tau))^{-12\nu} \\
&\times\Bigg\{ (\kappa_1)^{\nu}\left(\frac{da}{du}\right)_1^{-(\nu+\sigma/4)} e^{2pu_1+S^2T_1}\sum_x \delta_{[x/2],v}\, n_x\, e^{(i/2)(du/da)_1\, x\cdot S} \\
&\quad + 2^{-b_2/2}(-1)^{\sigma/8}(\kappa_2)^{\nu}\left(\frac{da}{du}\right)_2^{-(\nu+\sigma/4)} e^{2pu_2+S^2T_2}\sum_x (-1)^{v\cdot x/2}\, n_x\, e^{(i/2)(du/da)_2\, x\cdot S} \\
&\quad + 2^{-b_2/2}\, i^{-v^2}(\kappa_3)^{\nu}\left(\frac{da}{du}\right)_3^{-(\nu+\sigma/4)} e^{2pu_3+S^2T_3}\sum_x (-1)^{v\cdot x/2}\, n_x\, e^{(i/2)(du/da)_3\, x\cdot S} \Bigg\}
\end{aligned} \qquad [13]$$
where $x$ is a Seiberg–Witten basic class (and $n_x$ is the corresponding Seiberg–Witten invariant), $m$ is the mass parameter of the theory, $\nu = (\chi + \sigma)/4$, $v \in H^2(X, \mathbb{Z}_2)$ is a ’t Hooft flux, $S$ is the formal sum $S = \sum_a \alpha_a \Sigma_a$ (and, correspondingly, $I(S) = \sum_a \alpha_a I(\Sigma_a)$, with $I(\Sigma_a) = \int_{\Sigma_a} W_2$), where $\{\Sigma_a\}_{a=1,\ldots,b_2(X)}$ form a basis of $H_2(X)$ and $\alpha_a$ are constant parameters, while $\eta(\tau)$ is the Dedekind function, $\kappa_i = (du/dq_{\rm eff})_{u=u_i}$ (with $q_{\rm eff} = \exp(2\pi i \tau_{\rm eff})$, where $\tau_{\rm eff}$ is the ratio of the fundamental periods of the elliptic curve), and the contact terms $T_i$ have the form

$$ T_i = -\frac{1}{12}\left(\frac{du}{da}\right)_i^2 + E_2(\tau)\,\frac{u_i}{6} + \frac{m^2}{72}\,E_4(\tau) \qquad [14] $$

with $E_2$ and $E_4$ the Eisenstein series of weights 2 and 4, respectively. Evaluating the quantities in [13] gives the final result as a function of the physical parameters $\tau$ and $m$, and of topological data of $X$ such as the Euler characteristic $\chi$, the signature $\sigma$, and the basic classes $x$. The expression [13] has to be understood as a formal power series in $p$ and $\alpha_a$, whose coefficients give the vacuum expectation values of products of $O = W_0$ and $I(\Sigma_a)$.
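The generating function [13] has nice properties under the modular group. For the partition function $Z_v$,

$$ Z_v(\tau+1) = (-1)^{\sigma/8}\, i^{-v^2}\, Z_v(\tau), \qquad Z_v(-1/\tau) = 2^{-b_2/2}(-1)^{\sigma/8}\left(\frac{\tau}{i}\right)^{-\chi/2}\sum_w (-1)^{w\cdot v}\, Z_w(\tau) \qquad [15] $$

Also, with $Z_{SU(2)} = 2^{-1} Z_{v=0}$ and $Z_{SO(3)} = \sum_v Z_v$,

$$ Z_{SU(2)}(\tau+1) = (-1)^{\sigma/8} Z_{SU(2)}(\tau), \qquad Z_{SO(3)}(\tau+2) = Z_{SO(3)}(\tau), \qquad Z_{SU(2)}(-1/\tau) = (-1)^{\sigma/8}\, 2^{-\chi/2}\, \tau^{-\chi/2}\, Z_{SO(3)}(\tau) \qquad [16] $$

Notice that the last of these three equations corresponds precisely to the strong–weak coupling duality transformation conjectured by Montonen and Olive (1977). As for the correlation functions, one finds the following behavior under the inversion of the coupling:

$$\begin{aligned}
\left\langle \frac{1}{8\pi^2}\,\mathrm{tr}\,\phi^2 \right\rangle^{SU(2)}_{\tau} &= \langle O\rangle^{SU(2)}_{\tau} = \frac{1}{\tau^2}\,\langle O\rangle^{SO(3)}_{-1/\tau} \\
\left\langle \frac{1}{8\pi^2}\int_S \mathrm{tr}\,(2\phi F + \psi\wedge\psi)\right\rangle^{SU(2)}_{\tau} &= \langle I(S)\rangle^{SU(2)}_{\tau} = \frac{1}{\tau^2}\,\langle I(S)\rangle^{SO(3)}_{-1/\tau} \\
\langle I(S)I(S)\rangle^{SU(2)}_{\tau} &= \left(\frac{\tau}{i}\right)^{-4}\langle I(S)I(S)\rangle^{SO(3)}_{-1/\tau} + \frac{i}{2\pi}\,\frac{1}{\tau^3}\,\langle O\rangle^{SO(3)}_{-1/\tau}\;\#(S\cap S)
\end{aligned} \qquad [17]$$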
Therefore, as expected, the partition function of the twisted theory transforms as a modular form, while the topological correlation functions turn out to transform covariantly under SL(2, $\mathbb{Z}$), following a pattern which can be reproduced with a far simpler topological abelian model.

The second example considered next is the Vafa–Witten (1994) theory. This theory possesses two scalar supercharges, and has the unusual feature that the virtual dimension of its moduli space is exactly zero (it is an example of balanced TQFT), and therefore the only nontrivial topological observable is the partition function itself. Furthermore, the twisted theory does not contain spinors, so it is well defined on any compact, oriented 4-manifold. Now this theory computes, with the subtleties explained in Vafa and Witten (1994), the Euler characteristic of instanton moduli spaces. In fact, in this case in the generic partition function [10],

$$ Z_X(G) = q^{-c(X,G)}\sum_k q^k\, \chi(\mathcal{M}_k) \qquad [18] $$
$\chi(\mathcal{M}_k)$ is the Euler characteristic of a suitable compactification of the $k$th instanton moduli space $\mathcal{M}_k$ of gauge group $G$ in $X$. As in the previous example, it is possible to consider nontrivial gauge configurations in $G$/Center($G$) and compute the partition function for a fixed value of the ’t Hooft flux $v \in H^2(X, \pi_1(G))$. In this case, however, the Seiberg–Witten approach is not available, but, as conjectured by Vafa and Witten, one can nevertheless carry out computations in terms of the vacuum degrees of freedom of the N = 1 theory which results from giving bare masses to all three chiral multiplets of the N = 4 theory. It should be noted that a similar approach was introduced by Witten (1994b) to obtain the first explicit results for the Donaldson–Witten theory just before the far more powerful Seiberg–Witten approach was available. As explained in detail by Vafa and Witten (1994), the twisted massive theory is topological on Kähler 4-manifolds with $h^{(2,0)} \neq 0$, and the partition function is actually invariant under the perturbation. In the long-distance limit, the partition function is given as a finite sum over the contributions of the discrete massive vacua of the resulting N = 1 theory. In the case at hand, it turns out that, for $G$ = SU($N$), the number of such vacua is given by the sum of the positive divisors of $N$. The contribution of each vacuum is universal (because of the mass gap), and can be fixed by comparing with known mathematical results (Vafa and Witten 1994).

However, this is not the end of the story. In the twisted theory, the chiral superfields of the N = 4 theory are no longer scalars, so the mass terms cannot be invariant under the holonomy group of the manifold unless one of the mass parameters is a holomorphic 2-form $\omega$. (Incidentally, this is the origin of the constraint $h^{(2,0)} \neq 0$ mentioned above.) This spatially dependent mass term vanishes where $\omega$ does, and we will assume as in Vafa and Witten (1994) and Witten (1994b) that $\omega$ vanishes with multiplicity 1 on a union of disjoint, smooth complex curves $C_i$, $i = 1, \ldots, n$, of genus $g_i$ which represent the canonical divisor $K$ of $X$. The vanishing of $\omega$ introduces corrections involving $K$ whose precise form is not known a priori. In the $G$ = SU(2) case, each of the N = 1 vacua bifurcates along each of the components $C_i$ of the canonical divisor into two strongly coupled massive vacua. This vacuum degeneracy is believed to stem from the spontaneous breaking of a $\mathbb{Z}_2$ chiral symmetry which is unbroken in bulk (see, e.g., Vafa and Witten (1994) and Witten (1994b)). The structure of the corrections for $G$ = SU($N$) (see [19] below) suggests that the mechanism at work in this case is not chiral symmetry breaking.
Indeed, near any of the $C_i$’s, there is an $N$-fold bifurcation of the vacuum. A plausible explanation for this degeneracy could be found in the spontaneous breaking of the center of the gauge group (which for $G$ = SU($N$) is precisely $\mathbb{Z}_N$). In any case, the formula for SU($N$) can be computed (at least when $N$ is prime) along the lines explained by Vafa and Witten (1994) and assuming that the resulting partition function satisfies a set of nontrivial constraints which are described below. Then, for a given ’t Hooft flux $v \in H^2(X, \mathbb{Z}_N)$, the partition function for gauge group SU($N$) (with prime $N$) is given by

$$\begin{aligned}
Z_v ={}& \left(\sum_{\vec\varepsilon}\delta_{v,w_N(\vec\varepsilon)}\prod_{i=1}^{n}\prod_{\lambda=0}^{N-1}\chi_\lambda^{(1-g_i)\,\delta_{\varepsilon_i,\lambda}}\right)\left(\frac{1}{N^2}\,G(q^N)\right)^{\nu/2} \\
&+ N^{1-b_1}\sum_{m=0}^{N-1}\left[\prod_{i=1}^{n}\left(\sum_{\lambda=0}^{N-1}\chi_{m,\lambda}^{\,1-g_i}\, e^{(2\pi i/N)\,\lambda\, v\cdot[C_i]_N}\right)\right] e^{i\pi((N-1)/N)\, m v^2}\left(\frac{G(\alpha^m q^{1/N})}{N^2}\right)^{\nu/2}
\end{aligned} \qquad [19]$$

where $\alpha = \exp(2\pi i/N)$, $G(q) = \eta(q)^{-24}$ (with $\eta(q)$ the Dedekind function), $\chi_\lambda$ are the SU($N$) characters at level 1 and $\chi_{m,\lambda}$ are certain linear combinations thereof. $[C_i]_N$ is the reduction modulo $N$ of the Poincaré dual of $C_i$, and

$$ w_N(\vec\varepsilon) = \sum_{i=1}^{n} \varepsilon_i\, [C_i]_N \qquad [20] $$

where $\varepsilon_i = 0, 1, \ldots, N-1$ are chosen independently. Equation [19] has the expected properties under the modular group:

$$ Z_v(\tau+1) = e^{(i\pi/12)N(2\chi+3\sigma)}\, e^{-i\pi((N-1)/N)\,v^2}\, Z_v(\tau), \qquad Z_v(-1/\tau) = N^{-b_2/2}\left(\frac{\tau}{i}\right)^{-\chi/2}\sum_u e^{2\pi i\, u\cdot v/N}\, Z_u(\tau) \qquad [21] $$

and also, with $Z_{SU(N)} = N^{b_1-1} Z_0$ and $Z_{SU(N)/\mathbb{Z}_N} = \sum_v Z_v$,

$$ Z_{SU(N)}(-1/\tau) = N^{-\chi/2}\left(\frac{\tau}{i}\right)^{-\chi/2} Z_{SU(N)/\mathbb{Z}_N}(\tau) \qquad [22] $$

which is, up to some correction factors that vanish in flat space, the original Montonen–Olive conjecture!

There is a further property to be checked which concerns the behavior of [19] under blow-ups. This property was heavily used by Vafa and Witten (1994) and demanding it in the present case was essential in deriving the above formula. Blowing up a point on a Kähler manifold $X$ replaces it with a new Kähler manifold $\hat X$ whose second cohomology lattice is $H^2(\hat X, \mathbb{Z}) = H^2(X, \mathbb{Z}) \oplus I^-$, where $I^-$ is the one-dimensional lattice spanned by the Poincaré dual of the exceptional divisor $B$ created by the blow-up. Any allowed $\mathbb{Z}_N$ flux $\hat v$ on $\hat X$ is of the form $\hat v = v \oplus r$, where $v$ is a flux in $X$ and $r = \lambda B$, $\lambda = 0, 1, \ldots, N-1$. The main result concerning [19] is that under blowing up a point on a Kähler 4-manifold with canonical divisor as above, the partition functions for fixed ’t Hooft fluxes have a factorization as

$$ Z_{\hat X,\hat v}(\tau_0) = Z_{X,v}(\tau_0)\,\frac{\chi_\lambda(\tau_0)}{\eta(\tau_0)} \qquad [23] $$

Precisely the same behavior under blow-ups of the partition function [19] has been proved for the generating function of Euler characteristics of instanton moduli spaces on Kähler manifolds. This should not come as a surprise since, as mentioned above, on certain 4-manifolds, the partition function of Vafa–Witten theory computes the Euler characteristics of instanton moduli spaces. Therefore, [19] can be seen as a prediction for the Euler numbers of instanton moduli spaces on those 4-manifolds.

Finally, the third twisted N = 4 theory also possesses two scalar supercharges, and is believed to be a certain deformation of the four-dimensional BF theory, and as such it describes essentially intersection theory on the moduli space of complexified gauge connections. In addition to this, the theory is “amphicheiral,” which means that it is invariant under a reversal of the orientation of the spacetime manifold. The terminology is borrowed from knot theory, where an oriented knot is said to be amphicheiral if, crudely speaking, it is equivalent to its mirror image. From this property, it follows that the topological invariants of the theory are completely independent of the complexified coupling constant $\tau$.
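Returning to the Vafa–Witten partition functions, the step from the flux-resolved transformation [21] to the duality relation [22] can be made explicit. The short derivation below is a sketch that assumes only the definitions quoted in the text, $Z_{SU(N)} = N^{b_1-1} Z_0$ and $Z_{SU(N)/\mathbb{Z}_N} = \sum_v Z_v$, together with the standard relation $\chi = 2 - 2b_1 + b_2$ for a closed oriented 4-manifold; the exponent signs are those of [21] as read here.

```latex
\begin{aligned}
Z_{SU(N)}(-1/\tau)
  &= N^{b_1-1}\, Z_0(-1/\tau) \\
  &= N^{b_1-1}\, N^{-b_2/2}\left(\frac{\tau}{i}\right)^{-\chi/2}\sum_u Z_u(\tau)
     \qquad (\text{set } v = 0 \text{ in } [21]) \\
  &= N^{-(2-2b_1+b_2)/2}\left(\frac{\tau}{i}\right)^{-\chi/2} Z_{SU(N)/\mathbb{Z}_N}(\tau) \\
  &= N^{-\chi/2}\left(\frac{\tau}{i}\right)^{-\chi/2} Z_{SU(N)/\mathbb{Z}_N}(\tau).
\end{aligned}
```

The only nontrivial input is that the powers of $N$ coming from the normalization of $Z_{SU(N)}$ and from the Fourier transform combine into the Euler characteristic.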
See also: Donaldson–Witten Theory; Electric–Magnetic Duality; Hopf Algebras and q-Deformation Quantum Groups; Large-N and Topological Strings; Seiberg–Witten Theory; Topological Quantum Field Theory: Overview.
Further Reading
Birmingham D, Blau M, Rakowski M, and Thompson G (1991) Topological field theories. Physics Reports 209: 129–340.
Cordes S, Moore G, and Ramgoolam S (1996) Lectures on 2D Yang–Mills theory, equivariant cohomology and topological field theory. In: David F, Ginsparg P, and Zinn-Justin J (eds.) Fluctuating Geometries in Statistical Mechanics and Field Theory, Les Houches Session LXII, p. 505, hep-th/9411210. Elsevier.
Gopakumar R and Vafa C (1998) Topological gravity as large N topological gauge theory. Advances in Theoretical and Mathematical Physics 2: 413–442, hep-th/9802016; On the gauge theory/geometry correspondence. Advances in Theoretical and Mathematical Physics 3: 1415–1443, hep-th/9811131.
Labastida JMF and Lozano C (1998) Lectures on topological quantum field theory. In: Falomir H, Gamboa R, and Schaposnik F (eds.) Proceedings of the CERN-Santiago de Compostela-La Plata Meeting, Trends in Theoretical Physics, pp. 54–93, hep-th/9709192. New York: American Institute of Physics.
Lozano C (1999) Duality in Topological Quantum Field Theories. Ph.D. thesis, U. Santiago de Compostela, arXiv: hep-th/9907123.
Maldacena J (1998) The large N limit of superconformal field theories and supergravity. Advances in Theoretical and Mathematical Physics 2: 231–252, hep-th/9711200.
Montonen C and Olive D (1977) Magnetic monopoles as gauge particles? Physics Letters B 72: 117–120.
Moore G and Witten E (1998) Integration over the u-plane in Donaldson theory. Advances in Theoretical and Mathematical Physics 1: 298–387, hep-th/9709193.
Seiberg N and Witten E (1994a) Electric–magnetic duality, monopole condensation, and confinement in N = 2 supersymmetric Yang–Mills theory. Nuclear Physics B 426: 19–52, hep-th/9407087.
Seiberg N and Witten E (1994b) Electric–magnetic duality, monopole condensation, and confinement in N = 2 supersymmetric Yang–Mills theory: erratum. Nuclear Physics B 430: 485–486.
Seiberg N and Witten E (1994c) Monopoles, duality and chiral symmetry breaking in N = 2 supersymmetric QCD. Nuclear Physics B 431: 484–550, hep-th/9408099.
Vafa C and Witten E (1994) A strong coupling test of S-duality. Nuclear Physics B 431: 3–77, hep-th/9408074.
Witten E (1988b) Topological quantum field theory. Communications in Mathematical Physics 117: 353–386.
Witten E (1988a) Topological sigma models. Communications in Mathematical Physics 118: 411–449.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 121: 351–399.
Witten E (1994a) Monopoles and four-manifolds. Mathematical Research Letters 1: 769–796, hep-th/9411102.
Witten E (1994b) Supersymmetric Yang–Mills theory on a four-manifold. Journal of Mathematical Physics 35: 5101–5135, hep-th/9403195.
Dynamical Systems and Thermodynamics
A Carati, L Galgani and A Giorgilli, Università di Milano, Milan, Italy
© 2006 Elsevier Ltd. All rights reserved.
Introduction
The relations between thermodynamics and dynamics are dealt with by statistical mechanics. For a given dynamical system of Hamiltonian type in a classical framework, it is usually assumed that a dynamical foundation for equilibrium statistical mechanics, namely for the use of the familiar Gibbs ensembles, is guaranteed if one can prove that the system is ergodic, that is, has no integrals of motion apart from the Hamiltonian itself. One of the main consequences is then that classical mechanics fails in explaining thermodynamics at low temperatures (e.g., the specific heats of crystals or of polyatomic molecules at low temperatures, or the black body problem), because the classical equilibrium ensembles lead to equipartition of energy for a system of weakly coupled oscillators, against Nernst’s third principle. This is actually the problem that historically led to the birth of quantum mechanics, equipartition being replaced by Planck’s law. At a given temperature $T$, the mean energy of an oscillator of angular frequency $\omega$ is not $k_B T$ ($k_B$ being the Boltzmann constant), and thus is not independent of frequency (equipartition), but decreases to zero exponentially fast as frequency increases.

Thus, the problem of a dynamical foundation for classical statistical mechanics would be reduced to ascertaining whether the Hamiltonian systems of physical interest are ergodic or not. It is just in this spirit that many recent mathematical works were aimed at proving ergodicity for systems of hard spheres, or more generally for systems which are expected to be not only ergodic but even hyperbolic.

However, a new perspective was opened in the year 1955, with the celebrated paper of Fermi, Pasta, and Ulam (FPU), which constituted the last scientific work of Fermi. The FPU paper was concerned with numerical computations on a system of $N$ (actually, 32 or 64) equal particles on a line, each interacting with the two adjacent ones through nonlinear springs, certain boundary conditions having been assigned (fixed ends). The model mimics a one-dimensional crystal (or also a string), and can be described in the familiar way as a perturbation of a system of $N$ normal modes, which diagonalize the corresponding linearized system. The initial conditions corresponded to the excitation of only a few low-frequency modes, and it was expected that energy would rather quickly flow to the high-frequency modes, thus establishing equipartition of energy, in agreement with the predictions of classical equilibrium statistical mechanics. But this did not occur within the available computation times, and the
energy rather appeared to remain confined within a packet of low-frequency modes having a certain width, as if being in a state of apparent equilibrium of a nonstandard type. This fact can be called “the FPU paradox.” In the words of Ulam, written as a comment in Fermi’s Collected Papers, this is described as follows: “The results of the computations were interesting and quite surprising to Fermi. He expressed the opinion that they really constituted a little discovery in providing intimations that the prevalent beliefs in the universality of mixing and thermalization in nonlinear systems may not be always justified.”

The FPU paper immediately had a very strong impact on the theory of dynamical systems, because it motivated all the modern theory of infinite-dimensional integrable systems and solitons (KdV equation), starting from the works of Zabusky and Kruskal (1965). But in this way the FPU paradox was somehow enhanced, because the FPU system turned out to be associated with the class of integrable systems, namely the systems having a number of integrals of motion equal to the number of degrees of freedom, which are in a sense the most antithermodynamic systems.

The merit of establishing a bridge toward ergodicity goes to Izrailev and Chirikov (1966). Making reference to the most advanced results then available in the perturbation theory for nearly integrable systems (KAM theory), these authors pointed out that ergodicity, and thus equipartition, would be recovered if one took initial data with a sufficiently large energy. And this was actually found to be the case. Moreover, it turned out that their work, and its subsequent completion by Shepelyanski, was often interpreted as supporting the conjecture that the FPU paradox would disappear in the thermodynamic limit (infinitely many particles, with finite density and energy density).
The opposite conjecture was advanced in the year 1970 by Bocchieri, Scotti, Bearzi, and Loinger, and its relevance for the relations between classical and quantum mechanics was immediately pointed out by Cercignani, Galgani, and Scotti. A long debate then followed. Possibly, some misunderstandings occurred, because in the discussions concerning the dynamical aspects of the problem reference was generally made to notions involving infinite times. In fact, it had not yet been conceived that the FPU equilibrium might actually be an apparent one, corresponding to some type of intermediate metaequilibrium state. This was first suggested by researchers in Parisi’s group in the year 1982. The analogy of such a situation with that occurring in glasses was pointed out more recently.
In the present article, the state of the art of the FPU problem is discussed. The thesis of the present authors is that the FPU phenomenon survives in the thermodynamic limit, in the last mentioned sense, namely that at sufficiently low temperatures there exists a kind of metaequilibrium state surviving for extremely long times. The corresponding thermodynamics turns out to be different from the standard one predicted by the equilibrium ensembles, inasmuch as it presents qualitatively some quantum-like features (typically, specific heats in agreement with Nernst’s third principle). The key point, with respect to equilibrium statistical mechanics, is that the internal thermodynamic energy should be identified not with the whole mechanical energy, but only with a suitable fraction of it, to be identified through its dynamical properties, as was suggested more than a century ago by Boltzmann himself, and later by Nernst. Here, it is first discussed why nearly integrable systems can be expected to present the FPU phenomenon. Then the latter is illustrated. Finally, some hints are given for the corresponding thermodynamics.
Nearly Integrable versus Hyperbolic Systems, and the Question of the Rates of Thermalization
As mentioned above, it is usually assumed that the problem of providing a dynamical foundation to classical statistical mechanics is reduced to the mathematical problem of ascertaining whether the Hamiltonian systems of physical interest are ergodic or not. However, there remains open a subtler problem. Indeed, the notion of ergodicity involves the limit of an infinite time (time averages should converge to ensemble averages as $t \to \infty$), while intermediate times might be relevant. In this connection it is convenient to distinguish between two classes of dynamical systems, namely the hyperbolic and the nearly integrable ones. The first class, in a sense the prototype of chaotic systems, should include the systems of hard spheres (extensively studied after the classical works of Sinai), or more generally the systems of mass points with mutual repulsive interactions. For such systems it can be expected that the time averages of the relevant dynamical quantities in an extremely short time converge to the corresponding ensemble averages, so that the classical equilibrium ensembles could be safely used.

A completely different situation occurs for the dynamical systems such as the FPU systems, which are nearly integrable, that is, are perturbations of systems having a number of integrals of motion equal to the number of degrees of freedom. Indeed, in such a case ergodicity means that the addition of an interaction, no matter how small, makes an integrable system lose all of its integrals of motion, apart from the Hamiltonian itself. And, in fact, this quite remarkable property was already proved to be generic by Poincaré, through a set of considerations which had a fundamental impact on the theory of dynamical systems itself. In view of its importance for the foundations of statistical mechanics, the proof given by Poincaré was reconsidered by Fermi, who added a subtle contribution concerning the role of single invariant surfaces. It is just to such a paper that Ulam makes reference in his comment to the FPU work mentioned above, when he says: “Fermi’s earlier interest in the ergodic theory is one motive” for the FPU work.

The point is that the picture which looks at the ergodicity induced on an integrable system by the addition of a perturbation, no matter how small, somehow lacks continuity. One might expect that, in situations in which the nonlinear interaction which destroys the integrals of motion is very small (i.e., at low temperatures), the underlying integrable structure should somehow be still appreciable, in some continuous way. In fact, continuity should be recovered by making a question of times, namely by considering the rates of thermalization (to use the very FPU phrase), or equivalently the relaxation times, namely the times needed for the time averages of the relevant dynamical quantities to converge to the corresponding ensemble averages. By continuity, one clearly expects that the relaxation times diverge as the perturbation tends to zero. But more complicated situations might occur, as, for example, the existence of two (or more) relevant timescales.

The point of view that timescales of different orders of magnitude might occur in dynamical systems (with the exhibition of an interesting example) and that this might be relevant for statistical mechanics, was discussed by Poincaré himself in the year 1906. Indeed, he denotes as “first-order very large time” a time which is sufficient for a system to reach a “provisional equilibrium,” whereas he denotes as “second-order very large time” a time which is necessary for the system to reach its “definitive equilibrium.”

The FPU Phenomenon: Historical and Conceptual Developments
We now illustrate the FPU phenomenon, following essentially its historical development. We will make reference to Figures 1–8, which are the results of numerical integrations of the FPU dynamical system. If $x_1, \ldots, x_N$ denote the positions of the particles (of unit mass), or more precisely the displacements from their equilibrium positions, and $p_i$ the corresponding momenta, the Hamiltonian is

$$ H = \sum_{i=1}^{N}\frac{p_i^2}{2} + \sum_{i=1}^{N+1} V(r_i) $$

where $r_i = x_i - x_{i-1}$ and one has taken a potential $V(r) = r^2/2 + \alpha r^3/3 + \beta r^4/4$ depending on two positive parameters $\alpha$ and $\beta$. Boundary conditions with fixed ends, namely $x_0 = x_{N+1} = 0$, are considered. We recall that the angular frequencies of the corresponding normal modes are $\omega_j = 2\sin[j\pi/2(N+1)]$, with $j = 1, \ldots, N$; it is thus convenient to take as time unit the value $\pi$, which is essentially, for any $N$, the period of the fastest normal mode.

The original FPU result is illustrated in Figures 1 and 2. Here $N = 32$, $\alpha = \beta = 1/4$, and the total energy is $E = 0.05$; the energy was given initially to the first normal mode (with vanishing potential energy). Three timescales (increasing from top to bottom) are considered, the top one corresponding to the timescale of the original FPU paper. In the boxes on the left the energies $E_j(t)$ of modes $j$ are reported versus time ($j = 1, \ldots, 8$ at top, $j = 1$ at center and bottom). In the boxes on the right we report the corresponding spectra, namely the time average (up to the respective final times) of the energy of mode $j$ versus $j$, for $1 \le j \le N$. In Figure 2 we report, for the same run of Figure 1, the time averages of the energies of the various modes versus time; this figure corresponds to the last one of the original FPU work. The facts to be noticed in connection with these two figures are the following: (1) the spectrum (namely the distribution of energy among the modes, in time average) appears to have relaxed very quickly to some form, which remains essentially unchanged up to the maximum observed time; (2) there is no global equipartition, but only a partial one, because the energy remains confined within a group of low-frequency modes, which form a small packet of a certain definite width; and (3) the time evolutions of the mode energies appear to be of quasiperiodic type, since longer and longer quasiperiods can be observed as the total time increases.

Figure 1 The FPU paradox: normal-mode energies $E_j$ versus time (left) and energy spectrum, namely time average of $E_j$ versus $j$ (right) for three different timescales. The energy, initially given to the lowest-frequency mode, does not flow to the high-frequency modes within the accessible observation time. Here, $N = 32$ and $E = 0.05$.

Figure 2 The FPU paradox: time averages of the energies of the modes 1, 2, . . . , 8 (from top to bottom) versus time for the same run as Figure 1. The spectrum has reached an apparent equilibrium, different from that of equipartition predicted by classical equilibrium statistical mechanics. An exponential decay of the tail is clearly exhibited.

Figure 3 The Izrailev–Chirikov contribution: for a fixed observation time, equipartition is attained if the initial energy $E$ is high enough. Here, from top to bottom, $E = 0.1, 1, 10$.

Figure 4 The Izrailev–Chirikov contribution: time averages of the mode energies versus time for the same run as at bottom of Figure 3.
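The setup described above (fixed-end chain, potential $V(r) = r^2/2 + \alpha r^3/3 + \beta r^4/4$, energy initially in the first normal mode) can be sketched numerically. The following pure-Python example is only an illustration under stated assumptions: the velocity-Verlet integrator, the step size `dt = 0.05`, and the short run length are choices made here, not taken from the article, and the function names are hypothetical. The run is far shorter than those shown in the figures; it only checks that total energy is conserved and that the harmonic mode energies $E_j = (P_j^2 + \omega_j^2 Q_j^2)/2$ remain concentrated in the low-frequency modes.

```python
import math


def fpu_forces(x, alpha, beta):
    """Forces -dH/dx_i for the fixed-end chain (x_0 = x_{N+1} = 0)."""
    n = len(x)
    ext = [0.0] + list(x) + [0.0]
    # dv[k] = V'(r_{k+1}) with r_i = x_i - x_{i-1}, for the N+1 springs.
    dv = []
    for i in range(1, n + 2):
        r = ext[i] - ext[i - 1]
        dv.append(r + alpha * r * r + beta * r ** 3)
    # Particle i feels -V'(r_i) + V'(r_{i+1}).
    return [dv[k + 1] - dv[k] for k in range(n)]


def total_energy(x, p, alpha, beta):
    """H = sum p_i^2 / 2 + sum V(r_i)."""
    ext = [0.0] + list(x) + [0.0]
    kinetic = 0.5 * sum(pi * pi for pi in p)
    potential = 0.0
    for i in range(1, len(ext)):
        r = ext[i] - ext[i - 1]
        potential += r * r / 2 + alpha * r ** 3 / 3 + beta * r ** 4 / 4
    return kinetic + potential


def mode_energies(x, p):
    """Harmonic energies E_j of the normal modes of the linearized chain."""
    n = len(x)
    c = math.sqrt(2.0 / (n + 1))
    energies = []
    for j in range(1, n + 1):
        w = 2.0 * math.sin(j * math.pi / (2 * (n + 1)))
        s = [math.sin(math.pi * (i + 1) * j / (n + 1)) for i in range(n)]
        q_j = c * sum(xi * si for xi, si in zip(x, s))
        p_j = c * sum(pi * si for pi, si in zip(p, s))
        energies.append(0.5 * (p_j ** 2 + (w * q_j) ** 2))
    return energies


def integrate(x, p, alpha, beta, dt, steps):
    """Velocity-Verlet integration (an integrator chosen for this sketch)."""
    x, p = list(x), list(p)
    f = fpu_forces(x, alpha, beta)
    for _ in range(steps):
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
        x = [xi + dt * pi for xi, pi in zip(x, p)]
        f = fpu_forces(x, alpha, beta)
        p = [pi + 0.5 * dt * fi for pi, fi in zip(p, f)]
    return x, p


N, ALPHA, BETA, E0 = 32, 0.25, 0.25, 0.05
c = math.sqrt(2.0 / (N + 1))
# All energy in mode 1 with vanishing potential energy: P_1 = sqrt(2 E0).
x0 = [0.0] * N
p0 = [c * math.sqrt(2 * E0) * math.sin(math.pi * (i + 1) / (N + 1))
      for i in range(N)]

e_start = total_energy(x0, p0, ALPHA, BETA)
modes_start = mode_energies(x0, p0)
x1, p1 = integrate(x0, p0, ALPHA, BETA, dt=0.05, steps=8000)
e_end = total_energy(x1, p1, ALPHA, BETA)
modes_end = mode_energies(x1, p1)
low_mode_fraction = sum(modes_end[:8]) / e_start
```

Reproducing the spectra of Figures 1 and 2 would require much longer runs and time averaging of `mode_energies` along the trajectory; a compiled or vectorized implementation would be more practical for that purpose.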
After the works of Zabusky and Kruskal, by which the FPU system was somehow assimilated to an integrable system, the bridge toward ergodicity was made by Izrailev and Chirikov (1966), through the idea that there should exist a stochasticity threshold. Making reference to KAM theory, which had just been formulated in the framework of perturbation theory for nearly integrable systems, their main remark was as follows. It is known that KAM theory, which essentially guarantees a behavior similar to that of an integrable system, applies only if the perturbation is smaller than a certain threshold; on the other hand, in the FPU model the natural perturbation parameter is the energy $E$ of the system. Thus, the FPU phenomenon can be expected to disappear above a certain threshold energy $E_c$. This is indeed the case, as illustrated in Figures 3 and 4. The parameters $\alpha$, $\beta$ and the class of initial data are as in Figure 1. In Figure 3 the total time is kept fixed (at 10 000 units), whereas the energy $E$ is increased in passing from top to bottom, actually from $E = 0.1$ to $E = 1$ and $E = 10$. One sees that at $E = 10$ equipartition is attained within the given observation time; correspondingly, the motion of the modes visually appears to be nonregular. The approach to equipartition at $E = 10$ is clearly exhibited in Figure 4, where the time averages of the energies are reported versus time.

There naturally arose the problem of the dependence of the threshold $E_c$ on the number $N$ of degrees of freedom (and also on the class of initial data). Certain semianalytical considerations of Izrailev and Chirikov were generally interpreted as suggesting that the threshold should vanish in the thermodynamic limit for initial excitations of high-frequency modes. Recently, Shepelyanski completed the analysis by showing that the threshold should vanish also for initial excitations of the low-frequency modes, as in the original FPU work (see, however, the subsequent paper by Ponno mentioned below).
If this were true, the FPU phenomenon would disappear in the thermodynamic limit. In particular, the equipartition principle would be dynamically justified at all temperatures. The opposite conjecture was advanced by Bocchieri et al. (1970). This was based on numerical calculations, which indicated that the energy threshold should be proportional to $N$, namely that the FPU phenomenon persists in the thermodynamic limit provided the specific energy $\epsilon = E/N$ is below a critical value $\epsilon_c$, which should be definitely nonvanishing. Actually, the computations were performed on a slightly different model, in which nearby particles were interacting through a more physical Lennard-Jones potential. By taking concrete values having a physical significance, namely the values commonly assumed for argon, for the threshold of the specific energy they found the value $\epsilon_c \simeq 0.04\,V_0$, where $V_0$ is the depth of the Lennard-Jones potential well. This corresponds to a critical temperature of the order of a few kelvin. The relevance of such a conjecture (persistence of the FPU phenomenon in the thermodynamic limit) was soon strongly emphasized by Cercignani, Galgani, and Scotti, who also tried to establish a connection between the FPU spectrum and Planck's distribution.

Up to this point, the discussion was concerned with the alternative whether the FPU system is ergodic or not, and thus reference was made to properties holding in the limit $t \to \infty$. Correspondingly, one was making reference to KAM theory, namely to the possible existence of surfaces ($N$-dimensional tori) which should be dynamically invariant (for all times).

The first paper in which attention was drawn to the problem of estimating the relaxation times to equilibrium was by Fucito et al. (1982). The model considered was actually a different one (the so-called $\phi^4$ model), but the results can also be extended to the FPU model. Analytical and numerical indications were given for the existence of two timescales. In a short time the system was found to relax to a state characterized by an FPU-like spectrum, with a plateau at the low frequencies, followed by an exponential tail. This, however, appeared as being a sort of metastable state. In their words: “The nonequilibrium spectrum may persist for extremely long times, and may be mistaken for a stationary state if the observation time is not sufficiently long.” Indeed, on a second, much larger timescale the slope of the exponential tail was found to increase logarithmically with time, with a rate which decreases to zero with the energy. This is an indication that the time for equipartition should increase as an exponential with the inverse of the energy.
This is indeed the picture that the present authors consider to be essentially correct, being supported by very recent numerical computations, and by analytical considerations. Curiously enough, however, such a picture was not fully appreciated until quite recently. Possibly, the reason is that the scientific community had to wait until becoming acquainted with two relevant aspects of the theory of dynamical systems, namely Nekhoroshev theory and the relations between the KdV equation and resonant normal-form theory.

The first step was the passage from KAM theory to Nekhoroshev theory. Let us recall that, whereas in KAM theory one looks for surfaces which are invariant (for all times), in Nekhoroshev theory one looks instead for a kind of weak stability involving finite times, albeit “extremely long” ones, as they are found to increase as stretched exponentials with the inverse of the perturbative parameter. Thus, one meets with situations in which one can have instability over infinite times, while having a kind of practical stability up to exponentially long times. Notice that Nekhoroshev's theory was formulated only in the year 1974, and that it started to be known in the West only in the early 1980s, just because of its interest for the FPU problem. Another interesting point is that just in those years one started to become acquainted with a related historical fact. Indeed, the idea that equipartition might require extremely long times, so that one would be confronted with situations of a practical lack of equipartition, has in fact a long tradition in statistical mechanics, going back to Boltzmann and Jeans, and later (in connection with sound dispersion in gases of polyatomic molecules) to Landau and Teller.

In this way the idea of the existence of extremely long relaxation times to equipartition came to be accepted. The ingredient that was still lacking is the idea of a quick relaxation to a metastable state. The importance of this should not be overlooked. Indeed, without it one cannot at all have a thermodynamics different from the standard equilibrium one corresponding to equipartition. This was repeatedly emphasized, against Jeans, by Poincaré on general grounds and by Nernst on empirical grounds. The full appreciation of this latter ingredient was obtained quite recently (although it had been clearly stated by Fucito et al. (1982)). A first hint in this direction came from the realization (see Figure 5) of a deep analogy between the FPU phenomenon and the phenomenology of glasses. Then there came a strong numerical indication by Berchialla, Galgani, and Giorgilli.
Finally, from the analytical point of view, there was a suitable revisitation (by Ponno) of the traditional connection between the FPU system and the KdV equation with its solitons. The relevant points are the following: (1) the KdV equation describes well the solutions of the FPU problem (for initial data of FPU type) only on a “short” timescale, which increases as a power of $1/\epsilon$, and so describes only a first process of quick relaxation; (2) the corresponding spectrum has a very definite analytical form, the energy being spread up to a maximal frequency $\bar\omega(\epsilon) \simeq \epsilon^{1/4}$ and then decaying exponentially; and (3) the relevant formulas contain the energy only through the specific energy $\epsilon$, and thus can be expected to hold also in the thermodynamic limit. It should be mentioned, however, that all the results of an analytic type mentioned above have a purely formal character, because up to now none of them was proved, in the thermodynamic limit, in the sense of rigorous perturbation theory. This requires a suitable readaptation of the known techniques, which is currently being obtained both in connection with Nekhoroshev's theorem (in order to explain the extreme slowness of a possible final approach to equilibrium) and in connection with the normal-form theory for partial differential equations (in order to explain the fast relaxation to the metaequilibrium state).

Figure 5 Analogy with glasses: the specific energy $u$ (vertical axis, × 10⁻³) of an FPU system is plotted versus temperature $T$ (horizontal axis, × 10⁻³) for a cooling process (upper curve) and a heating process (lower curve). The FPU system is kept in contact with a heat reservoir, whose temperature is changed at a given rate. At low temperatures the system does not have time to reach the equilibrium curve $u = T$ (with $k_B = 1$).

In conclusion, for the case of initial conditions of the FPU type (excitation of a few low-frequency modes) the situation seems to be as follows. The first phenomenon, which occurs in a “short” time (of the order of $(1/\epsilon)^{3/4}$), is a quick relaxation to the formation of what can be called a “natural packet” of low-frequency modes extending up to a certain maximal frequency $\bar\omega \simeq \epsilon^{1/4}$. This is a phenomenon which has nothing to do with any diffusion in phase space. In fact, it shows up also for an integrable system such as a Toda lattice (as will be illustrated below), and should be described by a suitable resonant normal form related to the KdV equation. One has then to take into account the fact that the domain of the frequencies in the FPU model is bounded ($\omega < 2$ in the chosen units). Now, as the function $\bar\omega(\epsilon)$ is monotonic, this fact leads to the existence of a critical value $\epsilon_c$ of the specific energy, defined by $\bar\omega(\epsilon_c) = 2$. Indeed, for $\epsilon > \epsilon_c$ the quick relaxation process leads altogether to equipartition. Below the threshold, instead, the same quick process leads to the formation of an FPU-like spectrum,
involving only modes of sufficiently low frequency. This should, however, be a metastable state (which might be mistaken for a stationary one), which should be followed, on a second timescale, by a relaxation to the final equilibrium, through a sort of Arnol'd diffusion requiring extremely long Nekhoroshev-like times. This is actually the way in which the old idea of a threshold, originally conceived in terms of KAM tori, is now recovered even for ergodic systems, in terms of timescales.

The existence of a process of quick relaxation, and of a threshold in the above-mentioned sense, is illustrated in Figures 6 and 7. In Figure 6 the lower part refers to the FPU model, while the upper one refers to a corresponding Toda model. The latter is in a sense the prototype of an integrable nonlinear system; with respect to the FPU case, the difference is that the potential $V(r)$ is now exponential. The parameters of the exponential were chosen so that the two models coincide up to cubic terms in the potential. With the energy given to the lowest-frequency mode, the figure shows the time needed in order that energy spreads up to a mode $\bar k$, for several values of $\bar k$, as a function of $\epsilon$. It is seen that in the Toda model (top) a packet is formed which extends up to a rather well-defined width, and that this occurs within a relaxation time increasing as a power of $1/\epsilon$. An analogous phenomenon occurs for the FPU model (bottom). The only difference is that, below a critical specific energy $\epsilon_c \simeq 0.1$, there exists a subsequent relaxation to equipartition, which involves a time growing faster than any inverse power of $\epsilon$. Such a second phenomenon is due to the nonintegrable character of the FPU model.

In Figure 7 the width of the natural packet for the FPU model is exhibited, by reporting the frequency $\bar\omega$ of its highest mode as a function of $\epsilon$. As one sees, the numerical results clearly indicate the existence of a relation $\bar\omega \simeq \epsilon^{1/4}$, which holds for a number of degrees of freedom $N$ ranging from 8 to 1023. This is actually the law which is predicted by resonant normal-form theory.

Figure 6 Time needed to form a packet versus specific energy (logarithmic axes) for the FPU model (bottom) and the corresponding Toda model (top). Different symbols refer to packets of different width. The existence of two timescales below a critical specific energy in the FPU model is exhibited.

Figure 7 Width of the natural packet (frequency of its highest mode) versus specific energy (logarithmic axes), for N ranging from 8 to 1023 (N = 8, 16, 32, 63, 127, 255, 511, 1023). Reproduced from Berchialla L, Galgani L, and Giorgilli A (2004) Localization of energy in FPU chains, Discr. Cont. Dyn. Systems B 11: 855–866, with permission from American Institute of Mathematical Sciences.
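The kind of numerical experiment described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the code used for the figures: it assumes a chain with fixed ends, the potential $V(r) = r^2/2 + \alpha r^3/3 + \beta r^4/4$, a velocity-Verlet integrator, and initial data with all the (harmonic) energy in the lowest mode. The function names (`mode_basis`, `run_fpu`, and so on) and the parameter values are hypothetical.

```python
import math

def mode_basis(n):
    """Sine basis and harmonic frequencies for n particles with fixed ends."""
    norm = math.sqrt(2.0 / (n + 1))
    basis = [[norm * math.sin(math.pi * k * i / (n + 1)) for i in range(1, n + 1)]
             for k in range(1, n + 1)]
    omega = [2.0 * math.sin(math.pi * k / (2 * (n + 1))) for k in range(1, n + 1)]
    return basis, omega

def bond_stretches(x):
    """r_j = x_{j+1} - x_j for j = 0..N, with x_0 = x_{N+1} = 0."""
    padded = [0.0] + list(x) + [0.0]
    return [padded[j + 1] - padded[j] for j in range(len(padded) - 1)]

def forces(x, alpha, beta):
    # V'(r) = r + alpha r^2 + beta r^3; force on a particle is V'(right) - V'(left)
    dv = [r + alpha * r ** 2 + beta * r ** 3 for r in bond_stretches(x)]
    return [dv[i + 1] - dv[i] for i in range(len(x))]

def total_energy(x, p, alpha, beta):
    pot = sum(r ** 2 / 2 + alpha * r ** 3 / 3 + beta * r ** 4 / 4
              for r in bond_stretches(x))
    return sum(v * v for v in p) / 2 + pot

def mode_energies(x, p, basis, omega):
    """Harmonic energies E_k = (a_k'^2 + omega_k^2 a_k^2) / 2 of the normal modes."""
    out = []
    for row, w in zip(basis, omega):
        a = sum(s * xj for s, xj in zip(row, x))
        a_dot = sum(s * pj for s, pj in zip(row, p))
        out.append(0.5 * (a_dot ** 2 + (w * a) ** 2))
    return out

def initial_state(n, energy):
    """FPU-type datum: harmonic energy `energy` in mode 1, chain at rest."""
    basis, omega = mode_basis(n)
    amp = math.sqrt(2.0 * energy) / omega[0]
    return [amp * s for s in basis[0]], [0.0] * n

def run_fpu(n, energy, alpha, beta, dt, steps):
    """Integrate with velocity Verlet; return final state and time-averaged mode energies."""
    basis, omega = mode_basis(n)
    x, p = initial_state(n, energy)
    f = forces(x, alpha, beta)
    avg = [0.0] * n
    for _ in range(steps):
        p = [pj + 0.5 * dt * fj for pj, fj in zip(p, f)]
        x = [xj + dt * pj for xj, pj in zip(x, p)]
        f = forces(x, alpha, beta)
        p = [pj + 0.5 * dt * fj for pj, fj in zip(p, f)]
        for k, e in enumerate(mode_energies(x, p, basis, omega)):
            avg[k] += e / steps
    return x, p, avg
```

Calling, say, `run_fpu(8, 0.1, 0.25, 0.0, 0.05, 2000)` returns the time-averaged mode energies, which can be inspected for the formation of a packet of low-frequency modes. Reproducing the timescales discussed in the text would require far longer runs and larger $N$ than such a toy call.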
Boltzmann and Nernst Revisited
All the results illustrated above refer to initial data of FPU type, namely with an excitation of a few low-frequency modes. However, from the point of view of statistical mechanics, such initial data are exceptional, and one should rather consider initial data extracted from the Gibbs distribution at a certain temperature. One can then couple the FPU system to a heat bath at a slightly different temperature, and look at the spectrum of the FPU system after a certain time. The result, for the case of a heat bath at a higher temperature, is shown in Figure 8. Clearly, here one has a situation similar to that occurring for initial data of FPU type, because only a packet of low-frequency modes exhibits a reaction, each of its modes actually adapting itself to the temperature of the bath, whereas the high-frequency modes do not react at all, that is, remain essentially frozen.

Figure 8 A case of an FPU system initially at equilibrium and thus in equipartition. Spectrum (time-averaged mode energy versus mode number) of the FPU system after it was kept in contact with a heat reservoir at a higher temperature.

This capability of reacting to external disturbances (which seems to pertain only to a fraction of the mechanical energy initially inserted into the system) can be characterized in a quantitative way through an estimate of the fluctuations of the total energy of the FPU system. This is indeed the sense of the fluctuation–dissipation theorem, the precursor of which is perhaps the contribution of Einstein to the first Solvay conference (1911). Through such a method, the specific heat of the FPU system is estimated (apart from a numerical factor) by the time average of $[E(t) - E(0)]^2$, where $E(t)$ is the energy, at time $t$, of the FPU system in dynamical contact with a heat bath (at the same temperature from which the initial data are extracted). Usually, in the spirit of ergodic theory, one looks at the infinite-time limit of such a quantity. But in the spirit of the metastable picture described above, one can check whether the time average first stabilizes at some value smaller than the one predicted at equilibrium. Such a result, which is in qualitative agreement with the third principle, has indeed been obtained recently (by Carati and Galgani).

In conclusion, in situations of metaequilibrium such as those existing in the FPU model at low temperatures, a thermodynamics can still be formulated. Indeed, by virtue of the quick relaxation process described above, the time averages of the relevant quantities are found to stabilize in rather short times. In this way, one overcomes Poincaré's critique of Jeans, namely that one cannot have a thermodynamics at all if reference is made only to the existence of
extremely long relaxation times to the final equilibrium. A relaxation to a “provisional equilibrium” within a “first-order very large time” (to quote Poincaré) is required. The difference with respect to the standard equilibrium thermodynamics now lies in the mechanical interpretation of the first principle. Indeed, the internal thermodynamic energy is identified not with the whole mechanical energy, but just with that fraction of it which is capable of reacting in short times to the external perturbations. This is the way in which the old idea of Boltzmann (and Jeans) might perhaps be presently implemented.

As for the fraction of the mechanical energy which is not included in the thermodynamic internal energy, as not being able to react in relatively short times, this should somehow play the role of a zero-point energy. This was suggested in the year 1971 by C Cercignani. But in fact, such a concept was put forward by Nernst himself in an extremely speculative work in 1916, where he also advanced the concept that, for a system of oscillators of a given frequency, there should exist both dynamically ordered (geordnete) and dynamically chaotic (ungeordnete) motions, the latter being prevalent above a certain energy threshold. According to him, this fact should be relevant for a dynamical understanding of the third principle and of Planck's law.

It is well known that the modern theory of dynamical systems has led to familiarity with the (sometimes abused) notions of order and chaos and of a transition between them. One might say that the FPU work just forced the scientific community to take into account such notions in connection with the principle of equipartition of energy. It is really fascinating to see that the same notions, with the same terminology, had already been introduced much earlier on purely thermodynamic grounds, in connection with the relations between classical and quantum mechanics.
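The fluctuation estimate mentioned above (the time average of $[E(t) - E(0)]^2$) is simple to compute once an energy time series is available. The sketch below is only illustrative: it assumes equally spaced samples of $E(t)$ supplied by some simulation of the system coupled to a bath, and the function name `running_fluctuation` is hypothetical. The numerical factor relating this quantity to the specific heat is not included.

```python
def running_fluctuation(energies):
    """Running time average of [E(t) - E(0)]^2 over equally spaced samples.

    energies[0] is E(0); entry n of the result averages over the first n + 1 samples.
    """
    e0 = energies[0]
    total = 0.0
    averages = []
    for count, e in enumerate(energies, start=1):
        total += (e - e0) ** 2
        averages.append(total / count)
    return averages
```

Plotting the returned list against time is one way to look for the provisional stabilization discussed in the text: a plateau at a value below the equilibrium prediction, possibly followed by a much slower drift.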
See also: Boundary Control Method and Inverse Problems of Wave Propagation; Central Manifolds, Normal Forms; Ergodic Theory; Fourier Law; Gravitational N-body Problem (Classical); Newtonian Fluids and Thermohydraulics; Nonequilibrium Statistical Mechanics: Interaction between Theory and Numerical Simulations; Quantum Statistical Mechanics: Overview; Regularization for Dynamical Zeta Functions; Stability Theory and KAM; Toda Lattices; Weakly Coupled Oscillators.
Further Reading
Benettin G, Galgani L, and Giorgilli A (1984) Boltzmann's ultraviolet cutoff and Nekhoroshev's theorem on Arnol'd diffusion. Nature 311: 444–445.
Bocchieri P, Scotti A, Bearzi B, and Loinger A (1970) Anharmonic chain with Lennard-Jones interaction. Physical Review A 2: 2013–2019.
Boltzmann L (1966) Lectures on Gas Theory (translated by Brush SG, Vol. II, Sects. 43–45). Berkeley: The University of California Press.
Fermi E (1923) Beweis, dass ein mechanisches Normalsystem im allgemeinen quasi-ergodisch ist. In: Fermi E (ed.) (1965) Note e Memorie (Collected Papers), no. 11, vol. I, pp. 79–87. Roma: Accademia Nazionale dei Lincei.
Fermi E, Pasta J, and Ulam S (1955) Studies of nonlinear problems. In: Fermi E (ed.) (1965) Note e Memorie (Collected Papers), no. 266, vol. II, pp. 977–988. Roma: Accademia Nazionale dei Lincei.
Fucito F, Marchesoni F, Marinari E, Parisi G, Peliti L et al. (1982) Approach to equilibrium in a chain of nonlinear oscillators. Journal de Physique 43: 707–713.
Galgani L and Scotti A (1972) Recent progress in classical nonlinear dynamics. Rivista del Nuovo Cimento 2: 189–209.
Izrailev FM and Chirikov BV (1966) Statistical properties of a nonlinear string. Soviet Physics Doklady 11: 30–32.
Nernst W (1916) Über einen Versuch, von quantentheoretischen Betrachtungen zur Annahme stetiger Energieänderungen zurückzukehren. Verhandlungen der Deutschen Physikalischen Gesellschaft 18: 83.
Poincaré H (1906) Réflexions sur la théorie cinétique des gaz. Journal de Physique Théorique et Appliquée 5: 369–403 (in (1954) Oeuvres de Henri Poincaré, Tome IX: 597–619. Paris: Gauthier-Villars).
Ponno A (2003) Soliton theory and the Fermi–Pasta–Ulam problem in the thermodynamic limit. Europhysics Letters 64: 606–612.
Shepelyansky DI (1997) Low-energy chaos in the Fermi–Pasta–Ulam problem. Nonlinearity 10: 1331–1338.
Zabusky NJ and Kruskal MD (1965) Interaction of “solitons” in a collisionless plasma and the recurrence of initial states. Physical Review Letters 15: 240–243.
Dynamical Systems in Mathematical Physics: An Illustration from Water Waves
O Goubet, Université de Picardie Jules Verne, Amiens, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction
The purpose of this article is to describe some basic problems related to the interplay between dynamical systems and mathematical physics. Since it is impossible to be exhaustive in these topics, the focus here is on water-wave models. These mathematical models are described by partial differential equations that can be understood as dynamical systems in a suitable infinite-dimensional phase space. We will not address the original equations for two-dimensional (2D) surface water waves, even if we know that dynamical-system methods can help to exhibit some solitary waves for the equations. The reader is referred to relevant articles in this encyclopedia for details. Another approach is to seek these 2D surface water waves as saddle points for some Hamiltonians, which is also discussed elsewhere in this work. This article presents these arguments on some asymptotical models for the propagation of surface water waves.
Asymptotical Models in Hydrodynamics
To begin with, consider an irrotational fluid in a canal that is governed by the Euler equations and that is subject to gravitational forces. For a canal of finite depth, Boussinesq (1877) and Korteweg and de Vries (KdV) (1895) obtained the following model for unidirectional long waves:

$$u_t + u_x + u_{xxx} + u u_x = 0 \qquad [1]$$

Sometimes we drop the $u_x$ term on the left-hand side of [1], thanks to a suitable change of coordinates. Alternatively, we can also deal with the so-called generalized KdV equation, which reads

$$u_t + u_{xxx} + u^k u_x = 0 \qquad [2]$$

where $k$ is a positive integer. There are also other models designed to represent long waves in shallow water. Let us introduce the regularized long-wave equation (also referred to as the Benjamin–Bona–Mahony equation) that reads

$$u_t - u_{txx} + u_x + u u_x = 0 \qquad [3]$$

or the Camassa–Holm equation

$$u_t - u_{txx} + 3 u u_x = 2 u_x u_{xx} + u u_{xxx} \qquad [4]$$

For deep water, a well-known model was introduced by Zakharov (1968)

$$i u_t + u_{xx} + \varepsilon |u|^2 u = 0 \qquad [5]$$
which describes the slow modulations of wave packets. Here the unknown $u(x, t)$ takes values in $\mathbb{C}$, and this nonlinear Schrödinger equation is in fact a system. In these equations, $\varepsilon$ is either $1$ or $-1$; throughout this article, we shall refer to the former case as the focusing case and to the latter as the defocusing case. We may also substitute $|u|^{2p} u$ in the nonlinear term in [5] to obtain alternate models. The variable $t$ represents the time and the space variable $x$ belongs either to $\mathbb{R}$ or to a finite interval when we are dealing with periodic flows.

The above models are intended to describe the propagation of unidirectional waves. For two-way waves, see Bona et al. (2002). Actually, these equations feature particular solutions, the so-called traveling waves. Let us recall, for instance, that for generalized KdV equation [2] these solutions are

$$u(t, x) = Q_c(x - ct) \qquad [6]$$

$$Q_c(x) = c^{1/p}\, Q(\sqrt{c}\, x) \qquad [7]$$

$$Q(x) = \left( 3\, \mathrm{ch}^{-2}(p x) \right)^{1/p} \qquad [8]$$
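A traveling wave of this kind can be checked numerically by substituting it into the equation. The sketch below does this only for the case $k = 1$ of [2], that is $u_t + u_{xxx} + u u_x = 0$, for which direct substitution gives the profile $3c\,\mathrm{sech}^2(\sqrt{c}\,(x - ct)/2)$; the constants in such closed forms depend on how the equation is scaled, so this checks that particular normalization rather than [8] itself. The helper names and the finite-difference step `h` are illustrative assumptions.

```python
import math

def kdv_soliton(x, t, c):
    """Profile 3c sech^2(sqrt(c) (x - c t) / 2), traveling to the right at speed c."""
    return 3.0 * c / math.cosh(0.5 * math.sqrt(c) * (x - c * t)) ** 2

def kdv_residual(u, x, t, h=2e-3):
    """Central-difference estimate of u_t + u_xxx + u * u_x at the point (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xxx = (u(x + 2 * h, t) - 2 * u(x + h, t)
             + 2 * u(x - h, t) - u(x - 2 * h, t)) / (2 * h ** 3)
    return u_t + u_xxx + u(x, t) * u_x

def max_residual(u, t=0.3):
    """Largest residual magnitude on a small grid of x values in [-5, 5]."""
    return max(abs(kdv_residual(u, 0.1 * j, t)) for j in range(-50, 51))
```

For example, `max_residual(lambda x, t: kdv_soliton(x, t, 1.5))` should be close to zero (limited by the finite-difference error), whereas the same profile moved at the wrong speed leaves a residual of order one.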
These so-called solitons (Figure 1) move to the right without changing their shape; $c$ is the speed of propagation. In real life, this phenomenon was observed by Scott Russell (1834). Riding his horse, he was able to follow for miles the propagation of such a wave on the canal from Edinburgh to Glasgow. On the other hand, Camassa–Holm equations are designed to describe the propagation of peaked solitons as shown in Figure 2. Focusing nonlinear Schrödinger equations also feature solitary waves that read $u_\omega(t, x) = \exp(i\omega t)\, Q(x)$, where $Q$ is solution to

$$Q_{xx} - \omega Q + Q^{2p+1} = 0 \qquad [9]$$

Figure 1 A soliton.

There are numerous examples of equations or systems of equations that model 2D surface water waves. Among all these models, a first issue is to identify the relevant models insofar as the dynamical properties are concerned. Indeed, we address here the question of stability of solitary waves (up to the symmetries of the equation). For instance, the orbital stability for cubic Schrödinger reads: for any $\varepsilon > 0$, there exists a neighborhood $\mathcal{V}$ of $u_\omega(x, 0)$ such that any trajectory starting from $\mathcal{V}$ satisfies

$$\sup_t \inf_\theta \inf_y \| u(t) - \exp(i\theta)\, u_\omega(t, \cdot - y) \|_{H^1} \le \varepsilon \qquad [10]$$

Another issue consists in the interaction of $N$ solitons. Schneider and Wayne (2000) have addressed the issue of the validity of water-wave models when this interaction is concerned. Assume now that the validity of these models is granted. To consider [1] or [5] as a dynamical system, the next issue is then to consider the initial-value problem.

The Initial-Value Problem
Let us supplement these equations with initial data $u_0$ in some Sobolev space. We shall consider either

$$H^s(\mathbb{R}) = \left\{ u;\ \int_{\mathbb{R}} (1 + |\xi|^2)^s\, |\hat u(\xi)|^2\, d\xi < +\infty \right\} \qquad [11]$$

in the case where $x$ belongs to the whole line, or the corresponding Sobolev space with periodic boundary conditions. It should be examined whether these equations provide a continuous flow $S(t): u_0 \to u(t)$ in these functional spaces (at least locally in time). We would like to point out that for each Sobolev space under consideration, we may have a different flow. This fact is at the heart of infinite-dimensional dynamical systems. The initial-value problem was a challenge for decades for low norms, that is, for small $s$. The last breakthrough was performed by Bourgain (1993). Let us present the method for the KdV equation. Consider $U(t)u_0$, the solution of the Airy equation

$$u_t + u_{xxx} = 0; \qquad u(0) = u_0 \qquad [12]$$
Figure 2 Peaked soliton.

Without going into further details, the idea is to apply a fixed-point argument to the Duhamel form of the equation,

$$u(t) = U(t)u_0 - \frac{1}{2} \int_0^t U(t - s)\, \partial_x\!\left(u^2(s)\right) ds \qquad [13]$$
in a suitable mixed-spacetime Banach space whose norm reads $\|U(-t)u(t, x)\|_{H^b_t H^s_x}$. This relies on fine properties in harmonic analysis. Thanks to this method, we know that the Schrödinger equation [5] and the KdV equation [1] are well-posed in, respectively, $H^s(\mathbb{R})$, $s \ge 0$ and $H^s(\mathbb{R})$, $s > -3/4$, locally in time. For the periodic case, the results are slightly different. We would like to point out that both KdV and nonlinear Schrödinger equations provide semigroups $S(t)$ that do not feature smoothing effect. A trajectory that starts from $H^s$ remains in $H^s$; indeed, we can also solve these partial differential equations backward in time.

The next issue is to determine if these flows are defined for all times. Loosely speaking, the following alternative holds true: either the local flow in $H^s$ extends to a global one, or some blow-up phenomenon occurs, that is, $\|S(t)u_0\|_{H^s}$ blows up in finite time. To this end, let us observe that, for instance, the mass $\int_{\mathbb{R}} |u(x)|^2\, dx$ is conserved for both KdV and nonlinear Schrödinger flows. Therefore, one can prove that the solutions in $L^2$ are global in time. It is worthwhile to observe that the Bourgain method also provides some global existence results below the energy norm. Consider now the flow of the solutions in $H^1$. The second invariant for nonlinear Schrödinger equations reads

$$\int_{\mathbb{R}} |u_x(x)|^2\, dx - \frac{\varepsilon}{p+1} \int_{\mathbb{R}} |u(x)|^{2p+2}\, dx \qquad [14]$$

Therefore, the local solutions in $H^1$ extend to global ones in the defocusing case ($\varepsilon = -1$). In the focusing case, the situation is more contrasted. The solution is global if the nonlinearity is less than an $H^1$-critical value ($p = 2$ for Schrödinger, and $k = 4$ for generalized KdV equation). This critical value depends on some Sobolev embeddings as

$$\int_{\mathbb{R}} |u(x)|^{2p+2}\, dx \le C_p\, \|u\|_{L^2}^{p+2}\, \|u_x\|_{L^2}^{p} \qquad [15]$$

Therefore, since the mass is constant, the second invariant controls the $H^1$ norm of the solution if $p < 2$. Note that the critical power of the nonlinearity depends also on the dimension of the space; it is the cubic Schrödinger that is critical in $H^1(\mathbb{R}^2)$. It is well known that, for some initial data, blow-up phenomena can occur for 2D cubic Schrödinger equations. Moreover, the behavior of blow-up solutions is more or less understood. This analysis was performed using the conformal invariance of the equation. For quintic Schrödinger equation, which is critical in 1D, this conformal invariance states that if $u(t, x)$ is solution, then

$$v(t, x) = |t|^{-1/2} \exp\!\left( \frac{i x^2}{4t} \right) u\!\left( -\frac{1}{t}, \frac{x}{t} \right) \qquad [16]$$

is also solution. On the other hand, for the generalized KdV equation, there is no conformal invariance and the blow-up issue had been open for years. There was some numerical evidence that blow-up can occur for $k = 4$. Recently, Martel and Merle (2002) have given a complete description of the blow-up profile for this equation. Their methods are quite complex and rely on an ejection of mass at infinity in a suitable coordinate system. In the discussion so far we have presented some quantities that are invariant by the flow of the solutions. This is related to the Hamiltonian structure of the dynamical systems under consideration.

Hamiltonian Systems in Hydrodynamics
The study of Hamiltonian systems has developed beyond celestial mechanics (the famous n-body problems) to other fields in mathematical physics. We focus here on dynamical systems that read

$$u_t = J\, \frac{\partial}{\partial u} H(u) \qquad [17]$$
where $H$ is the Hamiltonian and $J$ some skew-symmetric operator. For instance, [1] is a Hamiltonian system with $J = -\partial_x$ (i.e., an unbounded skew-symmetric operator) and

$$H(u) = \frac{1}{2} \int \left( u^2 - u_x^2 \right) dx + \frac{1}{6} \int u^3\, dx \qquad [18]$$
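As a consistency check, the following short computation recovers [1] from [17] and [18]. It is a sketch under stated assumptions: $\partial H / \partial u$ is read as the variational derivative $\delta H / \delta u$, $u$ is assumed to decay (or be periodic) so that boundary terms vanish on integration by parts, and the sign convention $J = -\partial_x$ is the one that makes the signs agree with [1] as written.

```latex
\begin{aligned}
\frac{\delta H}{\delta u}
  &= \frac{\delta}{\delta u}\left[ \frac12 \int \left(u^2 - u_x^2\right) dx
     + \frac16 \int u^3\, dx \right]
   = u + u_{xx} + \frac12 u^2, \\[4pt]
u_t = -\partial_x \frac{\delta H}{\delta u}
  &= -u_x - u_{xxx} - u\,u_x
  \quad\Longleftrightarrow\quad
  u_t + u_x + u_{xxx} + u\,u_x = 0 .
\end{aligned}
```

The term $+u_{xx}$ comes from integrating $-\tfrac12 \int u_x^2\,dx$ by parts once, which is where the decay or periodicity assumption is used.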
There is a subclass of Hamiltonian systems that are integrable by inverse-scattering methods. For instance, [1] belongs to this class. Indeed, these methods give a complete description of the asymptotics when $t \to \infty$. It is well known (Deift and Zhou 1993) that, asymptotically, any solution to the KdV equation consists of a wave train moving to the right in the physical space up to a dispersive part moving to the left. On the other hand, a generic Hamiltonian system is not integrable. The study of the asymptotics and of the dynamical properties of such a system deserves another analysis. We say that a system features asymptotic completeness if there exist $u_+$ and $u_-$ such that the solution $u(t)$ of [17] supplemented with initial data $u_0$ satisfies

$$\| u(t) - U(t)u_+ \| \to 0 \qquad [19]$$

$$\| u(t) - U(t)u_- \| \to 0 \qquad [20]$$

when, respectively, $t \to +\infty$ or $t \to -\infty$. Here $U(t)u_0$ is the solution of the free equation, that is, the associated linear equation, supplemented with initial data $u_0$; for instance, the Airy equation is the free equation related to the KdV equation. The operators $u_- \to u_0 \to u_+$ are called wave operators. This is related to Bohr's transition in quantum mechanics. Loosely speaking, we are able to prove these scattering properties for high powers in the nonlinearity for subcritical defocusing Schrödinger equations.

The asymptotics of trajectories can be more complicated. Let us recall that the stability of traveling waves is also an important issue in understanding the dynamical properties of these models. For instance, let us point out that Martel and Merle proved the asymptotic stability of the sum of $N$ solitons for KdV in the subcritical case. Beyond these asymptotics we are interested in the case where the permanent regime is chaotic (or turbulent). A scenario is that there exist quasiperiodic solutions of arbitrary order $N$ for the system under consideration. The next challenge about these Hamiltonian systems is to apply the Kolmogorov–Arnol'd–Moser theory to exhibit this type of solutions to systems like [17]. Here we restrict our discussion to the case of bounded domains, with either periodic or homogeneous Dirichlet conditions. Then, let us introduce the following definition: a solution is quasiperiodic if there exist a finite number $N$ of frequencies $\omega_k$ such that

$$u(t, x) = \sum_{k=1}^{N} u_k(x) \exp(i \omega_k t) \qquad [21]$$
This extends the case of periodic solutions ($N = 1$), which are isomorphic to the torus. To prove the existence of such structures, one idea is then to embed $N$-dimensional invariant tori into the phase space of solutions. One may approximate the infinite-dimensional Hamiltonian by a sequence of finite ones and consider the convergence of iterated symplectic transformations, or one solves directly some nonlinear functional equation. Actually, the difficulty is that resonances can occur. Resonances occur when there are some linear combinations of the frequencies that vanish (or that are arbitrarily close to 0). This introduces a small divisor problem in a phase space that has infinite dimension. To overcome these difficulties, a Nash–Moser scheme can be implemented (Craig 1996).

There are numerous such open problems. For instance, let us observe that known results are essentially only for the case where the dimension of the ambient space is 1. On the other hand, quasiperiodic solutions correspond to $N$-dimensional invariant tori for the flow of solutions; one may seek Lagrangian invariant tori that correspond to the case where $N = +\infty$. Current research is directed towards extending this analysis.

Another issue is to seek invariant measures for these Hamiltonian dynamical systems, as in statistical mechanics. Bourgain was successful in performing this analysis for some nonlinear Schrödinger equations either in the case of periodic boundary conditions or in the whole space. This result is an important step in the ergodic analysis of our Hamiltonian dynamical systems. This could explain the Poincaré recurrence phenomena observed numerically for these types of equations: some particular solutions seem to come back to their initial state after a transient time. This point will not be developed here. All these results are properties of conservative dynamical systems. We now address the case when some dissipation takes place.
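The small divisors mentioned above, integer combinations of the frequencies that vanish or come close to zero, can be explored numerically for a finite set of frequencies. The sketch below is purely illustrative and finite-dimensional (the difficulty described in the text is that the phase space is infinite-dimensional); the function name `smallest_divisor` and the choice of the order $\sum_j |k_j|$ as the cutoff are assumptions made for the example.

```python
import itertools
import math

def smallest_divisor(omega, max_order):
    """Smallest |k . omega| over nonzero integer vectors k with sum |k_j| <= max_order.

    Returns the pair (value, k). A value of 0 signals an exact resonance.
    """
    best, best_k = math.inf, None
    candidates = range(-max_order, max_order + 1)
    for k in itertools.product(candidates, repeat=len(omega)):
        order = sum(abs(kj) for kj in k)
        if order == 0 or order > max_order:
            continue  # skip k = 0 and combinations beyond the cutoff
        value = abs(sum(kj * wj for kj, wj in zip(k, omega)))
        if value < best:
            best, best_k = value, k
    return best, best_k
```

For the frequencies $(1, 2)$ an exact resonance already appears at order 3, while for $(1, \sqrt{2})$ there is no exact resonance but the smallest divisor keeps shrinking as the allowed order grows, which is the near-resonant situation a KAM-type scheme has to control.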
Dissipative Water-Wave Models

To model the effect of viscosity on 2D surface water waves, we go back to a flow governed by the Navier–Stokes equations and we proceed to obtain damped equations (Ott and Sudan 1970, Kakutani and Matsuuchi 1975). In fact, the damping in KdV equations can be either a diffusion term, which leads to the equation
\[ u_t + u_{xxx} + u u_x = \nu u_{xx} \qquad [22] \]
where $\nu$ is a positive number analogous to the viscosity, or a zero-order term $-\nu u$ on the right-hand side of [22]. In the first case, we obtain a KdV–Burgers equation that has some smoothing effect in time. In the second case, we have a zero-order dissipation term. A nonlocal term would be $\mathcal{F}^{-1}(|\xi|^{2\alpha}\hat{u}(\xi))$ for $\alpha \in [0, 1]$, where $\mathcal{F}(u) = \hat{u}$ denotes the Fourier transform of $u$.

A first issue concerning damped water-wave equations is to estimate the decay rate of the solutions towards the equilibrium (no decay) when $t \to +\infty$. For [22] the ultimate result is that, for initial data $u_0 \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, the $L^2$ norm of the solution decays like $t^{-1/4}$ (Amick et al. 1989). Energy methods have been developed to handle these problems, such as Schonbek's splitting method.
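The three kinds of damping mentioned above can be compared at the linear level. The sketch below is a simplification under stated assumptions: the nonlinear term $u u_x$ is dropped, the dissipative term is taken to be $-\nu\,\mathcal{F}^{-1}(|\xi|^{2\alpha}\hat{u})$ on the right-hand side (a sign convention assumed here so that $\alpha = 1$ reproduces $\nu u_{xx}$ and $\alpha = 0$ reproduces $-\nu u$), and the name `damping_factor` is hypothetical. Under these assumptions each Fourier mode evolves as $\hat{u}(\xi, t) = e^{(i\xi^3 - \nu|\xi|^{2\alpha})t}\,\hat{u}_0(\xi)$, so its modulus is multiplied by $e^{-\nu|\xi|^{2\alpha}t}$.

```python
from math import exp


def damping_factor(xi, t, nu, alpha):
    """Modulus of the linear Fourier multiplier exp(-nu * |xi|**(2*alpha) * t).

    alpha = 1.0 corresponds to the diffusion term nu * u_xx,
    alpha = 0.0 to the zero-order term -nu * u (assumed sign convention).
    """
    return exp(-nu * abs(xi) ** (2.0 * alpha) * t)


# Illustrative values only: frequency xi = 2, time t = 1, nu = 0.1.
diffusive = damping_factor(2.0, 1.0, 0.1, 1.0)    # exp(-0.4)
zero_order = damping_factor(2.0, 1.0, 0.1, 0.0)   # exp(-0.1)
mean_mode = damping_factor(0.0, 1.0, 0.1, 1.0)    # xi = 0 is not damped by diffusion
```

With diffusion, low frequencies are barely damped, which is consistent with the merely algebraic decay quoted for [22]; with zero-order damping every mode decays at the same exponential rate.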
The center manifold theory is another approach that is employed in dynamical systems. The aim is to prove the existence of a finite-dimensional manifold that is invariant (in a neighborhood of the origin) under the flow of the solutions and that attracts the other trajectories with high speed. Therefore, this manifold, and the trajectories therein, monitor the decay rate of the solutions towards the origin. The construction of such a manifold relies on splitting properties of the spectrum of the associated linearized operator (Gallay and Wayne 2002). Using a suitable change of variables (that moves the continuous spectrum away from the origin), Gallay and Wayne were able to construct such a manifold in an infinite-dimensional phase space.

Another issue is the understanding of the dynamics for damped–forced water-wave equations such as
\[ u_t + u u_x + u_{xxx} + \nu u = f(x) \qquad [23] \]
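A formal energy estimate shows why an equation of the form [23] is dissipative. The sketch below assumes a smooth solution that is periodic or decays at infinity, so that boundary terms vanish, and a forcing $f \in L^2$; these assumptions, and the use of $\|\cdot\|$ for the $L^2$ norm, are made here for illustration.

```latex
% Multiply [23] by u and integrate in x. The terms u^2 u_x = (u^3/3)_x and
% u u_{xxx} = (u u_{xx} - u_x^2/2)_x are exact derivatives and integrate to zero.
\begin{align*}
\frac{1}{2}\frac{d}{dt}\|u\|^2 + \nu \|u\|^2
  &= \int f u \, dx
   \le \|f\|\,\|u\|
   \le \frac{\nu}{2}\|u\|^2 + \frac{1}{2\nu}\|f\|^2 , \\
\frac{d}{dt}\|u\|^2 + \nu \|u\|^2
  &\le \frac{1}{\nu}\|f\|^2 , \\
\|u(t)\|^2
  &\le e^{-\nu t}\|u_0\|^2 + \frac{1 - e^{-\nu t}}{\nu^2}\,\|f\|^2 .
\end{align*}
```

Every trajectory therefore ends up in a ball of radius slightly larger than $\|f\|/\nu$ in $L^2$, which is the kind of bounded absorbing set on which attractor theory builds.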
The dynamical system approach is the attractor theory (Temam 1997). Equations such as [23] provide dissipative semigroups $S(t)$ in some energy spaces. The theory has developed for years and we know that these dynamical systems feature global attractors. A global attractor is a compact subset of the energy space under consideration which is invariant under the flow of the solutions and attracts all the trajectories when $t \to +\infty$. Moreover, if we deal with periodic boundary conditions, this global attractor has finite fractal (or Hausdorff) dimension. This dimension depends on the data $\nu$ and $f$.

Actually, eqn [23] provides semigroups in $L^2(\mathbb{R})$, $H^1(\mathbb{R})$, or $H^2(\mathbb{R})$. These three dynamical systems feature global attractors $A_0$, $A_1$, $A_2$. From the viewpoint of physics, the attractors describe the permanent regime of the flow. One may wonder if this permanent regime depends on the space chosen for the mathematical study. The latest result on this issue establishes that $A_0 = A_1 = A_2$. This property is equivalent to proving the asymptotic smoothing effect for the associated semigroup: even if $S(t)$ is not a smoothing operator for finite $t$, all solutions converge to a smooth set when $t$ goes to infinity.

All these results are for subcritical nonlinearities. As already noted, dissipation provides smoothing at infinity. Nevertheless, damping does not prevent blow-up. Let us illustrate this by the following result due to Tsutsumi (1984). The damped Schrödinger equation
\[ i u_t + i\nu u + u_{xx} + |u|^{2p} u = 0 \qquad [24] \]
features blow-up solutions in $H^1(\mathbb{R})$ for $p > 2$, even if all solutions are damped in $L^2(\mathbb{R})$ with exponential speed.

This completes the discussion of damped–forced water-wave equations. We now consider equations that are forced with a random forcing term.
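The exponential damping in $L^2$ stated for [24] can be checked by a short formal computation. It is a sketch assuming a smooth, spatially decaying solution on its interval of existence; $\|\cdot\|$ denotes the $L^2(\mathbb{R})$ norm.

```latex
% Multiply [24] by \bar{u}, integrate in x, and take the imaginary part.
% The terms \int u_{xx}\bar{u}\,dx = -\|u_x\|^2 and \int |u|^{2p+2}\,dx are real.
\begin{align*}
\operatorname{Im}\Big( i\int u_t \bar{u}\,dx + i\nu \int |u|^2\,dx \Big)
  &= \frac{1}{2}\frac{d}{dt}\|u\|^2 + \nu\|u\|^2 = 0 , \\
\|u(t)\| &= e^{-\nu t}\,\|u_0\| .
\end{align*}
```

The $L^2$ norm thus decays at rate $\nu$ for as long as the solution exists, which does not control the $H^1$ norm and so is compatible with blow-up in $H^1(\mathbb{R})$.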
Stochastic Water-Wave Models

During the modeling process that led to KdV or Schrödinger equations from the Euler equations, we have neglected some low-order terms. We now model these terms by a noise and we are led to a new randomly forced dynamical system that reads
\[ u_t + u_x + u_{xxx} + u u_x = \varepsilon \dot{\xi} \qquad [25] \]
Here one may assume that $\xi(x, t)$ is a Gaussian process with correlations
\[ E\big(\dot{\xi}(x,t)\,\dot{\xi}(y,s)\big) = \delta_{x-y}\,\delta_{t-s} \qquad [26] \]
that is, a spacetime white noise. The parameter $\varepsilon$ is the amplitude of the process. Unfortunately, due to the lack of smoothing effect of KdV or Schrödinger equations, it is more convenient to work with a noise that is correlated in space, satisfying
\[ E\big(\dot{\xi}(x,t)\,\dot{\xi}(y,s)\big) = c(x-y)\,\delta_{t-s} \qquad [27] \]
Here $c(x-y)$ is some smooth ansatz for $\delta_{x-y}$, defined from some Hilbert–Schmidt kernel $K$ as
\[ c(x-y) = \int_{\mathbb{R}} K(x,z)\,K(y,z)\,dz \]
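The construction of a spatial correlation from a kernel can be illustrated numerically. This is a sketch under an assumption made purely for illustration: the kernel is taken to be the Gaussian $K(x,z) = e^{-(x-z)^2}$, for which the integral has the closed form $\sqrt{\pi/2}\,e^{-(x-y)^2/2}$. The names `kernel`, `correlation` and `closed_form` are hypothetical.

```python
from math import exp, pi, sqrt


def kernel(x, z):
    """Assumed smooth kernel K(x, z); a Gaussian chosen only for illustration."""
    return exp(-(x - z) ** 2)


def correlation(x, y, half_width=10.0, n=4000):
    """Midpoint-rule approximation of the integral of K(x, z) * K(y, z) over z,
    truncated to the interval [-half_width, half_width]."""
    dz = 2.0 * half_width / n
    total = 0.0
    for j in range(n):
        z = -half_width + (j + 0.5) * dz
        total += kernel(x, z) * kernel(y, z)
    return total * dz


def closed_form(x, y):
    """Exact value for the Gaussian kernel: sqrt(pi/2) * exp(-(x - y)**2 / 2)."""
    return sqrt(pi / 2.0) * exp(-0.5 * (x - y) ** 2)


numeric = correlation(0.3, -0.2)
shifted = correlation(1.3, 0.8)  # same separation x - y = 0.5, shifted by 1
```

For this translation-invariant kernel the result depends only on $x - y$ and is a smooth bump concentrated near $x = y$, that is, a regularized version of $\delta_{x-y}$.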
We also consider random perturbations of the focusing Schrödinger equation, which reads either
\[ u_t + i u_{xx} + i |u|^{2p} u = u\,\dot{\xi} \qquad [28] \]
(which represents a multiplicative noise) or
\[ u_t + i u_{xx} + i |u|^{2p} u = i\,\dot{\xi} \qquad [29] \]
(which is an additive noise). In the former case, the noise acts as a potential, while in the latter case it represents a forcing term. These equations also model the propagation of waves in an inhomogeneous medium. Research is in progress to study these stochastic dynamical systems. To begin with, the theory of the
initial-value problem has to be established in this new context (see, e.g., de Bouard and Debussche (2003)). One challenge is to understand the effect of noise on dynamical properties of the particular solutions described above, for instance, the solitary waves for the Schrödinger equation, either in the subcritical case $p < 2$ or in the critical case $p = 2$ and beyond.

Results obtained both theoretically and numerically on the influence of the noise on blow-up phenomena (random process) for generalized Schrödinger equations are likely almost-sure results. On the one hand, if the noise is additive and the power supercritical, p > 1, there is some numerical evidence that a spacetime white noise can delay or even prevent the blow-up. On the other hand, if the noise is not so irregular (as for the spatially correlated noise described above), it seems that any solution blows up in finite time. De Bouard and Debussche have proved that for either an additive or a multiplicative noise, any smooth and spatially localized initial data give rise to a trajectory that collapses in arbitrarily small time with positive probability. This contrasts with the deterministic case, where only particular initial data could lead to blow-up trajectories. In effect, the noise forces any trajectory to pass through this blow-up region with positive probability.
Acknowledgment

The author acknowledges Anne de Bouard and Vicentiu Radulescu for helpful comments and remarks.

See also: Bifurcations in Fluid Dynamics; Bifurcation Theory; Breaking Water Waves; Cellular Automata; Central Manifolds, Normal Forms; Dissipative Dynamical Systems of Infinite Dimension; Fractal Dimensions in Dynamics; Hamilton–Jacobi Equations and Dynamical Systems: Variational Aspects; Metastable States; Newtonian Fluids and Thermohydraulics; Nonlinear Schrödinger Equations; Quantum Calogero–Moser Systems; Random Dynamical Systems; Scattering, Asymptotic Completeness and Bound States; Stability Problems in Celestial Mechanics; Stochastic Resonance; Symmetry and Symplectic Reduction.
Further Reading

Bona JL, Chen M, and Saut J-C (2002) Boussinesq equations and other systems for small-amplitude long waves in nonlinear dispersive media. I. Derivation and linear theory. Journal of Nonlinear Science 12(4): 283–318.
de Bouard A and Debussche A (2003) The stochastic nonlinear Schrödinger equation in $H^1$. Stochastic Analysis and Applications 21(1): 97–126.
Bourgain J (1999) Global solutions of nonlinear Schrödinger equations. American Mathematical Society Colloquium Publications, 46. Providence, RI: American Mathematical Society.
Buffoni B, Séré É, and Toland JF (2003) Surface water waves as saddle points of the energy. Calculus of Variations and Partial Differential Equations 17(2): 199–220.
Craig W (1996) KAM theory in infinite dimensions. Dynamical systems and probabilistic methods in partial differential equations (Berkeley, CA, 1994), 31–46. Lectures in Applied Mathematics, 31. Providence, RI: American Mathematical Society.
Dangelmayr G, Fiedler B, Kirchgässner K, and Mielke A (1996) Dynamics of nonlinear waves in dissipative systems: reduction, bifurcation and stability. With a contribution by G. Raugel. Pitman Research Notes in Mathematics Series, 352. Harlow: Longman.
Dias F and Iooss G (2003) Water-waves as a spatial dynamical system. In: Friedlander S and Serre D (eds.) Handbook of Mathematical Fluid Dynamics, chap 10, pp. 443–499. Elsevier.
Gallay T and Wayne CE (2002) Invariant manifolds and the long-time asymptotics of the Navier–Stokes and vorticity equations on $\mathbb{R}^2$. Archive for Rational Mechanics and Analysis 163(3): 209–258.
Martel Y and Merle F (2002) Stability of blow-up profile and lower bounds for blow-up rate for the critical generalized KdV equation. Annals of Mathematics (2) 155(1): 235–280.
Schneider G and Wayne CE (2000) The long-wave limit for the water wave problem. I. The case of zero surface tension. Communications on Pure and Applied Mathematics 53(12): 1475–1535.
Sulem C and Sulem P-L (1999) The nonlinear Schrödinger equation. Self-focusing and wave collapse. Applied Mathematical Sciences, 139. New York: Springer-Verlag.
Temam R (1997) Infinite-dimensional dynamical systems in mechanics and physics. Second edition. Applied Mathematical Sciences, 68. New York: Springer-Verlag.
Tsutsumi M (1984) Nonexistence of global solutions to the Cauchy problem for the damped nonlinear Schrödinger equations. SIAM Journal on Mathematical Analysis 15(2): 357–366.
Whitham GB (1999) Linear and nonlinear waves. Pure and Applied Mathematics. New York: Wiley.
E

Effective Field Theories
G Ecker, Universität Wien, Vienna, Austria
© 2006 Elsevier Ltd. All rights reserved.
Introduction

Effective field theories (EFTs) are the counterpart of the "theory of everything." They are the field theoretical implementation of the quantum ladder: heavy degrees of freedom need not be included among the quantum fields of an EFT for a description of low-energy phenomena. For example, we do not need quantum gravity to understand the hydrogen atom nor does chemistry depend upon the structure of the electromagnetic interaction of quarks.

EFTs are approximations by their very nature. Once the relevant degrees of freedom for the problem at hand have been established, the corresponding EFT is usually treated perturbatively. It does not make much sense to search for an exact solution of the Fermi theory of weak interactions. In the same spirit, convergence of the perturbative expansion in the mathematical sense is not an issue. The asymptotic nature of the expansion becomes apparent once the accuracy is reached where effects of the underlying "fundamental" theory cannot be neglected any longer. The range of applicability of the perturbative expansion depends on the separation of energy scales that define the EFT.

EFTs pervade much of modern physics. The effective nature of the description is evident in atomic and condensed matter physics. The present article will be restricted to particle physics, where EFTs have become important tools during the last 25 years.
Classification of EFTs

A first classification of EFTs is based on the structure of the transition from the "fundamental" (energies $> \Lambda$) to the "effective" level (energies $< \Lambda$).

1. Complete decoupling. The fundamental theory contains heavy and light degrees of freedom. Under very general conditions (decoupling theorem, Appelquist and Carazzone 1975) the effective Lagrangian for energies $\ll \Lambda$, depending only on light fields, takes the form
\[ \mathcal{L}_{\rm eff} = \mathcal{L}_{d \le 4} + \sum_{d > 4} \frac{1}{\Lambda^{d-4}} \sum_{i_d} g_{i_d} O_{i_d} \qquad [1] \]
The heavy fields with masses $> \Lambda$ have been "integrated out" completely. $\mathcal{L}_{d \le 4}$ contains the potentially renormalizable terms with operator dimension $d \le 4$ (in natural mass units where Bose and Fermi fields have $d = 1$ and $3/2$, respectively), the $g_{i_d}$ are coupling constants and the $O_{i_d}$ are monomials in the light fields with operator dimension $d$. In a slightly misleading notation, $\mathcal{L}_{d \le 4}$ consists of relevant and marginal operators, whereas the $O_{i_d}$ ($d > 4$) are denoted irrelevant operators. The scale $\Lambda$ can be the mass of a heavy field (e.g., $M_W$ in the Fermi theory of weak interactions) or it reflects the short-distance structure in a more indirect way.

2. Partial decoupling. In contrast to the previous case, the heavy fields do not disappear completely from the EFT but only their high-momentum modes are integrated out. The main area of application is the physics of heavy quarks. The procedure involves one or several field redefinitions introducing a frame dependence. Lorentz invariance is not manifest but implies relations between coupling constants of the EFT (reparametrization invariance).

3. Spontaneous symmetry breaking. The transition from the fundamental to the effective level occurs via a phase transition due to spontaneous symmetry breaking generating (pseudo-)Goldstone bosons. A spontaneously broken symmetry relates processes with different numbers of Goldstone bosons. Therefore, the distinction between renormalizable ($d \le 4$) and nonrenormalizable ($d > 4$) parts in the effective Lagrangian [1] becomes meaningless. The effective Lagrangian of type 3 is generically nonrenormalizable. Nevertheless, such Lagrangians define perfectly consistent quantum field theories at sufficiently low energies. Instead of the operator dimension as in [1], the number of derivatives of the fields and the number of symmetry-breaking
insertions distinguish successive terms in the Lagrangian. The general structure of effective Lagrangians with spontaneously broken symmetries is largely independent of the specific physical realization (universality). There are many examples in condensed matter physics, but the two main applications in particle physics are electroweak symmetry breaking and chiral perturbation theory (both discussed later) with the spontaneously broken global chiral symmetry of quantum chromodynamics (QCD).

Another classification of EFTs is related to the status of their coupling constants.

A. Coupling constants can be determined by matching the EFT with the underlying theory at short distances. The underlying theory is known and Green functions can be calculated perturbatively at energies $\sim \Lambda$ both in the fundamental and in the effective theory. Identifying a minimal set of Green functions fixes the coupling constants $g_{i_d}$ in eqn [1] at the scale $\Lambda$. Renormalization group equations can then be used to run the couplings down to lower scales. The nonrenormalizable terms in the Lagrangian [1] can be fully included in the perturbative analysis.

B. Coupling constants are constrained by symmetries only.
• The underlying theory and therefore also the EFT coupling constants are unknown. This is the case of the standard model (SM) (see the next section). A perturbative analysis beyond leading order only makes sense for the known renormalizable part $\mathcal{L}_{d \le 4}$. The nonrenormalizable terms suppressed by powers of $\Lambda$ are considered at tree level only. The associated coupling constants $g_{i_d}$ serve as bookmarks for new physics. Usually, but not always (cf., e.g., the subsection "Noncommutative spacetime"), the symmetries of $\mathcal{L}_{d \le 4}$ are assumed to constrain the couplings.
• The matching cannot be performed in perturbation theory even though the underlying theory is known. This is the generic situation for EFTs of type 3 involving spontaneous symmetry breaking. The prime example is chiral perturbation theory as the EFT of QCD at low energies.
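The power counting behind eqn [1] can be summarized in a few lines. This is a rough sketch under explicit assumptions: couplings $g_{i_d}$ of order 1, a single energy scale $E$ for the process, and a hypothetical helper name `suppression`. A dimension-$d$ operator then contributes with relative size $(E/\Lambda)^{d-4}$.

```python
def suppression(energy, scale, dim):
    """Naive relative size (E / Lambda)**(d - 4) of a dimension-d operator
    in an effective Lagrangian of the form [1], assuming an O(1) coupling."""
    if dim <= 4:
        raise ValueError("eqn [1] suppresses only operators with d > 4")
    return (energy / scale) ** (dim - 4)


# Illustrative numbers (assumptions): a process at E = 5 GeV described by an
# EFT whose heavy scale is Lambda = 80 GeV, roughly the W mass.
dim6 = suppression(5.0, 80.0, 6)   # (1/16)**2
dim8 = suppression(5.0, 80.0, 8)   # (1/16)**4
```

Each additional pair of operator dimensions costs another factor $(E/\Lambda)^2$, which is why a clear separation of scales lets the expansion be truncated after the first few terms.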
The SM as an EFT

With the possible exception of the scalar sector to be discussed in the subsection "Electroweak symmetry breaking" the SM is very likely the renormalizable part of an EFT of type 1B. Except for nonzero neutrino masses, the SM Lagrangian $\mathcal{L}_{d \le 4}$ in [1] accounts for physics up to energies of roughly the Fermi scale $G_F^{-1/2} \simeq 300$ GeV.

Since the SM works exceedingly well up to the Fermi scale where the electroweak gauge symmetry is spontaneously broken it is natural to assume that the operators $O_{i_d}$ with $d > 4$, made up from fields representing the known degrees of freedom and including a single Higgs doublet in the SM proper, should be gauge invariant with respect to the full SM gauge group $SU(3)_c \times SU(2)_L \times U(1)_Y$. An almost obvious constraint is Lorentz invariance that will be lifted in the next subsection, however.

These requirements limit the Lagrangian with operator dimension $d = 5$ to a single term (except for generation multiplicity), consisting only of a left-handed lepton doublet $L_L$ and the Higgs doublet $\Phi$:
\[ O_{d=5} = \epsilon_{ij}\,\epsilon_{kl}\, L_{iL}^{\top} C^{-1} L_{kL}\, \Phi_j \Phi_l + \text{h.c.} \qquad [2] \]
This term violates lepton number and generates nonzero Majorana neutrino masses. For a neutrino mass of 1 eV, the scale $\Lambda$ would have to be of the order of $10^{13}$ GeV if the associated coupling constant in the EFT Lagrangian [1] is of order 1.

In contrast to the simplicity for $d = 5$, the list of gauge-invariant operators with $d = 6$ is enormous. Among them are operators violating baryon or lepton number that must be associated with a scale much larger than 1 TeV. To explore the territory close to present energies, it therefore makes sense to impose baryon and lepton number conservation on the operators with $d = 6$. Those operators have all been classified (Buchmüller and Wyler 1986) and the number of independent terms is of the order of 80. They can be grouped in three classes.

The first class consists of gauge and Higgs fields only. The corresponding EFT Lagrangian has been used to parametrize new physics in the gauge sector constrained by precision data from LEP. The second class consists of operators bilinear in fermion fields, with additional gauge and Higgs fields to generate $d = 6$. Finally, there are four-fermion operators without other fields or derivatives. Some of the operators in the last two groups are also constrained by precision experiments, with a certain hierarchy of limits. For lepton and/or quark flavor conserving terms, the best limits on $\Lambda$ are in the few TeV range, whereas the absence of neutral flavor changing processes yields lower bounds on $\Lambda$ that are several orders of magnitude larger. If there is new physics in the TeV range, flavor changing neutral transitions must be strongly suppressed, a powerful constraint on model building.

It is amazing that the most general renormalizable Lagrangian with the given particle content accounts
for almost all experimental results in such an impressive manner. Finally, we recall that many of the operators of dimension 6 are also generated in the SM via radiative corrections. A necessary condition for detecting evidence for new physics is therefore that the theoretical accuracy of radiative corrections matches or surpasses the experimental precision.

Noncommutative Spacetime
Noncommutative geometry arises in some string theories and may be expected on general grounds when incorporating gravity into a quantum field theory framework. The natural scale of noncommutative geometry would be the Planck scale in this case without observable consequences at presently accessible energies. However, as in theories with large extra dimensions the characteristic scale $\Lambda_{NC}$ could be significantly smaller. In parallel to theoretical developments to define consistent noncommutative quantum field theories (short for quantum field theories on noncommutative spacetime), a number of phenomenological investigations have been performed to put lower bounds on $\Lambda_{NC}$.

Noncommutative geometry is a deformation of ordinary spacetime where the coordinates, represented by Hermitian operators $\hat{x}^\mu$, do not commute:
\[ [\hat{x}^\mu, \hat{x}^\nu] = i\,\theta^{\mu\nu} \qquad [3] \]
The antisymmetric real tensor $\theta^{\mu\nu}$ has dimensions length$^2$ and it can be interpreted as parametrizing the resolution with which spacetime can be probed. In practically all applications, $\theta^{\mu\nu}$ has been assumed to be a constant tensor and we may associate an energy scale $\Lambda_{NC}$ with its nonzero entries:
\[ \theta^{\mu\nu} \sim \Lambda_{NC}^{-2} \qquad [4] \]
There is to date no unique form for the noncommutative extension of the SM. Nevertheless, possible observable effects of noncommutative geometry have been investigated. Not unexpected from an EFT point of view, for energies $\ll \Lambda_{NC}$, noncommutative field theories are equivalent to ordinary quantum field theories in the presence of nonstandard terms containing $\theta^{\mu\nu}$ (Seiberg–Witten map). Practically all applications have concentrated on effects linear in $\theta^{\mu\nu}$.

Kinetic terms in the Lagrangian are in general unaffected by the noncommutative structure. New effects arise therefore mainly from renormalizable $d = 4$ interaction terms. For example, the Yukawa coupling $g_Y \bar{\psi}\psi\phi$ generates the following interaction linear in $\theta^{\mu\nu}$:
\[ \mathcal{L}^{NC}_{Y} = g_Y\,\theta^{\mu\nu} \left( \partial_\mu\bar{\psi}\,\partial_\nu\psi\,\phi + \partial_\mu\bar{\psi}\,\psi\,\partial_\nu\phi + \bar{\psi}\,\partial_\mu\psi\,\partial_\nu\phi \right) \qquad [5] \]
These interaction terms have operator dimension 6 and they are suppressed by $1/\Lambda_{NC}^{2}$. The major difference to the previous discussion on physics beyond the SM is that there is an intrinsic violation of Lorentz invariance due to the constant tensor $\theta^{\mu\nu}$. In contrast to the previous analysis, the terms with dimension $d > 4$ do not respect the symmetries of the SM.

If $\theta^{\mu\nu}$ is indeed constant over macroscopic distances, many tests of Lorentz invariance can be used to put lower bounds on $\Lambda_{NC}$. Among the exotic effects investigated are modified dispersion relations for particles, decay of high-energy photons, charged particles producing Cerenkov radiation in vacuum, birefringence of radiation, a variable speed of light, etc. A generic signal of noncommutativity is the violation of angular momentum conservation that can be searched for at the Large Hadron Collider (LHC) and at the next linear collider.

Lacking a unique noncommutative extension of the SM, unambiguous lower bounds on $\Lambda_{NC}$ are difficult to establish. However, the range $\Lambda_{NC} \lesssim 10$ TeV is almost certainly excluded. An estimate of the induced electric dipole moment of the electron (noncommutative field theories violate CP in general to first order in $\theta^{\mu\nu}$) yields $\Lambda_{NC} \gtrsim 100$ TeV. On the other hand, if the SM were CP invariant, noncommutative geometry would be able to account for the observed CP violation in $K^0$–$\bar{K}^0$ mixing for $\Lambda_{NC} \sim 2$ TeV.

Electroweak Symmetry Breaking
In the SM, electroweak symmetry breaking is realized in the simplest possible way through renormalizable interactions of a scalar Higgs doublet with gauge bosons and fermions, a gauged version of the linear $\sigma$ model. The EFT version of electroweak symmetry breaking (EWEFT) uses only the experimentally established degrees of freedom in the SM (fermions and gauge bosons). Spontaneous gauge symmetry breaking is realized nonlinearly, without introducing additional scalar degrees of freedom. It is a low-energy expansion where energies and masses are assumed to be small compared to the symmetry-breaking scale. From both perturbative and nonperturbative arguments we know that this scale cannot be much bigger than 1 TeV. The Higgs model can be viewed as a specific example of an EWEFT as long as the Higgs boson is not too light (heavy-Higgs scenario). The lowest-order effective Lagrangian takes the following form:
\[ \mathcal{L}^{(2)}_{EWSB} = \mathcal{L}_B + \mathcal{L}_F \qquad [6] \]
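where $\mathcal{L}_F$ contains the gauge-invariant kinetic terms for quarks and leptons including mass terms. In addition to the kinetic terms for the gauge bosons $W_\mu$, $B_\mu$, the bosonic Lagrangian $\mathcal{L}_B$ contains the characteristic lowest-order term for the would-be-Goldstone bosons:
\[ \mathcal{L}_B = \mathcal{L}^{\rm kin}_{\rm gauge} + \frac{v^2}{4} \langle D_\mu U^\dagger D^\mu U \rangle \qquad [7] \]
with the gauge-covariant derivative
\[ D_\mu U = \partial_\mu U - i g W_\mu U + i g' U \hat{B}_\mu, \qquad W_\mu = \frac{\vec{\tau}}{2}\cdot\vec{W}_\mu, \qquad \hat{B}_\mu = \frac{\tau_3}{2} B_\mu \qquad [8] \]
where $\langle \ldots \rangle$ denotes a (two-dimensional) trace. The matrix field $U(\phi)$ carries the nonlinear representation of the spontaneously broken gauge group and takes the value $U = 1$ in the unitary gauge. The Lagrangian [6] is invariant under local $SU(2)_L \times U(1)_Y$ transformations:
\[ W_\mu \to g_L W_\mu g_L^\dagger + \frac{i}{g}\, g_L \partial_\mu g_L^\dagger, \qquad \hat{B}_\mu \to \hat{B}_\mu + \frac{i}{g'}\, g_R \partial_\mu g_R^\dagger, \]
\[ f_L \to g_L f_L, \qquad f_R \to g_R f_R, \qquad U \to g_L U g_R^\dagger \qquad [9] \]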
with
\[ g_L(x) = \exp\big(i\,\vec{\alpha}_L(x)\cdot\vec{\tau}/2\big), \qquad g_R(x) = \exp\big(i\,\alpha_Y(x)\,\tau_3/2\big) \]
and $f_{L(R)}$ are quark and lepton fields grouped in doublets. As is manifest in the unitary gauge $U = 1$, the lowest-order Lagrangian of the EWEFT just implements the tree-level masses of gauge bosons ($M_W = M_Z \cos\theta_W = v g/2$, $\tan\theta_W = g'/g$) and fermions but does not carry any further information about the underlying mechanism of spontaneous gauge symmetry breaking. This information is first encoded in the couplings $a_i$ of the next-to-leading-order Lagrangian
\[ \mathcal{L}^{(4)}_{EWSB} = \sum_{i=0}^{14} a_i O_i \qquad [10] \]
with monomials $O_i$ of $O(p^4)$ in the low-energy expansion. The Lagrangian [10] is the most general CP and $SU(2)_L \times U(1)_Y$ invariant Lagrangian of $O(p^4)$. Instead of listing the full Lagrangian, we display three typical examples:
\[ O_0 = \frac{v^2}{4} \langle T V_\mu \rangle^2, \qquad O_3 = g \langle W_{\mu\nu} [V^\mu, V^\nu] \rangle, \qquad O_5 = \langle V_\mu V^\mu \rangle^2 \qquad [11] \]
where
\[ T = U \tau_3 U^\dagger, \qquad V_\mu = D_\mu U\, U^\dagger, \qquad W_{\mu\nu} = \frac{i}{g} \big[ \partial_\mu - i g W_\mu,\; \partial_\nu - i g W_\nu \big] \qquad [12] \]
In the unitary gauge, the monomials $O_i$ reduce to polynomials in the gauge fields. The three examples in eqn [11] start with quadratic, cubic, and quartic terms in the gauge fields, respectively. The strongest constraints exist for the coefficients of quadratic contributions from the Large Electron–Positron collider LEP1, less restrictive ones for the cubic self-couplings from LEP2, and none so far for the quartic ones.

Heavy-Quark Physics

EFTs in this section are derived from the SM and they are of type 2A in the classification introduced previously. In a first step, one integrates out W, Z, and top quark. Evolving down from $M_W$ to $m_b$, large logarithms $\alpha_s(m_b) \ln(M_W^2/m_b^2)$ are resummed into the Wilson coefficients. At the scale of the b-quark, QCD is still perturbative, so that at least a part of the amplitudes is calculable in perturbation theory. To separate the calculable part from the rest, the EFTs below perform an expansion in $1/m_Q$, where $m_Q$ is the mass of the heavy quark. Heavy-quark EFTs offer several important advantages.

1. Approximate symmetries that are hidden in full QCD appear in the expansion in $1/m_Q$.
2. Explicit calculations simplify in general, for example, the summing of large logarithms via renormalization group equations.
3. The systematic separation of hard and soft effects for certain matrix elements (factorization) can be achieved much more easily.

Heavy-Quark Effective Theory
Heavy-quark effective theory (HQET) is reminiscent of the Foldy–Wouthuysen transformation (nonrelativistic expansion of the Dirac equation). It is a systematic expansion in $1/m_Q$, when $m_Q \gg \Lambda_{QCD}$, the scale parameter of QCD. It can be applied to processes where the heavy quark remains essentially on shell: its velocity $v$ changes only by small amounts $\sim \Lambda_{QCD}/m_Q$. In the hadron rest frame, the heavy quark is almost at rest and acts as a quasistatic source of gluons. More quantitatively, one writes the heavy-quark momentum as $p^\mu = m_Q v^\mu + k^\mu$, where $v$ is the hadron 4-velocity ($v^2 = 1$) and $k$ is a residual
momentum of $O(\Lambda_{QCD})$. The heavy quark field $Q(x)$ is then decomposed with the help of energy projectors $P_v^{\pm} = (1 \pm \not{v})/2$ and employing a field redefinition:
\[ Q(x) = e^{-i m_Q v\cdot x} \big( h_v(x) + H_v(x) \big), \qquad h_v(x) = e^{i m_Q v\cdot x} P_v^{+} Q(x), \qquad H_v(x) = e^{i m_Q v\cdot x} P_v^{-} Q(x) \qquad [13] \]
In the hadron rest frame, $h_v(x)$ and $H_v(x)$ correspond to the upper and lower components of $Q(x)$, respectively. With this redefinition, the heavy-quark Lagrangian is expressed in terms of a massless field $h_v$ and a "heavy" field $H_v$:
\[ \mathcal{L}_Q = \bar{Q}\,(i \not{D} - m_Q)\,Q = \bar{h}_v\, i v\cdot D\, h_v - \bar{H}_v\,(i v\cdot D + 2 m_Q)\,H_v + \text{mixed terms} \qquad [14] \]
At the semiclassical level, the field $H_v$ can be eliminated by using the QCD field equation $(i \not{D} - m_Q) Q = 0$, yielding the nonlocal expression
\[ \mathcal{L}_Q = \bar{h}_v\, i v\cdot D\, h_v + \bar{h}_v\, i \not{D}_\perp \frac{1}{i v\cdot D + 2 m_Q - i\epsilon}\, i \not{D}_\perp h_v \qquad [15] \]
with $D_\perp^\mu = (g^{\mu\nu} - v^\mu v^\nu) D_\nu$. The field redefinition in [13] ensures that, in the heavy-hadron rest frame, derivatives of $h_v$ give rise to small momenta of $O(\Lambda_{QCD})$ only. The Lagrangian [15] is the starting point for a systematic expansion in $1/m_Q$.

To leading order in $1/m_Q$ ($Q = b, c$), the Lagrangian
\[ \mathcal{L}_{b,c} = \bar{b}_v\, i v\cdot D\, b_v + \bar{c}_v\, i v\cdot D\, c_v \qquad [16] \]
exhibits two important approximate symmetries of HQET: the flavor symmetry $SU(2)_F$ relating heavy quarks moving with the same velocity and the heavy-quark spin symmetry generating an overall $SU(4)$ spin-flavor symmetry. The flavor symmetry is obvious and the spin symmetry is due to the absence of Dirac matrices in [16]: both spin degrees of freedom couple to gluons in the same way. The simplest spin-symmetry doublet consists of a pseudoscalar meson $H$ and the associated vector meson $H^*$. Denoting the doublet by $H$, the matrix elements of the heavy-to-heavy transition current are determined to leading order in $1/m_Q$ by a single form factor, up to Clebsch–Gordan coefficients:
\[ \langle H(v') |\, \bar{h}_{v'}\, \Gamma\, h_v \,| H(v) \rangle \sim \xi(v\cdot v') \qquad [17] \]
$\Gamma$ is an arbitrary combination of Dirac matrices and the form factor $\xi$ is the so-called Isgur–Wise function. Moreover, since $\bar{h}_v \gamma^\mu h_v$ is the Noether current of heavy-flavor symmetry, the Isgur–Wise function is fixed in the no-recoil limit $v' = v$ to be $\xi(v\cdot v' = 1) = 1$. The semileptonic decays $B \to D\, l\, \nu_l$ and $B \to D^{*}\, l\, \nu_l$ are therefore governed by a single normalized form factor to leading order in $1/m_Q$, with important consequences for the determination of the Cabibbo–Kobayashi–Maskawa (CKM) matrix element $V_{cb}$.

The HQET Lagrangian is superficially frame dependent. Since the SM is Lorentz invariant, the HQET Lagrangian must be independent of the choice of the frame vector $v$. Therefore, a shift in $v$ accompanied by corresponding shifts of the fields $h_v$ and of the covariant derivatives must leave the Lagrangian invariant. This reparametrization invariance is unaffected by renormalization and it relates coefficients with different powers in $1/m_Q$.

Soft-Collinear Effective Theory
HQET is not applicable in heavy-quark decays where some of the light particles in the final state have momenta of $O(m_Q)$, for example, for inclusive decays like $B \to X_s \gamma$ or exclusive ones like $B \to \pi\pi$. In recent years, a systematic heavy-quark expansion for heavy-to-light decays has been set up in the form of soft-collinear effective theory (SCET).

SCET is more complicated than HQET because now the low-energy theory involves more than one scale. In the SCET Lagrangian, a light quark or gluon field is represented by several effective fields. In addition to the soft fields $h_v$ in [15], the so-called collinear fields enter that have large energy and carry large momentum in the direction of the light hadrons in the final state.

In addition to the frame vector $v$ of HQET ($v = (1, 0, 0, 0)$ in the heavy-hadron rest frame), SCET introduces a lightlike reference vector $n$ in the direction of the jet of energetic light particles (for inclusive decays), for example, $n = (1, 0, 0, 1)$. All momenta $p$ are decomposed in terms of light-cone coordinates $(p_+, p_-, p_\perp)$ with
\[ p^\mu = \frac{n\cdot p}{2}\,\bar{n}^\mu + \frac{\bar{n}\cdot p}{2}\,n^\mu + p_\perp^\mu = p_+^\mu + p_-^\mu + p_\perp^\mu \qquad [18] \]
where $\bar{n} = 2v - n = (1, 0, 0, -1)$. For large energies, the three light-cone components are widely separated, with $p_- = O(m_Q)$ being large while $p_\perp$ and $p_+$ are small. Introducing a small parameter $\lambda \sim p_\perp/p_-$, the light-cone components of (hard-)collinear particles scale like $(p_+, p_-, p_\perp) = m_Q(\lambda^2, 1, \lambda)$. Thus, there are three different scales in the problem compared to only two in HQET. For exclusive decays, the situation is even more involved.

The SCET Lagrangian is obtained from the full theory by an expansion in powers of $\lambda$. In addition to the heavy quark field $h_v$, one introduces soft as well as collinear quark and gluon fields by field
redefinitions so that the various fields have momentum components that scale appropriately with $\lambda$. Similar to HQET, the leading-order Lagrangian of SCET again exhibits approximate symmetries that can lead to a reduction of the number of form factors describing heavy-to-light decays. As in HQET, reparametrization invariance implements Lorentz invariance and results in stringent constraints on subleading corrections in SCET. An important result of SCET is the proof of factorization theorems to all orders in $\alpha_s$. For inclusive decays, the differential rate is of the form

$$d\Gamma \sim H\, J \otimes S \qquad [19]$$

where $H$ contains the hard corrections. The so-called jet function $J$, sensitive to the collinear region, is convoluted with the shape function $S$ representing the soft contributions. At leading order, the shape function drops out in the ratio of weighted decay spectra for $B \to X_u \ell \nu_\ell$ and $B \to X_s \gamma$, allowing for a determination of the CKM matrix element $V_{ub}$. Factorization theorems have become available for an increasing number of processes, most recently also for exclusive decays of $B$ into two light mesons.

Nonrelativistic QCD

In HQET the kinetic energy of the heavy quark appears as a small correction of $O(\Lambda_{QCD}^2 / m_Q)$. For systems with more than one heavy quark, the kinetic energy cannot be treated as a perturbation in general. For instance, the virial theorem implies that the kinetic energy in quarkonia $Q\bar Q$ is of the same order as the binding energy of the bound state. NRQCD, the EFT for heavy quarkonia, is an extension of HQET. The Lagrangian for NRQCD coincides with HQET in the bilinear sector of the heavy-quark fields but it also includes quartic interactions between quarks and antiquarks. The relevant expansion parameter in this case is the relative velocity between $Q$ and $\bar Q$. In contrast to HQET, there are at least three widely separate scales in heavy quarkonia: in addition to $m_Q$, the relative momentum of the bound quarks $p \sim m_Q v$ with $v \ll 1$ and the typical kinetic energy $E \sim m_Q v^2$. The main challenges are to derive the quark–antiquark potential directly from QCD and to describe quarkonium production and decay at collider experiments. In the abelian case, the corresponding EFT for quantum electrodynamics (QED) is called NRQED, which has been used to study electromagnetically bound systems like the hydrogen atom, positronium, muonium, etc. In NRQCD only the hard degrees of freedom with momenta $\sim m_Q$ are integrated out. Therefore, NRQCD is not enough for a systematic computation of heavy-quarkonium properties. Because the nonrelativistic fluctuations of order $m_Q v$ and $m_Q v^2$ have not been separated, the power counting in NRQCD is ambiguous in higher orders. To overcome those deficiencies, two approaches have been put forward: potential NRQCD (pNRQCD) and velocity NRQCD (vNRQCD). In pNRQCD, a two-step procedure is employed for integrating out quark and gluon degrees of freedom:

$$\begin{array}{lcl} \text{QCD} & \quad & \mu > m_Q \\ \Downarrow & & \\ \text{NRQCD} & \quad & m_Q > \mu > m_Q v \\ \Downarrow & & \\ \text{pNRQCD} & \quad & m_Q v > \mu > m_Q v^2 \end{array}$$

The resulting EFT derives its name from the fact that the four-quark interactions generated in the matching procedure are the potentials that can be used in Schrödinger perturbation theory. It is claimed that pNRQCD can also be used in the nonperturbative domain where $\alpha_s(m_Q v^2)$ is of order 1 or larger. The advantage would be that charmonium also becomes accessible to a systematic EFT analysis. The alternative approach of vNRQCD is only applicable in the fully perturbative regime where $m_Q \gg m_Q v \gg m_Q v^2 \gg \Lambda_{QCD}$ is valid. It separates the different degrees of freedom in a single step, leaving only ultrasoft energies and momenta of $O(m_Q v^2)$ as continuous variables. The separation of larger scales proceeds in a similar fashion as in HQET via field redefinitions. A systematic nonrelativistic power counting in the velocity $v$ is implemented.
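The scale hierarchy $m_Q \gg m_Q v \gg m_Q v^2$ discussed above can be made concrete with a few lines of arithmetic. The following is a minimal sketch only: the quark masses, the squared velocities, and the value of $\Lambda_{QCD}$ are rough illustrative inputs assumed here, not numbers taken from the article, and the helper names `nrqcd_scales` and `scales_are_ordered` are hypothetical.

```python
import math

def nrqcd_scales(m_q, v2):
    """Return the hard, soft and ultrasoft scales (m_Q, m_Q v, m_Q v^2) in GeV."""
    v = math.sqrt(v2)
    return m_q, m_q * v, m_q * v2

def scales_are_ordered(m_q, v2, lambda_qcd):
    """Check the ordering m_Q > m_Q v > m_Q v^2 > Lambda_QCD (ordering only, not '>>')."""
    hard, soft, ultrasoft = nrqcd_scales(m_q, v2)
    return hard > soft > ultrasoft > lambda_qcd

# Illustrative inputs (assumptions, not taken from the article):
# heavy-quark mass in GeV and typical squared relative velocity v^2.
SYSTEMS = {"charmonium": (1.5, 0.3), "bottomonium": (4.8, 0.1)}
LAMBDA_QCD = 0.3  # GeV, assumed

for name, (m_q, v2) in SYSTEMS.items():
    hard, soft, ultrasoft = nrqcd_scales(m_q, v2)
    print(f"{name}: m_Q = {hard:.2f}, m_Q v = {soft:.2f}, "
          f"m_Q v^2 = {ultrasoft:.2f} GeV, "
          f"ordered above Lambda_QCD: {scales_are_ordered(m_q, v2, LAMBDA_QCD)}")
```

With these assumed inputs the ultrasoft scale comes out only modestly above the assumed $\Lambda_{QCD}$, which suggests why the requirement $m_Q v^2 \gg \Lambda_{QCD}$ of the fully perturbative regime is a restrictive one.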
The Standard Model at Low Energies

At energies below 1 GeV, hadrons, rather than quarks and gluons, are the relevant degrees of freedom. Although the strong interactions are highly nonperturbative in the confinement region, Green functions and amplitudes are amenable to a systematic low-energy expansion. The key observation is that the QCD Lagrangian with $N_f = 2$ or 3 light quarks,

$$\begin{aligned} \mathcal L_{QCD} &= \bar q\,(i \not\!\! D - M_q)\, q - \tfrac{1}{4} G_{\mu\nu} G^{\mu\nu} + \mathcal L_{\text{heavy quarks}} \\ &= \bar q_L\, i \not\!\! D\, q_L + \bar q_R\, i \not\!\! D\, q_R - \bar q_R M_q q_L - \bar q_L M_q q_R + \dots \\ q_{R,L} &= \tfrac{1}{2}(1 \pm \gamma_5)\, q, \qquad q^\top = (u\ d\ [s]) \end{aligned} \qquad [20]$$

exhibits a global symmetry

$$\underbrace{SU(N_f)_L \times SU(N_f)_R}_{\text{chiral group } G} \times\, U(1)_V \times U(1)_A \qquad [21]$$
in the limit of $N_f$ massless quarks ($M_q = 0$). At the hadronic level, the quark number symmetry $U(1)_V$ is realized as baryon number. The axial $U(1)_A$ is not a symmetry at the quantum level due to the abelian anomaly. Although not yet derived from first principles, there are compelling theoretical and phenomenological arguments that the ground state of QCD is not even approximately chirally symmetric. All evidence, such as the existence of relatively light pseudoscalar mesons, points to spontaneous chiral symmetry breaking $G \to SU(N_f)_V$, where $SU(N_f)_V$ is the diagonal subgroup of $G$. The resulting $N_f^2 - 1$ (pseudo-)Goldstone bosons interact weakly at low energies. In fact, Goldstone's theorem ensures that purely mesonic or single-baryon amplitudes vanish in the chiral limit ($M_q = 0$) when the momenta of all pseudoscalar mesons tend to zero. This is the basis for a systematic low-energy expansion of Green functions and amplitudes. The corresponding EFT (type 3B in our classification) is called chiral perturbation theory (CHPT) (Weinberg 1979, Gasser and Leutwyler 1984, 1985).

Although the construction of effective Lagrangians with nonlinearly realized chiral symmetry is well understood, there are some subtleties involved. First of all, there may be terms in a chiral-invariant action that cannot be written as the four-dimensional integral of an invariant Lagrangian. The chiral anomaly for $SU(3) \times SU(3)$ bears witness to this fact and gives rise to the Wess–Zumino–Witten action. A general theorem to account for such exceptional cases is due to D'Hoker and Weinberg (1994). Consider the most general action for Goldstone fields with symmetry group $G$, spontaneously broken to a subgroup $H$. The only possible non-$G$-invariant terms in the Lagrangian that give rise to a $G$-invariant action are in one-to-one correspondence with the generators of the fifth cohomology group $H^5(G/H; \mathbb{R})$ of the coset manifold $G/H$. For the relevant case of chiral $SU(N)$, the coset space $SU(N)_L \times SU(N)_R / SU(N)_V$ is itself an $SU(N)$ manifold. For $N \ge 3$, $H^5(SU(N); \mathbb{R})$ has a single generator that corresponds precisely to the Wess–Zumino–Witten term.

At a still deeper level, one may ask whether chiral-invariant Lagrangians are sufficient (except for the anomaly) to describe the low-energy structure of Green functions as dictated by the chiral Ward identities of QCD. To be able to calculate such Green functions in general, the global chiral symmetry of QCD is extended to a local symmetry by the introduction of external gauge fields. The following invariance theorem (Leutwyler 1994) provides an answer to the above question. Except for the anomaly, the most general solution of the Ward identities for a spontaneously broken symmetry in Lorentz-invariant theories can be obtained from gauge-invariant Lagrangians to all orders in the low-energy expansion. The restriction to Lorentz invariance is crucial: the theorem does not hold in general in nonrelativistic effective theories.

Chiral Perturbation Theory
The effective chiral Lagrangian of the SM in the meson sector is displayed in Table 1. The lowest-order Lagrangian for the purely strong interactions is given by

$$\mathcal L_{p^2} = \frac{F^2}{4} \langle D_\mu U D^\mu U^\dagger \rangle + \frac{F^2 B}{2} \langle (s + ip) U^\dagger + (s - ip) U \rangle \qquad [22]$$

with a covariant derivative $D_\mu U = \partial_\mu U - i(v_\mu + a_\mu) U + i U (v_\mu - a_\mu)$. The first term has the familiar form [7] of the gauged nonlinear $\sigma$ model, with the matrix field $U(\phi)$ transforming as $U \to g_R U g_L^\dagger$ under chiral rotations. External fields $v_\mu$, $a_\mu$, $s$, $p$ are introduced for constructing the generating functional of Green functions of quark currents. To implement explicit chiral symmetry breaking, the scalar field $s$ is set equal to the quark mass matrix $M_q$ at the end of the calculation. The leading-order Lagrangian has two free parameters $F$, $B$ related to the pion decay constant and to the quark condensate, respectively:

$$F_\pi = F\,[1 + O(m_q)], \qquad \langle 0 | \bar u u | 0 \rangle = -F^2 B\,[1 + O(m_q)] \qquad [23]$$

The Lagrangian [22] gives rise to $M_\pi^2 = B(m_u + m_d)$ at lowest order. From detailed studies of pion–pion scattering (Colangelo et al. 2001), we know that the leading term accounts for at least 94% of the pion mass. This supports the standard counting of CHPT, with quark masses booked as $O(p^2)$ like the two-derivative term in [22].

Table 1 The effective chiral Lagrangian of the SM in the meson sector

$\mathcal L_{\text{chiral order}}$ (# of LECs)    Loop order
$\mathcal L_{p^2}(2) + \mathcal L^{\Delta S=1}_{G_F p^2}(2) + \mathcal L^{\text{em}}_{e^2 p^0}(1) + \mathcal L^{\text{emweak}}_{G_8 e^2 p^0}(1)$    $L = 0$
$+\, \mathcal L_{p^4}(10) + \mathcal L^{\text{odd}}_{p^6}(32) + \mathcal L^{\Delta S=1}_{G_8 p^4}(22) + \mathcal L^{\Delta S=1}_{G_{27} p^4}(28) + \mathcal L^{\text{em}}_{e^2 p^2}(14) + \mathcal L^{\text{emweak}}_{G_8 e^2 p^2}(14) + \mathcal L^{\text{leptons}}_{e^2 p}(5)$    $L = 1$
$+\, \mathcal L_{p^6}(90)$    $L = 2$

The numbers in brackets refer to the number of independent couplings for $N_f = 3$. The parameter-free Wess–Zumino–Witten action $S_{WZW}$, which cannot be written as the four-dimensional integral of an invariant Lagrangian, must be added.

The effective chiral Lagrangian in Table 1 contains the following parts:
1. strong interactions: $\mathcal L_{p^2}$, $\mathcal L_{p^4}$, $\mathcal L^{\text{odd}}_{p^6}$, $\mathcal L_{p^6}$ (+ $S_{WZW}$);
2. nonleptonic weak interactions to first order in the Fermi coupling constant $G_F$: $\mathcal L^{\Delta S=1}_{G_F p^2}$, $\mathcal L^{\Delta S=1}_{G_8 p^4}$, $\mathcal L^{\Delta S=1}_{G_{27} p^4}$;
3. radiative corrections for strong processes: $\mathcal L^{\text{em}}_{e^2 p^0}$, $\mathcal L^{\text{em}}_{e^2 p^2}$;
4. radiative corrections for nonleptonic weak decays: $\mathcal L^{\text{emweak}}_{G_8 e^2 p^0}$, $\mathcal L^{\text{emweak}}_{G_8 e^2 p^2}$; and
5. radiative corrections for semileptonic weak decays: $\mathcal L^{\text{leptons}}_{e^2 p}$.

Beyond the leading order, unitarity and analyticity require the inclusion of loop contributions. In the purely strong sector, calculations have been performed up to next-to-next-to-leading order. Figure 1 shows the corresponding skeleton diagrams of $O(p^6)$, with full lowest-order tree structures to be attached to propagators and vertices. The coupling constants of the various Lagrangians in Table 1 absorb the divergences from loop diagrams, leading to finite renormalized Green functions with scale-dependent couplings, the so-called low-energy constants (LECs). As in all EFTs, the LECs parametrize the effect of "heavy" degrees of freedom that are not represented explicitly in the EFT Lagrangian. Determination of those LECs is a major task for CHPT. In addition to phenomenological information, further theoretical input is needed. Lattice gauge theory has already furnished values for some LECs. To bridge the gap between the low-energy domain of CHPT and the perturbative domain of QCD, large-$N_c$ motivated interpolations with meson resonance exchange have been used successfully to pin down some of the LECs. Especially in cases where the knowledge of LECs is limited, renormalization group methods provide valuable information. As in renormalizable quantum field theories, the leading chiral logs $(\ln M^2/\mu^2)^L$
with a typical meson mass $M$, renormalization scale $\mu$ and loop order $L$ can in principle be determined from one-loop diagrams only. In contrast to the renormalizable situation, new derivative structures (and quark mass insertions) occur at each loop order, preventing a straightforward resummation of chiral logs.

Figure 1 Skeleton diagrams of $O(p^6)$. Normal vertices are from $\mathcal L_{p^2}$; crossed circles and the full square denote vertices from $\mathcal L_{p^4}$ and $\mathcal L_{p^6}$, respectively.

Among the many applications of CHPT in the meson sector are the determination of quark mass ratios and the analysis of pion–pion scattering, where the chiral amplitude of next-to-next-to-leading order has been combined with dispersion theory (Roy equations). Of increasing importance for precision physics (CKM matrix elements, $(g-2)_\mu$, ...) are isospin-violating corrections including radiative corrections, where CHPT provides the only reliable approach in the low-energy region. Such corrections are also essential for the analysis of hadronic atoms like pionium, a $\pi^+\pi^-$ bound state.

CHPT has also been applied extensively in the single-baryon sector. There are several differences from the purely mesonic case. For instance, the chiral expansion proceeds more slowly and the nucleon mass $m_N$ provides a new scale that does not vanish in the chiral limit. The formulation of heavy-baryon CHPT was modeled after HQET, integrating out the nucleon modes of $O(m_N)$. To improve the convergence of the chiral expansion in some regions of phase space, a manifestly Lorentz-invariant formulation has been set up more recently (relativistic baryon CHPT). Many single-baryon processes have been calculated to fourth order in both approaches, for example, pion–nucleon scattering. With methods similar to those of the mesonic sector, hadronic atoms like pionic or kaonic hydrogen have been investigated.

Nuclear Physics
In contrast to the meson and single-baryon sectors, amplitudes with two or more nucleons do not vanish in the chiral limit when the momenta of Goldstone mesons tend to zero. Consequently, the power counting is different in the many-nucleon sector. Multinucleon processes are treated with different EFTs depending on whether all momenta are smaller or larger than the pion mass.

In the very low energy regime $|\vec p| \ll M_\pi$, pions or other mesons do not appear as dynamical degrees of freedom. The resulting EFT is called "pionless EFT" and it describes systems like the deuteron, where the typical nucleon momenta are $\sqrt{m_N B_d} \simeq 45$ MeV ($B_d$ is the binding energy of the deuteron). The Lagrangian for the strong interactions between two nucleons has the form

$$\mathcal L_{NN} = C_0\, (N^\top P_i N)^\dagger\, N^\top P_i N + \dots \qquad [24]$$

where $P_i$ are spin–isospin projectors and higher-order terms contain derivatives of the nucleon fields. The existence of bound states implies that at least part of the EFT Lagrangian must be treated nonperturbatively. Pionless EFT is an extension of effective-range theory, which has long been used in nuclear physics. It has been applied successfully especially to the deuteron but also to more complicated few-nucleon systems like the $Nd$ and $n\alpha$ systems. For instance, precise results for $Nd$ scattering have been obtained with parameters fully determined from $NN$ scattering. Pionless EFT has also been applied to the so-called halo nuclei, where a tight cluster of nucleons (like $^4$He) is surrounded by one or more "halo" nucleons.

In the regime $|\vec p| > M_\pi$, the pion must be included as a dynamical degree of freedom. With some modifications in the power counting, the corresponding EFT is based on the approach of Weinberg (1990, 1991), who applied the usual rules of the meson and single-nucleon sectors to the nucleon–nucleon potential (instead of the scattering amplitude). The potential is then to be inserted into a Schrödinger equation to calculate physical observables. The systematic power counting leads to a natural hierarchy of nuclear forces, with only two-nucleon forces appearing up to next-to-leading order. Three- and four-nucleon forces arise at third and fourth order, respectively.

Significant progress has been achieved in the phenomenology of few-nucleon systems. The two- and $n$-nucleon ($3 \le n \le 6$) sectors have been pushed to fourth and third order, respectively, with encouraging signs of "convergence." Compton scattering off the deuteron, $\pi d$ scattering, nuclear parity violation, solar fusion, and other processes have been investigated in the EFT approach. The quark mass dependence of the nucleon–nucleon interaction has also been studied.
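The typical nucleon momentum in the deuteron quoted above, $\sqrt{m_N B_d} \simeq 45$ MeV, can be checked with a short calculation. This is a minimal sketch: the numerical inputs (average nucleon mass, deuteron binding energy, approximate pion mass) are standard values assumed here rather than taken from the article, and the function name `binding_momentum` is hypothetical.

```python
import math

# Assumed inputs in MeV (not quoted in the article):
M_NUCLEON = 938.92   # average of proton and neutron masses
B_DEUTERON = 2.2246  # deuteron binding energy
M_PION = 138.0       # approximate isospin-averaged pion mass

def binding_momentum(m_nucleon, binding_energy):
    """Typical momentum sqrt(m_N * B) of a shallow two-nucleon bound state, in MeV."""
    return math.sqrt(m_nucleon * binding_energy)

gamma = binding_momentum(M_NUCLEON, B_DEUTERON)
print(f"sqrt(m_N B_d) = {gamma:.1f} MeV")
print(f"ratio to pion mass = {gamma / M_PION:.2f}")
```

The ratio to the pion mass comes out at roughly one third, which is the sense in which the deuteron sits in the regime $|\vec p| \ll M_\pi$ where pions need not appear as dynamical degrees of freedom.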
See also: Anomalies; Electroweak Theory; High Tc Superconductor Theory; Noncommutative Geometry and the Standard Model; Operator Product Expansion in Quantum Field Theory; Perturbation Theory and its Techniques; Quantum Chromodynamics; Quantum Electrodynamics and its Precision Tests; Renormalization: General Theory; Seiberg–Witten Theory; Standard Model of Particle Physics; Symmetries and Conservation Laws; Symmetry Breaking in Field Theory.
Further Reading

Appelquist T and Carazzone J (1975) Infrared singularities and massive fields. Physical Review D 11: 2856.
Bedaque PF and van Kolck U (2002) Effective field theory for few nucleon systems. Annual Review of Nuclear and Particle Science 52: 339 (nucl-th/0203055).
Brambilla N, Pineda A, Soto J, and Vairo A (2004) Effective field theories for quarkonium. Reviews of Modern Physics (in press) (hep-ph/0410047).
Buchmüller W and Wyler D (1986) Effective Lagrangian analysis of new interactions and flavour conservation. Nuclear Physics B 268: 621.
Colangelo G, Gasser J, and Leutwyler H (2001) $\pi\pi$ scattering. Nuclear Physics B 603: 125 (hep-ph/0103088).
D'Hoker E and Weinberg S (1994) General effective actions. Physical Review D 50: 6050 (hep-ph/9409402).
Ecker G (1995) Chiral perturbation theory. Progress in Particle and Nuclear Physics 35: 1 (hep-ph/9501357).
Feruglio F (1993) The chiral approach to the electroweak interactions. International Journal of Modern Physics A 8: 4937 (hep-ph/9301281).
Gasser J and Leutwyler H (1984) Chiral perturbation theory to one loop. Annals of Physics 158: 142.
Gasser J and Leutwyler H (1985) Chiral perturbation theory: expansions in the mass of the strange quark. Nuclear Physics B 250: 465.
Hinchliffe I, Kersting N, and Ma YL (2004) Review of the phenomenology of noncommutative geometry. International Journal of Modern Physics A 19: 179 (hep/0205040).
Hoang AH (2002) Heavy quarkonium dynamics. In: Shifman M (ed.) At the Frontier of Particle Physics/Handbook of QCD, vol. 4. Singapore: World Scientific. (hep-ph/0204299).
Leutwyler H (1994) On the foundations of chiral perturbation theory. Annals of Physics 235: 165 (hep-ph/9311274).
Mannel T (2004) Effective Field Theories in Flavour Physics, Springer Tracts in Modern Physics, vol. 203. Berlin: Springer.
Manohar AV and Wise MB (2000) Heavy quark physics. Camb. Monogr. Part. Phys. Nucl. Phys. Cosmol. 10: 1.
Meißner U (2005) Modern theory of nuclear forces. Nuclear Physics A 751: 149 (nucl-th/0409028).
Pich A (1999) Effective field theory. In: Gupta R et al. (eds.) Proc. of Les Houches Summer School 1997: Probing the Standard Model of Particle Interactions. Amsterdam: Elsevier. (hep-ph/9806303).
Scherer S (2002) Introduction to chiral perturbation theory. Advances in Nuclear Physics 27: 277 (hep-ph/0210398).
Weinberg S (1979) Phenomenological Lagrangians. Physica A 96: 327.
Weinberg S (1990) Nuclear forces from chiral Lagrangians. Physics Letters B 251: 288.
Weinberg S (1991) Effective chiral Lagrangians for nucleon–pion interactions and nuclear forces. Nuclear Physics B 363: 3.
Eigenfunctions of Quantum Completely Integrable Systems
J A Toth, McGill University, Montreal, QC, Canada
© 2006 Elsevier Ltd. All rights reserved.
Introduction

This article is an introduction to eigenfunctions of quantum completely integrable (QCI) systems. For these systems, one can understand asymptotics of eigenfunctions better than for other systems, so it is natural to study them. It is useful to begin the discussion with the most important geometric example given by the quantum Hamiltonian, $P_1 = \sqrt{\Delta}$. We fix a basis of eigenfunctions, $\varphi_j$, $j = 1, 2, \dots$, with

$$\sqrt{\Delta}\, \varphi_j = \lambda_j \varphi_j; \qquad \langle \varphi_i, \varphi_j \rangle = \delta_{ij}$$

and assume that there exist functionally independent (pseudo)differential operators $P_2, \dots, P_n$ with the property that

$$[P_i, P_j] = 0; \qquad i, j = 1, \dots, n$$

In this case, $P_1$ is said to be QCI and the operators $P_k$, $k = 1, \dots, n$, can be simultaneously diagonalized. It is therefore natural to study the special basis of Laplace eigenfunctions which are joint eigenvectors of the $P_k$'s. From now on, the $\varphi_j$'s are always assumed to be joint eigenfunctions of the commuting operators $P_k$, $k = 1, \dots, n$. The classical observables corresponding to the operators $P_k$, $k = 1, \dots, n$, are the respective principal symbols, $p_k \in C^\infty(T^*M)$, $k = 1, \dots, n$. In particular, the bicharacteristic flow of $p_1(x, \xi) = |\xi|_g$ is the classical "geodesic flow"

$$G^t : T^*M \to T^*M$$

Examples of manifolds with QCI Laplacians include tori and spheres of revolution, Liouville metrics on tori and spheres, large families of metrics on homogeneous spaces, as well as hyperellipsoids with distinct axes in arbitrary dimension. There are also many inhomogeneous QCI examples (see the next section).

It is of interest to understand the asymptotics of both eigenvalues and eigenfunctions. There is a large literature devoted to eigenvalue asymptotics, including trace formulas and Bohr–Sommerfeld rules (see Colin de Verdiere (1994a, b), Helffer and Sjoestrand (1990), and Colin de Verdiere and Vu Ngoc (2003)). We will concentrate here on the corresponding problem of determining eigenfunction asymptotics. The key property of eigenfunctions in the QCI case is localization in phase space, $T^*M$. This allows one to study concentration and blow-up properties more effectively than in any other setting. It is important to contrast
this with, for example, the situation in the ergodic case. Moreover, in the QCI case, there is a particularly strong connection between dynamics of the geodesic flow, $G^t : T^*M \to T^*M$, and the asymptotics of individual eigenfunctions. In the general case, one can usually only relate the dynamics to spectral averages, such as in the trace formula (Duistermaat and Guillemin 1975). For the most part, the literature on eigenfunction asymptotics addresses the following basic problems:
1. determining sharp upper and lower bounds for $\varphi_j$ as $j \to \infty$ and
2. describing the link between the blow-up properties of $\varphi_j$ as $j \to \infty$ and the dynamics of the geodesic flow, $G^t$.

The starting point in the study of eigenfunction asymptotics in the QCI case is the fact that the joint eigenfunctions, $\varphi_j$, have masses that localize on the level sets, $\mathcal P^{-1}(b) := \{(x, \xi) \in T^*M;\ p_j(x, \xi) = b_j,\ j = 1, \dots, n\}$. Moreover, by the Liouville–Arnol'd theorem, for generic levels (indexed by $b \in \mathbb{R}^n$),

$$\mathcal P^{-1}(b) = \sum_{k=1}^{m} \Lambda_k(b) \qquad [1]$$
where the $\Lambda_k(b) \subset T^*M$ are Lagrangian tori. The affine symplectic coordinates in a neighborhood of $\Lambda_k(b)$ are called "action-angle variables" $(\theta_1^{(k)}, \dots, \theta_n^{(k)}; I_1^{(k)}, \dots, I_n^{(k)}) \in \mathbb{T}^n \times \mathbb{R}^n$. Written in terms of these coordinates, the classical Hamilton equations defining the geodesic flow assume the form

$$\frac{d\theta}{dt} = F(I); \qquad \frac{dI}{dt} = 0$$

and this system of ordinary differential equations (ODEs) is solved by quadrature. This explains why one refers to such systems as completely integrable. At the quantum level, one can construct semiclassical Lagrangian distributions,

$$\psi_\lambda^{(k)}(x) := \int_{\mathbb{R}^n} e^{i \lambda \varphi(x, \theta)}\, a(x, \theta; \lambda)\, d\theta \qquad [2]$$

which microlocally concentrate on $\Lambda^{(k)}(b)$ as $\lambda \to \infty$ and satisfy $P_j \psi_\lambda = b_j \psi_\lambda + O(\lambda^{-\infty})$ in $L^2(M)$. An important fact is that the actual joint eigenfunctions, $\varphi_j$, are approximated to $O(\lambda^{-\infty})$-accuracy in $L^2(M)$ by suitable linear combinations of the quasimodes, $\psi_\lambda$. However, there are subtleties underlying this correspondence which are often neglected in the physics literature:
3. The actual joint eigenfunctions $\varphi_j$ localize on the level sets $\mathcal P^{-1}(b)$, which usually consist of many connected components. Consequently, the eigenfunctions are approximated by (sometimes large) linear combinations of Lagrangian quasimodes attached to the different component tori. The precise splitting of mass amongst these different components is a difficult and, in general, unsolved problem in microlocal tunneling.
4. The local torus foliation given by action-angle variables tends to degenerate and Lagrangian quasimodes are no longer approximate solutions to the (joint) eigenvalue equations near the singularities of the foliation. The singularities and their relative configurations can be complicated (Colin de Verdiere and Vu Ngoc 2003) and most of the interesting asymptotic blow-up properties of eigenfunctions tend to be associated with these degeneracies.

The main tool for studying joint eigenfunctions near degeneracies is the quantum analog of the Eliasson normal form (Eliasson 1984, Vu Ngoc 2000). We will refer to this as the "quantum Birkhoff normal form" (QBNF).
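The statement that the Hamilton equations in action-angle variables are "solved by quadrature" can be illustrated directly: since $dI/dt = 0$, the angles simply advance linearly, $\theta(t) = \theta(0) + t\,F(I)$ modulo $2\pi$. The sketch below assumes, purely for illustration, the Hamiltonian $H(I) = |I|$ in two degrees of freedom (so that $F(I) = I/|I|$); this choice and the function names are hypothetical and are not taken from the article.

```python
import math

TWO_PI = 2.0 * math.pi

def frequency(actions):
    """Frequency map F(I) = dH/dI for the assumed Hamiltonian H(I) = |I|."""
    norm = math.hypot(actions[0], actions[1])
    return tuple(i / norm for i in actions)

def flow(t, angles, actions):
    """Exact time-t flow: angles advance linearly mod 2*pi, actions are constant."""
    freq = frequency(actions)
    new_angles = tuple((th + t * f) % TWO_PI for th, f in zip(angles, freq))
    return new_angles, actions

def angles_close(a, b, tol=1e-9):
    """Compare two points of the torus, allowing for wrap-around at 2*pi."""
    return all(abs(math.remainder(x - y, TWO_PI)) < tol for x, y in zip(a, b))

theta0, I0 = (0.3, 1.2), (3.0, 4.0)

# Group property of the flow: time 1.0 followed by time 1.5 equals time 2.5.
theta_a, I_a = flow(2.5, theta0, I0)
theta_mid, I_mid = flow(1.0, theta0, I0)
theta_b, I_b = flow(1.5, theta_mid, I_mid)
print(theta_a, theta_b, I_a == I_b == I0)
```

Each orbit stays on the torus $\{I = I_0\}$, which is the classical counterpart of the localization of joint eigenfunctions on the Lagrangian tori $\Lambda_k(b)$.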
Background on QCI Systems

Let $(M^n, g)$ be a compact, closed Riemannian manifold and $P_1 := Op_\hbar(p_1)$ be a formally self-adjoint, elliptic (in the classical sense) $\hbar$-pseudodifferential operator. In local coordinates, the Schwartz kernel of $P_1$ is of the form

$$P_1(x, y; \hbar) = (2\pi\hbar)^{-n} \int_{\mathbb{R}^n} e^{i \langle x - y, \xi \rangle / \hbar}\, p_1(x, \xi; \hbar)\, d\xi$$

where $p_1(x, \xi; \hbar) \in S^{0,m}_{cl}(T^*M)$; that is, $p_1(x, \xi; \hbar) \sim \sum_{j=0}^{\infty} p_{1,j}(x, \xi)\, \hbar^j$ with $\partial_x^\alpha \partial_\xi^\beta p_{1,j}(x, \xi) = O(\langle \xi \rangle^{m - j - |\beta|})$ (Dimassi and Sjoestrand 1999). It is often convenient to work with $\hbar$-pseudodifferential operators rather than their classical counterparts. In the homogeneous case, one chooses $\hbar^{-1} \in \mathrm{Spec}\, \sqrt{\Delta}$.

$P_1 \in Op_\hbar(S^{0,m}_{cl})$ is said to be QCI if there exist self-adjoint $P_j = Op_\hbar(p_j) \in Op_\hbar(S^{0,m'}_{cl})$, $j = 2, \dots, n$, for some $m'$ with $[P_i, P_j] = 0$, $i, j = 1, \dots, n$, such that $dp_1 \wedge \dots \wedge dp_n \ne 0$ on a dense open subset, $\Omega_{reg} \subset T^*M$, and $P_1^2 + \dots + P_n^2$ is elliptic in the classical sense. There are many inhomogeneous QCI examples including quantum Euler, Lagrange, and Kowalevsky tops together with quantum Neumann and Rosochatius oscillators in arbitrary dimension. Since $\{p_i, p_j\} = 0$, the joint Hamilton flow of the $p_i$'s induces a symplectic $\mathbb{R}^n$-action on $T^*M$:

$$\Phi_t : T^*M \to T^*M, \qquad \Phi_t(x, \xi) = \exp t_1 H_{p_1} \circ \dots \circ \exp t_n H_{p_n}(x, \xi), \qquad t = (t_1, \dots, t_n) \in \mathbb{R}^n$$

The associated moment map is just

$$\mathcal P : T^*M - 0 \to \mathbb{R}^n; \qquad \mathcal P = (p_1, \dots, p_n)$$
We denote the image $\mathcal P(T^*M - 0)$ by $B$, and the regular values (resp. singular values) of the moment map by $B_{reg}$ (resp. $B_{sing}$). To establish bounds for the joint eigenfunctions of $P_1, \dots, P_n$, one imposes a "finite-complexity" assumption (Toth and Zelditch 2002) on the classical integrable system. This condition holds for all systems of interest in physics. To describe it, for each $b = (b^{(1)}, \dots, b^{(n)}) \in B$, let $m_{cl}(b)$ denote the number of $\mathbb{R}^n$-orbits of the joint flow $\Phi_t$ on the level set $\mathcal P^{-1}(b)$. Then, the finite-complexity condition says that for some $M_0 > 0$,

$$m_{cl}(b) < M_0 \qquad (\forall b \in B)$$

In addition, when $\mathcal P$ is proper,

$$\mathcal P^{-1}(b) = \sum_{k=1}^{m_{cl}(b)} \Lambda_k(b) \qquad [3]$$

for any $b \in B_{reg}$, where the $\Lambda_k(b)$ are Lagrangian tori. The starting point for analyzing joint eigenfunctions is the following correspondence principle (Zelditch 1990), which makes the eigenfunction localization alluded to in the introduction more precise:

Theorem 1 Let $Op_\hbar(a) \in Op_\hbar(S^0_{cl})(T^*M)$ and $P_j$, $j = 1, \dots, n$, be a QCI system of commuting operators. Then, for every $b \in B_{reg}$, there exists a subsequence of joint eigenfunctions $\varphi_\lambda(x) := \varphi(x; \lambda(\hbar))$ with $\hbar \in (0, \hbar_0]$ and joint eigenvalues $\lambda(\hbar) = (\lambda_1(\hbar), \dots, \lambda_n(\hbar)) \in \mathrm{Spec}(P_1, \dots, P_n)$ with $|\lambda(\hbar) - b| = O(\hbar)$ such that

$$\langle Op_\hbar(a) \varphi_\lambda, \varphi_\lambda \rangle = |c(\hbar)|^2 \int_{\Lambda(b)} a(x, \xi)\, d\mu_b + O(\hbar)$$

Here, $d\mu_b$ denotes Lebesgue measure on the torus, $\Lambda(b)$. The proof of Theorem 1 follows from the $\hbar$-microlocal, regular quantum normal form construction near $\Lambda(b)$ (see the section "Birkhoff normal forms").
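The classical condition $\{p_i, p_j\} = 0$ behind the joint flow can be checked numerically in a simple case. The sketch below assumes a torus of revolution with metric $g = dr^2 + f(r)^2 d\theta^2$ and the illustrative profile $f(r) = 2 + \cos r$ (an assumption, not from the article), for which $p_1 = |\xi|_g$ and the angular momentum $p_2 = \xi_\theta$ Poisson-commute. The bracket is approximated by central finite differences; all function names are hypothetical.

```python
import math

def profile(r):
    """Assumed profile f(r) for the metric g = dr^2 + f(r)^2 dtheta^2."""
    return 2.0 + math.cos(r)

def p1(z):
    """Geodesic Hamiltonian |xi|_g at the phase-space point z = (r, theta, xi_r, xi_theta)."""
    r, _theta, xi_r, xi_theta = z
    return math.sqrt(xi_r ** 2 + (xi_theta / profile(r)) ** 2)

def p2(z):
    """Angular momentum xi_theta (Clairaut integral)."""
    return z[3]

def xi_r(z):
    """Radial momentum, which is not conserved for a non-constant profile."""
    return z[2]

def partial(func, z, index, step=1e-6):
    """Central finite-difference approximation of d(func)/dz[index]."""
    z_plus, z_minus = list(z), list(z)
    z_plus[index] += step
    z_minus[index] -= step
    return (func(z_plus) - func(z_minus)) / (2.0 * step)

def poisson_bracket(a, b, z):
    """{a, b} = sum_i (da/dxi_i * db/dx_i - da/dx_i * db/dxi_i), with x = (r, theta)."""
    total = 0.0
    for i in range(2):  # position index i, conjugate momentum index i + 2
        total += partial(a, z, i + 2) * partial(b, z, i)
        total -= partial(a, z, i) * partial(b, z, i + 2)
    return total

point = (0.7, 1.3, 0.4, 0.9)
print("{p1, p2}   =", poisson_bracket(p1, p2, point))
print("{p1, xi_r} =", poisson_bracket(p1, xi_r, point))
```

The first bracket vanishes because $p_1$ does not depend on $\theta$, while the second does not, so $\xi_r$ could not serve as the second commuting integral for this metric.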
Blow-Up of Eigenfunctions: Qualitative Results

Before discussing quantitative bounds for joint eigenfunctions, it is useful to prove qualitative results. Here, we review only the homogeneous case where $P_1 = \hbar\sqrt{\Delta}$, although the general case can be dealt with similarly (Toth and Zelditch 2002). Two well-known QCI examples which exhibit extremes in eigenfunction concentration are the round sphere and the flat torus. In the case of the sphere, the zonal harmonics blow up like $\lambda^{1/2}$ at the poles, whereas, in the case of the flat torus, all the joint eigenfunctions are uniformly bounded. The rest of the article will be essentially devoted to understanding these extreme blow-up properties (and intermediate ones) more systematically. When discussing blow-up of eigenfunctions, it is natural to start with the following:

Question Do there exist QCI manifolds (other than the flat torus) for which all eigenfunctions are uniformly bounded in $L^\infty$?

Toth and Zelditch (2002) have proved that, up to coverings, the flat torus is the only example with uniformly bounded eigenfunctions. Their argument used the correspondence principle in Theorem 1 combined with some deep results from symplectic geometry. To deal with the issue of multiplicities, it is convenient to define

$$L^\infty(\lambda, g) = \sup_{\varphi \in V_\lambda} \|\varphi\|_{L^\infty}$$
where $V_\lambda = \{\varphi;\ P_1 \varphi = \lambda \varphi\}$ and it is assumed that $\|\varphi\|_{L^2} = 1$.

Theorem 2 (Toth and Zelditch 2002). Suppose that $P_1 = \sqrt{\Delta}$ is QCI on a compact Riemannian manifold $(M, g)$ and suppose that the corresponding moment map satisfies the finite-complexity condition. Then, if $L^\infty(\lambda, g) = O(1)$, $(M, g)$ is flat.

The proof of Theorem 2 proceeds by contradiction: that is, one assumes that all eigenfunctions are uniformly bounded. There are two main steps in the proof of Theorem 2: the first is entirely analytic and uses the correspondence principle in Theorem 1 and uniform boundedness to determine the topology of $M$. The second step uses two deep results from symplectic topology/geometry to determine the metric, $g$, up to coverings.

Using a local Weyl law argument and the finite-multiplicity assumption, it can be shown that for each $b \in B_{reg}$, there exists a subsequence, $\varphi_\lambda$, of joint eigenfunctions such that Theorem 1 holds with

$$|c(\hbar; b)|^2 \ge \frac{1}{C}$$

where $C > 0$ is a uniform constant not depending on $b \in B_{reg}$. With this subsequence, one applies Theorem 1 with $a(x, \xi) = V(x) \in C^\infty(M)$. It then easily follows by the boundedness assumption that for $\hbar$ sufficiently small and appropriate constants $C_0, C_1 > 0$,

$$\left| \int_{\Lambda(b)} \pi_{\Lambda(b)}^* V\, d\mu_b \right| \le \frac{1}{C_0} \int_M |V(x)|\, |\varphi_\lambda(x)|^2\, d\mathrm{Vol}(x) \le \frac{1}{C_1} \int_M |V(x)|\, d\mathrm{Vol}(x) \qquad [4]$$

where $\pi_{\Lambda(b)}$ denotes the restriction of the canonical projection $\pi : T^*M \to M$ to the Lagrangian $\Lambda(b)$. The estimate in [4] is equivalent to the statement

$$(\pi_{\Lambda(b)})_* (d\mu_b) \ll d\mathrm{Vol}(x)$$

where, given two Borel measures $d\mu$ and $d\nu$, one writes $d\mu \ll d\nu$ if $d\mu$ is absolutely continuous with respect to $d\nu$. Consequently, $\pi_{\Lambda(b)} : \Lambda(b) \to M$ has no singularities and thus, up to coverings, $M$ is topologically a torus (since $\Lambda(b)$ is).

Since there are many QCI systems on $n$-tori, it still remains to determine how the uniform-boundedness condition constrains the metric geometry of $(M, g)$. First, by a classical result of Mañé, if $T^*M$ possesses a $C^1$-foliation by Lagrangians, $(M, g)$ cannot have conjugate points. By the first step in the proof, it follows that under the uniform-boundedness assumption, $M$ is a topological torus and $T^*M$ possesses a smooth foliation by Lagrangian tori. Consequently, $(M, g)$ has no conjugate points. Finally, the Burago–Ivanov proof of the Hopf conjecture says that metric tori without conjugate points are flat. Therefore, $(M, g)$ is flat.

Consistent with Theorem 2, one can show (Toth and Zelditch 2003, Lerman and Shirokova 2002) that if $(M, g)$ is integrable and not a flat torus, then there must exist a compact $\Phi_t$-orbit (i.e., an orbit of the joint flow of $X_{p_j}$, $j = 1, \dots, n$) with $\dim = k < n$. In the QCI case, these "singular" orbits trap eigenfunction mass for appropriate subsequences. To understand this statement in detail, it is necessary to review QBNF constructions in the context of QCI systems.
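The two extreme examples mentioned at the start of this section can be compared with a short computation. The sketch below uses the standard facts that the $L^2$-normalized zonal harmonic on the unit $S^2$ is $Y_l^0 = \sqrt{(2l+1)/4\pi}\, P_l(\cos\theta)$ with $P_l(1) = 1$, and that the normalized joint eigenfunctions on the flat 2-torus $\mathbb{R}^2 / 2\pi\mathbb{Z}^2$ are the exponentials $e^{i\langle k, x\rangle}/2\pi$. The restriction to dimension 2 and the function names are choices made here for illustration.

```python
import math

def zonal_sup_norm(l):
    """Sup norm of the L^2-normalized zonal harmonic Y_l^0 on the unit S^2 (value at the poles)."""
    return math.sqrt((2 * l + 1) / (4.0 * math.pi))

def sphere_frequency(l):
    """Eigenvalue lambda = sqrt(l(l+1)) of sqrt(Delta) on the unit S^2."""
    return math.sqrt(l * (l + 1))

def torus_sup_norm():
    """Sup norm of e^{i<k,x>}/(2*pi) on the flat torus R^2/(2*pi*Z)^2, independent of k."""
    return 1.0 / (2.0 * math.pi)

for l in (10, 100, 1000):
    lam = sphere_frequency(l)
    ratio = zonal_sup_norm(l) / math.sqrt(lam)
    print(f"l = {l:5d}: sup|Y_l^0| = {zonal_sup_norm(l):8.3f}, "
          f"sup / lambda^(1/2) = {ratio:.5f}")
print(f"flat torus: sup norm = {torus_sup_norm():.5f} for every eigenvalue")
```

The ratio on the sphere settles near $1/\sqrt{2\pi}$, consistent with $\lambda^{1/2}$ growth at the poles, while the torus value stays fixed.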
Birkhoff Normal Forms

There are several excellent expositions on the topic of Birkhoff normal forms in the literature (see, e.g., Guillemin (1996), Iatchenko et al. (2002), and Zelditch (1998)), which discuss both the classical and quantum constructions. Here, we discuss the aspects which are most relevant for QCI systems.

Consider the Schrödinger operator, $P(x; \hbar D_x) = -\hbar^2 (d^2/dx^2) + V(x)$ with $V(x + 2\pi) = V(x)$ acting on $C^\infty(\mathbb{R}/2\pi\mathbb{Z})$. Assume that the potential, $V(x)$, is Morse and that $x = 0$ is a potential minimum with $V(0) = V'(0) = 0$ and $\Omega \subset T^*(S^1)$ an open neighborhood containing $(0, 0)$. In its simplest incarnation, the classical Birkhoff normal-form theorem says that for small enough $\Omega$, there exists a symplectic diffeomorphism, $\kappa^{-1} : (\Omega; (0, 0)) \to (\Omega; (0, 0))$; $\kappa^{-1} : (x, \xi) \mapsto (y, \eta)$, and a (locally defined) function $F_0 \in C^\infty(\mathbb{R})$ such that

$$(p \circ \kappa)(y, \eta) = F_0(\eta^2 + y^2) \qquad [5]$$
provided $(y, \eta) \in \Omega$. At the quantum level, the analogous QBNF expansion says that there exist microlocally unitary $\hbar$-Fourier integral operators, $U(\hbar) : C^\infty(\Omega) \to C^\infty(\Omega)$, and a classical symbol $F(x, \hbar) \sim \sum_{j=0}^{\infty} F_j(x)\, \hbar^j$, such that

$$U(\hbar)^* P(\hbar)\, U(\hbar) =_\Omega F(\hat I^e; \hbar) \qquad [6]$$

with $\hat I^e = \hbar^2 D_y^2 + y^2$. Given two $\hbar$-pseudodifferential operators $P$ and $Q$, the notation $P =_\Omega Q$ means that $\|(P - Q)\chi\| = O(\hbar^\infty)$ and $\|\chi(P - Q)\| = O(\hbar^\infty)$ in $L^2$ operator norm, for any $\chi \in C_0^\infty(\Omega)$. Since it can be easily shown that eigenfunctions $\varphi_\lambda$, with $\lambda(\hbar) = O(\hbar^\delta)$, $0 < \delta \le 1$, localize very sharply near $x = 0$, from the $\hbar$-microlocal unitary equivalence in [6], the eigenfunction and eigenvalue asymptotics (including trace formulas) can all be determined by working with the model operator on the right-hand side (RHS) of [6]. Moreover, on the model side, the eigenfunctions and eigenvalues are explicitly known. At a potential maximum, there exist classical and quantum normal forms analogous to [5] and [6] (see Helffer and Sjoestrand (1990) and Colin de Verdiere and Parisse (1994a)) except that the harmonic oscillator action operator, $\hat I^e$, is replaced by the hyperbolic action operator,

$$\hat I^h = \hbar \left( y D_y + \frac{1}{2} \right) \qquad [7]$$

The 1D Schrödinger operator is the simplest example of a QCI system where $(0, 0) \in T^*S^1$ is a nondegenerate critical point of the classical Hamiltonian, $H(x, \xi) = \xi^2 + V(x)$. Under a mild nondegeneracy hypothesis (Vu Ngoc 2000), there is an analogous normal form for arbitrary QCI systems which is valid near nondegenerate rank $k < n$ orbits of the joint flow, $\Phi_t$. At the classical level, this result is due to Eliasson (1984) and the quantum analog is due to Vu Ngoc (2000). To state the result in general, one has to define the appropriate model operators: these are $\hat I^e$ and $\hat I^h$ together with the loxodromic model operators

[…] $> 0$ depends only on the scale of the cutoff function. It finally remains to deal with (2). Bounding the size of $|c(\hbar)|$ from below amounts to estimating the $L^2$-mass of the joint eigenfunction $\varphi_\lambda$ which must be trapped near the orbit, $O_k$. Using a local (singular) Weyl law argument, it is shown in Toth and Zelditch (2003) that

$$|c(\hbar)|^2 \gg |\log \hbar|^{-m} \qquad [13]$$

where $m > 0$ indexes the number of hyperbolic and loxodromic model operators. The final result quantifies blow-up along a compact orbit:

Theorem 3 (Toth and Zelditch 2003). Let $O_k$ be a rank $k < n$ orbit of the joint flow $\Phi_t$. If this orbit is compact and nondegenerate, then there exists a subsequence of $L^2$-normalized joint eigenfunctions $\varphi_{j_k}$, $k = 1, 2, \dots$, of the QCI system $P_j$, $j = 1, \dots, n$, such that for any $\epsilon > 0$,

$$\|\varphi_{j_k}\|_{L^\infty} \gg \lambda_{j_k}^{(n-k)/4 - \epsilon}$$

By using the semiclassical scale $\hbar^{1/2} |\log \hbar|^{1/2}$, one can (slightly) improve the lower bound in Theorem 3 to $\|\varphi_{j_k}\|_{L^\infty} \gg \lambda_{j_k}^{(n-k)/4} |\log \lambda_{j_k}|^{-\alpha}$ for some $\alpha \ge 0$ (see Sogge et al. (2005)). When $(M, g)$ is not flat, there must exist a singular, compact orbit of dimension $k$ with $1 \le k \le n - 1$ and so, as an immediate corollary of Theorem 3, it follows that for some $\alpha \ge 0$,

$$L^\infty(\lambda, g) \gg \lambda^{1/4} |\log \lambda|^{-\alpha} \qquad [14]$$
Since the bound in [14] is highly dependent on dimension, establishing the existence of high-codimension singular orbits would strengthen the estimate substantially. However, this appears to be a difficult and open problem.
Maximal Blow-Up of Modes and Quasimodes
We review here a number of converses to a recent result of Sogge and Zelditch (2002) on Riemannian manifolds (M, g) with maximal eigenfunction growth. These authors proved that if there exists a sequence of $L^2$-normalized eigenfunctions of the Laplacian of (M, g) whose $L^\infty$-norms are comparable to those of zonal spherical harmonics on $S^n$, then there must exist a point comparable to the north pole of $S^n$, that is, a recurrent point z such that a positive measure of geodesics emanating from z return to it at a fixed time T. The most extreme kind of recurrent point is a "blow-down point" of period T, where by definition all geodesics leaving z return to z at time T, that is, form geodesic loops. Poles of surfaces of revolution are blow-down points where all geodesic loops at z are smoothly closed, while umbilic points of triaxial ellipsoids are examples of blow-down points where all but two geodesic loops are not smoothly closed. On real-analytic manifolds, all recurrent points are blow-down points. The converse question is the following: what kind of mode (eigenfunction) or quasimode growth must occur when a blow-down point exists? Sogge et al. (2005) proved that maximal quasimode growth (Colin de Verdiere 1977) implies the existence of a blow-down point. This generalizes the main result of Sogge and Zelditch (2002) from modes (which one rarely understands) to quasimodes (which one often understands better). Conversely, existence of a blow-down point ensures near-maximal quasimode growth, that is, here, maximal up to logarithmic factors. If one assumes that the geodesic flow $G^t : T^*M \to T^*M$ of (M, g) is completely integrable and that dim M = 2, then the results of Sogge et al. (2005) show that actual eigenfunctions have near-maximal blow-up. Examples show that, in general, blow-down points do not necessarily cause modes to have near-maximal blow-up.
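The growth of zonal spherical harmonics mentioned above can be checked numerically in the simplest case, the round sphere $S^2$. The sketch below is only an illustration under stated assumptions: it uses the standard facts that the $L^2$-normalized zonal harmonic of degree l is $\sqrt{(2l+1)/4\pi}\,P_l(\cos\theta)$ and that the corresponding eigenvalue of the square root of the Laplacian is $\lambda = \sqrt{l(l+1)}$. The function names are hypothetical and are not taken from the article.

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) from the three-term recurrence."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for k in range(1, l):
        # (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def zonal(l, theta):
    """L2-normalized zonal harmonic of degree l at polar angle theta."""
    return math.sqrt((2 * l + 1) / (4 * math.pi)) * legendre(l, math.cos(theta))

def l2_norm_squared(l, steps=20000):
    """Midpoint-rule approximation of the integral of |Y|^2 over the sphere."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += zonal(l, theta) ** 2 * math.sin(theta) * h
    return 2 * math.pi * total

def sup_norm(l):
    """|P_l| <= 1 on [-1, 1] with P_l(1) = 1, so the sup is attained at the pole."""
    return zonal(l, 0.0)

def growth_ratio(l):
    """Sup norm divided by lambda^{1/2}, with lambda = sqrt(l (l + 1))."""
    lam = math.sqrt(l * (l + 1))
    return sup_norm(l) / math.sqrt(lam)

for l in (10, 40, 160):
    print(l, l2_norm_squared(l), growth_ratio(l))
```

The ratio stays bounded away from zero as l grows, which is the growth rate $\lambda^{(n-1)/2}$ for n = 2, attained at the pole, where all geodesics through the point close up.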
An important geometric invariant of a blow-down point is the first-return map to the cotangent fiber over the blow-down point:
$$G^T_z : S^*_z M \to S^*_z M$$ [15]
$G^T_z$ is also an important analytic invariant: the blow-up rate of modes or quasimodes, specifically the occurrence of the logarithmic factors, depends on the fixed-point structure of this map. When all geodesic loops at z are smoothly closed, that is, when the first-return map is the identity, then there exist quasimodes of maximal growth. When the first-return map has fixed points, the maximal growth is modified by logarithmic factors.
To put these results in context, we first recall the local Weyl law of Avakumovich–Levitan (Duistermaat and Guillemin 1975), which states that
$$\sum_{\lambda_j \le \lambda} |\varphi_{\lambda_j}(x)|^2 = (2\pi)^{-n} \int_{p(x,\xi) \le \lambda} d\xi + R(\lambda, x)$$ [16]
with uniform remainder bounds
$$|R(\lambda, x)| \le C \lambda^{n-1}, \qquad x \in M$$
It follows that
$$L^\infty(\lambda; g) = O(\lambda^{(n-1)/2})$$ [17]
on any compact Riemannian manifold. Riemannian manifolds for which the equality
$$L^\infty(\lambda; g) = \Omega(\lambda^{(n-1)/2})$$ [18]
is achieved for some subsequence of eigenfunctions are said to be of maximal eigenfunction growth. In addition to modes, and almost inseparable from them, are the quasimodes of the Laplacian (Colin de Verdiere 1977). As the name suggests, quasimodes are approximate eigenfunctions. The crudest type of quasimode is quasimode { k } of order 0, namely a sequence of L2 -normalized functions which solve kð k Þ
k kL 2
¼ Oð1Þ
for a sequence of quasieigenvalues k . By the spectral theorem, it follows that there must exist true eigenvalues in the interval [k , k þ ] for some > 0. (M, g) is said to have maximal 0-order quasimode growth if there exists a sequence of quasimodes of order 0 for which k k kL1 = ((n1)=2 ). There are analogous definitions for more refined quasimodes, for example, quasimodes of higher order or (most refined) quasimodes defined by oscillatory integrals. It is natural to include quasimodes in this study because they often reflect the geometry and dynamics of the geodesic flow more strongly than actual modes. For quasimodes, there is the following result: Theorem 4 (Sogge et al. 2005). Let (Mn , g) be a compact Riemannian manifold with Laplacian . Then: (i) If there exists a quasimode sequence {( k , k )} (n1)=2 of order 0 with k k kL1 = (k ), then there exists a recurrent point z 2 M for the geodesic flow. If (M, g) is real analytic, then there exists a blow-down point. (ii) Conversely, if there exists a blow-down point and if the map GTz = id, then there exists a quasimode sequence {( k , k )} of order 0 with k k kL1 = (n1 k ).
(iii) Let n = 2 and (Mn , g) be real analytic. Then, if GTz has a finite number of nondegenerate fixed points, there exists a quasimode sequence 1=2 {( k , k )} of order 0 with k k kL1 = (k 1/2 j log k j ). The assumption that GTz = id is the same as saying that all geodesics leaving z smooth close up at z again. As mentioned above, poles of surfaces of revolution have this property. On the contrary, the umbilic points of triaxial ellipsoids in R3 are blow-down points for which GTz 6¼ id. That is, every geodesic leaving an umbilic point returns at the same time, but only two closed geodesics in this family are closed, and they give rise to fixed points of Gtz . One can show (see Toth 1996) that there exists a sequence of eigenfunctions in this case for which L1 (g, ) 1=2 j log j1/2 . Hence, the above result is sharp. Moreover, it is clear from the proof that the fixed points are responsible for the logarithmic correction to maximal eigenfunction growth: they cause a change in the normal form of the Laplacian near the blow-down point. Theorem 4 illustrates the intimate connection between maximal blow-up of quasimodes and existence of blow-down points. It is natural to ask, however, when blow-down points cause blow-up in modes, that is, actual eigenfunctions. As mentioned above, this is not generally the case and some further mechanism is needed to ensure it. In the case of QCI surfaces, one can prove: Theorem 5 (Sogge et al. 2005). pffiffiffiffi Let (M, g) be a smooth, compact surface, P1 = , P2 be an Eliasson nondegenerate QCI system on M and ’k be an 2 L -normalized joint eigenfunction of P1 , P2 with pffiffiffiffi ’k = k ’k . Suppose that there exists a blowdown point z 2 M for the geodesic flow Gt := exp tXp1 . Then, there exists a subsequence of (joint) Laplace eigenfunctions, ’jk , k = 1, 2, . . . , such that for any > 0, ð1=2Þ
k’jk kL1 jk
The role of complete integrability is to force joint eigenfunctions to localize on level sets of the moment map and thus to blow up at blow-down points. The proofs of Theorems 4 and 5 are similar. To prove the latter, by the same reasoning as in the orbit case (Theorem 3), one needs to bound from below the integral Z Bðz;h Þ
j’ j2 dVol
½19
for an appropriate subsequence of ’ s, where B(z; h ) denotes a ball of radius h centered at the blow-down point, z 2 M. The blow-down condition implies that Sz M P 1 (b) for some b 2 B. The relevant subsequence of eigenfunctions, ’ , are the ones with joint eigenvalues satisfying j( h) bj = O( h). Since the eigenfunctions ’ are microlocally concentrated on the set P 1 (b), by Ga¨rding, Z j’ j2 dVol hOph ððx; ; h ÞÞ’ ; ’ i ½20 Bðz;h Þ
where (x, , h ) is a cutoff localized on an h -neighborhood of = 1 (z) \ P 1 (b). The matrix elements on the RHS of [20] are estimated by passing to QBNF. The subtlety here lies in the choice of scale, . For 0 < < 1=2, the h-pseudodifferential operators Oph ((x, ; h )) are contained in a standard calculus (Dimassi and Sjoestrand 1999) and so they automatically satisfy the h-Egorov theorem. In particular, the passage to normal form by conjugating with the U ’s is automatic. The crucial point here is that to obtain the (near)-maximal blow-up near a blow-down point z 2 M, one needs to able to choose 0 < 1. Using second-microlocal methods similar to the ones in Sjoestrand and Zworski (1999), it is shown in Sogge et al. (2005) that the blow-down geometry implies that the microlocal cutoffs are contained in an h-pseudodifferential operator calculus and, in particular, the relevant h-Egorov theorem needed to pass to QBNF is satisfied for any 0 < 1. Then, by explicit computation for the model eigenfunctions, one can show that Oph ððx; ; h ÞÞ’ ; ’ i h
½21
for any with 0 < < 1. The result in Theorem 5 then follows from the bound k’ k2L1 VolðBðz; h1 ÞÞ h
½22
where one takes arbitrarily close to 1. By analyzing the U s carefully (Sogge et al. 2005), the lower bound in Theorem 5 can be improved slightly by replacing the by j log j for some > 0, although the sharp constant, > 0, appears to be difficult to determine in general. In cases where the geometry of the first-return map, GTz , is particularly simple, one can sometimes get sharp j log j-power improvements in Theorem 5 (see Theorem 4 (iii)).
Eigenfunction Upper Bounds: Quantitative Results
In light of the $\Omega$-bounds in Theorem 5, it is natural to ask whether there are analogous upper bounds for $L^\infty(\lambda; g)$ in the QCI case. The following result holds in the case of real-analytic surfaces:
Theorem 6 (Sogge et al. 2005). Let (M, g) be a pffiffiffiffi real-analytic Riemannian 2-manifold and P1 = and P2 be a QCI system on (M, g) where, the principal symbol, p2 , of P2 is a metric form on T M. (i) If M ffi T 2 , L1 ð; gÞ ¼ Oð1=4 Þ (ii) If M ffi S2 , let Mrec be the set of completely recurrent points for the geodesic flow, Gt : T M ! T M and let rec M be an open neighborhood of Mrec . Then, L1 ð; gÞjMrec ¼ Oð1=4 Þ An old result of Kozlov says that if the surface (M, g) is analytic, then topologically either M ffi S2 or M ffi T 2 , so that the estimates in Theorem 6 cover all possible cases in two dimensions. The assumptions in Theorem 6 are satisfied in many examples including surfaces of revolution, Liouville surfaces, and ellipsoids with distinct axes in R3 . The proof of Theorem 6 follows from a pointwise (joint) trace formula argument (Duistermaat and Guillemin 1975). Namely, in Sogge et al. (2005), it is shown that if there are no blow-down points for Gt , then for appropriate 2 S(R) with 0 and ˆ 2 C1 0 (R), 1 h i h i X ð1Þ ð2Þ h1 j ðhÞ b1 h1 j ðhÞ b2 j¼1
Þj2 ¼ Oðh1=2 Þ j’ ðx; h
½23
where the estimate in [23] is uniform in x 2 M and locally uniform in b = (b1 , b2 ) 2 B. Part (ii) follows from this. To prove part (i), one applies a simple homological argument to show that if M ffi T 2 , there cannot exist blow-down points for the geodesic flow (see also Sogge and Zelditch (2002)).
Open Problems
Most questions related to eigenfunction blow-up are completely open and general results are rare (Sogge and Zelditch 2002). Specific results/conjectures in the ergodic case can be found in Quantum Ergodicity and Mixing of Eigenfunctions. We would like to point out here some specific questions related to the above results in the QCI case:
1. All the known examples with blow-down points turn out to be integrable. Is this necessarily always the case?
2. Does the maximal bound $L^\infty(\lambda; g) = \Omega(\lambda^{(n-1)/2})$ necessarily imply that (M, g) is QCI?
3. At the other extreme, does the minimal bound $L^\infty(\lambda; g) = O(1)$ necessarily imply that (M, g) is flat, or do there exist nonflat manifolds (which are necessarily not QCI) satisfying $L^\infty(\lambda; g) = O(1)$?
Acknowledgment
The research of J A Toth was partially supported by NSERC grant OGP0170280 and a William Dawson Fellowship.
See also: Functional Equations and Integrable Systems; Quantum Ergodicity and Mixing of Eigenfunctions.
Further Reading
Colin de Verdiere Y and Parisse B (1994a) Equilibre instable en régime semi-classique I: concentration microlocale. Communications in Partial Differential Equations 19: 1535–1563.
Colin de Verdiere Y and Parisse B (1994b) Equilibre instable en régime semi-classique II: conditions de Bohr–Sommerfeld. Ann. Inst. Henri Poincaré. Phys. Theor. 61(3): 347–367.
Colin de Verdiere Y and Vu Ngoc S (2003) Singular Bohr–Sommerfeld rules for 2D integrable systems. Annales Scientifiques de L’École Normale Supérieure 4(36): 1–55.
Colin de Verdiere Y (1977) Quasi-modes sur les variétés Riemanniennes compactes. Inventiones Mathematicae 43: 15–52.
Dimassi M and Sjoestrand J (1999) Spectral Asymptotics in the Semi-Classical Limit. London Math. Soc. Lecture Notes, vol. 268. Cambridge: Cambridge University Press.
Duistermaat J and Guillemin V (1975) The spectrum of positive elliptic operators and periodic bicharacteristics. Inventiones Mathematicae 29: 39–79.
Eliasson LH (1984) Hamiltonian Systems with Poisson Commuting Integrals. Ph.D. thesis, University of Stockholm.
Guillemin V (1996) Wave-trace invariants. Duke Mathematical Journal 83(2): 287–352.
Helffer B and Sjoestrand J (1990) Semiclassical analysis of Harper’s equation III. Bull. Soc. Math. France, Mémoire No. 39 (1990).
Iatchenko A, Sjoestrand J, and Zworski M (2002) Birkhoff normal forms in semi-classical inverse problems. Mathematical Research Letters 9(2–3): 337–362.
Lerman E and Shirokova N (2002) Completely integrable torus actions on symplectic cones. Mathematical Research Letters 9(1): 105–115.
Sjoestrand J and Zworski M (1999) Asymptotic distribution of resonances for convex obstacles. Acta Mathematica 183: 191–253.
Sogge CD, Toth JA, and Zelditch S (2005) Blow-up of modes and quasimodes at blow-down points. Preprint.
Sogge CD and Zelditch S (2002) Riemannian manifolds with maximal eigenfunction growth. Duke Mathematical Journal 114(3): 387–437.
Toth JA (1996) Eigenfunction localization in the quantized rigid body. Journal of Differential Geometry 43(4): 844–858.
Toth JA and Zelditch S (2002) Riemannian manifolds with uniformly bounded eigenfunctions. Duke Mathematical Journal 111(1): 97–132.
Toth JA and Zelditch S (2003) $L^p$-norms of eigenfunctions in the completely integrable case. Annales Henri Poincaré 4: 343–368.
Vu-Ngoc S (2000) Formes normales semi-classiques des systèmes complètement intégrables au voisinage d’un point critique de l’application moment. Asymptotic Analysis 24(3–4): 319–342.
Zelditch S (1990) Quantum transition amplitudes for ergodic and completely integrable systems. Journal of Functional Analysis 94: 415–436.
Zelditch S (1998) Wave invariants for non-degenerate closed geodesics. Geometric and Functional Analysis 8: 179–217.
Eight Vertex and Hard Hexagon Models
P A Pearce, University of Melbourne, Parkville, VIC, Australia
© 2006 Elsevier Ltd. All rights reserved.
Introduction
The goal of statistical mechanics is to calculate the macroscopic properties of matter from a knowledge of the fundamental interactions between the constituent microscopic components. For simplicity, let us assume discrete states. The mathematical problem, as formulated by Gibbs, is then to calculate the partition function
$$Z_N = \sum_{\text{states}} e^{-\beta H(\sigma)}$$ [1]
where $\beta = 1/k_B T$ is the inverse temperature, $k_B$ is the Boltzmann constant, and the Hamiltonian H describes the interaction energy of the state $\sigma$ of the N constituent degrees of freedom. The formidable nature of the problem ensues from the fact that $Z_N$ is needed in the limit of an arbitrarily large system to obtain the bulk free energy $\psi(T)$ or partition function per site $\kappa$ in the thermodynamic limit
$$-\beta\psi(T) = \lim_{N\to\infty} \frac{1}{N} \log Z_N = \log \kappa$$ [2]
This limit generally exists because the free energy of a finite system is extensive, that is, it grows proportionally with the system size. Once the bulk free energy is known, the other thermodynamic potentials are obtained, in principle, by taking derivatives with respect to the temperature T and other thermodynamic fields such as the volume V or the external magnetic field h. Phase transitions and the accompanying critical phenomena are associated with singularities of the bulk free energy as a function of the thermodynamic fields. Up until the beginning of the 1970s, there were
only a handful of two-dimensional lattice models that had yielded exact solutions, most notably, the Ising model (free-fermion or dimer model), the spherical model, and the square ice and six-vertex models. This situation changed dramatically with Baxter’s solution of the eight-vertex and hard-hexagon models. The methods developed by Baxter make it possible to solve an infinite plethora of two-dimensional lattice models. In this article, we compare and contrast the remarkable properties of these two prototypical models that played such a pivotal role in the emergence of the modern theory of Yang–Baxter integrability.
Definition of the Models
Eight-Vertex Model
The eight-vertex model emerged from the study of two-dimensional ferroelectrics. The local degrees of freedom are arrow states $\alpha, \beta, \gamma, \delta = \pm 1$ which live on the edges of the elementary faces of the square lattice and describe the local polarization within the ferroelectric material. Of the 16 possible configurations around a face, the local configurations of an elementary square face are restricted to the eight configurations shown in Figure 1. The partition function is
$$Z_N = \sum_{\text{arrow states}} \prod_{\text{faces}} W\!\begin{pmatrix} & \gamma & \\ \delta & & \beta \\ & \alpha & \end{pmatrix}$$ [3]
where the Boltzmann face weights are given alternative graphical representations as a face or vertex
[Equation [4]: diagrams of the face weight W drawn as a face and as a vertex, each carrying the arrow labels $\alpha$, $\beta$, $\gamma$, $\delta$.]
Figure 1 The eight vertex configurations of the eight-vertex model, with weights $\omega_1, \ldots, \omega_8$, showing one of the two corresponding configurations of the related Ising model. The model is solvable in the symmetric case, $\omega_1 = \omega_5$, $\omega_2 = \omega_6$, $\omega_3 = \omega_7$, $\omega_4 = \omega_8$, when the Boltzmann weights are equal in pairs under arrow reversal.
In the face representation, the arrow states are often called bond variables. Formally, the Hamiltonian is a sum over local energies $H = \sum_{\text{faces}} E(\alpha, \beta, \gamma, \delta)$, where $W(\alpha, \beta, \gamma, \delta) = \exp(-\beta E(\alpha, \beta, \gamma, \delta))$ (the prefactor $\beta$ in the exponent being the inverse temperature), but we use face weights since E is infinite for excluded configurations. The general eight-vertex model includes many other ferroelectric models including the rectangular Ising model, Slater’s model of potassium dihydrogen phosphate (KDP), the Rys F model of an antiferroelectric, the square ice model and the six-vertex model solved by Lieb. In the case of the six-vertex model, $\omega_4 = \omega_8 = 0$, so arrows are conserved with "two in" and "two out" at each vertex. The eight-vertex model can be formulated as an Ising model with spins $a, b, c, d = \pm 1$ at the corners of the elementary faces and Boltzmann face weights
$$W\!\begin{pmatrix} d & c \\ a & b \end{pmatrix} = R \exp(Kac + Lbd + Mabcd)$$ [5]
The four independent vertex weights are related to R, K, L, M by
$$\omega_1 = \omega_5 = R\,e^{K+L+M}, \qquad \omega_2 = \omega_6 = R\,e^{-K-L+M}$$
$$\omega_3 = \omega_7 = R\,e^{K-L-M}, \qquad \omega_4 = \omega_8 = R\,e^{-K+L-M}$$ [6]
This is not the usual rectangular Ising model since it involves four-spin interactions in addition to two-spin interactions. The spins and arrows are related by
$$\alpha = ab, \qquad \beta = bc, \qquad \gamma = cd, \qquad \delta = da$$ [7]
This mapping is one-to-two, since we can arbitrarily fix one spin somewhere on the lattice. It follows that $Z_{\text{Ising}} = 2 Z_{\text{vertex}}$. The eight-vertex model obviously includes the six-vertex ($\omega_4 = \omega_8 = 0$) and the rectangular Ising models (M = 0). Although it is not at all obvious, the three-spin Ising model is also included as a special case (K = M, L = 0). Notice that the eight-vertex face weights are invariant under spin reversal of the spins on either diagonal. This $Z_2 \times Z_2$ symmetry, which the eight-vertex model shares with the Ashkin–Teller model, is peculiar because it allows the model to exhibit continuously varying critical exponents. Because of symmetries and duality, it is sufficient to consider the regime $\omega_1 > \omega_2 + \omega_3 + \omega_4$ with $\omega_2, \omega_3, \omega_4 > 0$. In terms of spins, this corresponds to the
ferromagnetically ordered phase; in terms of vertices or arrows, this corresponds to the ferroelectric phase. The eight-vertex model is critical on the four surfaces
$$\omega_1 = \omega_2 + \omega_3 + \omega_4, \qquad \omega_2 = \omega_1 + \omega_3 + \omega_4$$
$$\omega_3 = \omega_1 + \omega_2 + \omega_4, \qquad \omega_4 = \omega_1 + \omega_2 + \omega_3$$ [8]
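A convenient parameter to measure the departure from criticality $t = (T - T_c)/T_c$ is
$$t = -\frac{1}{16\,\omega_1\omega_2\omega_3\omega_4}\,[(\omega_1 - \omega_2 - \omega_3 - \omega_4)(\omega_1 - \omega_2 + \omega_3 + \omega_4)(\omega_1 + \omega_2 - \omega_3 + \omega_4)(\omega_1 + \omega_2 + \omega_3 - \omega_4)]$$ [9]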
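The relation between the spin and arrow formulations in [5]–[7] can be checked by brute-force enumeration over a single face. The sketch below is an illustration only: the numerical values of R, K, L, M are arbitrary, and the function names are hypothetical. It confirms that the 16 spin configurations of a face produce exactly eight arrow configurations, each from two spin configurations related by a global flip, and that only the four weights of [6] occur.

```python
import math
from itertools import product

# Arbitrary illustrative couplings (not values from the article).
R, K, L, M = 1.0, 0.7, 0.4, 0.2

def face_weight(a, b, c, d):
    """Spin face weight of [5]: R exp(K a c + L b d + M a b c d)."""
    return R * math.exp(K * a * c + L * b * d + M * a * b * c * d)

def arrows(a, b, c, d):
    """Arrow (bond) variables of [7]: alpha = ab, beta = bc, gamma = cd, delta = da."""
    return (a * b, b * c, c * d, d * a)

# Group the 16 spin configurations of one face by their arrow configuration.
by_arrows = {}
for a, b, c, d in product((1, -1), repeat=4):
    by_arrows.setdefault(arrows(a, b, c, d), []).append(face_weight(a, b, c, d))

# The four independent vertex weights of [6].
omega = {
    1: R * math.exp(K + L + M),
    2: R * math.exp(-K - L + M),
    3: R * math.exp(K - L - M),
    4: R * math.exp(-K + L - M),
}

for config, weights in sorted(by_arrows.items()):
    print(config, weights[0])
```

Every arrow configuration that appears has an even number of reversed arrows, since the product of the four arrows in [7] is $a^2 b^2 c^2 d^2 = 1$; this is why eight of the 16 arrow configurations are excluded.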
Because of the unusual four-spin interaction, it is difficult to realize the eight-vertex model experimentally in the laboratory.
Hard-Hexagon Model
The hard-hexagon model is a two-dimensional lattice model of a gas of hard nonoverlapping particles. The particles are placed on the sites of a triangular lattice with nearest-neighbor exclusion so that no two particles are together or adjacent. Effectively, the triangular lattice is partially covered with nonoverlapping hard tiles of hexagonal shape. Let us draw the triangular lattice as a square lattice with one set of diagonals as in Figure 2. The partition function for the hard-hexagon model is
$$Z_N = \sum_{n=0}^{N} z^n g(n, N)$$ [10]
where z > 0 is the activity and g(n, N) is the number of ways of placing n particles on the N sites such that no two particles are together or adjacent. To each lattice site j, assign a spin or occupation
number $\sigma_j$; if the site is empty, $\sigma_j = 0$; if the site is full, $\sigma_j = 1$. The partition function can then be written in terms of spins as
$$Z_N = \sum_{\text{spins}} \prod_{\langle ij \rangle} z^{(\sigma_i + \sigma_j)/6} (1 - \sigma_i \sigma_j)$$ [11]
where the product is over all bonds $\langle ij \rangle$ of the triangular lattice and the sum is over all configurations of the N spins or occupation numbers $\sigma_j = 0, 1$. The exponent of z arises because the activity is shared out between the six bonds incident at each site. The remaining term, $(1 - \sigma_i \sigma_j) = 0, 1$, ensures that neighboring sites are not occupied simultaneously by excluding such terms from the sum. The activity z gives the a priori probability of finding a particle at a given site and can be written as $z = e^{\beta\mu}$, where $\mu$ is the chemical potential. The density of particles increases monotonically as the activity increases but only a third of the total lattice sites can be occupied. At low activities, there are only a few particles scattered randomly so the $S_3$ sublattice symmetry of the triangular lattice is preserved. However, at higher activities approaching the close-packing limit, there is a sudden change and one of the three sublattices is preferentially occupied so the $S_3$ sublattice symmetry is spontaneously broken. This dramatic change signals an order–disorder phase transition at some critical value $z_c$ of the activity. The system is disordered below the critical activity but is ordered above it. The fundamental problem is to obtain the statistical properties of this model such as the bulk free energy and the sublattice densities
$$\rho_k = \langle \sigma_k \rangle = \{\text{fraction of spins sitting on sublattice } k = 1, 2, 3\}$$ [12]
in the thermodynamic limit $N \to \infty$. The mean density is
$$\rho = (\rho_1 + \rho_2 + \rho_3)/3 \le 1/3$$ [13]
Assuming that sublattice k = 1 is preferentially occupied, an order parameter is defined by
$$R = \rho_1 - \rho_2$$ [14]
Figure 2 The triangular lattice drawn as a square lattice with one set of diagonals. The close-packed arrangement of particles (solid circles) fills one of the three independent sublattices. One of the nonoverlapping hard hexagons is shown shaded. At low activities, the hard hexagons are sparsely scattered on the lattice with no preferential occupation of a particular sublattice.
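The equivalence of the two forms [10] and [11] of the hard-hexagon partition function can be verified by exhaustive enumeration on a very small lattice. The sketch below assumes a 3 × 3 periodic patch of the triangular lattice, drawn as a square lattice with one set of diagonals; the lattice size, the activity value used in the usage note, and all names are illustrative choices, not taken from the article.

```python
import math
from itertools import product

L = 3  # linear size of the periodic patch (illustrative assumption)
SITES = [(x, y) for x in range(L) for y in range(L)]
N = len(SITES)

# Each site is bonded to its right, upper, and upper-right neighbours, so that
# with periodic boundaries every site has six distinct neighbours.
BONDS = [((x, y), ((x + dx) % L, (y + dy) % L))
         for (x, y) in SITES for (dx, dy) in ((1, 0), (0, 1), (1, 1))]

def configurations():
    """All assignments of occupation numbers 0 or 1 to the N sites."""
    for values in product((0, 1), repeat=N):
        yield dict(zip(SITES, values))

def g_counts():
    """g[n] = number of ways to place n particles with no two adjacent."""
    g = [0] * (N + 1)
    for sigma in configurations():
        if all(sigma[i] * sigma[j] == 0 for i, j in BONDS):
            g[sum(sigma.values())] += 1
    return g

def z_from_counts(z):
    """Partition function in the form [10]."""
    return sum(z ** n * g_n for n, g_n in enumerate(g_counts()))

def z_from_bonds(z):
    """Partition function in the bond-product form [11]."""
    total = 0.0
    for sigma in configurations():
        weight = 1.0
        for i, j in BONDS:
            weight *= z ** ((sigma[i] + sigma[j]) / 6) * (1 - sigma[i] * sigma[j])
        total += weight
    return total

print(g_counts())
print(z_from_counts(0.7), z_from_bonds(0.7))
```

On this patch at most N/3 = 3 sites can be occupied, and the three close-packed configurations are the three sublattices, in line with the bound on the mean density in [13].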
The order parameter vanishes in the disordered regime but is nonzero in an ordered regime. Notice that the symmetry between sublattices k = 2 and 3 is not broken. Unlike the eight-vertex model, the hard-hexagon model can be realized by a physical system in the laboratory, namely helium adsorbed on a graphite surface. The graphite substrate is composed of hexagonal cells formed by six carbon atoms with
an interatom distance of 2.46 Å. Energetically, the adsorbed helium atoms prefer to sit in the potential well at the center of the hexagonal cells. The diameter of the helium atom, however, is 2.56 Å, which precludes the simultaneous occupation of neighboring cells by excluded volume effects. Some beautiful experiments carried out by Bretz indicate that this system undergoes a phase transition. Indeed, Bretz took precise measurements of the specific heat as the temperature or, equivalently, the activity z, is varied, and obtained a symmetric power-law divergence at the critical point
$$C \sim |z - z_c|^{-\alpha}, \qquad \alpha \approx 0.36$$ [15]
with critical exponent close to 1/3. Of course, one does not actually see divergences experimentally. Rather, it is the presence of dramatic peaks in the specific heat that are the hallmarks of a second-order transition.

Yang–Baxter Equations and Commuting Transfer Matrices
Yang–Baxter Equations
The eight-vertex and hard-hexagon models were solved by Rodney Baxter at the beginning of the 1970s and 1980s, respectively. Although the two models are quite different in nature, they are quintessential examples of exactly solvable lattice models. The seminal work of Baxter gives a precise criterion to decide if a two-dimensional lattice model is exactly solvable: it is exactly solvable if its local face weights satisfy the celebrated Yang–Baxter equation. We present a general formulation of the Yang–Baxter equations and commuting transfer matrices and then show how Baxter implemented these for the eight-vertex and hard-hexagon models. The first important step in the exact solution of a two-dimensional lattice model is the parametrization of the Boltzmann weights in terms of a distinguished variable u called the spectral parameter. Typically, critical models involve trigonometric or hyperbolic functions and off-critical models involve elliptic functions of the spectral parameter. In terms of u, the local Boltzmann weights of a general two-dimensional lattice model take the form
$$W\!\left(\left.\begin{matrix} d & \gamma & c \\ \delta & & \beta \\ a & \alpha & b \end{matrix}\,\right|\, u\right)$$ [16]
(drawn graphically as a face with corner spins a, b, c, d, edge variables $\alpha$, $\beta$, $\gamma$, $\delta$, and the spectral parameter u at its center), where the allowed values of the spins a, b, c, . . . and arrows (or bond variables) $\alpha, \beta, \gamma, \ldots$ may be restricted by certain constraints. The spins a, b, c, d are absent for the eight-vertex model and the arrows $\alpha, \beta, \gamma, \delta$ are absent for the hard-hexagon model. The general Yang–Baxter equations take the following algebraic and graphical forms:
$$\sum_{g,\eta,\xi,\zeta} W\!\left(\left.\begin{matrix} f & \zeta & g \\ \epsilon & & \eta \\ a & \alpha & b \end{matrix}\,\right|\, u\right) W\!\left(\left.\begin{matrix} e & \delta & d \\ \mu & & \xi \\ f & \zeta & g \end{matrix}\,\right|\, v\right) W\!\left(\left.\begin{matrix} d & \gamma & c \\ \xi & & \beta \\ g & \eta & b \end{matrix}\,\right|\, v-u\right)$$
$$= \sum_{g,\eta,\xi,\zeta} W\!\left(\left.\begin{matrix} e & \eta & g \\ \mu & & \xi \\ f & \epsilon & a \end{matrix}\,\right|\, v-u\right) W\!\left(\left.\begin{matrix} g & \zeta & c \\ \xi & & \beta \\ a & \alpha & b \end{matrix}\,\right|\, v\right) W\!\left(\left.\begin{matrix} e & \delta & d \\ \eta & & \gamma \\ g & \zeta & c \end{matrix}\,\right|\, u\right)$$ [17]
[Graphical form of [17]: a hexagon with corner spins a, b, c, d, e, f and central spin g, divided into three faces with spectral parameters u, v, and v − u in two different ways.]
Graphically, this equation can be interpreted as saying that the diamond-shaped face with spectral parameter $v - u$ can be pushed through from the right to the left with the effect of interchanging the spectral parameters u and v in the remaining two faces.
Commuting Transfer Matrices
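A square lattice is built up row-by-row using the row transfer matrix T(u) with matrix elements
$$\langle a, \alpha \,|\, T(u) \,|\, c, \gamma \rangle = \sum_{\beta_1, \beta_2, \ldots, \beta_N = \pm 1} \; \prod_{j=1}^{N} W\!\left(\left.\begin{matrix} c_j & \gamma_j & c_{j+1} \\ \beta_j & & \beta_{j+1} \\ a_j & \alpha_j & a_{j+1} \end{matrix}\,\right|\, u\right)$$ [18]
[Equation [19]: the same matrix element drawn as a row of N faces, each with spectral parameter u, with $a_1, \alpha_1, a_2, \alpha_2, \ldots$ along the bottom, $c_1, \gamma_1, c_2, \gamma_2, \ldots$ along the top, and $\beta_1$ on both the left and right edges.]
Here there are N columns, and periodic boundary conditions are applied so that $a_{N+1} = a_1$, $\alpha_{N+1} = \alpha_1$, and so on. The significance of the Yang–Baxter equations is that they imply a one-parameter family of commuting transfer matrices
$$T(u)\,T(v) = T(v)\,T(u)$$ [20]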
Pictorially, the product on the left is represented by two rows, one above the other, the lower row with spectral parameter u and the upper row with spectral parameter v. The matrix product implies
that the spins and arrows on the intervening row are summed out. Inserting a diamond-shaped face with spectral parameter $v - u$ and then using the local Yang–Baxter equation to progressively push it from right to left around the period interchanges all of the spectral parameters u with the spectral parameter v. At the end, the diamond-shaped face is removed again. This heuristic argument was made rigorous by Baxter, who showed quite generally, and for the eight-vertex and hard-hexagon models in particular, that the diamond faces are in fact invertible:
d μμ γ u g −u c a α ⑀ ⑀ β g,ε,μ b b = ρ(u) δ(a, c) δ(α, β) δ(γ, δ) δ
where ðK; L; MÞ ¼ sinh 2K sinh 2L þ tanh 2M cosh 2K cosh 2L
If M and are regarded as fixed, this is seen to be a symmetric biquadratic relation between e2K and e2L and is naturally parametrized in terms of elliptic functions. Unfortunately, many different notations and conventions for these elliptic functions appear in the literature which can be confusing to the uninitiated. Let
d
½21
independent of b, d, where the scalar function $\rho(u)$ is model dependent. This equation is called the inversion relation. Invariably, the existence of commuting transfer matrices leads to functional equations satisfied by the transfer matrices. Typically, the transfer matrices can be simultaneously diagonalized and so the functional equations can be solved for the eigenvalues of the transfer matrices. Mathematically, this is where Yang–Baxter techniques derive their power. For example, building up the lattice row-by-row, we see that the partition function of an $M \times N$ lattice is
$$Z_{MN} = \operatorname{tr}\, T(u)^M = \sum_n T_n(u)^M$$ [22]
where $T_n(u)$ are the eigenvalues of T(u). Typically, by the Perron–Frobenius theorem, the largest eigenvalue $T_0(u)$ is real, positive, and nondegenerate:
$$T_0(u) > |T_1(u)| \ge |T_2(u)| \ge \cdots$$ [23]
Consequently,
$$-\beta\psi = \lim_{N\to\infty} \lim_{M\to\infty} \frac{1}{MN} \log \sum_n T_n(u)^M = \lim_{N\to\infty} \frac{1}{N} \log T_0(u)$$ [24]
Thus the calculation of the bulk free energy is reduced to the problem of finding the largest eigenvalue of the transfer matrix.
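The mechanism behind [22]–[24] can be illustrated with a much simpler toy problem than the eight-vertex row transfer matrix: the one-dimensional Ising ring, whose 2 × 2 transfer matrix has the closed-form eigenvalues 2 cosh K and 2 sinh K. The sketch below is an assumption-laden stand-in, not the model of the article: the coupling K, the ring length M, and the function names are all illustrative. It checks that the trace formula reproduces a brute-force sum over states, and that the free energy per site is governed by the largest eigenvalue.

```python
import math
from itertools import product

K = 0.5  # nearest-neighbour coupling in units of k_B T (illustrative value)

def z_brute_force(M):
    """Sum of Boltzmann weights over all 2^M spin states of a ring of M sites."""
    total = 0.0
    for spins in product((1, -1), repeat=M):
        energy = sum(spins[j] * spins[(j + 1) % M] for j in range(M))
        total += math.exp(K * energy)
    return total

def eigenvalues():
    """Eigenvalues of the transfer matrix [[e^K, e^-K], [e^-K, e^K]]."""
    return 2 * math.cosh(K), 2 * math.sinh(K)

def z_transfer(M):
    """Trace formula: Z = tr T^M = sum over eigenvalues of T_n^M."""
    return sum(t ** M for t in eigenvalues())

def log_z_per_site(M):
    """(1/M) log Z, which tends to log of the largest eigenvalue as M grows."""
    return math.log(z_transfer(M)) / M

print(z_brute_force(8), z_transfer(8))
print(log_z_per_site(200), math.log(2 * math.cosh(K)))
```

In the two-dimensional models the same logic applies with T(u) a large matrix acting on a whole row, which is why the largest eigenvalue has to be obtained from functional equations rather than by direct diagonalization.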
½26
s¼
#1 ðuÞ ; #1 ðÞ
s ¼
#1 ð uÞ ; #1 ðÞ
¼
#21 ðÞ #24 ðÞ
½27
c¼
#4 ðuÞ ; #4 ðÞ
c ¼
#4 ð uÞ ; #4 ðÞ
¼
#4 ð0Þ #4 ðÞ
½28
where #1 (u) = #1 (u, q) and #4 (u) = #4 (u, q) are standard elliptic theta functions of nome q. Then the vertex weights can be parametrized as !1 ¼ R 1 cc ;
!2 ¼ R 1 ss
!3 ¼ R 1 cs ;
!4 ¼ R 1 c s
½29
In the ferromagnetic regime u, , and are all pure imaginary with 0 < q < 1 and 0 < Im u < Im < ( =2)Im . The critical line occurs in the limit q ! 1. In this sense, we are using a low-temperature elliptic parametrization. Another elliptic parametrization, which is useful to study the critical limit, is obtained by transforming to the conjugate nome q0 . If q = e then the conjugate nome is defined by q0 = e = so that q0 ! 0 as q ! 1. We regard the crossing parameter as constant, u as a variable, and write the transfer matrix as T(u). It follows from this parametrization that M and are constants, independent of u. Furthermore, any two transfer matrices T(u) and T(v) commute and hence T(u) is a one-parameter family of commuting transfer matrices. For interest, we point out that the integrable XYZ quantum spin chain belongs to this family. Specifically, the logarithmic derivative of the eight-vertex transfer matrix yields d ½log TðuÞu¼0 ¼ HXYZ du
½30
N 1X x Jx jx jþ1 þ Jy yj yjþ1 þ Jz zj zjþ1 2 j¼1
½31
where Parametrization of the Eight-Vertex Model
HXYZ
Using the spin formulation of the eight-vertex model, Baxter showed that two transfer matrices T(K, L, M), T(K0 , L0 , M0 ) commute whenever ðK; L; MÞ ¼ ðK0 ; L0 ; M0 Þ
½25
¼
and jx , jy , jz are the usual Pauli spin matrices.
Parametrization of the Hard-Hexagon Model
Actually, Baxter did not solve the hard-hexagon model directly. Instead, he solved a generalized hard-hexagon model, which is a model of hard squares with interactions along the diagonals of the elementary squares as shown in Figure 3. This in turn corresponds to the $A_4$ case of the more general solvable $A_L$ restricted solid-on-solid (RSOS) models of Andrews, Baxter, and Forrester. The face weights of the generalized hard-hexagon model are
$W\begin{pmatrix} d & c \\ a & b \end{pmatrix} = m\, z^{(a+b+c+d)/4}\, t^{-a+b-c+d}\, (1-ab)(1-bc)(1-cd)(1-da)\, \exp(Lac + Mbd)$   [32]
Here the activity $z$ has been shared out between the four faces adjacent to a site, $m$ is a trivial normalization constant, and $t$ is a gauge parameter that cancels out of the partition function and transfer matrix. The anisotropy between $L$ and $M$ introduces an additional parameter which will play the role of the spectral parameter $u$. In fact, using the Yang–Baxter equation, Baxter showed that this model is exactly solvable on the manifold
$z = (1 - e^{-L})(1 - e^{-M}) / (e^{L+M} - e^{L} - e^{M})$   [33]
Specifically, two transfer matrices $T(z, L, M)$ and $T(z', L', M')$ commute whenever
$\Delta(z, L, M) = \Delta(z', L', M'), \qquad \Delta(z, L, M) = z^{-1/2}\,(1 - z e^{L+M})$   [34]
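The solvability condition [33] and the invariant of [34] are easy to evaluate numerically. The following is a minimal sketch, assuming the readings $z = (1 - e^{-L})(1 - e^{-M})/(e^{L+M} - e^{L} - e^{M})$ and $\Delta = z^{-1/2}(1 - z e^{L+M})$; the function names are illustrative and not taken from the article.

```python
import math


def activity_on_manifold(L: float, M: float) -> float:
    """Activity z fixed by the solvability condition [33] for given L and M."""
    numerator = (1.0 - math.exp(-L)) * (1.0 - math.exp(-M))
    denominator = math.exp(L + M) - math.exp(L) - math.exp(M)
    return numerator / denominator


def delta_invariant(z: float, L: float, M: float) -> float:
    """Invariant Delta of [34]; real-valued only for z > 0."""
    return (1.0 - z * math.exp(L + M)) / math.sqrt(z)


if __name__ == "__main__":
    L, M = 1.0, 1.0
    z = activity_on_manifold(L, M)
    print("z =", z, " Delta =", delta_invariant(z, L, M))
```

Both expressions are symmetric under the exchange of $L$ and $M$, which is the symmetry of the biquadratic relation mentioned in the text. Two transfer matrices on the manifold would then be expected to commute when their values of `delta_invariant` agree.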
Figure 3 Interacting hard squares showing the diagonal interactions $L$ and $M$. The hard-hexagon model corresponds to the limit $L = 0$, $M = -\infty$.
The hard-hexagon model is recovered in the limit $L = 0$, $M = -\infty$, which forbids simultaneous occupation of sites joined by one set of diagonals. In this special limit, the activity $z$ is unconstrained. It is curious to note that the pure hard-square model with $L = M = 0$ is not solvable. Eliminating $z$ between the above relations gives a symmetric biquadratic relation between $e^L$ and $e^M$,
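which is naturally parametrized in terms of elliptic functions. Choosing $m$ and $t$ appropriately, the Boltzmann weights are
$W\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = \frac{\theta(2\lambda + u)}{\theta(2\lambda)}$
$W\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = W\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \frac{\theta(u)}{[\theta(\lambda)\theta(2\lambda)]^{1/2}}$
$W\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = W\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \frac{\theta(\lambda - u)}{\theta(\lambda)}$   [35]
$W\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \frac{\theta(2\lambda - u)}{\theta(2\lambda)}$
$W\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{\theta(\lambda + u)}{\theta(\lambda)}$
Here the crossing parameter is $\lambda = \pi/5$, $-\lambda < u < 2\lambda$, and
$\theta(u) = \theta(u, q^2) = \sin u \prod_{n=1}^{\infty} (1 - q^{2n} e^{2iu})(1 - q^{2n} e^{-2iu})(1 - q^{2n})$
[36]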
is a nonstandard elliptic theta function of nome q2 . Despite the deceiving notation, the nome q2 lies in the range 1 < q2 < 1 and is determined by the relation ðÞ 5 2 ¼ ¼ zð1 zeLþM Þ2 ½37 ð2Þ Regarding q2 as fixed and u as a variable, it follows that T(u) is a one-parameter family of commuting transfer matrices. The regimes relevant to the hard-hexagon model are: Regime I (disordered) :
1 < q2 < 0; c > ½1 2q2n cosð2 =5Þ þ q4n 3 ð1 p5n=3 Þ3 ð1 p10ð2n1Þ Þ < n¼1
z zc
1 > Y > ½1 2q2n cosð4 =5Þ þ q4n 2 ð1 q2n Þ2 ð1 p5n Þ > > ; > : c ½1 2q2n cosð2 =5Þ þ q4n 3 ð1 p5n=3 Þ3
z > zc
n¼1
½49
½51
"
pffiffiffi #1=2 27ð25 þ 11 5Þ c ¼ 250 pffiffiffi 8 1 Y 1 3s2n1 þ s4n2 > > > pffiffiffi ; > 2n1 þ s4n2 < n¼1 1 þ 3s 1 " #2 e ¼
1 Y > 1 2s2n1 cos 2 þ s4n2 > 3 >
> min ; : 0 zc
3
n¼1
½53
¼
0; 1 2 kB T=;
z zc z > zc
½54
It follows that (z), (z), and (z) are analytic functions of z, except at the critical point z = zc . The associated critical exponents ðz zc Þ2 ; ðz zc Þ ;
ðz zc Þ ¼ 1=3;
¼ ¼ 5=6
½55
agree with experiments on helium adsorbed on graphite.
The one-point functions and order parameters of the eight-vertex and hard-hexagon models were obtained by Baxter by using corner transfer matrices (CTMs). The idea is to build up the square lattice quadrant-by-quadrant as shown in Figure 4. The partition function and one-point function are then tr SABCD h1 i ¼ tr ABCD
½56
where S is the diagonal matrix with entries S, = 0 and the entries A, 0 are labeled by half-rows of spins = (0 , 1 , 2 , . . . ) and 0 = (0 , 01 , 02 , . . . ).
½57
where A(u) is a commuting family of matrices. Since these are block matrices in the center spin 0 , they also commute with S. Moreover, Baxter showed that the eigenvalues of A(u) are exponentials of the form AðuÞ ¼ m expðuE Þ
½58
where the constants m and E can be evaluated in the low-temperature limit. It follows that P 4 2E 0 m e h0 i ¼ P ½59 4 2E m e When the Boltzmann weights do not exhibit symmetry about the diagonals, which is the case for hard hexagons, the above arguments need to be modified. One-Point Functions of the Eight-Vertex Model
For the eight-vertex model, Baxter showed that m ¼ 1;
Corner Transfer Matrices
Z ¼ tr ABCD;
The CTMs have some remarkable properties. If the Boltzmann weights are invariant under reflections about the diagonals, as is the case for the eight-vertex model, Baxter argued that, in the limit of a large lattice,
N 1 X E ¼ i jHðj1 ; j ; jþ1 Þ 2 j¼1
Hðj1 ; j ; jþ1 Þ ¼ 1 j1 jþ1
½60
subject to the boundary condition N = Nþ1 = þ1. Introducing a new set of spins j ¼ 1; 2; . . . ; N
j ¼ j1 jþ1 ;
½61
we have 0 = 1 3 5 . . . . Setting s = (xz)1=2 = e iu=2 , t = (x=z)1=2 = e i(u)=2 and taking the limit of large N, the diagonalized matrices are direct products of 2 2 matrices: 1 0 1 0 1 0 S¼ ½62 0 1 0 1 0 1 AðuÞ ¼ CðuÞ 1 0 1 ¼ 0 s 0
0 s2
BðuÞ ¼ DðuÞ 1 0 1 ¼ 0 t 0
0 t2
1 0
0 s3
1 0
0 t3
½63
½64
Figure 4 The square lattice divided into four quadrants corresponding to the CTMs A, B, C, D. The spin at the center is $\sigma_0$. The spins on the boundaries are fixed by the boundary conditions.
It follows that the magnetization is
$\langle \sigma_0 \rangle = \prod_{n=1}^{\infty} \frac{1 - x^{4n-2}}{1 + x^{4n-2}} = (k')^{1/4} = (1 - k^2)^{1/8}$
[65]
where $k' = k'(x^2)$ is the conjugate elliptic modulus of nome $x^2$ and the associated critical exponent is
$\langle \sigma_0 \rangle \sim (-t)^{\beta}, \qquad \beta = \pi/16\lambda$
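The equality between the infinite product for the magnetization and $(k')^{1/4}$ can be checked numerically. The sketch below assumes the product has exponent $4n - 2$ in both numerator and denominator, which is what the identity with $(k')^{1/4}$ requires, and computes $k'$ from the standard Jacobi theta series $k' = [\vartheta_4(0, q)/\vartheta_3(0, q)]^2$ at nome $q = x^2$. The function names and truncation orders are illustrative choices.

```python
import math


def magnetization_product(x: float, terms: int = 200) -> float:
    """Truncated product prod_n (1 - x^(4n-2)) / (1 + x^(4n-2))."""
    prod = 1.0
    for n in range(1, terms + 1):
        xn = x ** (4 * n - 2)
        prod *= (1.0 - xn) / (1.0 + xn)
    return prod


def conjugate_modulus(q: float, terms: int = 50) -> float:
    """Conjugate modulus k' of nome q from the theta series at zero argument."""
    theta3 = 1.0 + 2.0 * sum(q ** (n * n) for n in range(1, terms + 1))
    theta4 = 1.0 + 2.0 * sum((-1) ** n * q ** (n * n) for n in range(1, terms + 1))
    return (theta4 / theta3) ** 2


def beta_exponent(lam: float) -> float:
    """Magnetization exponent beta = pi / (16 lambda)."""
    return math.pi / (16.0 * lam)


if __name__ == "__main__":
    x = 0.5
    print(magnetization_product(x), conjugate_modulus(x * x) ** 0.25)
    print(beta_exponent(math.pi / 2))  # Ising point lambda = pi/2
```

At $\lambda = \pi/2$ the exponent reduces to the Ising value $1/8$.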
After applying some Rogers–Ramanujan identities and introducing the elliptic functions
½66
QðxÞ ¼
The polarization of the eight-vertex model is hi ¼ h0 1 i ¼
1 Y n¼1
One-Point Functions of the Hard-Hexagon Model
For hard hexagons, the working is more complicated because one must keep track of the sublattice of the central spin 0 , but fascinating connections emerge with the Rogers–Ramanujan functions: 1 Y n¼1 1 Y
1 Y QðxÞ ¼ ð1 x2n1 Þ PðxÞ ¼ Qðx2 Þ n¼1
½67
This cannot be obtained by a direct application of CTMs but was conjectured by Baxter and Kelland and subsequently derived by Jimbo, Miwa, and Nakayashiki using difference equations.
GðxÞ ¼
ð1
x5n1 Þ
1 HðxÞ ¼ 5n3 ð1 x Þð1 x5n2 Þ n¼1
½68
¼ 1 ¼ 2 ¼ 3 ¼
1 ¼
HðxÞQðxÞ½GðxÞQðxÞ þ x2 Hðx9 ÞQðx9 Þ Qðx3 Þ2
x2 HðxÞHðx9 ÞQðxÞQðx9 Þ
2 ¼ 3 ¼ Qðx3 Þ2
R ¼ 1 2 ¼ ¼
½69
where k = 1, 2, 3 labels the sublattice of the triangular lattice. Here the spin configurations = (0 , 1 , 3 , . . . ) with j = 0, 1 are subject to the constraint j jþ1 = 0 for all j. If jq2 j = e and g(x) = H(x)=G(x) then x ¼ e 2
x ¼ e4
2
; r20 ¼ x=gðxÞ; w0 ¼ x3
for z zc
2 1 3=2 ; r0 ¼ x gðxÞ; w0 ¼ x
for z > zc
=5
=5
½70 and 8P 1 > jðj sj Þ; > > > < j¼1 1 E ¼ P jðj j1 jþ1 > > > j¼1 > : sj þ sj1 sjþ1 Þ;
z zc ½71 z > zc
For large N, j ! sj , where the ground-state values sj determined by the boundary conditions are z zc :
sj ¼ 0
½72
z > zc :
s3jþk ¼ 1; s3jþk1 ¼ 0; k ¼ 1; 2; 3
½73
xGðxÞHðx6 ÞPðx3 Þ ; z zc Pðx2 Þ
½75
in the disordered fluid phase and
QðxÞQðx5 Þ Qðx3 Þ2
1 Y ð1 xn Þð1 x5n Þ n¼1
For hard hexagons, Baxter showed that P 20 2E tr SðAk Bk Þ2 0 r0 w0
k ¼ ¼ P 20 2E tr ðAk Bk Þ2 r0 w0
½74
the expressions for the sublattice densities simplify in the limit of large N giving
1 x5n4 Þð1
ð1 xn Þ
n¼1
n 2
1 x2n 1 þ q 1 þ x2n 1 qn
1 Y
ð1 x3n Þ2
½76
½77 z > zc
;
in the triangular ordered phase. In principle, the dependence on $x$ can be eliminated by observing that
$z = \begin{cases} -x\,[H(x)/G(x)]^5, & z \le z_c \\ x^{-1}\,[G(x)/H(x)]^5, & z > z_c \end{cases}$
[78]
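The parametrization [78] can be explored numerically. The sketch below assumes that $G$ and $H$ are the standard Rogers–Ramanujan functions, $G(x) = \prod_{n \ge 1} [(1 - x^{5n-4})(1 - x^{5n-1})]^{-1}$ and $H(x) = \prod_{n \ge 1} [(1 - x^{5n-3})(1 - x^{5n-2})]^{-1}$, that $-1 < x < 0$ in the disordered regime and $0 < x < 1$ in the ordered regime, and that $z_c = (11 + 5\sqrt{5})/2$. It also checks the product forms against the series side of the Rogers–Ramanujan identities. All function names are illustrative.

```python
import math


def G_product(x: float, terms: int = 200) -> float:
    """Rogers-Ramanujan G(x) as a truncated infinite product."""
    prod = 1.0
    for n in range(1, terms + 1):
        prod /= (1.0 - x ** (5 * n - 4)) * (1.0 - x ** (5 * n - 1))
    return prod


def H_product(x: float, terms: int = 200) -> float:
    """Rogers-Ramanujan H(x) as a truncated infinite product."""
    prod = 1.0
    for n in range(1, terms + 1):
        prod /= (1.0 - x ** (5 * n - 3)) * (1.0 - x ** (5 * n - 2))
    return prod


def G_series(x: float, terms: int = 60) -> float:
    """Series side of the first identity: sum_n x^(n^2) / ((1-x)...(1-x^n))."""
    total, denom = 0.0, 1.0
    for n in range(terms):
        if n > 0:
            denom *= 1.0 - x ** n
        total += x ** (n * n) / denom
    return total


def H_series(x: float, terms: int = 60) -> float:
    """Series side of the second identity: sum_n x^(n(n+1)) / ((1-x)...(1-x^n))."""
    total, denom = 0.0, 1.0
    for n in range(terms):
        if n > 0:
            denom *= 1.0 - x ** n
        total += x ** (n * (n + 1)) / denom
    return total


def activity(x: float) -> float:
    """Activity z from [78]; x in (-1, 0) for z <= z_c, x in (0, 1) for z > z_c."""
    g, h = G_product(x), H_product(x)
    if x < 0.0:
        return -x * (h / g) ** 5
    return (g / h) ** 5 / x


Z_CRITICAL = (11.0 + 5.0 * math.sqrt(5.0)) / 2.0

if __name__ == "__main__":
    for x in (-0.4, 0.3):
        print(x, activity(x), Z_CRITICAL)
```

The convergence of the truncated products slows as $|x| \to 1$, which is one practical reason why eliminating $x$ near the critical point is delicate.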
In practice, this is quite nontrivial. Although it is far from obvious, because x ! 1 is a subtle limit, the critical exponent associated with the order parameter R is R ðz zc Þ ðq2 Þ ;
¼ 1=9
½79
Summary
Baxter’s exact solutions of the eight-vertex and hard-hexagon models have been reviewed. These prototypical examples clearly illustrate the mathematical power and elegance of commuting transfer matrices and Yang–Baxter techniques. The results for the principal thermodynamic quantities, including free energies, correlation lengths, interfacial tensions, and one-point functions, have been summarized. For convenience in comparison, the associated critical exponents are collected in Table 1. All these exponents confirm the hyperscaling relation $2 - \alpha = d\nu$ for lattice dimensionality $d = 2$.

Table 1 Comparison of the exactly calculated critical exponents of the rectangular Ising, eight-vertex and hard-hexagon models. The rectangular Ising model corresponds to the special case $\lambda = \pi/2$ of the eight-vertex model. The eight-vertex exponents vary continuously with $0 < \lambda < \pi$. The critical exponents of the hard-hexagon model, with its $S_3$ symmetry, lie in the universality class of the three-state Potts model.
Model               $\alpha$            $\beta$            $\nu$            $\mu$
Rectangular Ising   $0_{\log}$          $1/8$              $1$              $1$
Eight vertex        $2 - \pi/\lambda$   $\pi/16\lambda$    $\pi/2\lambda$   $\pi/2\lambda$
Hard hexagons       $1/3$               $1/9$              $5/6$            $5/6$

More recently, Yang–Baxter techniques have been applied to solve an infinite variety of lattice models in two dimensions. Commuting transfer methods have also been adapted to study integrable boundaries and associated boundary critical behavior. Lastly, it should be mentioned that, in the continuum scaling limit, there are deep connections with conformal field theory and integrable quantum field theory. On the one hand, the lattice can often provide a convenient way to regularize the infinities that occur in these continuous field theories. On the other hand, the field theories can predict and explain the universal properties of lattice models such as critical exponents.
See also: Bethe Ansatz; Boundary Conformal Field Theory; Hopf Algebras and q-Deformation Quantum Groups; Integrability and Quantum Field Theory; q-Special Functions; Quantum Spin Systems; Two-Dimensional Ising Model; Yang–Baxter Equations.
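The hyperscaling relation quoted in the Summary can be checked directly against the tabulated exponents. The sketch below assumes the table columns are $\alpha$, $\beta$, $\nu$, $\mu$ with the eight-vertex entries $2 - \pi/\lambda$, $\pi/16\lambda$, $\pi/2\lambda$, $\pi/2\lambda$; the dictionary layout and function names are illustrative.

```python
import math
from fractions import Fraction

# Hard-hexagon exponents, kept as exact fractions.
HARD_HEXAGON = {
    "alpha": Fraction(1, 3),
    "beta": Fraction(1, 9),
    "nu": Fraction(5, 6),
    "mu": Fraction(5, 6),
}


def eight_vertex_exponents(lam: float) -> dict:
    """Eight-vertex exponents as functions of the crossing parameter lambda."""
    return {
        "alpha": 2.0 - math.pi / lam,
        "beta": math.pi / (16.0 * lam),
        "nu": math.pi / (2.0 * lam),
        "mu": math.pi / (2.0 * lam),
    }


def hyperscaling_gap(exponents: dict, d: int = 2):
    """Return (2 - alpha) - d * nu, which vanishes when hyperscaling holds."""
    return (2 - exponents["alpha"]) - d * exponents["nu"]


if __name__ == "__main__":
    print(hyperscaling_gap(HARD_HEXAGON))
    print(hyperscaling_gap(eight_vertex_exponents(math.pi / 2)))  # Ising point
```

For the eight-vertex model the gap vanishes identically in $\lambda$, since $2 - \alpha = \pi/\lambda = 2\nu$.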
Further Reading
Andrews GE, Baxter RJ and Forrester PJ (1984) Eight-vertex SOS model and generalized Rogers–Ramanujan-type identities. Journal of Statistical Physics 35: 193.
Baxter RJ (1971) 8 Vertex model in lattice statistics. Physical Review Letters 26: 832–833.
Baxter RJ (1972) Partition function of 8 vertex lattice model. Annals of Physics 70: 193–228.
Baxter RJ (1972) One-dimensional anisotropic Heisenberg chain. Annals of Physics 70: 323–337.
Baxter RJ (1973) 8-vertex model in lattice statistics and one-dimensional anisotropic Heisenberg chain: 1. Some fundamental eigenvectors; 2. Equivalence to a generalized ice-type lattice model; 3. Eigenvectors of transfer matrix and Hamiltonian. Annals of Physics 76: 1–24, 25–47, 48–71.
Baxter RJ (1980) Hard hexagons – exact solution. Journal of Physics A13: L61.
Baxter RJ (1981) Rogers–Ramanujan identities in the hard hexagon model. Journal of Statistical Physics 26: 427–452.
Baxter RJ (1982) Exactly Solved Models in Statistical Mechanics. London: Academic Press.
Baxter RJ (1982) The inversion relation method for some two-dimensional exactly solved models in lattice statistics. Journal of Statistical Physics 28: 1.
Baxter RJ (2001) Completeness of the Bethe ansatz for the six and eight-vertex models. Journal of Statistical Physics 108: 1–48.
Baxter RJ (2004) The six and eight-vertex models revisited. Journal of Statistical Physics 116: 43–66.
Baxter RJ and Kelland SB (1974) Spontaneous polarization of 8-vertex model. Journal of Physics C7: L403–L406.
Baxter RJ and Pearce PA (1982) Hard hexagons – interfacial tension and correlation length. Journal of Physics A15: 897–910.
Baxter RJ and Pearce PA (1983) Hard squares with diagonal attractions. Journal of Physics A16: 2239.
Bazhanov VV and Reshetikhin NY (1989) Critical RSOS models and conformal field theory. International Journal of Modern Physics A4: 115.
Bretz M (1977) Ordered helium films on highly uniform graphite – finite-size effects, critical parameters, and 3-state Potts model. Physical Review Letters 38: 501–505.
Fabricius K and McCoy BM (2003) New developments in the eight vertex model. Journal of Statistical Physics 111: 323–337.
Fabricius K and McCoy BM (2004) Functional equations and fusion matrices for the eight-vertex model. Publications of the Research Institute for Mathematical Sciences 40: 905–932.
Forrester PJ and Baxter RJ (1985) Further exact solutions of the eight-vertex SOS model and generalizations of the Rogers–Ramanujan identities. Journal of Statistical Physics 38: 435–472.
Jimbo M (1990) Yang–Baxter Equation in Integrable Systems. Advanced Series in Mathematical Physics, vol. 10. Singapore: World Scientific.
Jimbo M and Miwa T (1995) Algebraic Analysis of Solvable Lattice Models. Regional Conference Series in Mathematics, No. 85. Providence: American Mathematical Society.
Johnson JD, Krinsky S and McCoy BM (1973) Vertical arrow correlation length in 8-vertex model and low-lying excitations of XYZ Hamiltonian. Physical Review A 8: 2526–2547.
Klümper A and Pearce PA (1991) Analytic calculation of scaling dimensions: tricritical hard squares and critical hard hexagons. Journal of Statistical Physics 64: 13.
Klümper A and Pearce PA (1992) Conformal weights of RSOS lattice models and their fusion hierarchies. Physica A 183: 304.
Lieb EH (1967) Residual entropy of square ice. Physical Review 162: 162.
Lieb EH (1967) Exact solution of F model of an antiferroelectric. Physical Review Letters 18: 1046.
Lieb EH (1967) Exact solution of 2-dimensional Slater KDP model of a ferroelectric. Physical Review Letters 19: 108.
Lieb EH and Wu FY (1972) Two-dimensional ferroelectric models. In: Domb C and Green MS (eds.) Phase Transitions and Critical Phenomena, vol. 1, pp. 321–490. London: Academic Press.
McCoy BM (1999) The Baxter revolution. The Physicist 36(6): 210–214.
Pearce PA and Baxter RJ (1984) Deviations from critical density in the generalized hard hexagon model. Journal of Physics A17: 2095–2108.
Einstein Equations: Exact Solutions
Jiří Bičák, Charles University, Prague, Czech Republic and Albert Einstein Institute, Potsdam, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Even in a linear theory like Maxwell’s electrodynamics, in which sufficiently general solutions of the field equations can be obtained, one needs a good sample, a useful kit, of explicit exact fields like the homogeneous field, the Coulomb monopole field, the dipole, and other simple solutions, in order to gain a physical intuition and understanding of the theory. In Einstein’s general relativity, with its nonlinear field equations, the discoveries and analyses of various specific explicit solutions revealed most of the unforeseen features of the theory. Studies of special solutions stimulated questions relevant to more general situations, and even after the formulation of a conjecture about a general situation, newly discovered solutions can play a significant role in verifying or modifying the conjecture. The cosmic censorship conjecture assuming that "singularities forming in a realistic gravitational collapse are hidden inside horizons" is a good illustration.
Albert Einstein presented the final version of his gravitational field equations (or Einstein’s equations, EEs) to the Prussian Academy in Berlin on 25 November 1915:
$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = \frac{8\pi G}{c^4} T_{\mu\nu}$   [1]
Here, the spacetime metric tensor $g_{\mu\nu}(x^\kappa)$, $\mu, \nu, \kappa, \ldots = 0, 1, 2, 3$, determines the invariant line element $g = g_{\mu\nu}\, dx^\mu dx^\nu$, and acts also as a dynamical variable describing the gravitational field; the Ricci tensor $R_{\mu\nu} = g^{\kappa\lambda} R_{\kappa\mu\lambda\nu}$, where $g^{\mu\kappa} g_{\kappa\nu} = \delta^\mu_\nu$, is formed from the Riemann curvature tensor $R_{\kappa\mu\lambda\nu}$; both depend nonlinearly on $g_{\mu\nu}$ and $\partial_\kappa g_{\mu\nu}$, and linearly on $\partial_\kappa \partial_\lambda g_{\mu\nu}$; the scalar curvature $R = g^{\mu\nu} R_{\mu\nu}$. $T_{\mu\nu}(x^\kappa)$ is the energy–momentum tensor of matter ("sources"); and Newton’s gravitational constant $G$ and the velocity of light $c$ are fundamental constants. If not stated otherwise, we use the geometrized units in which $G = c = 1$, and the same conventions as in Misner et al. (1973) and Wald (1984). For example, in the case of perfect fluid with density $\rho$, pressure $p$, and 4-velocity $U^\mu$, the energy–momentum tensor reads $T_{\mu\nu} = (\rho + p) U_\mu U_\nu + p g_{\mu\nu}$. To obtain a (local) solution of [1] in coordinate patch $\{x^\mu\}$ means to find "physically plausible" (i.e., complying with one of the positive-energy conditions) functions $\rho(x^\mu)$, $p(x^\mu)$, $U_\mu(x^\mu)$, and metric $g_{\mu\nu}(x^\mu)$ satisfying [1]. In vacuum $T_{\mu\nu} = 0$ and [1] implies $R_{\mu\nu} = 0$. In 1917, Einstein generalized [1] by adding a cosmological term $\Lambda g_{\mu\nu}$ ($\Lambda$ = const.):
$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R + \Lambda g_{\mu\nu} = 8\pi T_{\mu\nu}$   [2]
A homogeneous and isotropic static solution of [2] (with metric [8], $k = +1$, $a$ = const.), in which the "repulsive effect" of $\Lambda > 0$ compensates the gravitational attraction of incoherent dust ("uniformly distributed galaxies") – the Einstein static universe – marked the birth of modern cosmology. Although it is unstable and lost its observational relevance after the discovery of the expansion of the universe in the late 1920s, in 2004 a "fine-tuned" cosmological scenario was suggested according to which our universe starts asymptotically from an initial Einstein static state and later enters an inflationary era, followed by a standard expansion epoch (see Cosmology: Mathematical Aspects). There are many other examples of "old" solutions which turned out to act as asymptotic states of more general classes of models.
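The balance between attraction and repulsion in the Einstein static universe can be made quantitative. The sketch below assumes the standard Friedmann equations for the FLRW metric with cosmological constant in geometrized units, $\ddot{a}/a = -(4\pi/3)(\rho + 3p) + \Lambda/3$ and $(\dot{a}/a)^2 = (8\pi/3)\rho + \Lambda/3 - k/a^2$; these equations are not written out in the article and are taken here as an assumption. For dust ($p = 0$) and $k = +1$, requiring both right-hand sides to vanish gives $\Lambda = 4\pi\rho$ and $a = \Lambda^{-1/2}$.

```python
import math


def einstein_static(rho: float):
    """Return (Lambda, a) of the static dust universe with k = +1 (G = c = 1)."""
    cosmological_constant = 4.0 * math.pi * rho
    radius = 1.0 / math.sqrt(cosmological_constant)
    return cosmological_constant, radius


def friedmann_residuals(rho: float, cosmological_constant: float, a: float,
                        k: int = 1, p: float = 0.0):
    """Right-hand sides of the two Friedmann equations; both vanish when static."""
    acceleration = -(4.0 * math.pi / 3.0) * (rho + 3.0 * p) + cosmological_constant / 3.0
    hubble_squared = (8.0 * math.pi / 3.0) * rho + cosmological_constant / 3.0 - k / a ** 2
    return acceleration, hubble_squared


if __name__ == "__main__":
    lam, a = einstein_static(0.01)
    print(lam, a, friedmann_residuals(0.01, lam, a))
```

The static solution requires an exact tuning of $\Lambda$ to the dust density, which is one way to see why it is unstable against small perturbations of $\rho$ or $a$.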
Invariant Characterization and Classification of the Solutions
Algebraic Classification
The Riemann tensor can be decomposed as
$R_{\alpha\beta\gamma\delta} = C_{\alpha\beta\gamma\delta} + E_{\alpha\beta\gamma\delta} + G_{\alpha\beta\gamma\delta}$   [3]
where $E_{\alpha\beta\gamma\delta}$ and $G_{\alpha\beta\gamma\delta}$ are constructed from $R_{\alpha\beta}$, $R$, and $g_{\alpha\beta}$ (see, e.g., Stephani et al. (2003)); the Weyl conformal tensor $C_{\alpha\beta\gamma\delta}$ can be considered as the "characteristic of the pure gravitational field" since, at a given point, it cannot be determined in terms of the matter energy–momentum tensor $T_{\alpha\beta}$ (as $E_{\alpha\beta\gamma\delta}$ and $G_{\alpha\beta\gamma\delta}$ can using EEs). Algebraic classification is based on a classification of the Weyl tensor. This is best formulated using two-component spinors $\alpha_A$ ($A = 1, 2$), in terms of which any Weyl spinor $\Psi_{ABCD}$ determining $C_{\alpha\beta\gamma\delta}$ can be factorized:
$\Psi_{ABCD} = \alpha_{(A} \beta_B \gamma_C \delta_{D)}$   [4]
Brackets denote symmetrization; each of the spinors determines a principal null direction, say, $k^\mu = \alpha^A \bar{\alpha}^{A'}$ (see Spinors and Spin Coefficients). The Petrov–Penrose classification is based on coincidences among these directions. A solution is of type I (general case), II, III, and N ("null") if all null directions are different, or two, three, and all four coincide, respectively. It is of type D ("degenerate") if there are two double null directions. The equivalent tensor equations are simplest for type N:
$C_{\alpha\beta\gamma\delta} k^\delta = 0, \qquad C_{\alpha\beta\gamma\delta} C^{\alpha\beta\gamma\delta} = 0, \qquad C_{\alpha\beta\gamma\delta} C^{*\alpha\beta\gamma\delta} = 0$   [5]
where $C^{*}_{\alpha\beta\gamma\delta} = (1/2)\, \epsilon_{\alpha\beta\rho\sigma} C^{\rho\sigma}{}_{\gamma\delta}$ and $\epsilon_{\alpha\beta\rho\sigma}$ is the Levi-Civita pseudotensor.
Classification According to Symmetries
Most of the available solutions have some exact continuous symmetries which preserve the metric. The corresponding group of motions is characterized by the number and properties of its Killing vectors $\xi^\alpha$ satisfying the Killing equation $(\mathcal{L}_\xi g)_{\alpha\beta} = \xi_{\alpha;\beta} + \xi_{\beta;\alpha} = 0$ ($\mathcal{L}_\xi$ is the Lie derivative) and by the nature (spacelike, timelike, or null) of the group orbits. For example, axisymmetric, stationary fields possess two commuting Killing vectors, of which one is timelike. Orbits of the axial Killing vector are closed spacelike curves of finite length, which vanishes at the axis of symmetry. In cylindrical symmetry, there exist two spacelike commuting Killing vectors. In both cases, the vectors generate a two-dimensional abelian group. The two-dimensional group orbits are timelike in the stationary case and spacelike in the cylindrical symmetry. If a timelike $\xi^\alpha$ is hypersurface orthogonal, $\xi_\alpha = \lambda \sigma_{,\alpha}$ for some scalar functions $\lambda$, $\sigma$, the spacetime is "static." In coordinates with $\xi = \partial_t$, the metric is
$g = -e^{2U} dt^2 + e^{-2U} \gamma_{ik}\, dx^i dx^k$   [6]
where $U$, $\gamma_{ik}$ do not depend on $t$. In vacuum, $U$ satisfies the potential equation $U^{:a}{}_{:a} = 0$; the covariant derivatives (denoted by :) are with respect to the three-dimensional metric $\gamma_{ik}$. A classical result of Lichnerowicz states that if the vacuum metric is smooth everywhere and $U \to 0$ at infinity, the spacetime is flat (for refinements, see Anderson (2000)).
In cosmology, we are interested in groups whose regions of transitivity (points can be carried into one another by symmetry operations) are three-dimensional spacelike hypersurfaces (homogeneous but anisotropic models of the universe). The three-dimensional simply transitive groups $G_3$ were classified by Bianchi in 1897 according to the possible distinct sets of structure constants but their importance in cosmology was discovered only in the 1950s. There are nine types: Bianchi I to Bianchi IX models. The line element of the Bianchi universes can be expressed in the form
$g = -dt^2 + g_{ab}(t)\, \omega^a \omega^b$   [7]
where the time-independent 1-forms $\omega^a = E^a_\alpha\, dx^\alpha$ satisfy the relations $d\omega^a = -(1/2)\, C^a_{bc}\, \omega^b \wedge \omega^c$, $d$ is the exterior derivative and $C^a_{bc}$ are the structure constants (see Cosmology: Mathematical Aspects for more details).
The standard Friedmann–Lemaître–Robertson–Walker (FLRW) models admit in addition an isotropy group SO(3) at each point. They can be represented by the metric
$g = -dt^2 + [a(t)]^2 \left[ \frac{dr^2}{1 - k r^2} + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2) \right]$   [8]
in which $a(t)$, the "expansion factor," is determined by matter via EEs, the curvature index $k = -1, 0, +1$, the three-dimensional spaces $t$ = const. have a constant curvature $K = k/a^2$; $r \in [0, 1]$ for closed ($k = +1$) universe, $r \in [0, \infty)$ in open ($k = 0, -1$) universes (for another description, see Cosmology: Mathematical Aspects).
There are four-dimensional spacetimes of constant curvature solving EEs [2] with $T_{\mu\nu} = 0$: the Minkowski, de Sitter, and anti-de Sitter spacetimes. They admit the same number (ten) of independent Killing vectors, but interpretations of the corresponding symmetries differ for each spacetime.
If $\xi^\alpha$ satisfies $\mathcal{L}_\xi g_{\alpha\beta} = 2\Phi g_{\alpha\beta}$, $\Phi$ = const., it is called a homothetic (Killing) vector. Solutions with proper homothetic motions, $\Phi \neq 0$, are "self-similar." They cannot in general be asymptotically flat or spatially compact but can represent asymptotic states of more general solutions. In Stephani et al. (2003), a summary of solutions with proper homotheties is given; their role in cosmology is analyzed by Wainwright and Ellis (eds.) (1997); for mathematical aspects of symmetries in general relativity, see Hall (2004).
There are other schemes for invariant classification of exact solutions (reviewed in Stephani et al. (2003)): the algebraic classification of the Ricci tensor and energy–momentum tensor of matter; the existence and properties of preferred vector fields and corresponding congruences; local isometric embeddings into flat pseudo-Euclidean spaces, etc.
Minkowski (M), de Sitter (dS), and Anti-de Sitter (AdS) Spacetimes
These metrics of constant (zero, positive, negative) curvature are the simplest solutions of [2] with $T_{\mu\nu} = 0$ and $\Lambda = 0$, $\Lambda > 0$, $\Lambda < 0$, respectively. The standard topology of M is $R^4$. The dS has the topology $R^1 \times S^3$ and is best represented as a four-dimensional hyperboloid $-v^2 + w^2 + x^2 + y^2 + z^2 = 3/\Lambda$ in a five-dimensional flat space with metric $g = -dv^2 + dw^2 + dx^2 + dy^2 + dz^2$. The AdS has the topology $S^1 \times R^3$; it is a four-dimensional hyperboloid $-v^2 - w^2 + x^2 + y^2 + z^2 = 3/\Lambda$, $\Lambda < 0$, in flat five-dimensional space with signature $(-, -, +, +, +)$. By unwrapping the circle $S^1$ and considering the universal covering space, one gets rid of closed timelike lines.
These spacetimes are all conformally flat and can be conformally mapped into portions of the Einstein universe (see Asymptotic Structure and Conformal Infinity). However, their conformal structure is globally different. In M, one can go to infinity along timelike/null/spacelike geodesics and reach five qualitatively different sets of points: future/past timelike infinity $i^\pm$, future/past null infinity $\mathcal{I}^\pm$, and spacelike infinity $i^0$. In dS, there are only past and future conformal infinities $\mathcal{I}^-$, $\mathcal{I}^+$, both being spacelike (on the Einstein cylinder, the dS spacetime is a "horizontal strip" with $\mathcal{I}^+/\mathcal{I}^-$ as the "upper/lower circle"). The conformal infinity in AdS is timelike. As a consequence of spacelike $\mathcal{I}^\pm$ in dS, there exist both particle (cosmological) and event horizons for geodesic observers (Hawking and Ellis 1973).
dS plays a (doubly) fundamental role in the present-day cosmology: it is an approximate model for the inflationary paradigm near the big bang and it is also the asymptotic state (at $t \to \infty$) of cosmological models with a positive cosmological constant. Since recent observations indicate that $\Lambda > 0$, it appears to describe the future state of our universe. AdS has come recently to the fore due to the "holographic" conjecture (see AdS/CFT Correspondence). Christodoulou and Klainerman, and Friedrich proved that M, dS, and AdS are stable with respect to general, nonlinear (though "weak") vacuum perturbations, a result not known for any other solution of EEs (see Stability of Minkowski Space).
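The hyperboloid description of de Sitter spacetime can be verified with an explicit embedding. The sketch below assumes the standard global parametrization $v = \alpha \sinh(t/\alpha)$, $w = \alpha \cosh(t/\alpha) \cos\chi$, and so on, with $\alpha = (3/\Lambda)^{1/2}$; this coordinate choice and the function names are illustrative and do not appear in the article.

```python
import math


def de_sitter_point(t: float, chi: float, theta: float, phi: float,
                    cosmological_constant: float):
    """Embed global dS coordinates (t, chi, theta, phi) in the flat 5D space."""
    alpha = math.sqrt(3.0 / cosmological_constant)
    ch = math.cosh(t / alpha)
    sh = math.sinh(t / alpha)
    v = alpha * sh
    w = alpha * ch * math.cos(chi)
    x = alpha * ch * math.sin(chi) * math.cos(theta)
    y = alpha * ch * math.sin(chi) * math.sin(theta) * math.cos(phi)
    z = alpha * ch * math.sin(chi) * math.sin(theta) * math.sin(phi)
    return v, w, x, y, z


def hyperboloid_form(point) -> float:
    """Quadratic form -v^2 + w^2 + x^2 + y^2 + z^2 of the embedding space."""
    v, w, x, y, z = point
    return -v * v + w * w + x * x + y * y + z * z


if __name__ == "__main__":
    lam = 0.5
    p = de_sitter_point(2.0, 1.1, 0.4, 2.0, lam)
    print(hyperboloid_form(p), 3.0 / lam)
```

Each slice $t$ = const. is a 3-sphere of radius $\alpha \cosh(t/\alpha)$, which makes the topology $R^1 \times S^3$ explicit.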
Schwarzschild and Reissner–Nordström Metrics
These are spherically symmetric spacetimes: the SO(3) rotation group acts on them as an isometry group with spacelike, two-dimensional orbits. The metric can be brought into the form
$g = -e^{2\nu} dt^2 + e^{2\lambda} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2)$   [9]
where $\nu(t, r)$, $\lambda(t, r)$ must be determined from EEs. In vacuum, we are led uniquely to the Schwarzschild metric
$g = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2)$   [10]
where $M$ = const. has to be interpreted as mass, as test particle orbits show. The spacetime is static at $r > 2M$, that is, outside the Schwarzschild radius at $r = 2M$, and asymptotically ($r \to \infty$) flat.
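Since the article works in geometrized units, it can help to convert the Schwarzschild radius $r = 2M$ back to SI units, where it reads $r_s = 2GM/c^2$. The sketch below uses commonly quoted values for $G$ and the solar mass; these numerical constants are assumptions added here for illustration and are not part of the article.

```python
G_NEWTON = 6.67430e-11        # m^3 kg^-1 s^-2, assumed CODATA-style value
C_LIGHT = 299_792_458.0       # m/s, exact by definition of the metre
SOLAR_MASS_KG = 1.98847e30    # kg, assumed nominal solar mass


def schwarzschild_radius_m(mass_kg: float) -> float:
    """Schwarzschild radius r_s = 2 G M / c^2 in metres."""
    return 2.0 * G_NEWTON * mass_kg / C_LIGHT ** 2


def metric_function(r: float, M: float) -> float:
    """Factor 1 - 2M/r of metric [10] in geometrized units (G = c = 1)."""
    return 1.0 - 2.0 * M / r


if __name__ == "__main__":
    print(schwarzschild_radius_m(SOLAR_MASS_KG))  # of the order of a few km
    print(metric_function(4.0, 1.0))
```

The factor returned by `metric_function` is positive for $r > 2M$, where the spacetime is static, and vanishes at the Schwarzschild radius.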
Metric [10] describes the exterior gravitational field of an arbitrary (static, oscillating, collapsing, or expanding) spherically symmetric body (spherically symmetric gravitational waves do not exist). It is the most influential solution of EEs. The essential tests of general relativity – perihelion advance of Mercury, deflection of both optical and radio waves by the Sun, and signal retardation – are based on [10] or rather on its expansion in $M/r$. Space missions have been proposed that could lead to measurements of "post-post-Newtonian" effects (see General Relativity: Experimental Tests, and Misner et al. (1973)). The full Schwarzschild metric is of importance in astrophysical processes involving compact stars and black holes.
Metric [10] describes the spacetime outside a spherical body collapsing through $r = 2M$ into a spherical black hole. In Figure 1, the formation of an event horizon and trapped surfaces is indicated in ingoing Eddington–Finkelstein coordinates $(v, r, \theta, \varphi)$ where $v = t + r + 2M \log(r/2M - 1)$ so that $(v, \theta, \varphi)$ = const. are ingoing radial null geodesics. The interior of the star is described by another metric (e.g., the Oppenheimer–Snyder collapsing dust solution – see below). The Kruskal extension of the Schwarzschild solution, its compactification, the concept of the bifurcate Killing horizon, etc., are analyzed in Stationary Black Holes and in Misner et al. (1973), Hawking and Ellis (1973), and Bičák (2000).
Figure 1 Gravitational collapse of a spherical star (the interior of the star is shaded). The light cones of three events, O, P, Q, at the center of the star, and of three events outside the star are illustrated. The event horizon, the trapped surfaces, and the singularity formed during the collapse are also shown. Although the singularity appears to lie along the direction of time, from the character of the light cone outside the star but inside the event horizon we can see that it has a spacelike character. Reproduced from Bičák J (2000) Selected solutions of Einstein’s field equations: their role in general relativity and astrophysics. In: Schmidt BG (ed.) Einstein’s Field Equations and their Physical Implications, Lecture Notes in Physics, vol. 540, pp. 1–126. Heidelberg: Springer, with permission from Springer-Verlag.
The Reissner–Nordström solution describes the exterior gravitational and electromagnetic fields of a spherical body with mass $M$ and charge $Q$. The energy–momentum tensor on the right-hand side of EE [2] is that of the electromagnetic field produced by the charge; the field satisfies the curved-space Maxwell equations. The metric reads
$g = -\left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right) dt^2 + \left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\varphi^2)$   [11]
The analytic extension of the electrovacuum metric [11] is qualitatively different from the Kruskal extension of the Schwarzschild metric. In the case $Q^2 > M^2$ there is a "naked singularity" (visible from $r \to \infty$) at $r = 0$ where curvature invariants diverge. If $Q^2 < M^2$, the metric describes a (generic) static charged black hole with two event horizons at $r = r_\pm = M \pm (M^2 - Q^2)^{1/2}$. The Killing vector $\partial/\partial t$ is null at the horizons, timelike at $r > r_+$ and $r < r_-$, but spacelike between the horizons. The character of the extended spacetime is best seen in the compactified form, Figure 2, in which world-lines of radial light rays are 45° lines. Again, two infinities (right and left, in regions I and III) arise (as in the Kruskal–Schwarzschild diagram, see Stationary Black Holes), however, the maximally extended geometry consists of an infinite chain of asymptotically flat regions connected by "wormholes" between the singularities at $r = 0$. In contrast to the Schwarzschild singularity, the singularities are timelike, so they do not block the way to the future. The inner horizon $r = r_-$ represents a Cauchy horizon for a typical initial hypersurface like $\Sigma$ (Figure 2): what is happening in regions V is in general influenced not only by data on $\Sigma$ but also at the singularities. The Cauchy horizon is unstable (for references, see Bičák (2000) and recent work by Dafermos (2005)).
Figure 2 The compactified Reissner–Nordström spacetime representing a non-extreme black hole consists of an infinite chain of asymptotic regions ("universes") connected by "wormholes" between timelike singularities. The world-line of a shell collapsing from "universe" I and re-emerging in "universe" I′ is indicated. The inner horizon at $r = r_-$ is the Cauchy horizon for a spacelike hypersurface $\Sigma$. It is unstable and thus it will very likely prevent such a process. Reproduced from Bičák J (2000) Selected solutions of Einstein’s field equations: their role in general relativity and astrophysics. In: Schmidt BG (ed.) Einstein’s Field Equations and their Physical Implications, Lecture Notes in Physics, vol. 540, pp. 1–126. Heidelberg: Springer, with permission from Springer-Verlag.
For $M^2 = Q^2$ the two horizons coincide at $r_+ = r_- = M$. Metric [11] then describes extreme Reissner–Nordström black holes. The horizon becomes degenerate and its surface gravity vanishes (see Stationary Black Holes). Extreme black holes play a significant role in string theory (Ortín 2004).
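The three cases of the Reissner–Nordström geometry (naked singularity, generic black hole, extreme black hole) follow from the roots of the metric function $1 - 2M/r + Q^2/r^2$. The sketch below, in geometrized units, locates the horizons $r_\pm = M \pm (M^2 - Q^2)^{1/2}$ and classifies the solution; the function names are illustrative.

```python
import math


def rn_metric_function(r: float, M: float, Q: float) -> float:
    """Factor 1 - 2M/r + Q^2/r^2 appearing in metric [11]."""
    return 1.0 - 2.0 * M / r + Q * Q / (r * r)


def rn_horizons(M: float, Q: float):
    """Return (r_plus, r_minus), or None when Q^2 > M^2 (naked singularity)."""
    discriminant = M * M - Q * Q
    if discriminant < 0.0:
        return None
    root = math.sqrt(discriminant)
    return M + root, M - root


def rn_classify(M: float, Q: float) -> str:
    """Label the solution according to the sign of M^2 - Q^2."""
    horizons = rn_horizons(M, Q)
    if horizons is None:
        return "naked singularity"
    if horizons[0] == horizons[1]:
        return "extreme black hole"
    return "black hole with two horizons"


if __name__ == "__main__":
    for charge in (0.6, 1.0, 1.5):
        print(charge, rn_horizons(1.0, charge), rn_classify(1.0, charge))
```

Between the two horizons the metric function is negative, which corresponds to the region where the Killing vector $\partial/\partial t$ is spacelike.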
Stationary Axisymmetric Solutions Assume the existence of two commuting Killing vectors – timelike and axial ( 0), normalized at (asymptotically flat) infinity, at the rotation axis. They generate two-dimensional orbits of the group G2 . Assume there exist 2-spaces orthogonal to these orbits. This is true in vacuum and also in case of electromagnetic fields or perfect fluids whose 4-current or 4-velocity lies in the surfaces of transitivity of G2 (e.g., toroidal magnetic fields are excluded). The metric can then be written in Weyl’s coordinates (t,,’,z) g ¼ e2U ðdt þ Ad’Þ2 þ e2U ½e2k ðd2 þ dz2 Þ þ 2 d’2
½12
U, k, and A are functions of , z. The most celebrated vacuum solution of the form [12] is the Kerr metric for which U, k, A are ratios of simple polynomials in spheroidal coordinates (simply related to (, z)). The Kerr solution is characterized by mass M and specific angular momentum a. For a2 > M2 , it describes an asymptotically flat spacetime with a naked singularity. For a2 M2 , it represents a rotating black hole that has two horizons which coalesce into a degenerate horizon for a2 = M2 – an extreme Kerr black hole. The two horizons are located at r = M (M2 a2 )1=2 (r being the Boyer–Lindquist coordinate (see Stationary Black Holes)). As with the Reissner– Nordstro¨m black hole, the singularity inside is timelike and the inner horizon is an (unstable) Cauchy horizon. The analytic extension of the Kerr metric resembles Figure 2 (see Frolov and Novikov (1998), Hawking and Ellis (1973), Misner et al. (1973), Ortı´n (2004), Semera´k et al. (2002), Stephani et al. (2003), and Wald (1984) for details). Thanks to the black hole uniqueness theorems (see Stationary Black Holes), the Kerr metric is the unique solution describing all rotating black holes in vacuum. If the cosmic censorship conjecture holds, Kerr black holes represent the end states of gravitational collapse of astronomical objects with supercritical masses. According to prevalent views, they reside in the nuclei of most galaxies. Unlike with a spherical collapse, there are no exact solutions available which would represent the formation of a Kerr black hole. However,
starting from metric [12] and identifying, for example, $z = b = \text{const.}$ and $z = -b$ (with the region $-b < z < b$ being cut off), one can construct thin material disks which are physically plausible and can be the sources of the Kerr metric even for $a^2 > M^2$ (see Bičák (2000) for details). In a general case of metric [12], EEs in vacuum imply the “Ernst equation” for a complex function $f$ of $\rho$ and $z$: 1 ð a2 , A = a2 = const. for 0 u a2 . This example demonstrates, within exact theory, that the waves travel with the speed of light, produce relative accelerations of test particles, focus astigmatically generally propagating parallel rays, etc. The focusing effects have a remarkable consequence: there exists no global spacelike hypersurface on which initial data could be specified; plane wave spacetimes contain no global Cauchy hypersurface. “Impulsive” plane waves can be generated by boosting a “particle” at rest to the velocity of light by an appropriate limiting procedure. The ultrarelativistic limit of, for example, the Schwarzschild metric (the so-called Aichelburg–Sexl solution) can be employed as a “limiting incoming state” in black hole encounters (cf. the monograph by d'Eath (1996)). Plane-fronted waves have been used in quantum field theory. For a review of exact impulsive waves, see Semerák et al. (2002). A collision of plane waves represents an exceptional situation of nonlinear wave interactions which can be analyzed exactly. Figure 3 illustrates a typical case in which the collision produces a spacelike singularity. The initial-value problem with data given at $v = 0$ and $u = 0$ can be formulated in terms of the equivalent matrix Riemann–Hilbert problem (see Riemann–Hilbert Problem); it is related to the hyperbolic counterpart of the Ernst equation [13]. For reviews, see Griffiths (1991), Stephani et al. (2003), and Bičák (2000).
Figure 3 A spacetime diagram indicating a collision of two plane-fronted gravitational waves which come from regions II and III, collide in region I, and produce a spacelike singularity. Region IV is flat. Reproduced from Bičák J (2000) Selected solutions of Einstein's field equations: their role in general relativity and astrophysics. In: Schmidt BG (ed.) Einstein's Field Equations and their Physical Implications, Lecture Notes in Physics, vol. 540, pp. 1–126. Heidelberg: Springer, with permission from Springer-Verlag.
spacetimes, dispersion of waves, quasilocal mass–energy, cosmic censorship conjecture, or quantum gravity in the context of midisuperspaces (see Bičák (2000) and Belinski and Verdaguer (2001)). In the metric
$$ g = e^{2(\gamma - \psi)}(-dt^2 + d\rho^2) + e^{2\psi}dz^2 + \rho^2 e^{-2\psi}d\varphi^2 \qquad [15] $$
$\psi(t, \rho)$ satisfies the flat-space wave equation and $\gamma(\rho, t)$ is given in terms of $\psi$ by quadratures. Admitting a “cross term” $\omega(t, \rho)\,dz\,d\varphi$, one acquires a second degree of freedom (a second polarization) which makes all field equations nonlinear.
Boost-Rotation Symmetric Spacetimes
These are the only explicit solutions available which are radiative and represent the fields of finite sources. Figure 4 shows two particles uniformly accelerated in opposite directions. In the space diagram (left), the “string” connecting the particles is the “cause” of the acceleration. In “Cartesian-type” coordinates, with the $z$-axis chosen as the symmetry axis, the boost Killing vector has a flat-space form, $\xi = z(\partial/\partial t) + t(\partial/\partial z)$; the same is true for the axial Killing vector. The metric contains two functions of the variables $x^2 + y^2$ and $z^2 - t^2$. One satisfies the flat-space wave equation, the other is determined by quadratures. The unique role of these solutions is exhibited by the theorem which states that in axially symmetric, locally asymptotically flat spacetimes (in the sense that a null infinity exists, but not necessarily globally; see Asymptotic Structure and Conformal Infinity), the only additional symmetry that does not exclude gravitational radiation is the boost symmetry. Various radiation characteristics can be expressed explicitly in these spacetimes. They have been used as tests in numerical relativity and approximation methods. The best-known example is the C-metric (representing accelerating black holes, in general charged and rotating, and admitting a cosmological constant $\Lambda$); see Bonnor et al. (1994), Bičák (2000), Stephani et al. (2003), and Semerák et al. (2002).
Figure 4 Two particles uniformly accelerated in opposite directions. Orbits of the boost Killing vector (thinner hyperbolas) are spacelike in the region $t^2 > z^2$. Reproduced from Bičák J (2000) Selected solutions of Einstein's field equations: their role in general relativity and astrophysics. In: Schmidt BG (ed.) Einstein's Field Equations and their Physical Implications, Lecture Notes in Physics, vol. 540, pp. 1–126. Heidelberg: Springer, with permission from Springer-Verlag.
Robinson–Trautman Solutions
These solutions are algebraically special but in general they do not possess any symmetry. They are governed by a function $P(u, \zeta, \bar\zeta)$ ($u$ is the retarded time, $\zeta$ a complex spatial coordinate) which satisfies a fourth-order nonlinear parabolic differential equation. Studies by Chruściel and others have shown that RT solutions of Petrov type II exist globally for all positive “times” $u$ and converge asymptotically to a Schwarzschild metric, though the extension across the “Schwarzschild-like” horizon can only be made with a finite degree of smoothness. Generalization to the cases with $\Lambda > 0$ gives explicit models supporting the cosmic no-hair conjecture (an exponentially fast approach to the dS spacetime) in the presence of gravitational waves. See Bonnor et al. (1994), Bičák (2000), and Stephani et al. (2003).
Material Sources
Finding physically sound material sources in an analytic form even for some simple vacuum metrics remains an open problem. Nevertheless, there are solutions representing regions of spacetimes filled with matter which are of considerable interest. One of the simplest solutions, the spherically symmetric Schwarzschild interior solution with incompressible fluid as its source, represents “a star” of uniform density, $\rho = \rho_0 = \text{const.}$:
$$ g = -\left[\tfrac{3}{2}\sqrt{1 - AR^2} - \tfrac{1}{2}\sqrt{1 - Ar^2}\right]^2 dt^2 + \frac{dr^2}{1 - Ar^2} + r^2(d\theta^2 + \sin^2\theta\,d\varphi^2) \qquad [16] $$
where $A = 8\pi\rho_0/3 = \text{const.}$ and $R$ is the radius of the star. The equation of hydrostatic equilibrium yields the pressure inside the star:
$$ 8\pi p = 3A\,\frac{\sqrt{1 - Ar^2} - \sqrt{1 - AR^2}}{3\sqrt{1 - AR^2} - \sqrt{1 - Ar^2}} \qquad [17] $$
Solution [16] can be matched at $r = R$, where $p = 0$, to the exterior vacuum Schwarzschild solution [10] if the Schwarzschild mass $M = (1/2)AR^3$. Although “incompressible fluid” implies an infinite speed of sound, the above solution provides an instructive
model of relativistic hydrostatics. A Newtonian star of uniform density can have an arbitrarily large radius $R = \sqrt{3p_c/2\pi\rho_0^2}$ and mass $M = (p_c/\rho_0^2)\sqrt{6p_c/\pi}$, where $p_c$ is the central pressure. However, [17] implies that (1) $M$ and $R$ satisfy the inequality $2M/R \le 8/9$, and (2) equality is reached as $p_c$ becomes infinite and $R$ and $M$ attain their limiting values $R_{\rm lim} = (3\pi\rho_0)^{-1/2} = (9/4)M_{\rm lim}$. For a density typical in neutron stars, $\rho_0 = 10^{15}\ \mathrm{g\,cm^{-3}}$, we get $M_{\rm lim} \doteq 3.6\,M_\odot$ ($M_\odot$ being the solar mass); even this simple model shows that in Einstein's theory neutron stars can only be a few solar masses. In addition, one can prove that “Buchdahl's inequality” $2M/R \le 8/9$ is valid for an arbitrary equation of state $p = p(\rho)$. Only a limited mass can thus be contained within a given radius in general relativity. The gravitational redshift $z = (1 - 2M/R)^{-1/2} - 1$ from the surface of a static star cannot be higher than 2. Many other explicit static perfect fluid solutions are known (we refer to Stephani et al. (2003) for a list); however, none of them can be considered as really “physical.” Recently, the dynamical systems approach to relativistic spherically symmetric static perfect fluid models was developed by Uggla and others; it gives qualitative characteristics of masses and radii.
The most significant nonstatic spacetime describing a bounded region of matter and its external field is undoubtedly the Oppenheimer–Snyder model of “gravitational collapse of a spherical star” of uniform density and zero pressure (a “ball of dust”). The model does not represent any new (local) solution: the interior of the star is described by a part of a dust-filled FLRW universe (cf. [8]), the external region by the Schwarzschild vacuum metric (cf. eqn [10], Figure 1).
Since Vaidya's discovery of a “radiating Schwarzschild metric,” null dust (“pure radiation field”) has been widely used as a simple matter source. Its energy–momentum tensor, $T_{\alpha\beta} = \varrho\,k_\alpha k_\beta$, where $k_\alpha k^\alpha = 0$, may be interpreted as an incoherent superposition of waves with random phases and polarizations moving in a single direction, or as “lightlike particles” (photons, neutrinos, gravitons) that move along $k^\alpha$. The “Vaidya metric” describing spherical implosion of null dust implies that in the case of a “gentle” inflow of the dust, a naked singularity forms. This is relevant in the context of the cosmic censorship conjecture (cf., e.g., Joshi (1993)).
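The limiting values quoted above for the uniform-density star can be evaluated numerically. The following is a minimal sketch, assuming standard SI values for $G$, $c$, and the solar mass (these constants and the function names `limiting_radius_and_mass` and `surface_redshift` are illustrative choices, not taken from the article). It converts the density to geometric units, applies $R_{\rm lim} = (3\pi\rho_0)^{-1/2} = (9/4)M_{\rm lim}$, and evaluates the surface redshift at the Buchdahl bound.

```python
import math

# Physical constants in SI units (standard values, assumed here; not from the article).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m s^-1
M_SUN = 1.989e30     # solar mass, kg


def limiting_radius_and_mass(rho0_cgs):
    """Return (R_lim in metres, M_lim in solar masses) for a uniform-density star.

    Uses R_lim = (3*pi*rho0)^(-1/2) = (9/4) M_lim in geometric units (G = c = 1),
    after converting the density from g cm^-3.
    """
    rho0_si = rho0_cgs * 1.0e3                  # g cm^-3 -> kg m^-3
    rho0_geom = rho0_si * G / C**2              # density expressed in m^-2
    r_lim = 1.0 / math.sqrt(3.0 * math.pi * rho0_geom)   # metres
    m_lim_geom = (4.0 / 9.0) * r_lim            # mass expressed as a length, metres
    m_lim_kg = m_lim_geom * C**2 / G
    return r_lim, m_lim_kg / M_SUN


def surface_redshift(compactness):
    """Gravitational redshift z = (1 - 2M/R)^(-1/2) - 1 for compactness 2M/R."""
    return 1.0 / math.sqrt(1.0 - compactness) - 1.0


r_lim, m_lim = limiting_radius_and_mass(1.0e15)
print(f"R_lim = {r_lim / 1e3:.2f} km, M_lim = {m_lim:.2f} solar masses")
print(f"redshift at 2M/R = 8/9: {surface_redshift(8.0 / 9.0):.3f}")
```

With these constants the script reports a limiting radius of roughly 12 km and a limiting mass of a few solar masses, and the redshift at $2M/R = 8/9$ reproduces the bound of 2 stated in the text.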
Cosmological Models
There exist important generalizations of the standard FLRW models other than the above-mentioned Bianchi models, particularly those that maintain spherical symmetry but do not require homogeneity. The best known are the Lemaître–Tolman–Bondi models of inhomogeneous universes of pure dust, the density of which may vary (Krasiński 1997). Other explicit cosmological models of principal interest include, for example: the Gödel universe, a homogeneous, stationary spacetime with $\Lambda < 0$ and incoherent rotating matter in which there exist closed timelike curves through every point; the Kantowski–Sachs solutions, possessing homogeneous spacelike hypersurfaces but (in contrast to the Bianchi models) admitting no simply transitive $G_3$; and vacuum Gowdy models (“generalized Einstein–Rosen waves”) admitting $G_2$ with compact 2-tori as its group orbits and representing cosmological models closed by gravitational waves. See Cosmology: Mathematical Aspects and the references Stephani et al. (2003), Belinski and Verdaguer (2001), Bičák (2000), Hawking and Ellis (1973), Krasiński (1997), and Wainwright and Ellis (1997).
See also: AdS/CFT Correspondence; Asymptotic Structure and Conformal Infinity; Cosmology: Mathematical Aspects; Dirac Fields in Gravitation and Nonabelian Gauge Theory; Einstein Manifolds; Einstein's Equations with Matter; General Relativity: Experimental Tests; General Relativity: Overview; Hamiltonian Reduction of Einstein's Equations; Integrable Systems: Overview; Newtonian Limit of General Relativity; Pseudo-Riemannian Nilpotent Lie Groups; Riemann–Hilbert Problem; Spacetime Topology, Causal Structure and Singularities; Spinors and Spin Coefficients; Stability of Minkowski Space; Stationary Black Holes; Twistor Theory: Some Applications.
Further Reading
Anderson MT (2000) On the structure of solutions to the static vacuum Einstein equations. Annales Henri Poincaré 1: 995–1042.
Belinski V and Verdaguer E (2001) Gravitational Solitons. Cambridge: Cambridge University Press.
Bičák J (2000) Selected solutions of Einstein's field equations: their role in general relativity and astrophysics. In: Schmidt BG (ed.) Einstein's Field Equations and Their Physical Implications, Lecture Notes in Physics, vol. 540, pp. 1–126 (see also gr-qc/0004016). Heidelberg: Springer.
Bonnor WB (1992) Physical interpretation of vacuum solutions of Einstein's equations. Part I. Time-independent solutions. General Relativity and Gravitation 24: 551–574.
Bonnor WB, Griffiths JB, and MacCallum MAH (1994) Physical interpretation of vacuum solutions of Einstein's equations. Part II. Time-dependent solutions. General Relativity and Gravitation 26: 687–729.
Dafermos M (2005) The interior of charged black holes and the problem of uniqueness in general relativity. Communications on Pure and Applied Mathematics LVIII: 445–504.
D'Eath PD (1996) Black Holes: Gravitational Interactions. Oxford: Clarendon Press.
Frolov VP and Novikov ID (1998) Black Hole Physics. Dordrecht: Kluwer Academic.
Griffiths JB (1991) Colliding Plane Waves in General Relativity. Oxford: Oxford University Press.
Hall GS (2004) Symmetries and Curvature Structure in General Relativity. Singapore: World Scientific.
Hawking SW and Ellis GFR (1973) The Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Joshi PS (1993) Global Aspects in Gravitation and Cosmology. Oxford: Clarendon.
Krasiński A (1997) Inhomogeneous Cosmological Models. Cambridge: Cambridge University Press.
Misner C, Thorne KS, and Wheeler JA (1973) Gravitation. San Francisco: WH Freeman.
Ortín T (2004) Gravity and Strings. Cambridge: Cambridge University Press.
Semerák O, Podolský J, and Žofka M (eds.) (2002) Gravitation: Following the Prague Inspiration. Singapore: World Scientific.
Stephani H, Kramer D, MacCallum MAH, Hoenselaers C, and Herlt E (2003) Exact Solutions of Einstein's Field Equations, Second Edition. Cambridge: Cambridge University Press.
Wainwright J and Ellis GFR (eds.) (1997) Dynamical Systems in Cosmology. Cambridge: Cambridge University Press.
Wald RM (1984) General Relativity. Chicago: The University of Chicago Press.
Einstein Equations: Initial Value Formulation
J Isenberg, University of Oregon, Eugene, OR, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Einstein's theory of gravity models a gravitating physical system $S$ using a spacetime $(M^4, g, \psi)$ which satisfies the Einstein field equations
$$ G_{\mu\nu}(g) = \kappa\,T_{\mu\nu}(g, \psi) \qquad [1] $$
$$ \mathcal{F}(g, \psi) = 0 \qquad [2] $$
Here, $M^4$ is a four-dimensional spacetime manifold, $g$ is a Lorentz signature metric on $M^4$, $\psi$ represents the nongravitational (“matter”) fields of interest, $G_{\mu\nu} := R_{\mu\nu} - (1/2)g_{\mu\nu}R$ is the Einstein curvature tensor, $\kappa$ is a constant, $T_{\mu\nu}$ is the stress–energy tensor for the field $\psi$, and $\mathcal{F} = 0$ represents the nongravitational field equations (e.g., $\nabla^\mu F_{\mu\nu} = 0$ for the Einstein–Maxwell theory). By far the most widely used way to obtain and to study spacetime solutions $(M^4, g, \psi)$ of equations [1]–[2] is via the initial-value (or Cauchy) formulation. The idea is as follows:
1. One chooses a set of initial data $D$ which consists of geometric as well as matter information on a spacelike slice of $M^4$. This data must satisfy a system of constraint equations, which comprise a portion of the field equations [1]–[2], and are analogous to the Maxwell constraint equation $\nabla \cdot E = 0$.
2. One fixes a time and coordinate choice to be used in evolving the fields into the spacetime (e.g., maximal time slicing and zero shift). This choice should result in a fixed set of evolution equations for the data.
3. Using the evolution equations, one evolves the data into the future and the past. From the evolved data, one constructs the spacetime solution $(M^4, g, \psi)$.
Why is this procedure so popular? First, because we have known for over 50 years that at least for a short time, it works. That is, as shown by Choquet-Bruhat (Foures-Bruhat 1952), the Cauchy
formulation is well posed. Second, because it fits with the way we like to model physical systems. That is, we first specify what the system is like now, and we then use the equations to determine the behavior of the system as it evolves into the future (or the past). Third, because the formulation is eminently amenable to numerical treatment. Indeed, virtually all numerical simulations of colliding black hole systems as well as of most other relativistic astrophysical systems are done using some version of the initial-value formulation. Finally, because the initial-value formulation casts the Einstein equations into a form which is readily accessible to many of the tools of geometric analysis. Questions such as cosmic censorship are turned into conjectures which can be analyzed and proved mathematically, and the proofs of both the positivity of mass and the Penrose mass inequality rely on an initial-value interpretation.
There are of course drawbacks to the Cauchy formulation. Foremost, Einstein's theory of general relativity is inherently a spacetime-covariant theory; why break spacetime apart into space plus time when covariance has played such a key role in the theory's success? As well, we have learned over and over again that null cones and null hypersurfaces play a major role in general relativity; the initial-value formulation is not especially good at handling them. These drawbacks show that there are analyses in general relativity for which the initial-value formulation may not be well suited. However, there is a preponderance of applications for which this formulation is an invaluable tool, as evidenced by its ubiquitous use.
A complete treatment of the initial-value formulation for Einstein's equations would include discussion of each of the following topics:
1. A statement and proof of well-posedness theorems, including a discussion of the regularity of the data needed for such results.
2. A space + time decomposition of the fields, and a formal derivation of the Einstein constraint equations and the Einstein evolution equations.
3. An outline of the Hamiltonian version of the initial-value formulation.
4. A listing of those choices of field variables and gauge choices for which the system is manifestly hyperbolic.
5. A description of the known methods for finding and parametrizing solutions of the Einstein constraint equations.
6. A comparison of the virtues and drawbacks of various choices of time foliation and coordinate threading.
7. A compendium of results concerning long-time behavior of solutions.
8. An account of the difficulties which arise in attempts to construct solutions numerically using the Cauchy formulation.
9. A recounting of cases in which the initial-value formulation has been used to model physically interesting systems.
10. A note regarding the extent to which the initial-value formulation (and the various aspects of it just enumerated) generalize to dimensions other than 3 + 1 (three space and one time).
11. A determination of which nongravitational fields may be coupled to Einstein's theory in such a way that the resulting coupled theory admits an initial-value formulation.
We do not have the space here for such a complete treatment. So we choose to focus on those topics directly related to the Einstein constraint equations. Generalizing a bit to the Einstein–Maxwell theory (thereby including representative nongravitational fields), we first carry out the space plus time “3 + 1” decomposition of the gravitational and electromagnetic fields. Then, applying the Gauss–Codazzi–Mainardi equations to the spacetime curvature, we turn the spacetime-covariant Einstein–Maxwell equations into a set of constraint equations restricting the choice of initial data together with a set of evolution equations developing the data in time. Next, we discuss the most widely used approach for obtaining sets of initial data which satisfy the constraint equations: the conformal method. We include in this discussion an account of some of what is known about the extent to which the equations which are produced by the conformal method admit solutions in various situations (e.g., working on a closed manifold, or working with asymptotically Euclidean data). We then discuss alternate procedures which have been used to obtain and analyze solutions of the constraints, including the conformal thin sandwich approach, the quasispherical method, and various gluing procedures. Finally, we make concluding remarks. For more details on some of the topics discussed here, and for treatment of some of the other topics listed above, see the recent review paper of Bartnik and Isenberg (2004).
Space + Time Field Decomposition and Derivation of the Constraint Equations
To understand what sort of initial data one needs to choose in order to construct a spacetime via the initial-value formulation, it is useful to consider a spacetime $(M^4, g)$ which satisfies the Einstein(–Maxwell) field equations and contains a Cauchy surface $i_0 : \Sigma^3 \to M^4$. We note that the existence of a Cauchy surface in $(M^4, g, A)$ is not automatic; if one exists, the spacetime is said to be (by definition) “globally hyperbolic.”¹ Among its other properties, a Cauchy surface is a spacelike embedded submanifold of a Lorentz geometry. It immediately follows that the spacetime $(M^4, g, A)$ induces on $\Sigma^3$ a Riemannian metric $\gamma$, a timelike normal vector field $e_\perp$, an intrinsic ($\gamma$-compatible) covariant derivative $\nabla$, and a symmetric “extrinsic curvature” tensor field $K$ (second fundamental form). It also follows that certain components of the spacetime curvature tensor can be written in terms of these Cauchy surface quantities $(\gamma, e_\perp, \nabla, K)$ along with other geometric quantities related to them, such as the spatial curvature $R$ corresponding to the induced covariant derivative $\nabla$ (Gauss–Codazzi equations).
¹ The Taub–NUT spacetime is an example of a spacetime which is not globally hyperbolic.
To complete the curvature 3 + 1 decomposition (i.e., to carry it out for all components of the spacetime curvature), we need not just one Cauchy surface, but rather a full local foliation $i_t : \Sigma^3 \to M^4$ of the spacetime by such submanifolds. This foliation allows one to define $e_\perp$ as a smooth vector field on an open neighborhood of the Cauchy surface $i_0(\Sigma^3)$ in $M^4$. It also results in a threading of spacetime by a congruence of timelike paths (see Figure 1). This threading may be viewed as a spacetime-filling family of observers. It also defines for the spacetime a set of coordinates relative to which one can measure and calculate the dynamics of the spacetime geometry. It is useful for later purposes to note that at each spacetime point $p \in \Sigma_t \subset M^4$ (here $\Sigma_t := i_t(\Sigma^3)$) the vector $\partial/\partial t$ tangent to the threading path through $p$ may be decomposed as
$$ \frac{\partial}{\partial t} = N e_\perp + X \qquad [3] $$
with the “shift vector” $X$ tangent to the surface ($X \in T_p\Sigma_t$), and with the “lapse” $N$ a scalar (see Figure 2). Using these quantities, we can write the spacetime metric in the form
$$ g = \gamma - \theta^\perp \otimes \theta^\perp = \gamma_{ab}(dx^a + X^a dt)(dx^b + X^b dt) - N^2 dt^2 \qquad [4] $$
where $\theta^\perp$ is the unit length timelike 1-form which annihilates all vectors tangent to the hypersurfaces of the foliation. Relying on the following 3 + 1 decomposition of the spacetime-covariant derivative ${}^4\nabla$ (here $\{\partial_a\}$ is a coordinate basis for the vectors tangent to the hypersurfaces of the foliation; $\{\partial_a, e_\perp\}$ constitutes a basis for the full set of spacetime vectors at $p$):
$$ {}^4\nabla_{\partial_a}\partial_b = \nabla_{\partial_a}\partial_b - K_{ab}\,e_\perp \qquad [5] $$
$$ {}^4\nabla_{\partial_a}e_\perp = -K^m{}_a\,\partial_m \qquad [6] $$
$$ {}^4\nabla_{e_\perp}\partial_b = -K^m{}_b\,\partial_m + [e_\perp, \partial_b] + \frac{\partial_b N}{N}\,e_\perp \qquad [7] $$
$$ {}^4\nabla_{e_\perp}e_\perp = \gamma^{mn}\,\frac{\partial_m N}{N}\,\partial_n \qquad [8] $$
one readily derives the (Gauss–Codazzi–Mainardi) 3 + 1 decomposition of the curvature:²
$$ {}^4R^e{}_{abc} = R^e{}_{abc} + K_{ac}K^e{}_b - K_{ab}K^e{}_c \qquad [9] $$
$$ {}^4R^\perp{}_{abc} = \nabla_{\partial_c}K_{ab} - \nabla_{\partial_b}K_{ac} \qquad [10] $$
$$ {}^4R^\perp{}_{a\perp b} = \mathcal{L}_{e_\perp}K_{ab} - K_{am}K^m{}_b + \frac{\nabla_{\partial_a}\nabla_{\partial_b}N}{N} \qquad [11] $$
where $\mathcal{L}$ denotes the surface-projected Lie derivative.
Since we are interested here in the 3 + 1 formulation of the Einstein–Maxwell system, we need a 3 + 1 decomposition for the electromagnetic as well as the gravitational field. The spacetime 1-form “vector potential” ${}^4A$ pulls back on each Cauchy surface $\Sigma_t$ to a spatial 1-form $A$. One may then write
$$ {}^4A = A + \Phi\,\theta^\perp = A_b\,dx^b + (N\Phi + A_b X^b)\,dt \qquad [12] $$
for a scalar $\Phi$. Based on this decomposition, one has the following 3 + 1 decomposition for the electromagnetic 2-form ${}^4F$:
$$ {}^4F_{\perp a} = \gamma_{ac}E^c \qquad [13] $$
$$ {}^4F_{ab} = \nabla_{\partial_a}A_b - \nabla_{\partial_b}A_a \qquad [14] $$
where $E^c$ is the electric vector field. We may now use all of these decomposition formulas to write out the 14 field equations for the Einstein–Maxwell theory
$$ {}^4G_{\mu\nu} = {}^4F_{\mu\alpha}\,{}^4F_\nu{}^\alpha - \tfrac{1}{4}g_{\mu\nu}\,{}^4F_{\alpha\beta}\,{}^4F^{\alpha\beta} \qquad [15] $$
$$ {}^4\nabla_\mu\,{}^4F^{\mu\nu} = 0 \qquad [16] $$
in terms of the spatial fields $(\gamma, K, N, X; A, E, \Phi)$ and their derivatives. We obtain
$$ R - K^{mn}K_{mn} + (\mathrm{tr}\,K)^2 = \tfrac{1}{2}E^m E_m + \tfrac{1}{2}B^m B_m \qquad [17] $$
$$ \nabla_m K^m{}_a - \nabla_{\partial_a}(\mathrm{tr}\,K) = \epsilon_{amn}E^m B^n \qquad [18] $$
$$ \mathcal{L}_{e_\perp}K_{ab} = R_{ab} - 2K^m{}_a K_{mb} + (\mathrm{tr}\,K)K_{ab} - \frac{\nabla_{\partial_a}\nabla_{\partial_b}N}{N} + E_a E_b + B_a B_b \qquad [19] $$
$$ \nabla_{\partial_m}E^m = 0 \qquad [20] $$
$$ \mathcal{L}_{e_\perp}E^a = \epsilon^{amn}\nabla_{\partial_m}B_n \qquad [21] $$
where $\epsilon_{abc}$ is the alternating Levi-Civita symbol (component representation of the Hodge dual), and where we have used $B_a := \epsilon^{mn}{}_a(\nabla_{\partial_m}A_n - \nabla_{\partial_n}A_m)$ as a convenient shorthand.
Figure 1 3 + 1 foliation and threading of spacetime.
Figure 2 Decomposition of the time evolution vector field $\partial/\partial t$.
² Here and throughout this article, we use the Misner–Thorne–Wheeler (MTW) (Misner et al. 1973) conventions for the definition of the Riemann curvature, for the signature $(-+++)$ of the metric, for the index labels (Greek indices run over {0, 1, 2, 3} while Latin indices run over {1, 2, 3}), etc.
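The metric decomposition [4] and the lapse–shift split [3] can be checked numerically. The sketch below, in plain Python with no external libraries, expands [4] into the components $g_{tt} = -N^2 + X_a X^a$, $g_{ta} = X_a$, $g_{ab} = \gamma_{ab}$, and then verifies two standard consequences: $\det g = -N^2 \det\gamma$, and the fact that $e_\perp = (\partial/\partial t - X)/N$ is a unit timelike vector orthogonal to the slices. The function names and the sample values of $N$, $X^a$, and $\gamma_{ab}$ are hypothetical, chosen only for illustration.

```python
def adm_metric(lapse, shift, gamma):
    """Assemble the 4x4 matrix g_{mu nu} of eqn [4] in coordinates (t, x^1, x^2, x^3).

    lapse : float, the lapse N
    shift : list of 3 floats, the shift components X^a
    gamma : 3x3 nested list, the (symmetric) spatial metric gamma_ab
    """
    # Lower the shift index: X_a = gamma_ab X^b.
    shift_low = [sum(gamma[a][b] * shift[b] for b in range(3)) for a in range(3)]
    shift_sq = sum(shift_low[a] * shift[a] for a in range(3))
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -lapse**2 + shift_sq
    for a in range(3):
        g[0][a + 1] = g[a + 1][0] = shift_low[a]
        for b in range(3):
            g[a + 1][b + 1] = gamma[a][b]
    return g


def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def normal_vector(lapse, shift):
    """Components of e_perp = (d/dt - X) / N in the basis (d/dt, d/dx^a), from eqn [3]."""
    return [1.0 / lapse] + [-x / lapse for x in shift]


def inner(g, u, v):
    """Inner product g(u, v) of two 4-vectors."""
    return sum(g[i][j] * u[i] * v[j] for i in range(4) for j in range(4))


# Hypothetical sample data: any positive lapse and positive-definite gamma will do.
N = 2.0
X = [0.1, -0.3, 0.2]
gamma = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.1], [0.0, 0.1, 3.0]]

g = adm_metric(N, X, gamma)
e_perp = normal_vector(N, X)
print("det g            =", det(g))
print("-N^2 det(gamma)  =", -N**2 * det(gamma))
print("g(e_perp, e_perp) =", inner(g, e_perp, e_perp))
```

The two determinants agree and the norm of `e_perp` is $-1$ up to rounding, which is the numerical counterpart of writing $g = \gamma - \theta^\perp \otimes \theta^\perp$ with $\theta^\perp = N\,dt$.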
It is immediately evident that nine of these equations ([19] and [21]) involve time derivatives of the spatial fields, while five of them ([17], [18], and [20]) do not. Thus, we may split the field equations of the Einstein–Maxwell theory into two sets: (1) the constraint equations [17], [18], and [20], which restrict our choice of the Einstein–Maxwell initial data $(\gamma, K, A, E)$; and (2) the evolution equations, which describe how to evolve the data $(\gamma, K, A, E)$ in time, presuming that one has also prescribed (freely!) the “atlas fields” $(N, X, \Phi)$.³ We note that the complete system of evolution equations for the Einstein–Maxwell field equations includes equations which are based on the definitions of $K$ and $E$. Written in terms of (surface-projected) Lie derivatives along $\partial/\partial t$, the full system takes the form
$$ \mathcal{L}_{\partial_t}\gamma_{ab} = -2NK_{ab} + \mathcal{L}_X\gamma_{ab} \qquad [22] $$
$$ \mathcal{L}_{\partial_t}K_{ab} = N\left(R_{ab} - 2K^m{}_a K_{mb} + K^m{}_m K_{ab} + E_a E_b + B_a B_b\right) - \nabla_{\partial_a}\nabla_{\partial_b}N + \mathcal{L}_X K_{ab} \qquad [23] $$
$$ \mathcal{L}_{\partial_t}A_a = N(E_a + \nabla_{\partial_a}\Phi) + \mathcal{L}_X A_a \qquad [24] $$
$$ \mathcal{L}_{\partial_t}E^a = N\epsilon^{amn}\nabla_{\partial_m}B_n + \mathcal{L}_X E^a \qquad [25] $$
As noted earlier, well-posedness theorems⁴ guarantee that initial data satisfying the constraint equations [17], [18], and [20] on a manifold $\Sigma^3$ can always at least locally be evolved into a spacetime solution $(\Sigma^3 \times I, g, {}^4A)$ (for $I$ some interval in $\mathbb{R}^1$) of the Einstein–Maxwell equations. We now turn our attention to the issue of finding sets of data which do satisfy the constraints.
³ The collective name “atlas field” for the lapse $N$, the shift $X$, the electric potential $\Phi$, and other such fields which are neither constrained by the constraint equations nor evolved by the evolution equations, derives from their role in controlling the evolution of coordinate charts and bundle atlases in the course of the construction of spacetime solutions of relativistic field equations like the Einstein–Maxwell system.
⁴ While the work cited earlier (Foures-Bruhat 1952) proves well posedness for the vacuum Einstein equations only, the extension to the Einstein–Maxwell system is straightforward.
The Conformal Method
We seek to find sets of data $(\gamma, K, A, E)$ on a manifold $\Sigma^3$ which satisfy the constraint equations
$$ R - K^{mn}K_{mn} + (\mathrm{tr}\,K)^2 = \tfrac{1}{2}E^m E_m + \tfrac{1}{2}B^m B_m \qquad [26] $$
$$ \nabla_m K^m{}_a - \nabla_a(\mathrm{tr}\,K) = \epsilon_{amn}E^m B^n \qquad [27] $$
$$ \nabla_m E^m = 0 \qquad [28] $$
(Here and below, for convenience, we replace $\nabla_{\partial_a}$ by $\nabla_a$.) This is an underdetermined problem, with five equations to be solved for 18 functions. The idea of the conformal method is to divide the initial data on $\Sigma^3$ into two sets, the “free (conformal) data” and the “determined data,” in such a way that, for a given choice of the free data, the constraint equations become a determined elliptic partial differential equation (PDE) system, to be solved for the determined data. There are a number of ways to do this; we focus here on one of them, the “semidecoupling split” or “method A.” After describing this version of the conformal method, and discussing what one can do with it, we note some of its drawbacks and then later (in the next section) consider some alternatives. (See Choquet-Bruhat and York (1980) and Bartnik and Isenberg (2004) for a more complete discussion of these alternatives.) For the Einstein–Maxwell theory, the split of the initial data is as follows:
Free (“conformal”) data
$\lambda_{ij}$: a Riemannian metric, specified up to conformal factor;
$\sigma_{ij}$: a divergence-free⁵ ($\nabla^i\sigma_{ij} = 0$), trace-free ($\lambda^{ij}\sigma_{ij} = 0$), symmetric tensor;
$\tau$: a scalar field;
$\alpha_a$: a 1-form;
$\mathcal{E}^b$: a divergence-free vector field.
Determined data
$\phi$: a positive-definite scalar field;
$W^i$: a vector field;
$\Phi$: a scalar field.
⁵ In the free data, the divergence-free condition is defined using the Levi-Civita-covariant derivative compatible with the conformal metric $\lambda_{ij}$.
For a given choice of the free data, the five equations to be solved for the five functions of the determined data take the form
$$ \Delta\Phi = 0 \qquad [29] $$
$$ \nabla_m(LW)^m{}_a = \tfrac{2}{3}\phi^6\,\nabla_a\tau + \epsilon_{amn}\mathcal{E}^m\beta^n \qquad [30] $$
$$ \Delta\phi = \tfrac{1}{8}R\,\phi - \tfrac{1}{8}(\sigma^{mn} + LW^{mn})(\sigma_{mn} + LW_{mn})\,\phi^{-7} - \tfrac{1}{16}(\mathcal{E}^m\mathcal{E}_m + \beta^m\beta_m)\,\phi^{-3} + \tfrac{1}{12}\tau^2\phi^5 \qquad [31] $$
where the Laplacian $\Delta$ and the scalar curvature $R$ are based on the $\lambda_{ab}$-compatible covariant derivative $\nabla_i$, where $L$ is the corresponding conformal Killing operator, defined by
$$ (LW)_{ab} := \nabla_a W_b + \nabla_b W_a - \tfrac{2}{3}\lambda_{ab}\nabla_m W^m \qquad [32] $$
and where $\beta_a := \epsilon^{mn}{}_a(\nabla_m\alpha_n - \nabla_n\alpha_m)$. Presuming that for the chosen free data one can indeed solve equations [29]–[31] for $\Phi$, $\phi$, and $W$, then the initial data $(\gamma, K, A, E)$ constructed via the formulas
$$ \gamma_{ab} = \phi^4\lambda_{ab} \qquad [33] $$
$$ K_{ab} = \phi^{-2}(\sigma_{ab} + LW_{ab}) + \tfrac{1}{3}\phi^4\lambda_{ab}\tau \qquad [34] $$
$$ A_b = \alpha_b \qquad [35] $$
$$ E^a = \phi^{-6}(\mathcal{E}^a + \nabla^a\Phi) \qquad [36] $$
satisfy the Einstein–Maxwell constraint equations [26]–[28].
Before discussing the extent to which one can solve equations [29]–[31] and consequently use the conformal method to generate solutions, we wish to comment on how these equations are derived. Three formulas are key to this derivation. The first is the formula for the scalar curvature of the metric $\gamma_{ab} = \phi^4\lambda_{ab}$, expressed in terms of the scalar curvature for $\lambda_{ab}$ and derivatives of $\phi$:
$$ R(\gamma) = \phi^{-4}R(\lambda) - 8\phi^{-5}\Delta\phi \qquad [37] $$
We note that if we were to use a different power of $\phi$ as the conformal factor multiplying $\lambda_{ab}$, then this formula would involve squares of first derivatives of $\phi$ as well. The second key formula relates the divergence of a traceless symmetric tensor $\rho_{ab}$ with respect to the covariant derivatives $\nabla_{(\gamma)}$ and $\nabla_{(\lambda)}$ compatible with conformally related metrics. One obtains
$$ \nabla^m_{(\gamma)}(\phi^{-2}\rho_{mb}) = \phi^{-6}\,\nabla^m_{(\lambda)}(\rho_{mb}) \qquad [38] $$
The third key formula does the same thing for a vector field $\xi^a$:
$$ \nabla_{(\gamma)m}(\phi^{-6}\xi^m) = \phi^{-6}\,\nabla_{(\lambda)m}(\xi^m) \qquad [39] $$
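As a worked illustration of how the first key formula is used, the following sketch substitutes the conformal decomposition into the Hamiltonian constraint [26] and recovers the Lichnerowicz equation [31]. The notation is an assumption made for this sketch: $\phi$ denotes the conformal factor, $\lambda_{ab}$ the conformal metric, $\sigma_{ab}$, $\tau$, $\mathcal{E}^a$, $\beta_a$ the free-data quantities, indices on the right-hand sides are raised and lowered with $\lambda_{ab}$, and $\nabla\Phi = 0$ is taken for simplicity (as holds when $\Phi$ is constant).

```latex
\begin{align*}
% Step 1: quadratic terms in K, using [33] and [34]; the cross term
% vanishes because \sigma + LW is trace-free with respect to \lambda.
K^{mn}K_{mn} &= \phi^{-12}(\sigma^{mn} + LW^{mn})(\sigma_{mn} + LW_{mn})
               + \tfrac{1}{3}\tau^2, \qquad \mathrm{tr}\,K = \tau, \\
% Step 2: electromagnetic terms, using [35], [36] and \gamma_{ab} = \phi^4\lambda_{ab}.
E^m E_m + B^m B_m &= \phi^{-8}\left(\mathcal{E}^m\mathcal{E}_m + \beta^m\beta_m\right), \\
% Step 3: insert these and the scalar-curvature formula [37] into [26].
\phi^{-4}R(\lambda) - 8\phi^{-5}\Delta\phi
   - \phi^{-12}\,|\sigma + LW|^2 + \tfrac{2}{3}\tau^2
   &= \tfrac{1}{2}\phi^{-8}\left(\mathcal{E}^m\mathcal{E}_m + \beta^m\beta_m\right), \\
% Step 4: solve for \Delta\phi by multiplying through by \phi^5/8.
\Delta\phi &= \tfrac{1}{8}R(\lambda)\,\phi
   - \tfrac{1}{8}\,|\sigma + LW|^2\,\phi^{-7}
   - \tfrac{1}{16}\left(\mathcal{E}^m\mathcal{E}_m + \beta^m\beta_m\right)\phi^{-3}
   + \tfrac{1}{12}\tau^2\phi^5 .
\end{align*}
```

Here $|\sigma + LW|^2$ abbreviates $(\sigma^{mn} + LW^{mn})(\sigma_{mn} + LW_{mn})$. The powers of $\phi$ in [33]–[36] are what make each term scale by a pure power of $\phi$, with no first-derivative terms appearing.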
In addition to helping us derive equations [29]–[31] from the substitution of formulas [33]–[36] into [26]–[28], these key formulas indicate to some extent how the choice of the explicit decomposition of the initial data into free and determined data is made (see Isenberg, Maxwell, and Pollack for further elaboration). It is easy to see that there are some choices of the free data for which [29]–[31] do not admit any solutions. Let us choose, for example, $\Sigma^3$ to be the 3-sphere, and let us set $\lambda$ to be the round sphere metric, $\sigma$ to be zero everywhere, $\tau$ to be unity everywhere, and both $\alpha$ and $\mathcal{E}$ to vanish everywhere. We then readily determine that eqn [29] requires that $\Phi$ be constant and that eqn [30] requires that $LW_{ab}$ be zero. The remaining equation [31] now takes the form $\Delta\phi = (1/8)R\phi + (1/12)\phi^5$. Since the right-hand side of this equation is positive definite (recall the requirement that $\phi > 0$), it follows from the maximum principle on closed (compact without boundary) manifolds that there is no solution.
In light of this example, one would like to know exactly for which sets of free data eqns [29]–[31] can be solved, and for which sets they cannot. Since one readily determines that every set of initial data which satisfies the Einstein–Maxwell constraints [26]–[28] can be obtained via the conformal method, such a classification effectively provides a parametrization of the space of solutions of the constraints.⁶ What we know and do not know about classifying free data for the solubility of eqns [29]–[31] is largely determined by whether or not the function $\tau$ is chosen to be constant on $\Sigma^3$. If $\tau$ is chosen to be constant, then eqns [29]–[31] effectively decouple, and the classification is essentially completely known. Sets of initial data generated from free data with constant $\tau$ are called “constant mean curvature” (CMC) sets, since the mean curvature of the initial slice embedded in its spacetime development is given by $\tau$. We also know a considerable amount about the classification if $|\nabla\tau|$ is sufficiently small (“near CMC”), while virtually nothing is known for the general non-CMC case.
A full account of the classification results known to date is beyond the scope of this article. Indeed, such an account must separately deal with a number of alternatives regarding manifold and asymptotic conditions (data on a closed manifold; asymptotically Euclidean data; asymptotically hyperbolic data; data on an incomplete manifold with boundaries) and regularity (analytic data, smooth data, $C^k$ data, or data contained in various Hölder or Sobolev spaces), among other things. We will, however, now summarize some of the results; see, for example, Bartnik and Isenberg (2004) or Choquet-Bruhat for more complete surveys.
CMC Data on Closed Manifolds
Generalizing the $S^3$ example given above, we note that for any set of free data $(\Sigma^3, \lambda_{ab}, \sigma_{ab}, \tau, \alpha_a, \mathcal{E}^b)$ with constant $\tau$ and with no conformal Killing fields, eqn [29] is easily solved for $\xi$, and then eqn [30] takes the form

$$\nabla_m (LW)^m{}_a = \epsilon_{amn}\,\mathcal{E}^m \beta^n \tag{40}$$

which is a linear elliptic PDE for $W^m$ with invertible operator (footnote 7). This equation admits a unique solution, and then the problem of solving the constraints reduces to the analysis of the "Lichnerowicz equation" [31]. To determine if this equation admits a solution for the given set of free data, we use the following classification criteria:

(1) The metric is labeled positive $Y^+(\Sigma^3)$, zero $Y^0(\Sigma^3)$, or negative $Y^-(\Sigma^3)$ Yamabe class depending upon whether the metric $\lambda_{ab}$ on $\Sigma^3$ can be conformally deformed so that its scalar curvature is everywhere positive, everywhere zero, or everywhere negative (footnote 8).
(2) The $(\sigma_{ab}, \alpha_a, \mathcal{E}^b)$ portion of the data is labeled either $\sigma \equiv 0$ or $\sigma \not\equiv 0$, depending upon whether the quantity $\sigma^{mn}\sigma_{mn} + \mathcal{E}^m\mathcal{E}_m + \beta^m\beta_m$ is identically zero, or not.
(3) The mean curvature is labeled "max" or "nonmax" depending upon whether the constant $\tau$ is zero or not.

In terms of these criteria, we have 12 classes of free data, and one can prove (Choquet-Bruhat and York 1980, Isenberg 1995) the following:

Solutions exist for the classes $(Y^+, \sigma \not\equiv 0, \text{max})$, $(Y^+, \sigma \not\equiv 0, \text{nonmax})$, $(Y^0, \sigma \equiv 0, \text{max})$, $(Y^0, \sigma \not\equiv 0, \text{nonmax})$, $(Y^-, \sigma \equiv 0, \text{nonmax})$, $(Y^-, \sigma \not\equiv 0, \text{nonmax})$.

Solutions do not exist for the classes $(Y^+, \sigma \equiv 0, \text{max})$, $(Y^+, \sigma \equiv 0, \text{nonmax})$, $(Y^0, \sigma \equiv 0, \text{nonmax})$, $(Y^0, \sigma \not\equiv 0, \text{max})$, $(Y^-, \sigma \equiv 0, \text{max})$, $(Y^-, \sigma \not\equiv 0, \text{max})$.

This classification is exhaustive, in the sense that every set of CMC data on a closed manifold fits neatly into exactly one of the classes. We note that the proofs of existence of solutions can generally be done using the sub–super solution technique, while the nonexistence results follow from application of the maximum principle.

Footnote 6. Of course, in claiming that appropriate sets of the free data parametrize the space of solutions of the constraints, one needs to determine if inequivalent sets of free data are mapped to the same set of solutions. We discuss this below.

Footnote 7. A metric $\lambda$ has a conformal Killing field if the equation $LY = 0$ has a nontrivial solution $Y$. Geometrically, the existence of a conformal Killing field $Y$ indicates that the flow of $(\Sigma^3, \lambda_{ab})$ along $Y$ is a conformal isometry. While free data with nonvanishing conformal Killing fields can be handled, for convenience we shall stick to data without them here.

Footnote 8. Work on the Yamabe problem (Aubin 1998) shows that every Riemannian metric on a closed manifold is contained in one and only one of these classes. In fact, the Yamabe theorem (Schoen 1984) shows that every metric can be conformally deformed so that its scalar curvature is $+1$, $0$, or $-1$, but this result is not needed for the analysis of the constraint equations.

Maximal Asymptotically Euclidean Data

Just as is the case for data on a closed manifold, the constraint equations [29] and [30] decouple from the Lichnerowicz equation [31] for asymptotically Euclidean data with constant $\tau$. We note that $\tau \neq 0$ is inconsistent with the data being asymptotically Euclidean, so we restrict to the maximal case, $\tau = 0$. The criterion for solubility of the constraints in conformal form for maximal asymptotically Euclidean free data is quite a bit simpler to state than that for CMC data on a closed manifold. It involves the metric only; the rest of the free data is irrelevant. Specifically, as shown by Brill and Cantor (with a correction by Maxwell (2005)), a solution exists if and only if for every nonvanishing, compactly supported, smooth function $f$ on $\Sigma^3$, we have

$$\inf_{\{f \not\equiv 0\}} \frac{\int_M \left(|\nabla f|^2 + R f^2\right)\sqrt{\det\lambda}}{\|f\|^2_{L^2}} > 0 \tag{41}$$
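The twelve-class result for CMC data on closed manifolds stated earlier is, in effect, a lookup table, and it can help to see it written as one. The following is a minimal sketch only: the function name, the string labels `"Y+"`, `"Y0"`, `"Y-"`, and the two boolean flags are illustrative choices made here and do not appear in the cited proofs. The flag `sources_vanish` stands for the case in which the quantity in criterion (2) is identically zero, and `maximal` stands for $\tau = 0$.

```python
# Existence table for the Lichnerowicz equation with CMC free data on a
# closed manifold, transcribed from the twelve-class classification.
# All names below are hypothetical and chosen only for illustration.

YAMABE_CLASSES = ("Y+", "Y0", "Y-")


def cmc_solution_exists(yamabe: str, sources_vanish: bool, maximal: bool) -> bool:
    """Return True if the class (yamabe, sources, mean curvature) admits a solution.

    yamabe         -- Yamabe class of the conformal metric: "Y+", "Y0" or "Y-"
    sources_vanish -- True if the quantity in criterion (2) is identically zero
    maximal        -- True if the constant mean curvature tau is zero ("max")
    """
    if yamabe not in YAMABE_CLASSES:
        raise ValueError(f"unknown Yamabe class: {yamabe!r}")
    if yamabe == "Y+":
        # Positive Yamabe class: solvable exactly when the sources do not vanish.
        return not sources_vanish
    if yamabe == "Y0":
        # Zero Yamabe class: solvable when both vanish or neither does.
        return sources_vanish == maximal
    # Negative Yamabe class: solvable exactly when tau is nonzero.
    return not maximal


if __name__ == "__main__":
    for y in YAMABE_CLASSES:
        for s in (True, False):
            for m in (True, False):
                print(y, "sources=0" if s else "sources!=0",
                      "max" if m else "nonmax", cmc_solution_exists(y, s, m))
```

The round 3-sphere example discussed before the classification (positive Yamabe class, vanishing sources, $\tau = 1$) corresponds to `cmc_solution_exists("Y+", True, False)`, which reports that no solution exists, in agreement with the maximum-principle argument.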
Alternative Methods for Finding Solutions to the Constraint Equations

While the conformal method has proved to be a very useful tool for generating and analyzing solutions of the Einstein constraint equations, it does have some minor drawbacks:

(1) The free data is remote from the physical data, since the conformal factor $\phi$ can vastly change the physical scale on different regions of space.
(2) While casting the constraints into a determined PDE form has the advantage of producing PDEs of a relatively familiar (elliptic) form, one does give up certain flexibilities inherent in an underdetermined set of PDEs. (We expand upon this point below in the course of discussing gluing.)
(3) In choosing a set of free data, one does have to first project out a divergence-free vector field ($\mathcal{E}$) and a divergence-free tracefree tensor field ($\sigma$).
(4) While the choice of CMC free data for the conformal method is conformally covariant in the sense that conformally related sets of CMC free data $(\Sigma^3, \lambda_{ab}, \sigma_{ab}, \tau, \alpha_a, \mathcal{E}^b)$ and $(\Sigma^3, \theta^4\lambda_{ab}, \theta^{-2}\sigma_{ab}, \tau, \alpha_a, \theta^{-6}\mathcal{E}^b)$ produce the same physical solution to the constraints, this is not the case for non-CMC free data.

Conformal Thin Sandwich
The last two of these problems can be removed by modifying the conformal method in a way which York (1999) has called the "conformal thin sandwich" (CTS) approach. The basic idea of the CTS approach is the same as that of the conformal method. However, CTS free data sets are larger (the divergence-free tracefree symmetric tensor field $\sigma$ is replaced by a tracefree symmetric tensor field $U$, and an extra scalar field $\eta$ is added), and after solving the CTS constraint equations

$$\Delta\xi = 0 \tag{42}$$

$$\nabla_m\left((2\eta)^{-1}(LX)\right)^m{}_a = \tfrac{2}{3}\psi^6\nabla_a\tau + \epsilon_{amn}\,\mathcal{E}^m\beta^n + \nabla_m\left((2\eta)^{-1}U^m{}_a\right) \tag{43}$$

$$\Delta\psi = \tfrac{1}{8}R\psi - \tfrac{1}{8}\left(U^{mn} + LY^{mn}\right)\left(U_{mn} + LY_{mn}\right)\psi^{-7} - \tfrac{1}{16}\left(\mathcal{E}^m\mathcal{E}_m + \beta^m\beta_m\right)\psi^{-3} + \tfrac{1}{12}\tau^2\psi^5 \tag{44}$$

for the vector field $Y$ and the conformal factor $\psi$, one obtains not just the full set of physical initial data satisfying the constraint equations [26]–[28]

$$\gamma_{ab} = \psi^4\lambda_{ab} \tag{45}$$

$$K_{ab} = \psi^{-2}\left(U_{ab} + LY_{ab}\right) + \tfrac{1}{3}\psi^4\lambda_{ab}\tau \tag{46}$$

$$A_b = \alpha_b \tag{47}$$

$$E^a = \psi^{-6}\left(\mathcal{E}^a + \nabla^a\xi\right) \tag{48}$$

but also the lapse $N$ and shift $X$

$$N = \psi^6\eta \tag{49}$$

$$X^a = Y^a \tag{50}$$

Clearly, in using the CTS approach, one need not project out a divergence-free part of a symmetric tracefree tensor. One also readily checks that the CTS method is conformally covariant in the sense discussed above: the physical data generated from CTS free data $(\lambda_{ab}, U_{ab}, \tau, \eta, \alpha_a, \mathcal{E}^b)$ and from data $(\theta^4\lambda_{ab}, \theta^{-2}U_{ab}, \tau, \theta^6\eta, \alpha_a, \theta^{-6}\mathcal{E}^b)$ are the same. Furthermore, since the mathematical form of eqns [42]–[44] is very similar to that of [29]–[31], the solvability results for the conformal method can be essentially carried over to the CTS approach.

There is, however, one troubling feature of the CTS approach. The problem arises if we seek CMC initial data with the lapse function chosen so that the evolving data continue to have CMC (such a gauge choice is often used in numerical relativity). In the case of the conformal method, after solving [29]–[31] to obtain initial data $(\gamma_{ab}, K_{ab}, A_a, E^b)$ which satisfies the constraints, one achieves this by proceeding to solve a linear homogeneous elliptic PDE for the lapse function. One easily verifies that solutions to this extra equation always exist. By contrast, in the CTS approach, the extra equation takes the form

$$\Delta(\psi^7\eta) = \tfrac{1}{8}\psi^7\eta R + \tfrac{5}{2}(\psi\eta)^{-1}(U - LX)^2 + \tfrac{1}{16}(\psi\eta)^{-1}\left(\mathcal{E}^2 + \beta^2\right) + \psi^5 Y^m\nabla_m\tau - \psi^5\partial_t\tau \tag{51}$$

which is coupled to the system [42]–[44]. The coupling is fairly intricate; hence little is known about the existence of solutions to the system, and it has been seen that there are problems with uniqueness. Such problems of course do not arise if one makes no attempt to preserve CMC.

The Quasispherical Ansatz and Parabolic Methods

Applying either the conformal method or the CTS approach to the constraint equations results in systems of elliptic equations. Another approach, pioneered by Bartnik (1993), produces instead parabolic equations. In the simplest version of this approach, known as the "quasispherical ansatz," one works on a manifold $\Sigma^3 = \mathbb{R}^3 \setminus B^3$, where $B^3$ is a 3-ball; one presumes that there exist coordinates $(r, \theta, \varphi)$ on $\Sigma^3$ in terms of which the metric takes the "quasispherical" form

$$\gamma_{QS} = u^2\,dr^2 + \left(r\,d\theta + \beta^\theta\,dr\right)^2 + \left(r\sin\theta\,d\varphi + \beta^\varphi\,dr\right)^2 \tag{52}$$

for functions $u(r, \theta, \varphi)$, $\beta^\theta(r, \theta, \varphi)$, $\beta^\varphi(r, \theta, \varphi)$, and then one attempts to satisfy the time-symmetric constraint $R(\gamma_{QS}) = 0$ on $\Sigma^3$ (footnote 9). Calculating the scalar curvature for the metric in this form, one finds that the equation $R(\gamma_{QS}) = 0$ can be written as

$$\left(r\partial_r - \beta^\theta\partial_\theta - \beta^\varphi\partial_\varphi\right)u - u^2\Delta u = Q(u, \beta^\theta, \beta^\varphi, r, \theta, \varphi) \tag{53}$$

where $Q$ is a polynomial in the positive function $u$. One can now show that if one specifies $\beta^\theta$ and $\beta^\varphi$ everywhere on $\Sigma^3$ (subject to an upper bound on the divergence of the vector field $(\beta^\theta, \beta^\varphi)$), and if one specifies regular initial data for $u$ on the inner boundary of $\Sigma^3$, then one has a well-posed initial-value problem (in terms of the "evolution" coordinate $r$) for the parabolic PDE [53]. Ideally, one can use this approach to extend solutions of the time-symmetric constraints from an isolated region (corresponding to $B^3$) out to spatial infinity.

The basic quasispherical ansatz approach just outlined can be generalized significantly (Sharples 2001, Bartnik and Isenberg 2004) to allow for more general spatial metrics, and to allow nonzero $K_{ab}$, $A_c$, and $E^b$. It has been an especially valuable tool for the study of mass in asymptotically Euclidean data sets. It does not, however, purport to construct general solutions of the constraint equations.

Footnote 9. This version of the constraints is called "time symmetric" since one is solving the full set of constraints with $K_{ab}$ assumed to be zero. Data with $K_{ab} = 0$ is time symmetric.
Gluing Solutions of the Constraint Equations
Starting around the year 2000, a number of new "gluing" procedures have been developed for constructing and studying solutions of the constraint equations. Unlike the conformal method, the CTS method, and the quasispherical ansatz, all of which construct solutions from scratch, the gluing procedures construct new solutions from given ones. This feature, and the considerable flexibility of the procedures, has resulted in a wealth of applications already in the short five-year history of gluing in general relativity.

One of the gluing approaches, developed by Corvino (2000) and Corvino and Schoen (preprint) (see also Chruściel and Delay (2002)), allows one to choose a compact region $\Lambda$ in almost any smooth, asymptotically Euclidean vacuum solution of the constraints, and from this produce a new smooth solution which is completely unchanged in the region $\Lambda$ and is identical to Schwarzschild or Kerr outside some larger region. In proving this result, one exploits the underdetermined character of the constraint equations: such a construction could not be carried out if the constraints were a determined PDE system (footnote 10).

The other main gluing approach, developed first by Isenberg et al. (2001), and then further developed with Chruściel (Chruściel et al. 2005) and with Maxwell (Isenberg et al. 2005), starts with a pair of solutions of the (vacuum) constraints $(\Sigma^3_1, \gamma_1, K_1)$ and $(\Sigma^3_2, \gamma_2, K_2)$ together with a choice of a pair of points $p_1 \in \Sigma^3_1$, $p_2 \in \Sigma^3_2$, one from each solution. From these solutions, this gluing procedure produces a new set of initial data $(\Sigma^3_{(1-2)}, \gamma_{(1-2)}, K_{(1-2)})$ with the following properties:

(1) $\Sigma^3_{(1-2)}$ is diffeomorphic to the connected sum $\Sigma^3_1 \# \Sigma^3_2$;
(2) $(\Sigma^3_{(1-2)}, \gamma_{(1-2)}, K_{(1-2)})$ is a solution of the constraints everywhere on $\Sigma^3_{(1-2)}$;
(3) On that portion of $\Sigma^3_{(1-2)}$ which corresponds to $\Sigma^3_1 \setminus \{\text{ball around } p_1\}$, the data $(\gamma_{(1-2)}, K_{(1-2)})$ is isomorphic to $(\gamma_1, K_1)$, with a corresponding property holding on that portion of $\Sigma^3_{(1-2)}$ which corresponds to $\Sigma^3_2 \setminus \{\text{ball around } p_2\}$ (see Figure 3) (footnote 11).

[Figure 3: Connected sum gluing. The data sets $\{\Sigma^3_1, \gamma_1, K_1\}$ and $\{\Sigma^3_2, \gamma_2, K_2\}$, with marked points $p_1$ and $p_2$, are joined to form $\{\Sigma^3_{(1-2)}, \gamma_{(1-2)}, K_{(1-2)}\}$.]

This connected sum gluing can be carried out for very general sets of initial data. The sets can be asymptotically Euclidean, asymptotically hyperbolic, specified on a closed manifold, or indeed anything else. The only condition that the data sets must satisfy is that, in sufficiently small neighborhoods of each of the points at which the gluing is to be done, there do not exist nontrivial solutions $\xi$ to the equation $D\Phi^*_{(\gamma, K)}\,\xi = 0$, where $D\Phi^*_{(\gamma, K)}$ is the operator obtained by taking the adjoint of the linearized constraint operator (footnote 12). In work by Beig, Chruściel, and Schoen, it is shown that this condition (sometimes referred to as "No KIDs," meaning "no (localized) Killing initial data") is indeed generically satisfied.

While a discussion of the proof that connected sum gluing can be carried out to this degree of generality is beyond the scope of this paper (see Chruściel et al. (2005), along with references cited therein for details of the proof), we note three features of it. First, the proof is constructive in the sense that it outlines a systematic, step-by-step mathematical procedure for doing the gluing. In principle, one should be able to carry out the gluing procedure numerically. Second, connected sum gluing relies primarily on the conformal method, but it also uses a nonconformal deformation at the end (dependent on the techniques of Corvino and Schoen, and of Chruściel and Delay), so as to guarantee that the glued data is not just very close to the given data on regions away from the bridge, but is indeed identical to it. Third, while Corvino–Schoen gluing has not yet been proved to work for solutions of the constraints with source fields, connected sum gluing (up to the last step, which relies on Corvino–Schoen) has been shown to work for most matter source fields of interest (Isenberg et al.). It has also been shown to work for general dimensions greater than or equal to three.

Footnote 10. Hence if one tries to do Corvino–Schoen-type gluing using a fixed conformal geometry, the gluing fails because the determined elliptic system satisfies the unique continuation property.

Footnote 11. The connected sum of the two manifolds (see property (1)) is constructed as follows: first we remove a ball from each of the manifolds $\Sigma^3_1$ and $\Sigma^3_2$. We then use a cylindrical bridge $S^2 \times I$ (where $I$ is an interval in $\mathbb{R}^1$) to connect the resulting $S^2$ boundaries on each manifold.

Footnote 12. When a solution to this equation does exist on some region $\Omega \subset \Sigma^3$, it follows from the work of Moncrief that the spacetime development of the data on $\Omega$ admits a nontrivial isometry.
While gluing is not an efficient tool for studying the complete set of solutions to the constraints, it has proved to be very valuable for a number of applications. We note a few here.

1. Spacetimes with regular asymptotic structure. Until recently, it was not known whether there is a large class of solutions which admit the conformal compactification and consequent asymptotically simple structure at null and spacelike infinity characteristic of the Minkowski and Schwarzschild spacetimes. Using Corvino–Schoen gluing, together with Friedrich's analyses of spacetime asymptotic structures and an argument of Chruściel and Delay (2002), one produces such a class of solutions.

2. Multi-black hole data sets. Given an asymptotically Euclidean solution of the constraints, connected sum gluing allows a sequence of (almost) flat space initial data sets to be glued to it. The bridges that result from this gluing each contain a minimal surface, and consequently an apparent horizon. With a bit of care, one can do this in such a way that indeed the event horizons which appear in the development of this glued data are disjoint, and therefore indicative of independent black holes.

3. Adding a black hole to a cosmological spacetime. Although there is no clear established definition for a black hole in a spatially compact solution of Einstein's equations, one can glue an asymptotically Euclidean solution of the constraints to a solution on a compact manifold, in such a way that there is an apparent horizon on the bridge. Studying the nature of these solutions of the constraints, and their evolution, could be useful in trying to understand what one might mean by a black hole in a cosmological spacetime.

4. Adding a wormhole to your spacetime. While we have discussed connected sum gluing as a procedure which builds solutions of the constraints with a bridge connecting two points on different manifolds, it can also be used to build a solution with a bridge connecting a pair of points on the same manifold. This allows one to do the following: if one has a globally hyperbolic spacetime solution of Einstein's equations, one can choose a Cauchy surface for that solution, choose a pair of points on that Cauchy surface, and glue the solution to itself via a bridge from one of these points to the other. If one now evolves this glued-together initial data into a spacetime, it will likely become singular very quickly because of the collapse of the bridge. Until the singularity develops, however, the solution is essentially as it was before the gluing, with the addition of an effective wormhole. Hence, this procedure can be used to glue a wormhole onto a generic spacetime solution.

5. Removing topological obstructions for constraint solutions. We know that every closed three-dimensional manifold $M^3$ admits a solution of the vacuum constraint equations. To show this, we use the fact that $M^3$ always admits a metric $h$ of constant negative scalar curvature. With $h$ normalized so that its scalar curvature is $-6$, one easily verifies that the data $(\gamma = h, K = h)$ is a CMC solution. Combining this result with connected sum gluing, one can show that for every closed $\Sigma^3$, the manifold $\Sigma^3 \setminus \{p\}$ admits both an asymptotically Euclidean and an asymptotically hyperbolic solution of the vacuum constraint equations.

6. Proving the existence of vacuum solutions on closed manifolds with no CMC Cauchy surface. Based on the work of Bartnik (1988) one can show that if one has a set of initial data on the manifold $T^3 \# T^3$ with the metric components symmetric across a central sphere and the components of $K$ skew symmetric across that same central sphere, then the spacetime development of that data does not admit a CMC Cauchy surface. Using connected sum gluing, one can show that indeed initial data sets of this sort exist (Chruściel et al. 2005).
Conclusion

Much is known about the Einstein constraint equations and those sets of initial data which satisfy them. We know how to use the conformal method or the CTS approach to construct (and parametrize in terms of free data) the CMC and near CMC sets of data which solve the constraints, with or without matter fields present. We know how to use the quasispherical approach to explore extensions of solutions of the constraint equations from compact regions. We know how to use gluing techniques to produce new solutions of both physical and mathematical interest from old ones, and we know how to use gluing as a tool for proving such results as the existence of vacuum spacetimes with no CMC Cauchy surfaces.

There is much that is not yet known as well. Very little is known about solutions of the constraint equations which are neither CMC nor near CMC. It is not known how to systematically extend solutions of the constraints from a compact region to all of $\mathbb{R}^3$ in such a way that the extension is asymptotically Euclidean (unless we know a priori that such an extension exists). Very little is known regarding how to control the constraints during the course of numerical evolution of solutions (footnote 13). Most importantly, we do not yet know how to systematically find solutions of the constraint equations which serve as physically realistic model initial data sets for studying astrophysical and cosmological systems of interest.

Many of these questions concerning the Einstein constraints and their solutions are fairly daunting. However, in view of the rapid progress in our understanding during the last few years, and in view of the pressing need to further develop the initial-value formulation as a tool for studying general relativity and gravitational physics, we are optimistic that this progress will continue, and we will soon have answers to a number of these questions.
Acknowledgments

The work of J Isenberg is supported by the NSF under Grant PHY-0354659.

See also: Asymptotic Structure and Conformal Infinity; Computational Methods in General Relativity: The Theory; Dirac Fields in Gravitation and Nonabelian Gauge Theory; Einstein Manifolds; Einstein's Equations with Matter; General Relativity: Overview; Geometric Analysis and General Relativity; Hamiltonian Reduction of Einstein's Equations; Spacetime Topology, Causal Structure and Singularities; Stationary Black Holes; Symmetric Hyperbolic Systems and Shock Waves.
Footnote 13. If the constraints are satisfied by an initial data set and if this data set is evolved completely accurately, then the constraints remain satisfied for all time. However, during the course of a numerical evolution, there are inevitable numerical inaccuracies which result in the constraints not being exactly zero. In practice, during the majority of such numerical simulations to date, the constraints have been seen to increase very rapidly in time, calling into question the reliability of the simulation.

Further Reading

Aubin T (1998) Some Nonlinear Problems in Riemannian Geometry. Springer.
Bartnik R (1988) Remarks on cosmological spacetimes and constant mean curvature surfaces. Communications in Mathematical Physics 117: 615–624.
Bartnik R (1993) Quasi-spherical metrics and prescribed scalar curvature. Journal of Differential Geometry 37: 31–71.
Bartnik R and Isenberg J (2004) The constraint equations. In: Chruściel PT and Friedrich H (eds.) The Einstein Equations and the Large Scale Behavior of Gravitational Fields, pp. 1–39. Basel: Birkhäuser.
Choquet-Bruhat Y General Relativity (to be published).
Choquet-Bruhat Y and York J (1980) The Cauchy problem. In: Held A (ed.) General Relativity and Gravitation – The Einstein Centenary, pp. 99–160. Plenum.
Chruściel P and Delay E (2002) Existence of non-trivial, vacuum, asymptotically simple spacetimes. Classical and Quantum Gravity 19: L71–L79.
Chruściel P and Delay E (2003) On mapping properties of the general relativistic constraints operator in weighted function spaces, with applications. Mémoires de la Société Mathématique de France 93: 1–103.
Chruściel P, Isenberg J, and Pollack D (2005) Initial data engineering. Communications in Mathematical Physics 257: 29–42 (gr-qc/0403066).
Corvino J (2000) Scalar curvature deformation and a gluing construction for the Einstein constraint equations. Communications in Mathematical Physics 214: 137–189.
Corvino J and Schoen R On the asymptotics for the vacuum constraint equations. Preprint, gr-qc/0301071 (to appear Journal of Differential Geometry).
Foures-Bruhat Y (1952) Théorème d'existence pour certains systèmes d'équations aux dérivées partielles non linéaires. Acta Mathematica 88: 141–225.
Isenberg J (1995) Constant mean curvature solutions of the Einstein constraint equations on closed manifolds. Classical and Quantum Gravity 12: 2249–2274.
Isenberg J, Maxwell D, and Pollack D A gluing construction for nonvacuum solutions of the Einstein constraint equations. Advances in Theoretical and Mathematical Physics (to be published).
Isenberg J, Mazzeo R, and Pollack D (2001) Gluing and wormholes for the Einstein constraint equations. Communications in Mathematical Physics 231: 529–568.
Maxwell D (2005) Rough solutions of the Einstein constraints on compact manifolds. Journal of Hyperbolic Differential Equations 2: 521–546.
Misner C, Thorne K, and Wheeler JA (1973) Gravitation. San Francisco: Freeman.
Schoen R (1984) Conformal deformation of a Riemannian metric to constant scalar curvature. Journal of Differential Geometry 20: 479–495.
Sharples J (2001) Spacetime Initial Data and Quasi-Spherical Coordinates. Ph.D. thesis, University of Canberra.
York JW (1999) Conformal "Thin-Sandwich" data for the initial-value problem of general relativity. Physical Review Letters 82: 1350–1353.
Einstein Manifolds

A S Dancer, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.
Introduction

The Einstein condition on a manifold $M$ with metric $g$ says that the Ricci curvature should be proportional to the metric. Of course, this condition originally appeared in relativity, but it is of tremendous interest from the point of view of pure mathematics. Demanding a metric of constant sectional curvature is a very strong condition, while metrics of constant scalar curvature always occur. The Einstein property, which is essentially a constant-Ricci-curvature condition, occupies an intermediate position between these conditions, and it is still not clear exactly how strong it is. In dimensions higher than four, it is still unknown whether there are obstructions to a manifold admitting an Einstein metric.

The study of Einstein manifolds is a vast and rapidly expanding area, and this article can merely touch on some points of particular interest. The focus of the article is very much on the Riemannian rather than Lorentzian case (see, e.g., Hawking and Ellis (1973) or the articles by Christodoulou and Tod in LeBrun and Wang (1999) for a discussion of the Lorentzian case in general relativity). For further reading, the books of Besse (1987) and LeBrun and Wang (1999) are strongly recommended.

Basic Properties

Let $(M, g)$ be a (pseudo)-Riemannian manifold. There is a unique connection $\nabla$, the Levi-Civita connection of $g$, with the following properties:

1. the torsion $T(X, Y) = \nabla_X Y - \nabla_Y X - [X, Y]$ vanishes and
2. $\nabla g = 0$.

We can now form the Riemann curvature tensor of $g$:

$$R(X, Y)Z = \nabla_X\nabla_Y Z - \nabla_Y\nabla_X Z - \nabla_{[X,Y]} Z$$

This is a type (3,1) tensor. There is one nontrivial contraction we can perform to obtain a (2, 0) tensor, that is, the Ricci curvature

$$\mathrm{Ric}(X, Y) = \mathrm{tr}\left(Z \mapsto R(X, Z)Y\right)$$

We may perform a further contraction and obtain the scalar curvature $s = \mathrm{tr}_g\,\mathrm{Ric}$. The Ricci curvature is a symmetric tensor of the same type as the metric, so we can make the following definition:

Definition 1 A metric $g$ is Einstein if

$$\mathrm{Ric} = \lambda g \tag{1}$$

for some constant $\lambda$.

In this article, we shall take $g$ to be a Riemannian (positive-definite) metric.

Remark 1 In dimension higher than 2, we do not have to put in the assumption that $\lambda$ is constant by hand. For, taking the divergence of [1] gives $\frac{1}{2}ds = d\lambda$, while taking instead the trace gives $s = n\lambda$, so if $n \neq 2$, we see $d\lambda = 0$.

Remark 2 In dimension 2 and 3, the Einstein condition is equivalent to constant curvature. The only complete Einstein manifolds in these dimensions are therefore the model spaces $S^n$, $\mathbb{R}^n$ and hyperbolic space, and quotients of these by discrete groups of isometries.

Remark 3 As noticed by Hilbert, the Einstein equations admit a variational interpretation. They are the variational equations for the total scalar curvature functional

$$g \mapsto \int_M s_g\, d\mu_g$$

restricted to the space of volume 1 metrics (here $d\mu_g$ denotes the volume form defined by $g$).

Obstructions

The most fundamental question we can ask is: Given a smooth manifold $M$, does it support an Einstein metric? One is also interested in the question of uniqueness of such a metric, or more generally of describing the moduli space of such metrics. In this section we discuss obstructions to existence.

In dimension 2 Remark 2 shows that any compact manifold admits an Einstein metric, while in dimension 3 the only possibilities are space forms. In particular, there is no Einstein metric on $S^1 \times S^2$. The picture is much less clear in higher dimensions. If $\lambda \ge 0$, one obtains some elementary obstructions just by considering the sign of the Ricci curvature:

1. If $M$ supports a complete Einstein metric with $\lambda > 0$, then by Myers's theorem $M$ is compact and $\pi_1(M)$ is finite. Also there are obstructions coming from the positivity of the scalar curvature (e.g., if $M$ is spin and $4m$-dimensional, then the $\hat{A}$ genus vanishes).
2. If $M$ supports a complete Ricci-flat metric, then every finitely generated subgroup of $\pi_1(M)$ has polynomial growth.

However, if $\dim M \ge 5$, there is, at the time of writing, no known obstruction to $M$ supporting an Einstein metric of negative Einstein constant.

In the borderline dimension 4, Hitchin and Thorpe observed that the Einstein condition put topological constraints on the manifold. For we have the following expressions for the Euler characteristic $\chi$ and signature $\tau$ in terms of the curvature tensor:

$$\tau = \frac{1}{12\pi^2}\int_M \left(|W_+|^2 - |W_-|^2\right) d\mu_g$$

$$\chi = \frac{1}{8\pi^2}\int_M \left(|W_+|^2 + |W_-|^2 - \tfrac{1}{2}|\mathrm{Ric}_0|^2 + \frac{s^2}{24}\right) d\mu_g$$

where $W_+$ and $W_-$ are the self-dual and anti-self-dual parts of the Weyl tensor, $s$ is the scalar curvature, and $\mathrm{Ric}_0$ is the trace-free part of the Ricci tensor. The Einstein condition is just $\mathrm{Ric}_0 = 0$, so we immediately obtain the following inequality.

Theorem (Hitchin 1974). A compact four-dimensional Einstein manifold satisfies the inequality

$$|\tau| \le \tfrac{2}{3}\chi$$

Note that equality is obtained if and only if $g$ is Ricci-flat and (anti)-self-dual, which is equivalent to locally hyper-Kähler for some orientation. The only examples are the flat torus, the K3 surface with the Yau metric (now $\tau = -16$ and $\chi = 24$), and two quotients of K3.

Since the mid-1990s, LeBrun (2003) has obtained a series of results which sharpen the Hitchin–Thorpe inequality by obtaining estimates on the Weyl and scalar curvature terms. These estimates are obtained by using Seiberg–Witten theory, the general theme being that nonemptiness of the Seiberg–Witten moduli space gives lower bounds on the curvature terms. LeBrun shows there are infinitely many compact smooth simply connected 4-manifolds that satisfy the Hitchin–Thorpe inequality but nonetheless do not admit Einstein metrics.
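Because the Hitchin–Thorpe inequality involves only the Euler characteristic and the signature, it can be tested with integer arithmetic. The sketch below is illustrative: the function names are chosen here, and the topological invariants quoted for the blow-ups of the complex projective plane (Euler characteristic $3 + k$ and signature $1 - k$ for $k$ blow-ups) are standard facts that are not stated in the article itself. Passing the test is only a necessary condition for an Einstein metric to exist, as the results of LeBrun mentioned above make clear.

```python
def satisfies_hitchin_thorpe(euler_characteristic: int, signature: int) -> bool:
    """Check |tau| <= (2/3) * chi using integers only: 3|tau| <= 2 chi."""
    return 3 * abs(signature) <= 2 * euler_characteristic


def is_borderline(euler_characteristic: int, signature: int) -> bool:
    """True when equality |tau| = (2/3) * chi holds."""
    return 3 * abs(signature) == 2 * euler_characteristic


def blown_up_cp2_invariants(k: int) -> tuple[int, int]:
    """(chi, tau) for CP^2 blown up at k points (standard values, assumed here)."""
    return 3 + k, 1 - k


if __name__ == "__main__":
    # K3 surface: chi = 24, tau = -16, the equality case quoted in the text.
    print("K3:", satisfies_hitchin_thorpe(24, -16), is_borderline(24, -16))
    # Blow-ups of CP^2: the inequality fails once k is large enough.
    for k in range(0, 12):
        chi, tau = blown_up_cp2_invariants(k)
        print(k, chi, tau, satisfies_hitchin_thorpe(chi, tau))
```

Working with `3 * abs(signature) <= 2 * euler_characteristic` avoids floating-point division, so the borderline cases are detected exactly.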
Uniqueness and Moduli

In Yang–Mills theory, there is a highly developed theory of moduli spaces of instantons, including formulas for the dimension. The situation for Einstein metrics is far less well understood. The relevant moduli space here is the set of Einstein metrics modulo the action of the diffeomorphism group, but there are very few manifolds for which the moduli space has been determined. In dimension 2, of course, this is essentially the subject of Teichmüller theory.

One example where the moduli space is understood is the K3 surface. As explained above, the Hitchin–Thorpe argument shows that any Einstein metric is hyper-Kähler, and the moduli space of such structures on K3 is understood as an open set in a certain noncompact symmetric space.

Some uniqueness results have been obtained in four dimensions. LeBrun used Seiberg–Witten techniques to show that the Einstein metric on a compact quotient of the complex hyperbolic plane $\mathbb{C}H^2$ is unique up to homotheties and diffeomorphisms. The analogous result for compact quotients of real hyperbolic 4-space was obtained using entropy methods by Besson, Courtois, and Gallot. It is still unknown, however, whether nonstandard Einstein metrics can exist on $S^4$.

In higher dimensions, very little is known. One can, by analogy with the theory of instantons, consider the linearization of the Einstein equations together with a further linear equation expressing orthogonality to the orbits of the diffeomorphism group. This gives a notion of formal tangent space to the Einstein moduli space. However, Koiso has shown that formal tangent vectors need not integrate to a curve of Einstein metrics. The structure of the moduli space (dimension, possible singularities) remains quite mysterious in general. It is known from the Wang–Ziller torus bundle examples that the moduli space can have infinitely many components.
Special Holonomy

Berger classified the possible holonomy groups of simply connected, irreducible, nonsymmetric $n$-dimensional Riemannian manifolds. The generic case is that of holonomy $SO(n)$, and there are six other possibilities, each of which corresponds to some special geometry. Interestingly, four of these are automatically Ricci-flat, while a fifth is Einstein with $\lambda \neq 0$. The remaining example, that of Kähler geometry, is not automatically Einstein, but the Einstein equations with the additional Kähler assumption reduce to a scalar Monge–Ampère equation and are therefore simpler than the general Einstein system.

For further reading in this section, see the articles by Boyer–Galicki, Joyce, Salamon, Tian, Yau and the author in part I of LeBrun and Wang (1999), and also the book of Joyce (2000). For the Kähler case, see also Tian (2000).

Kähler Manifolds (Holonomy U(n/2), SU(n/2))
A Kähler manifold (M, g) admits a covariant constant complex structure I, and associated Kähler 2-form $\omega$ defined by $\omega(X, Y) = g(IX, Y)$. The Ricci form is defined by $\rho(X, Y) = \mathrm{Ric}(IX, Y)$, so the Einstein condition for a Kähler manifold becomes
$$\rho = \lambda\omega$$
On a Kähler manifold, $\rho$ is the curvature of the canonical bundle, so $[\rho/2\pi]$ is a representative for the cohomology class $c_1(M)$. We see that a necessary condition for a complex manifold (M, I) to admit a Kähler–Einstein metric is that $c_1$ has a definite sign. We consider, in turn, the three cases:
$c_1 < 0$
In this case, we have:
Theorem (Aubin, Yau). Let (M, I) be a compact complex manifold with $c_1 < 0$. Then (M, I) admits a Kähler–Einstein metric with $\lambda < 0$. The metric is unique up to homothety.
$c_1 = 0$
This is a special case of the Calabi conjecture, proved by Yau.
Theorem (Yau). Let M be a compact Kähler manifold with Kähler form $\omega$. For any closed real form $\rho$ of type (1, 1) with $[\rho/2\pi] = c_1(M)$, there exists a unique Kähler metric with Kähler form cohomologous to $\omega$ and Ricci form equal to $\rho$.
In particular, if M is a compact Kähler manifold with $c_1 = 0$, there exists a Ricci-flat Kähler metric on M. Ricci-flat Kähler metrics are called Calabi–Yau metrics, and are exactly the metrics with holonomy in SU(n/2). They admit two parallel spinors and are of great interest to string theorists, because in some string theories spacetime is expected to be a product of the four-dimensional macroscopic factor with a compact Calabi–Yau manifold of complex dimension 3. Yau’s theorem provides many examples of Calabi–Yau spaces. For example, we can take a nonsingular complex submanifold defined as a complete intersection by the vanishing of r polynomials of degrees $d_1, \ldots, d_r$ in $\mathbb{CP}^n$. Now, M has complex dimension $n - r$, and $c_1 = 0$ if and only if $n + 1 = \sum_{i=1}^{r} d_i$. We obtain examples of complex dimension 2 by considering a quartic in $\mathbb{CP}^3$, the intersection of a quadric and a cubic in $\mathbb{CP}^4$, or the intersection of three quadrics in $\mathbb{CP}^5$; these all give examples of K3 surfaces. A famous example of a Calabi–Yau manifold of complex dimension 3 is given by the quintic in $\mathbb{CP}^4$. This technique can be extended, for example, by considering complete intersections in weighted projective space or constructing Calabi–Yau desingularizations of singular spaces.
$c_1 > 0$
This case is the most complicated and, at the time of writing, is not yet fully understood. It is known that not every compact manifold with $c_1 > 0$ supports a Kähler–Einstein metric.
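The complete-intersection criterion $n + 1 = \sum d_i$ quoted in the $c_1 = 0$ case can be explored with a short enumeration. The following is only a sketch: the function name `calabi_yau_complete_intersections` and the cutoff `max_ambient` are illustrative choices, not from the article, and degrees are assumed to be at least 2 because a degree-1 equation merely cuts the ambient projective space down by one dimension.

```python
from itertools import combinations_with_replacement


def calabi_yau_complete_intersections(dim, max_ambient=8):
    """List (n, degrees) with n - r = dim and sum(degrees) = n + 1.

    n is the dimension of the ambient projective space CP^n, r = len(degrees)
    is the number of defining polynomials, and every degree is >= 2
    (an assumption of this sketch; hyperplane sections are excluded).
    """
    found = []
    for n in range(dim + 1, max_ambient + 1):
        r = n - dim  # number of equations needed to reach complex dimension dim
        for degrees in combinations_with_replacement(range(2, n + 2), r):
            if sum(degrees) == n + 1:  # the c_1 = 0 condition
                found.append((n, degrees))
    return found


k3_families = calabi_yau_complete_intersections(2)
threefold_families = calabi_yau_complete_intersections(3)
print(k3_families)
print(threefold_families)
```

For complex dimension 2 the enumeration returns exactly the three K3 families named in the text (a quartic in $\mathbb{CP}^3$, a quadric and a cubic in $\mathbb{CP}^4$, three quadrics in $\mathbb{CP}^5$), and for dimension 3 it includes the quintic in $\mathbb{CP}^4$.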
An early result of Matsushima was that the identity component of the automorphism group of a Kähler–Einstein space with $c_1 > 0$ must be reductive. This shows, for example, that the blow-up of $\mathbb{CP}^2$ at
one or two points does not admit a Kähler–Einstein metric, despite having $c_1 > 0$. (The one-point blow-up does admit a Hermitian–Einstein metric due to Page.) A second obstruction is the Futaki invariant, a character of the Lie algebra of the automorphism group. This character vanishes if there is a Kähler–Einstein metric. Both the above obstructions depend on having a nontrivial algebra of holomorphic automorphisms of M. More recently, Tian has discovered further obstructions (in complex dimension 3 or higher) which can be present even if the automorphism algebra is trivial. However, for compact complex surfaces with $c_1 > 0$, Tian has proved that vanishing of the Futaki invariant is sufficient. In particular, the blow-up of $\mathbb{CP}^2$ at k points in general position, where $3 \le k \le 8$, admits a Kähler–Einstein metric (note that $c_1^2 = 9 - k$, so if $k > 8$ then $c_1$ is no longer definite). LeBrun–Catanese and Kotschick used these results to give an example of a topological 4-manifold carrying Einstein metrics of different signs. A deformation of the Barlow surface (a surface of general type) has $c_1 < 0$ and hence carries an Einstein metric with $\lambda < 0$. But this space is homeomorphic (though not diffeomorphic) to the blow-up of $\mathbb{CP}^2$ at eight points, which carries an Einstein metric with $\lambda > 0$. One may use this example to construct higher-dimensional examples of diffeomorphic manifolds carrying Einstein metrics of opposite sign.
Hyper-Kähler Manifolds (Holonomy Sp(n/4))
These are always Ricci-flat. They have a triple (I, J, K) of covariant constant complex structures, satisfying the quaternionic multiplication relations $IJ = K = -JI$, etc., and defining Kähler forms $\omega_I$, $\omega_J$, $\omega_K$. Hyper-Kähler manifolds of dimension $n = 4N$ have $N + 1$ parallel spinors. The most effective way of producing complete hyper-Kähler metrics has been the hyper-Kähler quotient construction (Hitchin et al. 1987), which was motivated by the Marsden–Weinstein quotient in symplectic geometry. Let G be a group acting freely on a hyper-Kähler manifold (M, g, I, J, K) preserving the hyper-Kähler structure. Subject to mild assumptions, we obtain a G-equivariant moment map $\mu : M \to \mathfrak{g}^* \otimes \mathbb{R}^3$, satisfying
$$d\mu_X(Y) = (\omega_I(X, Y),\ \omega_J(X, Y),\ \omega_K(X, Y))$$
Now the quotient $\mu^{-1}(0)/G$ is a hyper-Kähler manifold of dimension $\dim M - 4 \dim G$.
The power of this construction comes from the fact that even if M is just flat quaternionic space, one can obtain highly nontrivial quotients by suitable choice of group G (e.g., the asymptotically locally Euclidean four-dimensional examples of Kronheimer, which include as a subcase the multi-instanton metrics of Gibbons and Hawking). Many examples of interest in mathematical physics may be obtained by taking hyper-Kähler quotients of an infinite-dimensional space of connections and Higgs fields (Hitchin 1987). Examples include moduli spaces of instantons over a hyper-Kähler base, moduli spaces of monopoles on $\mathbb{R}^3$, and moduli spaces of Higgs pairs over a Riemann surface. The hyper-Kähler manifolds produced so far by the quotient construction have all been noncompact. Examples of compact hyper-Kähler manifolds are rarer but some are known. Beauville has produced examples in all dimensions as desingularizations of symmetric products of the basic four-dimensional compact examples (K3 and the 4-torus). Further material for this section may be found, for example, in Hitchin (1992) and in the chapter by the author on hyper-Kähler manifolds in LeBrun and Wang (1999).
Quaternionic Kähler Manifolds (Holonomy Sp(n/4)·Sp(1))
These are always Einstein with nonzero Einstein constant. Instead of globally defined parallel complex structures as in the hyper-Kähler case, we have a sub-bundle G of End(TM) with fiber isomorphic to the imaginary quaternions, parallel with respect to the Levi-Civita connection. Thus, we have locally defined almost-complex structures I, J, K, satisfying the quaternionic multiplication relations, such that covariant differentiation of one of I, J, K gives a linear combination of the other two. In particular, note that quaternionic Kähler manifolds are not Kähler. If the Einstein constant is positive, the only known complete examples are symmetric, the so-called compact Wolf spaces, which are in one-to-one correspondence with the compact simple Lie groups. It is conjectured that these are the only examples with $\lambda > 0$, and some results in this direction have been established (e.g., it is known if $\dim M \le 12$). It is also known that for fixed dimension, there are only finitely many types of compact quaternionic Kähler manifold with $\lambda > 0$. Many orbifold examples, however, are known to exist, for example, via the Galicki–Lawson quaternionic Kähler quotient construction.
If $\lambda < 0$, more complete examples are known. In addition to the noncompact duals of the Wolf spaces, there are homogeneous, nonsymmetric examples due to Alekseevski, and infinite-dimensional families of inhomogeneous examples constructed via twistor methods by LeBrun (see also Biquard (2000)).
Exceptional Holonomy (G2 or Spin(7))
Such metrics exist in dimension 7 or 8, respectively. They are always Ricci-flat and admit a parallel spinor. Local examples were constructed by Bryant using Cartan–Kähler theory, and some explicit complete noncompact examples were produced by Salamon and Bryant using a cohomogeneity-1 construction. More complicated explicit noncompact examples have recently been produced by several authors (see Cvetič et al. (2003) for a survey). Compact examples were produced using analytical methods by Joyce, and later by Kovalev. Joyce starts with a flat singular metric on quotients of the seven- or eight-dimensional torus and constructs an approximate solution to the special holonomy condition on a resolution of this singular space. Then an analytic argument is used to show that an exact nearby solution exists. For further reading, consult Joyce (2000) as well as the article by Joyce in LeBrun and Wang (1999). There are also some interesting examples of Einstein metrics which, although not of special holonomy themselves, are closely related to special holonomy geometries. In recent years, these have yielded many new examples of compact Einstein manifolds in the work of Boyer, Mann, Galicki, Kollar, Rees, Piccinni, and Nakamaye.
Einstein–Sasaki Structures
There are several different ways of defining these, but the simplest is to say that (M, g) is Einstein–Sasaki if the cone $(\mathbb{R}^+ \times M,\ dt^2 + t^2 g)$ is Ricci-flat Kähler. Also, an Einstein–Sasaki manifold has a circle action with quotient a Kähler–Einstein orbifold. Existence theorems for such orbifold metrics have led to many examples of Einstein–Sasaki metrics, including families on odd-dimensional spheres.
3-Sasakian Structures
Again, we can define these in terms of cones; (M, g) has a 3-Sasakian structure if the cone over it is hyper-Kähler. The basic example is $S^{4n+3}$ with associated cone $\mathbb{H}^{n+1} \setminus \{0\}$. A 3-Sasakian manifold is always Einstein with positive Einstein constant.
The hyper-Kähler quotient construction induces a 3-Sasakian quotient, and many examples of compact 3-Sasakian manifolds have been produced as 3-Sasakian quotients of $S^{4n+3}$. In particular, there are examples in dimension 7 with arbitrarily large second Betti number, showing that one cannot, in general, expect compactness/finiteness results for Einstein moduli spaces without further assumptions.
Homogeneous Examples
Another strategy to study the Einstein equations is to reduce the difficulty of the problem by imposing symmetries. More precisely, we consider Einstein manifolds (M, g) with an isometric action of a Lie group G. In general, the Einstein equations with this symmetry will now involve r independent variables, where r is the dimension of the stratified space M/G. We call r the cohomogeneity of the manifold. In this section, we consider the situation where (M, g) is homogeneous, that is, when the action of G is transitive so r = 0. The Einstein equations now reduce to a system of algebraic equations. We may now write M = G/K, where K is the stabilizer of a point of M. We choose an $\mathrm{Ad}_K$-invariant vector space complement $\mathfrak{p}$ to $\mathfrak{k}$ in $\mathfrak{g}$, and identify $\mathfrak{p}$ with the tangent space to G/K at the identity coset. The key point is that G-invariant metrics on M = G/K may now be identified with $\mathrm{Ad}_K$-invariant inner products on $\mathfrak{p}$, which may, in turn, be studied by looking at the decomposition of $\mathfrak{p}$ into irreducible representations of K. In the special case when G/K is isotropy irreducible (i.e., $\mathfrak{p}$ is an irreducible representation of K), the metric g and its Ricci tensor are proportional by Schur’s lemma, and hence g is automatically Einstein. Isotropy-irreducible homogeneous spaces have been classified by Kramer, Manturov, Wolf, and Wang–Ziller. In the general case, the Einstein equations become a system of polynomial equations. Determining whether this system has a real positive solution is, in general, a highly nontrivial problem. However, the situation of homogeneous metrics is one area in which the variational formulation of the Einstein equations has proved highly successful. We are now considering the scalar curvature functional on the finite-dimensional space of unit-volume G-invariant metrics on G/K. The behavior of the scalar curvature functional is related to the structure of the lattice of intermediate subalgebras between the Lie algebras of K and G.
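To make the reduction to algebraic equations concrete, the sketch below treats the simplest case G = SU(2) with K trivial. This example is not taken from the article: it assumes a left-invariant metric that is diagonal in an orthonormal frame with $[e_2, e_3] = \lambda_1 e_1$, $[e_3, e_1] = \lambda_2 e_2$, $[e_1, e_2] = \lambda_3 e_3$, and uses Milnor's formulas for the principal Ricci curvatures of three-dimensional unimodular Lie groups, $r(e_1) = 2\mu_2\mu_3$ and cyclic, with $\mu_i = \tfrac12(\lambda_1 + \lambda_2 + \lambda_3) - \lambda_i$. The names `ricci_eigenvalues` and `is_einstein` are illustrative.

```python
from fractions import Fraction


def ricci_eigenvalues(l1, l2, l3):
    """Principal Ricci curvatures (r(e1), r(e2), r(e3)) via Milnor's formulas.

    l1, l2, l3 are the structure constants in an orthonormal frame
    (an assumption of this sketch, see the lead-in).
    """
    lam = [Fraction(l1), Fraction(l2), Fraction(l3)]
    half_sum = sum(lam) / 2
    mu = [half_sum - l for l in lam]
    return (2 * mu[1] * mu[2], 2 * mu[0] * mu[2], 2 * mu[0] * mu[1])


def is_einstein(l1, l2, l3):
    """The metric is Einstein exactly when the three eigenvalues agree."""
    r = ricci_eigenvalues(l1, l2, l3)
    return r[0] == r[1] == r[2]


print(ricci_eigenvalues(2, 2, 2), is_einstein(2, 2, 2))  # round unit 3-sphere
print(ricci_eigenvalues(1, 2, 2), is_einstein(1, 2, 2))  # a Berger-type metric
```

Here the Einstein condition is the polynomial system $\mu_2\mu_3 = \mu_1\mu_3 = \mu_1\mu_2$ in the metric parameters; in this three-dimensional toy case only the round metric solves it, whereas the general homogeneous problem discussed in the text is far harder.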
An early result along these lines (Wang and Ziller 1986) is that if K is maximal in G (with G compact), then G/K admits a G-invariant Einstein metric. The idea
of the proof is to show that maximality of K forces the scalar curvature functional on the space of volume-1 homogeneous metrics to be both bounded above and proper, and therefore to have a maximum. These ideas have been greatly extended by Böhm, Wang, and Ziller. Given a compact connected homogeneous space G/K, they define a graph whose vertices are Ad(K)-invariant subalgebras strictly intermediate between $\mathfrak{g}$ and $\mathfrak{k}$. The edges correspond to inclusions between subalgebras. A component of the graph is called toral if all subalgebras $\mathfrak{h}$ in this component are such that the identity component of H/K is abelian. They now show that if the graph has at least two nontoral components, then G/K admits a G-invariant Einstein metric. The Einstein metrics in the theorem are produced by a mountain pass argument and may have co-index 1, contrasting with the maxima of the earlier theorem. Further advances in this direction have recently been made by Böhm. He associates to G/K a simplicial complex, and shows that nonzero homology groups of the complex imply the existence of higher co-index Einstein metrics. One can also study homogeneous noncompact Einstein spaces with $\lambda < 0$. It is conjectured by Alekseevski that for all such examples K is a maximal compact subgroup of G. The reader is referred to Heber (1998) for further information on the noncompact case. The above results give some powerful existence results for Einstein metrics. However, there are examples known of homogeneous spaces G/K which admit no G-invariant Einstein metric (Wang and Ziller 1986). One such example is SU(4)/SU(2), where SU(2) is a maximal subgroup of $Sp(2) \subset SU(4)$. Techniques similar to those in the homogeneous case have been used to construct Einstein metrics on total spaces of certain bundles, via Riemannian submersions. Some highlights are Jensen’s exotic Einstein metrics on $(4n + 3)$-dimensional spheres, and the Wang–Ziller metrics on total spaces of torus bundles over products of Kähler–Einstein manifolds.
The latter construction gives examples of spaces admitting volume-1 Einstein metrics with infinitely many Einstein constants $\lambda$.
Examples of Higher Cohomogeneity
One can also look for Einstein metrics of higher cohomogeneity. Most progress has been made in the cohomogeneity-1 case, that is, where the principal orbit G/K of the action has real codimension one in
M (see Eschenburg and Wang (2000) for background on such metrics). On the open dense set in M which is the union of the principal orbits, we may write the metric as $dt^2 + g_t$, where $g_t$ is a t-dependent homogeneous metric on G/K. The Einstein equations are now a system of ordinary differential equations in t. One may also add a special orbit G/H at one or both ends of the interval over which t ranges. This will impose boundary conditions on the ODEs. For the manifold structure to extend smoothly over the special orbit, H/K must be a sphere. Notice that if $\lambda > 0$, then to obtain a complete metric M must be compact, so we must add two special orbits. If $\lambda \le 0$ and the metric is irreducible, then a Bochner argument tells us that M is noncompact. In the Ricci-flat case, the Cheeger–Gromoll theorem tells us that to obtain a complete irreducible metric, we must have exactly one special orbit, so M is topologically the total space of a vector bundle over the special orbit. In fact, most of the known examples even with $\lambda < 0$ have a special orbit too. The system of ODEs we obtain is still highly nonlinear and difficult to analyze in general. However, there are certain situations in which the equations, or a subsystem, can be solved in closed form. If we take G/K to be a principal circle bundle over a Hermitian symmetric space, Bérard Bergery (1982) showed that the resulting Einstein equations are solvable. (His work was inspired by the earlier example of Page, which corresponds to the case when G/K = U(2)/U(1), a circle bundle over $\mathbb{CP}^1$.) In fact, Bérard Bergery’s construction works in greater generality, as we obtain the same equations if G/K is replaced by any Riemannian submersion with circle fibers over a positive Kähler–Einstein space. This illustrates a general principle that systems arising as cohomogeneity-1 Einstein equations also typically arise from certain bundle ansätze without homogeneity assumptions.
Wang and Wang generalized this construction to the case when the hypersurface in M is a Riemannian submersion with circle fibers over a product of an arbitrary number of Kähler–Einstein factors. Other solvable Einstein systems have been studied by, for example, Wang and Dancer. It may also be possible in certain situations to get existence results without an explicit solution. This observation underlies the important work of Böhm (1998). He constructs cohomogeneity-1 Einstein metrics on certain manifolds with dimension between 5 and 9, including all the spheres in this
range of dimensions. The equations are not now solved in closed form, but it is possible to get a qualitative understanding of the flow and to show that certain trajectories will give metrics on the desired compact manifolds. Böhm has also shown, in an analogous result to the homogeneous case, that there are examples of manifolds with a cohomogeneity-1 G-action which do not support any G-invariant Einstein metric. So far, not much is known about Einstein metrics of higher cohomogeneity. An exception is the situation of self-dual Einstein metrics in dimension 4, where the self-dual condition greatly simplifies the resulting equations. Calderbank, Pedersen, and Singer have achieved a good understanding of such metrics with $T^2$ symmetry, including construction of such metrics on Hirzebruch–Jung resolutions of cyclic quotient singularities.
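As a minimal illustration of how a cohomogeneity-1 ansatz turns the Einstein equations into ODEs in t, consider the simplest special case $dt^2 + f(t)^2 g_{S^m}$ with principal orbits round unit m-spheres. This special case and the helper names below are assumptions chosen for illustration, not examples from the article. For such a warped product the Ricci tensor has the radial value $-m f''/f$ and the orbit value $-f''/f + (m-1)(1 - f'^2)/f^2$, so the metric is Einstein with constant $\lambda$ exactly when both expressions equal $\lambda$ for all t.

```python
import math


def ricci_components(f, df, d2f, t, m):
    """Radial and orbit Ricci values of dt^2 + f(t)^2 g_{S^m} at parameter t."""
    radial = -m * d2f(t) / f(t)
    orbit = -d2f(t) / f(t) + (m - 1) * (1 - df(t) ** 2) / f(t) ** 2
    return radial, orbit


def einstein_constant(f, df, d2f, m, samples=(0.3, 0.7, 1.1), tol=1e-9):
    """Return lambda if both Ricci values agree at all sample points, else None.

    This is a numerical spot check at a few values of t, not a proof.
    """
    values = []
    for t in samples:
        values.extend(ricci_components(f, df, d2f, t, m))
    lam = values[0]
    if all(abs(v - lam) < tol for v in values):
        return lam
    return None


m = 3  # orbits are 3-spheres, so the total space is 4-dimensional
round_sphere = einstein_constant(math.sin, math.cos, lambda t: -math.sin(t), m)
hyperbolic = einstein_constant(math.sinh, math.cosh, math.sinh, m)
flat = einstein_constant(lambda t: t, lambda t: 1.0, lambda t: 0.0, m)
not_einstein = einstein_constant(lambda t: t ** 2, lambda t: 2 * t, lambda t: 2.0, m)
print(round_sphere, hyperbolic, flat, not_einstein)
```

The three profiles sin t, sinh t, and t recover the round sphere, hyperbolic space, and flat space, each with a single point as special orbit at t = 0, while a generic profile such as $t^2$ fails the check. The systems discussed in the text involve several unknown functions and are correspondingly harder.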
Analytical Methods
So far there is no really general analytical method for proving existence of global Riemannian Einstein metrics (although, of course, such techniques do exist in more restrictive situations of special holonomy). Although the Einstein equations admit a variational formulation, this has (except for homogeneous metrics) not yielded general existence results. Note that the Wang–Ziller torus bundle examples at the end of the section “Homogeneous Examples” show that the Palais–Smale condition does not hold in full generality. One early suggestion was to adopt a minimax procedure. In each conformal class [g], one looks for a minimizer of the volume-normalized scalar curvature. Such a minimizer always exists. One then takes the supremum over all conformal classes. The resulting supremum of the functional is called the Yamabe invariant Y(M) of the manifold M. If a maximizer g exists, and $Y(M) \le 0$, then g is Einstein. However, striking work of Petean shows that this procedure must fail to produce an Einstein metric in many cases. He proves that if $\dim M \ge 5$ and M is simply connected, then the Yamabe invariant is nonnegative. So, for such an M, any Einstein metric produced will have $\lambda \ge 0$, and we know that this puts constraints on the topology of M. Another possible technique is to use the Hamilton Ricci flow. If this converges as $t \to \infty$, the limiting metric is Einstein. However, it seems hard in higher dimensions to get control over the flow. In particular, the Wang–Ziller example in the section “Homogeneous Examples” of a homogeneous space with no invariant Einstein metric shows that the flow may fail to converge (the Hamilton flow preserves the property of G-invariance).
Graham–Lee and Biquard have used analytical methods to produce Einstein deformations of hyperbolic space (real, complex, quaternionic, or Cayley). The idea is to show that a sufficiently small deformation of the conformal infinity of hyperbolic space can be extended to a deformation of the hyperbolic metric. Recently, Anderson has shown the existence of Einstein metrics with $\lambda < 0$ on a large class of manifolds obtained by Dehn filling from hyperbolic manifolds with toral ends. The strategy is to glue on to the hyperbolic metric copies of a simple explicit asymptotically hyperbolic metric, and to show that the resulting metric can be perturbed to an exact solution of the Einstein equations.
See also: Einstein Equations: Exact Solutions; Einstein Equations: Initial Value Formulation; Hamiltonian Reduction of Einstein’s Equations; Several Complex Variables: Compact Manifolds; Singularities of the Ricci Flow.
Further Reading
Bérard Bergery L (1982) Sur des nouvelles variétés riemanniennes d’Einstein. Publications de l’Institut Elie Cartan (Nancy), vol. 6, pp. 1–60.
Besse A (1987) Einstein Manifolds. Berlin: Springer.
Biquard O (2000) Métriques d’Einstein asymptotiquement symétriques. Astérisque 265.
Böhm C (1998) Inhomogeneous Einstein manifolds on low-dimensional spheres and other low-dimensional spaces. Inventiones Mathematicae 134: 145–176.
Cvetič M, Gibbons GW, Lü H, and Pope CN (2003) Special holonomy spaces and M-theory. In: Unity from Duality, Gravity, Gauge Theory and Strings (Les Houches, 2001), 523–545. NATO Advanced Study Institute, EDP Sci, Les Ulis.
Eschenburg J and Wang M (2000) The initial value problem for cohomogeneity one Einstein metrics. Journal of Geometrical Analysis 10: 109–137.
Hawking SW and Ellis GFR (1973) The Large-Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Heber J (1998) Noncompact homogeneous Einstein spaces. Inventiones Mathematicae 133: 279–352.
Hitchin NJ (1974) Compact four-dimensional Einstein manifolds. Journal of Differential Geometry 9: 435–441.
Hitchin NJ (1987) Monopoles, Minimal Surfaces and Algebraic Curves. Les presses de l’Université de Montreal.
Hitchin NJ (1992) Hyper-Kähler manifolds. Astérisque 206: 137–166.
Hitchin NJ, Karlhede A, Lindström U, and Roček M (1987) Hyper-Kähler metrics and supersymmetry. Communications in Mathematical Physics 108: 535–589.
Joyce D (2000) Compact Manifolds with Special Holonomy. Oxford: Oxford University Press.
LeBrun C (2003) Einstein metrics, four-manifolds and differential topology. In: Yau S-T (ed.) Surveys in Differential Geometry, vol. VIII, pp. 235–255. Somerville, MA: International Press.
LeBrun C and Wang M (eds.) (1999) Essays on Einstein Manifolds, vol. VI, Surveys in Differential Geometry. Boston, MA: International Press.
Page D (1979) A compact rotating gravitational instanton. Physics Letters 79B: 235–238.
Tian G (2000) Canonical Metrics in Kähler Geometry, Lectures in Mathematics, ETH Zurich. Birkhäuser.
Wang M and Ziller W (1986) Existence and nonexistence of homogeneous Einstein metrics. Inventiones Mathematicae 84: 177–194.
Wang M and Ziller W (1990) Einstein metrics on principal torus bundles. Journal of Differential Geometry 31: 215–248.
Yau S-T (1978) On the Ricci curvature of a compact Kähler manifold and the complex Monge–Ampère equations I. Communications on Pure and Applied Mathematics 31: 339–411.
Einstein–Cartan Theory
A Trautman, Warsaw University, Warsaw, Poland
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Notation
Standard notation and terminology of differential geometry and general relativity are used in this article. All considerations are local, so that the four-dimensional spacetime M is assumed to be a smooth manifold diffeomorphic to $\mathbb{R}^4$. It is endowed with a metric tensor g of signature (1, 3) and a linear connection defining the covariant differentiation of tensor fields. Greek indices range from 0 to 3 and refer to spacetime. Given a field of frames $(e_\mu)$ on M, and the dual field of coframes $(\theta^\mu)$, one can write the metric tensor as $g = g_{\mu\nu}\theta^\mu\theta^\nu$, where $g_{\mu\nu} = g(e_\mu, e_\nu)$ and Einstein’s summation convention is assumed to hold. Tensor indices are lowered with $g_{\mu\nu}$ and raised with its inverse $g^{\mu\nu}$. General-relativistic units are used, so that both Newton’s constant of gravitation and the speed of light are 1. This implies $\hbar = \ell^2$, where $\ell \approx 10^{-33}$ cm is the Planck length. Both mass and energy are measured in centimeters.
Historical Remarks
The Einstein–Cartan theory (ECT) of gravity is a modification of general relativity theory (GRT), allowing spacetime to have torsion, in addition to curvature, and relating torsion to the density of intrinsic angular momentum. This modification was put forward in 1922 by Élie Cartan, before the discovery of spin. Cartan was influenced by the work of the Cosserat brothers (1909), who considered besides an (asymmetric) force stress tensor also a moments stress tensor in a suitably
generalized continuous medium. Work done in the 1950s by physicists (Kondo, Bilby, Kröner, and other authors) established the role played by torsion in the continuum theory of crystal dislocations. A recent review (Ruggiero and Tartaglia 2003) describes the links between ECT and the classical theory of defects in an elastic medium. Cartan assumed the linear connection to be metric and derived, from a variational principle, a set of gravitational field equations. He required, without justification, that the covariant divergence of the energy–momentum tensor be zero; this led to an algebraic constraint equation, bilinear in curvature and torsion, severely restricting the geometry. This misguided observation probably discouraged Cartan from pursuing his theory. It is now known that conservation laws in relativistic theories of gravitation follow from the Bianchi identities and, in the presence of torsion, the divergence of the energy–momentum tensor need not vanish. Torsion is implicit in the 1928 Einstein theory of gravitation with teleparallelism. For a long time, Cartan’s modified theory of gravity, presented in his rather abstruse notation, unfamiliar to physicists, did not attract any attention. In the late 1950s, the theory of gravitation with spin and torsion was independently rediscovered by Sciama and Kibble. The role of Cartan was recognized soon afterward and ECT became the subject of much research; see Hehl et al. (1976) for a review and an extensive bibliography. In the 1970s, it was recognized that ECT can be incorporated within supergravity. In fact, simple supergravity is equivalent to ECT with a massless, anticommuting Rarita–Schwinger field as the source. Choquet-Bruhat considered a generalization of ECT to higher dimensions and showed that the Cauchy problem for the coupled system of Einstein–Cartan and Dirac equations is well posed.
Penrose (1982) has shown that torsion appears in a natural way when spinors are allowed to be rescaled by a complex conformal factor. ECT has been generalized by allowing nonmetric linear connections and additional currents, associated with dilation and shear, as sources of such a “metric-affine theory of gravity” (Hehl et al. 1995).
Physical Motivation
Recall that, in special relativity theory (SRT), the underlying Minkowski spacetime admits, as its group of automorphisms, the full Poincaré group, consisting of translations and Lorentz transformations. It follows from the first Noether theorem that classical, special-relativistic field equations, derived from a variational principle, give rise to
conservation laws of energy–momentum and angular momentum. Using Cartesian coordinates $(x^\mu)$, abbreviating $\partial\varphi/\partial x^\rho$ to $\varphi_{,\rho}$, and denoting by $t^{\mu\nu}$ and $s^{\mu\nu\rho} = -s^{\nu\mu\rho}$ the tensors of energy–momentum and of intrinsic angular momentum (spin), respectively, one can write the conservation laws in the form
$$t^{\mu\nu}{}_{,\nu} = 0 \qquad [1]$$
and
$$\left(x^\mu t^{\nu\rho} - x^\nu t^{\mu\rho} + s^{\mu\nu\rho}\right)_{,\rho} = 0 \qquad [2]$$
In the presence of spin, the tensor $t^{\mu\nu}$ need not be symmetric,
$$t^{\mu\nu} - t^{\nu\mu} = s^{\mu\nu\rho}{}_{,\rho}$$
Belinfante and Rosenfeld have shown that the tensor
$$T^{\mu\nu} = t^{\mu\nu} + \tfrac{1}{2}\left(s^{\nu\mu\rho} + s^{\nu\rho\mu} + s^{\mu\rho\nu}\right)_{,\rho}$$
is symmetric and its divergence vanishes. In quantum theory, the irreducible, unitary representations of the Poincaré group correspond to elementary systems such as stable particles; these representations are labeled by the mass and spin. In Einstein’s GRT, the spacetime M is curved; the Lorentz group, but not the Poincaré group, appears as the structure group acting on orthonormal frames in the tangent spaces of M. The energy–momentum tensor T appearing on the right-hand side of the Einstein equation is necessarily symmetric. In GRT there is no room for translations and the tensors t and s. By introducing torsion and relating it to s, Cartan restored the role of the Poincaré group in relativistic gravity: this group acts on the affine frames in the tangent spaces of M. Curvature and torsion are the surface densities of Lorentz transformations and translations, respectively. In a space with torsion, the Ricci tensor need not be symmetric, so that an asymmetric energy–momentum tensor can appear on the right-hand side of the Einstein equation.
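The two algebraic claims in this passage, the formula for the antisymmetric part of t and the symmetry of the Belinfante–Rosenfeld tensor, follow from the conservation laws [1] and [2] in a few lines. The worked derivation below is a sketch that assumes the index placement $t^{\mu\nu}$, $s^{\mu\nu\rho} = -s^{\nu\mu\rho}$ (indices are lost in the extracted text and are restored here as an assumption).

```latex
% Expanding [2] and using [1], t^{\mu\nu}{}_{,\nu} = 0:
\begin{align*}
0 &= \left(x^\mu t^{\nu\rho} - x^\nu t^{\mu\rho} + s^{\mu\nu\rho}\right)_{,\rho} \\
  &= \delta^\mu_\rho\, t^{\nu\rho} + x^\mu\, t^{\nu\rho}{}_{,\rho}
   - \delta^\nu_\rho\, t^{\mu\rho} - x^\nu\, t^{\mu\rho}{}_{,\rho}
   + s^{\mu\nu\rho}{}_{,\rho} \\
  &= t^{\nu\mu} - t^{\mu\nu} + s^{\mu\nu\rho}{}_{,\rho}
  \quad\Longrightarrow\quad
  t^{\mu\nu} - t^{\nu\mu} = s^{\mu\nu\rho}{}_{,\rho}.
\end{align*}
% Antisymmetric part of T^{\mu\nu} = t^{\mu\nu}
%   + \tfrac12 (s^{\nu\mu\rho} + s^{\nu\rho\mu} + s^{\mu\rho\nu})_{,\rho}:
\begin{align*}
T^{\mu\nu} - T^{\nu\mu}
  &= s^{\mu\nu\rho}{}_{,\rho}
   + \tfrac12\left(s^{\nu\mu\rho} + s^{\nu\rho\mu} + s^{\mu\rho\nu}
   - s^{\mu\nu\rho} - s^{\mu\rho\nu} - s^{\nu\rho\mu}\right)_{,\rho} \\
  &= s^{\mu\nu\rho}{}_{,\rho} + \tfrac12\left(s^{\nu\mu\rho} - s^{\mu\nu\rho}\right)_{,\rho}
   = s^{\mu\nu\rho}{}_{,\rho} - s^{\mu\nu\rho}{}_{,\rho} = 0.
\end{align*}
```

The last step uses only the antisymmetry of s in its first two indices.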
Geometric Preliminaries
Tensor-Valued Differential Forms
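It is convenient to follow Cartan in describing geometric objects as tensor-valued differential forms. To define them, consider a homomorphism $\sigma : GL_4(\mathbb{R}) \to GL_N(\mathbb{R})$ and an element $A = (A^\mu{}_\nu)$ of $\operatorname{End} \mathbb{R}^4$, the Lie algebra of $GL_4(\mathbb{R})$. The derived representation of Lie algebras is given by
$$\frac{d}{dt}\,\sigma(\exp At)\Big|_{t=0} = \sigma^\nu_\mu A^\mu{}_\nu$$
If $(e_a)$ is a frame in $\mathbb{R}^N$, then $\sigma^\nu_\mu(e_a) = \sigma^{b\nu}_{a\mu} e_b$, where $a, b = 1, \ldots, N$. A map $a = (a^\mu{}_\nu) : M \to GL_4(\mathbb{R})$ transforms fields of frames so that
$$e'_\mu = e_\nu a^\nu{}_\mu \quad\text{and}\quad \theta^\mu = a^\mu{}_\nu \theta'^\nu \qquad [3]$$
A differential form $\varphi$ on M, with values in $\mathbb{R}^N$, is said to be of type $\sigma$ if, under changes of frames, it transforms so that $\varphi' = \sigma(a^{-1})\varphi$. For example, $\theta = (\theta^\mu)$ is a 1-form of type id. If now $A = (A^\mu{}_\nu) : M \to \operatorname{End} \mathbb{R}^4$, then one puts $a(t) = \exp tA : M \to GL_4(\mathbb{R})$ and defines the variations induced by an infinitesimal change of frames,
$$\delta\theta^\mu = \frac{d}{dt}\left(a(t)^{-1}\right)^\mu{}_\nu \theta^\nu\Big|_{t=0} = -A^\mu{}_\nu \theta^\nu$$
$$\delta\varphi = \frac{d}{dt}\left(\sigma(a(t)^{-1})\varphi\right)\Big|_{t=0} = -\sigma^\nu_\mu A^\mu{}_\nu \varphi \qquad [4]$$
Hodge Duals
Since M is diffeomorphic to $\mathbb{R}^4$, one can choose an orientation on M and restrict the frames to agree with that orientation so that only transformations with values in $GL^+_4(\mathbb{R})$ are allowed. The metric then defines the Hodge dual of differential forms. Put $\theta_\mu = g_{\mu\nu}\theta^\nu$. The forms $\eta$, $\eta_\mu$, $\eta_{\mu\nu}$, $\eta_{\mu\nu\rho}$, and $\eta_{\mu\nu\rho\sigma}$ are defined to be the duals of 1, $\theta_\mu$, $\theta_\mu \wedge \theta_\nu$, $\theta_\mu \wedge \theta_\nu \wedge \theta_\rho$, and $\theta_\mu \wedge \theta_\nu \wedge \theta_\rho \wedge \theta_\sigma$, respectively. The 4-form $\eta$ is the volume element; for a holonomic coframe $\theta^\mu = dx^\mu$, it is given by $\sqrt{|\det(g_{\mu\nu})|}\, dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3$. In SRT, in Cartesian coordinates, one can define the tensor-valued 3-forms
$$t^\mu = t^{\mu\nu}\eta_\nu \quad\text{and}\quad s^{\mu\nu} = s^{\mu\nu\rho}\eta_\rho \qquad [5]$$
so that eqns [1] and [2] become
$$dt^\mu = 0 \quad\text{and}\quad dj^{\mu\nu} = 0$$
where
$$j^{\mu\nu} = x^\mu t^\nu - x^\nu t^\mu + s^{\mu\nu} \qquad [6]$$
For an isolated system, the 3-forms $t^\mu$ and $j^{\mu\nu}$, integrated over the 3-space $x^0 = \text{const.}$, give the system’s total energy–momentum vector and angular momentum bivector, respectively.
Linear Connection, Its Curvature and Torsion
A linear connection on M is represented, with respect to the field of frames, by the field of 1-forms
$$\omega^\mu{}_\nu = \Gamma^\mu_{\rho\nu}\theta^\rho$$
so that the covariant derivative of $e_\nu$ in the direction of $e_\mu$ is $\nabla_\mu e_\nu = \Gamma^\rho_{\mu\nu} e_\rho$. Under a change of frames [3], the connection forms transform as follows:
$$a^\mu{}_\rho\, \omega'^\rho{}_\nu = \omega^\mu{}_\rho\, a^\rho{}_\nu + da^\mu{}_\nu$$
If $\varphi = \varphi^a e_a$ is a k-form of type $\sigma$, then its covariant exterior derivative
$$D\varphi^a = d\varphi^a + \sigma^{a\nu}_{b\mu}\, \omega^\mu{}_\nu \wedge \varphi^b$$
is a (k + 1)-form of the same type. For a 0-form one has $D\varphi^a = \theta^\mu \nabla_\mu \varphi^a$. The infinitesimal change of $\omega$, defined similarly as in [4], is $\delta\omega^\mu{}_\nu = DA^\mu{}_\nu$. The 2-form of curvature $\Omega = (\Omega^\mu{}_\nu)$, where
$$\Omega^\mu{}_\nu = d\omega^\mu{}_\nu + \omega^\mu{}_\rho \wedge \omega^\rho{}_\nu$$
is of type ad: it transforms with the adjoint representation of $GL_4(\mathbb{R})$ in $\operatorname{End} \mathbb{R}^4$. The 2-form of torsion $\Theta = (\Theta^\mu)$, where
$$\Theta^\mu = d\theta^\mu + \omega^\mu{}_\nu \wedge \theta^\nu$$
is of type id. These forms satisfy the Bianchi identities
$$D\Omega^\mu{}_\nu = 0 \quad\text{and}\quad D\Theta^\mu = \Omega^\mu{}_\nu \wedge \theta^\nu$$
For a differential form $\varphi$ of type $\sigma$, the following identity holds:
$$D^2\varphi^a = \sigma^{a\nu}_{b\mu}\, \Omega^\mu{}_\nu \wedge \varphi^b \qquad [7]$$
The tensors of curvature and torsion are given by
$$\Omega^\mu{}_\nu = \tfrac{1}{2} R^\mu{}_{\nu\rho\sigma}\, \theta^\rho \wedge \theta^\sigma \quad\text{and}\quad \Theta^\mu = \tfrac{1}{2} Q^\mu{}_{\rho\sigma}\, \theta^\rho \wedge \theta^\sigma$$
respectively. With respect to a holonomic frame, $d\theta^\mu = 0$, one has
$$Q^\mu{}_{\rho\sigma} = \Gamma^\mu_{\rho\sigma} - \Gamma^\mu_{\sigma\rho}$$
In SRT, the Cartesian coordinates define a radius-vector field $X^\mu = -x^\mu$, pointing towards the origin of the coordinate system. The differential equation it satisfies generalizes to a manifold with a linear connection:
$$DX^\mu + \theta^\mu = 0 \qquad [8]$$
By virtue of [7], the integrability condition of [8] is
$$\Omega^\mu{}_\nu X^\nu + \Theta^\mu = 0$$
Integration of [8] along a curve defines the Cartan displacement of X; if this is done along a small closed circuit spanned by the bivector $\Delta f^{\rho\sigma}$, then the radius vector changes by about
$$\Delta X^\mu = \tfrac{1}{2}\left(R^\mu{}_{\nu\rho\sigma} X^\nu + Q^\mu{}_{\rho\sigma}\right)\Delta f^{\rho\sigma}$$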
This holonomy theorem, rather imprecisely formulated here, shows that torsion bears to translations a relation similar to that of curvature to linear homogeneous transformations. In a space with torsion, it matters whether one considers the potential of the electromagnetic field to be a scalar-valued 1-form $\varphi$ or a covector-valued 0-form $(\varphi_\mu)$. The first choice leads to a field $d\varphi$ that is invariant with respect to the gauge transformation $\varphi \mapsto \varphi + d\chi$. The second gives $\tfrac12(\nabla_\mu \varphi_\nu - \nabla_\nu \varphi_\mu)\,\theta^\mu \wedge \theta^\nu = (D\varphi_\mu) \wedge \theta^\mu = d\varphi - \varphi_\mu \Theta^\mu$, a gauge-dependent field.
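Metric-Affine Geometry
A metric-affine space $(M, g, \omega)$ is defined to have a metric and a linear connection that need not depend on each other. The metric alone determines the torsion-free Levi-Civita connection $\mathring{\omega}$ characterized by
$d\theta^\mu + \mathring{\omega}^\mu_\nu \wedge \theta^\nu = 0$ and $\mathring{D} g_{\mu\nu} = 0$
Its curvature is
$\mathring{\Omega}^\mu_\nu = d\mathring{\omega}^\mu_\nu + \mathring{\omega}^\mu_\rho \wedge \mathring{\omega}^\rho_\nu$
The 1-form of type ad,
$\kappa^\mu_\nu = \omega^\mu_\nu - \mathring{\omega}^\mu_\nu$ [9]
determines the torsion of $\omega$ and the covariant derivative of g,
$\Theta^\mu = \kappa^\mu_\nu \wedge \theta^\nu$; $Dg_{\mu\nu} = -\kappa_{\mu\nu} - \kappa_{\nu\mu}$
The curvature of $\omega$ can be written as
$\Omega^\mu_\nu = \mathring{\Omega}^\mu_\nu + \mathring{D}\kappa^\mu_\nu + \kappa^\mu_\rho \wedge \kappa^\rho_\nu$ [10]
The transposed connection $\tilde{\omega}$ is defined by $\tilde{\omega}^\mu_\nu = \omega^\mu_\nu + Q^\mu_{\nu\rho}\theta^\rho$ so that, with respect to a holonomic frame, one has $\tilde{\Gamma}^\mu_{\nu\rho} = \Gamma^\mu_{\rho\nu}$. The torsion of $\tilde{\omega}$ is opposed to that of $\omega$.
Riemann–Cartan Geometry
A Riemann–Cartan space is a metric-affine space with a connection that is metric,
$Dg_{\mu\nu} = 0$ [11]
The metricity condition implies that $\kappa_{\mu\nu} + \kappa_{\nu\mu} = 0$ and $\Omega_{\mu\nu} + \Omega_{\nu\mu} = 0$. In a Riemann–Cartan space, the connection is determined by its torsion Q and the metric tensor. Let $Q_{\rho\mu\nu} = g_{\rho\sigma}Q^\sigma_{\mu\nu}$; then
$\kappa_{\mu\nu} = \tfrac12 (Q_{\mu\sigma\nu} + Q_{\sigma\mu\nu} + Q_{\nu\mu\sigma})\,\theta^\sigma$ [12]
The transposed connection of a Riemann–Cartan space is metric if and only if the tensor $Q_{\rho\mu\nu}$ is completely antisymmetric. Let $\tilde{\nabla}$ denote the covariant derivative with respect to $\tilde{\omega}$. By definition, a symmetry of a Riemann–Cartan space is a diffeomorphism of M preserving both g and $\omega$. The one-parameter group of local transformations of M, generated by the vector field v, consists of symmetries of $(M, g, \omega)$ if and only if
$\tilde{\nabla}^\mu v^\nu + \tilde{\nabla}^\nu v^\mu = 0$ [13]
and
$D\tilde{\nabla}_\nu v^\mu + R^\mu_{\nu\rho\sigma} v^\rho \theta^\sigma = 0$ [14]
In a Riemannian space, the connections $\omega$ and $\tilde{\omega}$ coincide and [14] is a consequence of the Killing equation [13]. The metricity condition implies
$D\eta_{\mu\nu\rho} = \eta_{\mu\nu\rho\sigma}\Theta^\sigma$ [15]
The Einstein–Cartan Theory of Gravitation
An Identity Resulting from Local Invariance
Let $(M, g, \omega)$ be a metric-affine spacetime. Consider a Lagrangian L which is an invariant 4-form on M; it depends on g, $\theta$, $\omega$, $\varphi$, and the first derivatives of $\varphi = \varphi^a e_a$. The general variation of the Lagrangian is
$\delta L = L_a \wedge \delta\varphi^a + \tfrac12 \tau^{\mu\nu}\delta g_{\mu\nu} + \delta\theta^\mu \wedge t_\mu - \tfrac12 \delta\omega^\mu_\nu \wedge s^\nu_\mu$ + an exact form [16]
so that $L_a = 0$ is the Euler–Lagrange equation for $\varphi$. If the changes of the functions g, $\theta$, $\omega$, and $\varphi$ are induced by an infinitesimal change of the frames [4], then $\delta L = 0$ and [16] gives the identity
$g_{\mu\rho}\tau^{\rho\nu} - \theta^\nu \wedge t_\mu + \tfrac12 Ds^\nu_\mu - \rho^{a\nu}_{b\mu} L_a \wedge \varphi^b = 0$
It follows from the identity that the two sets of Euler–Lagrange equations obtained by varying L with respect to the triples $(\varphi, \theta, \omega)$ and $(\varphi, g, \omega)$ are equivalent. In the sequel, the first triple is chosen to derive the field equations.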
Projective Transformations and the Metricity Condition
Still under the assumption that $(M, g, \omega)$ is a metric-affine spacetime, consider the 4-form
$8\pi K = \tfrac12 g^{\nu\rho}\eta_{\mu\nu} \wedge \Omega^\mu_\rho$ [17]
which is equal to $\tfrac12 \eta R$, where $R = g^{\mu\nu}R_{\mu\nu}$ is the Ricci scalar; the Ricci tensor $R_{\mu\nu} = R^\rho_{\mu\rho\nu}$ is, in general, asymmetric. The form [17] is invariant with respect to projective transformations of the connection,
$\omega^\mu_\nu \mapsto \omega^\mu_\nu + \delta^\mu_\nu \lambda$ [18]
where $\lambda$ is an arbitrary 1-form. Projectively related connections have the same (unparametrized) geodesics. If the total Lagrangian for gravitation interacting with the matter field $\varphi$ is K + L, then the field equations, obtained by varying it with respect to $\varphi$, $\theta$, and $\omega$ are: $L_a = 0$,
$\tfrac12 g^{\nu\rho}\eta_{\mu\nu\sigma} \wedge \Omega^\sigma_\rho = 8\pi t_\mu$ [19]
and
$D(g^{\mu\rho}\eta_{\rho\nu}) = 8\pi s^\mu_\nu$ [20]
respectively. Put $s_{\mu\nu} = g_{\mu\rho}s^\rho_\nu$. If
$s_{\mu\nu} + s_{\nu\mu} = 0$ [21]
then $s^\nu_\nu = 0$ and L is also invariant with respect to [18]. One shows that, if [21] holds, then, among the projectively related connections satisfying [20], there is precisely one that is metric. To implement properly the metricity condition in the variational principle, one can use the Palatini approach with constraints (Kopczyński 1975). Alternatively, following Hehl, one can use [9] and [12] to eliminate $\omega$ and obtain a Lagrangian depending on $\varphi$, $\theta$, and the tensor of torsion.
The Sciama–Kibble Field Equations
From now on the metricity condition [11] is assumed, so that [21] holds and the Cartan field equation [20] is
$\eta_{\mu\nu\rho} \wedge \Theta^\rho = 8\pi s_{\mu\nu}$ [22]
Introducing the asymmetric energy–momentum tensor $t_{\mu\nu}$ and the spin density tensor $s_{\mu\nu\rho} = g_{\rho\sigma}s^\sigma_{\mu\nu}$ similarly as in [5], one can write the Einstein–Cartan equations [19] and [22] in the form given by Sciama and Kibble,
$R_{\mu\nu} - \tfrac12 g_{\mu\nu}R = 8\pi t_{\mu\nu}$ [23]
$Q^\rho_{\mu\nu} + \delta^\rho_\mu Q^\sigma_{\nu\sigma} - \delta^\rho_\nu Q^\sigma_{\mu\sigma} = 8\pi s^\rho_{\mu\nu}$ [24]
Equation [24] can be solved to give
$Q^\rho_{\mu\nu} = 8\pi (s^\rho_{\mu\nu} + \tfrac12 \delta^\rho_\mu s^\sigma_{\nu\sigma} + \tfrac12 \delta^\rho_\nu s^\sigma_{\sigma\mu})$ [25]
Therefore, torsion vanishes in the absence of spin and then [23] is the classical Einstein field equation. In particular, there is no difference between the Einstein and Einstein–Cartan theories in empty space. Since practically all tests of relativistic gravity are based on consideration of Einstein's equations in empty space, there is no difference, in this respect, between the Einstein and the Einstein–Cartan theories: the latter is as viable as the former. In any case, the consideration of torsion amounts to a slight change of the energy–momentum tensor that can also be obtained by the introduction of a new term in the Lagrangian. This observation was made in 1950 by Weyl in the context of the Dirac equation. In Einstein's theory, one can also satisfactorily describe spinning matter without introducing torsion (Bailey and Israel 1975).
Consequences of the Bianchi Identities: Conservation Laws
Computing the covariant exterior derivatives of both sides of the Einstein–Cartan equations, using [15] and the Bianchi identities, one obtains
$8\pi Dt_\mu = \tfrac12 \eta_{\mu\nu\rho\sigma}\Theta^\nu \wedge \Omega^{\rho\sigma}$ [26]
and
$8\pi Ds_{\mu\nu} = \eta_{\nu\sigma} \wedge \Omega^\sigma_\mu - \eta_{\mu\sigma} \wedge \Omega^\sigma_\nu$ [27]
Cartan required the right-hand side of [26] to vanish. If, instead, one uses the field equations [19] and [22] to evaluate the right-hand sides of [26] and [27], one obtains
$Dt_\mu = Q^\rho_{\mu\nu}\theta^\nu \wedge t_\rho - \tfrac12 R^{\rho\sigma}_{\mu\nu}\theta^\nu \wedge s_{\rho\sigma}$ [28]
and
$Ds_{\mu\nu} = \theta_\nu \wedge t_\mu - \theta_\mu \wedge t_\nu$ [29]
Let v be a vector field generating a group of symmetries of the Riemann–Cartan space $(M, g, \omega)$ so that eqns [13] and [14] hold. Equations [28] and [29] then imply that the 3-form
$j = v^\mu t_\mu + \tfrac12 \tilde{\nabla}^\mu v^\nu s_{\mu\nu}$
is closed, $dj = 0$. In particular, in the limit of SRT, in Cartesian coordinates $x^\mu$, to a constant vector field v there corresponds the projection, onto v, of the energy–momentum density. If $A_{\mu\nu}$ is a constant bivector, then $v^\mu = A^\mu_\nu x^\nu$ gives $j = j^{\mu\nu}A_{\mu\nu}$, where $j^{\mu\nu}$ is as in [6].
Spinning Fluid and the Generalized Mathisson–Papapetrou Equation of Motion
As in classical general relativity, the right-hand sides of the Einstein–Cartan equations need not necessarily be derived from a variational principle; they may be determined by phenomenological considerations. For example, following Weyssenhoff, consider a spinning fluid characterized by
$t_{\mu\nu} = P_\mu u_\nu$ and $s_{\mu\nu\rho} = S_{\mu\nu}u_\rho$
where $S_{\mu\nu} + S_{\nu\mu} = 0$ and u is the unit, timelike velocity field. Let $U = u^\mu \eta_\mu$ so that
$t_\mu = P_\mu U$ and $s_{\mu\nu} = S_{\mu\nu}U$
Define the particle derivative of a tensor field $\varphi^a$ in the direction of u by
$\dot{\varphi}^a \eta = D(\varphi^a U)$
For a scalar field $\varphi$, the equation $\dot{\varphi} = 0$ is equivalent to the conservation law $d(\varphi U) = 0$. Define $\rho = g_{\mu\nu}P^\mu u^\nu$; then [29] gives an equation of motion of spin
$\dot{S}_{\mu\nu} = u_\nu P_\mu - u_\mu P_\nu$
so that
$P_\mu = \rho u_\mu + \dot{S}_{\mu\nu}u^\nu$
From [28] one obtains the equation of translatory motion,
$\dot{P}_\mu = (Q^\rho_{\mu\nu}P_\rho - \tfrac12 R^{\rho\sigma}_{\mu\nu}S_{\rho\sigma})\,u^\nu$
which is a generalization to the ECT of the Mathisson–Papapetrou equation for point particles with an intrinsic angular momentum.
From ECT to GRT: The Effective Energy–Momentum Tensor
Inside spinning matter, one can use [12] and [25] to eliminate torsion and replace the Sciama–Kibble system by a single Einstein equation with an effective energy–momentum tensor on the right-hand side. Using the split [10], one can write [23] as
$\mathring{R}_{\mu\nu} - \tfrac12 g_{\mu\nu}\mathring{R} = 8\pi T^{\mathrm{eff}}_{\mu\nu}$ [30]
Here $\mathring{R}_{\mu\nu}$ and $\mathring{R}$ are, respectively, the Ricci tensor and scalar formed from g. The term in [10] that is quadratic in $\kappa$ contributes to $T^{\mathrm{eff}}$ an expression quadratic in the components of the tensor s so that, neglecting indices, one can symbolically write
$T^{\mathrm{eff}} = T + s^2$ [31]
The symmetric tensor T is the sum of t and a term coming from $\mathring{D}\kappa^\mu_\nu$ in [10]:
$T^{\mu\nu} = t^{\mu\nu} + \tfrac12 \mathring{\nabla}_\rho (s^{\nu\mu\rho} + s^{\nu\rho\mu} + s^{\mu\rho\nu})$ [32]
It is remarkable that the Belinfante–Rosenfeld symmetrization of the canonical energy–momentum tensor appears as a natural consequence of ECT.
From the physical point of view, the second term on the right-hand side of [31] can be thought of as providing a spin–spin contact interaction, reminiscent of the one appearing in the Fermi theory of weak interactions. It is clear from eqns [30]–[32] that whenever terms quadratic in spin can be neglected, in particular in the linear approximation, ECT is equivalent to GRT. To obtain essentially new effects, the density of spin squared should be comparable to the density of mass. For example, to achieve this, a nucleon of mass m should be squeezed so that its radius $r_{\mathrm{Cart}}$ be such that
$(\ell^2 / r_{\mathrm{Cart}}^3)^2 \approx m / r_{\mathrm{Cart}}^3$
Introducing the Compton wavelength $r_{\mathrm{Compt}} = \ell^2/m \approx 10^{-13}$ cm, one can write
$r_{\mathrm{Cart}} \approx (\ell^2 r_{\mathrm{Compt}})^{1/3}$
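As a rough numerical check of the Cartan-radius estimate $r_{\mathrm{Cart}} \approx (\ell^2 r_{\mathrm{Compt}})^{1/3}$ discussed in this section, the following sketch evaluates it in centimetres. It assumes that $\ell$ denotes the Planck length and uses a standard value for it that is not quoted in the text; the function and constant names are illustrative only.

```python
# Planck length in cm: a standard value assumed here, not quoted in the text.
PLANCK_LENGTH_CM = 1.616e-33
# Nucleon Compton wavelength in cm: the order of magnitude quoted in the text.
COMPTON_WAVELENGTH_CM = 1e-13


def cartan_radius_cm(planck_length_cm, compton_wavelength_cm):
    """Return r_Cart = (l^2 * r_Compt)^(1/3) in centimetres."""
    return (planck_length_cm ** 2 * compton_wavelength_cm) ** (1.0 / 3.0)


r_cart = cartan_radius_cm(PLANCK_LENGTH_CM, COMPTON_WAVELENGTH_CM)
print(f"r_Cart ~ {r_cart:.1e} cm")
print(f"r_Cart / Planck length ~ {r_cart / PLANCK_LENGTH_CM:.1e}")
```

Under these assumptions the result is of the order of $10^{-26}$ cm and exceeds the Planck length by several orders of magnitude.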
The "Cartan radius" of the nucleon, $r_{\mathrm{Cart}} \approx 10^{-26}$ cm, so small when compared to its physical radius under normal conditions, is much larger than the Planck length. Curiously enough, the energy $\ell^2/r_{\mathrm{Cart}}$ is of the order of the energy at which, according to some estimates, the grand unification of interactions is presumed to occur.
Cosmology with Spin and Torsion
In the presence of spinning matter, $T^{\mathrm{eff}}$ need not satisfy the positive-energy conditions, even if T does. Therefore, the classical singularity theorems of Penrose and Hawking can be overcome here. In ECT, there are simple cosmological solutions without singularities. The simplest such solution, found in 1973 by Kopczyński, is as follows. Consider a universe filled with a spinning dust such that $P^\mu = \rho u^\mu$, $u^\mu = \delta^\mu_0$, $S_{23} = \sigma$, and $S_{\mu\nu} = 0$ for $\mu + \nu \neq 5$, and both $\rho$ and $\sigma$ are functions of $t = x^0$ alone. These assumptions are compatible with the Robertson–Walker line element $dt^2 - R(t)^2 (dx^2 + dy^2 + dz^2)$, where $(x, y, z) = (x^1, x^2, x^3)$ and torsion is determined from [25]. The Einstein equation [23] reduces to the modified Friedmann equation,
$\tfrac12 \dot{R}^2 - M R^{-1} + \tfrac32 S^2 R^{-4} = 0$ [33]
supplemented by the conservation laws of mass and spin,
$M = \tfrac43 \pi \rho R^3 = \mathrm{const.}$; $S = \tfrac43 \pi \sigma R^3 = \mathrm{const.}$
The last term on the left-hand side of [33] plays the role of a repulsive potential, effective at small values of R; it prevents the solution from vanishing. It should be noted, however, that even a very small amount of shear in u results in a term counteracting the repulsive potential due to spin. Neglecting shear and making the (unrealistic) assumption that matter in the universe at t = 0 consists of $10^{80}$ nucleons of mass m with aligned spins, one obtains the estimate $R(0) \approx 1$ cm and a density of the order of $m^2/\ell^4$, very large, but much smaller than the Planck density $1/\ell^2$. Tafel (1975) found large classes of cosmological solutions with a spinning fluid, admitting a group of symmetries transitive on the hypersurfaces of constant time. The models corresponding to symmetries of Bianchi types I, VII$_0$, and V are nonsingular, provided that the influence of spin exceeds that of shear.
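The estimate $R(0) \approx 1$ cm can be reproduced approximately from the modified Friedmann equation [33]: setting $\dot{R} = 0$ gives $R^3 = 3S^2/(2M)$. The sketch below is illustrative only. It assumes units with G = c = 1, takes M as the total mass of the nucleons, and takes S as their total spin with $\hbar/2$ per nucleon, all aligned; the physical constants are standard values not quoted in the text, and the function names are hypothetical.

```python
# CGS constants: standard values assumed here, not quoted in the text.
G = 6.674e-8                 # cm^3 g^-1 s^-2
C = 2.998e10                 # cm s^-1
HBAR = 1.0546e-27            # erg s
NUCLEON_MASS_G = 1.6726e-24  # g


def mass_and_spin(n_nucleons):
    """Total mass M (cm) and spin S (cm^2) in units G = c = 1, assuming
    every nucleon carries spin hbar/2 and all spins are aligned."""
    mass = n_nucleons * NUCLEON_MASS_G * G / C**2
    spin = n_nucleons * 0.5 * HBAR * G / C**3
    return mass, spin


def rdot_squared(radius, mass, spin):
    """(dR/dt)^2 solved from [33]: (1/2) Rdot^2 - M/R + (3/2) S^2/R^4 = 0."""
    return 2.0 * (mass / radius - 1.5 * spin**2 / radius**4)


def minimum_radius(mass, spin):
    """Radius at which Rdot = 0, that is, R^3 = 3 S^2 / (2 M)."""
    return (1.5 * spin**2 / mass) ** (1.0 / 3.0)


M, S = mass_and_spin(1e80)
R_MIN = minimum_radius(M, S)
print(f"R_min ~ {R_MIN:.2f} cm")
```

Below this radius the right-hand side of `rdot_squared` is negative, so no solution of [33] exists there; this is the sense in which the spin term acts as a repulsive potential.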
Summary
ECT is a viable theory of gravitation that differs very slightly from the Einstein theory; the effects of spin and torsion can be significant only at densities of matter that are very high, but nevertheless much smaller than the Planck density at which quantum gravitational effects are believed to dominate. It is possible that ECT will prove to be a better classical limit of a future quantum theory of gravitation than the theory without torsion.
See also: Cosmology: Mathematical Aspects; General Relativity: Overview.
Further Reading
Arkuszewski W, Kopczyński W, and Ponomariev VN (1974) On the linearized Einstein–Cartan theory. Annales de l'Institut Henri Poincaré 21: 89–95.
Bailey I and Israel W (1975) Lagrangian dynamics of spinning particles and polarized media in general relativity. Communications in Mathematical Physics 42: 65–82.
Cartan É (1923, 1924, 1925) Sur les variétés à connexion affine et la théorie de la relativité généralisée. Part I: Annales de l'École Normale Supérieure 40: 325–412 and ibid. 41: 1–25; Part II: ibid. 42: 17–88; English transl. by A Magnon and A Ashtekar, On manifolds with an affine connection and the theory of general relativity. Napoli: Bibliopolis (1986).
Cosserat EF (1909) Théorie des corps déformables. Paris: Hermann.
Hammond RT (2002) Torsion gravity. Reports on Progress in Physics 65: 599–649.
Hehl FW, von der Heyde P, Kerlick GD, and Nester JM (1976) General relativity with spin and torsion: foundations and prospects. Reviews of Modern Physics 48: 393–416.
Hehl FW, McCrea JD, Mielke EW, and Ne'eman Y (1995) Metric-affine gauge theory of gravity: field equations, Noether identities, world spinors, and breaking of dilation invariance. Physics Reports 258: 1–171.
Kibble TWB (1961) Lorentz invariance and the gravitational field. Journal of Mathematical Physics 2: 212–221.
Kopczyński W (1975) The Palatini principle with constraints. Bulletin de l'Académie Polonaise des Sciences, Série des Sciences Mathématiques, Astronomiques et Physiques 23: 467–473.
Mathisson M (1937) Neue Mechanik materieller Systeme. Acta Physica Polonica 6: 163–200.
Penrose R (1983) Spinors and torsion in general relativity. Foundations of Physics 13: 325–339.
Ruggiero ML and Tartaglia A (2003) Einstein–Cartan theory as a theory of defects in space–time. American Journal of Physics 71: 1303–1313.
Sciama DW (1962) On the analogy between charge and spin in general relativity. In: (volume dedicated to Infeld L) Recent Developments in General Relativity, pp. 415–439. Oxford: Pergamon and Warszawa: PWN.
Tafel J (1975) A class of cosmological models with torsion and spin. Acta Physica Polonica B 6: 537–554.
Trautman A (1973) On the structure of the Einstein–Cartan equations. Symposia Mathematica 12: 139–162.
Van Nieuwenhuizen P (1981) Supergravity. Physics Reports 68: 189–398.
Einstein's Equations with Matter
Y Choquet-Bruhat, Université P.-M. Curie, Paris VI, Paris, France
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Newton's theory of gravity with absolute time and Euclidean 3-space connects the gravitational potential U with its source, the density of matter r, by the Poisson equation
$\Delta U = -4\pi G r$
where $\Delta$ is the Laplace operator and G is the gravitational constant. The trajectories of massive test particles are the flow lines of the gradient of U.
Newton's theory has proven to be very accurate in the laboratory as well as in the solar system (except for a small discrepancy with the observed value of the perihelion advance of Mercury). Newton's theory, together with special relativity, the equivalence principle, and ideas of Mach, inspired Einstein to uncover the equations which must be satisfied by the geometry of spacetime. They link the curvature of the spacetime metric with a phenomenological symmetric 2-tensor T, which must represent the energy, momentum, and stresses of all the sources, by the equality:
$S(g) \equiv \mathrm{Ricci}(g) - \tfrac12 g R(g) = 8\pi G\, T$
where Ricci(g) is the Ricci tensor of the spacetime metric g and R(g) its scalar curvature. The symmetric 2-tensor S(g) is called the Einstein tensor. The Bianchi identities, due to the invariance of curvature by isometries of g, imply that the divergence of the Einstein tensor is identically zero: the Einstein equations imply therefore the vanishing of the divergence of the source tensor T. The equations so obtained generalize in a relativistic context the conservation laws of Newtonian mechanics. In local spacetime coordinates $x^\alpha$, the Einstein equations and conservation laws read
$S_{\alpha\beta} \equiv R_{\alpha\beta} - \tfrac12 g_{\alpha\beta} R = 8\pi G\, T_{\alpha\beta}$; $\nabla_\alpha T^{\alpha\beta} = 0$
where $\nabla$ denotes the covariant derivative in the metric g. The gravitational constant G is inspired by the Newtonian equation relating the potential U with the density of matter. This equation can be obtained as an approximation of Einstein's equations with matter in the case of low velocities of matter and weak gravitational fields. Newton's equation of motion of test particles is also an approximation of Einstein's geodesic motion of such particles, which can be deduced from Einstein's equations themselves. However, if one wants to remain in the framework of general relativity, it is the Einstein equations which define the mass of a body; no comparison is possible with some fixed given mass. As length had the dimension of time already in special relativity, now mass is found to have the dimension of length. We write the equations in geometrical units, where $8\pi G = 1$, keeping in mind the corresponding change to usual laboratory units only in specific applications. In geometrical units the mass of the Earth is of the order of a centimeter. The most precise measurements of G are still made using Newton-type experiments, giving $G = 6.67259 \times 10^{-11}$ m$^3$ kg$^{-1}$ s$^{-2}$.
In the case of electromagnetic (or classical Yang–Mills) field sources, the stress energy tensor in special relativity is the well-known Maxwell tensor (or its generalizations), whose divergence vanishes when the field satisfies the Maxwell (or Yang–Mills) equations in vacuum. The expression of this tensor in a curved spacetime can be trivially deduced from its Minkowskian form. Its expression can also be deduced from the Lagrangian, and the vanishing of its divergence results from the invariance of this Lagrangian under isometries of the metric. It is the natural source of Einstein equations coupled with these fields. In the case of matter, the construction of a stress energy tensor is already delicate even in special relativity.
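The statement that the mass of the Earth is of the order of a centimeter in geometrical units can be checked with a short calculation. The sketch below converts a mass to a length through $GM/c^2$, and optionally through $8\pi GM/c^2$, matching the convention $8\pi G = 1$ used here. The Earth mass and the speed of light are standard values that are not quoted in the text, and the function name is illustrative.

```python
import math

G = 6.67259e-11           # m^3 kg^-1 s^-2, the value quoted in the text
C = 299_792_458.0         # m/s, speed of light (standard value)
EARTH_MASS_KG = 5.972e24  # assumed standard value, not quoted in the text


def mass_as_length_cm(mass_kg, include_8pi=False):
    """Length equivalent of a mass: G M / c^2, or 8 pi G M / c^2 if requested."""
    length_m = G * mass_kg / C**2
    if include_8pi:
        length_m *= 8.0 * math.pi
    return 100.0 * length_m  # metres to centimetres


earth_cm = mass_as_length_cm(EARTH_MASS_KG)
earth_cm_8pi = mass_as_length_cm(EARTH_MASS_KG, include_8pi=True)
print(f"G M / c^2      ~ {earth_cm:.2f} cm")
print(f"8 pi G M / c^2 ~ {earth_cm_8pi:.1f} cm")
```

With either normalization the result lies between a fraction of a centimeter and roughly ten centimeters.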
The simplest models of sources with well-understood properties, kinetic matter and perfect fluids, are reviewed in this article. Physical situations that are difficult to model even in special relativity, namely dissipative fluids and elasticity, are mentioned. The extension to electrically charged, or classical Yang–Mills–Higgs charged, matter offers no conceptual difficulty, but leads to interesting new situations.
Fluid Sources
A fluid source in a domain of a spacetime (V, g) is such that there exists, in this domain, a unit timelike vector field u, satisfying $g(u,u) \equiv g_{\alpha\beta}u^\alpha u^\beta = -1$, whose trajectories are the flow lines of matter. A moving Lorentzian orthonormal frame is called a proper frame if its timelike vector is u. Since the Einstein gravitational potentials reduce at a point in a Lorentzian orthonormal frame to Minkowskian values, one admits that the spacetime symmetric 2-tensor T, which embodies the density of stress, energy, and momentum of a given type of matter, in a proper frame takes the expression it would have in special relativity and inertial coordinates. The expression of T in a general frame results from its tensorial character and the equivalence principle. The problem is to find a good expression of T in special relativity.
Case of Dust (Incoherent Matter)
In a proper frame there is neither momentum nor stresses. Therefore, the stress energy tensor reads in a general frame, with r a scalar function representing the matter density:
$T = r\, u \otimes u$, i.e., $T^{\alpha\beta} = r u^\alpha u^\beta$
Using the property $g(u,u) = -1$, the conservation laws imply the vanishing of the divergence of the matter flow ru, that is, the continuity equation (conservation of matter)
$\nabla_\alpha (r u^\alpha) = 0$
and the motion of the particles along geodesics of the metric:
$u^\alpha \nabla_\alpha u^\beta = 0$
Similar equations are obtained for a null dust model where $g(u,u) = 0$.
Perfect Fluid
Euler equations In Newtonian mechanics, a continuous matter flow is characterized by its mass density and flow velocity. The equations are a continuity equation (conservation of matter) and equations of motion resulting from Newton's law, which link the acceleration vector and the space divergence of the stress symmetric 2-tensor whose contraction with the normal to a small 2-surface gives the force applied to it. A fluid is called perfect if the pressure it applies to a small surface element with normal n is independent of n. Its stress tensor t, a symmetric 2-tensor on Euclidean space, is then invariant by rotations. By generalization, a relativistic fluid is called perfect if its stress energy tensor has the following form:
$T_{\alpha\beta} = \mu u_\alpha u_\beta + p (g_{\alpha\beta} + u_\alpha u_\beta)$
Then in a proper frame, where g takes the Minkowskian values and the only nonvanishing component of u is along the time axis and equal to 1, the projection of T on space is the Newtonian stress tensor with pressure p, while $\mu$, the projection of T on the time axis, is the fluid energy density. There is no momentum density in the proper frame. The conservation laws, also called Euler equations, are shown to split, as in the case of dust, into a continuity equation
$\nabla_\alpha [(\mu + p) u^\alpha] - u^\alpha \partial_\alpha p = 0$
and equations of motion
$(\mu + p)\, u^\alpha \nabla_\alpha u^\beta + (g^{\alpha\beta} + u^\alpha u^\beta)\, \partial_\alpha p = 0$
In relativity, where mass and energy are equivalent, the continuity equation is no longer a conservation law.
Equations of state As in Newtonian mechanics, the Euler equations must be completed by a relation, called equation of state, depending on the physical properties of the fluid. In general, in addition to mechanics, thermodynamic properties must be considered. In relativity, they are borrowed from classical thermodynamics formulated in a spacetime context. In the simplest cases one introduces a conserved rest mass density r (or particle number density for particles with rest mass zero), satisfying the equation
$\nabla_\alpha P^\alpha = 0$ with $P^\alpha \equiv r u^\alpha$
This r differs from the density of energy $\mu$. One sets $\mu = r(1 + \varepsilon)$ and calls $\varepsilon$ the internal specific energy. The first law of (reversible) thermodynamics is extended to relativistic perfect fluids by the identity
$T\, dS \equiv d\varepsilon + p\, d(r^{-1})$
which defines both the absolute temperature T and the differential of the specific entropy S. Modulo the continuity equation and the thermodynamic identity, the matter conservation is equivalent to the conservation of entropy along the flow lines:
$\nabla_\alpha (r S u^\alpha) = 0$, hence $u^\alpha \partial_\alpha S = 0$
The scalars p, $\mu$, S, r are not independent. Simple situations can be modeled by an "equation of state" linking these quantities. In astrophysics, one is inspired by what is known from classical fluids, with additional relativistic considerations. General relativity plays a role in the case of strong gravitational field. Very cold matter and nuclear matter are barotropic fluids; they obey an equation of state of the form $p = p(\mu)$. When the energy is largely dominated by the radiation energy, the fluid is called ultrarelativistic. The Stefan–Boltzmann laws give $\mu = KT^4$ and $p = (1/3)KT^4$, hence $p = (1/3)\mu$; the stress energy tensor is traceless. In white dwarfs, the fluid is considered as polytropic: it obeys an equation of state of the form $p = f(S)\, r^\gamma$. If only the internal energy $\varepsilon$ and pressure p are dominated by radiation, then $\varepsilon = K r^{-1} T^4$ and $p = (1/3)KT^4$, hence $p = (1/3) r \varepsilon$. The use of the thermodynamic identity leads to $\gamma = 4/3$, $p = (K/3)(3S/4K)^{4/3} r^{4/3}$, with $\mu = 3p + r$. For most other stars, the physical situation is too complex to be modeled by a simple equation; only tables of numerical values may be available.
In cosmology, there is little physical information about the fluid which is to represent the energy content of the universe. It is assumed that in the early universe of the big-bang models, at very high temperature, the fluid was ultrarelativistic. At later times, it is generally assumed, for simplicity, that there is an equation of state linear and independent of entropy, $p = (\gamma - 1)\mu$. In order that the speed of sound waves be not greater than the speed of light, one assumes that $1 \le \gamma \le 2$; $\gamma = 1$ corresponds to dust, $\gamma = 2$ to a stiff (see below) fluid. Recent confrontations of theory and observations seem to imply the existence of a new, not directly seen, type of matter, called "dark matter."
Wave fronts and propagation speeds The wave fronts of a differential system are the submanifolds of spacetime whose normals n annul the characteristic determinant. Discontinuities propagate along wave fronts. For a hyperbolic system, the wave fronts determine the domain of dependence of a solution. For a perfect fluid, they are found to be
1. the matter wave fronts, generated by the flow lines, such that $u^\alpha n_\alpha = 0$, and
2. the sound wave fronts, whose normals satisfy the equation
$D \equiv (p'_\mu - 1)(u^\alpha n_\alpha)^2 + p'_\mu\, n^\alpha n_\alpha = 0$
where $p'_\mu$ denotes the derivative of p with respect to $\mu$. In a proper frame at a point of spacetime, $u^\alpha = \delta^\alpha_0$ and $g_{\alpha\beta} = \eta_{\alpha\beta}$; this equation states that the slope of the spacetime normal to the wave front is
$\left( \sum_i (n_i)^2 / n_0^2 \right)^{1/2} = 1/\sqrt{p'_\mu}$
The sound propagation speed is the inverse of this slope, that is, $v = \sqrt{p'_\mu}$. It does not exceed the speed of light, as expected from a relativistic theory, if $p'_\mu \le 1$. The limiting case where these speeds are equal is called incompressible or stiff fluid.
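For the linear equation of state $p = (\gamma - 1)\mu$ mentioned earlier in this section, the derivative of p with respect to $\mu$ is the constant $\gamma - 1$, so the sound speed is $\sqrt{\gamma - 1}$ in units of the speed of light. The short sketch below tabulates a few cases; the function names are illustrative.

```python
import math


def sound_speed(gamma):
    """Sound speed (in units of c) for p = (gamma - 1) * mu.

    The derivative dp/dmu equals gamma - 1, and v = sqrt(dp/dmu).
    """
    dp_dmu = gamma - 1.0
    if dp_dmu < 0:
        raise ValueError("gamma < 1 gives dp/dmu < 0: no real sound speed")
    return math.sqrt(dp_dmu)


def is_causal(gamma):
    """True if the sound speed does not exceed the speed of light."""
    return sound_speed(gamma) <= 1.0


for name, gamma in [("dust", 1.0), ("radiation", 4.0 / 3.0), ("stiff", 2.0)]:
    print(f"{name:9s} gamma = {gamma:.3f}  v = {sound_speed(gamma):.3f}")
```

Dust gives zero sound speed, a radiation fluid gives $1/\sqrt{3}$, and the stiff case $\gamma = 2$ saturates the bound at the speed of light.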
Hyperbolicity, existence, and uniqueness theorem The characteristics of the perfect fluid equations are real, but the apparent multiplicity of the matter wave fronts poses a problem for the hyperbolicity of the relativistic Euler equations, even in a given background metric. However, Choquet-Bruhat has proven that this system is a hyperbolic Leray system, as is its coupling with the Einstein equations, for instance, in wave gauge. The following theorem can then be proved using the general theorem on hyperbolic systems and an extension of the method used for Einstein's equations in vacuum.
Theorem Let $(M, \bar{g}, K)$ be an initial data set for the Einstein equations and $(\bar{u}, \bar{\mu}, \bar{S})$ be Cauchy data in a local Sobolev space $H_s^{\mathrm{loc}}$, $s \ge 3$, on the 3-manifold M for a perfect fluid with a smooth equation of state. Suppose $\bar{\mu} > 0$ and $p'_\mu \le 1$. There exists a globally hyperbolic spacetime of maximal extension that is a solution of the Einstein equations with a perfect fluid source and takes these Cauchy data. Such a spacetime and fluid flow are smooth for smooth initial data. They are unique, up to spacetime isometries.
The Euler equations have also been written as a first-order symmetric hyperbolic system by Boillat, Ruggeri, and Strumia using general methods relying on the existence of a convex functional, and directly by Rendall, who pointed out the difficulty of modeling the general motion of isolated fluid bodies, because of the assumption $\mu > 0$. He constructed some solutions without this assumption where the boundaries are freely falling. The general problem of determining the evolution of boundaries appears everywhere in general relativity, and in classical mechanics.
Global problems The spacetimes obtained above are, in general, incomplete: even in Minkowski spacetime, the Euler equations do not in general have solutions that are global in time. Shocks appear in relativistic perfect fluids as in classical ones. Global existence results have been obtained for four-dimensional ultrarelativistic fluids (limited data), and in the case of one space dimension. A detailed study of the global behavior of spherically symmetric solutions of the Einstein–Euler equations with an equation of state admitting a phase transition from zero pressure to stiff fluid has been done by Christodoulou.
Dissipative Fluids
A general fluid stress energy tensor is, with u a unit vector whose trajectories are the flow lines:
$T_{\alpha\beta} = \mu u_\alpha u_\beta + q_\alpha u_\beta + q_\beta u_\alpha + Q_{\alpha\beta}$ with $q_\alpha u^\alpha = 0$, $Q_{\alpha\beta} u^\beta = 0$
Here $\mu = T_{\alpha\beta} u^\alpha u^\beta$ is the energy density, which must satisfy $\mu \ge 0$, Q is a space tensor representing the stresses, orthogonal to u, and q is a space vector considered as a heat flow. The fundamental equations are still $\nabla_\alpha T^{\alpha\beta} = 0$, but they must be supplemented by constitutive equations for q and Q, which do not have a simple satisfactory answer in a relativistic context. The transfer of results from classical mechanics on viscous fluids or on heat transfer leads to propagation speeds greater than the speed of light. It should be remarked that these classical equations are obtained as governing asymptotic states; thus, the parabolic character of their relativistic version does not contradict relativistic causality. However, it would be interesting to obtain, for dissipative relativistic fluids, hyperbolic dissipative equations. Various systems have been proposed, in particular, by Marle by using an approximation near equilibrium of a solution of the relativistic Boltzmann equation. A promising system, also inspired from kinetic theory, is the "extended thermodynamics" of Müller and Ruggeri, which takes as its 14 fundamental unknowns the vector P = ru and the tensor T, satisfying the conservation laws. These equations are supplemented by equations linking a totally symmetric 3-tensor A with a symmetric 2-tensor I, of the form
$\nabla_\alpha A^{\alpha\beta\gamma} = I^{\beta\gamma}$ [1]
A and I are functions of P and T depending on the model and called constitutive equations. The system is shown to be symmetric hyperbolic under the existence of a convex entropy function, a property which holds under appropriate physical assumptions.
Reasonable equations have been proposed and studied for several constituent fluids and superfluids.
Charged Fluids
The stress energy tensor of a charged fluid with electric (or Yang–Mills) charge is generally the sum of the stress energy tensor of the fluid and of the Maxwell (or Yang–Mills) field. This tensor is conserved modulo the Maxwell (or Yang–Mills) equations with source the electric current, and the Euler equations completed by the Lorentz force. The corresponding Einstein–Maxwell perfect fluid system is well posed in the case of zero or infinite conductivity (magnetohydrodynamics). A subtlety appears in the case of finite conductivity: the system is still well posed, but for a restricted (Gevrey) class of $C^\infty$ fields.
Kinetic Models
Distribution Function and Moments
A general relativistic kinetic theory can be formulated without appeal to classical mechanics or special relativity. The matter is composed of particles whose size is negligible in the considered scale: rarefied gases in the laboratory, galaxies or even clusters of galaxies at the cosmological scale. The number of particles is so great and their motion so chaotic that the state of the matter can be described by a "one-particle distribution function," a positive scalar function on the tangent bundle to the spacetime, $(x, p) \mapsto f(x, p)$, which gives the mean number of particles with momentum p present at the point x of spacetime. The first moment of f is a causal vector field P defined by the integral over the space $P_x$ of momenta at x, with $\omega_p$ a volume element in that space:
$P^\alpha(x) =: \int_{P_x} p^\alpha f(x, p)\, \omega_p$
Out of the first moment, one extracts a scalar $r^2 \ge 0$, interpreted as the square of a proper mass density, given by $r^2 =: -g(P, P)$ and, if $r > 0$, a unit vector $u = r^{-1}P$ interpreted as the macroscopic flow velocity. The second moment of the distribution function f is the symmetric 2-tensor on spacetime given by
$T^{\alpha\beta}(x) =: \int_{P_x} f(x, p)\, p^\alpha p^\beta\, \omega_p$
It is interpreted as the stress energy tensor of the distribution f. Higher moments are defined similarly.
Liouville–Vlasov Equation
When the gas is so rarefied that the particle trajectories do not cross, then in the absence of nongravitational forces, these trajectories are geodesics of g, orbits in TV of the vector field $X = (p^\alpha, Q^\alpha)$, $Q^\alpha \equiv -\Gamma^\alpha_{\lambda\mu} p^\lambda p^\mu$, with $\Gamma^\alpha_{\lambda\mu}$ the Christoffel symbols of g. In a collisionless model, the physical law of conservation of particles imposes the conservation of f along the trajectories of X, that is, the Liouville–Vlasov equation
$L_X f \equiv p^\alpha \dfrac{\partial f}{\partial x^\alpha} + Q^\alpha \dfrac{\partial f}{\partial p^\alpha} = 0$
Conservation laws If f satisfies the Vlasov equation, then all moments satisfy a conservation law, in particular, $\nabla_\alpha P^\alpha = 0$ and $\nabla_\alpha T^{\alpha\beta} = 0$, equations which make the Einstein–Vlasov system consistent. The theory extends without problem to particles having the same rest mass m, because the scalar $g(p, p) = -m^2$ is constant on a geodesic.
Cauchy problem The Einstein–Vlasov system is an integro-differential system for g and f on a manifold $V = M \times \mathbb{R}$. The Cauchy data for the spacetime metric g on $M_0 = M \times \{0\}$ is, as usual, a pair $(\bar{g}, K)$, supplemented with gauge initial data which complete the definition of Cauchy data for a well-posed hyperbolic system in the chosen gauge. The Cauchy data for f are a function $\bar{f}$ on the bundle $PM_0$. It has been proved long ago that there exists a solution, geometrically unique, in a neighborhood of $M_0$ if the data are in Sobolev spaces, weighted by a power of $p^0$ in the case of $\bar{f}$. Since the Vlasov matter model, solution of a linear equation for given g, has no singularity by itself, the Einstein–Vlasov system is a good candidate for solutions that are global in time. This global existence has been proved by Rein and Rendall in the case of small data, asymptotically flat with spherical symmetry or plane symmetry, or with hyperbolic symmetry and compact space. Global existence without these symmetries is an open problem.
Boltzmann Equation
When the particles undergo collisions, their trajectories in phase space are no longer connected integral curves of the vector field $X$; that is, their momentum
undergoes a jump with the crossing of another trajectory. In the Boltzmann model, the derivative $L_X f$ is equal to the so-called collision operator, $I f$:
$$(L_X f)(x, p) = (I f)(x, p)$$
where $I f$ is an integral operator linked with the probability that two particles of momentum, respectively, $p'$ and $q'$, collide at $x$ and give, after the shock, two particles of momentum $p$ and $q$. For "elastic" shocks, the total momentum is conserved, that is, $p'$ and $q'$ lie in the submanifold $\Sigma_{pq} := \{p' + q' = p + q\}$, with volume element $\xi'$, and
$$(I f)(x, p) \equiv \int_{P_x} \int_{\Sigma_{pq}} \left[ f(x, p') f(x, q') - f(x, p) f(x, q) \right] A(x, p, q, p', q')\, \xi' \wedge \omega_q$$
The function $A(x, p, q, p', q')$ is called the shock cross section; it is a phenomenological quantity. No explicit expression is known for it in relativity. A generally admitted property is the reversibility of elastic shocks, $A(x, p, q, p', q') = A(x, p', q', p, q)$. It can be proved that under this hypothesis, the first and second moment of $f$ are conserved as in the collisionless case, making the Einstein–Boltzmann system consistent. Existence of solutions (that are local in time) of the Cauchy problem for this system has long been known. No global existence for the coupled system is known yet. One defines, in a relativistic context, an entropy flux vector $H$ which is proved to satisfy an H-theorem, that is, $\nabla_\alpha H^\alpha \ge 0$. In an expanding universe, for instance, Robertson–Walker, where $H$ depends only on time and an entropy density is defined by $H^0$, one finds that a decrease in entropy is linked with the expansion of the universe, thus permitting its ever-increasing organization from an initial anisotropy of $f$ in momentum space.
Other Matter Sources
Elastic Media
There are no solids in general relativity; in special relativity rigid motions are already very restricted. A theory of elastic deformations can only be defined relative to some a priori given state of matter whose perturbations will satisfy laws analogous to the classical laws. Various such theories have been proposed through geometric considerations, extending methods of classical elasticity; they have been used to predict the possible signals from bar detectors of gravitational waves, or the motions in the crust of neutron stars. A general theory constructed by Lagrangian formalism has recently been developed.
Spinor Sources
A symmetric stress energy tensor can be associated with classical spinors of spin 1/2, leading to a well-posed Einstein–Dirac system. The theories of supergravity couple the Einstein–Cartan equations with anticommuting spin 3/2 sources.
See also: Boltzmann Equation (Classical and Quantum); Einstein Equations: Exact Solutions; Einstein Equations: Initial Value Formulation; General Relativity: Overview; Geometric Analysis and General Relativity; Kinetic Equations; Spinors and Spin Coefficients.
Further Reading
Anile M (1989) Relativistic Fluids and Magneto Fluids. Cambridge: Cambridge University Press.
Bancel D and Choquet-Bruhat Y (1973) Existence, uniqueness and local stability for the Einstein–Maxwell–Boltzmann system. Communications in Mathematical Physics 33: 76–83.
Beig R and Schmidt B (2002) Relativistic elasticity, gr-qc 0211054.
Carter B and Quintana H (1972) Foundations of general relativistic high pressure elasticity theory. Proceedings of the Royal Society A 331: 57–83.
Choquet-Bruhat Y (1958) Théorèmes d'existence en mécanique des fluides relativistes. Bull. Soc. Math. de France 86: 155–175.
Choquet-Bruhat Y (1966) Etude des équations des fluides relativistes inductifs et conducteurs. Communications in Mathematical Physics 3: 334–357.
Choquet-Bruhat Y (1987) Spin 1/2 fields in arbitrary dimensions and the Einstein Cartan theory. In: Rindler W and Trautman A (eds.) Gravitation and Geometry, pp. 83–106. Bibliopolis.
Choquet-Bruhat Y and Lamoureux-Brousse L (1973) Sur les équations de l'élasticité relativiste. Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, Paris A 276: 1317–1321.
Choquet-Bruhat Y. General Relativity and Einstein's equations (in preparation).
Christodoulou D (1995) Self gravitating fluids: a two-phase model. Archive for Rational Mechanics and Analysis 130: 343–400 and subsequent papers.
Ehlers J (1969) General relativity and kinetic theory. Corso XLVII, Scuola Internazionale Enrico Fermi.
Friedrich H (1998) Evolution equations for gravitating fluid bodies in general relativity. Physical Review D 57: 2317–2322.
Geroch R and Lindblom L (1991) Causal theories of relativistic fluids. Annals of Physics 207: 394–416.
Israel W (1987) Covariant fluid mechanics and thermodynamics, an introduction. In: Anile M and Choquet-Bruhat Y (eds.) Relativistic Fluid Dynamics, LNM 1385. Springer.
Lichnerowicz A (1967) Relativistic Hydrodynamics and Magnetohydrodynamics. Benjamin.
Marle C (1969) Sur l'établissement des équations de l'hydrodynamique des fluides relativistes dissipatifs. Ann Inst. H. Poincaré 10: 67–194.
Müller I and Ruggeri T (1999) Rational extended thermodynamics. Springer.
Rein G and Rendall AD (1992) Global existence of solutions of the spherically symmetric Vlasov–Einstein system with small initial data. Communications in Mathematical Physics 150: 561–583.
Rendall AD (1992) The initial value problem for a class of general relativistic fluid bodies. Journal of Mathematical Physics 33: 1047–1053.
Taub AH (1957) Relativistic hydrodynamics III. Journal of Mathematics 1: 370–400.
Electric–Magnetic Duality
Tsou Sheung Tsun, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Classical electromagnetism is described by Maxwell's equations, which, in 3-vector notation and corresponding respectively to the laws of Coulomb, Ampère, Gauss, and Faraday, are given by eqns [1a]–[1d]:
$$\operatorname{div} \mathbf{E} = \rho \tag{1a}$$
$$\operatorname{curl} \mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} = \mathbf{J} \tag{1b}$$
$$\operatorname{div} \mathbf{B} = 0 \tag{1c}$$
$$\operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \tag{1d}$$
Equivalently, in covariant 4-vector notation, these correspond to eqns [2a] and [2b]:
$$\partial_\nu F^{\mu\nu} = -j^\mu \tag{2a}$$
$$\partial_\nu {}^*F^{\mu\nu} = 0 \tag{2b}$$
In eqns [1], $\mathbf{E}$ and $\mathbf{B}$ are the electric and magnetic fields, respectively, $\rho$ is the electric charge density, and $\mathbf{J}$ is the electric current. In eqns [2], $F^{\mu\nu}$ is the field tensor, ${}^*F^{\mu\nu}$ the dual field tensor, and $j^\mu$ is the 4-current, related to the previous vector quantities by the following relations:
$$F^{\mu\nu} = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & -B_3 & B_2 \\ E_2 & B_3 & 0 & -B_1 \\ E_3 & -B_2 & B_1 & 0 \end{pmatrix}$$
$${}^*F^{\mu\nu} = \begin{pmatrix} 0 & -B_1 & -B_2 & -B_3 \\ B_1 & 0 & E_3 & -E_2 \\ B_2 & -E_3 & 0 & E_1 \\ B_3 & E_2 & -E_1 & 0 \end{pmatrix}$$
$$j^\mu = (\rho, \mathbf{J})$$
Throughout this article, we shall denote the three spatial indices by lower-case Latin letters such as $i, j$, while Greek indices such as $\mu, \nu$ denote spacetime indices running through 0, 1, 2, 3. The Einstein summation convention is used, whereby repeated indices are summed. Spacetime indices are raised and lowered by the (flat) Minkowski metric $g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1)$. We also use units conventional in particle physics, in which the reduced Planck constant $\hbar$ and the speed of light $c$ are both set to 1. In terms of the totally skew symmetric symbol $\epsilon_{\mu\nu\rho\sigma}$ (with $\epsilon_{0123} = 1$), the two field tensors are related by eqn [3]:
$${}^*F^{\mu\nu} = -\tfrac{1}{2}\, \epsilon^{\mu\nu\rho\sigma} F_{\rho\sigma} \tag{3}$$
We say that ${}^*F^{\mu\nu}$ is the dual of $F^{\mu\nu}$, and eqn [3] is indeed a duality relation because eqn [4] holds, which means that up to a sign, $F^{\mu\nu}$ and ${}^*F^{\mu\nu}$ are duals of each other:
$${}^*({}^*F) = -F \tag{4}$$
This duality is in fact the Hodge duality between $p$-forms and $(n - p)$-forms in an $n$-dimensional space. In our particular case, $p = 2$ and $n = 4$, so that both $F$ and its dual are 2-forms. The minus sign in eqn [4] comes about because of the Lorentzian (or pseudo-Riemannian) signature of Minkowski spacetime. The physical significance of this duality is that such a symmetry interchanges electric and magnetic fields (again up to sign) (eqn [5]), as can be seen from the matrix representation of $F^{\mu\nu}$ and ${}^*F^{\mu\nu}$ above:
$${}^*\colon\ \mathbf{E} \mapsto \mathbf{B}, \quad \mathbf{B} \mapsto -\mathbf{E} \tag{5}$$
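The sign in eqn [4] and the map [5] can be checked numerically. The following is a minimal sketch in plain Python, not part of the original article; it assumes the conventions $\epsilon_{0123} = 1$, $g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1)$, $F^{0i} = -E_i$ and ${}^*F^{\mu\nu} = -\tfrac{1}{2}\epsilon^{\mu\nu\rho\sigma}F_{\rho\sigma}$, and the function names are illustrative only.

```python
from itertools import permutations

METRIC = [1, -1, -1, -1]  # diagonal of the Minkowski metric g_{mu nu} (assumed signature)


def levi_civita_lower():
    """Totally antisymmetric symbol with all indices down, eps_{0123} = +1."""
    eps = {}
    for perm in permutations(range(4)):
        inversions = sum(
            1 for i in range(4) for j in range(i + 1, 4) if perm[i] > perm[j]
        )
        eps[perm] = -1 if inversions % 2 else 1
    return eps


def field_tensor(E, B):
    """Contravariant field tensor F^{mu nu} built from the 3-vectors E and B."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [
        [0, -E1, -E2, -E3],
        [E1, 0, -B3, B2],
        [E2, B3, 0, -B1],
        [E3, -B2, B1, 0],
    ]


def hodge_dual(F):
    """Return *F^{mu nu} = -(1/2) eps^{mu nu rho sigma} F_{rho sigma}."""
    # Lower both indices of F with the diagonal metric.
    F_low = [[METRIC[r] * METRIC[s] * F[r][s] for s in range(4)] for r in range(4)]
    dual = [[0] * 4 for _ in range(4)]
    for (m, n, r, s), sign in levi_civita_lower().items():
        # Raising all four indices multiplies eps by det(g) = -1.
        eps_up = sign * METRIC[m] * METRIC[n] * METRIC[r] * METRIC[s]
        dual[m][n] += -0.5 * eps_up * F_low[r][s]
    return dual


E, B = (1, 2, 3), (4, 5, 6)
F = field_tensor(E, B)
# The dual swaps the fields: E -> B, B -> -E.
print(hodge_dual(F) == field_tensor(B, tuple(-e for e in E)))  # True
# Applying the dual twice gives -F.
print(hodge_dual(hodge_dual(F)) == [[-x for x in row] for row in F])  # True
```

With the opposite sign convention for $\epsilon$ the roles of $\mathbf{B}$ and $-\mathbf{B}$ in the dual would be exchanged, but the double dual would still equal $-F$.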
Now in the absence of electric charges and currents, one sees immediately that Maxwell’s equations [1] or [2] are dual symmetric. This means that, in vacuo, whether we call an electromagnetic field electric or magnetic is a matter of convention. As far as the dynamics is concerned, there is no distinction. On the other hand, eqns [1] and [2] as presented, that is, in the presence of matter, are manifestly not dual symmetric. The underlying reason for this asymmetry has been much studied both in physics and in mathematics. One of the two questions that this article addresses is precisely this. Following on this, we shall see what happens if we try somehow to restore this dual symmetry even in the presence of matter. The second question that we wish to discuss is a generalization of this duality. Electromagnetism is a gauge theory, in which the gauge group is the abelian circle group U(1), representing the phase of wave-functions in quantum mechanics. A physically relevant generalization, in which the abelian U(1) is replaced by a nonabelian group (e.g., SU(2), SU(3))
is called Yang–Mills theory (Yang and Mills 1954), which is the theoretical basis of all modern particle physics. We shall show in this article how the concept of electric–magnetic duality can be generalized in the context of Yang–Mills theory.
Gauge Invariance, Sources, and Monopoles
Electric–magnetic duality, whether in the well-known abelian case or in the still somewhat open nonabelian case, is intimately connected with gauge invariance, sources, and monopoles, and also the dynamics as embodied in the gauge action. These questions in turn find their natural setting in differential geometry, particularly the geometry of fibre bundles.
Although classical electrodynamics can be fully described by the field tensor $F_{\mu\nu}$, one needs to introduce the electromagnetic (or gauge) potential $A_\mu$ if one considers quantum mechanics, as has been beautifully demonstrated by the Bohm–Aharonov experiment. The two quantities are related by eqn [6]:
$$F_{\mu\nu}(x) = \partial_\nu A_\mu(x) - \partial_\mu A_\nu(x) \tag{6}$$
The fact that the phase of a wave function $\psi(x)$ (e.g., of the electron) is not a measurable quantity (although relative phases of course are) implies that we are free to make the following transformation:
$$\psi(x) \mapsto e^{ie\Lambda(x)}\, \psi(x) \tag{7}$$
This in turn implies an unobservable transformation [8] on the gauge potential, where $\Lambda(x)$ is a real-valued function on spacetime:
$$A_\mu(x) \mapsto A_\mu(x) + \partial_\mu \Lambda(x) \tag{8}$$
This invariance is called gauge invariance. Since in this abelian case $F_{\mu\nu}$ is gauge invariant, so are the Maxwell equations, for which we shall take from now on the covariant form [2]. Inasmuch as the Maxwell equations dictate the dynamics of electromagnetism, gauge invariance is an intrinsic ingredient even in the classical theory. In Yang–Mills theory, the U(1) phase $e^{ie\Lambda(x)}$ is replaced by an element $S(x)$ of a nonabelian group $G$, so that eqns [7], [8], and [6] become, respectively, eqns [9], [10], and [11]:
$$\psi(x) \mapsto S(x)\, \psi(x) \tag{9}$$
$$A_\mu(x) \mapsto S(x) A_\mu(x) S^{-1}(x) - \frac{i}{g}\, \partial_\mu S(x)\, S^{-1}(x) \tag{10}$$
$$F_{\mu\nu}(x) = \partial_\nu A_\mu(x) - \partial_\mu A_\nu(x) + ig\,[A_\mu(x), A_\nu(x)] \tag{11}$$
Here the electric coupling $e$ is replaced by a general gauge coupling $g$. The quantities $A_\mu$ and $F_{\mu\nu}$ now take values in the Lie algebra of the Lie group $G$ and the bracket is the Lie bracket. The wave function $\psi(x)$ takes values in a vector space on which an appropriate representation of $G$ acts. Notice that now the field tensor $F_{\mu\nu}$ is no longer invariant, but only covariant:
$$F_{\mu\nu}(x) \mapsto S(x) F_{\mu\nu}(x) S^{-1}(x) \tag{12}$$
Next we consider the charges of gauge theory. For the moment, we wish to distinguish between two types of charges: sources and monopoles. These are defined with respect to the gauge field, which in turn is derivable from the gauge potential. Source charges are those charges that give rise to a nonvanishing divergence of the field. For example, the electric current $j^\mu$ due to the presence of the electric charge $e$ occurs on the right-hand side of the first Maxwell equation, and is given in the quantum case by eqn [13], where $\gamma^\mu$ is a Dirac gamma matrix, identifiable as a basis element of the Clifford algebra over spacetime:
$$j^\mu = e\, \bar\psi \gamma^\mu \psi \tag{13}$$
In the Yang–Mills case, the first Maxwell equation is replaced by the Yang–Mills equation
$$D_\nu F^{\mu\nu} = -j^\mu, \qquad j^\mu = g\, \bar\psi \gamma^\mu \psi \tag{14}$$
We define the covariant derivative $D_\nu$ as in
$$D_\nu F^{\mu\nu} = \partial_\nu F^{\mu\nu} - ig\,[A_\nu, F^{\mu\nu}] \tag{15}$$
Monopole charges, on the other hand, are topological obstructions specified geometrically by nontrivial $G$-bundles over every 2-sphere $S^2$ surrounding the charge. They are classified by elements of $\pi_1(G)$, the fundamental group of $G$. They are typified by the (abelian) magnetic monopole as first discussed by Dirac in 1931.
Let us go into a little more detail about the Dirac magnetic monopole. If the field tensor $F_{\mu\nu}$ does come from a gauge potential $A_\mu$ as in eqn [6], then simple algebra will tell us that this implies $\partial_\nu {}^*F^{\mu\nu} = 0$ as in eqn [2]. Hence, we conclude the following:
$$\exists\ \text{monopole} \implies A_\mu\ \text{cannot be well defined everywhere}$$
The result is actually stronger. Suppose there exists a magnetic monopole at a certain point in spacetime, and, without loss of generality, we shall consider a static monopole. If we surround this point by a (spatial) 2-sphere $\Sigma$, then the magnetic flux out of the sphere is given by
$$\iint_\Sigma \mathbf{B} \cdot d\mathbf{s} = \iint_{\Sigma_N} \mathbf{B} \cdot d\mathbf{s} + \iint_{\Sigma_S} \mathbf{B} \cdot d\mathbf{s} \tag{16}$$
Here $\Sigma_N$ and $\Sigma_S$ are the northern and southern hemispheres overlapping on the equator $S$. By Stokes' theorem, since $F_{\mu\nu}$ has no components $F_{0i} = E_i$, we have
$$\iint_{\Sigma_N} \mathbf{B} \cdot d\mathbf{s} = \oint_S \mathbf{A} \cdot d\mathbf{s} \tag{17a}$$
$$\iint_{\Sigma_S} \mathbf{B} \cdot d\mathbf{s} = \oint_{-S} \mathbf{A} \cdot d\mathbf{s} \tag{17b}$$
In eqn [17b], $-S$ means the equator with the opposite orientation. Hence, $\oint_S + \oint_{-S} = 0$. But this contradicts the assumption that there exists a magnetic monopole at the center of the sphere. Hence, we see that if a monopole exists, then $A_\mu$ will have at least a string of singularities leading out of it. This is the famous Dirac string.
The more mathematically elegant way to describe this is that the principal bundle corresponding to electromagnetism with a magnetic monopole is nontrivial, so that the gauge potential $A_\mu$ has to be patched (i.e., related by transition functions in the overlap). Consider the example of a static monopole of magnetic charge $\tilde e$. For any (spatial) sphere $S_r$ of radius $r$ surrounding the monopole, we cover it with two patches N, S as follows:
$$(N)\colon\ 0 \le \theta < \pi, \quad 0 \le \phi \le 2\pi$$
$$(S)\colon\ 0 < \theta \le \pi, \quad 0 \le \phi \le 2\pi$$
In each patch we define the following:
$$A_1^{(N)} = -\frac{\tilde e\, y}{4\pi r (r + z)}, \qquad A_2^{(N)} = \frac{\tilde e\, x}{4\pi r (r + z)}, \qquad A_3^{(N)} = 0$$
$$A_1^{(S)} = \frac{\tilde e\, y}{4\pi r (r - z)}, \qquad A_2^{(S)} = -\frac{\tilde e\, x}{4\pi r (r - z)}, \qquad A_3^{(S)} = 0$$
In the overlap (containing the equator), $A^{(N)}$ and $A^{(S)}$ are related by a gauge transformation:
$$A_i^{(N)} - A_i^{(S)} = \partial_i \Lambda, \qquad \Lambda = \frac{\tilde e}{2\pi} \tan^{-1}\frac{y}{x} = \frac{\tilde e}{2\pi}\, \phi \tag{18}$$
Notice that $A_i^{(N)}$ has a line of singularity along the negative $z$-axis (which is the Dirac string in this case); similarly for $A_i^{(S)}$ along the positive $z$-axis. Furthermore, the corresponding field strength is given by
$$\mathbf{E} = 0 \tag{19a}$$
$$\mathbf{B} = \frac{\tilde e\, \mathbf{r}}{4\pi r^3} \tag{19b}$$
If we now evaluate the "magnetic flux" out of $S_r$, we have
$$\iint_{S_r} \mathbf{B} \cdot d\mathbf{s} = \oint_{\text{Equator}} \left( \mathbf{A}^{(N)} - \mathbf{A}^{(S)} \right) \cdot d\mathbf{x} = \tilde e \tag{20}$$
In other words, in the presence of a magnetic monopole, the second half of Maxwell's equations is modified according to eqn [21], with $\tilde j^\mu$ given by eqn [22].
$$\left. \begin{aligned} \operatorname{div} \mathbf{B} &= \tilde\rho \\ \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} &= -\tilde{\mathbf{J}} \end{aligned} \right\} \iff \partial_\nu {}^*F^{\mu\nu} = -\tilde j^\mu \tag{21}$$
$$\tilde j^\mu = \tilde e\, \bar{\tilde\psi} \gamma^\mu \tilde\psi \tag{22}$$
Furthermore, the form of eqn [21] tells us that a monopole of the $F_{\mu\nu}$ field can also be considered as a source of the ${}^*F_{\mu\nu}$ field. The two descriptions are equivalent. How are the charges $e$ and $\tilde e$ related? The gauge transformation $S = e^{ie\Lambda}$ relating $A^{(N)}$ and $A^{(S)}$ must be well defined; that is, if one goes round the equator once, $\phi = 0 \to 2\pi$, one should get the same $S$. This gives
$$e \tilde e = 2\pi n, \qquad n \in \mathbb{Z} \tag{23}$$
In particular, the unit electric and magnetic charges are related by eqn [24], which is Dirac's quantization condition,
$$e \tilde e = 2\pi \tag{24}$$
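The patching argument leading to eqns [20] and [23] can be illustrated numerically. The sketch below is not from the article; it assumes the two-patch potentials with the sign convention in which $\mathbf{B} = \operatorname{curl}\mathbf{A}$ points radially outward for $\tilde e > 0$, integrates $\mathbf{A}^{(N)} - \mathbf{A}^{(S)}$ around the equator, and evaluates the integer $n = e\tilde e / 2\pi$ that makes the transition function $e^{ie\Lambda}$ single valued. All function names are illustrative.

```python
import math


def a_north(x, y, z, g):
    """Northern-patch potential (A_1, A_2, A_3); singular on the negative z-axis."""
    r = math.sqrt(x * x + y * y + z * z)
    d = 4 * math.pi * r * (r + z)
    return (-g * y / d, g * x / d, 0.0)


def a_south(x, y, z, g):
    """Southern-patch potential (A_1, A_2, A_3); singular on the positive z-axis."""
    r = math.sqrt(x * x + y * y + z * z)
    d = 4 * math.pi * r * (r - z)
    return (g * y / d, -g * x / d, 0.0)


def equator_integral(g, radius=1.0, steps=2000):
    """Midpoint-rule estimate of the line integral of (A_N - A_S) around the equator."""
    total = 0.0
    dphi = 2 * math.pi / steps
    for k in range(steps):
        phi = (k + 0.5) * dphi
        x, y = radius * math.cos(phi), radius * math.sin(phi)
        an = a_north(x, y, 0.0, g)
        a_s = a_south(x, y, 0.0, g)
        tx, ty = -y, x  # tangent vector dx/dphi on the equator
        total += ((an[0] - a_s[0]) * tx + (an[1] - a_s[1]) * ty) * dphi
    return total


def transition_winding(e, g):
    """n = e*g/(2*pi); exp(i e Lambda) is single valued only if n is an integer."""
    return e * g / (2 * math.pi)


print(equator_integral(3.0))                # close to the magnetic charge, 3.0
print(transition_winding(2 * math.pi, 1.0))  # close to 1, the unit case of eqn [24]
```

The integral is independent of the chosen radius, as expected for a flux that depends only on the enclosed charge.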
So, in principle, just as in the electric case, where we could have charges $e, 2e, \ldots$, here we could also have magnetic charges of $\tilde e, 2\tilde e, \ldots$. In other words, both charges are quantized. Another way to look at this is to consider the classification of principal bundles over $S^2$. The reason for these topological 2-spheres is that we are interested in enclosing a point charge. For a nontrivial bundle, the patching is given by a function $S$ defined in the overlap (the equator), in other words, a map $S^1 \to U(1)$. What this amounts to is a closed curve in the circle group $U(1)$. Now, curves that can be continuously deformed into one another cannot give distinct fibre bundles, so that one sees easily that there exists a one-to-one correspondence:
$$\{\text{principal } U(1) \text{ bundles over } S^2\} \longleftrightarrow \{\text{homotopy classes of closed curves in } U(1)\}$$
This last is $\pi_1(U(1)) \cong \mathbb{Z}$. Hence, we recover Dirac's quantization condition. So, for electromagnetism, there are two equivalent ways of defining the magnetic charge, as a source or as a monopole:
1. $\partial_\nu {}^*F^{\mu\nu} = -\tilde j^\mu \propto n \tilde e \neq 0$.
2. An element of $\pi_1(U(1)) \cong \mathbb{Z}$.
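The integer labelling a homotopy class of closed curves in $U(1)$ is its winding number. As an illustration (a sketch with hypothetical names, not taken from the article), the winding number of a sampled transition function $s \mapsto S(s) \in U(1)$ can be computed by summing the small phase increments between neighbouring samples:

```python
import cmath
import math


def winding_number(loop, samples=1000):
    """Winding number of a closed curve s -> loop(s) in U(1), with s in [0, 2*pi].

    Assumes the curve is sampled finely enough that neighbouring samples
    differ in phase by less than pi.
    """
    total = 0.0
    prev = loop(0.0)
    for k in range(1, samples + 1):
        cur = loop(2 * math.pi * k / samples)
        total += cmath.phase(cur / prev)  # phase increment in (-pi, pi]
        prev = cur
    return round(total / (2 * math.pi))


# Transition functions of the form exp(i n s) represent the class n.
print(winding_number(lambda s: cmath.exp(3j * s)))             # 3
# A curve that can be contracted to a point belongs to the trivial class.
print(winding_number(lambda s: cmath.exp(1j * math.sin(s))))   # 0
```

Deforming a curve continuously changes the sampled phases but not the rounded total, which is the discreteness and conservation of the charge referred to in the text.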
The same goes for the electric charge. We also note that both definitions give us the fact that these charges are discrete (quantized) and conserved (invariant under continuous deformations). We now want to apply similar considerations to the magnetic charges in the nonabelian case. For several (subtle) reasons the obvious expression $D_\nu {}^*F^{\mu\nu} \stackrel{?}{=} -\tilde j^\mu$ as a source (see Table 1) does not work. The quickest way to say this is that ${}^*F_{\mu\nu}$ in general has no corresponding potential $\tilde A_\mu$ and so is not a gauge field. Moreover, in contrast to the abelian case, the field tensor does not fully specify the physical field configuration, as demonstrated by Wu and Yang. We shall come back to this later. But we have just seen that in the abelian case there is another equivalent definition, which is that a magnetic monopole is given by the gauge configuration corresponding to a nontrivial $U(1)$ bundle over $S^2$. This can be generalized to the nonabelian case without any problem. Moreover, this definition automatically guarantees that a nonabelian monopole charge is quantized and conserved. This is the way monopoles are defined above. Arguments similar to the abelian case easily yield the nonabelian analog of the Dirac quantization condition, eqn [25], the difference between the two cases being only a matter of conventional normalization.
$$g \tilde g = 4\pi \tag{25}$$
Table 1 Definitions of charges
              Sources                                Monopoles
Abelian       $\partial_\nu F^{\mu\nu} = -j^\mu$     $\partial_\nu {}^*F^{\mu\nu} = -\tilde j^\mu$
Nonabelian    $D_\nu F^{\mu\nu} = -j^\mu$            ?
Abelian Duality and the Wu–Yang Criterion
We saw above the well-known fact that classical Maxwell theory is invariant under the duality operator. By this we mean that at any point in spacetime free of electric and magnetic charges we have the two dual symmetric Maxwell equations:
$$\partial_\nu {}^*F^{\mu\nu} = 0 \qquad [\,dF = 0\,] \tag{26}$$
$$\partial_\nu F^{\mu\nu} = 0 \qquad [\,d\,{}^*F = 0\,] \tag{27}$$
Displayed in square brackets are the equivalent equations in the language of differential forms. Then by the Poincaré lemma we deduce immediately the existence of potentials $A$ and $\tilde A$ such that eqns [28] and [29] hold:
$$F_{\mu\nu}(x) = \partial_\nu A_\mu(x) - \partial_\mu A_\nu(x) \qquad [\,F = dA\,] \tag{28}$$
$${}^*F_{\mu\nu}(x) = \partial_\nu \tilde A_\mu(x) - \partial_\mu \tilde A_\nu(x) \qquad [\,{}^*F = d\tilde A\,] \tag{29}$$
The two potentials transform independently under independent gauge transformations $\Lambda$ and $\tilde\Lambda$:
$$A_\mu(x) \mapsto A_\mu(x) + \partial_\mu \Lambda(x) \tag{30}$$
$$\tilde A_\mu(x) \mapsto \tilde A_\mu(x) + \partial_\mu \tilde\Lambda(x) \tag{31}$$
This means that the full symmetry of this theory is doubled to $U(1) \times \tilde U(1)$, where the tilde on the second circle group indicates that it is the symmetry of the dual potential $\tilde A$. It is important to note that the physical degrees of freedom remain the same. This is clear because $F$ and ${}^*F$ are related by an algebraic equation [3]. As a consequence, the physical theory is the same: the doubled gauge symmetry is there all the time but is just not so readily detected. As mentioned in the Introduction, this dual symmetry means that what we call "electric" or "magnetic" is entirely a matter of choice. In the presence of electric charges, the Maxwell equations usually appear as
$$\partial_\nu {}^*F^{\mu\nu} = 0 \tag{32}$$
$$\partial_\nu F^{\mu\nu} = -j^\mu \tag{33}$$
The apparent asymmetry in these equations comes from the experimental fact that there is only one type of charge observed in nature, which we choose to regard as a source of the field $F$ (or, equivalently but unconventionally, as a monopole of the field ${}^*F$). But as we see by dualizing eqns [32] and [33], that is, by interchanging the role of electricity and magnetism in relation to $F$, we could equally have thought of these instead as source charges of the field ${}^*F$ (or, similarly to the above, as monopoles of $F$):
$$\partial_\nu {}^*F^{\mu\nu} = -\tilde j^\mu \tag{34}$$
$$\partial_\nu F^{\mu\nu} = 0 \tag{35}$$
If both electric and magnetic charges existed in nature, then we would have the dual symmetric pair:
$$\partial_\nu {}^*F^{\mu\nu} = -\tilde j^\mu \tag{36}$$
$$\partial_\nu F^{\mu\nu} = -j^\mu \tag{37}$$
This duality in fact goes much deeper, as can be seen if we use the Wu–Yang criterion to derive the Maxwell equations, although we should note that what we present here is not the textbook derivation of the Maxwell equations from an action, but we consider this method to be much more intrinsic and geometric. Consider first pure electromagnetism. The free Maxwell action is given by
$$\mathcal{A}^0_F = -\frac{1}{4} \int F_{\mu\nu} F^{\mu\nu} \tag{38}$$
The true variables of the (quantum) theory are the $A_\mu$, so in eqn [38] we should put in a constraint to say that $F_{\mu\nu}$ is the curl of $A_\mu$ [28]. This can be viewed as a topological constraint, because it is precisely equivalent to [26]. Using the method of Lagrange multipliers, we form the constrained action
$$\mathcal{A} = \mathcal{A}^0_F + \int \lambda_\mu \left( \partial_\nu {}^*F^{\mu\nu} \right) \tag{39}$$
We can now vary this with respect to $F_{\mu\nu}$, obtaining eqn [40], which implies [27]:
$$F^{\mu\nu} = 2\, \epsilon^{\mu\nu\rho\sigma} \partial_\rho \lambda_\sigma \tag{40}$$
Moreover, the Lagrange multiplier $\lambda$ is exactly the dual potential $\tilde A$. This derivation is entirely dual symmetric, since we can equally well use [27] as constraint for the action $\mathcal{A}^0_F$, now considered as a functional of ${}^*F_{\mu\nu}$ (eqn [41]), and obtain [26] as the equation of motion:
$$\mathcal{A}^0_F = \frac{1}{4} \int {}^*F_{\mu\nu}\, {}^*F^{\mu\nu} \tag{41}$$
This method applies to the interaction of charges and fields as well. In this case we start with the free field plus free particle action (eqn [42]), where we assume the free particle of mass $m$ to satisfy the Dirac equation,
$$\mathcal{A}^0 = \mathcal{A}^0_F + \int \bar\psi \left( i \partial_\mu \gamma^\mu - m \right) \psi \tag{42}$$
To fix ideas, let us regard this particle carrying an electric charge $e$ as a monopole of the potential $\tilde A_\mu$. Then the constraint we put in is [33], giving
$$\mathcal{A}' = \mathcal{A}^0 + \int \tilde\lambda_\mu \left( \partial_\nu F^{\mu\nu} + j^\mu \right) \tag{43}$$
Variation with respect to ${}^*F_{\mu\nu}$ gives eqn [32], and varying with respect to $\bar\psi$ gives
$$\left( i \partial_\mu \gamma^\mu - m \right) \psi = -e A_\mu \gamma^\mu \psi \tag{44}$$
So, the complete set of equations for a Dirac particle carrying an electric charge $e$ in an electromagnetic field is [32], [33], and [44]. The duals of these equations will describe the dynamics of a Dirac magnetic monopole in an electromagnetic field.
We see from this that the Wu–Yang criterion actually gives us an intuitively clear picture of interactions. The assertion that there is a monopole at a certain spacetime point $x$ means that the gauge field on a 2-sphere surrounding $x$ has to have a certain topological configuration (e.g., giving a nontrivial bundle of a particular class), and if the monopole moves to another point then the gauge field will have to rearrange itself so as to maintain the same topological configuration around the new point. There is thus naturally a coupling between the gauge field and the position of the monopole, or, in physical language, a topologically induced interaction between the field and the charge (Wu and Yang, 1976). Furthermore, this treatment of interaction between field and matter is entirely dual symmetric.
As a side remark, consider that although the action $\mathcal{A}^0_F$ is not immediately identifiable as geometric in nature, the Wu–Yang criterion, by putting the topological constraint and the equation of motion on equal (or dual) footing, suggests that in fact it is geometric in a subtle manner not yet fully understood. Moreover, as pointed out, eqn [40] says that the dual potential is given by the Lagrange multiplier of the constrained action.
Nonabelian Duality Using Loop Variables
The next natural step is to generalize this duality to the nonabelian Yang–Mills case. Although there is no difficulty in defining ${}^*F^{\mu\nu}$, which is again given by [3], we immediately come to difficulties in the relation between field and potential; for example, as in eqn [11],
$$F_{\mu\nu}(x) = \partial_\nu A_\mu(x) - \partial_\mu A_\nu(x) + ig\,[A_\mu(x), A_\nu(x)]$$
First of all, despite appearances the Yang–Mills equation [45] (in the free-field case) and the Bianchi identity [46] are not dual-symmetric, because the correct dual of the Yang–Mills equation ought to be given by eqn [47], where $\tilde D_\nu$ is the covariant derivative corresponding to a dual potential:
$$D_\nu F^{\mu\nu} = 0 \tag{45}$$
$$D_\nu {}^*F^{\mu\nu} = 0 \tag{46}$$
$$\tilde D_\nu {}^*F^{\mu\nu} = 0 \tag{47}$$
Secondly, the Yang–Mills equation, unlike its abelian counterpart [27], says nothing about whether the 2-form ${}^*F$ is closed or not. Nor is the relation [11] about exactness at all. In other words, the Yang–Mills equation does not guarantee the existence of a dual potential, in contrast to the Maxwell case. In fact, Gu and Yang have constructed a counterexample. Because the true variables of a gauge theory are the potentials and not the fields, this means that Yang–Mills theory is not symmetric under the Hodge star operation [3]. Nevertheless, electric–magnetic duality is a very useful physical concept, so one may wish to seek a more general duality transform ($\sim$), satisfying the following properties:
1. $(\ )^{\sim\sim} = \pm(\ )$.
2. Electric field $F_{\mu\nu}$ $\leftrightarrow$ magnetic field $\tilde F_{\mu\nu}$.
3. Both $A_\mu$ and $\tilde A_\mu$ exist as potentials (away from charges).
4. Magnetic charges are monopoles of $A_\mu$, and electric charges are monopoles of $\tilde A_\mu$.
5. $\sim$ reduces to $*$ in the abelian case.
One way to do this is to study the Wu–Yang criterion more closely. This reveals the concept of charges as topological constraints to be crucial even in the pure field case, as can be seen in Figure 1. The point to stress is that, in the above abelian case, the condition for the absence of a topological charge (a monopole) exactly removes the redundancy of the variables $F_{\mu\nu}$, and hence recovers the potential $A_\mu$.
Now the nonabelian monopole charge was defined topologically as an element of $\pi_1(G)$, and this definition also holds in the abelian case of $U(1)$, with $\pi_1(U(1)) = \mathbb{Z}$. So the first task is to write down a condition for the absence of a nonabelian monopole. To fix ideas, let us consider the group $SO(3)$, whose monopole charges are elements of $\mathbb{Z}_2$, which can be denoted by a sign $\pm$. The vacuum, charge $(+)$ (that is, no monopole), is represented by a closed curve in the group manifold of even winding number, and the monopole charge $(-)$ by a closed curve of odd winding number. It is more convenient, however, to work in $SU(2)$, which is the double cover of $SO(3)$ and which has the topology of $S^3$, as sometimes it is useful to identify the fundamental group of $SO(3)$ with the center of $SU(2)$ and hence consider the monopole charge as an element of this center. There the charge $(+)$ is represented by a closed curve, and the charge $(-)$ by a curve that winds an odd number of "half-times" round the sphere $S^3$.
Since these charges are defined by closed curves, it is reasonable to try to write the constraint in terms of loop variables. The treatment presented below is not as rigorous as some others, but the latter are not so well adapted to the problem in hand. Furthermore, it is important to emphasize that this approach aims to generalize electric–magnetic duality to Yang–Mills theory in direct and close analogy to duality in electromagnetism, without any further symmetries with which it may be expedient to enrich the theory. Other approaches are referred to in the next section.
Consider the gauge-invariant Dirac phase factor (or holonomy) $\Phi(C)$ of a loop $C$, which can be written symbolically as a path-ordered exponential:
$$\Phi[\xi] = P_s \exp\, ig \int_0^{2\pi} ds\, A_\mu(\xi(s))\, \dot\xi^\mu(s) \tag{48}$$
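In eqn [48], we parametrize the loop $C$ as in eqn [49] and a dot denotes differentiation with respect to the parameter $s$.
$$C\colon\ \{\, \xi^\mu(s) : s = 0 \to 2\pi,\ \xi^\mu(0) = \xi^\mu(2\pi) = \xi_0^\mu \,\} \tag{49}$$
Figure 1 [Diagram relating four equivalent statements in the abelian case: "$A_\mu$ exists as potential for $F_{\mu\nu}$ [$F = dA$]"; "Defining constraint $\partial_\mu {}^*F^{\mu\nu} = 0$ [$dF = 0$]"; "No magnetic monopole $\tilde e$"; "Principal $A_\mu$ bundle trivial". The links are labelled Poincaré, Gauss, Geometry, and Physics.]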
We thus regard loop variables in general as functionals of continuous piecewise smooth functions $\xi$ of $s$. In this way, loop derivatives and loop integrals are just functional derivatives and functional integrals. This means that loop derivatives $\delta_\mu(s)$ are defined by a regularization procedure approximating delta functions with finite bump functions and then taking limits in a definite order. For functional integrals, there exist various regularization procedures, which are treated elsewhere in this Encyclopedia.
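To make the path-ordered exponential [48] concrete, one can replace the loop by a finite number of segments and the exponential on each segment by a group element, so that the phase factor becomes an ordered product. The sketch below is an illustration under stated assumptions and not the article's construction: it takes $G = SU(2)$, uses random segment elements instead of a specific potential, and uses the lattice-style rule $U_k \mapsto S_{k+1} U_k S_k^{-1}$ as the discretized form of the gauge transformation [10]. It shows that the discretized phase factor is conjugated by the group element at the base point $\xi_0$, so that its trace is unchanged.

```python
import math
import random


def su2(a, b, c, d):
    """SU(2) matrix [[alpha, beta], [-conj(beta), conj(alpha)]] from a normalised 4-vector."""
    n = math.sqrt(a * a + b * b + c * c + d * d)
    alpha, beta = complex(a, b) / n, complex(c, d) / n
    return [[alpha, beta], [-beta.conjugate(), alpha.conjugate()]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(x):
    """Conjugate transpose; equals the inverse for a unitary matrix."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]


def trace(x):
    return x[0][0] + x[1][1]


def random_su2(rng):
    return su2(*(rng.gauss(0.0, 1.0) for _ in range(4)))


def phase_factor(links):
    """Ordered product U_{n-1} ... U_1 U_0; later segments stand to the left."""
    phi = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for u in links:
        phi = matmul(u, phi)
    return phi


def gauge_transform(links, s):
    """Segment k runs from site k to site k+1 (mod n): U_k -> S_{k+1} U_k S_k^dagger."""
    n = len(links)
    return [matmul(matmul(s[(k + 1) % n], links[k]), dagger(s[k])) for k in range(n)]


rng = random.Random(0)
links = [random_su2(rng) for _ in range(8)]   # hypothetical segment transports
gauge = [random_su2(rng) for _ in range(8)]   # hypothetical S at each site
phi = phase_factor(links)
phi_new = phase_factor(gauge_transform(links, gauge))
print(abs(trace(phi) - trace(phi_new)))       # zero up to rounding error
```

The same ordered product taken over only part of the loop gives a discretized version of a parallel transport between two points of the curve, which transforms with different group elements at its two ends.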
Polyakov (1980) introduces the logarithmic loop derivative of $\Phi[\xi]$:
$$F_\mu[\xi|s] = \frac{i}{g}\, \Phi^{-1}[\xi]\, \delta_\mu(s)\, \Phi[\xi] \tag{50}$$
This acts as a kind of "connection" in loop space since it tells us how the phase of $\Phi[\xi]$ changes from one loop to a neighbouring loop. One can go a step further and define its "curvature" in direct analogy with $F_{\mu\nu}(x)$ by
$$G_{\mu\nu}[\xi|s] = \delta_\nu(s) F_\mu[\xi|s] - \delta_\mu(s) F_\nu[\xi|s] + ig\,[F_\mu[\xi|s], F_\nu[\xi|s]] \tag{51}$$
It can be shown that by using the $F_\mu[\xi|s]$ we can rewrite the Yang–Mills action as eqn [52], where the normalization factor $\bar N$ is an infinite constant:
$$\mathcal{A}^0_F = -\frac{1}{4 \bar N} \int \delta\xi \int_0^{2\pi} ds\ \mathrm{tr}\{ F_\mu[\xi|s]\, F^\mu[\xi|s] \}\, |\dot\xi(s)|^{-2} \tag{52}$$
However, the true variables of the theory are still the $A_\mu$. They represent 4 functions of a real variable, whereas the loop connections represent 4 functionals of the real function $\xi(s)$. Just as in the case of the $F_{\mu\nu}$, these $F_\mu[\xi|s]$ have to be constrained so as to recover $A_\mu$, but this time much more severely. It turns out that, in pure Yang–Mills theory, the constraint that says there are no monopoles ([53]) also removes the redundancy of the loop variables, exactly as in the abelian case,
$$G_{\mu\nu}[\xi|s] = 0 \tag{53}$$
That this condition is necessary is easy to see by simple algebra. The proof of the converse of this "extended Poincaré lemma" is fairly lengthy. Granted this, we can now apply the Wu–Yang criterion to the action [52] and derive the Polyakov equation [54], which is the loop version of the Yang–Mills equation:
$$\delta_\mu(s)\, F^\mu[\xi|s] = 0 \tag{54}$$
In the presence of a monopole charge $(-)$, the constraint [53] will have a nonzero right-hand side,
$$G_{\mu\nu}[\xi|s] = -J_{\mu\nu}[\xi|s] \tag{55}$$
The loop current $J_{\mu\nu}[\xi|s]$ can be written down explicitly. However, its global form is much easier to understand. Recall that $F_\mu[\xi|s]$ can be thought of as a loop connection, for which we can form its "holonomy." This is defined for a closed (spatial) surface $\Sigma$ (enclosing the monopole), parametrized by a family of closed curves $\xi_t(s)$, $t = 0 \to 2\pi$. The "holonomy" is then the total change in phase of $\Phi[\xi_t]$ as $t \to 2\pi$, and thus equals the charge $(\pm)$.
To formulate an electric–magnetic duality that is applicable to nonabelian theory, one defines yet another set of loop variables. Instead of the Dirac phase factor $\Phi[\xi]$ for a complete curve [48], we consider the parallel phase transport for part of a curve from $s_1$ to $s_2$:
$$\Phi_\xi(s_2, s_1) = P_s \exp\, ig \int_{s_1}^{s_2} ds\, A_\mu(\xi(s))\, \dot\xi^\mu(s) \tag{56}$$
Then the new variables are defined by [57].
$$E_\mu[\xi|s] = \Phi_\xi(s, 0)\, F_\mu[\xi|s]\, \Phi_\xi^{-1}(s, 0) \tag{57}$$
These are not gauge invariant like $F_\mu[\xi|s]$ and may not be as useful in general, but seem more convenient for dealing with duality. Using these variables, we now define their dual $\tilde E_\mu[\eta|t]$ according to
$$\omega^{-1}(\eta(t))\, \tilde E_\mu[\eta|t]\, \omega(\eta(t)) = -\frac{2}{\bar N}\, \epsilon_{\mu\nu\rho\sigma}\, \dot\eta^\nu(t) \int \delta\xi\, ds\ E^\rho[\xi|s]\, \dot\xi^\sigma(s)\, \dot\xi^{-2}(s)\, \delta(\xi(s) - \eta(t)) \tag{58}$$
In eqn [58], $\omega(x)$ is a (local) rotation matrix transforming from the frame in which the orientation in internal symmetry space of the fields $E_\mu[\xi|s]$ is measured to the frame in which the dual fields $\tilde E_\mu[\eta|t]$ are measured. It can be shown that this dual transform satisfies all five of the required conditions listed earlier.
Electric–magnetic duality in Yang–Mills theory is now fully reestablished using this generalized duality. We have the dual pairs of equations [59]–[60] and [61]–[62]:
$$\delta_\nu E_\mu - \delta_\mu E_\nu = 0 \tag{59}$$
$$\delta^\mu E_\mu = 0 \tag{60}$$
$$\delta^\mu \tilde E_\mu = 0 \tag{61}$$
$$\delta_\nu \tilde E_\mu - \delta_\mu \tilde E_\nu = 0 \tag{62}$$
Equation [59] guarantees that the potential $A$ exists, and so is equivalent to [53], and hence is the nonabelian analog of [26]; while equation [60] is equivalent to the Polyakov version of the Yang–Mills equation [54], and hence is the nonabelian analog of [27]. Equation [61] is equivalent by duality to [59] and is the dual Yang–Mills equation. Similarly equation [62] is equivalent to [60], and guarantees the existence of the dual potential $\tilde A$. The treatment of charges using the Wu–Yang criterion also follows the abelian case, and will not be further elaborated here. For this and further details, the reader is referred to the original papers (Chan and Tsou 1993, 1999).
Also, just as in the abelian case, the gauge symmetry is doubled: from the group $G$ we deduce that the full gauge symmetry is in fact $G \times \tilde{G}$, but that the physical degrees of freedom remain the same. The above exposition establishes electric–magnetic duality in Yang–Mills theory only for classical fields. A hint that this duality persists at the quantum level comes from the work of ’t Hooft (1978) on confinement. There he introduces two loop quantities $A(C)$ and $B(C)$ that are operators in the Hilbert space of quantum states satisfying the commutation relation [63] for an SU(N) gauge theory, where $n$ is the linking number between the two (spatial) loops $C$ and $C'$:
\[ A(C)\,B(C') = B(C')\,A(C)\,\exp(2\pi i n/N) \] [63]
The order or Wilson operator is given explicitly by $A(C) = \mathrm{tr}\,\Phi(C)$. These two operators play dual roles in the sense of electric–magnetic duality:
A(C) measures the magnetic flux through C and creates electric flux along C.
B(C) measures the electric flux through C and creates magnetic flux along C.
By defining the disorder operator B(C) as the Wilson operator corresponding to the dual potential $\tilde{A}$ obtained above, one can prove the commutation relation [63], thus showing that these classical fields, when promoted to operators, retain their duality relation. Furthermore, there is a remarkable relation between the two (abstractly identical) gauge groups, in that if one is confined then the dual must be broken (that is, in the Higgs phase). This result is known as ’t Hooft’s theorem. The doubling of gauge symmetry, together with ’t Hooft’s theorem, has been applied to the confined colour group SU(3) of quantum chromodynamics (QCD), in the Dualized Standard Model, to solve the puzzle of the existence of exactly three generations of fermions, with good observational support, by identifying the (necessarily broken) dual SU(3) with the generation symmetry (Chan and Tsou, 2002).
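The commutation relation [63], A(C)B(C') = B(C')A(C) exp(2πin/N), can be illustrated in finite dimensions. The sketch below is only an analogy and not the loop operators of the text: it assumes the standard N × N ‘‘clock’’ and ‘‘shift’’ matrices (the function names are hypothetical) and checks that they obey the same Z_N algebra, with the power of the shift matrix standing in for the linking number n.

```python
import numpy as np

def clock_and_shift(N):
    """Return the N x N clock matrix, the shift matrix and the phase exp(2*pi*i/N)."""
    omega = np.exp(2j * np.pi / N)
    clock = np.diag(omega ** np.arange(N))   # diag(1, w, ..., w^(N-1))
    shift = np.roll(np.eye(N), 1, axis=0)    # maps basis vector e_j to e_(j+1 mod N)
    return clock, shift, omega

def obeys_thooft_algebra(N, n):
    """Check A B^n == B^n A exp(2*pi*i*n/N) for A = clock, B = shift (n >= 0)."""
    A, B, omega = clock_and_shift(N)
    Bn = np.linalg.matrix_power(B, n)
    return np.allclose(A @ Bn, omega**n * (Bn @ A))

print(obeys_thooft_algebra(3, 1), obeys_thooft_algebra(5, 3))
```

When n is a multiple of N the phase is trivial and the two matrices commute, mirroring the fact that only the linking number modulo N matters in [63].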
Other Treatments of Nonabelian Duality
Since Yang–Mills theory is not symmetric under the Hodge $*$-operation, there are several routes one can take to generalize the concept of electric–magnetic duality to the nonabelian case. What was presented in the last section is a modification of the $*$-operation so as to restore this symmetry for Yang–Mills theory, keeping to the original gauge structure as much as possible. However, Yang–Mills theory as used today in particle and field theories is usually embedded in theories with more structure.
In the simplest case we have the Standard Model of Particle Physics, which describes all particle interactions (except gravity) and which has the gauge group usually written as SU(3) × SU(2) × U(1), corresponding to the SU(3) of the strong interaction and the SU(2) × U(1) of the electroweak interaction. [Strictly speaking, it is (SU(3) × SU(2) × U(1))/Z_6, if we have the standard particle spectrum.] However, the former group is confined and the latter broken. The breaking is usually effected by introducing scalar fields called Higgs fields into the theory. Besides the experimentally well-tested Standard Model, there are many theoretically popular models of gauge theory in which supersymmetry is postulated, thereby introducing extra symmetries into the theory. Many of these are remnants of string theory, and are usually envisaged as gauge theories in a spacetime dimension higher than 4. Because of the extra structures and increased symmetries in these theories, there is quite a proliferation of concepts of duality, which could all be thought of as generalizations of abelian electric–magnetic duality (Schwarz, 1997). They come under the names of Seiberg–Witten duality, S-duality, T-duality, mirror symmetry, and so on. All these other aspects of duality have their own entries in this Encyclopedia.
See also: AdS/CFT Correspondence; Duality in Topological Quantum Field Theory; Four-Manifold Invariants and Physics; Large-N Dualities; Measure on Loop Spaces; Mirror Symmetry: a Geometric Survey; Nonperturbative and Topological Aspects of Gauge Theory; Seiberg–Witten theory; Standard Model of Particle Physics.
Further Reading
Chan Hong-Mo and Tsou Sheung Tsun (1993) Some Elementary Gauge Theory Concepts. Singapore: World Scientific.
Chan Hong-Mo and Tsou Sheung Tsun (1999) Nonabelian generalization of electric–magnetic duality – a brief review. International Journal of Modern Physics A14: 2139–2172.
Chan Hong-Mo and Tsou Sheung Tsun (2002) Fermion generations and mixing from dualized standard model. Acta Physica Polonica B33(12): 4041–4100.
Polyakov AM (1980) Gauge fields as rings of glue. Nuclear Physics B164: 171–188.
Schwarz John H (1997) Lectures on superstring and M theory dualities. Nuclear Physics Proceedings Supplements 55B: 1–32.
’t Hooft G (1978) On the phase transition towards permanent quark confinement. Nuclear Physics B138: 1–25.
Wu Tai Tsun and Yang Chen Ning (1976) Dirac monopoles without strings: classical Lagrangian theory. Physical Review D14: 437–445.
Yang Chen Ning and Mills RL (1954) Conservation of isotopic spin and isotopic gauge invariance. Physical Review 96: 191–195.
Electroweak Theory
K Konishi, Università di Pisa, Pisa, Italy
© 2006 Elsevier Ltd. All rights reserved.
Introduction
The discovery of the electroweak theory crowned long years of investigation on weak interactions. The key earlier developments included Fermi’s phenomenological four-fermion interactions for $\beta$-decay, the discovery of parity violation and the establishment of the $V - A$ structure of the weak currents, the Feynman–Gell-Mann conserved vector current (CVC) hypothesis, current algebra and its beautiful applications in the 1960s, Cabibbo mixing and lepton–hadron universality, and finally, the proposal of intermediate vector bosons (IVBs) to mitigate the high-energy behavior of Fermi’s pointlike interaction theory. It turned out that the scattering amplitudes in IVB theory still generally violated unitarity, due to the massive vector boson propagator,
\[ \frac{-g_{\mu\nu} + q_\mu q_\nu / M^2}{q^2 - M^2 + i\epsilon} \]
The electroweak theory, known as Glashow–Weinberg–Salam (GWS) theory (Weinberg 1967, Salam 1968, Taylor 1976), was born through the attempts to make the IVB hypothesis for the weak interactions consistent with unitarity. The GWS theory contains, and is in a sense a generalization of, quantum electrodynamics (QED), which was earlier successfully established as the quantum theory of electromagnetism in interaction with matter. GWS theory describes the weak and electromagnetic interactions in a single, unified gauge theory with gauge group
\[ SU_L(2) \times U(1) \] [1]
Part of this gauge symmetry is realized in the so-called ‘‘spontaneously broken’’ mode; only a $U_{EM}(1) \subset SU_L(2) \times U(1)$ subgroup, corresponding to the usual local gauge symmetry of electromagnetism, remains manifest at low energies, with a massless gauge boson (photon). The other three gauge bosons, $W^\pm$ and $Z$, are massive, with masses 80.4 and 91.2 GeV, respectively. The theory is renormalizable, as conjectured by S Weinberg and by A Salam, and subsequently proved by G ’t Hooft (1971), and makes well-defined predictions order by order in perturbation theory.
Since the experimental observation of neutral currents (a characteristic feature of the Weinberg–Salam theory, which predicts an extra, neutral massive vector boson, $Z$, as compared to the naive IVB hypothesis) in the Gargamelle bubble chamber at CERN (1973), the theory has passed a large number of experimental tests. The first basic confirmation also included the discovery of various new particles required by the theory: the charm quark (SLAC, BNL, 1974), the bottom quark (Fermilab, 1977), and the tau ($\tau$) lepton (SLAC, 1975). The heaviest quark, the top, having a mass about two hundred times that of the proton, was found later (Fermilab, 1995). The direct observation of the $W$ and $Z$ vector bosons was first made by the UA1 and UA2 experiments at CERN (1983). The GWS theory is today one of the most precise and successful theories in physics. Even more important, perhaps, together with quantum chromodynamics (QCD), which is an SU(3) (color) gauge theory describing the strong interactions (which bind quarks into protons and neutrons, and the latter two into atomic nuclei), it describes correctly, within the present experimental and theoretical uncertainties, all the presently known fundamental forces in Nature, except gravity. The $SU(3)_{QCD} \times (SU_L(2) \times U(1))_{GWS}$ theory is known as the standard model (SM). Both the electroweak (GWS) theory and QCD are gauge theories with a nonabelian (noncommutative) gauge group. Theories of this type, known as Yang–Mills theories, can be constructed by generalizing the well-known gauge principle of QED to more general group transformations. It is a truly remarkable fact that all of the fundamental forces known today (apart from gravity) are described by Yang–Mills theories, and in this sense a very nontrivial unification can be said to underlie the basic laws of Nature (G ’t Hooft).
There are further deep and remarkable conditions (anomaly cancellations), satisfied by the structure of the theory and by the charges of the experimentally known spin-1/2 elementary particles (see Tables 1 and 2), which guarantee the consistency of the theory as a quantum theory. It should be mentioned, however, that the recent discovery of neutrino oscillations (Super-Kamiokande (1998), SNO, KamLAND, and K2K experiments), which proved that neutrinos possess nonvanishing masses, clearly indicates that the standard GWS theory must be extended, in an as yet unknown way.
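Table 1 Quarks and their charges

Quarks                                          SU_L(2)   U_Y(1)   U_EM(1)
$(u_L, d'_L)$, $(c_L, s'_L)$, $(t_L, b'_L)$     2         1/3      (2/3, -1/3)
$u_R$, $c_R$, $t_R$                             1         4/3      2/3
$d_R$, $s_R$, $b_R$                             1         -2/3     -1/3

The primes indicate that the mass eigenstates are different from the states transforming as multiplets of $SU_L(2) \times U_Y(1)$: they are linearly related by the CKM mixing matrix.

Table 2 Leptons and their charges

Leptons                                                              SU_L(2)   U_Y(1)   U_EM(1)
$(\nu'_{eL}, e_L)$, $(\nu'_{\mu L}, \mu_L)$, $(\nu'_{\tau L}, \tau_L)$   2         -1       (0, -1)
$e_R$, $\mu_R$, $\tau_R$                                             1         -2       -1

The primes indicate again that the mass eigenstates are different from the states transforming as multiplets of $SU_L(2) \times U_Y(1)$, as required by the observed neutrino oscillations.

Table 3 Higgs doublet scalars and their charges

Higgs doublet           SU_L(2)   U_Y(1)   U_EM(1)
$(\phi^+, \phi^0)$      2         1        (1, 0)

The following is a brief summary of the GWS theory, its characteristic features, its implications for the symmetries of Nature, the status of the precision tests, and its possible extensions.

GWS Theory
All the presently known elementary particles (except for the gauge bosons $W^\pm$, $Z$, $\gamma$, the gluons, the graviton, and possibly right-handed neutrinos) are listed in Tables 1–3 together with their charges with respect to the $SU_L(2) \times U(1)$ gauge group. A doublet of Higgs scalar particles is included even though the physical component (which should appear as an ordinary scalar particle) has not yet been experimentally observed. The Lagrangian is given by
\[ \mathcal{L} = \mathcal{L}_{gauge} + \mathcal{L}_{quarks} + \mathcal{L}_{leptons} + \mathcal{L}_{Higgs} + \mathcal{L}_{Yukawa} + \mathcal{L}_{g.f.} + \mathcal{L}_{ghosts} \]
The gauge kinetic terms are
\[ \mathcal{L}_{gauge} = -\frac{1}{4}\sum_{a=1}^{3} F^a_{\mu\nu}F^{a\,\mu\nu} - \frac{1}{4}G_{\mu\nu}G^{\mu\nu} \]
where
\[ F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu + g\,\epsilon^{abc}A^b_\mu A^c_\nu \]
\[ G_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu \]
are the $SU_L(2) \times U(1)$ gauge field tensors; $\mathcal{L}_{g.f.}$ and $\mathcal{L}_{ghosts}$ are the so-called gauge-fixing term and Faddeev–Popov ghost term, needed to define the gauge-boson propagators appropriately and to eliminate certain unphysical contributions. The gauge invariance of the theory is ensured by a set of identities (A Slavnov, J C Taylor). The quark kinetic terms have the form
\[ \mathcal{L}_{quarks} = \sum_{quarks} \bar{\psi}\, i\gamma^\mu D_\mu \psi \]
where $D_\mu$ are appropriate covariant derivatives,
\[ D_\mu q_L = \left(\partial_\mu - \frac{ig}{2}\,\tau\cdot A_\mu - \frac{ig'}{6}B_\mu\right) q_L \]
for the left-handed quark doublets,
\[ D_\mu u_R = \left(\partial_\mu - \frac{2ig'}{3}B_\mu\right) u_R \]
\[ D_\mu d_R = \left(\partial_\mu + \frac{ig'}{3}B_\mu\right) d_R \]
and similarly for the other ‘‘up’’ quarks $c_R$ (charm) and $t_R$ (top), and ‘‘down’’ quarks $s_R$ (strange) and $b_R$ (bottom). Analogously, the lepton kinetic terms are given by
\[ \mathcal{L}_{leptons} = \sum_{i=1}^{3} \bar{\psi}^i_L\, i\gamma^\mu\left(\partial_\mu - ig\frac{\tau^a A^a_\mu}{2} + \frac{ig'}{2}B_\mu\right)\psi^i_L + \sum_{i=1}^{3} \bar{\psi}^i_R\, i\gamma^\mu\left(\partial_\mu + ig' B_\mu\right)\psi^i_R \]
where $\psi^i$ ($i = 1, 2, 3$) indicate the $e$, $\mu$, $\tau$ lepton families; finally, the parts involving the Higgs fields are
\[ \mathcal{L}_{Higgs} = (D_\mu\phi)^\dagger (D^\mu\phi) + V(\phi,\phi^\dagger) \]
\[ V(\phi,\phi^\dagger) = -\mu^2\,\phi^\dagger\phi - \lambda\,(\phi^\dagger\phi)^2 \]
and
\[ \mathcal{L}_{Yukawa} = \sum_{i,j=1}^{3}\left[ g_d^{ij}\,\bar{q}^i_L \begin{pmatrix}\phi^+\\ \phi^0\end{pmatrix} d^j_R + g_u^{ij}\,\bar{q}^i_L \begin{pmatrix}\bar{\phi}^0\\ -\phi^-\end{pmatrix} u^j_R + \mathrm{h.c.}\right] + \sum_{i=1}^{3}\left[ g_e^{i}\,\bar{\psi}^i_L \begin{pmatrix}\phi^+\\ \phi^0\end{pmatrix} \psi^i_R + \mathrm{h.c.}\right] \] [2]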
For $\mu^2 < 0$, the Higgs potential has a minimum at
\[ \langle\phi^\dagger\phi\rangle = \langle |\phi^+|^2 + |\phi^0|^2 \rangle = -\frac{\mu^2}{2\lambda} \equiv \frac{v^2}{2} \neq 0 \]
By choosing conveniently the direction of the Higgs field, its vacuum expectation value (VEV) is expressed as
\[ \langle\phi\rangle = \left\langle \begin{pmatrix}\phi^+\\ \phi^0\end{pmatrix} \right\rangle = \begin{pmatrix} 0\\ v/\sqrt{2}\end{pmatrix}, \qquad v = \sqrt{-\frac{\mu^2}{\lambda}} \] [3]
The physical properties of Higgs and gauge bosons are best seen by choosing the so-called unitary gauge,
\[ \phi(x) = \begin{pmatrix}\phi^+\\ \phi^0\end{pmatrix} = e^{i\zeta^a(x)\tau^a/2v} \begin{pmatrix} 0\\ (v+\eta(x))/\sqrt{2}\end{pmatrix} \equiv U(\zeta)\,\phi'(x) \]
\[ \psi_L = U(\zeta)\,\psi'_L; \qquad \psi_R = \psi'_R \]
\[ A_\mu = U(\zeta)\left(A'_\mu + \frac{i}{g}\partial_\mu\right)U^{-1}(\zeta); \qquad A_\mu \equiv \frac{\tau^a A^a_\mu}{2} \]
and expressing everything in terms of primed variables. It is easy to see that
1. There is one physical scalar (Higgs) particle with mass
\[ m_\eta = \sqrt{-2\mu^2} \] [4]
2. The Higgs kinetic term $(D_\mu\phi')^\dagger(D^\mu\phi')$ produces the gauge-boson masses
\[ M_W^2 = \frac{g^2v^2}{4}; \qquad M_Z^2 = \frac{v^2}{4}(g^2 + g'^2) \] [5]
3. The physical gauge bosons are the charged $W^\pm$, and two neutral vector bosons described by the fields
\[ Z_\mu = \cos\theta_W A^3_\mu - \sin\theta_W B_\mu; \qquad A_\mu = \sin\theta_W A^3_\mu + \cos\theta_W B_\mu \]
where the mixing angle
\[ \theta_W = \tan^{-1}\frac{g'}{g}, \qquad \sin\theta_W = \frac{g'}{\sqrt{g^2+g'^2}} \]
is known as the Weinberg angle. The massless $A_\mu$ field describes the photon.

Fermi Interactions and Neutral Currents
The fermions interact with gauge bosons through the charged and neutral currents
\[ \mathcal{L} = \frac{g}{\sqrt{2}}\left(J^-_\mu W^{+\mu} + J^+_\mu W^{-\mu}\right) + \mathcal{L}_{n.c.} \] [6]
\[ \mathcal{L}_{n.c.} = g J^3_\mu A^{3\mu} + \frac{g'}{2}J^Y_\mu B^\mu = e J^{em}_\mu A^\mu + \frac{g}{\cos\theta_W}J^0_\mu Z^\mu \] [7]
where
\[ J^+_\mu = \sum \bar{\psi}_L\gamma_\mu\tau^+\psi_L = \frac{1}{2}\sum \bar{\psi}\gamma_\mu\tau^+(1-\gamma_5)\psi \equiv \frac{1}{2}J^{V-A}_\mu \] [8]
corresponds to the standard charged current, and
\[ J^0_\mu = J^3_\mu - \sin^2\theta_W\, J^{em}_\mu \] [9]
is the neutral current to which the $Z$ boson is coupled ($J^3_\mu = \frac{1}{2}\sum\bar{\psi}_L\gamma_\mu\tau^3\psi_L$ and $J^{em}_\mu$ is the electromagnetic current). The model thus predicts the existence of neutral current processes, mediated by the $Z$ boson, such as $\nu_\mu e \to \nu_\mu e$ or $\bar{\nu}_\mu e \to \bar{\nu}_\mu e$, with cross sections of the same order as that for the charged current process, $\nu_e e \to \nu_e e$, but with characteristic L–R asymmetric couplings depending on the Weinberg angle. By eqn [9] appropriate ratios of cross sections, such as $\sigma(\nu_\mu e \to \nu_\mu e)/\sigma(\bar{\nu}_\mu e \to \bar{\nu}_\mu e)$, can be used to measure $\sin^2\theta_W$. The exchange of heavy $W$ bosons generates an effective current–current interaction at low energies:
\[ \mathcal{L}^{c.c.}_{eff} = -\frac{g^2}{2M_W^2}\,J^-_\mu J^{+\mu} \]
the well-known Fermi–Feynman–Gell-Mann Lagrangian $-\frac{G_F}{\sqrt{2}}\,J^{V-A\,\dagger}_\mu J^{V-A\,\mu}$, with
\[ \frac{G_F}{\sqrt{2}} = \frac{g^2}{8M_W^2} \]
This means that the Higgs VEV must be taken to be
\[ v = 2^{-1/4}\,G_F^{-1/2} \simeq 246\ \mathrm{GeV} \] [10]

Masses
It is remarkable that ‘‘all’’ known masses of the elementary particles, except perhaps those of the neutrinos, are generated in GWS theory through the spontaneous breakdown of the $SU_L(2) \times U(1)$ symmetry, through the Higgs VEV (eqns [3] and [10]). The boson masses are given by [4] and [5]. Note that the relation
\[ \rho \equiv \frac{M_W^2}{M_Z^2\cos^2\theta_W} = 1 + O(\alpha) \]
reflects an accidental SO(3) symmetry present (note the SO(4) symmetry of the Higgs potential in the limit $\alpha \to 0$, before the spontaneous breaking) in the model, called custodial symmetry. This is a characteristic, model-dependent feature of the minimal model, not
necessarily required by the gauge symmetry. This relation is well met experimentally, although a quantitative discussion requires the choice of the renormalization scheme (including the definition of $\sin\theta_W$ itself) and a check of consistency with various other data. The fermions get mass through the Yukawa interactions (eqn [2]); the fermion masses are arbitrary parameters of the model and cannot be predicted within the GWS theory. An important feature of this mechanism is that the coupling of the physical Higgs particle to each fermion is proportional to the mass of the latter. This should give a clear, unambiguous experimental signature for the Higgs scalar of the minimal GWS model. The recent discovery of nonvanishing neutrino masses requires the theory to be extended. Actually, there is a natural way to incorporate such masses in the standard GWS model, by a minimal extension. As the right-handed neutrinos, if they exist, are entirely neutral with respect to the $SU_L(2) \times U(1)$ gauge symmetry, they do not need its breaking to have mass. In other words, $\nu_R$ may get Majorana masses, $M_R\,\nu_R\nu_R$, by some yet unknown mechanism, much larger than those of other fermions (such a mechanism is quite naturally present in some grand unified models). If now the Yukawa couplings are introduced as for the quarks and for the down leptons, then the Dirac mass terms result upon condensation of the Higgs field, and the neutrino mass matrix would take the form, for one flavor (in the space of $(\nu_L, \nu_R)$):
\[ \begin{pmatrix} 0 & m_D\\ m_D & M_R \end{pmatrix} \] [11]
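The effect of a large Majorana mass on the one-flavor mass matrix [11] (zero in the upper-left entry, $m_D$ off the diagonal, $M_R$ in the lower-right entry) can be checked numerically. The following is a minimal sketch, assuming purely illustrative values $m_D = 100$ GeV and $M_R = 10^{16}$ GeV; the function name is hypothetical. It uses the closed-form eigenvalues of a 2 × 2 symmetric matrix, arranged to avoid floating-point cancellation.

```python
import math

def seesaw_masses(m_d, m_r):
    """Absolute eigenvalues (light, heavy) of the matrix [[0, m_d], [m_d, m_r]]."""
    root = math.hypot(m_r, 2.0 * m_d)     # sqrt(m_r**2 + 4 * m_d**2)
    heavy = 0.5 * (m_r + root)
    # The product of the two eigenvalues is -m_d**2, so |light| = m_d**2 / heavy.
    # This form avoids the cancellation in (m_r - root) / 2 when m_r >> m_d.
    light = m_d**2 / heavy
    return light, heavy

light, heavy = seesaw_masses(100.0, 1.0e16)   # GeV; illustrative values only
print(f"light ~ {light:.3e} GeV, heavy ~ {heavy:.3e} GeV")
```

For $M_R \gg m_D$ the light eigenvalue approaches $m_D^2/M_R$ and the heavy one approaches $M_R$.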
If the Dirac masses are assumed to be of the same order as those of the quarks and if the right-handed Majorana masses $M_R$ are far larger, for example, of the order of the grand unified scale, $O(10^{16}\ \mathrm{GeV})$, then diagonalization of the mass matrix would give, for the physical masses of the left-handed neutrinos, $m_D^2/M_R \ll m_D$, much smaller than other fermion masses, quite naturally (‘‘see-saw’’ mechanism).

CKM Quark Mixing
As there is a priori no reason why the weak-interaction eigenstates should be equal to the mass eigenstates, the Yukawa couplings in eqn [2] are in general nondiagonal matrices in flavor. Suppose that the weak basis for the quarks is given in terms of the mass eigenstates (in which the quark masses are made diagonal) by unitary transformations
\[ u_{Li} = \sum_j V^{up}_{ij}\,\tilde{u}_{Lj}, \qquad d_{Li} = \sum_j V^{down}_{ij}\,\tilde{d}_{Lj} \]
then the interaction terms with W bosons [6] can be cast in the form (Kobayashi and Maskawa 1972) iL Wþ Uij LW -exc ¼ u
ðCKMÞ j dL ðCKMÞy ‘ uL
k W U þd L k‘
0
1.5–4
c (GeV)
t (GeV)
d (MeV) s (MeV) b (GeV)
1.15–1.35 174:3 5:1
4–8
80–130
B U ¼ @ Ucd
0 the Sobolev space $W^{s,p}(\Omega)$ of order $s$ is defined as the space of all $f$ with the finite norm
\[ \|f\|_{W^{s,p}(\Omega)} = \left( \|f\|^p_{W^{[s],p}(\Omega)} + \sum_{|\alpha|=[s]} \int_\Omega\!\int_\Omega \frac{|D^\alpha f(x) - D^\alpha f(y)|^p}{|x-y|^{N+p(s-[s])}}\,dx\,dy \right)^{1/p} \]
where $[s]$ is the integer part of $s$ (for details, see, e.g., Adams and Fournier (2003) and Ziemer (1989)).

Imbedding Theorems
One of the most useful and important features of the functions in Sobolev spaces is an improvement of their integrability properties and the compactness of various imbeddings. Theorems of this type were first proved by Sobolev and Kondrashev. Let us agree that the symbols $\hookrightarrow$ and $\hookrightarrow\hookrightarrow$ stand for an imbedding and for a compact imbedding, respectively.

Theorem 1 Let $\Omega$ be a Lipschitz open. Then
(i) If $sp < N$, then $W^{s,p}(\Omega) \hookrightarrow L^{p^*}(\Omega)$ with $p^* = Np/(N - ps)$ (the Sobolev exponent). If $|\Omega| < \infty$, then the target space is any $L^r(\Omega)$ with $0 < r \le p^*$. If $\Omega$ is bounded, then $W^{s,p}(\Omega) \hookrightarrow\hookrightarrow L^q(\Omega)$ for all $1 \le q < p^*$.
(ii) If $sp > N$, then $W^{j+s,p}(\Omega) \hookrightarrow C^j(\overline{\Omega})$ for $j = 0, 1, \ldots$. If $\Omega$ has the Lipschitz boundary, then $W^{j+s,p}(\Omega) \hookrightarrow C^{j,\mu}(\overline{\Omega})$ for $j = 0, 1, \ldots$ and $\mu = s - N/p$. If $sp > N$, then $W^{j+s,p}(\Omega) \hookrightarrow\hookrightarrow C^j(\overline{\Omega})$, $j = 0, 1, \ldots$, and $W^{j+s,p}(\Omega) \hookrightarrow\hookrightarrow W^{j,q}(\Omega)$ for all $1 \le q \le \infty$. If, moreover, $\Omega$ has the Lipschitz boundary, then the target space can be replaced by $C^{j,\mu}(\overline{\Omega})$ provided $sp > N > (s-1)p$ and $0 < \mu < s - N/p$.
Note that if the imbedding $W^{s,p}(\Omega) \hookrightarrow L^q(\Omega)$ is compact for some $q \ge p$, then $|\Omega| < \infty$. Moreover, if $\limsup_{r\to\infty} |\{x \in \Omega;\ r \le |x| < r+1\}| > 0$, then $W^{s,p}(\Omega) \hookrightarrow L^q(\Omega)$ cannot be compact.

Traces and Sobolev Spaces of Negative Order
Let $s > 0$ and let $\Omega$ be, for simplicity, a bounded open subset of $R^N$ with boundary of class $C^{[s],1}$. Then with the help of local coordinates, we can define Sobolev spaces $W^{s,p}(\Gamma)$ (also denoted $H^s(\Gamma)$ for $p = 2$) on $\Gamma = \partial\Omega$ (see, e.g., Nečas (1967) and Adams and Fournier (2003) for details). If $f \in C(\overline{\Omega})$, then $f|_\Gamma$ makes sense. Introducing the space $D(\overline{\Omega})$ of restrictions to $\overline{\Omega}$ of functions in $D(R^N)$, one can show that if $f \in D(\overline{\Omega})$, we have $\|f|_\Gamma\|_{W^{1-1/p,p}(\Gamma)} \le C\|f\|_{W^{1,p}(\Omega)}$, so that, in view of the density of $D(\overline{\Omega})$ in $W^{1,p}(\Omega)$, the restriction of $f$ to $\Gamma$ can be uniquely extended to the whole $W^{1,p}(\Omega)$. The result is the bounded trace operator $\gamma_0 : W^{1,p}(\Omega) \to W^{1-1/p,p}(\Gamma)$. Moreover, every $g \in W^{1-1/p,p}(\Gamma)$ can be extended to a (nonunique) function $f \in W^{1,p}(\Omega)$ and this extension operator is bounded with respect to the corresponding norms. More generally, let us suppose $\Gamma$ is of class $C^{k-1,1}$ and define the operator $\mathrm{Tr}_n$ for any $f \in D(\overline{\Omega})$ by $\mathrm{Tr}_n f = (\gamma_0 f, \gamma_1 f, \ldots, \gamma_{k-1} f)$, where
\[ \gamma_j f(x) = \frac{\partial^j f}{\partial n^j}(x) = \sum_{|\alpha|=j} \frac{j!}{\alpha!}\,\frac{\partial^\alpha f(x)}{\partial x^\alpha}\,n^\alpha, \qquad x \in \Gamma \]
is the $j$th-order derivative of $f$ with respect to the outer normal $n$ at $x \in \Gamma$; by density, this operator can be uniquely extended to a continuous linear mapping defined on the space $W^{k,p}(\Omega)$; moreover, $\gamma_0(W^{k,p}(\Omega)) = W^{k-1/p,p}(\Gamma)$. The kernel of this mapping is the space $\mathring{W}^{k,p}(\Omega)$ (denoted by $H^k_0(\Omega)$ for $p = 2$), where $\mathring{W}^{s,p}(\Omega)$ is defined as the closure of $D(\Omega)$ in $W^{s,p}(\Omega)$ ($s > 0$). For $1 < p < \infty$, the following holds: $\mathring{W}^{s,p}(R^N) = W^{s,p}(R^N)$, and $\mathring{W}^{s,p}(\Omega) = W^{s,p}(\Omega)$ provided $0 < s \le 1/p$. If $s < 0$, then the space $W^{s,p}(\Omega)$ is defined as the dual to $\mathring{W}^{-s,p'}(\Omega)$, where $p' = p/(p-1)$ (see, e.g., Triebel (1978, 2001)). Observe that, for an arbitrary $\Omega$, a function $f \in W^{1,p}(\Omega)$ has zero trace if and only if $f(x)/\mathrm{dist}(x, \Gamma)$ belongs to $L^p(\Omega)$. For $p = 2$, we simply denote by $H^{-k}(\Omega)$ the dual space of $H^k_0(\Omega)$. In the case of bounded opens, we recall the following useful Poincaré–Friedrichs inequality (for simplicity, we state it here in the Hilbert frame):

Theorem 2 Let $\Omega$ be bounded (at least in one direction of the space). Then there exists a positive constant $C_P(\Omega)$ such that
\[ \|v\|_{L^2(\Omega)} \le C_P(\Omega)\,\|\nabla v\|_{[L^2(\Omega)]^N} \quad \text{for all } v \in H^1_0(\Omega) \] [4]
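The Poincaré–Friedrichs inequality of Theorem 2 can be made concrete in one dimension, where the best constant on the interval (0, L) is known to be L/π (the inverse square root of the first Dirichlet eigenvalue of −d²/dx²). The sketch below is an illustration only, assuming a standard second-order finite-difference discretization; the function name and the grid size are hypothetical choices.

```python
import numpy as np

def poincare_constant_1d(length=1.0, n=200):
    """Estimate the best constant C_P in ||v|| <= C_P ||v'|| on H_0^1(0, length)."""
    h = length / (n + 1)
    # Finite-difference matrix of -d^2/dx^2 with zero boundary values.
    lap = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    lam_min = np.linalg.eigvalsh(lap)[0]   # smallest eigenvalue, close to (pi/length)^2
    return 1.0 / np.sqrt(lam_min)

c_p = poincare_constant_1d()
print(c_p, 1.0 / np.pi)
```

As a sanity check, the function v(x) = x(1 − x) on (0, 1) has ‖v‖² = 1/30 and ‖v′‖² = 1/3, so its ratio ‖v‖/‖v′‖ ≈ 0.316 stays just below 1/π ≈ 0.318.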
The Whole-Space Case: Riesz and Bessel Potentials
The Riesz potentials I naturally occur when one defines the formal powers of the Laplace operator . Namely, if f 2 S(RN ) and > 0, then h i F ðÞ=2 f ðÞ = jj F f ðÞ: This can be taken formally as a definition of the Riesz potential I on S 0 (RN ), I f ð:Þ ¼ F 1 ½jj F f ðÞð:Þ
for any 2 R. If 0 < < N, then I f (x) = (I f )(x), where I is the inverse Fourier transform of jj , I ðxÞ ¼ C jxjN
1 C ¼ ððN Þ=2Þ N=2 2 ð=2Þ where is the Gamma function and I is the Riesz kernel. The following formula is also true: Z 1 2 dt tðNÞ=2 ejxj =t I ðxÞ ¼ C t 0 Recall that every f 2 S(RN ) can be represented as the Riesz potential I g of a suitable function g 2 S(RN ), namely g = ()=2 f ; we get the representation formula f ðxÞ ¼ I gðxÞ Z ¼ C R
N
gðyÞ jx yjN
In other words, the spaces Hs,p (R N ) are isomorphic copies of Lp (R N ). For k = 0, 1, 2, . . . , plainly Hk,2 (RN ) = W k,2 (RN ) by virtue of the Plancherel theorem. But it is true also for integer s and general 1 < p < 1 (see, e.g., Triebel (1978)). Remark 3 Much more comprehensive theory of general Besov and Lizorkin–Triebel spaces in RN has been established in the last decades, relying on the the Littlewood–Paley theory. Spaces on opens can be defined as restrictions of functions in the corresponding space on the whole RN , allowing to derive their properties from those valid for functions on RN . The justification for that are extension theorems. In particular, there exists a universal extension operator for the Lipschitz open, working for all the spaces mentioned up to now. We refer to Triebel (1978, 2001).
dy
The standard density argument implies then an appropriate statement for functions in W k,p (RN ) with an integer k and for the Bessel potential spaces H,p (RN ) – see below for their definition. The original Sobolev imbedding theorem comes from the combination of this representation and the basic continuity property of I , p < N, I: Lp ðRN Þ ! Lq ðRN Þ;
1 1 ¼ q p N
To get an isomorphic representation of a Bessel potential space (of a Sobolev space with positive integer smoothness in particular) it is more convenient to consider the Bessel potentials (of order 2 R),
Unbounded Opens and Weighted Spaces
The study of the elliptic problems in unbounded opens is usually carried out with use of suitable Sobolev weighted space. The Poisson equation u ¼ f
(with a slight abuse of the notations); the following formula for the Bessel kernel G is well known: Z 1 2 dt tðNÞ=2 eðjxj =tÞðt=4Þ G ðxÞ ¼ c1 t 0 (cf. the analogous formula for I ), where c = (4)=2 (=2). The kernels G can alternatively be expressed with help of Bessel or Macdonald functions. Now we can define the Bessel potential spaces. For s 2 R and 1 < p < 1, let Hs,p (R N ) be the space of all f 2 S 0 (RN ) with the finite norm kf kHs;p ðRN Þ Z p 1=p ¼ F 1 ð1 þ jj2 Þs=2 F f ðÞ d
in RN ; N 2
½5
is the typical example; the Poincare´ inequality [4] is not true here and it is suitable to introduce Sobolev spaces with weights. Let m 2 N, 1 < p < 1, 2 R, k = m N=p if N=p þ 2 {1, . . . , m} and k = 1 elsewhere. For an open R N , we define n Wm;p ðÞ ¼ v 2 D0 ðÞ; 0 jj k; mjj ðlog Þ1 D u 2 Lp ðÞ; k þ 1 jj m; o mþjj D u 2 Lp ðÞ
G f ðxÞ ¼ ðG f ÞðxÞ ¼ F 1 ½1 þ jj2 =2 F f ðÞ ðxÞ
RN
where (x) = (1 þ jxj2 )1=2 : Note that Wm,p is a reflexive Banach space for the norm k.kWm,p defined by X kmþjj ðlog Þ1 D ukpLp ðÞ kukpW m;p ¼
0jjk
þ
X
p
kmþjj D ukLp ðÞ
kþ1jjm
We also introduce the following seminorm: 0 11=p X p jujWm;p ¼ @ k D ukLp ðÞ A jj¼m
Let Wm;p ðÞ ¼ v 2 Wm;p ; 0 ðvÞ ¼ m1 ðvÞ ¼ 0g
If $\Omega$ is a Lipschitz domain, then $\mathring{W}^{m,p}_\alpha(\Omega)$ is the closure of $D(\Omega)$ in $W^{m,p}_\alpha(\Omega)$, while $D(\overline{\Omega})$ is dense in $W^{m,p}_\alpha(\Omega)$. We denote by $W^{-m,p'}_{-\alpha}(\Omega)$ the dual of $\mathring{W}^{m,p}_\alpha(\Omega)$ ($p' = p/(p-1)$). We note that these spaces also contain polynomials,
\[ P_j \subset W^{m,p}_\alpha(\Omega), \qquad j = \begin{cases} \left[m - \frac{N}{p} - \alpha\right] & \text{if } \frac{N}{p} + \alpha \notin Z^-\\ m - \frac{N}{p} - \alpha - 1 & \text{elsewhere} \end{cases} \]
where $[s]$ is the integer part of $s$ and $P_{[s]} = \{0\}$ if $[s] < 0$. The fundamental property of functions belonging to these spaces is that they satisfy the weighted Poincaré inequality. An open $\Omega$ is an exterior domain if it is the complement of the closure of a bounded domain in $R^N$.

Theorem 4 Suppose that $\Omega$ is an exterior domain or $\Omega = R^N_+$ or $\Omega = R^N$. Then
(i) the seminorm $|\cdot|_{W^{m,p}_\alpha(\Omega)}$ is a norm on $W^{m,p}_\alpha(\Omega)/P_{j'}$, equivalent to the quotient norm, with $j' = \min(m-1, j)$;
(ii) the seminorm $|\cdot|_{W^{m,p}_\alpha(\Omega)}$ is equivalent to the full norm on $\mathring{W}^{m,p}_\alpha(\Omega)$.

Variational Approach
Let us first describe the method on the model problem [1]–[2], supposing $f \in L^2(\Omega)$ and $\Omega$ bounded. We first suppose that this problem admits a sufficiently smooth solution $u$. Let $v$ be any arbitrary (smooth) function; we multiply eqn [1] by $v(x)$ and integrate with respect to $x$ over $\Omega$; this gives
\[ -\int_\Omega (\Delta u\, v)(x)\,dx = \int_\Omega (f v)(x)\,dx \]
Using the following Green’s formula ($d\sigma(x)$ denotes the measure on $\Gamma = \partial\Omega$ and $\partial u(x)/\partial n = \nabla u(x)\cdot n(x)$, where $n(x)$ is the unit normal at the point $x$ of $\Gamma$ oriented towards the exterior of $\Omega$):
\[ \int_\Omega (\Delta u\, v)(x)\,dx = -\int_\Omega (\nabla u\cdot\nabla v)(x)\,dx + \int_\Gamma \left(\frac{\partial u}{\partial n}\,v\right)(\sigma)\,d\sigma \] [6]
we get, since $v|_\Gamma = 0$: $A(u,v) = L(v)$, where we have set
\[ A(u,v) = \int_\Omega \nabla u(x)\cdot\nabla v(x)\,dx, \qquad L(v) = \int_\Omega f(x)\,v(x)\,dx \] [7]
The idea is to study in fact this new problem (showing first its equivalence with the boundary-value problem), noting that it makes sense for far less regular functions $u$, $v$ (and also $f$), in fact $u, v \in H^1_0(\Omega)$ (and $f \in H^{-1}(\Omega)$).

The Lax–Milgram Theorem
The general form of a variational problem is to find $u \in V$ such that
\[ A(u,v) = L(v) \quad \text{for all } v \in V \] [8]
where $V$ is a Hilbert space, $A$ a bilinear continuous form defined on $V \times V$ and $L$ a linear continuous form defined on $V$. We say, moreover, that $A$ is $V$-elliptic if there exists a positive constant $\alpha$ such that
\[ A(u,u) \ge \alpha\,\|u\|_V^2 \quad \text{for all } u \in V \] [9]
The following theorem is due to Lax and Milgram.

Theorem 5 Let $V$ be a Hilbert space. We suppose that $A$ is a bilinear continuous form on $V \times V$ which is $V$-elliptic and that $L$ is a linear continuous form on $V$. Then the variational problem [8] has a unique solution $u$ in $V$. Moreover, if $A$ is symmetric, $u$ is characterized as the minimizer over $V$ of the quadratic functional $E$ defined by
\[ E(v) = \tfrac{1}{2}A(v,v) - L(v) \quad \text{for all } v \in V \] [10]

Remark 6
(i) We have the following ‘‘energy estimate’’:
\[ \|u\|_V \le \frac{1}{\alpha}\,\|L\|_{V'} \]
where $V'$ is the dual space of $V$. In the particular case of our model problem, this inequality shows the continuity of the solution $u \in H^1_0(\Omega)$ with respect to the data $f \in L^2(\Omega)$ (an assumption that can be weakened to $f \in H^{-1}(\Omega)$).
(ii) Theorem 5 can be extended to sesquilinear continuous forms $A$ defined on $V \times V$; such a form is called $V$-elliptic if there exists a positive constant $\alpha$ such that
\[ \mathrm{Re}\,A(u,u) \ge \alpha\,\|u\|_V^2 \quad \text{for all } u \in V \] [11]
(iii) Denoting by $\mathcal{A}$ the linear operator defined on the space $V$ by $A(u,v) = \langle \mathcal{A}u, v\rangle_{V',V}$ for all $v \in V$, the Lax–Milgram theorem shows that $\mathcal{A}$ is an isomorphism from $V$ onto its dual space $V'$, and the problem [8] is equivalent to solving the equation $\mathcal{A}u = L$.
(iv) Let us make some remarks concerning the numerical aspects. First, this variational formulation is the starting point of the well-known finite element method: the idea is to compute a solution of an approximate variational problem stated on a finite-dimensional subspace of $V$ (leading to the resolution of a linear
system), with a precise control of the error with the exact solution u. Second, the equivalence with a minimization problem allows the use of other numerical algorithms.
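The finite element idea mentioned in the remark can be sketched on the one-dimensional model problem −u″ = f on (0, 1) with u(0) = u(1) = 0, for which A(u, v) = ∫ u′v′ dx and L(v) = ∫ f v dx. The code below is a minimal illustration under stated assumptions, not a general solver: it assumes a uniform mesh, piecewise-linear ‘‘hat’’ basis functions, and a load vector approximated by h·f(x_i) (exact for constant f); the function name is hypothetical.

```python
import numpy as np

def solve_dirichlet_p1(f, n):
    """Galerkin solution of -u'' = f on (0,1), u(0)=u(1)=0, with n interior hat functions."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)                    # interior nodes
    # Stiffness matrix: entries A(phi_j, phi_i) = integral of phi_j' * phi_i'
    K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    # Load vector L(phi_i), approximated by h * f(x_i)
    b = h * f(x)
    return x, np.linalg.solve(K, b)

# For f = 1 the exact solution is u(x) = x (1 - x) / 2.
x, u = solve_dirichlet_p1(lambda t: np.ones_like(t), 9)
print(np.max(np.abs(u - 0.5 * x * (1.0 - x))))
```

The stiffness matrix is symmetric positive definite, which is the discrete counterpart of the V-ellipticity [9]; the linear system therefore has a unique solution, as the Lax–Milgram theorem guarantees on the finite-dimensional subspace.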
Let us now present some classical examples of second-order elliptic problems that can be solved with the help of the variational theory.

The Dirichlet Problem for the Poisson Equation
We consider the problem on a bounded Lipschitz open $\Omega \subset R^N$,
\[ -\Delta u = f \ \text{in } \Omega, \qquad u = u_0 \ \text{on } \Gamma = \partial\Omega \] [12]
with $u_0 \in H^{1/2}(\Gamma)$, so that there exists $U_0 \in H^1(\Omega)$ satisfying $\gamma_0(U_0) = u_0$. The variational formulation of problem [12] is
\[ \text{to find } u \in U_0 + H^1_0(\Omega) \text{ such that for all } v \in H^1_0(\Omega),\ A(u,v) = L(v) \] [13]
with $A$ given by [7] and a more general $L$ with $f \in H^{-1}(\Omega)$, defined by
\[ L(v) = \langle f, v\rangle_{H^{-1}(\Omega),\,H^1_0(\Omega)} \] [14]
The existence and uniqueness of a solution of [13] follows from Theorem 5 (and the Poincaré inequality [4]). Conversely, thanks to the density of $D(\Omega)$ in $H^1_0(\Omega)$, we can show that $u$ satisfies [12]. More precisely, we get:

Theorem 7 Let us suppose $f \in H^{-1}(\Omega)$ and $u_0 \in H^{1/2}(\Gamma)$; let $U_0 \in H^1(\Omega)$ satisfy $\gamma_0(U_0) = u_0$. Then the boundary-value problem [12] has a unique solution $u$ such that $u - U_0 \in H^1_0(\Omega)$. This is also the unique solution of the variational problem [13]. Moreover, there exists a positive constant $C = C(\Omega)$ such that
\[ \|u\|_{H^1(\Omega)} \le C\left(\|f\|_{H^{-1}(\Omega)} + \|u_0\|_{H^{1/2}(\Gamma)}\right) \] [15]
which shows that $u$ depends continuously on the data $f$ and $u_0$.

Moreover, using techniques of Nirenberg’s differential quotients, we have the following regularity result (see, e.g., Grisvard (1980)):

Theorem 8 Let us suppose that $\Omega$ is a bounded open subset of $R^N$ with a boundary of class $C^{1,1}$ and let $f \in L^2(\Omega)$, $u_0 \in H^{3/2}(\Gamma)$. Then $u \in H^2(\Omega)$ and each equation in [12] is satisfied almost everywhere (on $\Omega$ for the first one and on $\Gamma$ for the boundary condition). Moreover, there exists a positive constant $C = C(\Omega)$ such that
\[ \|u\|_{H^2(\Omega)} \le C\left[\|f\|_{L^2(\Omega)} + \|u_0\|_{H^{3/2}(\Gamma)}\right] \] [16]
By induction, if the data are more regular, that is, $f \in H^k(\Omega)$ and $u_0 \in H^{k+3/2}(\Gamma)$ (with $k \in N$), and if $\Omega$ is of class $C^{k+1,1}$, we get $u \in H^{k+2}(\Omega)$.

Remark 9 Let us point out the importance of the open geometry. For example, if $\Omega$ is a bounded plane polygon, one can find $u \in H^1_0(\Omega)$ with $\Delta u \in C^\infty(\overline{\Omega})$ such that $u \notin H^{1+\pi/w}(\Omega)$, where $w$ is the biggest value of the interior angles of the polygon. In particular, if the polygon is not convex, the solution of the Dirichlet problem [12] cannot be in $H^2(\Omega)$.

The Neumann Problem for the Poisson Equation
We consider the problem ($n$ is the unit outer normal on $\Gamma$)
\[ -\Delta u = f \ \text{in } \Omega, \qquad \frac{\partial u}{\partial n} = h \ \text{on } \Gamma \] [17]
Setting $E(\Delta) = \{v \in H^1(\Omega);\ \Delta v \in L^2(\Omega)\}$, the space $D(\overline{\Omega})$ is a dense subspace, and we have the following Green formula for all $u \in E(\Delta)$ and $v \in H^1(\Omega)$:
\[ \int_\Omega \Delta u(x)\,v(x)\,dx = -\int_\Omega \nabla u(x)\cdot\nabla v(x)\,dx + \left\langle \frac{\partial u}{\partial n}, \gamma_0 v\right\rangle_{H^{-1/2}(\Gamma),\,H^{1/2}(\Gamma)} \]
If $u \in H^1(\Omega)$ satisfies [17] with $f \in L^2(\Omega)$ and $h \in H^{-1/2}(\Gamma)$, then for any function $v \in H^1(\Omega)$, we have, by virtue of the above Green formula,
\[ A(u,v) = \tilde{L}(v), \qquad \tilde{L}(v) = \int_\Omega (f v)(x)\,dx + \langle h, \gamma_0 v\rangle_{H^{-1/2}(\Gamma),\,H^{1/2}(\Gamma)} \]
But here the form $A$ is not $H^1(\Omega)$-elliptic; in fact, one can check that, if problem [17] has a solution, then we have necessarily (take $v = 1$ above)
\[ \int_\Omega f(x)\,dx + \langle h, 1\rangle_{H^{-1/2}(\Gamma),\,H^{1/2}(\Gamma)} = 0 \] [18]
Moreover, we note that if $u$ is a solution, then $u + C$, where $C$ is an arbitrary constant, is also a solution. So the variational problem is not well posed on $H^1(\Omega)$. It can, however, be solved in the quotient space $H^1(\Omega)/R$, which is a Hilbert space for the quotient norm
\[ \|\dot{v}\|_{H^1(\Omega)/R} = \inf_{k \in R} \|v + k\|_{H^1(\Omega)} \] [19]
but also for the seminorm $\dot{v} \mapsto |v|_{H^1(\Omega)} = \sqrt{A(v,v)}$, which is an equivalent norm on this quotient space (see Nečas (1967)).
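The role of the compatibility condition [18] and of the quotient space can be seen in a one-dimensional sketch of the Neumann problem −u″ = f on (0, 1) with prescribed outward normal derivatives at both ends. The code below is an illustration under stated assumptions: piecewise-linear elements on a uniform mesh, trapezoid quadrature for the load, and a zero-mean constraint imposed through a Lagrange multiplier to pick one representative of the class u + constant. The function and parameter names are hypothetical.

```python
import numpy as np

def solve_neumann_p1(f, h_left, h_right, n):
    """P1 Galerkin solution of -u'' = f on (0,1) with du/dn = h_left, h_right at x = 0, 1.

    The zero-mean representative is returned together with a multiplier that
    equals the discrete compatibility defect (integral of f) + h_left + h_right.
    """
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    K = (2.0 * np.eye(n + 1) - np.eye(n + 1, k=1) - np.eye(n + 1, k=-1)) / h
    K[0, 0] = K[n, n] = 1.0 / h            # boundary hat functions live on one cell only
    w = np.full(n + 1, h)
    w[0] = w[n] = h / 2.0                  # trapezoid quadrature weights
    b = w * f(x)                           # approximates the integral of f * phi_i
    b[0] += h_left
    b[n] += h_right
    # Augmented system: K u + lam * w = b and w . u = 0 (zero discrete mean).
    M = np.zeros((n + 2, n + 2))
    M[:n + 1, :n + 1] = K
    M[:n + 1, n + 1] = w
    M[n + 1, :n + 1] = w
    sol = np.linalg.solve(M, np.append(b, 0.0))
    return x, sol[:n + 1], sol[n + 1]

x, u, lam = solve_neumann_p1(lambda t: np.pi**2 * np.cos(np.pi * t), 0.0, 0.0, 100)
print(lam, np.max(np.abs(u - np.cos(np.pi * x))))
```

The stiffness matrix alone is singular because constants lie in its kernel, which is the discrete form of the lack of H¹-ellipticity. Summing the first n + 1 equations shows that the multiplier equals the sum of the load vector, so it vanishes (up to rounding) exactly when the data are compatible.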
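Then, supposing that the data $f$ and $h$ satisfy the "compatibility condition" [18], we can apply the Lax–Milgram theorem to the variational problem to find $\dot{u} \in V$ such that

$$A(\dot{u}, \dot{v}) = \tilde{L}(\dot{v}) \quad \text{for all } \dot{v} \in V \qquad [20]$$

with $V = H^1(\Omega)/\mathbb{R}$. We get the following result (see, e.g., Nečas (1967)):

Theorem 10 Let us suppose that $\Omega$ is connected and that the data $f \in L^2(\Omega)$ and $h \in H^{-1/2}(\Gamma)$ satisfy [18]. Then the variational problem [20] has a unique solution $\dot{u}$ in the space $H^1(\Omega)/\mathbb{R}$ and this solution is continuous with respect to the data, that is, there exists a positive constant $C = C(\Omega)$ such that

$$|u|_{H^1(\Omega)} \le C\left\{\|f\|_{L^2(\Omega)} + \|h\|_{H^{-1/2}(\Gamma)}\right\} \quad \text{for all } u \in \dot{u}$$

Moreover, if $\Omega$ is of class $C^{1,1}$ and if the data satisfy $f \in L^2(\Omega)$, $h \in H^{1/2}(\Gamma)$, then every $u \in \dot{u}$ is such that $u \in H^2(\Omega)$ and it satisfies each equation in [17] almost everywhere.

Problem with Mixed Boundary Conditions

Here we consider more general boundary conditions: the Dirichlet conditions on a closed subset $\Gamma_1$ of $\Gamma = \partial\Omega$, and the Neumann, or more generally the "Robin", conditions on the other part $\Gamma_2 = \Gamma - \Gamma_1$. We seek $u$ such that ($f \in L^2(\Omega)$, $h \in L^2(\Gamma_2)$, $a \in L^\infty(\Gamma_2)$)

$$-\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \Gamma_1, \qquad au + \frac{\partial u}{\partial n} = h \ \text{on } \Gamma_2 \qquad [21]$$

Let $V = \{v \in H^1(\Omega);\ \gamma_0 v = 0 \text{ on } \Gamma_1\}$. Then [8] is the variational formulation of this problem with

1. $A(u, v) = \int_\Omega \nabla u(x) \cdot \nabla v(x)\, dx + \int_{\Gamma_2} (a\, \gamma_0 u\, \gamma_0 v)(\sigma)\, d\sigma$;
2. $L(v) = \int_\Omega f(x) v(x)\, dx + \int_{\Gamma_2} (h\, \gamma_0 v)(\sigma)\, d\sigma$.

Supposing, for example, $a \ge 0$, we get a unique solution $u \in V$ for this variational problem by virtue of the Lax–Milgram theorem. Moreover, if $u \in H^2(\Omega)$, then $u$ is the unique solution in $H^2(\Omega) \cap V$ of the problem [21].

The Newton Problem for More General Operators

Let $\Omega$ be a bounded open subset of $\mathbb{R}^N$. We now consider more general second-order operators of the form $v \mapsto -\nabla \cdot (M \nabla v) + b \cdot \nabla v + cv$, where $b \in [W^{1,\infty}(\Omega)]^N$, $c \in L^\infty(\Omega)$, $M$ is an $N \times N$ square matrix with entries $M_{ij}$, and $\nabla \cdot (M \nabla v)$ stands for

$$\sum_{i,j=1}^{N} \frac{\partial}{\partial x_i}\left(M_{ij} \frac{\partial v}{\partial x_j}\right)$$

We also assume that there is a positive constant $\alpha_M$ such that

$$\sum_{i,j=1}^{N} M_{ij}(x)\, \xi_i \xi_j \ge \alpha_M \sum_{i=1}^{N} \xi_i^2$$

for a.e. $x \in \Omega$ and $\xi = (\xi_1, \ldots, \xi_N) \in \mathbb{R}^N$. For given data $f \in L^2(\Omega)$, $h \in L^2(\Gamma)$, we look for a solution $u$ of the problem

$$-\nabla \cdot (M \nabla u) + b \cdot \nabla u + cu = f \ \text{in } \Omega, \qquad au + n \cdot (M \nabla u) = h \ \text{on } \Gamma \qquad [22]$$

We assume that $a \in L^\infty(\Gamma)$. The variational formulation of this problem is still [8], with $V = H^1(\Omega)$ and

$$A(u, v) = \int_\Omega M \nabla u \cdot \nabla v\, dx + \int_\Omega [b \cdot \nabla u + cu]\, v\, dx + \int_\Gamma a\, \gamma_0 u\, \gamma_0 v\, d\sigma \qquad [23]$$

$$L(v) = \int_\Omega f(x) v(x)\, dx + \int_\Gamma (h\, \gamma_0 v)(\sigma)\, d\sigma \qquad [24]$$

If the conditions

$$c - \tfrac{1}{2} \nabla \cdot b \ge C_0 \ge 0 \ \text{a.e. on } \Omega, \qquad a + \tfrac{1}{2}\, b \cdot n \ge C_1 \ge 0 \ \text{a.e. on } \Gamma$$

are fulfilled, with $(C_0, C_1) \ne (0, 0)$, then the bilinear form $A$ is $V$-elliptic and the Lax–Milgram theorem applies.

A Biharmonic Problem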
We consider the Dirichlet problem for the operator of fourth order ($c \in L^\infty(\Omega)$):

$$\Delta^2 u + cu = f \ \text{in } \Omega \qquad [25]$$

$$u = u_0 \ \text{on } \Gamma, \qquad \frac{\partial u}{\partial n} = h \ \text{on } \Gamma \qquad [26]$$

Theorem 11 Let us suppose that $\Omega$ has a boundary of class $C^{1,1}$ and that the data satisfy $f \in H^{-2}(\Omega)$, $u_0 \in H^{3/2}(\Gamma)$, $h \in H^{1/2}(\Gamma)$. Let $U_0 \in H^2(\Omega)$ be such that $\gamma_0(U_0) = u_0$, $\gamma_1(U_0) = h$. Then, if $c \ge 0$ a.e. in $\Omega$, the boundary value problem [25]–[26] has a unique solution $u$ such that $u - U_0 \in H^2_0(\Omega)$, and $u$ is also the unique solution of the variational problem to find

$$u \in U_0 + H^2_0(\Omega) \ \text{such that } A(u, v) = l(v) \ \text{for all } v \in H^2_0(\Omega) \qquad [27]$$

where $l(v) = \langle f, v \rangle_{H^{-2}(\Omega),\, H^2_0(\Omega)}$ and

$$A(u, v) = \int_\Omega \Delta u(x)\, \Delta v(x)\, dx + \int_\Omega (cuv)(x)\, dx \qquad [28]$$

Moreover, there exists a positive constant $C = C(\Omega)$ such that

$$\|u\|_{H^2(\Omega)} \le C\left[\|f\|_{H^{-2}(\Omega)} + \|u_0\|_{H^{3/2}(\Gamma)} + \|h\|_{H^{1/2}(\Gamma)}\right] \qquad [29]$$

which shows that $u$ depends continuously upon the data $f$, $u_0$, and $h$.

Remark 12 The choice of the Hilbert space $V$ is of crucial importance for the $V$-ellipticity. Let us consider for example the problem [25], with

$$\Delta u = 0 \ \text{on } \Gamma, \qquad \frac{\partial \Delta u}{\partial n} = 0 \ \text{on } \Gamma \qquad [30]$$

In fact, the associated bilinear form is not $V$-elliptic for $V = H^2(\Omega)$ but it is $V$-elliptic for $V = \{v \in L^2(\Omega);\ \Delta v \in L^2(\Omega)\}$.
General Elliptic Problems

Here $\Omega$ will be a bounded and sufficiently regular open subset of $\mathbb{R}^N$. Let us consider a general linear differential operator of the form

$$A(x, D)u = \sum_{|\alpha| \le l} a_\alpha(x)\, D^\alpha u, \qquad a_\alpha(x) \in \mathbb{C} \qquad [31]$$

Setting $A_0(x, \xi) = \sum_{|\alpha| = l} a_\alpha(x)\, \xi^\alpha$, we say that the operator $A$ is elliptic at a point $x$ if $A_0(x, \xi) \ne 0$ for all $\xi \in \mathbb{R}^N - \{0\}$. One can show that, if $N \ge 3$, $l$ is even, that is, $l = 2m$; the same result holds for $N = 2$ if the coefficients $a_\alpha$ are real. Moreover, for $N \ge 3$, every elliptic operator is properly elliptic, in the following sense: for any independent vectors $\xi$, $\xi'$ in $\mathbb{R}^N$, the polynomial $\tau \mapsto A_0(\cdot, \xi + \tau \xi')$ has $m$ roots with positive imaginary part. The aim here is to study boundary-value problems of the following type:

$$Au = f \ \text{in } \Omega \qquad [32]$$

$$B_j u = g_j \ \text{on } \Gamma, \quad j = 0, \ldots, m - 1 \qquad [33]$$

where $A$ is properly elliptic on $\overline{\Omega}$, with sufficiently regular coefficients, and the operators $B_j$ are boundary operators, of order $m_j \le 2m - 1$, that must satisfy some compatibility conditions with respect to the operator $A$ (see Renardy and Rogers (1992) for details; these conditions were introduced by Agmon, Douglis, and Nirenberg). For example, $A = (-1)^m \Delta^m$ and $B_j = \partial^j / \partial n^j$ is a convenient choice.

In order to show that problem [32]–[33] has a solution $u \in H^{2m+r}(\Omega)$ ($r \in \mathbb{N}$), the idea is to show that the operator $P$ defined by $u \mapsto P(u) = (Au, B_0 u, \ldots, B_{m-1} u)$ is an index operator from $H^{2m+r}(\Omega)$ into $G = H^r(\Omega) \times \prod_{j=0}^{m-1} H^{2m+r-m_j-1/2}(\Gamma)$ and to express the compatibility conditions through the adjoint problem. We recall that a linear continuous operator $P$ is an index operator if (1) $\dim \operatorname{Ker} P < \infty$, and $\operatorname{Im} P$ is closed; (2) $\operatorname{codim} \operatorname{Im} P < \infty$. Then the index of $P$ is given by $\dim \operatorname{Ker} P - \operatorname{codim} \operatorname{Im} P$. We recall the following theorem of Peetre:

Theorem 13 Let $E$, $F$, and $G$ be three reflexive Banach spaces such that $E \hookrightarrow\hookrightarrow F$, and $P$ a linear continuous operator from $E$ to $G$. Then condition (1) is equivalent to: "there exists $C \ge 0$, such that for all $u \in E$, we have $\|u\|_E \le C(\|Pu\|_G + \|u\|_F)$."

Applying this theorem to our problem [32]–[33], condition (1) results from a priori estimates of the following type:

$$\|u\|_{H^{2m+r}(\Omega)} \le C\left\{\|Pu\|_G + \|u\|_{H^{2m+r-1}(\Omega)}\right\}$$

and condition (2) from similar a priori estimates for the dual problem.

Second-Order Elliptic Problems

We consider a second-order differential operator of the "divergence form"

$$Au = -\sum_{i,j=1}^{N} \left(a_{ij}(x)\, u_{x_i}\right)_{x_j} + \sum_{i=1}^{N} b_i(x)\, u_{x_i} + c(x)\, u \qquad [34]$$

with given coefficient functions $a_{ij}$, $b_i$, $c$ ($i, j = 1, \ldots, N$), and where we have used the notation $u_{x_i} = \partial u / \partial x_i$. Such operators are said to be uniformly strongly elliptic in $\Omega$ if there exists $\alpha > 0$ such that

$$\sum_{i,j=1}^{N} a_{ij}(x)\, \xi_i \xi_j \ge \alpha |\xi|^2 \quad \text{for all } x \in \Omega,\ \xi \in \mathbb{R}^N$$
Remark 14 There exist elliptic problems for which the associated variational problem does not necessarily satisfy the ellipticity condition. Let us consider the following example, due to Seeley: let $\Omega = \{(r, \theta) \in (\pi, 2\pi) \times [0, 2\pi]\}$ and

$$A = -\left(e^{i\theta} \frac{\partial}{\partial \theta}\right)^2 - e^{2i\theta}\left(1 + \frac{\partial^2}{\partial r^2}\right)$$

One can check that, for all $\lambda \in \mathbb{C}$, the problem $Au + \lambda u = 0$ in $\Omega$ and $u = 0$ on $\Gamma$ admits nonzero solutions $u$ which are given by (with $\mu$ such that $\mu^2 = \lambda$) $u = \sin r \cos(\mu e^{-i\theta})$ and $u = \sin r \sin(\mu e^{-i\theta})$ for $\lambda \ne 0$; $u = \sin r$ and $u = \sin r\, e^{-i\theta}$ for $\lambda = 0$.

Most of the results concerning existence, uniqueness, and regularity for second-order elliptic problems can be established thanks to a maximum principle. There exist different types of maximum principles, which we now present.

Maximum Principle

Theorem 15 (Weak maximum principle). Let $A$ be a uniformly strongly elliptic operator of the form [34] in a bounded open set $\Omega \subset \mathbb{R}^N$, with $a_{ij}, b_i, c \in L^\infty(\Omega)$ and $c \ge 0$. Let $u \in C^2(\Omega) \cap C(\overline{\Omega})$ and

$$Au \ge 0 \ [\text{resp. } Au \le 0] \ \text{in } \Omega$$

Then

$$\inf_{\Omega} u \ge \inf_{\partial\Omega} u^- \quad [\text{resp. } \sup_{\Omega} u \le \sup_{\partial\Omega} u^+]$$

where $u^+ = \max(u, 0)$ and $u^- = \min(u, 0)$. If $c = 0$ in $\Omega$, one can replace $u^-$ [resp. $u^+$] by $u$.

Theorem 16 (Strong maximum principle). Under the assumptions of the above theorem, if $u$ is not a constant function in $C^2(\Omega) \cap C(\overline{\Omega})$ such that $Au \ge 0$ [resp. $Au \le 0$], then $\inf_\Omega u < u(x)$ [resp. $\sup_\Omega u > u(x)$], for all $x \in \Omega$.

Remark 17 These two maximum principles can be adapted to elliptic operators in nondivergence form, that is,

$$Au = -\sum_{i,j=1}^{N} a_{ij}(x)\, u_{x_i x_j} + \sum_{i=1}^{N} b_i(x)\, u_{x_i} + c(x)\, u \qquad [35]$$

Fredholm Alternative

We now present some existence results which are based on the Fredholm alternative rather than on the variational method. Let us consider two Hilbert spaces $V$ and $H$, where $V$ is a dense subspace of $H$ and $V \hookrightarrow\hookrightarrow H$. Denoting by $V'$ the dual space of $V$, and identifying $H$ with its dual space, we have the following imbeddings: $V \hookrightarrow H \hookrightarrow V'$. Let $A$ be a sesquilinear form on $V \times V$, $V$-coercive with respect to $H$, that is, there exist $\lambda_0 \in \mathbb{R}$ and $\alpha > 0$ such that

$$\operatorname{Re}(A(v, v)) + \lambda_0 \|v\|_H^2 \ge \alpha \|v\|_V^2 \quad \text{for all } v \in V$$

Denoting by $\mathcal{A}$ the operator associated with the bilinear form $A$ (see Remark 6(iii)), the equation $\mathcal{A}u = f$ is equivalent to $u - \lambda_0 Tu = g$, with $T = (\mathcal{A} + \lambda_0\, \mathrm{Id})^{-1}$ and $g = Tf$. Note that $T$ is an isomorphism from $H$ onto $D(\mathcal{A}) = \{u \in H;\ \mathcal{A}u \in H\}$. The operator $T : H \to H$ is compact and, thanks to the Fredholm alternative, there are two situations:

1. either $\operatorname{Ker} \mathcal{A} = \{0\}$ and $\mathcal{A}$ is an isomorphism from $D(\mathcal{A})$ onto $H$;
2. or $\operatorname{Ker} \mathcal{A} \ne \{0\}$; then $\operatorname{Ker} \mathcal{A}$ is of finite dimension, and the problem $\mathcal{A}u = f$ with $f \in H$ admits a solution if and only if $f \in \operatorname{Im} \mathcal{A} = [\operatorname{Ker}(\mathcal{A}^*)]^\perp$.

We now give another example in a non-Hilbertian frame. Let us consider the problem (Grisvard 1980): $Au = f$ in $\Omega$ and $Bu = g$ on $\Gamma$, where $\Omega$ is of class $C^{1,1}$, $A$, which is defined by [34], is uniformly strongly elliptic with $a_{ij} = a_{ji} \in C^{0,1}(\overline{\Omega})$, $b_i, c \in L^\infty(\Omega)$, and $Bu = \gamma_0(u)$ or $Bu = \gamma_1(u)$. One can show that the operator $u \mapsto (Au, Bu)$ is a Fredholm operator of index zero from $W^{2,p}(\Omega)$ into $L^p(\Omega) \times W^{2-d-1/p,\,p}(\Gamma)$ (with $d = 0$ if $Bu = \gamma_0(u)$ and $d = 1$ if $Bu = \gamma_1(u)$).

Regularity

Assume that $\Omega$ is a bounded open set. Suppose that $u \in H^1_0(\Omega)$ is a weak solution of the equation

$$Au = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \Gamma \qquad [36]$$

where $A$ has the divergence form [34]. We now address the question whether $u$ is in fact smooth: this is the regularity problem for weak solutions.

Theorem 18 ($H^2$-regularity). Let $\Omega$ be open, of class $C^{1,1}$, $a_{ij} \in C^1(\overline{\Omega})$, $b_i, c \in L^\infty(\Omega)$, $f \in L^2(\Omega)$. Suppose, furthermore, that $u \in H^1(\Omega)$ is a weak solution of [36]. Then $u \in H^2(\Omega)$ and we have the estimate

$$\|u\|_{H^2(\Omega)} \le C\left(\|f\|_{L^2(\Omega)} + \|u\|_{L^2(\Omega)}\right)$$

where the constant $C$ depends only on $\Omega$ and on the coefficients of $A$.

Theorem 19 (Higher regularity). Let $m$ be a nonnegative integer, $\Omega$ be open, of class $C^{m+1,1}$ and assume that $a_{ij} \in C^{m+1}(\overline{\Omega})$, $b_i, c \in C^{m+1}(\overline{\Omega})$, $f \in H^m(\Omega)$. Suppose, furthermore, that $u \in H^1(\Omega)$ is a weak solution of [36]. Then $u \in H^{m+2}(\Omega)$ and

$$\|u\|_{H^{m+2}(\Omega)} \le C\left(\|f\|_{H^m(\Omega)} + \|u\|_{L^2(\Omega)}\right)$$

where the constant $C$ depends only on $\Omega$ and on the coefficients of $A$. In particular, if $m > N/2$, then $u \in C^2(\overline{\Omega})$. Moreover, if $\Omega$ is of class $C^\infty$ and $f \in C^\infty(\overline{\Omega})$, $a_{ij} \in C^\infty(\overline{\Omega})$, $b_i, c \in C^\infty(\overline{\Omega})$, then $u \in C^\infty(\overline{\Omega})$.

Remark 20 (i) If $u \in H^1_0(\Omega)$ is the unique solution of [36], one can omit the $L^2$-norm of $u$ in the right-hand side of the above estimate. (ii) Moreover, let us suppose the coefficients $a_{ij}$, $b_i$ and $c$ are all $C^\infty$ and $f \in C^\infty(\Omega)$; then, if $u \in H^1(\Omega)$ satisfies $Au = f$, $u \in C^\infty(\Omega)$; this is due to the "hypoellipticity" property satisfied by the operator $A$.

We have a similar result in the $L^p$ frame (Grisvard 1980):

Theorem 21 ($W^{2,p}$-regularity). Let $\Omega$ be open, of class $C^{1,1}$, $a_{ij} \in C^1(\overline{\Omega})$, $b_i, c \in L^\infty(\Omega)$. Suppose, furthermore, that $b_i = 0$, $1 \le i \le N$ and $c \ge 0$ a.e. Then for every $f \in L^p(\Omega)$ there exists a unique solution $u \in W^{2,p}(\Omega)$ of [36].
Unbounded Open

The Whole Space

Note in passing that we shall work with the weighted Sobolev spaces $W^{m,p}_\alpha(\Omega)$ defined in the subsection "Unbounded opens and weighted spaces."

Theorem 22 The following claims hold true:

(i) Let $f \in W_0^{-1,p}(\mathbb{R}^N)$ satisfy the compatibility condition

$$\langle f, 1 \rangle_{W_0^{-1,p}(\mathbb{R}^N) \times W_0^{1,p'}(\mathbb{R}^N)} = 0 \quad \text{if } p' \ge N$$

Then the problem [5] has a solution $u \in W_0^{1,p}(\mathbb{R}^N)$, which is unique up to an element in $\mathcal{P}_{[1-N/p]}$ and satisfies the estimate

$$\|u\|_{W_0^{1,p}(\mathbb{R}^N)/\mathcal{P}_{[1-N/p]}} \le C \|f\|_{W_0^{-1,p}(\mathbb{R}^N)}$$

Moreover, if $1 < p < N$, then $u = E * f$.

(ii) If $f \in L^p(\mathbb{R}^N)$, then the problem [5] has a solution $u \in W_0^{2,p}(\mathbb{R}^N)$, which is unique up to an element in $\mathcal{P}_{[2-N/p]}$ and if $1 < p < N/2$, then $u = E * f$.

The Calderón–Zygmund inequality

$$\left\|\frac{\partial^2 \varphi}{\partial x_i\, \partial x_j}\right\|_{L^p(\mathbb{R}^N)} \le C(N, p)\, \|\Delta \varphi\|_{L^p(\mathbb{R}^N)}, \qquad \varphi \in \mathcal{D}(\mathbb{R}^N)$$

and Theorem 4 are crucial for establishing Theorem 22.

Further, point (i) means that the Riesz potential of second order satisfies

$$I_2 : W_0^{-1,p}(\mathbb{R}^N) \perp \mathcal{P}_{[1-N/p']} \to W_0^{1,p}(\mathbb{R}^N)/\mathcal{P}_{[1-N/p]}$$

(where the initial space is the orthogonal complement of $\mathcal{P}_{[1-N/p']}$ in $W_0^{-1,p}(\mathbb{R}^N)$) and it is an isomorphism. Note that here

$$W_0^{1,p}(\mathbb{R}^N) = \{v \in L^{p^*}(\mathbb{R}^N);\ \nabla v \in L^p(\mathbb{R}^N)\}$$

for $1 < p < N$ and $1/p^* = 1/p - 1/N$. And for $1 < r < N/2$, we also have the continuity property

$$I_2 : L^r(\mathbb{R}^N) \to L^q(\mathbb{R}^N), \quad \text{for } \frac{1}{q} = \frac{1}{r} - \frac{2}{N}$$

Remark 23 The problem

$$u - \Delta u = f \ \text{in } \mathbb{R}^N \qquad [37]$$

is of a completely different nature than the problem [5]. The function spaces appropriate for the problem [37] are the classical Sobolev spaces. With the help of the Calderón–Zygmund theory, one can prove that if $f \in L^p(\mathbb{R}^N)$, then the unique solution of [37] belongs to $W^{2,p}(\mathbb{R}^N)$ and can be represented as the Bessel potential of second order (see Stein (1970)): $u = G * f$, where $G$ is the appropriate Bessel kernel, that is, $G$ for which $\widehat{G}(\xi) \sim (1 + |\xi|^2)^{-1}$. Recall that in particular $G(x) \sim |x|^{-1} e^{-|x|}$ for $N = 3$. In the Hilbert case, $f \in L^2(\mathbb{R}^N)$, we get

$$(1 + |\xi|^2)\, \widehat{u} \in L^2(\mathbb{R}^N)$$

which, by Plancherel's theorem, implies that $u \in H^2(\mathbb{R}^N)$. For $f \in W^{-1,p}(\mathbb{R}^N)$, the problem [37] has a unique solution $u \in W^{1,p}(\mathbb{R}^N)$ satisfying the estimate

$$\|u\|_{W^{1,p}(\mathbb{R}^N)} \le C(p, N)\, \|f\|_{W^{-1,p}(\mathbb{R}^N)}$$
Exterior Domain

We consider the problem in an exterior domain with the Dirichlet boundary condition

$$-\Delta u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \Gamma = \partial\Omega \qquad [38]$$

where $f \in W_0^{-1,p}(\Omega)$ and $g \in W^{1-1/p,\,p}(\partial\Omega)$. Invoking the results for $\mathbb{R}^N$ and bounded domains, one can prove the existence of a solution $u \in W_0^{1,p}(\Omega)$ which is unique up to an element of the kernel $\mathcal{A}_0^p(\Omega) = \{z \in W_0^{1,p}(\Omega);\ \Delta z = 0\}$ provided that $f$ satisfies the compatibility condition

$$\langle f, \varphi \rangle = \left\langle g, \frac{\partial \varphi}{\partial n} \right\rangle \quad \text{for all } \varphi \in \mathcal{A}_0^{p'}(\Omega)$$

The kernel can be characterized in the following way: it is reduced to $\{0\}$ if $p = 2$ or $p < N$ and if not, then

$$\mathcal{A}_0^p(\Omega) = \{C(\lambda - 1);\ C \in \mathbb{R}\} \quad \text{if } p \ge N \ge 3$$

where $\lambda$ is the (unique) solution in $W_0^{1,2}(\Omega) \cap W_0^{1,p}(\Omega)$ of the problem $\Delta \lambda = 0$ in $\Omega$ and $\lambda = 1$ on $\partial\Omega$, and

$$\mathcal{A}_0^p(\Omega) = \{C(\lambda - u_0);\ C \in \mathbb{R}\} \quad \text{if } p > N = 2$$

where $u_0(x) = (2\pi|\Gamma|)^{-1} \int_\Gamma \log|y - x|\, d\sigma_y$ and $\lambda$ is the only solution in $W_0^{1,2}(\Omega) \cap W_0^{1,p}(\Omega)$ of the problem $\Delta \lambda = 0$ in $\Omega$ and $\lambda = u_0$ on $\Gamma$.

Remark 24 Similar results exist for the Neumann problem in an exterior domain (see Amrouche et al. (1997)). The framework of the spaces $W^{m,p}_\alpha(\mathbb{R}^N_+)$ for the Dirichlet problem in $\mathbb{R}^N_+$ was also considered in the literature. For a more general theory see Kozlov and Maz'ya (1999).

Elliptic Systems

The Stokes System

The Stokes problem is a classical example in fluid mechanics. This system models the slow motion of a fluid with velocity field $u$ and pressure $\pi$, satisfying

$$(S) \qquad -\nu \Delta u + \nabla \pi = f \ \text{in } \Omega, \qquad \operatorname{div} u = h \ \text{in } \Omega, \qquad u = g \ \text{on } \Gamma = \partial\Omega$$

where $\nu > 0$ denotes the viscosity, $f$ is an exterior force, $g$ is the velocity of the fluid on the domain boundary, and $h$ measures the compressibility of the fluid (if $h = 0$, it is an incompressible fluid). The functions $h$ and $g$ must satisfy the compatibility condition

$$\int_\Omega h(x)\, dx = \int_\Gamma g \cdot n\, d\sigma \qquad [39]$$

Theorem 25 Let $\Omega$ be a Lipschitz bounded domain in $\mathbb{R}^N$, $N \ge 2$. Let $f \in H^{-1}(\Omega)^N$, $h \in L^2(\Omega)$, and $g \in H^{1/2}(\Gamma)^N$ satisfy [39]. Then the problem (S) has a unique solution $(u, \pi) \in H^1(\Omega)^N \times L^2(\Omega)/\mathbb{R}$ satisfying the a priori estimate

$$\|u\|_{H^1(\Omega)} + \|\pi\|_{L^2(\Omega)/\mathbb{R}} \le C\left(\|f\|_{H^{-1}(\Omega)} + \|h\|_{L^2(\Omega)} + \|g\|_{H^{1/2}(\Gamma)}\right)$$

In order to prove Theorem 25, one can start with a homogeneous problem. The procedure of finding $u$ is a simple application of the Lax–Milgram theorem. Application of de Rham's theorem gives the pressure $\pi$. We introduce the space

$$\mathcal{V} = \{v \in \mathcal{D}(\Omega)^N;\ \operatorname{div} v = 0\}$$

and let $F \in H^{-1}(\Omega)^N$ satisfy

$$\langle F, v \rangle_{H^{-1} \times H_0^1} = 0 \quad \text{for all } v \in \mathcal{V}$$

Then there exists $\pi \in L^2(\Omega)$, unique up to an additive constant, such that $F = \nabla \pi$.

The problem (S), which we transform to the homogeneous case ($h = 0$, $g = 0$), can be formulated on an abstract level. Let $X$ and $M$ be two real Hilbert spaces and consider the following variational problem: Given $L \in X'$ and $\mathcal{X} \in M'$, find $(u, \pi) \in X \times M$ such that

$$A(u, v) + B[v, \pi] = L(v), \quad v \in X; \qquad B[u, q] = \mathcal{X}(q), \quad q \in M \qquad [40]$$

where the bilinear forms $A$, $B$ and the linear form $L$ are defined by

$$A(u, v) = \nu \int_\Omega \nabla u \cdot \nabla v, \qquad B[v, q] = -\int_\Omega q\, \nabla \cdot v, \qquad L(v) = \int_\Omega f \cdot v$$

Theorem 26 If the bilinear form $A$ is coercive in the space

$$V = \{v \in X;\ B[v, q] = 0 \ \text{for all } q \in M\}$$

that is, if there exists $\alpha > 0$ such that

$$A(v, v) \ge \alpha \|v\|_X^2, \quad v \in V$$

then the problem [40] has a unique solution $(u, \pi)$ if and only if the bilinear form $B$ satisfies the "inf–sup" condition: there exists $\beta > 0$ such that

$$\inf_{q \in M} \sup_{v \in X} \frac{B[v, q]}{\|v\|_X \|q\|_M} \ge \beta$$

As for the Dirichlet problem, the regularity result is the following:

Theorem 27 Let $\Omega$ be a bounded domain in $\mathbb{R}^N$, of the class $C^{m+1,1}$ if $m \in \mathbb{N}$ and $C^{1,1}$ if $m = -1$. Let $f \in W^{m,p}(\Omega)^N$, $h \in W^{m+1,p}(\Omega)$ and $g \in W^{m+2-1/p,\,p}(\Gamma)^N$ satisfy condition [39]. Then the problem (S) has a unique solution $(u, \pi) \in W^{m+2,p}(\Omega)^N \times W^{m+1,p}(\Omega)/\mathbb{R}$.
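The abstract problem [40] and the role of the inf–sup condition in Theorem 26 can be illustrated in finite dimensions. The sketch below is only a toy model under assumptions that are not in the article: $X = \mathbb{R}^n$ with the Euclidean norm, $M = \mathbb{R}$, $A(u, v) = u \cdot v$, and $B[v, q] = q\,(b \cdot v)$ for a fixed vector $b$. In that case the inf–sup constant equals $|b|$, and the function name `solve_saddle` is hypothetical.

```python
import math


def solve_saddle(b, load, constraint):
    """Solve the toy saddle-point system
         u + lam * b = load        (first equation of [40] with A = identity)
         b . u       = constraint  (second equation of [40])
    for u in R^n and the multiplier lam in R."""
    bb = sum(x * x for x in b)
    if bb == 0.0:
        # B vanishes identically, so the inf-sup constant is zero and the
        # constraint equation cannot be solved for a nonzero right-hand side.
        raise ValueError("inf-sup condition fails: b = 0")
    lam = (sum(x * y for x, y in zip(b, load)) - constraint) / bb
    u = [y - lam * x for x, y in zip(b, load)]
    return u, lam


b = [1.0, 1.0]
u, lam = solve_saddle(b, load=[3.0, 1.0], constraint=2.0)

# With Euclidean norms, inf_q sup_v q (b . v) / (|v| |q|) = |b|.
beta = math.sqrt(sum(x * x for x in b))

print(u, lam, beta)
```

Here the multiplier `lam` plays the role of the pressure: it is determined precisely because `beta` is positive, and the solver refuses the degenerate case $b = 0$ where the inf–sup condition fails.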
Remark 28 It is possible to solve (S) under weaker assumption, for instance, if f 2 W 1=p (0 ), h = 0 and g 2 W 1=p,p ()N . We can prove that then (u, ) 2 Lp ()N W 1,p (). The Linearized Elasticity
The equations governing the displacement $u = (u_1, u_2, u_3)$ of a three-dimensional structure subjected to an external force field $f$ are written as ($\Omega$ is a bounded open subset of $\mathbb{R}^3$ and $\Gamma = \partial\Omega$)

$$-\mu \Delta u - (\lambda + \mu) \nabla(\nabla \cdot u) = f \ \text{in } \Omega$$

$$u = 0 \ \text{on } \Gamma_0, \qquad \sum_{j=1}^{3} \sigma_{ij}(u)\, \nu_j = g_i \ \text{on } \Gamma_1 = \Gamma - \Gamma_0$$

where $\lambda > 0$ and $\mu > 0$ are two material characteristic constants, called the Lamé coefficients, and ($v = (v_1, v_2, v_3)$)

$$\sigma_{ij}(v) = \sigma_{ji}(v) = \lambda\, \delta_{ij} \sum_{k=1}^{3} \varepsilon_{kk}(v) + 2\mu\, \varepsilon_{ij}(v) \qquad [41]$$

with

$$\varepsilon_{ij}(v) = \varepsilon_{ji}(v) = \tfrac{1}{2}(\partial_j v_i + \partial_i v_j)$$

where $\delta_{ij}$ denotes the Kronecker symbol, that is, $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ for $i \ne j$. These equations describe the equilibrium of an elastic homogeneous isotropic body that cannot move along $\Gamma_0$; along $\Gamma_1$, surface forces of density $g = (g_1, g_2, g_3)$ are given. The case $\Gamma_1 = \emptyset$ physically corresponds to clamped structures. The matrix with entries $\varepsilon_{ij}(u)$ is the linearized strain tensor while $\sigma_{ij}(u)$ represents the linearized stress tensor; the relationship [41] between these tensors is known as Hooke's law. We refer for example to Ciarlet and Lions (1991) and Nečas and Hlaváček (1981) (and references therein) for most of the results stated in this paragraph.

The variational formulation of this problem is to find $u \in V$ such that

$$A(u, v) = L(v) \quad \text{for all } v \in V \qquad [42]$$

where the bilinear form $A$ and the linear form $L$ are given by

$$A(u, v) = \int_\Omega \Big[\lambda (\nabla \cdot u)(\nabla \cdot v) + 2\mu \sum_{i,j=1}^{3} \varepsilon_{ij}(u)\, \varepsilon_{ij}(v)\Big](x)\, dx \qquad [43a]$$

$$L(v) = \int_\Omega f(x) \cdot v(x)\, dx + \int_{\Gamma_1} g(\sigma) \cdot v(\sigma)\, d\sigma \qquad [43b]$$

The functional space $V$ is defined as

$$V = \{v = (v_1, v_2, v_3) \in [H^1(\Omega)]^3;\ \gamma_0 v_i = 0 \text{ on } \Gamma_0,\ 1 \le i \le 3\}$$

To prove the ellipticity of $A$, one needs the following Korn inequality: There exists a positive constant $C(\Omega)$ such that, for all $v = (v_1, v_2, v_3) \in [H^1(\Omega)]^3$, we have

$$\|v\|_{1,\Omega} \le C(\Omega) \Big[\sum_{i,j=1}^{3} \|\varepsilon_{ij}(v)\|_{L^2(\Omega)}^2 + \sum_{i=1}^{3} \|v_i\|_{L^2(\Omega)}^2\Big]^{1/2} \qquad [44]$$

As a consequence, we get:

Theorem 29 Let $\Omega$ be a bounded open set in $\mathbb{R}^3$ with a Lipschitz boundary, and let $\Gamma_0$ be a measurable subset of $\Gamma$, whose measure (with respect to the surface measure $d\sigma(x)$) is positive. Then the mapping

$$v \mapsto \Big[\sum_{i,j=1}^{3} \|\varepsilon_{ij}(v)\|_{L^2(\Omega)}^2\Big]^{1/2}$$

is a norm on $V$, equivalent to the usual norm $\|\cdot\|_{1,\Omega}$.

The following result holds true:

Theorem 30 Under the above assumptions, there exists a unique $u \in V$ solving the variational problem [42]–[43]. This solution is also the unique one which minimizes the energy functional

$$E(v) = \frac{1}{2} \int_\Omega \Big[\lambda (\nabla \cdot v)^2 + 2\mu \sum_{i,j=1}^{3} (\varepsilon_{ij}(v))^2\Big](x)\, dx - \Big(\int_\Omega f(x) \cdot v(x)\, dx + \int_{\Gamma_1} g(\sigma) \cdot v(\sigma)\, d\sigma\Big)$$

over the space $V$.
Acknowledgment

M Krbec and Š Nečasová were supported by the Institutional Research Plan No. AV0Z10190503 and by the Grant Agency of the Academy of Sciences No. IAA10019505.

See also: Evolution Equations: Linear and Nonlinear; Γ-Convergence and Homogenization; Image Processing: Mathematics; Inequalities in Sobolev Spaces; Partial Differential Equations: Some Examples; Schrödinger Operators; Separation of Variables for Differential Equations; Viscous Incompressible Fluids: Mathematical Theory.
Further Reading

Adams RA and Fournier JJF (2003) Sobolev Spaces, 2nd edn. Amsterdam: Academic Press.
Agmon S (1965) Lectures on Elliptic Boundary Value Problems. Princeton: D. van Nostrand.
Amrouche C, Girault V, and Giroire J (1997) Dirichlet and Neumann exterior problems for the n-dimensional Laplace operator. An approach in weighted Sobolev spaces. Journal de Mathématiques Pures et Appliquées IX. Série 76: 55–81.
Ciarlet PG and Lions JL (1991) Handbook of Numerical Analysis, Vol. 2, Finite Element Methods. Amsterdam: North-Holland.
Dautray R and Lions JL (1988) Analyse mathématique et calcul numérique pour les sciences et les techniques. Paris: Masson.
Friedman A (1969) Partial Differential Equations. New York: Holt, Rinehart and Winston.
Gilbarg D and Trudinger N (1977) Elliptic Partial Differential Equations of Second Order. Berlin: Springer.
Grisvard P (1980) Boundary Value Problems in Non-Smooth Domains, Lecture Notes 19. University of Maryland, Department of Mathematics.
Hörmander L (1964) Linear Partial Differential Operators. Berlin: Springer.
Kozlov V and Maz'ya V (1999) Differential Equations with Operator Coefficients with Applications to Boundary Value Problems for Partial Differential Equations. Berlin: Springer.
Ladyzhenskaya O and Uraltseva N (1968) Linear and Quasilinear Elliptic Equations. New York: Academic Press.
Lions J-L and Magenes E (1968) Problèmes aux limites non homogènes et applications, vol. 1. Paris: Dunod.
Nečas J (1967) Les méthodes directes en équations elliptiques. Prague: Academia.
Nečas J and Hlaváček I (1981) Mathematical Theory of Elastic and Elastoplastic Bodies: An Introduction. Amsterdam: Elsevier.
Renardy M and Rogers RC (1992) An Introduction to Partial Differential Equations. New York: Springer.
Stein EM (1970) Singular Integrals and Differentiability Properties of Functions. Princeton, NJ: Princeton University Press.
Triebel H (1978) Interpolation Theory, Function Spaces, Differential Operators, VEB Deutsch. Verl. Wissenschaften, Berlin. Sec. rev. enl. ed. (1995): Barth, Leipzig.
Triebel H (2001) The Structure of Functions. Basel: Birkhäuser.
Weinberger HF (1965) A First Course in Partial Differential Equations. New York: Wiley.
Ziemer WP (1989) Weakly Differentiable Functions. New York: Springer.
Entanglement

R F Werner, Technische Universität Braunschweig, Braunschweig, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction Entanglement is a type of correlation between subsystems, which cannot be explained by the action of a classical random generator. It is a key notion of quantum information theory and corresponds closely to the possibility of channels which transmit quantum information, and cannot be simulated by classical channels. In this article, we consider the development of the concept, and its qualitative aspects. The quantitative aspects are treated in a separate article (see Entanglement Measures).
Historical Development

The first realization that quantum mechanics comes with new, and perhaps rather strange, correlations came in the famous 1935 paper by Einstein, Podolsky, and Rosen (EPR) (Einstein et al. 1935), in which they set up a paradox showing that the statistics of certain quantum states could not be realized by assigning wave functions to subsystems. It was in response to this paper that Schrödinger (1935), in the same year, coined the term "entanglement," as well as its German equivalent "Verschränkung." The subject lay dormant for a long time, since Bohr, in his reply, completely ignored the entanglement theme, and there was a widespread reluctance in the physics community to consider problems of interpretation. The leaf turned slowly with Bohm's reduced model of the EPR paradox using spins rather than continuous variables, and decisively with Bell's 1964 strengthening of the paradox (Bell 1964). He showed that not only wave functions assigned to individual systems failed to describe the correlations predicted by quantum mechanics, but so did any set of classical parameters assigned to the subsystems. This eliminated all reference to a possibly dubious quantum ontology and all reference to the quantum formalism from the argument. Bell derived a set of inequalities from the assumption that each subsystem could be described in terms of classical variables, and that these (possibly hidden) variables would not be changed by the mere choice of a measurement for the distant correlated system. The only relation to quantum mechanics was then the simple quantum calculation showing that, in certain situations, such as the state described by EPR, quantum mechanics predicted a violation of Bell's inequalities. This immediately suggested an experiment, and although it was difficult at first to find an efficient source of suitably quantum-correlated pairs of particles, the experiments that have been made since then have supported the quantum-mechanical result beyond reasonable doubt. This came too late for Einstein, whose research program in quantum mechanics had been precisely to build a "local hidden-variable theory" of the type seen in contradiction with Bell's inequality. But at least the EPR paper had finally received the response it deserved. In Schrödinger's work, entanglement was a purely qualitative term for the strange way the subsystems
seemed to be intertwined as soon as one insisted on discussing their individual properties. After Bell's work, the favored mathematical definition of entanglement would probably have been the existence of measurements on the subsystems, such that Bell's inequality (or some generalization derived on the same assumptions) is violated. However, around 1983 another notion of (the lack of) entanglement was independently proposed by Primas (1983) and Werner (1983). According to this definition, a quantum state $\rho$ is called unentangled if it can be written as

$$\rho = \sum_\lambda p_\lambda\, \rho_\lambda^1 \otimes \rho_\lambda^2 \qquad [1]$$

where the $\rho_\lambda^i$ are arbitrary states of the subsystems ($i = 1, 2$), which depend on a "hidden variable" $\lambda$, drawn by a classical random generator with probabilities $p_\lambda$. Such states are now called separable, which is a bit awkward, since the notion is typically applied to systems which are widely separated. However, the term is so firmly established that it is hopeless to try to improve on it. In any case, it was shown by Werner (1989) that there are nonseparable states, which nevertheless satisfy Bell's inequalities and all its generalizations. The next step was the observation by Popescu (1994) that entanglement could be distilled: this is a process by which some number of moderately entangled pair states is converted to a smaller number of highly entangled states, using only local quantum operations, and classical communication between the parties. For some time it seemed that this might close the gap, that is, that the failure of separability might be equivalent to "distillability" (i.e., the existence of a distillation procedure producing arbitrarily highly entangled states from many copies of the given one). However, this turned out to be false, as shown by the Horodecki family in 1998 (Horodecki et al. 1998), by explicitly exhibiting bound entangled, that is, nonseparable, but also not distillable states. In 2003 Oppenheim and the Horodeckis introduced a further distinction, namely whether it is possible to extract a secret key from copies of a given quantum state by local quantum operations and public classical communication (Horodecki et al. 2005). This task had hitherto been viewed as an application of entanglement distillation, but it turned out that a secret key can be distilled from some bound entangled (but never from separable) states.
For the entanglement theory of multipartite states, that is, states on systems composed of three or more parts, between which no quantum interaction takes place, one key observation is that new entanglement properties must be expected with any increase of the
number of parties. As shown by Bennett et al. (1999), there are states of three parties which cannot be written in the three-party analog of [1], but are nevertheless separable for all three splits of the system into one vs. two subsystems. The crucial advance of entanglement theory, however, lies not so much in the distinctions outlined above, but in the quantitative turn of the theory. With the discovery of the teleportation and dense coding processes (Bennett and Wiesner 1992, Bennett et al. 1993), entanglement changed its role from a property of counterintuitive contortedness to a resource, which is used up in teleportation and similar processes. Distillation is then seen as a method to upgrade a given source to a new source of highly entangled states suitable for this purpose, and it is not just the possibility of doing this, but the rate of this conversion, which becomes the focus of the investigation. All the tasks in which entanglement appears suggest quantitative measures of entanglement. In addition, there are many entanglement measures, which appear natural from a mathematical point of view, or are introduced simply because they can be estimated relatively easily and in turn give bounds on other entanglement measures of interest. The current situation is that there is no shortage of entanglement measures in the literature, but it is not yet clear which ones will be of interest in the long run. Some of these measures are described in Entanglement Measures. The current state of entanglement theory is marked firstly by some long-standing open problems in the basic bipartite theory on the one hand (additivity of the entanglement of formation, the existence of NPT bound entangled states, and more recently the existence of entangled states with vanishing key rate). Secondly, there is significant effort to try to compute some of the entanglement measures, at least for simple subclasses of states. 
This is difficult because many definitions involve an optimization over operations on an asymptotically large system. Thirdly, there is a new trend in multipartite entanglement theory, namely looking specifically at entanglement in lattice structures such as spin systems or harmonic-oscillator lattices. Here one can expect very fruitful interaction with statistical mechanics and solid-state physics in the near future.
Qualitative Entanglement Theory

Setup

Throughout this section, we will consider density operators $\rho$ on a Hilbert space $\mathcal{H}$ split in some fixed way into a tensor product of a Hilbert space $\mathcal{H}_A$ for Alice's system and a Hilbert space $\mathcal{H}_B$ for Bob's system, that is, $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$. For simplicity, we will mostly consider finite-dimensional spaces, and if a dimension parameter $d < \infty$ appears, it is understood that $d = \dim \mathcal{H}_A = \dim \mathcal{H}_B$. By $\mathcal{B}(\mathcal{H})$ we will denote the set of bounded operators on a Hilbert space, and by $\mathcal{B}_*(\mathcal{H})$ the set of trace-class operators. We distinguish these even in the finite-dimensional case, because of their different norms. By $\mathcal{S}$ we will denote the state space of the combined system, that is, the set of positive elements of $\mathcal{B}_*(\mathcal{H})$ with trace 1. For such a density operator $\rho = \rho_{AB}$ we denote by $\rho^A$ and $\rho^B$ the restrictions to the subsystems, defined by the partial trace over the other system, or by $\operatorname{tr}(\rho^A F) = \operatorname{tr}(\rho_{AB}(F \otimes \mathbb{1}))$. We denote by $\Theta$ the operation of matrix transposition, and by $\mathrm{id} \otimes \Theta$ the partial transposition, applied only to the second tensor factor. Since transposition is not completely positive (see Channels in Quantum Information Theory), partial transposition may take positive operators to non-positive operators. The relative entropy (see Entropy and Quantitative Transversality) of two density operators $\rho$, $\sigma$ will be used with the convention $S(\rho \| \sigma) = \operatorname{tr} \rho(\log \rho - \log \sigma)$.

Witnesses and the Criterion of Positivity of Partial Transpose
A state is called separable iff it is of the form [1], and entangled otherwise. The set of separable states $\mathcal{C}$ is a convex subset of the set $\mathcal{S}$ of all states. Its extreme points are obvious from the representation [1], namely the pure product states $\rho = |\phi_A \otimes \phi_B\rangle\langle\phi_A \otimes \phi_B|$. Since $\mathcal{C}$, like $\mathcal{S}$, is a convex set in $(d^4 - 1)$ dimensions, Carathéodory's theorem asserts that the sum can be taken to be a decomposition into $d^4$ such terms. For a given $\rho$, deciding whether it is separable or entangled, hence, involves a nonlinear search problem in roughly $4d^5$ real parameters, namely the vector components of the $\phi_A$, $\phi_B$ appearing in the sum.

Dually, the convex set $\mathcal{C}$ can be described by a set of linear inequalities. Here is a simple way of generating such inequalities: let $T : \mathcal{B}_*(\mathcal{H}_B) \to \mathcal{B}(\mathcal{H}_A)$ be a positive linear map, that is, a map taking positive matrices to positive matrices. Then for $\rho_A, \rho_B \ge 0$ the expression $\operatorname{tr}(\rho_A T(\rho_B))$ is positive. It is also bilinear, so we can find a Hermitian operator $T^\natural \in \mathcal{B}(\mathcal{H}_A \otimes \mathcal{H}_B)$ such that

$$\operatorname{tr}(\rho_A T(\rho_B)) = \operatorname{tr}((\rho_A \otimes \rho_B)\, T^\natural)$$

Since the left-hand side is positive, we see by taking convex combinations that $\operatorname{tr}(\rho\, T^\natural) \ge 0$ for all separable states $\rho$. Hence, if we find a state with a negative expectation of $T^\natural$, we can be sure it is entangled.
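As a concrete instance of this construction, one may take $T$ to be the identity map with $d = 2$. The standard identity $\operatorname{tr}(\rho_A \rho_B) = \operatorname{tr}((\rho_A \otimes \rho_B) F)$ then shows that $T^\natural$ is the flip operator $F|i, j\rangle = |j, i\rangle$. The sketch below is an illustration only: the choice of $T$, the singlet test state, and the helper names `swap_operator`, `expectation`, and `product` are assumptions, not taken from the article. It evaluates the expectation of $F$ on a product state and on the singlet vector $(|01\rangle - |10\rangle)/\sqrt{2}$.

```python
import math


def swap_operator(d):
    """Matrix of the flip F|i,j> = |j,i> on C^d (x) C^d, basis index i*d + j."""
    F = [[0.0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            F[j * d + i][i * d + j] = 1.0
    return F


def expectation(W, psi):
    """<psi|W|psi> for a real vector psi, i.e. tr(|psi><psi| W)."""
    n = len(psi)
    return sum(psi[r] * W[r][c] * psi[c] for r in range(n) for c in range(n))


def product(a, b):
    """Components of the product vector a (x) b."""
    return [x * y for x in a for y in b]


s = 1.0 / math.sqrt(2.0)
F = swap_operator(2)

singlet = [0.0, s, -s, 0.0]                   # (|01> - |10>)/sqrt(2)
product_state = product([1.0, 0.0], [s, s])   # |0> (x) (|0> + |1>)/sqrt(2)

w_singlet = expectation(F, singlet)        # negative: the state is entangled
w_product = expectation(F, product_state)  # nonnegative, as for any product state

print(w_singlet, w_product)
```

For a product vector the expectation equals $|\langle \phi_A, \phi_B \rangle|^2 \ge 0$, while the singlet is antisymmetric under the flip and so gives a negative value, which certifies its entanglement in the sense described above.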
Therefore, such operators $T^\natural$ are called entanglement witnesses. This is often a useful criterion, especially when one has some additional information about the state, allowing for an intelligent choice of witness. It is known from the theory of ordered vector spaces and their tensor products that the set of witnesses constructed above is complete. Hence, in principle, checking all such witnesses provides a necessary and sufficient criterion for entanglement. However, in practice this remains a difficult task, because the extreme points of the set of positive maps are only known for some low dimensions. By restricting $T$ to completely positive maps, we get a useful necessary criterion. It can be seen that it is equivalent to
$$(\mathrm{id} \otimes \Theta)(\rho) \ge 0$$
that is, to the positivity of the partial transpose (PPT). States with this property are called ‘‘PPT states’’ in current jargon.

Pure States, Purification
For pure states, that is, for the extreme points of $\mathcal{S}$, separability is trivial to decide: since for pure states the sum [1] can only be a single term, a pure state is separable iff it factorizes. A useful observation is that, for pure states $\rho = |\Psi\rangle\langle\Psi|$, all information about entanglement is contained in the spectrum of the reduced states. Consider a vector $\Psi \in \mathcal{H}_A \otimes \mathcal{H}_B$ of the form
$$\Psi = \sum_\alpha \sqrt{r_\alpha}\; e_\alpha^A \otimes e_\alpha^B \qquad [2]$$
where $e_\alpha^A \in \mathcal{H}_A$ and $e_\alpha^B \in \mathcal{H}_B$ are orthonormal systems, $r_\alpha > 0$, and $\sum_\alpha r_\alpha = 1$. Then it is easy to check that $\rho^A = \sum_\alpha r_\alpha |e_\alpha^A\rangle\langle e_\alpha^A|$ is the spectral resolution of the restriction. Conversely, by diagonalizing the restriction of a general unit vector $\Psi$, we find a biorthogonal decomposition of the form [2], also known as the Schmidt decomposition. The Schmidt spectrum $\{r_1, \ldots, r_d\}$ hence classifies vectors up to local basis changes in $\mathcal{H}_A$ and $\mathcal{H}_B$. Since any $\rho^A$ can appear in this construction, we see that any mixed state can be considered as the restriction of a pure state, which is essentially unique, namely up to the choice of basis in the purifying system B, and up to perhaps adding or deleting some irrelevant dimensions in $\mathcal{H}_B$. The resulting vector is known as the purification of $\rho^A$. The extreme cases of [2] are pure product states on the one hand, and vectors for which $\rho^A = 1/d$ is the totally chaotic state on the other. The latter are known as maximally entangled and embody, in the most extreme way, the observation that in quantum mechanics, as opposed to classical probability, the restriction of a pure state may be mixed. Let us fix a maximally entangled vector $\Omega$, and the matching Schmidt bases, so that
$$\Omega = \frac{1}{\sqrt{d}} \sum_k |kk\rangle \qquad [3]$$
where we have used the simplified ket notation, in which only the basis label is written. Then, an arbitrary vector can be written as $\Psi = (X \otimes 1)\Omega = (1 \otimes X^T)\Omega$, where $X^T$ denotes the matrix transpose of $X$. Clearly, this vector is again maximally entangled iff $X$ is unitary. Hence, the set of maximally entangled vectors is a single orbit under unilateral unitary transformations, and we even have the choice to which side we apply the unitaries.

Teleportation
Suppose we have an orthonormal basis of maximally entangled vectors $\Phi_\alpha \in \mathcal{H}_A \otimes \mathcal{H}_B$. By the remarks above, this is equivalent to choosing unitaries $U_\alpha$, $\alpha = 1, \ldots, d^2$, such that $\Phi_\alpha = (U_\alpha \otimes 1)\Omega$, and $\mathrm{tr}(U_\alpha^* U_\beta) = d\,\delta_{\alpha\beta}$. For example, a finite Weyl system constitutes such a system of unitaries, which shows that we can find realizations in any dimension $d$. Suppose that Alice and Bob each own part of a system prepared in the state $\Omega$; then they can transmit perfectly the state of a $d$-dimensional system, using only classical communication. Classical communication by itself would never suffice to transmit quantum information, and the entangled resource by itself does not allow the transmission of any signal. But the combination of these resources does the trick: Alice measures the observable associated with the basis $\Phi_\alpha$ on the combined system formed by the unknown input and her part of the entangled pair. The result $\alpha$ is then transmitted to Bob, who performs a $U_\alpha$-rotation on his part of the entangled pair, producing the output state of the teleportation. One can show by direct calculation that this is exactly equal to the input state. Note that the resource $\Omega$ is destroyed in this process, so that for every transmission we need a fresh entangled pair. Using less than maximally entangled states instead of $\Omega$ leads to less-than-perfect transmission, an observation that can be extended to quantitative relations between entanglement and channel capacity.
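The ‘‘direct calculation’’ mentioned above can be checked numerically for $d = 2$. The following is a minimal sketch, assuming the identity and the three Pauli matrices as the unitaries $U_\alpha$ (they satisfy $\mathrm{tr}(U_\alpha^* U_\beta) = 2\,\delta_{\alpha\beta}$); the function name `teleport` and the ordering of the three systems (input, Alice, Bob) are illustrative choices, not taken from the article.

```python
import numpy as np

def teleport(psi):
    """For each measurement result, return (probability, Bob's corrected output)."""
    identity = np.eye(2, dtype=complex)
    sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
    sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
    unitaries = [identity, sigma_x, sigma_y, sigma_z]  # assumed choice of U_alpha

    # Omega as a coefficient matrix: Omega[a, b] = delta_ab / sqrt(2)
    omega = np.eye(2, dtype=complex) / np.sqrt(2)
    # Joint vector of (input c, Alice a, Bob b): psi tensor Omega
    total = np.einsum('c,ab->cab', psi, omega)

    results = []
    for U in unitaries:
        # Phi_alpha = (U tensor 1) Omega, living on (input, Alice)
        phi = U @ omega
        # Project (input, Alice) onto Phi_alpha; what remains is Bob's vector
        bob = np.einsum('ca,cab->b', phi.conj(), total)
        prob = float(np.vdot(bob, bob).real)
        # Bob applies the U_alpha-rotation announced by Alice
        out = U @ bob / np.sqrt(prob)
        results.append((prob, out))
    return results

psi = np.array([0.6, 0.8j], dtype=complex)  # arbitrary normalized input
for prob, out in teleport(psi):
    fidelity = abs(np.vdot(psi, out)) ** 2
    print(f"probability {prob:.2f}, fidelity {fidelity:.6f}")
```

Each of the four results occurs with the same probability, independently of the input, which is why Alice's measurement result alone carries no information about the transmitted state.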
Special Systems

Qubits

For qubit pairs, there is a special basis of maximally entangled vectors, which has some amazing properties. It consists of the vectors $\Phi_0 = \Omega$, and $\Phi_k = i(\sigma_k \otimes 1)\Omega$, where $\sigma_k$, $k = 1, 2, 3$, denotes the Pauli matrices. Then a vector is maximally entangled iff its components are real in this basis, up to a common phase. A unitary matrix of determinant 1 factorizes into $U_1 \otimes U_2$ iff its matrix elements in this basis are real, up to a common phase. For qubit pairs, and also for dimensions $2 \otimes 3$, the partial transposition criterion for entanglement is necessary and sufficient, as shown by Woronowicz and the Horodecki family.

Orthogonally Invariant States

A state $\rho$ on $\mathbb{C}^d \otimes \mathbb{C}^d$ is called orthogonally invariant if, for any orthogonal matrix $U$ (with respect to some fixed product basis), $[\rho, U \otimes U] = 0$. This leaves a three-dimensional space of operators, spanned by the identity, the permutation $F = \sum_{i,j} |ij\rangle\langle ji|$, and its partial transpose $\hat F = \sum_{i,j} |ii\rangle\langle jj|$, which is $d$ times the projection onto the maximally entangled vector $\Omega$. Figure 1 shows the plane of Hermitian operators with the described symmetry and $\mathrm{tr}\,\rho = 1$. Convenient coordinates are $\mathrm{tr}\,\rho F$ and $\mathrm{tr}\,\rho\hat F$. Note that these are defined for any density operator, and are also invariant under the ‘‘twirl’’ operation $\rho \mapsto \int dU\,(U \otimes U)\rho(U \otimes U)^*$, using the Haar measure $dU$, which projects onto the orthogonally invariant states. Hence, the diagram provides a section as well as a projection of the state space. The intersection of the positive operators with those having positive partial transpose is the set of PPT states, which in this case coincides with the separable states. The thin lines correspond to states of higher symmetry, namely on the one hand the ‘‘isotropic states’’ commuting with $U \otimes \bar U$, with $\bar U$ the complex conjugate of $U$, and on the other hand the ‘‘Werner states’’ commuting with all unitaries $U \otimes U$. Their intersection point is the normalized trace.

Figure 1 The plane of orthogonally invariant unit trace Hermitian operators of a $3 \otimes 3$-system, with coordinates $\mathrm{tr}(\rho F)$ and $\mathrm{tr}(\rho\hat F)$. The upright triangle gives the positive operators, and the dashed one those with positive partial transpose. The shaded area gives the PPT states.
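For the Werner states of a qubit pair, both the flip operator $F$ used as a witness and the PPT criterion can be evaluated in a few lines. The sketch below is illustrative: the parametrization of the Werner family by $w = \mathrm{tr}\,\rho F$ and all function names are assumptions introduced here, and the index convention for the partial transpose (second tensor factor) follows the definition of $\mathrm{id} \otimes \Theta$ given earlier.

```python
import numpy as np

def flip(d):
    """Permutation operator F = sum_{i,j} |ij><ji| on C^d tensor C^d."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[i * d + j, j * d + i] = 1.0
    return F

def partial_transpose(rho, d):
    """Transpose the second tensor factor of an operator on C^d tensor C^d."""
    return rho.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

def werner_state(w, d=2):
    """State a*1 + b*F with tr(rho) = 1 and tr(rho F) = w, for -1 <= w <= 1."""
    norm = d * (d * d - 1)
    return ((d - w) * np.eye(d * d) + (d * w - 1) * flip(d)) / norm

def min_eig_partial_transpose(rho, d=2):
    """Smallest eigenvalue of (id tensor Theta)(rho); negative means not PPT."""
    return float(np.linalg.eigvalsh(partial_transpose(rho, d)).min())

for w in (-1.0, -0.5, 0.0, 0.5, 1.0):
    rho = werner_state(w)
    witness = float(np.trace(rho @ flip(2)))
    print(f"tr(rho F) = {witness:+.2f}, "
          f"min eigenvalue of partial transpose = {min_eig_partial_transpose(rho):+.3f}")
```

In this family the witness expectation $\mathrm{tr}\,\rho F$ and the smallest eigenvalue of the partial transpose change sign at the same point, consistent with the statement that PPT and separability coincide here.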
Gaussians

In general, the entanglement in systems with infinite-dimensional Hilbert spaces is more difficult to analyze. However, if the system is characterized by variables satisfying canonical commutation relations, like positions and momenta, or the components of the free quantum electromagnetic field, there is a special class of states, which is again characterized by low-dimensional matrices. This allows the discussion of entanglement questions in a way largely parallel to the finite-dimensional theory. Let $R_1, \ldots, R_{2f}$ denote the canonical operators, where $f$ is the number of degrees of freedom. The commutation relations can be summarized as $i[R_\alpha, R_\beta] = \sigma_{\alpha\beta} 1$, where $\sigma$ is the symplectic matrix. The operators $R_\alpha$ have a common set of analytic vectors, and generate the unitary Weyl operators $W(a) = \exp(i\,a \cdot \sigma \cdot R)$, which describe the phase space displacements. Gaussian states are those making $a \mapsto \mathrm{tr}\,\rho W(a)$ a Gaussian function or, equivalently, those with Gaussian Wigner function. Up to a global displacement, they are completely characterized by the covariance matrix
$$\gamma_{\alpha\beta} = \mathrm{tr}\,\rho\,(R_\alpha R_\beta + R_\beta R_\alpha) \qquad [4]$$
The only constraint for a real symmetric matrix $\gamma$ to be a covariance matrix of a quantum state is that $\gamma + i\sigma$ is a positive semidefinite matrix, which is a version of the uncertainty relations. Now for entanglement theory, we take some of the degrees of freedom as Alice’s and some as Bob’s. Separability can be characterized in terms of $\gamma$, namely by the condition that $\gamma \ge \gamma'$, where $\gamma'$ is the covariance matrix of a Gaussian product state. Similarly, partial transposition can be implemented as an operation on covariance matrices, which allows a simple verification of the PPT condition. It turns out that as long as one partner has only a single degree of freedom, the PPT condition is necessary and sufficient for separability, but this fails for larger systems. The pure Gaussian states allow a normal form with respect to local symplectic transformations analogous to the Schmidt decomposition. For the minimal case of one degree of freedom on either side, one obtains a one-parameter family of ‘‘two-mode squeezed states.’’ Its limit for infinite squeezing parameter is the state used by EPR (Einstein et al. 1935), which, however, makes rigorous mathematical sense only as a singular state, that is, a linear functional on $\mathcal{B}(\mathcal{H})$, which can no longer be represented as the trace with a density operator.

Multipartite Stars

A key feature of entanglement in a multipartite system is usually referred to as ‘‘monogamy’’: when Alice shares a highly entangled state with Bob, her system cannot also be highly entangled with Bill. More formally, suppose that a multipartite state for systems $A, B_1, \ldots, B_n$ is given, such that the restriction to each pair $AB_k$ is the same bipartite state $\rho$. Then as $n$ becomes larger, the existence of such a star-shaped extension constrains $\rho$ to become less and less entangled. In fact, as $n \to \infty$, this condition is equivalent to the separability of $\rho$.
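For the Gaussian states discussed above, the covariance-matrix condition and its partially transposed version can be tested numerically. The sketch below rests on several assumptions that are not fixed by the text: the canonical operators are ordered as $(Q_A, P_A, Q_B, P_B)$, the vacuum has covariance matrix equal to the identity (consistent with [4]), the two-mode squeezed state is written in a standard normal form with squeezing parameter `r`, and partial transposition is implemented as the sign flip of Bob's momentum. All function names are illustrative.

```python
import numpy as np

def symplectic_form(modes):
    """Block-diagonal symplectic matrix for the ordering (Q_1, P_1, Q_2, P_2, ...)."""
    block = np.array([[0.0, -1.0], [1.0, 0.0]])
    return np.kron(np.eye(modes), block)

def min_eigenvalue(gamma):
    """Smallest eigenvalue of the Hermitian matrix gamma + i*sigma."""
    sigma = symplectic_form(gamma.shape[0] // 2)
    return float(np.linalg.eigvalsh(gamma + 1j * sigma).min())

def is_covariance_matrix(gamma, tol=1e-9):
    """Uncertainty relation: gamma + i*sigma must be positive semidefinite."""
    return min_eigenvalue(gamma) >= -tol

def two_mode_squeezed(r):
    """Assumed normal form of the two-mode squeezed state (vacuum for r = 0)."""
    c, s = np.cosh(2 * r), np.sinh(2 * r)
    Z = np.diag([1.0, -1.0])
    return np.block([[c * np.eye(2), s * Z], [s * Z, c * np.eye(2)]])

def partial_transpose(gamma):
    """Partial transposition on Bob's mode: reverse the sign of P_B."""
    flip = np.diag([1.0, 1.0, 1.0, -1.0])
    return flip @ gamma @ flip

for r in (0.0, 0.5, 1.0):
    gamma = two_mode_squeezed(r)
    print(f"r = {r}: state valid = {is_covariance_matrix(gamma)}, "
          f"PPT = {is_covariance_matrix(partial_transpose(gamma))}")
```

Since each partner holds a single degree of freedom in this example, failure of the PPT test is equivalent to entanglement, in line with the statement above.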
Open Problems

Recall from the introduction the following chain of inclusions:
$$\text{separable states} \subset \text{states with vanishing key rate} \subset \text{PPT states} \subset \text{undistillable states} \subset \text{all states}$$
The second and fourth inclusions are strict, but for the first and third one might have equality, for all we know. Especially for the third inclusion, this is a long-standing problem. Finally, we would like to point out that qualitative and conceptual aspects of entanglement are surveyed by Bub (2001), Popescu and Rohrlich (1998), and Horodecki et al. (2001). For quantitative aspects see Entanglement Measures.

See also: Capacities Enhanced by Entanglement; Capacity for Quantum Information; Channels in Quantum Information Theory; Entanglement Measures; Entropy and Quantitative Transversality.
Further Reading

Bell JS (1964) Physics 1: 195.
Bennett CH and Wiesner S (1992) Communication via one- and two-particle operators on Einstein–Podolsky–Rosen states. Physical Review Letters 69: 2881–2884.
Bennett CH, Brassard G, Crépeau C et al. (1993) Teleporting an unknown quantum state via dual classical and Einstein–Podolsky–Rosen channels. Physical Review Letters 70: 1895–1899.
Bennett CH et al. (1999) Unextendible product bases and bound entanglement. Physical Review Letters 82: 5385.
Bub J (2001) Quantum Entanglement and Information, The Stanford Encyclopedia of Philosophy, URL = http://plato.stanford.edu/entries/qt-entangle/.
Einstein A, Podolsky B, and Rosen N (1935) Can quantum-mechanical description of physical reality be considered complete? Physical Review 47: 777–780.
Horodecki K, Horodecki M, Horodecki P, and Oppenheim J (2005) Secure key from bound entanglement. Physical Review Letters 94: 160502.
Horodecki M, Horodecki P, and Horodecki R (1998) Mixed-state entanglement and distillation: is there a ‘‘bound’’ entanglement in nature? Physical Review Letters 80: 5239–5242.
Horodecki M, Horodecki P, and Horodecki R (2001) Mixed-state entanglement and quantum communication. In: Alber G et al. (eds.) Quantum Information, Springer Tracts in Modern Physics, vol. 173.
Popescu S (1994) Teleportation versus Bell’s inequalities. What is nonlocality? Physical Review Letters 72: 797.
Popescu S and Rohrlich D (1998) The joy of entanglement. In: Lo H-K, Popescu S, and Spiller T (eds.) Introduction to Quantum Computation and Information. Singapore: World Scientific.
Primas H (1983) Verschränkte Systeme und Quantenmechanik. In: Kanitscheider B (ed.) Moderne Naturphilosophie. Würzburg: Königshausen+Neumann.
Schrödinger E (1935) Die gegenwärtige Situation in der Quantenmechanik. Naturwiss 23: 807–812, 823–828, 844–849.
See the server for open problems at URL = http://www.imaph.tu-bs.de/qi/problems.
Werner R (1983) Bell’s inequalities and the reduction of statistical theories. In: Balzer W, Pearce DA, and Schmidt H-J (eds.) Reduction in Science, vol. 175, Synthese Library, (Reidel 1984).
Werner RF (1989) Quantum states with Einstein–Rosen–Podolsky correlations admitting a hidden-variable model. Physical Review A 40: 4277–4281.
Entanglement Measures
R F Werner, Institut für Mathematische Physik, Braunschweig, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction

Entanglement, or quantum correlation, is one of the central concepts in quantum information theory. Its theory can be roughly separated into three parts. The first is qualitative, that is, it addresses the question ‘‘Is this state entangled or not?’’ The second, comparative part asks ‘‘Is this state more entangled than that state?,’’ and finally the quantitative theory asks ‘‘How entangled is this state?,’’ and gives its answers in the form of entanglement measures assigning a number to every state. Quantitative questions come up naturally whenever entanglement is used as a resource for tasks of quantum information processing. For example, entangled states are in a way the fuel for the processes of teleportation and dense coding: in each transmission step a maximally entangled pair system is consumed, and cannot be used for a further transmission. The process also works with less than maximally entangled states, but then it also becomes less efficient. Since entangled states created in the laboratory typically have imperfections, it becomes important to understand the rates at which imperfectly entangled states may be distilled to maximally entangled ones, and this rate is a direct measure of the usefulness of the given state for many purposes. The quantitative, task-related turn is a new development in the study of the foundations of quantum mechanics. It has been imported from classical information theory, where this way of thinking has been standard for a long time. The combination gives quantum information theory its particular flavor.
In this article we consider the comparative and quantitative aspects of entanglement. The historical aspects and qualitative theory are treated in a separate article (see Entanglement), to which we refer for basic notions and notations. The example of teleportation suggests close links between quantitative entanglement theory and the theory of capacity (Bennett et al. 1996), which is the transfer rate of quantum information through a given channel. These connections are described in Quantum Channels: Classical Capacity. We follow the notations of the basic article on entanglement (see Entanglement). In particular, $\Theta$ denotes the transpose operation, and $(\mathrm{id} \otimes \Theta)$ the partial transpose. A state is called ‘‘PPT’’ if its partial transpose is positive. The two physicists operating the laboratories in which the two parts of a bipartite system are kept are called Alice and Bob, as usual. The restriction of a state $\rho$ to Alice’s subsystem is denoted by $\rho^A$.
Comparative Entanglement and Protocols

Protocols
In this section we introduce relations of the kind ‘‘state $\rho_1$ is more entangled than $\rho_2$.’’ We take this to mean that $\rho_2$ can be obtained by applying to $\rho_1$ some operations which ‘‘cannot create entanglement.’’ The definition of a class of operations of which this can be claimed then defines the comparison. It turns out that there are different choices for the class of such operations, depending on the resources available for the transformation steps. The class of operations is usually referred to as a protocol. Certainly local operations performed separately by Alice and Bob cannot increase entanglement. Alice and Bob might have to make some choices, and even if they make these according to a prearranged scheme, by using a shared table of random numbers, entanglement will not be generated. In this restrictive protocol, which we abbreviate by LO, for ‘‘local operations,’’ no communication is allowed. Clearly, just discarding the initial state and preparing a new one based on the random instruction allows Alice and Bob to make any separable state, so these states come out as the ‘‘least entangled’’ ones for this and any richer protocol. Next we might allow classical communication from Alice to Bob. That is, Bob’s decision to perform some operation in his laboratory is allowed to depend on measuring results obtained by Alice in an earlier stage. Of course, Alice is not allowed to send quantum systems, since in this case she might just send a particle entangled to one of her own, and any state could be generated. This protocol is referred to as ‘‘local operations and one-way classical communication’’ (LOWC). Obviously, we might also allow Bob to talk back, arriving at ‘‘local operations and classical communication’’ (LOCC). This is the protocol underlying most of the work in entanglement theory. The drawback of the LOCC protocol is that its operations are extremely difficult to characterize: an LOCC operation can take many rounds, and there is no way to simplify a general operation to some kind of standard form. This is the main reason why other protocols have been considered. For example, it is obvious that an LOCC operation can be written as a sum of tensor products of local operations, in a form reminiscent of the definition of separability. However, such ‘‘separable superoperators’’ may fall outside LOCC. Another property easily checked for all LOCC operations is that PPT states go into PPT states. The protocol ‘‘PPT-preserving operations’’ (PPTP) can also be characterized as the set of channels $T$ for which $(\mathrm{id} \otimes \Theta)T(\mathrm{id} \otimes \Theta)$ is positive (although not necessarily completely positive). This condition is relatively easy to handle mathematically, so that the best way to show that some $\rho_1$ cannot be converted to $\rho_2$ by LOCC is often to show that this transition is impossible under PPTP. The drawback of the PPTP protocol is that it may create some entanglement after all, namely arbitrary PPT states. So it properly belongs to a modified entanglement theory in which separability is replaced by the PPT condition.

Converting Pure States and Majorization
The entanglement ordering is exactly known for pure states due to a famous theorem by Nielsen (1999): a pure state $\Psi_1$ is more entangled than a pure state $\Psi_2$ under the LOCC protocol iff the restriction $\rho_1^A$ is more mixed than the restriction $\rho_2^A$ in the sense of majorization of spectra (i.e., for every $k$ the sum of the $k$ largest eigenvalues of $\rho_1^A$ is less than or equal to the corresponding sum for $\rho_2^A$). Equivalently, there is a doubly stochastic channel (completely positive linear map preserving both the identity and the trace functional) taking $\rho_2^A$ to $\rho_1^A$. An interesting aspect of this theory is the phenomenon of catalysis: it may happen that although $\Psi_1$ cannot be converted by LOCC to $\Psi_2$, $\Psi_1 \otimes \Phi$ can be converted to $\Psi_2 \otimes \Phi$. The ‘‘catalyst’’ $\Phi$ is a resource borrowed at the beginning of the transformation, and is returned unchanged afterwards. The order relation allowing such catalysts is yet to be fully characterized.

Asymptotic Conversion
In many applications we are not interested in exact conversion of one state to another, but are quite satisfied if the transformation can be done with a small controlled error. In particular, when we ask for the achievable conversion rate between many copies of the states involved, we allow small errors, but require the errors to go to zero. Given any protocol, and states $\rho_1$, $\rho_2$, we say that $\rho_1$ can be converted to $\rho_2$ with rate $r$ if, for all sufficiently large $n$, there is a channel of the protocol which takes $n$ copies of $\rho_1$, that is, the state $\rho_1^{\otimes n}$, to a state $\rho'$ which approximates roughly $m \approx rn$ copies of $\rho_2$, in the sense that $m \ge rn$, and the trace norm $\|\rho' - \rho_2^{\otimes m}\|$ goes to zero. Of course, one is usually interested in the supremum of the achievable conversion rates, which we call simply the maximal conversion rate. In particular, when $\rho_2$ is the maximally entangled pure state of a qubit pair (usually called the ‘‘singlet’’), the maximal rate is called the distillable entanglement $E_D(\rho_1)$. In the other direction, when $\rho_1$ is the singlet, we call the inverse of the maximal conversion rate the entanglement cost $E_C(\rho_2)$. These are two of the key entanglement measures to be discussed below. In general, $E_D(\rho) < E_C(\rho)$, so the asymptotic conversion between different states is usually not reversible. However, it is reversible for pure states, and one finds
$$E_D(\rho) = E_C(\rho) = S(\rho^A) \qquad [1]$$
where $S(\rho) = -\mathrm{tr}\,\rho\log_2(\rho)$ denotes the von Neumann entropy (see Entropy and Quantitative Transversality) based on the binary logarithm. Since one can do the conversion between different pure states via singlets, it is clear that the maximal conversion rate from a pure state $\rho_1$ to a pure state $\rho_2$ equals $S(\rho_1^A)/S(\rho_2^A)$. Hence, in contrast to the ordering given by Nielsen’s theorem, all pure states are interconvertible, and the ordering is described by a single number. For this simplification, the allowance of small errors is crucial. Without asymptotically small but nonzero errors, it would also be impossible to obtain singlets from any generic mixed state.
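The contrast between exact conversion (Nielsen's majorization criterion) and asymptotic conversion (the entropy ratio) can be made concrete on Schmidt spectra. The following sketch takes the spectra of the restrictions as plain lists; the function names and the two example spectra are illustrative choices made here.

```python
from math import log2

def entropy(spectrum):
    """Von Neumann entropy (binary logarithm) of a spectrum of eigenvalues."""
    return -sum(p * log2(p) for p in spectrum if p > 0)

def is_more_mixed(r1, r2, tol=1e-12):
    """True if r1 is majorized by r2: every sum of the k largest entries of r1
    is at most the corresponding sum for r2."""
    a = sorted(r1, reverse=True)
    b = sorted(r2, reverse=True)
    n = max(len(a), len(b))
    a += [0.0] * (n - len(a))
    b += [0.0] * (n - len(b))
    sum_a = sum_b = 0.0
    for x, y in zip(a, b):
        sum_a += x
        sum_b += y
        if sum_a > sum_b + tol:
            return False
    return True

def locc_convertible(r1, r2):
    """Nielsen's criterion for exact LOCC conversion of pure states,
    given the spectra r1, r2 of their restrictions."""
    return is_more_mixed(r1, r2)

def asymptotic_rate(r1, r2):
    """Maximal asymptotic conversion rate between pure states, S(r1)/S(r2)."""
    return entropy(r1) / entropy(r2)

r1 = [0.4, 0.4, 0.1, 0.1]
r2 = [0.5, 0.25, 0.25, 0.0]
print("exact 1 -> 2:", locc_convertible(r1, r2))
print("exact 2 -> 1:", locc_convertible(r2, r1))
print("asymptotic rate 1 -> 2:", asymptotic_rate(r1, r2))
print("asymptotic rate 2 -> 1:", asymptotic_rate(r2, r1))
```

The two example spectra are incomparable under majorization, so neither exact conversion is possible, while both asymptotic rates are finite and nonzero and are inverse to each other.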
Entanglement Measures

Properties of Interest

We now consider more systematically functions $E : \mathcal{S} \to \mathbb{R}$ defined on the state spaces of arbitrary bipartite quantum systems. When can we regard this as a measure of entanglement? The minimal requirements are that $E(\rho) \ge 0$ for all $\rho$, and $E(\rho) = 0$ for separable states. Since the choice of local bases should be irrelevant, we will require $E((U_A \otimes U_B)\rho(U_A \otimes U_B)^*) = E(\rho)$ for unitaries $U_A$, $U_B$. We also normalize all entanglement measures so that $E(\rho) = 1$ when $\rho$ is the maximally entangled state of a pair of qubits. Beyond that, consider the following:

1. V (Convexity: $E(\sum_\alpha p_\alpha \rho_\alpha) \le \sum_\alpha p_\alpha E(\rho_\alpha)$) Starting from any $E$, possibly defined only on a subset containing the pure states, we can enforce this property by taking the convex hull (or ‘‘roof’’) $\mathrm{co}\,E$, defined as the largest convex function which is $\le E$ wherever it is defined.
2. M (Monotonicity) Suppose that some LOCC protocol applied to $\rho$ returns some classical parameter $\alpha$ with probability $p_\alpha$, and in that case a bipartite state $\rho_\alpha$. Then $\sum_\alpha p_\alpha E(\rho_\alpha) \le E(\rho)$.
3. $A_-$ (Subadditivity: $E(\rho_1 \otimes \rho_2) \le E(\rho_1) + E(\rho_2)$) In this and the following, the tensor products of bipartite states are to be reordered from $A_1 B_1 A_2 B_2$ to $(A_1 A_2)(B_1 B_2)$, so the separation into Alice’s and Bob’s subsystems is respected.
4. $A_+$ (Superadditivity: $E(\rho_1 \otimes \rho_2) \ge E(\rho_1) + E(\rho_2)$)
5. $A_{++}$ (Strong superadditivity: $E(\rho_{12}) \ge E(\rho_1) + E(\rho_2)$) Here $\rho_i$ denotes the restriction of a general state $\rho_{12}$ to the $i$th subsystem.
6. $A_\infty$ (Weak additivity: $E(\rho^{\otimes n}) = nE(\rho)$) This can be enforced by regularization, going from $E$ to
$$E^\infty(\rho) = \lim_{n\to\infty} \frac{1}{n} E(\rho^{\otimes n})$$
Note that this is implied by additivity, which is the conjunction of $A_+$ and $A_-$.
7. C (Continuity) Here it is crucial to postulate the right kind of dimensional dependence. A good choice is to demand that $|E(\rho_1) - E(\rho_2)| \le \log d\; f(\|\rho_1 - \rho_2\|)$, where $f$ is some function with $\lim_{t \to 0} f(t) = 0$.
8. L (Lockability) A property related to, but not equal to, discontinuity: a measure is called lockable if the loss (i.e., the tracing out) of a single qubit by Alice or Bob can make $E(\rho)$ drop by an arbitrarily large amount.

The Collection of Entanglement Measures
The following are the main entanglement measures discussed in the literature. Note that all measures defined by conversion rates in principle depend on the protocol used. Unless otherwise stated, we will only consider LOCC. For every function we list in brackets the properties which are known.

1. $E_F$ (Entanglement of formation [V, M, $A_-$, C, L]) This is defined as the convex hull of the entanglement of pure states given by eqn [1]. Closed formulas are known for qubit pairs (Wootters 1998), for orthogonally invariant states (Vollbrecht and Werner 2001), and for permutation-symmetric two-mode Gaussians. One of the big open questions is whether $E_F$ is additive. This is equivalent to $E_F$ satisfying $A_{++}$, and also to the additivity of Holevo’s $\chi$-capacity of quantum channels (see Quantum Channels: Classical Capacity).
2. $E_C$ (Entanglement cost [V, M, $A_-$, $A_\infty$, C, L]) This was already defined in the section ‘‘Asymptotic conversion.’’ It has been shown to be equal to the regularization of $E_F$, that is, $E_C = E_F^\infty$. If $E_F$ turned out to be additive, we would thus have $E_C = E_F$.
3. $E_D$ (Distillable entanglement [M, $A_{++}$, $A_\infty$]) Again, see the section ‘‘Asymptotic conversion.’’ This is one of the important measures from the practical point of view, but notoriously difficult to compute explicitly. Convexity of $E_D$ is an open problem related to the existence of bound entangled, but not PPT states.
4. $E_\to$ (One-way distillable entanglement [M, $A_{++}$, $A_\infty$]) Same as $E_D$, but restricting to the LOWC protocol. Obviously, $E_\to(\rho) \le E_D(\rho)$. There are examples of proper inequality ‘‘1, and $E_N(\rho) > 0$. $E_N$ is an easily computed upper bound to $E_D$, but gives the wrong value for nonmaximally entangled pure states.
6. $E_R$ (Relative entropy of entanglement [V, M, $A_-$]) This measure (Vedral et al. 1997) is motivated geometrically: it is simply the relative entropy distance of $\rho$ to the separable subset: $E_R(\rho) = \inf_\sigma S(\rho\|\sigma)$, where $\sigma$ ranges over all separable states. $E_R$ is an upper bound to $E_D$. However, it can be improved by taking the distance to the PPT states rather than the separable states, and by combining with $E_N$, in the following way:
7. $E_B$ (The Rains bound [V, M, $A_-$, C]) Following Rains (2001), we set
$$E_B(\rho) = \inf_\sigma \bigl(S(\rho\|\sigma) + E_N(\sigma)\bigr)$$
where the infimum is over all states $\sigma$. This is still an upper bound to $E_D$, although clearly smaller than both $E_R$ (take only separable $\sigma$) and $E_N$ (take $\sigma = \rho$). No example of $E_D(\rho) < E_B(\rho)$ is known, but any bound entangled non-PPT state would be such an example.
8. $E_S$ (Squashed entanglement [V, M, $A_-$, $A_{++}$, C, L]) This measure, introduced by Christandl and Winter (2004), amazingly has all the good properties, but is as difficult to compute as any of the other measures. $E_S(\rho_{AB})$ is the infimum over the entropy combination
$$S(\rho_{AC}) + S(\rho_{BC}) - S(\rho_{ABC}) - S(\rho_C)$$
over all extensions $\rho_{ABC}$ of the given state $\rho_{AB}$ to a system enlarged by a part C, where the density operators in the above expression are the restrictions of $\rho_{ABC}$ to the subsystems indicated.
9. $E_K$ (Key rate [V, M, $A_-$]) The bit rate at which secret key can be generated is certainly no smaller than $E_D$, since distillation is one way to do it. It is, in general, strictly larger, since there are undistillable states with positive key rate.
10. $E_C$ (Concurrence [V]) This measure was originally only defined for qubit pairs, as a step in Wootters’s (1998) formula for $E_F$ in this case. It has an extension to arbitrary dimensions (Rungta et al. 2001), namely the convex hull of the function $c(|\Psi\rangle\langle\Psi|) = \sqrt{2(1 - \mathrm{tr}(\rho^2))}$, where $\rho = (|\Psi\rangle\langle\Psi|)^A$ is the reduced density operator. Both upper and lower bounds exist in the literature. The main interest in this measure stems from the fact that it has interesting extensions to the multipartite case.

To conclude, we would like to point out that many of the themes discussed in this article were set by Bennett et al. (1996); their article is worth reading even today. Good review articles covering entanglement measures, with more complete references, are Plenio and Virmani (2005), Bruß (2002), and Donald et al. (2002).

See also: Entanglement; Entropy and Quantitative Transversality; Quantum Channels: Classical Capacity.
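For pure states, several of the measures listed above reduce to simple functions of the Schmidt spectrum and can be compared directly. The sketch below computes the entropy of entanglement from eqn [1] and the concurrence function from item 10. For $E_N$ it assumes the standard logarithmic negativity, $\log_2$ of the trace norm of the partial transpose; that definition is an assumption made here, since it is not spelled out in the text above. The function names and the representation of a vector by its coefficient matrix are also illustrative.

```python
import numpy as np

def schmidt_spectrum(coeffs):
    """Schmidt spectrum r_alpha of the vector sum_{ij} coeffs[i, j] |ij>."""
    singular_values = np.linalg.svd(coeffs, compute_uv=False)
    return singular_values ** 2

def entropy_of_entanglement(coeffs):
    """S(rho^A) with binary logarithm, as in eqn [1]."""
    r = schmidt_spectrum(coeffs)
    r = r[r > 1e-15]
    return float(-(r * np.log2(r)).sum())

def concurrence(coeffs):
    """sqrt(2 (1 - tr(rho^2))) for the reduced density operator rho."""
    r = schmidt_spectrum(coeffs)
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - (r ** 2).sum()))))

def log_negativity(coeffs):
    """Assumed definition of E_N: log2 of the trace norm of the partial transpose."""
    d_a, d_b = coeffs.shape
    psi = coeffs.reshape(-1)
    rho = np.outer(psi, psi.conj())
    rho_pt = (rho.reshape(d_a, d_b, d_a, d_b)
                 .transpose(0, 3, 2, 1)
                 .reshape(d_a * d_b, d_a * d_b))
    return float(np.log2(np.abs(np.linalg.eigvalsh(rho_pt)).sum()))

singlet_like = np.eye(2) / np.sqrt(2)               # maximally entangled
partial = np.diag([np.sqrt(0.9), np.sqrt(0.1)])     # Schmidt spectrum (0.9, 0.1)
for name, state in (("maximally entangled", singlet_like), ("partially entangled", partial)):
    print(f"{name}: S = {entropy_of_entanglement(state):.4f}, "
          f"E_N = {log_negativity(state):.4f}, c = {concurrence(state):.4f}")
```

Under the stated assumption, the three quantities agree on the maximally entangled state, while on the partially entangled state $E_N$ exceeds $S(\rho^A)$, which illustrates the remark that $E_N$ gives the wrong value for nonmaximally entangled pure states.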
Further Reading

Bennett CH, DiVincenzo DP, Smolin JA, and Wootters WK (1996) Mixed-state entanglement and quantum error correction. Physical Review A 54: 3824–3851.
Bruß D (2002) Characterizing entanglement. Journal of Mathematical Physics 43: 4237.
Christandl M and Winter A (2004) Squashed entanglement – an additive entanglement measure. Journal of Mathematical Physics 45: 829–840.
Donald MJ, Horodecki M, and Rudolph O (2002) The uniqueness theorem for entanglement measures. Journal of Mathematical Physics 43: 4252–4272 and quant-ph/0105017.
Horodecki M, Horodecki P, and Horodecki R (2000) Unified approach to quantum capacities: towards quantum noisy coding theorem. Physical Review Letters 85: 433.
Nielsen M (1999) Conditions for a class of entanglement transformations. Physical Review Letters 83: 436–439.
Plenio MB and Virmani S (2005) An introduction to entanglement measures, quant-ph/0504163.
Rains E (2001) IEEE Trans. Inf. Theory 47: 2921–2933.
Rungta P et al. (2001) Universal state inversion and concurrence in arbitrary dimensions. Physical Review A 64: 042315.
Vedral V, Plenio MB, Rippin MA, and Knight PL (1997) Quantifying entanglement. Physical Review Letters 78: 2275.
Vollbrecht KGH and Werner RF (2001) Entanglement measures under symmetry. Physical Review A 64: 062307.
Wootters WK (1998) Entanglement of formation of an arbitrary state of two qubits. Physical Review Letters 80: 2245–2248.
Entropy and Quantitative Transversality
G Comte, Université de Nice Sophia Antipolis, Nice, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

A mathematical law for a physical phenomenon, describing the variation of a value $y \in \mathbb{R}$ in terms of parameters $x_i \in \mathbb{R}$, $i \in \{1, \ldots, n\}$, is usually given:

1. in the simplest cases (and hence in exceptional cases), by an explicit functional equation $y = F(x_1, \ldots, x_n)$, or
2. by an implicit equation $G(y, x_1, \ldots, x_n) = 0$, or
3. more generally, by a partial differential equation,
$$H\left(y, \frac{\partial^{|j_1|} y}{\partial x_{i_1} \cdots \partial x_{j_1}}, \ldots, \frac{\partial^{|j_k|} y}{\partial x_{i_k} \cdots \partial x_{j_k}}, x_1, \ldots, x_n\right) = 0 \quad + \text{ initial values}$$

In the first case, the exact equation $y = F(x_1, \ldots, x_n)$ fully describes the behavior of $y$ as $(x_1, \ldots, x_n)$ vary, but in practice this is more information than is needed: using the Taylor formula, knowledge of the value $y^0$ at some point $(x_1^0, \ldots, x_n^0)$ and of the value of
$$\nabla F_{(x_1^0, \ldots, x_n^0)} = \left(\frac{\partial F}{\partial x_1}, \ldots, \frac{\partial F}{\partial x_n}\right)(x_1^0, \ldots, x_n^0)$$
is enough to predict, with controlled accuracy, by linear approximation, the behavior of $y$ for parameters $(x_1, \ldots, x_n)$ close to $(x_1^0, \ldots, x_n^0)$. In case (2), both the parameters $(x_1, \ldots, x_n)$ and the value $y$ belong to the set $M = \{(y, x_1, \ldots, x_n) \in \mathbb{R}^{n+1};\ G(y, x_1, \ldots, x_n) = 0\}$, and we would like to know whether or not this set may be (at least locally around one of its points $(y^0, x_1^0, \ldots, x_n^0)$) a graph of some function $(x_1, \ldots, x_n) \mapsto y = F(x_1, \ldots, x_n)$, as in case (1). Using the implicit function theorem, we may try to reduce our equation to the explicit equation of (1), and then perform a linear approximation involving $\nabla F_{(x_1^0, \ldots, x_n^0)}$. Assuming that a priori we know a value $y^0$ such that for $(x_1^0, \ldots, x_n^0)$, $(y^0, x_1^0, \ldots, x_n^0) \in M$, this reduction is possible, locally around $(y^0, x_1^0, \ldots, x_n^0)$, under the condition that
$$\frac{\partial G}{\partial y}(y^0, x_1^0, \ldots, x_n^0) \ne 0$$
In this situation
$$\nabla F_{(x_1^0, \ldots, x_n^0)} = -\left(\frac{\partial G}{\partial x_1}, \ldots, \frac{\partial G}{\partial x_n}\right)(y^0, x_1^0, \ldots, x_n^0) \Big/ \frac{\partial G}{\partial y}(y^0, x_1^0, \ldots, x_n^0)$$
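The gradient formula from the implicit function theorem can be checked on a concrete example. The sketch below assumes the illustrative choice $G(y, x_1, x_2) = y^2 + x_1^2 + x_2^2 - 1$ (the unit sphere), for which the upper branch $F(x_1, x_2) = \sqrt{1 - x_1^2 - x_2^2}$ is known explicitly; this example and all function names are introduced here and do not come from the article.

```python
from math import sqrt

def dG_dy(y, x):
    """Partial derivative of G(y, x) = y^2 + |x|^2 - 1 with respect to y."""
    return 2.0 * y

def dG_dx(y, x):
    """Partial derivatives of G with respect to x_1, ..., x_n."""
    return [2.0 * xi for xi in x]

def implicit_gradient(y0, x0):
    """Gradient of the implicit function: -(dG/dx) / (dG/dy) at (y0, x0)."""
    gy = dG_dy(y0, x0)
    if gy == 0.0:
        raise ZeroDivisionError("dG/dy vanishes: M is not locally a graph over x")
    return [-gx / gy for gx in dG_dx(y0, x0)]

def F(x):
    """Explicit upper branch of the sphere, used only for comparison."""
    return sqrt(1.0 - sum(xi * xi for xi in x))

def finite_difference_gradient(x0, h=1e-6):
    """Central-difference approximation of the gradient of F at x0."""
    grad = []
    for i in range(len(x0)):
        plus = list(x0)
        minus = list(x0)
        plus[i] += h
        minus[i] -= h
        grad.append((F(plus) - F(minus)) / (2.0 * h))
    return grad

def norm(v):
    return sqrt(sum(vi * vi for vi in v))

for x0 in ([0.3, 0.4], [0.6, 0.7999]):
    y0 = F(x0)
    print(f"y0 = {y0:.5f}, implicit gradient = {implicit_gradient(y0, x0)}, "
          f"norm = {norm(implicit_gradient(y0, x0)):.3f}")
```

The second sample point lies close to the equator of the sphere, where $\partial G/\partial y$ is small; the gradient norm is correspondingly large there, so a small uncertainty in the parameters produces a large uncertainty in $y$.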
Now, as is normally the case when they come from observation, the variables $x_1, \ldots, x_n$ are known only up to an estimate, and one sees that the larger
$$\left\|\left(\frac{\partial G}{\partial x_1}, \ldots, \frac{\partial G}{\partial x_n}\right)(y^0, x_1^0, \ldots, x_n^0)\right\| \Big/ \left|\frac{\partial G}{\partial y}(y^0, x_1^0, \ldots, x_n^0)\right|$$
is, the worse the estimate on $y$ near $y^0$. Furthermore, assuming that $M$ is locally a graph of a function $(x_1, \ldots, x_n) \mapsto y = F(x_1, \ldots, x_n)$, for a given $(x_1, \ldots, x_n)$, the exact expression of $y = F(x_1, \ldots, x_n)$, and consequently the exact value of $\nabla F_{(x_1, \ldots, x_n)}$, cannot be obtained; we have to approximate it using an algorithm (classically the Newton algorithm), and the closer
$$\frac{\partial G}{\partial y}(y^0, x_1^0, \ldots, x_n^0)$$
is to 0, the more unstable such an algorithm is. Finally, in case (3), skipping technical details, we encounter the same type of difficulties: we have to avoid small values for some gradient functions at a given point, in order to obtain, locally at some point $(x_1^0, \ldots, x_n^0)$, in a stable way, reliable information on $y$ in terms of $(x_1, \ldots, x_n)$. To sum up, the prediction of a physical phenomenon by a mathematical law depends not only on the nonvanishing of some gradient functions but, as we deal with approximations and algorithms, on how far those gradient functions are from zero. This principle, of course, extends directly to applied problems (see the last of our examples in the final section): being close to singular values essentially means that the control (e.g., of the positions of some device by a manipulator) is poor. The geometric counterpart of this analytic phenomenon is called ‘‘transversality’’: the condition for some function $G$ to have a nonzero partial derivative
$$\frac{\partial G}{\partial y}(y^0, x_1^0, \ldots, x_n^0)$$
is equivalent to the condition
$$\nabla G_{(y^0, x_1^0, \ldots, x_n^0)} \oplus Ox_1 \cdots x_n = \mathbb{R}^{n+1}$$
or to the condition
$$T_{(y^0, x_1^0, \ldots, x_n^0)} M \oplus Oy = \mathbb{R}^{n+1}$$
where $T_a M$ is the tangent space of $M$ at $a \in M$. We say that $\nabla G_{(y^0, x_1^0, \ldots, x_n^0)}$ is transverse to the space of parameters $Ox_1 \cdots x_n$ at $(y^0, x_1^0, \ldots, x_n^0)$, or that $M$ is transverse to $Oy$ at $(y^0, x_1^0, \ldots, x_n^0)$. For some quantity $\varepsilon > 0$, the condition that
$$\left\|\left(\frac{\partial G}{\partial x_1}, \ldots, \frac{\partial G}{\partial x_n}\right)(y^0, x_1^0, \ldots, x_n^0)\right\| \Big/ \left|\frac{\partial G}{\partial y}(y^0, x_1^0, \ldots, x_n^0)\right| \ge \frac{1}{\varepsilon}$$
means that the angle $\alpha = \angle(\nabla G_{(y^0, x_1^0, \ldots, x_n^0)}, Ox_1 \cdots x_n)$ or the angle $\beta = \angle(T_{(y^0, x_1^0, \ldots, x_n^0)} M, Oy)$ is smaller than $\varepsilon$ (see Figure 1). Our purpose in the sequel is to indicate how we can quantify the situations described above (the defect of transversality), in order to generically or almost generically avoid them with quantified accuracy.

Figure 1 Transversality of the manifold $M$ and $Oy$. (Labels in the figure: $Oy$, $\nabla G(a)$, $a$, $\beta$, $M$, $T_a M$, $\alpha$, $Ox_1 \cdots x_n$.)

Definitions
We say that (M, N) is transverse at x, and we denote it by M \j x N, if and only if jM is a submersion at x, that is, DjM(x) : Tx M ! Tx N? is onto. For a given = (1 , . . . ,p ), 1 p , we say that (M, N) is -nontransverse at x, and we denote it by M \j x N, if and only if li (DjM(x) ) i ,8 i 2 {1, . . . , p}. With these notations, we have: M \j x N (i:e:, (M, N) nontransverse at x) if and only if x 62 M \ N or M \j x N, for some with p = 0, and the more (M, N) is nontransverse, with close to (1 , . . . , p1 , 0), the less the manifolds M and N seem transverse at x 2 M \ N (see Figure 2). The final step in our formalism to give a convenient quantitative approach of transversality is the following: let X, Y be two (real) Riemannian manifolds, f : X ! Y a (smooth) mapping, N Y a submanifold of Y with codimension p in N, y 2 N, and : O ! R p a submersion, where O is an open neighborhood of x in Y, such that 1 ({0}) = N \ O. Then we say that (f , N) is transverse at x, and we denote it by f \jx N, if and only if f is submersive in x. For a given = (1 , . . . , p ), 1 p , we say that (f , N) is (, )-nontransverse at x, and we denote it by f \j (,) N, if and only if li (D[f ](x) ) x i ,8 i 2 {1, . . . , p}. Clearly, we recognize the definition of transversality and of -nontransversality of two submanifolds M, N of R n by letting f : M ! Rn be the inclusion and = jM (for more details on transversality and stability, see, e.g., Golubitski and Guillemin (1973)). With the definitions and notations above, our general problem may be posed as follows: For a Ck -regular (k 2 N [ {1}) mapping f : R n ! Rp and a given = (1 , . . . , p ), how large is the set (f , Br , ) = f ((f , Br , )), where (f , Br , ) = {x 2 Br R n ; li (Df(x) ) i , 8i 2 {1, . . . , p}} and Br is a ball of radius r in Rn ?
Quantifying Transversality Given two submanifolds M and N of the Euclidean space R n , we can measure the transversality defect of (M, N) at x 2 Rn with a differential criterion, both analytical and geometric. Let us first introduce some notations. For a given linear map L : Rn ! Rp , the image by L of the unit ball of R n is an r-dimensional ellipsoid in R p with semi-axes denoted as l1 (L) lr (L), where r is the rank of L. For r < p, we denote lrþ1 (L) = 0, . . . , lp (L) = 0. Now, let x 2 M \ N; let : Rn ! Tx N ? be the projection onto the orthogonal space of Tx N, p = n dim (N) and jM the restriction of to M.
M
N
TxM B π|M (B ) ε1
x ⊥
TxN
Figure 2 Almost-nontransversality of M and N.
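The semi-axes $\lambda_i(L)$ defined above are the singular values of $L$. As a minimal sketch, assuming $n = p = 2$ so that a closed form is available, the following code computes them and tests the condition $\lambda_i(Df_{(x)}) \le \lambda_i$ that defines the almost-critical points. The map $f(x, y) = (x, y^2)$, the thresholds, and all function names are hypothetical choices made for illustration.

```python
import math

def semi_axes_2x2(a, b, c, d):
    """Semi-axes l1 >= l2 >= 0 of the image of the unit disc under L = [[a, b], [c, d]]."""
    s = a * a + b * b + c * c + d * d      # l1^2 + l2^2 (squared Frobenius norm)
    det = a * d - b * c                    # |det| = l1 * l2
    disc = math.sqrt(max(s * s - 4.0 * det * det, 0.0))
    l1 = math.sqrt((s + disc) / 2.0)
    l2 = math.sqrt(max((s - disc) / 2.0, 0.0))
    return l1, l2

def is_almost_critical(jacobian, lambdas):
    """True if every semi-axis of the Jacobian is at most the matching threshold."""
    return all(l <= lam for l, lam in zip(semi_axes_2x2(*jacobian), lambdas))

def jacobian_of_f(x, y):
    # Illustrative map f(x, y) = (x, y^2); its differential is [[1, 0], [0, 2y]].
    return (1.0, 0.0, 0.0, 2.0 * y)

LAMBDA = (2.0, 0.05)   # lambda_1 taken large, lambda_2 small
print(is_almost_critical(jacobian_of_f(0.0, 0.01), LAMBDA))  # near the fold line y = 0
print(is_almost_critical(jacobian_of_f(0.0, 0.5), LAMBDA))   # away from it
```

Only the smallest semi-axis matters here: the point is almost critical when the differential nearly fails to be onto.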
The "bad" set $\Delta(f, B_r, \Lambda)$ is called the set of $\Lambda$-almost critical values of $f$ (restricted to $B_r$). Our purpose is to show that one can control its size in terms of $k$ and $\Lambda$. However, before explicitly stating quantitative results, let us make precise what we mean by a "big set" or by the "size of a set."
Measure and Dimensions

We have a very natural way to measure a subset $A$ of a metric space. To do this, we consider a real number $\alpha \ge 0$ and, for $\epsilon > 0$, we denote

$$\mathcal{D}^A_\epsilon = \Big\{ (D_i)_{i \in \mathbb{N}};\ A \subset \bigcup_{i \in \mathbb{N}} D_i \ \text{and}\ |D_i| \le \epsilon \Big\}$$

where $|D_i|$ is the diameter of $D_i$,

$$H^\alpha_\epsilon(A) = \inf \Big\{ \sum_{i \in \mathbb{N}} |D_i|^\alpha;\ (D_i)_{i \in \mathbb{N}} \in \mathcal{D}^A_\epsilon \Big\}$$

and

$$H^\alpha(A) = \lim_{\epsilon \to 0} H^\alpha_\epsilon(A) \in \mathbb{R} \cup \{\infty\}$$

$H^\alpha(A)$ is called the $\alpha$-dimensional Hausdorff measure of $A$. It appears that when $H^\alpha(A) \neq \infty$, $H^{\alpha'}(A) = 0$ for $\alpha' > \alpha$, and when $H^\alpha(A) \neq 0$, $H^{\alpha'}(A) = \infty$ for $\alpha' < \alpha$. This gives rise to the following definition of the Hausdorff dimension of $A$:

$$\dim_H(A) = \inf\{\alpha;\ H^\alpha(A) = 0\} = \sup\{\alpha;\ H^\alpha(A) = \infty\}$$

The Hausdorff dimension generalizes the classical notions of dimension: for instance, when $A$ is a subset of $\mathbb{R}^n$, $\dim_H(A) \le n$, a $d$-dimensional manifold has Hausdorff dimension $d$, and $H^n(A)$ coincides, up to a normalizing constant depending only on $n$, with the Lebesgue measure $L^n$ of $A$ (for a very large class of subsets $A$, which we do not describe here; for more details on geometric measure theory, see Falconer (1986) and Federer (1969)).

Another convenient notion of dimension is the (metric) entropy dimension. Let us briefly define it. For a bounded subset $A$ of some metric space and a real number $\epsilon > 0$, we denote by $M(\epsilon, A)$ the minimal number of closed balls of radius $\epsilon$ covering $A$. $H_\epsilon(A) = \log_2(M(\epsilon, A))$ is called the $\epsilon$-entropy of the set $A$. This terminology was introduced in Kolmogorov and Tihomirov (1961) and reflects the fact that $H_\epsilon(A)$ is the amount of information needed to digitally memorize $A$ with accuracy $\epsilon$. The entropy dimension of $A$, $\dim_e(A)$, is the order of $M(\epsilon, A)$ as $\epsilon \to 0$. Precisely,

$$\dim_e(A) = \limsup_{\epsilon \to 0} \frac{\log(M(\epsilon, A))}{\log(1/\epsilon)} = \inf\{\alpha;\ M(\epsilon, A) \le (1/\epsilon)^\alpha \ \text{for sufficiently small}\ \epsilon\}$$

We clearly have

$$\dim_H(A) \le \dim_e(A)$$

For any bounded set $A$ in $\mathbb{R}^n$, we can bound $M(\epsilon, A)$ from above by a polynomial in $1/\epsilon$ (see Ivanov (1975) and Yomdin and Comte (2004)):

$$M(\epsilon, A) \le c(n) \sum_{i=0}^{n} V_i(A) (1/\epsilon)^i$$

where $c(n)$ only depends on $n$ and $V_i(A)$ (the $i$th variation of the set $A$) is the mean value, with respect to $P$ (for a suitable measure), of the number of connected components of $A \cap P$, with $P$ an affine $(n - i)$-dimensional space of $\mathbb{R}^n$. Since, for $A$ contained in a $d$-dimensional manifold, $V_i(A) = 0$ for $i > d$, we deduce from this inequality that in this case $M(\epsilon, A)$ is bounded from above by a polynomial of degree $d$ in $1/\epsilon$. Our goal is to explain that we can be more precise than this general inequality when $A$ is a set of critical or almost-critical values of a $C^k$ mapping.
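The covering number $M(\epsilon, A)$ and the entropy dimension can be made concrete on a standard example. The sketch below is an illustration under stated assumptions, not part of the original text: it takes a finite approximation of the middle-third Cantor set (the endpoints of the level-8 intervals, stored as exact integers scaled by $3^8$), counts the closed intervals of radius $\epsilon_n = 3^{-n}/2$ needed to cover it, and estimates the growth order of $M(\epsilon, A)$ from two scales. The function names are hypothetical.

```python
import math

def cantor_points(level):
    """Endpoints of the level-`level` Cantor intervals, as integers scaled by 3**level."""
    intervals = [(0, 3 ** level)]
    for _ in range(level):
        refined = []
        for a, b in intervals:
            third = (b - a) // 3
            refined.append((a, a + third))      # left third
            refined.append((b - third, b))      # right third
        intervals = refined
    return sorted({p for interval in intervals for p in interval})

def covering_number(points, width):
    """Minimal number of closed intervals of length `width` covering the sorted points.

    On the real line the greedy left-to-right cover is optimal.
    """
    count, i = 0, 0
    while i < len(points):
        right = points[i] + width
        count += 1
        while i < len(points) and points[i] <= right:
            i += 1
    return count

LEVEL = 8
points = cantor_points(LEVEL)
# Balls of radius eps_n = 3**(-n) / 2 are intervals of scaled length 3**(LEVEL - n).
counts = [covering_number(points, 3 ** (LEVEL - n)) for n in range(1, 7)]
# log(1/eps_6) - log(1/eps_1) = 5 * log(3)
slope = (math.log(counts[-1]) - math.log(counts[0])) / (5 * math.log(3))
print(counts, slope, math.log(2) / math.log(3))
```

The estimated order agrees with $\log 2 / \log 3$, the known common value of the Hausdorff and entropy dimensions of the Cantor set; the finite approximation is only meaningful at scales coarser than $3^{-8}$.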
Transversality Is a Generic Situation

The results in this section concern critical values, and not almost-critical values. They show that a "generic" point of the target space is not a critical value, and that the more regular the mapping, the smaller the set of critical values. Such theorems relating the regularity of a mapping and the size of its set of critical values are called Morse–Sard type theorems (see Sard (1942, 1958, 1965)). The simplest theorem in this direction is the following:

Theorem 1 ($C^\infty$ Morse–Sard theorem) (Morse 1939, Sard 1942, Holm 1987). Let $f : \mathbb{R}^n \to \mathbb{R}^p$ be a $C^\infty$-regular mapping. Then $H^p(\Delta(f, B_r)) = 0$, where $\Delta(f, B_r) = f(\Sigma(f, B_r))$ and $\Sigma(f, B_r)$ is the set of points $x \in B_r$ where $\mathrm{rank}(Df_{(x)}) < p$.

The set $\Delta(f, B_r)$ is the image, under $f$, of the points of the ball $B_r$ in the source space at which $f$ is not submersive, that is, the set of critical values of $f$. Consequently, the Morse–Sard theorem ensures that for almost all points $y$ in the target space, $f^{-1}(\{y\})$ is either empty or a smooth submanifold of the source space of dimension $n - p$.
Note that $\Delta(f, B_r) = \Delta(f, B_r, \Lambda)$ for some convenient $\Lambda = (\lambda_1, \dots, \lambda_p)$ with $\lambda_p = 0$, because $x \mapsto \lambda_i(Df_{(x)})$ is bounded on $B_r$, for all $i \in \{1, \dots, p\}$. Now, we can concentrate our attention on points more singular than the critical ones, those at which the rank of $f$ is prescribed. Let us denote the corresponding set of critical values by $\Delta^\nu(f, B_r)$, for $\nu < p$. By definition, $\Delta^\nu(f, B_r) = f(\Sigma^\nu(f, B_r))$, where $\Sigma^\nu(f, B_r) = \{x \in B_r \subset \mathbb{R}^n;\ \mathrm{rank}(Df_{(x)}) \le \nu\}$. With these notations, the result for rank-$\nu$ critical values is the following:

Theorem 2 ($C^k$ Morse–Sard theorem for rank-$\nu$ critical values) (Federer 1969). Let $f : \mathbb{R}^n \to \mathbb{R}^p$ be a $C^k$-regular mapping. Then $H^{\nu + (n - \nu)/k}(\Delta^\nu(f, B_r)) = 0$. In particular,

$$\dim_H(\Delta^\nu(f, B_r)) \le \nu + \frac{n - \nu}{k}$$

One can produce examples showing that the bound of Theorem 2 is the sharpest one (see Comte (1996), Whitney (1935), Grinberg (1985), and Yomdin and Comte (2004)). We note that Theorem 1 is a corollary of Theorem 2 (just replace $k$ by $\infty$ and $\nu$ by $p - 1$ in Theorem 2). This result says nothing about the entropy dimension of $\Delta^\nu(f, B_r)$; in the next section, we will bound the growth of the entropy of almost-critical values.
Almost-Transversality Is Almost Generic

In this section, $f : \mathbb{R}^n \to \mathbb{R}^p$ is a $C^k$ mapping. We denote by $K$ a Lipschitz constant of $D^{k-1} f$ on $B_r$ and by $R_k(f)$ the quantity $(K/(k-1)!)\, r^k$. We have:

Theorem 3 ($C^k$ quantitative Morse–Sard theorem) (Yomdin 1983, Yomdin and Comte 2004). Let $f : \mathbb{R}^n \to \mathbb{R}^p$ be a $C^k$ mapping, $\Lambda = (\lambda_1, \dots, \lambda_p)$, $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$, and let us denote $\lambda_0 = 1$. We have (for $\epsilon \le R_k(f)$):

$$M(\epsilon, \Delta(f, B_r, \Lambda)) \le C \sum_{i=0}^{p} \lambda_0 \cdots \lambda_i \left( \frac{r}{\epsilon} \right)^{i} \left( \frac{R_k(f)}{\epsilon} \right)^{(n-i)/k}$$

where $C$ is a constant depending only on $n$, $p$, and $k$.

As a corollary, one can bound the entropy dimension of $\Delta^\nu(f, B_r)$ by $\nu + (n - \nu)/k$, and hence its Hausdorff dimension, again finding the bound of Theorem 2: we just have to put $\lambda_{\nu+1} = 0$ and $\lambda_1, \dots, \lambda_\nu$ large enough, that is, $\lambda_i \ge \lambda_i(Df_{(x)})$ for all $x \in B_r$, in Theorem 3, to obtain:

Theorem 4 ($C^k$ entropy Morse–Sard theorem) (Yomdin 1983, Yomdin and Comte 2004). Let $f : \mathbb{R}^n \to \mathbb{R}^p$ be a $C^k$ mapping, and let us denote $\lambda_0 = 1$ and $\lambda_i = \sup\{\lambda_i(Df_{(x)});\ x \in B_r\}$, for $i \in \{1, \dots, \nu\}$. We have (for $\epsilon \le R_k(f)$):

$$M(\epsilon, \Delta^\nu(f, B_r)) \le C \sum_{i=0}^{\nu} \lambda_0 \cdots \lambda_i \left( \frac{r}{\epsilon} \right)^{i} \left( \frac{R_k(f)}{\epsilon} \right)^{(n-i)/k}$$

where $C$ is a constant depending only on $n$, $p$, and $k$. In particular,

$$\dim_H(\Delta^\nu(f, B_r)) \le \dim_e(\Delta^\nu(f, B_r)) \le \nu + \frac{n - \nu}{k}$$

Again we have examples showing that this bound is sharp (see Yomdin and Comte 2004). Furthermore, the mapping $f$ in Theorems 2–4 may be of real differentiability class (Hölder smoothness class $C^k$), with the same conclusions in these theorems. That is, $k$ may be a real number written as $k = p + \alpha$ with $\alpha \in [0, 1]$, $p \in \mathbb{N} \setminus \{0\}$, and "$f$ is $C^k$" means that $f$ is $p$ times differentiable and there exists a constant $C > 0$ such that for all $x, y \in B_r$, $\|D^p f_{(x)} - D^p f_{(y)}\| \le C \|x - y\|^\alpha$ (see Yomdin and Comte (2004)).
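To see how the dimension bound $\nu + (n - \nu)/k$ comes out of the sum in Theorem 3, one can evaluate the right-hand side numerically. The following is a minimal sketch under stated assumptions: the unknown constant $C$ is set to 1, and $r = R_k(f) = 1$, $n = 3$, $k = 2$, $\Lambda = (1, 0)$ (so $\nu = 1$) are arbitrary illustrative values. The function names are hypothetical.

```python
import math

def theorem3_rhs(eps, n, k, lambdas, r, R_k, C=1.0):
    """Right-hand side of the Theorem 3 bound, with lambda_0 = 1 prepended.

    C is not specified by the theorem beyond its dependence on n, p, k;
    C = 1 is an assumption that does not affect the growth exponent.
    """
    total, product = 0.0, 1.0                 # product = lambda_0 * ... * lambda_i
    for i, lam in enumerate([1.0] + list(lambdas)):
        product *= lam
        total += product * (r / eps) ** i * (R_k / eps) ** ((n - i) / k)
    return C * total

def growth_exponent(n, k, lambdas, eps1=1e-6, eps2=1e-8):
    """Slope of log(bound) against log(1/eps) between two small scales."""
    b1 = theorem3_rhs(eps1, n, k, lambdas, r=1.0, R_k=1.0)
    b2 = theorem3_rhs(eps2, n, k, lambdas, r=1.0, R_k=1.0)
    return (math.log(b2) - math.log(b1)) / (math.log(1 / eps2) - math.log(1 / eps1))

# lambda_2 = 0 kills every term with i >= 2, so nu = 1 and nu + (n - nu)/k = 2.
print(growth_exponent(n=3, k=2, lambdas=(1.0, 0.0)))
```

The term with the largest index $i$ whose product $\lambda_0 \cdots \lambda_i$ is nonzero dominates as $\epsilon \to 0$, which is why setting $\lambda_{\nu+1} = 0$ yields the exponent $\nu + (n - \nu)/k$.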
Examples

Let us denote by $A$ the set of real polynomial mappings of degree $d$ and of the following type:

$$x \mapsto Q(a, x) = 1 + \sum_{j=1}^{d} a_j x^j$$

with $a = (a_1, \dots, a_d)$ and $\|a\| \le 1$ (where $\|\cdot\|$ is the Euclidean norm of $\mathbb{R}^d$). We identify the set $A$ with $B^d(0, 1) = \{a \in \mathbb{R}^d;\ \|a\| \le 1\}$. We want to bound the $\epsilon$-entropy of the set of such polynomials for which the real roots are multiple or almost multiple. We denote by $V$ the set $V = \{(a, x) \in \mathbb{R}^{d+1};\ Q(a, x) = 0\}$. At points $(a, x)$ of $V$ with $\nabla Q_{(a, x)} \neq 0$, $V$ is a $C^\infty$ manifold of codimension 1 of $\mathbb{R}^{d+1}$. We denote $V^{reg} = \{(a, x) \in V;\ \nabla Q_{(a, x)} \neq 0\}$ and $V^{sing} = \{(a, x) \in V;\ \nabla Q_{(a, x)} = 0\} = V \setminus V^{reg}$. By Whitney (1957), $V^{sing}$ is a union of smooth manifolds of dimension at most $d - 1$. A root $x$ of a polynomial $Q(a, \cdot)$ is multiple if and only if

$$Q(a, x) = \frac{\partial Q}{\partial x}(a, x) = 0$$

Consequently, the set $A^\Sigma$ of polynomials of $A$ with multiple roots is $\pi(V^{sing}) \cup \Delta(\pi|_{V^{reg}})$, where $\pi : \mathbb{R}^{d+1} \to \mathbb{R}^d$ is the standard projection $\pi(a, x) = a$, and $\Delta(\pi|_{V^{reg}})$ is the set $\pi(\{(a, x) \in V^{reg};\ Ox \subset T_{(a, x)} V^{reg}\})$ of critical values of $\pi|_{V^{reg}}$. By Sard's theorem (Theorem 2), $\dim_H(\Delta(\pi|_{V^{reg}})) \le d - 1$. Since $\dim_H(\pi(V^{sing})) \le d - 1$, we obtain $\dim_H(A^\Sigma) \le d - 1$: thus, having distinct roots is a generic property.

Let, as above, $\Lambda = (\lambda_1, \dots, \lambda_d)$ with $\lambda_1 \ge \cdots \ge \lambda_d \ge 0$ and $\lambda_0 = 1$. A root $x$ of a polynomial $Q(a, \cdot) \in A$ is said to be $\Lambda$-almost multiple if and only if $Q(a, x) = 0$ and $V \not\pitchfork^{\Lambda}_{(a, x)} Ox$, that is, $(a, x) \in V^{sing}$ or $\sin(T_{(a, x)} V^{reg}, Ox) \le \lambda_d$. This condition only concerns $\lambda_d$, and we can take $\lambda_1 = \cdots = \lambda_{d-1} = 1$. We denote by $A^{\Sigma, \Lambda}$ the set of polynomials of $A$ with (at least) one $\Lambda$-almost multiple root. By Theorem 3,

$$M(\epsilon, A^{\Sigma, \Lambda} \setminus \pi(V^{sing})) \le C \left[ \sum_{i=0}^{d-1} \left( \frac{1}{\epsilon} \right)^{i} + \lambda_d \left( \frac{1}{\epsilon} \right)^{d} \right]$$

But $\pi(V^{sing})$ being a finite union of manifolds of dimension at most $d - 1$, we finally obtain

$$M(\epsilon, A^{\Sigma, \Lambda}) \le C' \left[ \sum_{i=0}^{d-1} \left( \frac{1}{\epsilon} \right)^{i} + \lambda_d \left( \frac{1}{\epsilon} \right)^{d} \right]$$

Thus, having no $\Lambda$-almost multiple root is $\Lambda$-almost a generic property. In Figure 3, we represent $V$ for $d = 3$ and $a_3 = 1$, together with

$$W = \left\{ (a, x) \in \mathbb{R}^{d+1};\ \frac{\partial Q}{\partial x}(a, x) = 0 \right\}$$
The next example comes from robotics: let us consider a planar robotic manipulator consisting of two jointed bars of lengths $a$ and $b$, as presented in Figure 4. We may parametrize the positions of the endpoint $P$ of this device by the angles $\varphi$ and $\psi$ (see Figure 4). Now the distance $r$ from the origin to $P$ satisfies $r^2 = \|P\|^2 = a^2 + b^2 + 2ab \cos(\psi)$. The critical points of $r$ (equivalently of $r^2$, since $r > 0$) are given by

$$\frac{d(r^2)}{d\psi}(\psi) = -2ab \sin(\psi) = 0$$

and correspond to the circle $\psi = 0$. The critical value of $r$ is $a + b$. Near these critical positions, the control of $r$ with respect to $\psi$ is poor; we would like to avoid those near-critical values. Given $\lambda > 0$, the condition

$$\left| \frac{d(r^2)}{d\psi}(\psi) \right| \le \lambda$$

implies $|\psi| \le \arcsin(\lambda / 2ab)$, and the $\lambda$-near-critical values of $r$ satisfy

$$r_{\max}^2 - r^2 \le 2ab\,[1 - \cos(\arcsin(\lambda / 2ab))]$$

where $r_{\max}$ is $a + b$; thus, they are contained in an interval of length $\sim c\lambda^2 / (4ab\, r_{\max})$, and $M(\epsilon, \Delta(r, \lambda)) \le c\lambda^2 / (4ab\, r_{\max}\, \epsilon)$ (Theorem 3 gives $M(\epsilon, \Delta(r, \lambda)) \le C(1 + \lambda/\epsilon)$).

Figure 4 Almost-critical points of the distance function of P to the origin.

See also: Entanglement; Entanglement Measures; Quantum Entropy; Singularity and Bifurcation Theory.
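For the two-bar manipulator example above, the length of the interval of near-critical values can be checked numerically. This is a minimal sketch under stated assumptions: $\psi$ is taken to be the angle between the two bars, so that $r^2 = a^2 + b^2 + 2ab\cos\psi$; the bar lengths and the threshold are arbitrary illustrative values; and the leading-order estimate $\lambda^2 / (8ab\, r_{\max})$ (that is, $c = 1/2$ in the text's $c\lambda^2 / (4ab\, r_{\max})$) is derived here from a Taylor expansion rather than quoted from the source.

```python
import math

def r_squared(psi, a, b):
    """Squared distance from the origin to the endpoint P, psi being the joint angle."""
    return a * a + b * b + 2.0 * a * b * math.cos(psi)

def near_critical_interval_length(lam, a, b):
    """Length of the interval of values of r where |d(r^2)/dpsi| <= lam, near psi = 0."""
    psi_max = math.asin(lam / (2.0 * a * b))
    r_max = a + b
    r_min = math.sqrt(r_squared(psi_max, a, b))
    return r_max - r_min

a, b, lam = 2.0, 1.0, 0.1                       # illustrative values
exact = near_critical_interval_length(lam, a, b)
estimate = lam ** 2 / (8.0 * a * b * (a + b))   # leading order in lam
print(exact, estimate)
```

The interval length is quadratic in $\lambda$, which is why the direct computation gives a much better covering bound for small $\lambda$ than the linear term $\lambda/\epsilon$ of the general theorem.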
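Figure 3 The space of polynomials of type $1 + a_1 x + a_2 x^2 + x^3$ with almost-multiple roots (regions labeled $A \setminus A^{\Sigma, \Lambda}$, $A^{\Sigma, \Lambda}$, and $A^\Sigma$; surfaces $V$ and $W$; axis $Ox$).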
Further Reading

Comte G (1996) Sur les hypothèses du théorème de Sard pour le lieu critique de rang 0. Comptes Rendus de l'Académie des Sciences de Paris 323(I): 143–146.
de Araujo Moreira CGT (2001) Hausdorff measures and the Morse–Sard theorem. Publicacions Matemàtiques 45(1): 149–162.
Falconer KJ (1986) The Geometry of Fractal Sets, Cambridge Tracts in Mathematics, vol. 85. Cambridge: Cambridge University Press.
Federer H (1969) Geometric Measure Theory, Grundlehren Math. Wiss., vol. 153. Springer.
Golubitsky V and Guillemin V (1973) Stable Mappings and Their Singularities, Graduate Texts in Mathematics, vol. 14. Springer.
Grinberg EL (1985) On the smoothness hypothesis in Sard's theorem. American Mathematical Monthly 92(10): 733–734.
Holm P (1987) The theorem of Brown and Sard. L'Enseignement Mathématique 33: 199–202.
Ivanov LD (1975) In: Vitushkin AG (ed.) Variatsii mnozhestv i funktsii (Russian) (Variations of Sets and Functions). Moscow: Nauka.
Kolmogorov AN and Tihomirov VM (1961) ε-Entropy and ε-capacity of sets in functional space. American Mathematical Society Translations 17: 277–364.
Morse AP (1939) The behavior of a function on its critical set. Annals of Mathematics 40(1): 62–70.
Sard A (1942) The measure of the critical values of differentiable maps. Bulletin of the American Mathematical Society 48: 883–890.
Sard A (1958) Images of critical sets. Annals of Mathematics 68: 247–259.
Sard A (1965) Hausdorff measure of critical images on Banach manifolds. American Journal of Mathematics 87: 158–174.
Whitney H (1935) A function not constant on a connected set of critical points. Duke Mathematical Journal 1: 514–517.
Whitney H (1957) Elementary structure of real algebraic varieties. Annals of Mathematics 66: 545–556.
Yomdin Y (1983) The geometry of critical and near-critical values of differentiable mappings. Mathematische Annalen 264(4): 495–515.
Yomdin Y and Comte G (2004) Tame Geometry with Application in Smooth Analysis, Lecture Notes in Mathematics, vol. 1834. Berlin: Springer.
Equivariant Cohomology and the Cartan Model

E Meinrenken, University of Toronto, Toronto, ON, Canada
© 2006 Elsevier Ltd. All rights reserved.
Introduction

If a compact Lie group $G$ acts on a manifold $M$, the space $M/G$ of orbits of the action is usually a singular space. Nonetheless, it is often possible to develop a "differential geometry" of the orbit space in terms of appropriately defined equivariant objects on $M$. This article is mostly concerned with "differential forms on $M/G$." A first idea would be to work with the complex of "basic" forms on $M$, but for many purposes this complex turns out to be too small. A much more useful complex of equivariant differential forms on $M$ was introduced by Cartan (1950). In retrospect, Cartan's approach presented a differential form model for the equivariant cohomology of $M$, as defined by A Borel (1960). Borel's construction replaces the quotient $M/G$ by a better-behaved (but usually infinite-dimensional) homotopy quotient $M_G$, and Cartan's complex should be viewed as a model for forms on $M_G$.

A central feature of equivariant cohomology is the family of localization formulas for the integrals of equivariant cocycles. The first instance of such an integration formula was the "exact stationary phase formula," discovered by Duistermaat and Heckman. This formula was quickly recognized by Berline and Vergne (1983) and Atiyah and Bott (1984) as a localization principle in equivariant cohomology. Today, equivariant localization is a basic tool in mathematical physics, with numerous applications.
This article begins with Borel’s topological definition of equivariant cohomology, then proceeds to describe H Cartan’s more algebraic approach, and concludes with a discussion of localization principles. As additional references for the material covered here, we particularly recommend books by Berline, Getzler, and Vergne (1992) and Guillemin and Sternberg (1999).
Borel's Model of $H_G(M)$

Let $G$ be a topological group. A $G$-space is a topological space $M$ on which $G$ acts by transformations $g \mapsto a_g$, in such a way that the action map

$$a : G \times M \to M \qquad [1]$$

is continuous. An important special case of $G$-spaces is that of principal $G$-bundles $E \to B$, that is, $G$-spaces locally isomorphic to products $U \times G$.

Definition 1 A classifying bundle for $G$ is a principal $G$-bundle $EG \to BG$, with the following universal property: for any principal $G$-bundle $E \to B$, there is a map $f : B \to BG$, unique up to homotopy, such that $E$ is isomorphic to the pullback bundle $f^* EG$. The map $f$ is known as a "classifying map" of the principal bundle.

To be precise, the base spaces of the principal bundles considered here must satisfy some technical condition. For a careful discussion, see Husemoller (1994). Classifying bundles exist for all $G$ (by a construction due to Milnor (1956)), and are unique up to $G$-homotopy equivalence. It is a basic fact that principal $G$-bundles with contractible total space are classifying bundles.
Examples 2

(i) The bundle $\mathbb{R} \to \mathbb{R}/\mathbb{Z} = S^1$ is a classifying bundle for $G = \mathbb{Z}$.
(ii) Let $H$ be a separable complex Hilbert space, $\dim H = \infty$. It is known that the unit sphere $S(H)$ is contractible. It is thus a classifying $U(1)$-bundle, with the projective space $P(H)$ as base. More generally, the Stiefel manifold $St(k, H)$ of unitary $k$-frames is a classifying $U(k)$-bundle, with base the Grassmann manifold $Gr(k, H)$ of $k$-planes.
(iii) Any compact Lie group $G$ arises as a closed subgroup of $U(k)$, for $k$ sufficiently large. Hence, the Stiefel manifold $St(k, H)$ also serves as a model for $EG$.
(iv) The based loop group $G = L_0 K$ of a connected Lie group $K$ acts by gauge transformations on the space of connections $\mathcal{A}(S^1) = \Omega^1(S^1, \mathfrak{k})$. This is a classifying bundle for $L_0 K$, with base $K$. The quotient map takes a connection to its holonomy.

For any commutative ring $R$ (e.g., $\mathbb{Z}$, $\mathbb{R}$, $\mathbb{Z}_2$), let $H(\cdot\,; R)$ denote the (singular) cohomology with coefficients in $R$. Recall that $H(\cdot\,; R)$ is a graded commutative ring under cup product.

Definition 3 The equivariant cohomology $H_G(M) = H_G(M; R)$ of a $G$-space $M$ is the cohomology ring of its homotopy quotient $M_G = EG \times_G M$:

$$H_G(M; R) = H(M_G; R) \qquad [2]$$
Equivariant cohomology is a contravariant functor from the category of $G$-spaces to the category of $R$-modules. The $G$-map $M \to \mathrm{pt}$ induces an algebra homomorphism from $H_G(\mathrm{pt}) = H(BG)$ to $H_G(M)$. In this way, $H_G(M)$ is a module over the ring $H(BG)$.

Example 4 (Principal $G$-bundles). Suppose $E \to B$ is a principal $G$-bundle. The homotopy quotient $E_G$ may be viewed as a bundle $E \times_G EG$ over $B$. Since the fiber is contractible, there is a homotopy equivalence

$$E_G \simeq B \qquad [3]$$

and therefore $H_G(E) = H(B)$.

Example 5 (Homogeneous spaces). If $K$ is a closed subgroup of a Lie group $G$, the space $EG$ may be viewed as a model for $EK$, with $BK = EG/K = EG \times_G (G/K)$. Hence,

$$H_G(G/K) = H(BK) \qquad [4]$$
Let us briefly describe two of the main techniques for computing $H_G(M)$.

1. Leray spectral sequences. If $R$ is a field, the equivariant cohomology may be computed as the $E_\infty$ term of the spectral sequence for the fibration $M_G \to BG$. If $BG$ is simply connected (as is the case for all compact connected Lie groups), the $E_2$-term of the spectral sequence reads

$$E_2^{p,q} = H^p(BG) \otimes H^q(M) \qquad [5]$$

2. Mayer–Vietoris sequences. If $M = U_1 \cup U_2$ is a union of two $G$-invariant open subsets, there is a long exact sequence

$$\cdots \to H_G^k(M) \to H_G^k(U_1) \oplus H_G^k(U_2) \to H_G^k(U_1 \cap U_2) \to H_G^{k+1}(M) \to \cdots$$
More generally, associated to any $G$-invariant open cover, there is a spectral sequence converging to $H_G(M)$.

Example 6 Consider the standard $U(1)$-action on $S^2$ by rotations. Cover $S^2$ by two open sets $U_\pm$, given as the complement of the south pole and north pole, respectively. Since $U_+ \cap U_-$ retracts onto the equatorial circle, on which $U(1)$ acts freely, its equivariant cohomology vanishes except in degree 0. On the other hand, $U_\pm$ retract onto the poles $p_\pm$. Hence, by the Mayer–Vietoris sequence, the map $H^k_{U(1)}(S^2) \to H^k_{U(1)}(p_+) \oplus H^k_{U(1)}(p_-)$ given by pullback to the fixed points is an isomorphism for $k > 0$. Since the pullback map is a ring homomorphism, we conclude that $H_{U(1)}(S^2; \mathbb{R})$ is the commutative ring generated by two elements $x_\pm$ of degree 2, subject to a single relation $x_+ x_- = 0$.
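As a sanity check on Example 6, one can compare graded dimensions. The sketch below is an illustration, not part of the original argument: it counts the monomials of degree $2m$ that survive in the ring generated by $x_\pm$ with $x_+ x_- = 0$, and compares the result with the graded dimensions of $H(BU(1)) \otimes H(S^2)$ appearing in the $E_2$-term [5]. It uses the standard fact that $H(BU(1); \mathbb{R})$ is a polynomial ring on one generator of degree 2; the function names are hypothetical.

```python
def ring_dims(max_m):
    """Dimension of the degree-2m part of R[x+, x-]/(x+ x-), for m = 0..max_m.

    A basis is given by the monomials x+^i x-^j with i + j = m and i * j == 0.
    """
    return [sum(1 for i in range(m + 1) if i * (m - i) == 0)
            for m in range(max_m + 1)]

def tensor_dims(max_m):
    """Dimension of the degree-2m part of H(BU(1)) (x) H(S^2), for m = 0..max_m."""
    polynomial = [1] * (max_m + 1)   # R[x], deg x = 2: one class in each even degree
    sphere = [1, 1]                  # H^0(S^2) and H^2(S^2)
    return [sum(polynomial[m - q] * sphere[q]
                for q in range(len(sphere)) if m - q >= 0)
            for m in range(max_m + 1)]

print(ring_dims(5))
print(tensor_dims(5))
```

Both lists agree in every degree, which is consistent with the spectral sequence collapsing at the $E_2$-stage for this action, even though the ring structure is not that of the tensor product.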
g-Differential Algebras

Let $G$ be a Lie group, with Lie algebra $\mathfrak{g}$. A $G$-manifold is a manifold $M$ together with a $G$-action such that the action map [1] is smooth. We would like to introduce the concept of equivariant differential forms on $M$. This complex should play the role of differential forms on the infinite-dimensional space $M_G$. In Cartan's approach, the starting point is an algebraic model for the differential forms on the classifying bundle $EG$. The algebraic machinery will only depend on the infinitesimal action of $G$. It is therefore convenient to introduce the following concept.

Definition 7 Let $\mathfrak{g}$ be a finite-dimensional Lie algebra. A $\mathfrak{g}$-manifold is a manifold $M$, together with a Lie algebra homomorphism $a : \mathfrak{g} \to \mathfrak{X}(M)$, $\xi \mapsto a_\xi$ into the Lie algebra of vector fields on $M$, such that the map $\mathfrak{g} \times M \to TM$, $(\xi, m) \mapsto a_\xi(m)$ is smooth.

Any $G$-manifold $M$ becomes a $\mathfrak{g}$-manifold by taking $a_\xi$ to be the generating vector field

$$a_\xi(m) := \frac{d}{dt}\Big|_{t=0} a_{\exp(-t\xi)}(m) \qquad [6]$$
Conversely, if $G$ is simply connected, and $M$ is a $\mathfrak{g}$-manifold for which all of the vector fields $a_\xi$ are complete, the $\mathfrak{g}$-action integrates uniquely to an action of the group $G$.

The de Rham algebra $(\Omega(M), d)$ of differential forms on a $\mathfrak{g}$-manifold $M$ carries graded derivations $L_\xi = L(a_\xi)$ (Lie derivatives, degree 0) and $\iota_\xi = \iota(a_\xi)$ (contractions, degree $-1$). One has the following graded commutation relations:

$$[d, d] = 0, \qquad [L_\xi, d] = 0, \qquad [\iota_\xi, d] = L_\xi \qquad [7]$$

$$[\iota_\xi, \iota_\eta] = 0, \qquad [L_\xi, L_\eta] = L_{[\xi, \eta]_{\mathfrak{g}}}, \qquad [L_\xi, \iota_\eta] = \iota_{[\xi, \eta]_{\mathfrak{g}}} \qquad [8]$$

More generally, the following definitions are introduced.

Definition 8 A $\mathfrak{g}$-differential algebra ($\mathfrak{g}$-da) is a commutative graded algebra $A = \bigoplus_{n=0}^{\infty} A^n$, equipped with graded derivations $d$, $L_\xi$, $\iota_\xi$ of degrees $1$, $0$, $-1$ (where $L_\xi$, $\iota_\xi$ depend linearly on $\xi \in \mathfrak{g}$), satisfying the graded commutation relations [7] and [8].

Definition 9 For any $\mathfrak{g}$-da $A$, one defines the horizontal subalgebra $A_{hor} = \bigcap_\xi \ker(\iota_\xi)$, the invariant subalgebra $A^{\mathfrak{g}} = \bigcap_\xi \ker(L_\xi)$, and the basic subalgebra $A_{basic} = A_{hor} \cap A^{\mathfrak{g}}$.

Note that the basic subalgebra is a differential subcomplex of $A$.

Definition 10 A connection on a $\mathfrak{g}$-da is an invariant element $\theta \in A^1 \otimes \mathfrak{g}$, with the property $\iota_\xi \theta = \xi$. The curvature of a connection is the element $F^\theta \in A^2 \otimes \mathfrak{g}$ given as $F^\theta = d\theta + (1/2)[\theta, \theta]_{\mathfrak{g}}$.

$\mathfrak{g}$-da's $A$ admitting connections are the algebraic counterparts of (smooth) principal bundles, with $A_{basic}$ playing the role of the base of the principal bundle.
Weil Algebra

The Weil algebra $W\mathfrak{g}$ is the algebraic analog of the classifying bundle $EG$. Similar to $EG$, it may be characterized by a universal property:

Theorem 11 There exists a $\mathfrak{g}$-da $W\mathfrak{g}$ with connection $\theta_W$, having the following universal property: if $A$ is a $\mathfrak{g}$-da with connection $\theta$, there is a unique algebra homomorphism $c^\theta : W\mathfrak{g} \to A$ taking $\theta_W$ to $\theta$.

Clearly, the universal property characterizes $W\mathfrak{g}$ up to a unique isomorphism. To get an explicit construction, choose a basis $\{e_a\}$ of $\mathfrak{g}$, with dual basis $\{e^a\}$ of $\mathfrak{g}^*$. Let $y^a \in \wedge^1 \mathfrak{g}^*$ be the corresponding generators of the exterior algebra, and $v^a \in S^1 \mathfrak{g}^*$ the generators of the symmetric algebra. Let

$$W^n \mathfrak{g} = \bigoplus_{2i + j = n} S^i \mathfrak{g}^* \otimes \wedge^j \mathfrak{g}^* \qquad [9]$$

carry the differential

$$dy^a = v^a + \tfrac{1}{2} f^a_{bc}\, y^b y^c \qquad [10]$$

$$dv^a = f^a_{bc}\, v^b y^c \qquad [11]$$

where $f^a_{bc} = \langle e^a, [e_b, e_c]_{\mathfrak{g}} \rangle$ are the structure constants of $\mathfrak{g}$. Define the contractions $\iota_a = \iota_{e_a}$ by

$$\iota_a y^b = \delta_a^b, \qquad \iota_a v^b = 0 \qquad [12]$$
and let $L_a = [d, \iota_a]$. Then $L_a$ are the generators for the adjoint action on $W\mathfrak{g}$. The element $\theta_W = y^a \otimes e_a \in W^1 \mathfrak{g} \otimes \mathfrak{g}$ is a connection on $W\mathfrak{g}$. Notice that we could also use $y^a$ and $dy^a$ as generators of $W\mathfrak{g}$. This identifies $W\mathfrak{g}$ with the Koszul algebra, and implies:

Theorem 12 $W\mathfrak{g}$ is acyclic, that is, the inclusion $\mathbb{R} \to W\mathfrak{g}$ is a homotopy equivalence.

Acyclicity of $W\mathfrak{g}$ corresponds to the contractibility of the total space of $EG$. The basic subalgebra of $W\mathfrak{g}$ is equal to $(S\mathfrak{g}^*)^{\mathfrak{g}}$, and the differential restricts to zero on this subalgebra, since $d$ changes parity. Hence, if $A$ is a $\mathfrak{g}$-da with connection $\theta$, the characteristic homomorphism $c^\theta : W\mathfrak{g} \to A$ induces an algebra homomorphism $(S\mathfrak{g}^*)^{\mathfrak{g}} \to H(A_{basic})$. This homomorphism is independent of $\theta$:

Theorem 13 Suppose $\theta_0$, $\theta_1$ are two connections on a $\mathfrak{g}$-da $A$. Then their characteristic homomorphisms $c_0, c_1 : W\mathfrak{g} \to A$ are $\mathfrak{g}$-homotopic. That is, there is a chain homotopy intertwining contractions and Lie derivatives.

Remark 14 One obtains other interesting examples of $\mathfrak{g}$-da's if one drops the commutativity assumption from the definition. For instance, suppose $\mathfrak{g}$ carries an invariant scalar product. Let $Cl(\mathfrak{g})$ be the corresponding Clifford algebra, and $U(\mathfrak{g})$ the enveloping algebra. The noncommutative Weil algebra (introduced by Alekseev and Meinrenken 2002)

$$\mathcal{W}\mathfrak{g} = U\mathfrak{g} \otimes Cl(\mathfrak{g}) \qquad [13]$$

is a (noncommutative) $\mathfrak{g}$-da, with the derivations $d$, $L_a$, $\iota_a$ defined on generators by the same formulas as for $W\mathfrak{g}$.
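Equivariant Cohomology of g-da's

In analogy to $H_G(M) := H(M_G)$, we now declare:

Definition 15 The equivariant cohomology algebra of a $\mathfrak{g}$-da $A$ is the cohomology of the differential algebra $A_{\mathfrak{g}} := (W\mathfrak{g} \otimes A)_{basic}$:

$$H_{\mathfrak{g}}(A) := H(A_{\mathfrak{g}}) \qquad [14]$$

The equivariant cohomology $H_{\mathfrak{g}}(A)$ has functorial properties parallel to those of $H_G(M)$. In particular, $H_{\mathfrak{g}}(A)$ is a module over

$$H_{\mathfrak{g}}(\{0\}) = H((W\mathfrak{g})_{basic}) = (S\mathfrak{g}^*)^{\mathfrak{g}} \qquad [15]$$

Theorem 16 Suppose $A$ is a $\mathfrak{g}$-da with connection $\theta$, and let $c^\theta : W\mathfrak{g} \to A$ be the characteristic homomorphism. Then

$$W\mathfrak{g} \otimes A \to A, \qquad w \otimes x \mapsto c^\theta(w)\, x \qquad [16]$$

is a $\mathfrak{g}$-homotopy equivalence, with $\mathfrak{g}$-homotopy inverse the inclusion

$$A \to W\mathfrak{g} \otimes A, \qquad x \mapsto 1 \otimes x \qquad [17]$$

In particular, there is a canonical isomorphism

$$H(A_{basic}) \cong H_{\mathfrak{g}}(A) \qquad [18]$$

Proof By Theorem 13, the automorphism $w \otimes x \mapsto 1 \otimes c^\theta(w)\, x$ of $W\mathfrak{g} \otimes A$ is $\mathfrak{g}$-homotopic to the identity map. □

The above definition of the complex $A_{\mathfrak{g}}$ is often referred to as the Weil model of equivariant cohomology, while the term Cartan model is reserved for a slightly different description of $A_{\mathfrak{g}}$. Identify the space $(S\mathfrak{g}^* \otimes A)^{\mathfrak{g}}$ with the algebra of equivariant $A$-valued polynomial functions $\alpha : \mathfrak{g} \to A$. Define a differential $d_{\mathfrak{g}}$ on this space by setting

$$(d_{\mathfrak{g}} \alpha)(\xi) = d(\alpha(\xi)) - \iota_\xi\, \alpha(\xi) \qquad [19]$$

Theorem 17 (H Cartan). The natural projection $W\mathfrak{g} \otimes A \to S\mathfrak{g}^* \otimes A$ restricts to an isomorphism of differential algebras, $A_{\mathfrak{g}} \cong (S\mathfrak{g}^* \otimes A)^{\mathfrak{g}}$.

Suppose $A$ carries a connection $\theta$. The $\mathfrak{g}$-homotopy equivalence [16] induces a homotopy equivalence $A_{\mathfrak{g}} \to A_{basic}$ of the basic subcomplexes. By explicit calculation, the corresponding map for the Cartan model is given by

$$(S\mathfrak{g}^* \otimes A)^{\mathfrak{g}} \to A_{basic}, \qquad \alpha \mapsto P_{hor}(\alpha(F^\theta)) \qquad [20]$$

Here $\alpha(F^\theta) \in A^{\mathfrak{g}}$ is the result of substituting the curvature of $\theta$, and $P_{hor} : A \to A_{hor}$ is horizontal projection. On elements of $(S\mathfrak{g}^*)^{\mathfrak{g}} \subset (S\mathfrak{g}^* \otimes A)^{\mathfrak{g}}$, the map [20] specializes to the Chern–Weil homomorphism.

There is an algebraic counterpart of the Leray spectral sequence: introduce a filtration

$$F^p A_{\mathfrak{g}}^{p+q} := \bigoplus_{2i \ge p} (S^i \mathfrak{g}^* \otimes A^{p+q-2i})^{\mathfrak{g}} \qquad [21]$$

Since the second term in the equivariant differential [19] raises the filtration degree by 2, it follows that

$$E_2^{p,q} = (S^{p/2} \mathfrak{g}^*)^{\mathfrak{g}} \otimes H^q(A) \qquad [22]$$

for $p$ even, and $E_2^{p,q} = 0$ for $p$ odd. In fortunate cases, the spectral sequence collapses at the $E_2$-stage (see below).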
Equivariant de Rham Theory

We will now restrict ourselves to the case that $A = \Omega(M)$ is the algebra of differential forms on a $G$-manifold, where $G$ is compact and connected.

Theorem 18 (Equivariant de Rham theorem). Suppose $G$ is a compact, connected Lie group, and that $M$ is a $G$-manifold. Then there is a canonical isomorphism

$$H_G(M; \mathbb{R}) \cong H_{\mathfrak{g}}(\Omega(M)) \qquad [23]$$

where the left-hand side is the equivariant cohomology as defined by the Borel construction.

Motivated by this result, the notation can be changed slightly; write

$$\Omega_G(M) = (S\mathfrak{g}^* \otimes \Omega(M))^G \qquad [24]$$

for the Cartan complex of equivariant differential forms, and $d_G$ for the equivariant differential [19].

Remark 19 Theorem 18 fails, in general, for noncompact Lie groups $G$. A differential form model for the noncompact case was developed by Getzler (1990).

Example 20 Let $(M, \omega)$ be a symplectic manifold, and $a : G \to \mathrm{Diff}(M)$ a Hamiltonian group action. That is, $a$ preserves the symplectic form, $a_g^* \omega = \omega$, and there exists an equivariant moment map $\Phi : M \to \mathfrak{g}^*$ such that $\iota_\xi \omega + d\langle \Phi, \xi \rangle = 0$. Then the equivariant symplectic form $\omega_G(\xi) := \omega + \langle \Phi, \xi \rangle$ is equivariantly closed.

Example 21 Let $G$ be a Lie group, and denote, respectively, by

$$\theta^L = g^{-1} dg \quad \text{and} \quad \theta^R = dg\, g^{-1} \qquad [25]$$

the left- and right-invariant Maurer–Cartan forms. Suppose $\mathfrak{g} = \mathrm{Lie}(G)$ carries an invariant scalar product "$\cdot$", and consider the closed 3-form

$$\eta = \tfrac{1}{12}\, \theta^L \cdot [\theta^L, \theta^L] \qquad [26]$$

Then

$$\eta_G(\xi) = \eta + \tfrac{1}{2} (\theta^L + \theta^R) \cdot \xi \qquad [27]$$

is a closed equivariant extension for the conjugation action of $G$. More generally, transgression gives explicit differential forms $\eta_j$ generating the cohomology ring $H(G) = (\wedge \mathfrak{g}^*)^G$. Closed equivariant extensions of these forms were obtained by Jeffrey (1995), using a construction of Bott–Shulman.

A $G$-manifold is called equivariantly formal if

$$H_G(M) = (S\mathfrak{g}^*)^G \otimes H(M) \qquad [28]$$

as an $(S\mathfrak{g}^*)^G$-module. Equivalently, this is the condition that the spectral sequence [22] for $H_G(M)$ collapses at the $E_2$-term. $M$ is equivariantly formal under any of the following conditions: (1) $H^q(M) = 0$ for $q$ odd, (2) the map $H_G(M) \to H(M)$ is onto, (3) $M$ admits a $G$-invariant Morse function with only even indices, and (4) $M$ is a symplectic manifold and the $G$-action is Hamiltonian. (The last fact is a theorem due to Ginzburg and Kirwan.)

Example 22 The conjugation action of a compact Lie group is equivariantly formal, by criterion (2). In this case, eqn [28] is an isomorphism of algebras.

It is important to note that eqn [28] is not an algebra isomorphism, in general. Already the rotation action of $G = U(1)$ on $M = S^2$, discussed in Example 6, provides a counterexample.

Theorem 23 (Injectivity). Suppose $T$ is a compact torus, and $M$ is $T$-equivariantly formal. Then the pullback map $H_T(M) \to H_T(M^T)$ to the fixed point set is injective.

Since the pullback map to the fixed point set is an algebra homomorphism, one can sometimes use this result to determine the algebra structure on $H_T(M)$: let $\beta_r \in H(M)$ be generators of the ordinary cohomology algebra, and let $(\beta_r)_T$ be equivariant extensions. Denote by $x_r \in H_T(M^T)$ the pullbacks of $(\beta_r)_T$ to the fixed point set, and let $y^i$ be a basis of $\mathfrak{t}^*$, viewed as elements of $S\mathfrak{t}^* \subset H_T(M^T)$. Then $H_T(M)$ is isomorphic to the subalgebra of $H_T(M^T)$ generated by the $x_r$ and $y^j$.

The case of nonabelian compact groups $G$ may be reduced to the maximal torus $T$ using the following result. Observe that for any $G$-manifold $M$, there is a natural action of the Weyl group $W = N(T)/T$ on $H_T(M)$.
Theorem 24 The natural restriction map

$$H_G(M; \mathbb{R}) \to H_T(M; \mathbb{R})^W \qquad [29]$$

onto the Weyl group invariants is an algebra isomorphism.

Remark 25 The Cartan complex [24] may be viewed as a small model for the differential forms on the infinite-dimensional space $M_G$. In the noncommutative case, there exists an even "smaller" Cartan model, with underlying complex $(S\mathfrak{g}^*)^G \otimes \Omega(M)^G$, involving only invariant differential forms on $M$ (see Alekseev and Meinrenken (2005) and Goresky, Kottwitz, and MacPherson (1998)).
Equivariant Characteristic Forms
Let G be a compact Lie group, and $E \to B$ a principal G-bundle with connection $\theta \in \Omega^1(E) \otimes \mathfrak{g}$. Suppose the principal G-action commutes with the action of a compact Lie group K on E, and that $\theta$ is K-invariant. The K-equivariant curvature of $\theta$ is defined as follows:
$$F_K^\theta = d_K\theta + \tfrac12[\theta,\theta] \in \Omega^2_K(E) \otimes \mathfrak{g}$$
By the equivariant version of eqn [20], there is a canonical chain map
$$\Omega_{K \times G}(E) \to \Omega_K(B) \qquad [30]$$
defined by substituting the K-equivariant curvature for the $\mathfrak{g}$-variable, followed by horizontal projection with respect to $\theta$. The Cartan map [30] is homotopy inverse to the pullback map from $\Omega_K(B)$ to $\Omega_{K \times G}(E)$.

Example 26 The complex $\Omega_{K \times G}(E)$ contains a subcomplex $(S\mathfrak{g}^*)^G$. The restriction of eqn [30] is the equivariant Chern–Weil map
$$(S\mathfrak{g}^*)^G \to \Omega_K(B) \qquad [31]$$
Forms in the image of eqn [31] are equivariantly closed; they are called the K-equivariant characteristic forms of E.

Example 27 Similarly, if $V \to B$ is a K-equivariant vector bundle with structure group $G \subset GL(k)$, one defines the K-equivariant characteristic forms of V to be those of the corresponding bundle of G-frames in V. For instance, suppose V is an oriented K-equivariant vector bundle of even rank k, with an invariant metric and compatible connection. The Pfaffian defines an invariant polynomial on $\mathfrak{so}(k)$:
$$\xi \mapsto \det{}^{1/2}\!\left(\frac{\xi}{2\pi}\right) \qquad [32]$$
(equal to 0 if k is odd). The K-equivariant characteristic form of degree k on B determined by eqn [32] is known as the equivariant Euler form
$$\mathrm{Eul}_K(V) \in \Omega^k_K(B) \qquad [33]$$
Similarly, one defines equivariant Pontrjagin forms of V, and (for Hermitian vector bundles) equivariant Chern forms.

Example 28 Suppose G is a maximal rank subgroup of the compact Lie group K. The bundle $K \to K/G$ admits a unique K-invariant connection. Hence, one obtains a canonical chain map $(S\mathfrak{g}^*)^G \to \Omega_K(K/G)$, realizing the isomorphism $H_K(K/G) \cong (S\mathfrak{g}^*)^G$. In particular, any G-invariant element of $\mathfrak{g}^*$ defines a closed K-equivariant 2-form on K/G. For instance, symplectic forms on coadjoint orbits are obtained in this way.

Suppose M is a G-manifold, and let $Q = E \times_G M$ be the associated bundle. For any K-invariant connection on E, one obtains a chain map
$$\Omega_G(M) \to \Omega_{K \times G}(E \times M) \to \Omega_K(Q) \qquad [34]$$
by composing the pullback to $E \times M$ with the Cartan map for the principal bundle $E \times M \to Q$.

Example 29 Suppose $(M, \omega)$ is a Hamiltonian G-manifold, with moment map $\Phi : M \to \mathfrak{g}^*$. The image of $\omega_G = \omega + \Phi$ under the map [34] defines a closed K-equivariant 2-form on Q. This construction is of importance in symplectic geometry, where it arises in the context of Sternberg's minimal coupling.
Equivariant Thom Forms
Let $\pi : V \to B$ be a G-equivariant oriented real vector bundle of rank k over a compact base B. There is a canonical chain map, called fiber integration,
$$\pi_* : \Omega^{\bullet}(V)_{\mathrm{cp}} \to \Omega^{\bullet-k}(B) \qquad [35]$$
where the subscript indicates "compact support." It is characterized by the following properties: (1) for a form of degree k, the value of its fiber integral at $x \in B$ is equal to the integral over the fiber $V_x$, and (2)
$$\pi_*(\alpha \wedge \pi^*\beta) = \pi_*\alpha \wedge \beta \qquad [36]$$
for all $\alpha \in \Omega(V)_{\mathrm{cp}}$ and $\beta \in \Omega(B)$. Fiber integration extends to G-equivariant differential forms, and commutes with the equivariant differential.

Theorem 30 (Equivariant Thom isomorphism). Fiber integration defines an isomorphism,
$$H_G^{\bullet+k}(V)_{\mathrm{cp}} \to H_G^{\bullet}(B) \qquad [37]$$
An equivariant Thom form for a G-vector bundle is a cocycle $\mathrm{Th}_G(V) \in \Omega^k_G(V)_{\mathrm{cp}}$, with the property
$$\pi_*\mathrm{Th}_G(V) = 1 \qquad [38]$$
Given $\mathrm{Th}_G(V)$, the inverse to eqn [37] is realized on the level of differential forms as
$$\Omega_G^{\bullet}(B) \to \Omega_G^{\bullet+k}(V)_{\mathrm{cp}}, \quad \beta \mapsto \mathrm{Th}_G(V) \wedge \pi^*\beta \qquad [39]$$
A beautiful "universal" construction of Thom forms was obtained by Mathai and Quillen (1986). Using eqn [34], it suffices to describe an SO(k)-equivariant Thom form for the trivial bundle $\mathbb{R}^k \to \{0\}$. Using multi-index notation for ordered subsets $I \subset \{1, \ldots, k\}$,
$$\mathrm{Th}_{SO(k)}(\mathbb{R}^k)(\xi) = \frac{e^{-\|x\|^2}}{\pi^{k/2}} \sum_I \epsilon_I \det{}^{1/2}\!\left(\frac{\xi_I}{2}\right)(dx)^{I^c} \qquad [40]$$
Here the sum is over all subsets I with $|I|$ even, and $I^c$ is the complement of I. The matrix $\xi_I$ is obtained from $\xi$ by deleting all rows and columns that are not in I, and $\det^{1/2}$ is defined as a Pfaffian. Finally, $\epsilon_I$ is the sign of the shuffle permutation defined by I, that is, $(dx)^I (dx)^{I^c} = \epsilon_I\, dx_1 \cdots dx_k$. As shown by Mathai and Quillen, the form [40] is equivariantly closed, and clearly eqn [38] holds since the top degree part is just a Gaussian. If k is even, the Mathai–Quillen formula can also be written, on the open dense subset where $\xi \in \mathfrak{so}(k)$ is invertible, as
$$\mathrm{Th}_{SO(k)}(\mathbb{R}^k)(\xi) = \det{}^{1/2}\!\left(\frac{\xi}{2\pi}\right) e^{-\|x\|^2 - \langle dx,\, \xi^{-1}(dx)\rangle} \qquad [41]$$
The form $\mathrm{Th}_{SO(k)}(\mathbb{R}^k)$ given by these formulas does not have compact support, but is rapidly decreasing at infinity. One obtains a compactly supported Thom form by applying an SO(k)-equivariant diffeomorphism from $\mathbb{R}^k$ onto some open ball of finite radius. Note that the pullback of eqn [40] to the origin is equal to $\det^{1/2}(\xi/2\pi)$ (equal to 0 if k is odd). This implies:

Theorem 31 Let $\iota : B \to V$ denote the inclusion of the zero section. Then
$$\iota^*\mathrm{Th}_G(V) = \mathrm{Eul}_G(V) \qquad [42]$$
where $\mathrm{Eul}_G(V) \in \Omega^k_G(B)$ is the equivariant Euler form.
Suppose M is a G-manifold, and S a closed G-invariant submanifold with oriented normal bundle $\nu_S$. Choose a G-equivariant tubular neighborhood embedding
$$\nu_S \to U \subset M \qquad [43]$$
and let $PD_G(S) \in \Omega_G(M)_{\mathrm{cp}}$ be the image of $\mathrm{Th}_G(\nu_S)$ under this embedding. The form $PD_G(S)$ has the property
$$\int_M PD_G(S) \wedge \beta = \int_S \iota_S^*\beta \qquad [44]$$
for all closed equivariant forms $\beta \in \Omega_G(M)$. It is called an "equivariant Poincaré dual" of S. By construction, the pullback to S is the equivariant Euler form:
$$\iota_S^* PD_G(S) = \mathrm{Eul}_G(\nu_S) \qquad [45]$$
Equivariant Poincaré duality takes transversal intersections of G-manifolds to wedge products, similar to the nonequivariant case.

Remark 32 In general, the $(S\mathfrak{g}^*)^G$-submodule generated by Poincaré duals of G-invariant submanifolds is strictly smaller than $H_G(M)$. In this sense, the terminology "duality" is misleading.
Localization Theorem
In this section, T will denote a torus. Suppose M is a compact oriented T-manifold. For any component F of the fixed point set of T, the action of T on the normal bundle $\nu_F$ fixes only the zero section F. This implies that $\nu_F$ has even rank and is orientable. Fix an orientation, and give F the induced orientation. Since T is compact, the list of stabilizer groups of points in M is finite. Call $\xi \in \mathfrak{t}$ generic if it is not in the Lie algebra of any of these stabilizers, other than T itself. In this case, the value $\mathrm{Eul}_T(\nu_F, \xi)$ of the equivariant Euler form is invertible as an element of $\Omega(F)$.

Theorem 33 (Integration formula). Suppose M is a compact oriented T-manifold, where T is a torus. Let $\beta \in \Omega_T(M)$ be a closed equivariant form, and let $\xi \in \mathfrak{t}$ be generic. Then
$$\int_M \beta(\xi) = \sum_F \int_F \frac{\iota_F^*\beta(\xi)}{\mathrm{Eul}_T(\nu_F, \xi)} \qquad [46]$$
where the sum is over the connected components of the fixed point set. Rather than fixing $\xi$, one can also view eqn [46] as an equality of rational functions of $\xi \in \mathfrak{t}$.

Remark 34 The integration formula was obtained by Berline and Vergne (1983), based on ideas of Bott (1967). The topological counterpart, as a "localization principle," was proved independently by Atiyah and Bott (1984). More abstract versions of the localization theorem in equivariant cohomology had been proved earlier by Borel, Chang–Skjelbred and others.

Remark 35 If $\beta = PD_T(F) \wedge \gamma$, where $\gamma$ is equivariantly closed, the integration formula is immediate from the property [44] of Poincaré duals. The essence of the proof is to reduce to this case.

Remark 36 The localization contributions are particularly nice if $F = \{p\}$ is isolated (which can only happen if dim M is even). In this case, $\iota_F^*\beta(\xi)$ is simply the value of the function $\beta_{[0]}(\xi)$ at p. For the Euler form, one has
$$\mathrm{Eul}(\nu_F, \xi) = (-1)^{\dim M/2} \prod_j \langle \lambda_j(p), \xi\rangle \qquad [47]$$
where $\lambda_j(p) \in \mathfrak{t}^*$ are the (real) weights of the action on the tangent space $T_pM$. (Here we have chosen an isomorphism $T_pM \cong \mathbb{C}^l$ compatible with the orientation.) Hence, if all fixed points are isolated,
$$\int_M \beta(\xi) = (-1)^{\dim M/2} \sum_p \frac{\beta_{[0]}(\xi)(p)}{\prod_j \langle \lambda_j(p), \xi\rangle} \qquad [48]$$

Example 37 Let M be a compact oriented manifold, and $e(M) = \int_M \mathrm{Eul}(TM)$ its Euler characteristic. Suppose a torus T acts on M. Then
$$e(M) = \sum_F e(F) \qquad [49]$$
where the sum is over the connected components of the fixed point set of T. This follows from the integral of the equivariant Euler form $\beta(\xi) = \mathrm{Eul}_T(TM, \xi)$, by letting $\xi \to 0$ in the localization formula. In particular, if M admits a circle action with isolated fixed points, the number of fixed points is equal to the Euler characteristic. In a similar fashion, the localization formula gives interesting expressions for other characteristic numbers of manifolds and vector bundles, in the presence of a circle action. Some of these formulas were discovered prior to the localization formula; see in particular Bott (1967).

Example 38 In this example, we show that for a simply connected, simple Lie group G the 3-form $\eta \in \Omega^3(G)$ defined in eqn [26] is integral, provided "$\cdot$" is taken to be the basic inner product (for which the length squared of the short coroots equals 2). Since any such G is known to contain an SU(2) subgroup, it suffices to prove this for G = SU(2). Consider the conjugation action of the maximal torus $T \cong U(1)$, consisting of diagonal matrices. The fixed point set for this action is T itself. The normal bundle $\nu_F$ is trivial, with T acting on the fiber $\mathfrak{g}/\mathfrak{t}$ by the negative root $-\alpha$. Hence, $\mathrm{Eul}(\nu_F, \xi) = \langle \alpha, \xi\rangle$. Let $\alpha^\vee \in \mathfrak{t}$ be the coroot, defined by $\langle \alpha, \alpha^\vee\rangle = 2$. By definition, $\alpha^\vee \cdot \alpha^\vee = 2$. Let us integrate the T-equivariant extension $\eta_T(\xi)$ (cf. [27]). Its pullback to T is $\theta_T \cdot \xi$, where $\theta_T \in \Omega^1(T, \mathfrak{t})$ is the Maurer–Cartan form. The integral of $\theta_T$ is a generator of the integral lattice, that is, it equals $\alpha^\vee$. Thus,
$$\int_{SU(2)} \eta_T(\xi) = \int_T \frac{\theta_T \cdot \xi}{\langle \alpha, \xi\rangle} = \frac{\alpha^\vee \cdot \xi}{\langle \alpha, \xi\rangle} = 1 \qquad [50]$$
Duistermaat–Heckman Formulas
In this section, we discuss the Duistermaat–Heckman formula, for the case of isolated fixed points. Let T be a torus, and $(M, \omega)$ a compact Hamiltonian T-space, with moment map $\Phi : M \to \mathfrak{t}^*$. Denote by $\omega_T = \omega + \Phi$ the equivariant extension of $\omega$. Assuming isolated fixed points, the localization formula gives, for all integers $k \ge 0$,
$$\int_M (\omega + \langle \Phi, \xi\rangle)^k = (-1)^n \sum_p \frac{\langle \Phi(p), \xi\rangle^k}{\prod_j \langle \lambda_j(p), \xi\rangle} \qquad [51]$$
where $n = (1/2)\dim M$. Note that both sides are homogeneous of degree $k - n$ in $\xi$, but the terms on the right-hand side are only rational functions while the left-hand side is a polynomial. For k = n, both sides are independent of $\xi$, and compute the integral $\int_M \omega^n$. For k < n, the integral [51] is zero, and the cancellation of the terms on the right-hand side gives identities among the weights $\lambda_j(p)$. Equation [51] also implies
$$\int_M e^{\omega + \langle \Phi, \xi\rangle} = (-1)^n \sum_p \frac{e^{\langle \Phi(p), \xi\rangle}}{\prod_j \langle \lambda_j(p), \xi\rangle} \qquad [52]$$
Assume, in particular, that T = U(1), and let $\xi = t\xi_0$, where $\xi_0$ is the generator of the integral lattice in $\mathfrak{t}$. Identify $\mathfrak{t} \cong \mathbb{R}$ in such a way that $\xi_0$ corresponds to $1 \in \mathbb{R}$. Then $H = \langle \Phi, \xi_0\rangle$ is a Hamiltonian function with periodic flow. Write $a_j(p) = \langle \lambda_j(p), \xi_0\rangle \in \mathbb{Z}$. Then eqn [52] reads
$$\int_M e^{tH} \frac{\omega^n}{n!} = \frac{(-1)^n}{t^n} \sum_p \frac{e^{tH(p)}}{\prod_j a_j(p)} \qquad [53]$$
The right-hand side of eqn [53] is the leading term for the stationary phase approximation of the integral on the left. For this reason, eqn [52] is known as the Duistermaat–Heckman exact stationary phase theorem. Formula [52] has the following consequence for the push-forward of the Liouville measure under the moment map, the so-called Duistermaat–Heckman measure $H_*(\omega^n/n!)$. Let $\Theta$ be the Heaviside measure (i.e., the characteristic measure of the positive real axis).

Theorem 39 (Duistermaat–Heckman). The push-forward $H_*(\omega^n/n!)$ is a piecewise polynomial measure of degree $n - 1$, with singularities at the set of all H(p) for fixed points p of the action. One has the formula
$$H_*\!\left(\frac{\omega^n}{n!}\right) = \sum_p \frac{(\lambda - H(p))^{n-1}}{(n-1)!\,\prod_j a_j(p)}\, \Theta(\lambda - H(p)) \qquad [54]$$
Proof It is enough to show that the Laplace transforms of the two sides are equal. Multiplying by $e^{t\lambda}$ and integrating over $\lambda$ (take t < 0 to ensure convergence of the integral), the term for p on the right-hand side becomes $(-1)^n e^{tH(p)} / (t^n \prod_j a_j(p))$, so the resulting identity is just eqn [53].

Remark 40 The theorem generalizes to Hamiltonian actions of higher-rank tori, and also to nonisolated fixed points. See the paper by Guillemin, Lerman, and Sternberg (1988) for a detailed discussion of this formula and of its "quantum analog."
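The exact stationary phase formula [53] can be checked numerically in the simplest case. The sketch below is an illustration under assumptions that are not in the text: M is the unit sphere with its area form, the circle acts by rotation about the vertical axis, and $H = 2\pi z$ so that the Hamiltonian flow has period 1. The weights at the poles are taken to be $a = -1$ at the maximum of H and $a = +1$ at the minimum; these signs depend on orientation and sign conventions and are chosen here so that they match [53] and [54]. The function names are hypothetical.

```python
import math

def integral_exp_tH_over_sphere(t, n_theta=4000):
    """Midpoint-rule value of the integral of exp(t*H) over the unit sphere,
    with area form sin(theta) dtheta dphi and H = 2*pi*cos(theta) (assumed setup)."""
    d_theta = math.pi / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        height = 2.0 * math.pi * math.cos(theta)  # value of H on this latitude
        total += math.exp(t * height) * math.sin(theta) * d_theta
    # The integrand does not depend on phi, so the phi-integral contributes 2*pi.
    return 2.0 * math.pi * total

def fixed_point_sum(t):
    """Right-hand side of [53] for n = 1, with the two poles as fixed points.
    Each entry is (H(p), a(p)); the weight signs are an assumed convention."""
    n = 1
    poles = [(2.0 * math.pi, -1), (-2.0 * math.pi, +1)]
    return (-1) ** n / t ** n * sum(math.exp(t * h_p) / a_p for h_p, a_p in poles)

if __name__ == "__main__":
    for t in (0.3, -0.5):
        print(t, integral_exp_tH_over_sphere(t), fixed_point_sum(t))
```

With n = 1, formula [54] predicts a push-forward measure that is piecewise constant; in this example it is Lebesgue measure on the interval $[-2\pi, 2\pi]$, which is Archimedes' observation that the height function pushes the area of the sphere forward to a uniform measure.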
Equivariant Index Theory
By definition, the Cartan model consists of equivariant forms $\beta(\xi)$ with polynomial dependence on the equivariant parameter $\xi$. However, the integration formula holds in much greater generality. For instance, one may consider generalized Cartan complexes (Kumar and Vergne 1993). Here the parameter $\xi$ varies in some invariant open subset of $\mathfrak{g}$, and the polynomial dependence is replaced by smooth dependence. The use of these more general complexes in equivariant index theory was pioneered by Berline and Vergne (1992).

Assume that M is an even-dimensional, compact oriented Riemannian manifold, equipped with a Spin-c structure. According to the Atiyah–Singer theorem, the index of the corresponding Dirac operator D is given by the formula
$$\mathrm{ind}(D) = \int_M \hat{A}(M)\, e^{c/2} \qquad [55]$$
Here c is the curvature 2-form of the complex line bundle associated to the Spin-c structure, and $\hat{A}(M)$ is the $\hat{A}$-form. Recall that $\hat{A}(M)$ is obtained by substituting the curvature form in the formal power series expansion of the function $\hat{A}(x) = \det^{1/2}\big((x/2)/\sinh(x/2)\big)$ on $\mathfrak{so}(n)$. Suppose now that a compact, connected Lie group G acts on M by isometries, and that the action lifts to the Spin-c bundle. Replacing curvatures with equivariant curvatures, one defines the equivariant form $\hat{A}(M)(\xi)$ and the form $c(\xi)$. Note that $\hat{A}(\xi)$ is only defined for $\xi$ in a sufficiently small neighborhood of 0, since the function $\hat{A}(x)$ is not analytic for all x. The G-index of the equivariant Spin-c Dirac operator is a virtual character $g \mapsto \mathrm{ind}(D)(g)$ of the group G. For $g = \exp \xi$ sufficiently small, it is given by the formula
$$\mathrm{ind}(D)(\exp \xi) = \int_M \hat{A}(M)(\xi)\, e^{c(\xi)/2} \qquad [56]$$
For $\xi$ sufficiently small, the fixed point set of g coincides with the set of zeroes of the vector field generated by $\xi$. The localization formula reproduces the Atiyah–Segal formula for ind(D)(g), as an integral over $M^g$. Berline and Vergne (1996) gave similar formulas for the equivariant index of any G-equivariant elliptic operator, and more generally for operators that are transversally elliptic in the sense of Atiyah.

See also: Cohomology Theories; Compact Groups and Their Representations; Hamiltonian Group Actions; K-theory; Lie Groups: General Theory; Mathai–Quillen Formalism; Path-Integrals in Noncommutative Geometry; Stationary Phase Approximation.
Further Reading
Alekseev A and Meinrenken E (2000) The non-commutative Weil algebra. Inventiones Mathematicae 139: 135–172.
Alekseev A and Meinrenken E (2006) Equivariant Cohomology and the Maurer–Cartan Equation.
Atiyah MF and Bott R (1984) The moment map and equivariant cohomology. Topology 23(1): 1–28.
Berline N, Getzler E, and Vergne M (1992) Heat Kernels and Dirac Operators. Grundlehren der Mathematischen Wissenschaften, vol. 298. Berlin: Springer.
Berline N and Vergne M (1983) Zéros d'un champ de vecteurs et classes caractéristiques équivariantes. Duke Mathematical Journal 50: 539–549.
Berline N and Vergne M (1996) L'indice équivariant des opérateurs transversalement elliptiques. Inventiones Mathematicae 124(1–3): 51–101.
Borel A (1960) Seminar on Transformation Groups (with contributions by Bredon G, Floyd EE, Montgomery D, and Palais R), Annals of Mathematics Studies, No. 46. Princeton: Princeton University Press.
Bott R (1967) Vector fields and characteristic numbers. Michigan Mathematical Journal 14: 231–244.
Cartan H (1950) La transgression dans un groupe de Lie et dans un fibré principal, Colloque de topologie (espaces fibrés) (Bruxelles), Centre belge de recherches mathématiques, Georges Thone, Liège, Masson et Cie., Paris, pp. 73–81.
Cartan H (1950) Notions d'algèbre différentielle; application aux groupes de Lie et aux variétés où opère un groupe de Lie, Colloque de topologie (espaces fibrés) (Bruxelles), Georges Thone, Liège, Masson et Cie., Paris.
Duistermaat JJ and Heckman GJ (1982) On the variation in the cohomology of the symplectic form of the reduced phase space. Inventiones Mathematicae 69: 259–268.
Getzler E (1994) The equivariant Chern character for non-compact Lie groups. Advances in Mathematics 109(1): 88–107.
Goresky M, Kottwitz R, and MacPherson R (1998) Equivariant cohomology, Koszul duality, and the localization theorem. Inventiones Mathematicae 131(1): 25–83.
Guillemin V, Lerman E, and Sternberg S (1988) On the Kostant multiplicity formula. Journal of Geometry and Physics 5(4): 721–750.
Guillemin V and Sternberg S (1999) Supersymmetry and Equivariant de Rham Theory. Springer.
Husemoller D (1994) Fibre Bundles, Graduate Texts in Mathematics, 3rd edn., vol. 20. New York: Springer.
Jeffrey L (1995) Group cohomology construction of the cohomology of moduli spaces of flat connections on 2-manifolds. Duke Mathematical Journal 77: 407–429.
Kumar S and Vergne M (1993) Equivariant cohomology with generalized coefficients. Astérisque 215: 109–204.
Mathai V and Quillen D (1986) Superconnections, Thom classes, and equivariant differential forms. Topology 25: 85–106.
Milnor J (1956) Construction of universal bundles. II. Annals of Mathematics 63(2): 430–436.
Ergodic Theory
M Yuri, Hokkaido University, Sapporo, Japan
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Ergodic theory developed from the following work of Poincaré, which served as the starting point of the measure theory of dynamical systems, in the sense of the study of the properties of motions that take place at "almost all" initial states of a system: let $(X, \mathcal{B}, \mu)$ be a probability space and let a transformation $T : X \to X$ preserve $\mu$ (i.e., $\mu(T^{-1}A) = \mu(A)$ for any $A \in \mathcal{B}$). If $\mu(A) > 0$, then for almost all points $x \in A$ the orbit $\{T^n x\}_{n \ge 0}$ returns to A infinitely often (the Poincaré–Carathéodory recurrence theorem).
The main theme of ergodic theory is to know whether averages of quantities generated in a stationary manner converge. In the classical situation the stationarity is described by a measure-preserving transformation T, and one considers averages taken along a sequence $f, f \circ T, f \circ T^2, \ldots$ for integrable f. This corresponds to the probabilistic concept of stationarity. Hence, traditionally, ergodic theory is the qualitative study of iterates of an individual transformation, or of a one-parameter flow of transformations (such as that obtained from the solution of an autonomous ordinary differential equation). We should note that an important purpose behind this theory is to verify significant facts from a statistical point of view (e.g., the law of large numbers, convergence to limit distributions). The oldest branch of this theory is the study of ergodic theorems. It was started by Birkhoff (1931) and von Neumann (1932), having its origins in statistical mechanics. More specifically, the central notion is that of ergodicity, which is intended to capture the idea that a flow is "random" or "chaotic." In dealing with the motion of molecules, Boltzmann and Gibbs made such hypotheses from the beginning. One of the earliest precise definitions of randomness of a dynamical system was "minimality": the orbit of almost every point is dense. In order to describe such phenomena in a measure-theoretical setting, von Neumann and Birkhoff required the stronger assumption of ergodicity as follows. Let $(X, \mathcal{B}, \mu)$ be a measure space and $F_t$ a measurable flow on X. We call $F_t$ ergodic if the only invariant measurable sets are $\emptyset$ or all of X. Here, the invariance of the set A means that $F_t(A) = A$ for all $t \in \mathbb{R}$, and we agree to write A = B if A and B differ by a null set with respect to $\mu$. Note that ergodicity implies minimality if we are on a second countable Borel space. A function $f : X \to \mathbb{R}$ will be called a "constant of the motion" iff $f \circ F_t = f$ a.e. for each $t \in \mathbb{R}$. Then we see that a flow $F_t$ on X is ergodic iff the only constants of the motion are constant a.e. In the case of a measurable transformation T on X, the invariance of the set A means that $T^{-1}A = A$, and the measurable function f is called invariant if $f \circ T = f$ a.e. Then we call T ergodic provided that, if A is invariant, then either $\mu(A) = 0$ or $\mu(A) = 1$; equivalently, any invariant function is constant a.e. (Cornfeld et al. 1982). The most basic example where ergodicity can be verified is the following: if M is a compact Riemannian manifold with negative sectional curvature at each point, then the geodesic flow on each sphere bundle is ergodic (Hopf–Hadamard). In general, verifying ergodicity can still be very difficult. In the Hamiltonian case, the first step is to pass to an energy surface.
For example, Sinai (1970) shows that one has ergodicity on an energy surface of a classical model for molecular motion, that is, a collection of hard spheres in a box.
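The convergence of averages along an orbit, described above as the main theme of the theory, can be observed numerically in a simple case. The following sketch assumes an example that is not in the text: the rotation $T x = x + \alpha \pmod 1$ of the unit interval with irrational $\alpha$ (here the golden-ratio conjugate), which preserves Lebesgue measure and is ergodic. It compares averages of f along one orbit with the space average of f; the function names are hypothetical.

```python
import math

def time_average(f, x0, alpha, n_steps):
    """Average of f along the orbit x0, T x0, ..., T^(n_steps-1) x0
    of the rotation T x = x + alpha (mod 1)."""
    total = 0.0
    for n in range(n_steps):
        # T^n x0 is computed directly to avoid accumulating rounding errors.
        total += f((x0 + n * alpha) % 1.0)
    return total / n_steps

def indicator_quarter(x):
    """Indicator function of the interval [0, 1/4); its space average is 1/4."""
    return 1.0 if x < 0.25 else 0.0

def cos_squared(x):
    """A smooth observable; its space average over [0, 1) is 1/2."""
    return math.cos(2.0 * math.pi * x) ** 2

if __name__ == "__main__":
    golden = (math.sqrt(5.0) - 1.0) / 2.0  # assumed irrational rotation number
    for n_steps in (100, 10_000, 100_000):
        print(n_steps,
              time_average(indicator_quarter, 0.1, golden, n_steps),
              time_average(cos_squared, 0.1, golden, n_steps))
```

For a rational rotation number the orbit is periodic, the rotation is not ergodic, and the time average generally depends on the starting point, which is a quick way to see why ergodicity is needed.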
Ergodic Theorems
Koopman (1931) published the following significant observation: if T is an invertible measure-preserving transformation of a measure space $(X, \mathcal{B}, \mu)$, then the operator U, defined on $L^2(X, \mathcal{B}, \mu)$ by $Uf(x) := f(Tx)$, is unitary. Thus, the association of U with T replaces a nonlinear finite-dimensional problem with a linear infinite-dimensional one. Then von Neumann (1932) showed an intimate connection between measure-preserving transformations and unitary operators (the mean ergodic theorem): let U be a unitary operator on a Hilbert space H. Denote by P the orthogonal projection onto the subspace $H_0 := \{f \in H \mid Uf = f\}$. For any $f \in H$, one has
$$\lim_{N \to \infty} \left\| \frac{1}{N} \sum_{n=0}^{N-1} U^n f - Pf \right\|_H = 0$$
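The mean ergodic theorem already has content in finite dimensions, where it can be checked directly. The sketch below is a toy illustration under stated assumptions: H is $\mathbb{C}^3$ and U is a diagonal unitary matrix with eigenvalues $e^{2\pi i \varphi_j}$, so that P simply keeps the coordinates whose eigenvalue equals 1. The phases, the vector f, and the function names are hypothetical choices made for the example.

```python
import cmath
import math

def averaged_powers(phases, f, n_terms):
    """Compute (1/N) * sum_{n=0}^{N-1} U^n f for U = diag(exp(2*pi*i*phase_j))."""
    result = []
    for phase, coeff in zip(phases, f):
        eigenvalue = cmath.exp(2j * math.pi * phase)
        power, acc = 1 + 0j, 0j
        for _ in range(n_terms):
            acc += power * coeff   # adds the j-th coordinate of U^n f
            power *= eigenvalue
        result.append(acc / n_terms)
    return result

def project_onto_fixed(phases, f):
    """Orthogonal projection onto {f : U f = f}: keep the coordinates whose
    phase is an integer (eigenvalue 1) and set the others to zero."""
    return [c if phase % 1.0 == 0.0 else 0j for phase, c in zip(phases, f)]

def distance(u, v):
    """Euclidean norm of u - v in C^d."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(u, v)))

if __name__ == "__main__":
    phases = [0.0, 0.3, math.sqrt(2.0) - 1.0]
    f = [1 + 2j, -0.5j, 3 + 0j]
    for n_terms in (10, 1000, 20000):
        avg = averaged_powers(phases, f, n_terms)
        print(n_terms, distance(avg, project_onto_fixed(phases, f)))
```

For an eigenvalue $e^{i\varphi} \ne 1$ the averaged geometric sum is bounded by $2/(N|1 - e^{i\varphi}|)$, which explains the observed decay of order 1/N in this example.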
As a corollary, one can show that if $T : X \to X$ is an ergodic measure-preserving transformation on a probability space $(X, \mathcal{B}, \mu)$ then, for any $f \in L^1(X, \mathcal{B}, \mu)$,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = \frac{1}{\mu(X)} \int_X f \, d\mu$$
in $L^1$-norm. We also know that T is ergodic if and only if U has 1 as a simple eigenvalue. In the case of a continuous-time invertible process, the setting is the following. Let M be a manifold and $\Omega$ a volume on M, with $\mu$ the corresponding measure. If $F_t$ is a volume-preserving flow on M, then $F_t$ induces a linear one-parameter group of isometries on $H = L^2(M, \mu)$ by $U_t(f) = f \circ F_t$. Then $U_t$ has 1 as a simple eigenvalue for all t if and only if $F_t$ is ergodic. On the other hand, Birkhoff (1931) proved the following almost everywhere statement (the pointwise ergodic theorem): for any $f \in L^1(X, \mathcal{B}, \mu)$, there exists a function $f^* \in L^1(X, \mathcal{B}, \mu)$ such that for $\mu$-a.e. x, $f^* \circ T(x) = f^*(x)$ and
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = f^*(x)$$
In particular, if T is ergodic then for $\mu$-a.e. x, $f^*(x) = \int_X f \, d\mu$. Thus, the Birkhoff theorem allows one to prove the ergodic hypothesis of Boltzmann–Gibbs, that is, the space average of an observable function coincides with its time averages almost everywhere, and guarantees the existence, almost everywhere, of the mean number of occurrences in any measurable set. On the other hand, physical meanings of the mean ergodic theorem can be explained as follows. We now turn to a one-parameter flow of transformations. In order to study continuous averages
$$\frac{1}{t} \int_0^t f(F_s x) \, ds$$
fix some $s_0 \in \mathbb{R}$ and consider the averages of the form
$$\frac{1}{N} \sum_{n=0}^{N-1} f(T^n x)$$
where $T = F_{s_0}$. In reality, the measurements can be done only approximately at times $t = 0, 1, \ldots, N - 1$, and it is natural to consider the perturbed averages
$$\frac{1}{N} \sum_{n=0}^{N-1} f(T^{n + \varepsilon_n} x)$$
where $\{\varepsilon_n\}_{n \in \mathbb{N}}$ is an independent random sequence in a small interval $(-\delta, \delta)$. Assuming that $T = F_{s_0}$ is ergodic, we would like to know whether for large N, the averages
$$\frac{1}{N} \sum_{n=0}^{N-1} f(T^{n + \varepsilon_n} x)$$
are close to
$$\int_X f(x) \, d\mu(x)$$
The answer to this question is satisfactory if one is concerned with norm convergence (see, e.g., Bergelson et al. (1994)).

Induced Transformations and Tower Constructions
Suppose T is a measure-preserving transformation on a probability space $(X, \mathcal{B}, \mu)$ and $A \in \mathcal{B}$ with $\mu(A) > 0$. Let us transform A into a space with normalized measure by choosing the $\sigma$-algebra $\mathcal{B}_A$ consisting of all subsets $E \subset A$, $E \in \mathcal{B}$, and setting $\mu_A(E) = \mu(A \cap E)/\mu(A)$. Let $R_A : A \to \mathbb{N} \cup \{\infty\}$ be the "first return function," that is, $R_A(x) := \inf\{n \in \mathbb{N} \mid T^n x \in A\}$. Then it follows from the Poincaré recurrence theorem that $\mu_A(\{x \in A \mid R_A(x) < \infty\}) = 1$. Define $T_A : \{x \in A \mid R_A(x) < \infty\} \to A$ by $T_A x := T^{R_A(x)} x$, which is called the "induced transformation" over A (constructed from T). For each $n \in \mathbb{N}$ we define $A_n := \{x \in A \mid R_A(x) = n\}$. Then for every $E \in \mathcal{B}_A$ we see that $T_A^{-1}E = \bigcup_{n=1}^{\infty} (A_n \cap T^{-n}E)$. Hence, if T is invertible, then we have immediately $\mu_A(T_A^{-1}E) = \mu_A(E)$; thus, $\mu_A$ is invariant under $T_A$. Even if T is noninvertible, since for every $k \ge 1$ the equality
$$\mu\!\left( \Big( \bigcup_{j=0}^{k-1} T^{-j}A \Big)^{c} \cap T^{-k}(A \cap E) \right) = \mu\big(A_{k+1} \cap T^{-(k+1)}E\big) + \mu\!\left( \Big( \bigcup_{j=0}^{k} T^{-j}A \Big)^{c} \cap T^{-(k+1)}(A \cap E) \right)$$
holds, we have $\mu(E) = \sum_{k=1}^{\infty} \mu(A_k \cap T^{-k}E) = \mu(T_A^{-1}E)$, which allows us to see that $T_A$ preserves $\mu_A$. We note that for every $E \in \mathcal{B}_A$ with $\mu(E) > 0$, $\#\{n \in \mathbb{N} \mid T^n x \in E\} = \#\{n \in \mathbb{N} \mid T_A^n x \in E\}$. Therefore, for a.e. $x \in E$, $\sum_{n=1}^{\infty} 1_E \circ T^n(x) = \sum_{n=1}^{\infty} 1_E \circ T_A^n(x)$. This equality allows us to see that if $(T, \mu)$ is ergodic then $(T_A, \mu_A)$ is ergodic. Indeed, suppose $T_A^{-1}E = E$ and $\mu(A \cap E^c) > 0$. Then for $x \in A \cap E^c$, we have $\sum_{n=1}^{\infty} 1_E \circ T_A^n(x) = 0$. On the other hand, as $E \subset \bigcup_{n=1}^{\infty} T^{-n}E \pmod{\mu}$, the set $\bigcup_{n=1}^{\infty} T^{-n}E = \bigcup_{n=0}^{\infty} T^{-n}E$ is a T-invariant set. Hence, ergodicity of $(T, \mu)$ allows us to see that $\bigcup_{n=1}^{\infty} T^{-n}E = X \pmod 0$, which implies $\sum_{n=1}^{\infty} 1_E \circ T^n(x) \ge 1$. In the case when T is invertible, we can write $\int_A R_A \, d\mu = \mu\big(\bigcup_{n \ge 0} T^n A\big)$, so that Kac's formula (Darling and Kac 1957):
$$\int_A R_A \, d\mu_A = \mu(A)^{-1}$$
is valid when $\bigcup_{n \ge 0} T^n A = X \pmod{\mu}$. In particular, $\mu\big(\bigcup_{n \ge 0} T^n A\big) = 1$ if T is ergodic. The key to establishing the Kac formula is to show that the sets $T^i A_k$ $(0 \le i \le k - 1,\ k \ge 1)$ are pairwise disjoint. This property holds when T is invertible. On the other hand, in the case when T is noninvertible, if $\bigcup_{n=0}^{\infty} T^{-n}A = X \pmod 0$ then we can establish, for every $E \in \mathcal{B}$,
$$\mu(E) = \mu(A) \int_A \sum_{h=0}^{R_A(x)-1} 1_E \circ T^h(x) \, d\mu_A(x) \qquad [1]$$
by noting that the following equality holds for all $n \ge 1$:
$$\mu(E) = \sum_{k=1}^{n} \mu\!\left( A \cap \bigcap_{h=1}^{k} T^{-h}A^c \cap T^{-k}E \right) + \mu(A \cap E) + \mu\!\left( \Big( \bigcup_{j=0}^{n} T^{-j}A \Big)^{c} \cap T^{-n}E \right)$$
Then choosing E = X allows one to establish the Kac formula. As we have observed in the above, the assumption that $\bigcup_{n=0}^{\infty} T^{-n}A = X \pmod 0$ is automatically satisfied if $(T, \mu)$ is ergodic. Conversely, if $(T_A, \mu_A)$ is ergodic and $\bigcup_{n=1}^{\infty} T^{-n}A = X \pmod{\mu}$ holds, then $(T, \mu)$ is ergodic. We should remark that the formula [1] allows one to obtain a T-invariant measure $\mu$ when a $T_A$-invariant measure $\mu_A$ is obtained previously. Even if $R_A$ is nonintegrable, we may have a $\sigma$-finite infinite invariant measure. Then if $\mu_A$ is ergodic, $\mu$ obtained by [1] is still ergodic (i.e., $T^{-1}E = E$ implies that $\mu(E) = 0$ or $\mu(E^c) = 0$) under the assumption that $\bigcup_{n=1}^{\infty} T^{-n}A = X \pmod{\mu}$ (cf. Aaronson (1997)). In particular, the recent progress in the study of nonhyperbolic systems strongly depends on such constructions of induced maps over hyperbolic regions. More specifically, if one can find a subset A over which the induced map possesses an
invariant measure satisfying nice statistical properties, then the formula [1] may give a $\sigma$-finite invariant measure for the original map T which reflects the statistical properties of the induced system. The fundamental problem in the study of nonhyperbolic phenomena arising from complex systems is to clarify how to predict statistical properties of nonhyperbolic systems $(T, \mu)$ by using those of induced systems $(T_A, \mu_A)$ over hyperbolic regions. We should note that induced maps are well defined over positive-measure sets with respect to a reference measure $\nu$ that is "conservative." Here conservativity of $(T, \nu)$ means that there are no wandering sets of positive measure with respect to $\nu$. In many cases, the reference measures are physical measures (e.g., Lebesgue measures, conformal measures) which satisfy nonsingularity with respect to T. Here nonsingularity of $\nu$ means that $\nu \circ T^{-1} \sim \nu$. Then as long as we obtain a $T_A$-invariant measure $\mu_A$ which is equivalent to $\nu|_A$, the formula [1] may give us a T-invariant $\sigma$-finite measure $\mu$ which is equivalent to $\nu$.

At the end of this section, we will explain that the formula [1] can be obtained via the Rohlin tower (Kakutani's skyscraper) in the case when T is invertible. This tower construction is a dual construction to the construction of induced transformations. Assuming that we are given an invertible transformation T of the measure space $(X, \mathcal{B}, \mu)$, consider a measurable integer-valued positive function $f \in L^1(X, \mathcal{B}, \mu)$. By using this function, construct a new measure space $X^f$, whose points are of the form (x, i), where $x \in X$, $1 \le i \le f(x)$ and i is an integer. The $\sigma$-algebra $\mathcal{B}^f$ of measurable sets in $X^f$ is constructed in an obvious way. The measure $\mu^f$ is defined as follows: for any subset of the form (A, i), $A \in \mathcal{B}$, we put
$$\mu^f((A, i)) := \frac{\mu(A)}{\int_X f \, d\mu}$$
Let
$$T^f(x, i) = \begin{cases} (x, i + 1) & \text{if } i + 1 \le f(x) \\ (Tx, 1) & \text{if } i + 1 > f(x) \end{cases}$$
It is easy to see that $T^f$ preserves $\mu^f$. The space can naturally be visualized as a tower whose foundation is the space X and which has f(x) floors over the point $x \in X$. The space X is identified with the set of points (x, 1). We see that $T = (T^f)_X$, and the construction of $(X^f, T^f)$ is called the Rohlin tower over X.

Let T be an invertible measure-preserving transformation on a probability space $(X, \mathcal{B}, \mu)$ and $A \in \mathcal{B}$ with $\mu(A) > 0$. Suppose that $X = \bigcup_{n=0}^{\infty} T^n A \pmod{\mu}$. Then T is represented as the Rohlin tower $(A^{R_A}, \mathcal{B}^{R_A}, (\mu_A)^{R_A})$ over A as follows. We define $p : (A^{R_A}, \mathcal{B}^{R_A}, (\mu_A)^{R_A}) \to (X, \mathcal{B}, \mu)$ by $p(x, i) := T^i x$ (with the floors over x indexed here by $0 \le i \le R_A(x) - 1$). Then p is an isomorphism satisfying $p \circ (T_A)^{R_A} = T \circ p$ (almost everywhere). Moreover, we can verify that $(\mu_A)^{R_A} \circ p^{-1} = \mu$ by assuming ergodicity of $\mu$. This is because for all $E \in \mathcal{B}$ we have
$$\Big( \bigcup_{n=0}^{\infty} T^n A \Big) \cap E = \bigcup_{n=1}^{\infty} \bigcup_{i=0}^{n-1} p\big( (A_n \cap T^{-i}E) \times \{i\} \big)$$
so that
$$(\mu_A)^{R_A}(p^{-1}E) = \frac{\sum_{n=1}^{\infty} \sum_{i=0}^{n-1} \mu_A(A_n \cap T^{-i}E)}{\int_A R_A \, d\mu_A} = \mu(A) \int_A \sum_{h=0}^{R_A(x)-1} 1_E \circ T^h(x) \, d\mu_A(x)$$
On the other hand, in the case when T is noninvertible, the formula [1] is not necessarily obtained by any tower construction, except in very special cases. For example, even if T is not invertible, the tower construction is valid if $T|_A$ and $T|_{A^c}$ are one-to-one and TA = X.
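Kac's formula and formula [1] can be verified exactly on a finite toy system. The sketch below assumes an example that is not in the text: X is the set {0, ..., 11} with the uniform probability measure and T is the cyclic shift $x \mapsto x + 5 \pmod{12}$, which is invertible, measure preserving, and ergodic because it has a single orbit. Exact rational arithmetic avoids rounding; the function names are hypothetical.

```python
from fractions import Fraction

def first_return_time(T, A, x):
    """R_A(x): the smallest n >= 1 with T^n x in A (x is assumed to return)."""
    n, y = 1, T(x)
    while y not in A:
        y = T(y)
        n += 1
    return n

def mean_return_time(T, A):
    """Integral of R_A with respect to the normalized uniform measure on A."""
    return Fraction(sum(first_return_time(T, A, x) for x in A), len(A))

def measure_from_formula_1(T, A, E, space_size):
    """Right-hand side of [1] for the uniform measure on a finite set:
    mu(A) times the mu_A-average over A of the number of visits to E
    made by x, Tx, ..., T^(R_A(x)-1) x."""
    visits = 0
    for x in A:
        y = x
        for _ in range(first_return_time(T, A, x)):
            visits += y in E
            y = T(y)
    return Fraction(visits, space_size)

if __name__ == "__main__":
    size = 12
    shift = lambda x: (x + 5) % size     # cyclic, hence ergodic, permutation
    A = {0, 1, 7}
    E = {2, 3, 5, 7}
    print(mean_return_time(shift, A), Fraction(size, len(A)))          # Kac
    print(measure_from_formula_1(shift, A, E, size), Fraction(len(E), size))
```

In this example the orbit segments $x, Tx, \ldots, T^{R_A(x)-1}x$ for $x \in A$ are exactly the columns of the tower over A, and they partition X, which is why both identities hold without error.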
Convergence to Equilibrium States and Mixing Properties Let T : X ! X be a measure-preserving transformation on a probability space (X, B, ). We call T to be ‘‘weak mixing’’ if for any A, B 2 B 1 X 1N ðT n A \ BÞ ðAÞðBÞ ¼ 0 lim N!1 N n¼0 The weak-mixing property of (T, ) can be represented by; 8f , g 2 L2 (X, B, ) Z Z 1Z X 1N ðf T n Þg d ¼0 f d g d lim N!1 N X X X n¼0 and this is equivalent to the ergodicity of (T T, ). Moreover, (T, ) is weak mixing if and only if the unitary operator U : H ! H defined by Uf (x) = f (Tx) has no eigenfunctions that are not constants ( mod 0). We say that the operator U has continuous spectrum if there are no eigenvectors. If H is the closure of the linear span of the eigenvectors, then we say that the operator U has pure point spectrum. The weak-mixing property of (T, ) just implies that U restricted on the orthonormal subspace of the subspace consisting of
254 Ergodic Theory
constant functions has continuous spectrum. We recall that if U has one as a simple eigenvalue then T is ergodic. Additionally, if there are no other eigenvalues, then T is weakly mixing. Hence, if T is weak mixing, then it is necessarily ergodic. The next property corresponds to the term ‘‘relaxation’’ in physics literature which is used to describe processes under which the system passes to a certain stationary state independently of its original state. We call T (strong) mixing if for any A, B 2 B lim ðT n A \ BÞ ¼ ðAÞðBÞ
n!1
Then (T, ) is (strong) mixing if and only if for any f , g 2 L2 (X, B, ) Z Z Z ðf T n Þg d ¼ f d g d lim n!1
X
X
X
and mixing is necessarily weak mixing. Moreover, for any probability measure $\nu$ absolutely continuous with respect to $\mu$, one can show that $\lim_{n\to\infty} \nu(T^{-n}A) = \mu(A)$ for every $A \in \mathcal{B}$. Thus, any nonequilibrium distribution tends to an equilibrium one with time. The mixing property has a significant meaning from a physical point of view, as it implies decay of correlations of observable functions; moreover, limiting distributions of averaged observables are determined by the decay rates of correlation functions in many cases (e.g., hyperbolic systems).
For any $f \in L^2(X, \mathcal{B}, \mu)$ we consider the scalar products $s_n = s_n(f) = (U^n f, f)$, $n \ge 0$, and define $s_n := \overline{s_{-n}}$ for $n < 0$. The sequence $\{s_n\}_{n\in\mathbb{Z}}$ is positive definite and so, by Bochner's theorem, we can write $s_n(f) = \int_0^1 \exp[2\pi i n \lambda]\, d\sigma_f(\lambda)$, where $\sigma_f$ is a finite Borel measure on the unit circle $S^1$ and satisfies the condition that $\sigma_f(S^1) = \|f\|^2$. Such a measure is called a spectral measure of $f$. We see that $T$ is mixing iff for any $f \in L^2(X, \mathcal{B}, \mu)$ with $\int_X f \, d\mu = 0$ the Fourier coefficients $\{s_n\}$ of the spectral measure $\sigma_f$ tend to zero as $|n| \to \infty$.
Let $(X, \mathcal{B}, \mu)$ be isomorphic to $([0, 1], \mathcal{B}_0, \lambda)$, where $\mathcal{B}_0$ is the Borel $\sigma$-algebra on $[0, 1]$ and $\lambda$ is the normalized Lebesgue measure of $[0, 1]$. Then we call a measure-preserving transformation $T$ on $(X, \mathcal{B}, \mu)$ an exact endomorphism if $\bigcap_{n=0}^{\infty} T^{-n}\mathcal{B} = \{X, \emptyset\} \pmod 0$. We can verify that an exact endomorphism is (strong) mixing (Rohlin 1964). Moreover, $\mu$ is exact if for any positive-measure set $A \in \mathcal{B}$ with $T^n A \in \mathcal{B}$ $(\forall n \ge 0)$, $\lim_{n\to\infty} \mu(T^n A) = 1$ holds.
Let $T$ be a nonsingular transformation on $(X, \mathcal{B}, \nu)$, that is, $\nu T^{-1} \sim \nu$. Then we can define the transfer (Perron–Frobenius) operator $\mathcal{L}_\nu : L^1(X, \nu) \to L^1(X, \nu)$ by $\mathcal{L}_\nu f := d(f\nu)T^{-1}/d\nu$, which satisfies
$$\int_X (\mathcal{L}_\nu f)\, g \, d\nu = \int_X f \,(g \circ T)\, d\nu \qquad (\forall g \in L^\infty(X, \nu))$$
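As a concrete illustration of the duality relation above, the following sketch takes the doubling map $T(x) = 2x \bmod 1$ on $[0, 1]$ with $\nu$ the Lebesgue measure. The map, the test functions `f` and `g`, and all helper names are assumptions chosen for the example and do not come from the text. For this map the transfer operator has the closed form $(\mathcal{L}_\nu f)(x) = \tfrac12[f(x/2) + f((x+1)/2)]$, and both sides of the duality are compared by a midpoint rule.

```python
import math

def T(x):
    """Doubling map on [0, 1) (illustrative choice of T)."""
    return (2.0 * x) % 1.0

def transfer(f):
    """Transfer operator of the doubling map w.r.t. Lebesgue measure:
    the average of f over the two preimages x/2 and (x+1)/2."""
    return lambda x: 0.5 * (f(0.5 * x) + f(0.5 * (x + 1.0)))

def integrate(F, n=20000):
    """Midpoint rule on [0, 1]; n is even, so the jump of T at 1/2 is a cell edge."""
    h = 1.0 / n
    return h * sum(F((i + 0.5) * h) for i in range(n))

f = lambda x: x * x                              # plays the role of f in L^1
g = lambda x: math.cos(2.0 * math.pi * x) + x    # plays the role of bounded g

Lf = transfer(f)
lhs = integrate(lambda x: Lf(x) * g(x))      # integral of (L f) g
rhs = integrate(lambda x: f(x) * g(T(x)))    # integral of f (g o T)
print(lhs, rhs)
```

For these particular test functions both integrals can also be evaluated by hand, which gives a second check on the quadrature.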
We say that a nonsingular measure $\nu$ is exact if $A \in \bigcap_{n=0}^{\infty} T^{-n}\mathcal{B}$ implies $\nu(A)\,\nu(A^c) = 0$. By Lin's theorem (Lin 1971) the exactness of $\nu$ can be described as follows: $\forall f \in L^1(X, \nu)$ with $\int_X f \, d\nu = 0$, $\lim_{n\to\infty} \|\mathcal{L}_\nu^n f\|_1 = 0$. Let $\mu = h\nu$ be an exact $T$-invariant probability measure equivalent to $\nu$. Then the upper bounds of mixing rates of the exact measure $\mu = h\nu$ are determined by the speed of $L^1$ convergence of the iterated transfer operators $\{\mathcal{L}_\nu^n\}$. This is because $\mathcal{L}_\nu h = h$ and for every $f \in L^1(X, \nu)$ with $\int_X f \, d\nu = 1$, $\lim_{n\to\infty} \|\mathcal{L}_\nu^n f - h\|_1 = 0$. Hence, the property $\mathcal{L}_\mu f = h^{-1} \mathcal{L}_\nu(hf)$ allows one to see that for every $f, g \in L^\infty(X, \mu)$ the correlation function
$$C_{f,g}(n) := \int_X (f \circ T^n)\, g \, d\mu - \int_X f \, d\mu \int_X g \, d\mu$$
is bounded in absolute value by
$$\|f\|_\infty \left\| \mathcal{L}_\mu^n g - \int_X g \, d\mu \right\|_1 = \|f\|_\infty \left\| h^{-1}\{\mathcal{L}_\nu^n(gh) - P(gh)\} \right\|_1$$
where $P : L^1(X, \nu) \to L^1(X, \nu)$ is a linear operator defined by $Pf := h \int_X f \, d\nu$. The operator $P$ is the one-dimensional projection operator associated to the eigenvalue 1 (which is maximal in many cases) of $\mathcal{L}_\nu$, satisfying $P^2 = P$ and $P\mathcal{L}_\nu = \mathcal{L}_\nu P = P$. Moreover, since $\mathcal{L}_\nu^n - P = (\mathcal{L}_\nu - P)^n$, the exponential decay of mixing rates follows from the spectral gap of $\mathcal{L}_\nu$, that is, 1 is the simple isolated maximal eigenvalue of $\mathcal{L}_\nu$.
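The bound on the correlation function can be observed numerically in a toy case. The sketch below again assumes the doubling map on $[0, 1]$ with Lebesgue measure (so $h = 1$ and $\mathcal{L}_\mu = \mathcal{L}_\nu$), and uses the closed form $(\mathcal{L}^n g)(x) = 2^{-n}\sum_{k=0}^{2^n-1} g((x+k)/2^n)$. The observables $f(x) = g(x) = x$ and every function name are illustrative assumptions. The correlation is evaluated through the duality $\int (f \circ T^n) g \, d\mu = \int f \, \mathcal{L}^n g \, d\mu$.

```python
def iterate_transfer(g, n, x):
    """(L^n g)(x) for the doubling map: average of g over the 2^n preimages of x."""
    m = 2 ** n
    return sum(g((x + k) / m) for k in range(m)) / m

def decay_table(f, g, n_max=6, cells=200):
    """Rows (n, C_{f,g}(n), ||L^n g - mean(g)||_1, sup|f| * L1 norm), midpoint rule."""
    h = 1.0 / cells
    xs = [(i + 0.5) * h for i in range(cells)]
    mean_g = h * sum(g(x) for x in xs)
    sup_f = max(abs(f(x)) for x in xs)        # grid proxy for the sup norm of f
    rows = []
    for n in range(n_max + 1):
        diff = [iterate_transfer(g, n, x) - mean_g for x in xs]
        l1 = h * sum(abs(d) for d in diff)
        corr = h * sum(f(x) * d for x, d in zip(xs, diff))
        rows.append((n, corr, l1, sup_f * l1))
    return rows

rows = decay_table(lambda x: x, lambda x: x)
for n, corr, l1, bound in rows:
    print(n, corr, l1, bound)
```

In this example both the correlation and the $L^1$ distance halve at every step, which is the exponential decay expected from a spectral gap.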
Entropy and Reversibility We recall one of the fundamental problems of ergodic theory, namely deciding when two automorphisms $T_1, T_2$ of probability spaces $(X_1, \mathcal{B}_1, \mu_1)$ and $(X_2, \mathcal{B}_2, \mu_2)$ are equivalent. The approach developed for this problem involved the study of spectral properties of the associated isometric operators $U_i : L^2(X_i, \mu_i) \to L^2(X_i, \mu_i)$ $(i = 1, 2)$ and is based on the concept of the entropy of an automorphism $T$, introduced by Kolmogorov (1958). The entropy is a non-negative number, which is the same for equivalent automorphisms. For example, the entropy of the Bernoulli shift $\sigma : \prod_{n\in\mathbb{Z}} \{1, 2, \ldots, d\} \to \prod_{n\in\mathbb{Z}} \{1, 2, \ldots, d\}$ with probability vector $(p_1, p_2, \ldots, p_d)$ is equal to $-\sum_{k=1}^{d} p_k \log p_k$. A remarkable theorem of Ornstein (1970) states that Bernoulli shifts with the same entropy are equivalent. On the other hand, Shannon (1948) introduced a notion of entropy in his work on information theory, which is essentially the same as Kolmogorov's. Let $T : X \to X$ be a measure-preserving transformation on a probability space $(X, \mathcal{B}, \mu)$. We define the entropy of a measurable partition $\alpha$ of $X$ by
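$$H_\mu(\alpha) = -\sum_{A \in \alpha} \mu(A) \log \mu(A)$$
and define the entropy of $T$ with respect to $\alpha$ by
$$h_\mu(T, \alpha) := \lim_{n\to\infty} \frac{1}{n} H_\mu\!\left( \bigvee_{i=0}^{n-1} T^{-i}\alpha \right)$$
Then the (measure-theoretic) entropy of $T$ is defined by
$$h_\mu(T) = \sup_{\alpha :\, H_\mu(\alpha) < \infty} h_\mu(T, \alpha)$$
[...] $\lambda_1(x) > \lambda_2(x) > \cdots > \lambda_d(x)$ and a decomposition $T_x M = E_1(x) \oplus E_2(x) \oplus \cdots \oplus E_d(x)$ such that
$$\lim_{n\to\infty} \frac{1}{n} \log \|DT^n(x)u\| = \lambda_j(x)$$
for every $0 \ne u \in E_j(x)$ and every $1 \le j \le d$. Let $\Lambda$ be the set of regular points of $T$. Then we define a function
$$\chi(x) := \sum_{\lambda_j(x) > 0} \lambda_j(x) \dim E_j(x)$$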
In the case when all Lyapunov exponents at $x$ are negative, we put $\chi(x) = 0$. Then for every $T$-invariant Borel probability measure $\mu$ on $(X, \mathcal{B})$, it holds that $h_\mu(T) \le \int_X \chi \, d\mu$ (Ruelle 1978). Moreover, the equality holds whenever $T$ is $C^1$-Hölder and $\mu$ is absolutely continuous with respect to the Lebesgue measure of $X$ (Pesin 1977). Let $T$ be a transitive $C^1$-Hölder Anosov diffeomorphism, and let $E^s$, $E^u$ denote the stable and unstable fiber bundles of $T$. Suppose that $\mu^+$ is the unique $T$-invariant probability measure which satisfies
$$\int_M f \, d\mu^+ = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k(x)$$
for every continuous function $f : M \to \mathbb{R}$ and almost every $x \in M$ with respect to the Lebesgue measure. The probability measure $\mu^+$ is the so-called Sinai–Ruelle–Bowen (SRB) measure. Then we have
$$h_{\mu^+}(T) = \int_M \log |\det DT(x)|_{E^u_x}| \, d\mu^+(x)$$
On the other hand, we have
$$h_{\mu^+}(T) = \int_M \log |\det DT^{-1}(x)|_{E^s_x}| \, d\mu^+(x) + \int_M \log |\det DT(x)| \, d\mu^+(x)$$
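A minimal numerical sketch of these formulas, assuming the linear hyperbolic toral automorphism $(x, y) \mapsto (2x + y, x + y) \bmod 1$ (the "cat map") as the Anosov diffeomorphism; this example and all names in the code are assumptions, not taken from the text. For this map $DT$ is a constant matrix, the Lebesgue measure is invariant, and the growth rate $\frac1n \log\|DT^n u\|$ of a generic tangent vector approximates the positive Lyapunov exponent, which by the entropy formula above equals the entropy.

```python
import math

A = ((2.0, 1.0), (1.0, 1.0))         # DT for the cat map (illustrative example)
A_INV = ((1.0, -1.0), (-1.0, 2.0))   # DT^{-1}

def apply(M, v):
    """Matrix-vector product for 2x2 matrices stored as nested tuples."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def growth_rate(M, u, n=2000):
    """(1/n) log ||M^n u||, renormalizing at each step to avoid overflow."""
    log_norm, v = 0.0, u
    for _ in range(n):
        v = apply(M, v)
        r = math.hypot(v[0], v[1])
        log_norm += math.log(r)
        v = (v[0] / r, v[1] / r)
    return log_norm / n

lam_u = growth_rate(A, (1.0, 0.0))        # unstable exponent of T
lam_s = -growth_rate(A_INV, (1.0, 0.0))   # stable exponent of T, via T^{-1}
log_det = math.log(abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]))
print(lam_u, lam_s, log_det)
```

Here `lam_u` plays the role of $\int \log|\det DT|_{E^u}|\,d\mu^+$, `-lam_s` that of $\int \log|\det DT^{-1}|_{E^s}|\,d\mu^+$, and `log_det` vanishes, so the two expressions for the entropy agree.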
We also define the anti-SRB measure by replacing $T$ by $T^{-1}$. Then the SRB measure $\mu^+$ is absolutely continuous with respect to the Lebesgue measure of $M$ iff $\mu^+$ coincides with the anti-SRB measure (Bowen 1975). Hence, the SRB measure is absolutely continuous iff $\int_M \log |\det DT(x)| \, d\mu^+(x) = 0$. This property is sometimes explained as "zero entropy production" and also as "reversibility" in the context of nonequilibrium statistical mechanics (Ruelle 1997).
See also: Chaos and Attractors; Determinantal Random Fields; Dissipative Dynamical Systems of Infinite Dimension; Dynamical Systems and Thermodynamics; Finitely Correlated States; Fourier Law; Fractal Dimensions in Dynamics; Homeomorphisms and Diffeomorphisms of the Circle; Hyperbolic Billiards; Hyperbolic Dynamical Systems; Intermittency in Turbulence; Large Deviations in Equilibrium Statistical Mechanics; Lyapunov Exponents and Strange Attractors; Nonequilibrium Statistical Mechanics: Interaction Between Theory and Numerical Simulations; Nonequilibrium Statistical Mechanics (Stationary): Overview; Phase Transitions in Continuous Systems; Polygonal Billiards; Regularization for Dynamical Zeta Functions; Singularity and Bifurcation Theory; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory.
Further Reading
Aaronson J (1997) An Introduction to Infinite Ergodic Theory. Mathematical Surveys and Monographs 50. Providence, RI: American Mathematical Society.
Bergelson V, Boshernitzan M, and Bourgain J (1994) Some results on non-linear recurrence. Journal d'analyse mathematique 62: 444–458.
Billingsley P (1965) Ergodic Theory and Information. New York: Wiley.
Birkhoff G (1931) Proof of the ergodic theorem. Proceedings of the National Academy of Sciences of the USA 17: 656–660.
Bowen R (1975) Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms. Springer Lecture Notes in Mathematics, vol. 470. Berlin: Springer.
Bunimovich LA et al. (2000) Dynamical Systems, Ergodic Theory and Applications. Encyclopedia of Mathematical Science, vol. 100. Berlin: Springer.
Cornfeld IP, Fomin SV, and Sinai YaG (1982) Ergodic Theory. Grundlehren der mathematischen Wissenschaften, vol. 245. Berlin: Springer.
Darling DA and Kac M (1957) On occupation times for Markov processes. Transactions of the American Mathematical Society 84: 444–458.
Kolmogorov AN (1958) A new metric invariant of transitive dynamical systems and Lebesgue space automorphisms. Doklady Akademii Nauk USSR 119: 861–864.
Koopman BO (1931) Hamiltonian systems and transformations in Hilbert space. Proceedings of the National Academy of Sciences of the USA 17: 315–318.
Lin M (1971) Mixing for Markov operators. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 19: 231–234.
Ornstein D (1970) Bernoulli shifts with the same entropy are isomorphic. Advances in Mathematics 4: 337–352.
Pesin YB (1977) Characteristic Lyapunov exponents and smooth ergodic theory. Russian Mathematical Surveys 32: 54–114.
Rohlin VI (1964) Exact endomorphism of a Lebesgue space. American Mathematical Society Translations 39: 1–36.
Ruelle D (1978) An inequality for the entropy of differential maps. Boletim da Sociedade Brasileira de Mathematica 9: 83–87.
Ruelle D (1997) Entropy production in nonequilibrium statistical mechanics. Communications in Mathematical Physics 189: 365–371.
Shannon C (1948) A mathematical theory of communications. Bell System Technical Journal 27: 379–423, 623–656.
Sinai YaG (1970) Dynamical systems with elastic reflections. Russian Mathematical Surveys 25: 137–189.
von Neumann J (1932) Proof of the quasi-ergodic hypothesis. Proceedings of the National Academy of Sciences of the USA 18: 70–82.
Walters P (1981) An Introduction to Ergodic Theory. GTM, Springer Verlag.
Euclidean Field Theory F Guerra, Università di Roma "La Sapienza," Rome, Italy © 2006 Elsevier Ltd. All rights reserved.
Introduction In this article, we consider Euclidean field theory as a formulation of quantum field theory which lives in some Euclidean space, and is expressed in probabilistic terms. Methods arising from Euclidean field theory have been introduced in a very successful way in the study of concrete models of constructive quantum field theory. Euclidean field theory was initiated by Schwinger (1958) and Nakano (1959), who proposed to study the vacuum expectation values of field products analytically continued into the Euclidean region (Schwinger functions), where the first three (spatial) coordinates of a world point are real and the last one (time) is purely imaginary (Schwinger points). The
possibility of introducing Schwinger functions, and their invariance under the Euclidean group, are immediate consequences of the now classic formulation of quantum field theory in terms of vacuum expectation values given by Wightman (Streater and Wightman 1964). The convenience of dealing with the Euclidean group, with its positive-definite scalar product, instead of the Lorentz group, is evident, and has been exploited by several authors in different contexts. The next step was made by Symanzik (1966), who realized that Schwinger functions for boson fields have a remarkable positivity property, allowing one to introduce Euclidean fields for their own sake. Symanzik also pointed out an analogy between Euclidean field theory and classical statistical mechanics, at least for some interactions (Symanzik 1969). This analogy was successfully extended, with a different interpretation, to all boson interactions by Guerra et al. (1975), with the purpose of using rigorous results of modern statistical mechanics for
the study of constructive quantum field theory, within the program advocated by Wightman (1967), and further pursued by Glimm and Jaffe (see Glimm and Jaffe (1981) for an overall presentation). The most dramatic advance of Euclidean theory was due to Nelson (1973a, b). He was able to isolate a crucial property of Euclidean fields (the Markov property) and gave a set of conditions for these fields, which allow us to derive all properties of relativistic quantum fields satisfying Wightman axioms. The Nelson theory is very deep and rich in new ideas. Even after so many years since the basic papers were published, we lack a complete understanding of the radical departure from the conventional theory afforded by Nelson’s ideas, especially about their possible further developments. By using the Nelson scheme, in particular a very peculiar symmetry property, it was very easy to prove (Guerra 1972) the convergence of the ground-state energy density, and the van Hove phenomenon in the infinite-volume limit for two-dimensional boson theories. A subsequent analysis (Guerra et al. 1972) gave other properties of the infinite-volume limit of the theory, and allowed a remarkable simplification in the proof of a very important regularity property for fields, previously established by Glimm and Jaffe. Since then, all work on constructive quantum field theory has exploited in different ways ideas coming from Euclidean field theory. Moreover, a very important reconstruction theorem has been established by Osterwalder and Schrader (1973), allowing a reconstruction of relativistic quantum fields from the Euclidean Schwinger functions, and avoiding the previously mentioned Nelson reconstruction theorem, which is technically more difficult to handle. This article is intended to be an introduction to the general structure of Euclidean quantum field theory, and to some of the applications to constructive quantum field theory. 
Our purpose is to show that, 50 years after its introduction, the Euclidean theory is still interesting, both from the point of view of technical applications and physical interpretation. The article is organized as follows. In the next section, by considering simple systems made of a single spinless relativistic particle, we introduce the relevant structures in both Euclidean and Minkowski worlds. In particular, a kind of (pre)Markov property is introduced already at the one-particle level. Next we present a description of the procedure of second quantization on the one-particle structure. The free Markov field is introduced, and its crucial Markov property explained. Following Nelson, we use probabilistic concepts and methods, whose relevance for constructive quantum field theory became immediately more and more apparent. The
very structure of classical statistical mechanics for Euclidean fields is firmly based on these probabilistic methods. This is followed by an introduction of interaction, and we show the connection between the Markov theory and the Hamiltonian theory, for two-dimensional space-cutoff interacting scalar fields. In particular, we present the Feynman–Kac–Nelson formula that gives an explicit expression of the semigroup generated by the space-cutoff Hamiltonian in Fock space. We also deal with some applications to constructive quantum field theory. This is followed by a short discussion about the physical interpretation of the theory. In particular, we discuss the Osterwalder–Schrader reconstruction theorem on Euclidean Schwinger functions, and the Nelson reconstruction theorem on Euclidean fields. For the sake of completeness, we sketch the main ideas of a proposal, advanced in Guerra and Ruggiero (1973), according to which the Euclidean field theory can be interpreted as a stochastic field theory in the physical Minkowski spacetime. Our treatment will be as simple as possible, by relying on the basic structural properties, and by describing methods of presumably very long lasting power. The emphasis given to probabilistic methods, and to the statistical mechanics analogy, is a result of the historical development. Our opinion is that not all possibilities of Euclidean field theory have been fully exploited yet, both from technical and physical points of view.
One-Particle Systems A system made of only one relativistic scalar particle, of mass $m > 0$, has a quantum state space represented by the positive-frequency solutions of the Klein–Gordon equation. In momentum space, with points $p_\mu$, $\mu = 0, 1, 2, 3$, let us introduce the upper mass hyperboloid, characterized by the constraints $p^2 \equiv p_0^2 - \sum_{i=1}^{3} p_i^2 = m^2$, $p_0 \ge m$, and the relativistic invariant measure on it, formally given by $d\mu(p) = \theta(p_0)\,\delta(p^2 - m^2)\, dp$, where $\theta$ is the step function, $\theta(x) = 1$ if $x \ge 0$ and $\theta(x) = 0$ otherwise, and $dp$ is the four-dimensional Lebesgue measure. The Hilbert space of quantum states $F$ is given by the square-integrable functions on the mass hyperboloid equipped with the invariant measure $d\mu(p)$. Since in some reference frame the mass hyperboloid is uniquely characterized by the space values of the momentum $\mathbf{p}$, with the energy given by $p_0 \equiv \omega(\mathbf{p}) = \sqrt{\mathbf{p}^2 + m^2}$, the Hilbert space $F$ of the states is, in fact, made of those complex-valued tempered distributions $f$ in the configuration space $\mathbb{R}^3$ whose Fourier transforms, $\tilde f(\mathbf{p})$, are square-integrable functions in momentum space with respect to the image of the relativistic invariant measure $d\mathbf{p}/2\omega(\mathbf{p})$, where
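$d\mathbf{p}$ is the Lebesgue measure in momentum space. The scalar product on $F$ is defined by
$$\langle f, g \rangle_F = (2\pi)^3 \int \tilde f^*(\mathbf{p})\, \tilde g(\mathbf{p})\, \frac{d\mathbf{p}}{2\omega(\mathbf{p})}$$
where we have normalized the Fourier transform in such a way that
$$f(\mathbf{x}) = \int \exp(i\mathbf{p}\cdot\mathbf{x})\, \tilde f(\mathbf{p}) \, d\mathbf{p}$$
$$\tilde f(\mathbf{p}) = (2\pi)^{-3} \int \exp(-i\mathbf{p}\cdot\mathbf{x})\, f(\mathbf{x}) \, d\mathbf{x}$$
$$\int \exp(i\mathbf{p}\cdot\mathbf{x}) \, d\mathbf{p} = (2\pi)^3 \delta(\mathbf{x})$$
The scalar product on $F$ can also be expressed in the form
$$\langle f, g \rangle_F = \iint f^*(\mathbf{x}')\, W(\mathbf{x}' - \mathbf{x})\, g(\mathbf{x}) \, d\mathbf{x}' \, d\mathbf{x}$$
where we have introduced the two-point Wightman function at fixed time, defined by
$$W(\mathbf{x}' - \mathbf{x}) = (2\pi)^{-3} \int \exp(i\mathbf{p}\cdot(\mathbf{x}' - \mathbf{x}))\, \frac{d\mathbf{p}}{2\omega(\mathbf{p})}$$
A unitary irreducible representation of the Poincaré group can be defined on $F$ in the obvious way. In particular, the generators of space translations are given by multiplication by the components of $\mathbf{p}$ in momentum space, and the generator of time translations (the energy of the particle) is given by $\omega(\mathbf{p})$. For the scalar product of time-evolved wave functions, we can write
$$\langle \exp(-it'\omega) f, \exp(-it\omega) g \rangle_F = \iint f^*(\mathbf{x}')\, W(t' - t, \mathbf{x}' - \mathbf{x})\, g(\mathbf{x}) \, d\mathbf{x}' \, d\mathbf{x}$$
where we have introduced the two-point Wightman function, defined by
$$W(t' - t, \mathbf{x}' - \mathbf{x}) = (2\pi)^{-3} \int \exp(-i(t - t')\omega(\mathbf{p}))\, \exp(i\mathbf{p}\cdot(\mathbf{x}' - \mathbf{x}))\, \frac{d\mathbf{p}}{2\omega(\mathbf{p})}$$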
To the physical single-particle system living in Minkowski spacetime, we associate a kind of mathematical image, living in Euclidean space, from which all properties of the physical system can be easily derived. We start from the two-point Schwinger function
$$S(x) = \frac{1}{(2\pi)^4} \int \frac{\exp(ip\cdot x)}{p^2 + m^2} \, dp$$
which is the analytic continuation of the previously given two-point Wightman function into the Schwinger points. Here $x, p \in \mathbb{R}^4$, and $p\cdot x = \sum_{i=1}^{4} x_i p_i$. Here $dp$ and $dx$ are the Lebesgue measures in the $\mathbb{R}^4$ momentum and configuration spaces, respectively. The function $S(x)$ is positive and analytic for $x \ne 0$, decreases as $\exp(-m\|x\|)$ as $x \to \infty$, and satisfies the equation
$$(-\Delta + m^2)\, S(x) = \delta(x)$$
where $\Delta = \sum_{i=1}^{4} \partial^2/\partial x_i^2$ is the Laplacian in four dimensions. The mathematical image we are looking for is described by the Hilbert space $N$ of those tempered distributions in four-dimensional configuration space $\mathbb{R}^4$ whose Fourier transforms are square integrable with respect to the measure $dp/(p^2 + m^2)$. The scalar product on $N$ is defined by
$$\langle f, g \rangle_N = (2\pi)^4 \int \tilde f^*(p)\, \tilde g(p)\, \frac{dp}{p^2 + m^2}$$
Four-dimensional Fourier transforms are normalized as follows:
$$f(x) = \int \exp(ip\cdot x)\, \tilde f(p) \, dp$$
$$\tilde f(p) = (2\pi)^{-4} \int \exp(-ip\cdot x)\, f(x) \, dx$$
$$\int \exp(ip\cdot x) \, dp = (2\pi)^4 \delta(x)$$
We also write
$$\langle f, g \rangle_N = \iint f^*(x)\, S(x - y)\, g(y) \, dx \, dy = \left\langle f, (-\Delta + m^2)^{-1} g \right\rangle$$
where $\langle \cdot, \cdot \rangle$ is the ordinary Lebesgue product defined on Fourier transforms and, in momentum space, $(-\Delta + m^2)^{-1}$ amounts to a multiplication by $(p^2 + m^2)^{-1}$. The Schwinger function $S(x - y)$ is formally the kernel of the operator $(-\Delta + m^2)^{-1}$. The Hilbert space $N$ is the carrier space of a unitary (nonirreducible) representation of the four-dimensional Euclidean group $E(4)$. In fact, let $(a, R)$ be an element of $E(4)$
$$(a, R) : \mathbb{R}^4 \to \mathbb{R}^4, \qquad x \to Rx + a$$
where $a \in \mathbb{R}^4$, and $R$ is an orthogonal matrix, $RR^T = R^T R = 1_4$. Then the transformation $u(a, R)$ defined by
$$u(a, R) : N \to N, \qquad f(x) \to (u(a, R)f)(x) = f(R^{-1}(x - a))$$
provides the representation. In particular, we consider the reflection $r_0$ with respect to the hyperplane
$x_4 = 0$, and the translations $u(t)$ in the $x_4$-direction. Then we have $r_0 u(t) r_0 = u(-t)$, and analogously for other hyperplanes. Now we introduce a local structure on $N$ by considering, for any closed region $A$ of $\mathbb{R}^4$, the subspace $N_A$ of $N$ made by distributions in $N$ with support on $A$. We call $e_A$ the orthogonal projection on $N_A$. It is obvious that if $A \subset B$ then $N_A \subset N_B$ and $e_A e_B = e_B e_A = e_A$. A kind of (pre)Markov property for one-particle systems is introduced as follows. Consider a closed three-dimensional piecewise smooth manifold $\sigma$, which divides $\mathbb{R}^4$ in two closed regions $A$ and $B$, having $\sigma$ in common. Therefore, $\sigma \subset A$, $\sigma \subset B$, $A \cap B = \sigma$, $A \cup B = \mathbb{R}^4$. Let $N_A$, $N_B$, $N_\sigma$, and $e_A$, $e_B$, $e_\sigma$ be the associated subspaces and projections, respectively. Then $N_\sigma \subset N_A$, $N_\sigma \subset N_B$, and $e_\sigma e_A = e_A e_\sigma = e_\sigma$, $e_\sigma e_B = e_B e_\sigma = e_\sigma$. It is very simple to prove the following:
Theorem 1 Let $e_A$, $e_B$, $e_\sigma$ be defined as above, then $e_A e_B = e_B e_A = e_\sigma$.
Clearly, it is enough to show that for any $f \in N$ we have $e_A e_B f \in N_\sigma$. In that case, $e_\sigma e_A e_B f = e_A e_B f$, from which the theorem easily follows. Since $e_A e_B f$ has support on $A$, we must show that for any $C_0^\infty$ function $g$ with support on $A^\circ$ (the interior of $A$) we have $\langle g, e_A e_B f \rangle = 0$. Then $e_A e_B f$ has support on $\sigma$, and the proof is complete. Now we have
$$\langle g, e_A e_B f \rangle = \langle (-\Delta + m^2) g, e_A e_B f \rangle_N = \langle e_A (-\Delta + m^2) g, e_B f \rangle_N = \langle (-\Delta + m^2) g, e_B f \rangle_N = \langle g, e_B f \rangle = 0$$
where we have used the definition of $\langle \cdot, \cdot \rangle_N$ in terms of $\langle \cdot, \cdot \rangle$, the fact that $e_A(-\Delta + m^2) g = (-\Delta + m^2) g$, since $(-\Delta + m^2) g$ has support on $A^\circ$, and the fact that $e_B f$ has support on $B$. This ends the proof of the (pre)Markov property for one-particle systems.
A very important role in the theory is played by subspaces of $N$ associated to hyperplanes in $\mathbb{R}^4$. To fix ideas, consider the hyperplane $x_4 = 0$ and the associated subspace $N_0$. A tempered distribution in $N$ with support on $x_4 = 0$ has necessarily the form $(f \otimes \delta_0)(x) \equiv f(\mathbf{x})\,\delta(x_4)$, with $f \in F$.
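The (pre)Markov property of Theorem 1 has an elementary finite-dimensional analogue that can be checked exactly. The sketch below is a toy model under stated assumptions, not the construction of the text: $\mathbb{R}^4$ is replaced by a chain of 7 sites, $-\Delta + m^2$ by the tridiagonal matrix `Q` with $m^2 = 1$, and $\langle f, g \rangle_N$ by $f^T C g$ with $C = Q^{-1}$. The regions $A = \{0,\dots,3\}$ and $B = \{3,\dots,6\}$ share the single site $\sigma = \{3\}$. All names are hypothetical, and rational arithmetic is used so that the identity $e_A e_B = e_B e_A = e_\sigma$ holds without rounding.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M X = rhs exactly by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [list(M[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

SITES = 7
MASS_SQ = Fraction(1)
ZERO, ONE = Fraction(0), Fraction(1)

# Discrete analogue of -Laplacian + m^2 on a chain (nearest-neighbour coupling).
Q = [[2 + MASS_SQ if i == j else (-ONE if abs(i - j) == 1 else ZERO)
      for j in range(SITES)] for i in range(SITES)]
IDENTITY = [[ONE if i == j else ZERO for j in range(SITES)] for i in range(SITES)]
C = solve(Q, IDENTITY)   # kernel of the N scalar product: <f, g>_N = f^T C g

def projection(region):
    """Orthogonal projection, w.r.t. <.,.>_N, onto vectors supported on `region`."""
    sites = sorted(region)
    c_rr = [[C[i][j] for j in sites] for i in sites]
    c_r_all = [[C[i][j] for j in range(SITES)] for i in sites]
    coeff = solve(c_rr, c_r_all)          # rows of the projection on `sites`
    P = [[ZERO] * SITES for _ in range(SITES)]
    for row, i in zip(coeff, sites):
        P[i] = row
    return P

e_A = projection(range(0, 4))
e_B = projection(range(3, 7))
e_sigma = projection([3])
print(matmul(e_A, e_B) == e_sigma, matmul(e_B, e_A) == e_sigma)
```

The mechanism is the same as in the proof above: `Q` is local, so applying it to a vector supported strictly inside $A$ keeps the support inside $A$.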
By using the basic magic formula, for $x \ge 0$ and $M > 0$,
$$\int_{-\infty}^{+\infty} \frac{\exp(ipx)}{p^2 + M^2} \, dp = \frac{\pi}{M} \exp(-Mx)$$
it is immediate to verify that $\|f \otimes \delta_0\|_N = \|f\|_F$. Therefore, we have an isomorphic and isometric identification of the two Hilbert spaces $F$ and $N_0$. Obviously, similar considerations hold for any hyperplane. In particular, we consider the hyperplanes $x_4 = t$ and the associated subspaces $N_t$. Let us introduce injection operators $j_t$ defined by
$$j_t : F \to N, \qquad f \to f \otimes \delta_t$$
where $f$ is a generic element of $F$, with values $f(\mathbf{x})$, and $(f \otimes \delta_t)(x) \equiv f(\mathbf{x})\,\delta(x_4 - t)$. It is immediate to verify the following properties for $j_t$ and its adjoint $j_t^*$: the range of $j_t$ is $N_t$; moreover, $j_t$ is an isometry, so that $j_t^* j_t = 1_F$, $j_t j_t^* = e_t$, where $1_F$ is the identity on $F$, and $e_t$ is the projection on $N_t$. Moreover, $e_t j_t = j_t$ and $j_t^* = j_t^* e_t$. If we introduce translations $u(t)$ along the $x_4$-direction and the reflection $r_0$ with respect to $x_4 = 0$, then we also have the covariance property $u(t) j_s = j_{t+s}$, and the reflexivity property $r_0 j_0 = j_0$, $j_0^* r_0 = j_0^*$. The reflexivity property is very important. It tells us that $r_0$ leaves $N_0$ pointwise invariant, and it is an immediate consequence of the fact that $\delta(x_4) = \delta(-x_4)$. Therefore, if we start from $N$ we can obtain $F$, by taking the projection $j_\pi^*$ with respect to some hyperplane $\pi$, in particular $x_4 = 0$. It is also obvious that we can induce on $F$ a representation of $E(3)$ by taking those elements of $E(4)$ that leave $\pi$ invariant.
Let us now see how we can define the Hamiltonian on $F$ starting from the properties of $N$. Since we are considering the simple case of the one-particle system, we could just perform the following construction explicitly by hand, through a simple application of the basic magic formula given earlier. But we prefer to follow a route that emphasizes the Markov property and can be immediately generalized to more complicated cases. Let us introduce the operator $p(t)$ on $F$ defined by the dilation $p(t) = j_0^* j_t = j_0^* u(t) j_0$, $t \ge 0$. Then we prove the following:
Theorem 2 The operator $p(t)$ is bounded and self-adjoint. The family $\{p(t)\}$, for $t \ge 0$, is a norm-continuous semigroup.
Proof Boundedness and continuity are obvious. Self-adjointness is a consequence of reflexivity. In fact,
$$p^*(t) = j_0^* u(-t) j_0 = j_0^* r_0 u(t) r_0 j_0 = j_0^* u(t) j_0 = p(t)$$
The semigroup property is a consequence of the Markov property.
In fact, let us introduce $N_+$, $N_0$, $N_-$ as subspaces of $N$ made by distributions with support in the regions $x_4 \ge 0$, $x_4 = 0$, $x_4 \le 0$, respectively, and call $e_+$, $e_0$, $e_-$ the respective projections. By the Markov property, we have $e_0 = e_- e_+$. Now write, for $s, t \ge 0$,
$$p(t)p(s) = j_0^* u(t) j_0 j_0^* u(s) j_0 = j_0^* u(t)\, e_0\, u(s) j_0$$
If $e_0$ could be cancelled, then the semigroup property would follow from the group property of the translations $u(t)u(s) = u(t + s)$ (a miracle of the dilations!). For this, consider the matrix element
$$\langle f, p(t)p(s) g \rangle_F = \langle u(-t) j_0 f, e_0\, u(s) j_0 g \rangle_N$$
and recall $e_0 = e_- e_+$, and use $u(s) j_0 g \in N_+$ and $u(-t) j_0 f \in N_-$.
Let us call $h$ the generator of $p(t)$, so that $p(t) = \exp(-th)$, for $t \ge 0$. By definition, $h$ is the Hamiltonian of the physical system. A simple explicit calculation shows that $h$ is just the energy $\omega$ introduced earlier. Starting from the representation of the Euclidean group $E(3)$ already given and from the Hamiltonian, we immediately get a representation of the full Poincaré group on $F$. Therefore, all physical properties of the one-particle system have been reconstructed from its Euclidean image on the Hilbert space $N$. As a last remark of this section, let us note that we can consider the real Hilbert spaces $N_r$ and $F_r$, made of real elements (in configuration space) in $N$ and $F$. The operators $u(a, R)$, $u(t)$, $r_0$, $j_\pi$, $j_\pi^*$, $e_A$ are all reality preserving, that is, they map real spaces into real spaces. This completes our discussion about the one-particle system. For more details we refer to Guerra et al. (1975) and Simon (1974). We have introduced the Euclidean image, discussed its main properties, and shown how we can derive all properties of the physical system from its Euclidean image. In the next sections, we will show how this kind of construction carries through the second-quantized case and the interacting case.
Second Quantization and Free Fields We begin this section with a short review about the procedure of second quantization based on probabilistic methods, by following mainly Nelson (1973b); see also Guerra et al. (1975) and Simon (1974). Probabilistic methods are particularly useful in the framework of the Euclidean theory. Let $H$ be a real Hilbert space with symmetric scalar product $\langle \cdot, \cdot \rangle$. Let $\phi(u)$ be the elements of a family of centered Gaussian random variables indexed by $u \in H$, uniquely defined by the expectation values $E(\phi(u)) = 0$, $E(\phi(u)\phi(v)) = \langle u, v \rangle$. Since $\phi$ is Gaussian, we also have
$$E(\exp(\lambda\phi(u))) = \exp\left(\tfrac12 \lambda^2 \langle u, u \rangle\right)$$
and
$$E(\phi(u_1)\phi(u_2) \cdots \phi(u_n)) = [u_1 u_2 \cdots u_n]$$
Here $[\ldots]$ is the Hafnian of elements $[u_i u_j] = \langle u_i, u_j \rangle$, defined to be zero for odd $n$, and for even $n$ given by the recursive formula
$$[u_1 u_2 \cdots u_n] = \sum_{i=2}^{n} [u_1 u_i]\, [u_1 u_2 \cdots u_n]'$$
where in $[\ldots]'$ the terms $u_1$ and $u_i$ are suppressed. Hafnians, from the Latin name of Copenhagen, the first seat of the theoretical group of CERN, were introduced in quantum field theory by Caianiello (1973), as a useful tool when dealing with Bose statistics. Let $(Q, \Sigma, \mu)$ be the underlying probability space where the $\phi(u)$ are defined as random variables. Here $Q$ is a compact space, $\Sigma$ a $\sigma$-algebra of subsets of $Q$, and $\mu$ a regular, countably additive probability measure on $\Sigma$, normalized to $\mu(Q) = \int_Q d\mu = 1$. The fields $\phi(u)$ are represented by measurable functions on $Q$. The probability space is uniquely defined, but for trivial isomorphisms, if we assume that $\Sigma$ is the smallest $\sigma$-algebra with respect to which all fields $\phi(u)$, with $u \in H$, are measurable. Since the $\phi(u)$ are Gaussian, they are represented by $L^p(Q, \Sigma, \mu)$ functions, for any $p$ with $1 \le p < \infty$, and the expectations will be given by
$$E(\phi(u_1)\phi(u_2) \cdots \phi(u_n)) = \int_Q \phi(u_1)\phi(u_2) \cdots \phi(u_n) \, d\mu$$
where, by a mild abuse of notation, the $\phi(u_i)$ on the right-hand side denote the $Q$ space functions which represent the random variables $\phi(u_i)$. We call the complex Hilbert space $\mathcal{F} = \Gamma(H) = L^2(Q, \Sigma, \mu)$ the Fock space constructed on $H$, and the function $\Omega_0 \equiv 1$ on $Q$ the Fock vacuum. In order to introduce the concept of second quantization of operators, we must introduce subspaces of $\mathcal{F}$ with a "fixed number of particles." Call $\mathcal{F}^{(0)} = \{\lambda\Omega_0\}$, where $\lambda$ is any complex number. Define $\mathcal{F}^{(\le n)}$ as the subspace of $\mathcal{F}$ generated by complex linear combinations of monomials of the type $\phi(u_1) \cdots \phi(u_j)$, with $u_i \in H$, and $j \le n$. Then $\mathcal{F}^{(\le n-1)}$ is a subspace of $\mathcal{F}^{(\le n)}$. We define $\mathcal{F}^{(n)}$, the $n$-particle subspace, as the orthogonal complement of $\mathcal{F}^{(\le n-1)}$ in $\mathcal{F}^{(\le n)}$, so that
$$\mathcal{F}^{(\le n)} = \mathcal{F}^{(n)} \oplus \mathcal{F}^{(\le n-1)}$$
By construction, the $\mathcal{F}^{(n)}$ are orthogonal, and it is not difficult to verify that
$$\mathcal{F} = \bigoplus_{n=0}^{\infty} \mathcal{F}^{(n)}$$
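Returning to the Hafnian recursion for the Gaussian moments $E(\phi(u_1) \cdots \phi(u_n))$ stated above, the following sketch implements it directly. The function names and the convention that the empty Hafnian equals 1 (an empty product, needed to terminate the recursion) are assumptions of this example; the vectors are taken in $\mathbb{R}^2$ with the ordinary dot product standing in for the scalar product of $H$.

```python
def hafnian(vectors, inner):
    """[u1 u2 ... un] by the recursion: pair u1 with each u_i and
    multiply by the Hafnian of the remaining elements."""
    n = len(vectors)
    if n == 0:
        return 1          # empty product (convention assumed here)
    if n % 2 == 1:
        return 0          # odd moments of a centered Gaussian family vanish
    first, rest = vectors[0], vectors[1:]
    total = 0
    for i, u_i in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]     # u1 and u_i suppressed
        total += inner(first, u_i) * hafnian(remaining, inner)
    return total

def dot(u, v):
    """Ordinary scalar product, standing in for <u, v> on H."""
    return sum(a * b for a, b in zip(u, v))

u, v = (1, 0), (1, 1)
print(hafnian([u, u, v, v], dot))      # E(phi(u)^2 phi(v)^2)
print(hafnian([(1,)] * 6, dot))        # sixth moment of a standard Gaussian
```

The first value can be compared with the pairing expansion $\langle u, u \rangle \langle v, v \rangle + 2\langle u, v \rangle^2$, the second with $5!! = 15$.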
Let us now introduce the Wick normal products by the definition
$$:\!\phi(u_1)\phi(u_2) \cdots \phi(u_n)\!: \; = E(n)\, \phi(u_1)\phi(u_2) \cdots \phi(u_n)$$
where $E(n)$ is the projection on $\mathcal{F}^{(n)}$. It is not difficult to prove the usual Wick theorem (see, e.g., Guerra et al. (1975)), and its inversion given by Caianiello (1973). It is interesting to remark that, in the framework of the second quantization performed with probabilistic methods, it is not necessary to introduce creation and destruction operators as in the usual treatment. However, the two procedures are completely equivalent, as shown, for example, in Simon (1974). Given an operator $A$ from the real Hilbert space $H_1$ to the real Hilbert space $H_2$, we define its second-quantized operator $\Gamma(A)$ through the following definitions:
$$\Gamma(A)\,\Omega_{01} = \Omega_{02}$$
$$\Gamma(A) :\!\phi_1(u_1)\phi_1(u_2) \cdots \phi_1(u_n)\!: \; = \; :\!\phi_2(Au_1)\phi_2(Au_2) \cdots \phi_2(Au_n)\!:$$
where we have introduced the probability spaces $Q_1$ and $Q_2$, their vacua $\Omega_{01}$ and $\Omega_{02}$, and the random variables $\phi_1$ and $\phi_2$, associated to $H_1$ and $H_2$, respectively. The following remarkable theorem by Nelson (1973b) gives a full characterization of $\Gamma(A)$, very useful in the applications.
Theorem 3 Let $A$ be a contraction from the real Hilbert space $H_1$ to the real Hilbert space $H_2$. Then $\Gamma(A)$ is an operator from $L^1_{(1)}$ to $L^1_{(2)}$ which is positivity preserving, $\Gamma(A)u \ge 0$ if $u \ge 0$, and such that $E(\Gamma(A)u) = E(u)$. Moreover, $\Gamma(A)$ is a contraction from $L^p_{(1)}$ to $L^p_{(2)}$ for any $p$, $1 \le p < \infty$. Finally, $\Gamma(A)$ is also a contraction from $L^p_{(1)}$ to $L^q_{(2)}$, with $q \ge p$, if $\|A\|^2 \le (p - 1)/(q - 1)$.
We have indicated with $L^p_{(1)}$, $L^p_{(2)}$ the $L^p$ spaces associated to $H_1$ and $H_2$, respectively. This is the celebrated best hypercontractive estimate given by Nelson. For the proof, we refer to the original paper of Nelson (1973b); see also Simon (1974). This completes our short review on the theory of second quantization based on probabilistic methods. The usual time-zero quantum field $\phi(u)$, $u \in F_r$, in the Fock representation, can be obtained through second quantization starting from $F_r$.
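The Wick products and the orthogonality of the $n$-particle subspaces introduced above can be made concrete in the simplest case of a single field $\phi(u)$ with $\langle u, u \rangle = 1$, where $:\!\phi(u)^n\!:$ reduces to the Hermite polynomial $\mathrm{He}_n(\phi(u))$. This identification is a standard fact rather than a statement of the text, and the helper names below are assumptions. The sketch builds $\mathrm{He}_n$ from its three-term recursion and checks $E(\mathrm{He}_m \mathrm{He}_n) = n!\,\delta_{mn}$ exactly, using the Gaussian moments $E(\xi^{2k}) = (2k-1)!!$.

```python
from math import factorial

def hermite(n):
    """Coefficients (lowest degree first) of He_n, from
    He_{k+1}(x) = x He_k(x) - k He_{k-1}(x), He_0 = 1, He_1 = x."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + cur                  # x * He_k
        for i, c in enumerate(prev):
            nxt[i] -= k * c              # - k * He_{k-1}
        prev, cur = cur, nxt
    return cur

def gaussian_moment(k):
    """E(xi^k) for a standard Gaussian: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result

def expectation_of_product(p, q):
    """E(p(xi) q(xi)) for polynomials given by coefficient lists."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

print(hermite(4))    # coefficients of :phi^4: = phi^4 - 6 phi^2 + 3
print(expectation_of_product(hermite(3), hermite(3)))
```

The vanishing of the mixed expectations is the one-variable shadow of the orthogonality of the subspaces $\mathcal{F}^{(n)}$.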
We call $(\bar Q, \bar\Sigma, \bar\mu)$ the underlying probability space, and $\mathcal{F} = \Gamma(F_r) = L^2(\bar Q, \bar\Sigma, \bar\mu)$ the Hilbert Fock space of the free physical particles. Now we introduce the free Markov field $\phi(f)$, $f \in N_r$, by taking $N_r$ as the starting point. We call $(Q, \Sigma, \mu)$ the associated probability space. We introduce the Hilbert space $\mathcal{N} = \Gamma(N_r) = L^2(Q, \Sigma, \mu)$, and the operators $U(a, R) = \Gamma(u(a, R))$, $R_0 = \Gamma(r_0)$, $U(t) = \Gamma(u(t))$, $E_A = \Gamma(e_A)$, and so on, for which the previous Nelson theorem holds (take $H_1 = H_2 = N_r$). Since in general $\Gamma(AB) = \Gamma(A)\Gamma(B)$, we have immediately the following expression of the Markov property $E_\sigma = E_A E_B$, where the closed regions $A$, $B$, $\sigma$ of the Euclidean space have the same properties as explained earlier in the proof of the (pre)Markov property for one-particle systems. It is obvious that $E_A$ can also be understood as conditional expectation with respect to the sub-$\sigma$-algebra $\Sigma_A$ generated by the field $\phi(f)$ with $f \in N_r$ and the support of $f$ on $A$. The relations, previously pointed out, between the $N_t$ subspaces and $F$ are also valid for their real parts $N_{rt}$ and $F_r$. Therefore, they carry through the second quantization procedure. We introduce $J_t = \Gamma(j_t)$ and $J_t^* = \Gamma(j_t^*)$; then the following properties hold. $J_t$ is an isometric injection of $L^p(\bar Q, \bar\Sigma, \bar\mu)$ into $L^p(Q, \Sigma, \mu)$; the range of $J_t$ as an operator $L^2 \to L^2$ is obviously $\mathcal{N}_t = \Gamma(N_{rt})$; moreover, $J_t J_t^* = E_t$. The free Hamiltonian $H_0$ is given for $t \ge 0$ by
$$J_0^* J_t = \exp(-tH_0) = \Gamma(\exp(-t\omega))$$
Moreover, we have the covariance property $U(t) J_0 = J_t$, and the reflexivity $R_0 J_0 = J_0$, $J_0^* R_0 = J_0^*$. These relations allow a very simple expression for the matrix elements of the Hamiltonian semigroup in terms of Markov quantities. In fact, for $u, v \in \mathcal{F}$ we have
$$\langle u, \exp(-tH_0) v \rangle = \int_Q (J_t u)^* J_0 v \, d\mu$$
In the next section, we will generalize this representation to the interacting case. Finally, let us derive the hypercontractive property of the free Hamiltonian semigroup. Since $\|\exp(-t\omega)\| \le \exp(-tm)$, where $m$ is the mass of the particle, we have immediately, by a simple application of Nelson's theorem,
$$\|\exp(-tH_0)\|_{p,q} \le 1$$
provided $q - 1 \le (p - 1)\exp(2tm)$, where $\|\ldots\|_{p,q}$ denotes the norm of an operator from $L^p$ to $L^q$ spaces.
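The hypercontractive condition can be turned around to give the smallest time after which $\exp(-tH_0)$ is a contraction from $L^p$ to $L^q$. The helper below is a small sketch of that arithmetic only; its name and argument checks are assumptions of this example, and $p > 1$ is required so that the ratio $(q-1)/(p-1)$ is finite.

```python
import math

def min_contraction_time(p, q, m):
    """Smallest t >= 0 with q - 1 <= (p - 1) * exp(2 * t * m)."""
    if not (1.0 < p <= q):
        raise ValueError("need 1 < p <= q")
    if m <= 0.0:
        raise ValueError("the mass m must be positive")
    return max(0.0, math.log((q - 1.0) / (p - 1.0)) / (2.0 * m))

t_24 = min_contraction_time(2.0, 4.0, 1.0)   # from L^2 to L^4 with m = 1
print(t_24)
```

The threshold scales as $1/m$, so a larger mass gap gives hypercontractivity after a shorter time.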
Interacting Fields The discussion of the previous sections was limited to free fields both in Minkowski and Euclidean spaces. Now we must introduce interaction in order to get nontrivial theories.
First, as a general motivation, we will proceed quite formally and then we will resort to precise statements. Let us recall that in standard quantum field theory, for scalar self-coupled fields, the time-ordered products of quantum fields in Minkowski spacetime can be expressed formally through the formula
$$\frac{\left\langle T\left(\phi(x_1) \cdots \phi(x_n) \exp\left(i \int \mathcal{L} \, dx\right)\right)\right\rangle}{\left\langle T \exp\left(i \int \mathcal{L} \, dx\right)\right\rangle}$$
where $T$ denotes time ordering, $\phi$ are free fields in Minkowski spacetime, $\mathcal{L}$ is the interaction Lagrangian, and $\langle \ldots \rangle$ are vacuum averages. As is well known, this expression can be put, for example, at the basis of perturbative expansions, giving rise to terms expressed through Feynman graphs. The appropriately chosen normalization provides automatic cancellation of the vacuum-to-vacuum graphs. Now we can introduce a formal analytic continuation to the Schwinger points, as previously done for the one-particle system, and obtain the following expression for the analytic continuation of the field time-ordered products, now called Schwinger functions,
$$S(x_1, \ldots, x_n) = \frac{\langle \phi(x_1) \cdots \phi(x_n) \exp(-U) \rangle}{\langle \exp(-U) \rangle}$$
Here $x_1, \ldots, x_n$ denote points in Euclidean space, and $\phi$ are the Euclidean fields introduced earlier. The chronological time ordering disappears, because the fields are commutative, and there is no distinguished "time" direction in Euclidean space. Here the symbol $\langle \ldots \rangle$ denotes the expectation values represented by $\int \ldots \, d\mu$, as explained earlier, and $U$ is the Euclidean "action" of the system formally given by the integral on Euclidean space
$$U = \int P(\phi(x)) \, dx$$
if the field self-interaction is produced by the polynomial $P$. Therefore, these formal considerations suggest that the passage from the free Euclidean theory to the fully interacting one is obtained through a change of the free probability measure $d\mu$ to the interacting measure
$$\exp(-U) \, d\mu \Big/ \int_Q \exp(-U) \, d\mu$$
The analogy with classical statistical mechanics is evident. The expression $\exp(-U)$ acts as the Boltzmann factor, and $Z = \int_Q \exp(-U)\,d\mu$ is the partition function. Our task will be to make these statements precise from a mathematical point of view. We will be obliged to introduce cutoffs, and then be involved in their careful removal. For the sake of convenience, we make the substantial simplification of considering only two-dimensional theories (one space, one time dimension in the Minkowski region), for which the well-known ultraviolet problem of quantum field theory gives no trouble. There is no difficulty in translating the contents of the previous sections to the two-dimensional case. Let $P$ be a real polynomial, bounded below and normalized to $P(0) = 0$. We introduce approximations $h$ to the Dirac $\delta$ function at the origin of the two-dimensional Euclidean space $\mathbb{R}^2$, with $h \in N_r$. Let $h_x$ be the translate of $h$ by $x$, with $x \in \mathbb{R}^2$. The introduction of $h$, equivalent to some ultraviolet cutoff, is necessary, because local fields, of the formal type $\phi(x)$, have no rigorous meaning, and some smearing is necessary. For some compact region $\Lambda$ in $\mathbb{R}^2$, acting as space cutoff (infrared cutoff), introduce the Q space function
\[ U_\Lambda^{(h)} = \int_\Lambda {:}P(\phi(h_x)){:}\,dx \]
where $dx$ is the Lebesgue measure in $\mathbb{R}^2$. It is immediate to verify that $U_\Lambda^{(h)}$ is well defined, bounded below, and belongs to $L^p(Q, \Sigma, \mu)$ for any $p$, $1 \le p < \infty$. This is the infrared and ultraviolet cutoff action. Notice the presence of the Wick normal products in its definition. They provide a kind of automatic introduction of counterterms, in the framework of renormalization theory. The following theorem allows us to remove the ultraviolet cutoff.

Theorem 4 Let $h \to \delta$, in the sense that the Fourier transforms $\tilde h$ are uniformly bounded and converge pointwise in momentum space to the Fourier transform of the $\delta$-function, given by $(2\pi)^{-2}$. Then $U_\Lambda^{(h)}$ is $L^p$-convergent for any $p$, $1 \le p < \infty$, as $h \to \delta$. Call $U_\Lambda$ the $L^p$-limit; then $U_\Lambda, \exp(-U_\Lambda) \in L^p(Q, \Sigma, \mu)$ for $1 \le p < \infty$.

The proof uses standard methods of probability theory, and originates from the pioneering work of Nelson (1966). It can be found, for example, in Guerra et al. (1975) and Simon (1974). Since $U_\Lambda$ is defined with normal products, and the interaction polynomial $P$ is normalized to $P(0) = 0$, an elementary application of the Jensen inequality gives
\[ \int_Q \exp(-U_\Lambda)\,d\mu \ge \exp\Big(-\int_Q U_\Lambda\,d\mu\Big) = 1 \]
Therefore, we can rigorously define the new space cutoff measure in Q space:
\[ d\mu_\Lambda = \exp(-U_\Lambda)\,d\mu \Big/ \int_Q \exp(-U_\Lambda)\,d\mu \]
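The role of Wick ordering in the Jensen bound can be checked in a toy setting. The sketch below is only an illustration under strong assumptions that are not part of the text: the field is replaced by a single standard Gaussian variable, the interaction is the quartic $P(x) = x^4$, and the Wick-ordered action is taken to be ${:}x^4{:} = x^4 - 6x^2 + 3$ (the normal-ordering formula for unit variance). The helper names `wick4` and `gaussian_expectation` are hypothetical.

```python
import math

def wick4(x):
    """Wick-ordered quartic :x^4: = x^4 - 6 x^2 + 3 for a unit-variance Gaussian."""
    return x**4 - 6.0 * x**2 + 3.0

def gaussian_expectation(f, half_width=12.0, n=20_000):
    """Trapezoidal approximation of E[f(X)] for X ~ N(0, 1)."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

# Toy "action" U = :x^4: has zero mean, so Jensen gives E[exp(-U)] >= exp(-E[U]) = 1.
mean_U = gaussian_expectation(wick4)
Z = gaussian_expectation(lambda x: math.exp(-wick4(x)))

print("E[U]       =", mean_U)
print("E[exp(-U)] =", Z)
```

In this one-variable model the vanishing mean of the Wick-ordered action is what makes the lower bound equal to 1, which is the mechanism the text attributes to normal ordering together with $P(0) = 0$.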
The space-cutoff interacting Euclidean theory is defined by the same fields on Q space, but with a change in the measure and, therefore, in the expectation values. The correlations for the interacting fields are the cutoff Schwinger functions
\[ S_\Lambda(x_1,\dots,x_n) = \langle\phi(x_1)\cdots\phi(x_n)\rangle_\Lambda = Z_\Lambda^{-1}\langle\phi(x_1)\cdots\phi(x_n)\exp(-U_\Lambda)\rangle \]
where $\langle\cdot\rangle_\Lambda$ denotes the expectation with respect to $\mu_\Lambda$ and the partition function is
\[ Z_\Lambda = \langle\exp(-U_\Lambda)\rangle \]
We see that the analogy with statistical mechanics is complete here. Of course, the introduction of the space cutoff destroys translation invariance. The full Euclidean covariant theory must be recovered by taking the infinite-volume limit $\Lambda \to \mathbb{R}^2$ on field correlations. For the removal of the space cutoff, all methods of statistical mechanics are available. In particular, correlation inequalities of ferromagnetic type can be easily exploited, as shown, for example, in Guerra et al. (1975) and Simon (1974).

We would like to conclude this section by giving the connection between the space-cutoff Euclidean theory and the space-cutoff Hamiltonian theory in the physical Fock space. For $\ell \ge 0$, $t \ge 0$, consider the rectangle in $\mathbb{R}^2$,
\[ \Lambda(\ell, t) = \Big\{(x_1, x_2) : -\frac{\ell}{2} \le x_1 \le \frac{\ell}{2},\ 0 \le x_2 \le t\Big\} \]
and define the operator in the physical Fock space
\[ P_\ell(t) = J_0^* \exp(-U(\ell, t))\, J_t \]
where $J_0$ and $J_t$ are injections relative to the lines $x_2 = 0$ and $x_2 = t$, respectively. Then the following theorem, largely due to Nelson, holds.

Theorem 5 The operator $P_\ell(t)$ is bounded and self-adjoint. The family $\{P_\ell(t)\}$, for $\ell$ fixed and $t \ge 0$, is a strongly continuous semigroup. Let $H_\ell$ be its lower bounded self-adjoint generator, so that $P_\ell(t) = \exp(-tH_\ell)$. On the physical Fock space, there is a core $D$ for $H_\ell$ such that on $D$ the equality $H_\ell = H_0 + V_\ell$ holds, where $H_0$ is the free Hamiltonian introduced earlier and $V_\ell$ is the volume-cutoff interaction given by
\[ V_\ell = \lim \int_{-\ell/2}^{\ell/2} {:}P(\phi(h_{x_1})){:}\,dx_1 \]
where $h_{x_1}$ are the translates of approximations to the $\delta$-function at the origin on the $x_1$-space, and the limit is taken in $L^p$, in analogy to what has been explained for the two-dimensional case in the definition of $U_\Lambda$.

While we refer to Guerra et al. (1975) and Simon (1974) for a full proof, we mention here that boundedness is related to hypercontractivity of the free Hamiltonian, self-adjointness is a consequence of reflexivity, and the semigroup property follows from the Markov property. This theorem is remarkable, because it expresses the cutoff interacting Hamiltonian semigroup in an explicit form in the Euclidean theory through probabilistic expectations. In fact, we have
\[ \langle u, \exp(-tH_\ell)\,v\rangle = \int_Q (J_t u)^*\, J_0 v\, \exp(-U(\ell, t))\,d\mu \]
This expression could be called the Feynman–Kac–Nelson formula. In fact, it is nothing but a path integral expressed in stochastic terms and adapted to the Hamiltonian semigroup. By comparison with the analogous formula given for the free Hamiltonian semigroup, we see that the introduction of the interaction inserts the Boltzmann factor under the integral. As an immediate consequence of the Feynman–Kac–Nelson formula, together with Euclidean covariance, we have the following astonishing Nelson symmetry:
\[ \langle\Omega_0, \exp(-tH_\ell)\,\Omega_0\rangle = \langle\Omega_0, \exp(-\ell H_t)\,\Omega_0\rangle \]
which was at the basis of Guerra (1972) and Guerra et al. (1972), and played some role in showing the effectiveness of Euclidean methods in constructive quantum field theory. It is easy to establish, through simple probabilistic reasoning, that $H_\ell$ has a unique ground state $\Omega_\ell$ of lowest energy $E_\ell$. For a convenient choice of normalization and phase factor, one has $\|\Omega_\ell\|_2 = 1$ and $\Omega_\ell > 0$ almost everywhere on Q space (for bosonic systems, ground states have no nodes in configuration space!). Moreover, $\Omega_\ell \in L^p$ for any $1 \le p < \infty$. If $\ell > 0$ and the interaction is not trivial, then $\Omega_\ell \ne \Omega_0$, $E_\ell < 0$, and $\|\Omega_\ell\|_1 < 1$. Obviously, $\|\exp(-tH_\ell)\|_{2,2} = \exp(-tE_\ell)$.

The general structure of Euclidean field theory, as explained in this section, has been at the basis of all applications in constructive quantum field theory. These applications include the proof of the existence of the infinite-volume limit, with the establishment of all Wightman axioms, for two- and three-dimensional theories. Moreover, the existence of phase transitions and symmetry breaking has been firmly established.
Extensions have also been given to theories involving Fermions, and to gauge field theory. Due to the scope of this review, limited to a description of the general structure of Euclidean field theory, we cannot give a detailed treatment of these applications. Therefore, we refer to recent general reviews on constructive quantum field theory for a complete description of all results (see, e.g., Jaffe (2000)). For recent applications of Euclidean field theory to quantum fields on curved spacetime manifolds we refer, for example, to Schlingemann (1999).
The Physical Interpretation of Euclidean Field Theory

Euclidean field theory has been considered by most researchers as a very useful tool for the study of quantum field theory. In particular, it is quite easy, for example, to obtain the fully interacting Schwinger functions in the infinite-volume limit in two-dimensional spacetime. At this point, there arises the problem of connecting these Schwinger functions with observable physical quantities in Minkowski spacetime. A very deep result of Osterwalder and Schrader (1973) gives a very natural interpretation of the resulting limiting theory. In fact, the Euclidean theory, as has been shown earlier, arises from an analytic continuation from the physical Minkowski spacetime to the Schwinger points, through a kind of analytic continuation in time (also called Wick rotation, because Wick exploited this trick in the study of the Bethe–Salpeter equation). Therefore, having obtained the Schwinger functions for the full covariant theory, after all cutoff removal, it is very natural to try to reproduce the inverse analytic continuation in order to recover the Wightman functions in Minkowski spacetime. Osterwalder and Schrader were able to identify a set of conditions, quite easy to verify, which allow us to recover Wightman functions from Schwinger functions. A key role in this reconstruction theorem is played by the so-called reflection positivity for Schwinger functions, a property quite easy to verify. In this way, a fully satisfactory solution for the physical interpretation of Euclidean field theory is achieved.

From a historical point of view, an alternate route is possible. In fact, at the beginning of the exploitation of Euclidean methods in constructive quantum field theory, Nelson was able to isolate a set of axioms for the Euclidean fields (Nelson 1973a), allowing the reconstruction of the physical theory. Of course, the Nelson axioms are more difficult to verify, since they also involve properties of the Euclidean fields and not only of the Schwinger functions. However, it is still very interesting to investigate whether the Euclidean fields play only an auxiliary role in the construction of the physical content of relativistic theories, or if they have a more fundamental meaning.

From a physical point of view, the following considerations could also lead to further developments along this line. By its very structure, the Euclidean theory contains the fixed-time quantum correlations in the vacuum. In elementary quantum mechanics, it is possible to derive all physical content of the theory from the simple knowledge of the ground state wave function, including scattering data. Therefore, at least in principle, it should be possible to derive all physical content of the theory directly from the Euclidean theory, without any analytic continuation.

We conclude this short section on the physical interpretation of the Euclidean theory with a mention of a quite surprising result (Guerra and Ruggiero 1973) obtained by submitting classical field theory to the procedure of stochastic quantization in the sense of Nelson (1985). The procedure of stochastic quantization associates a stochastic process to each quantum state. In this case, in a fixed reference frame, the procedure of stochastic quantization, applied to interacting fields, produces, for the ground state, a process in the physical spacetime that has the same correlations as Euclidean field theory. This opens the way to a possible interpretation of Euclidean field theory directly in Minkowski spacetime. However, a consistent development along this line requires a new formulation of representations of the Poincaré group in the form of measure-preserving transformations in the probability space where the Euclidean fields are defined. This difficult task has not been accomplished as yet.
Acknowledgments Research connected with this work was supported in part by MIUR (Italian Minister of Instruction, University and Research), and by INFN (Italian National Institute for Nuclear Physics). See also: Axiomatic Quantum Field Theory; Constructive Quantum Field Theory; Feynman Path Integrals; Functional Integration in Quantum Physics; High Tc Superconductor Theory; Malliavin Calculus; Quantum Chromodynamics; Quantum Field Theory: A Brief Introduction; Quantum Fields with Indefinite Metric: Non-Trivial Models; Relativistic Wave Equations including Higher Spin Fields; Renormalization: General Theory; Two-dimensional Models.
Further Reading

Caianiello ER (1973) Combinatorics and Renormalization in Quantum Field Theory. Reading, MA: Benjamin.
Glimm J and Jaffe A (1981) Quantum Physics, A Functional Integral Point of View. Berlin: Springer.
Guerra F (1972) Uniqueness of the vacuum energy density and van Hove phenomenon in the infinite volume limit for two-dimensional self-coupled Bose fields. Physical Review Letters 28: 1213.
Guerra F, Rosen L, and Simon B (1972) Nelson's symmetry and the infinite volume behavior of the vacuum in $P(\phi)_2$. Communications in Mathematical Physics 27: 10.
Guerra F, Rosen L, and Simon B (1975) The $P(\phi)_2$ Euclidean quantum field theory as classical statistical mechanics. Annals of Mathematics 101: 111.
Guerra F and Ruggiero P (1973) New interpretation of the Euclidean Markov field in the framework of physical Minkowski space-time. Physical Review Letters 31: 1022.
Jaffe A (2000) Constructive quantum field theory, available on http://www.arthurjaffe.com.
Nakano T (1959) Quantum field theory in terms of Euclidean parameters. Progress in Theoretical Physics 21: 241.
Nelson E (1966) A quartic interaction in two dimensions. In: Goodman R and Segal I (eds.) Conference on the Mathematical Theory of Elementary Particles. Cambridge, MA: MIT Press.
Nelson E (1973a) Construction of quantum fields from Markoff fields. Journal of Functional Analysis 12: 97.
Nelson E (1973b) The free Markoff field. Journal of Functional Analysis 12: 211.
Nelson E (1985) Quantum Fluctuations. Princeton, NJ: Princeton University Press.
Osterwalder K and Schrader R (1973) Axioms for Euclidean Green's functions. Communications in Mathematical Physics 31: 83.
Schlingemann D (1999) Euclidean field theory on a sphere, available on http://arXiv.org.
Schwinger J (1958) On the Euclidean structure of relativistic field theory. Proceedings of the National Academy of Sciences 44: 956.
Simon B (1974) The $P(\phi)_2$ Euclidean (Quantum) Field Theory. Princeton, NJ: Princeton University Press.
Streater R and Wightman AS (1964) PCT, Spin and Statistics and All That. New York: Benjamin.
Symanzik K (1966) Euclidean quantum field theory, I, Equations for a scalar model. Journal of Mathematical Physics 7: 510.
Symanzik K (1969) Euclidean quantum field theory. In: Jost R (ed.) Local Quantum Theory. New York: Academic Press.
Wightman AS (1967) An introduction to some aspects of the relativistic dynamics of quantized fields. In: Levy M (ed.) 1964 Cargèse Summer School Lectures. New York: Gordon and Breach.
Evolution Equations: Linear and Nonlinear
J Escher, Universität Hannover, Hannover, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction

In this article we present the semigroup approach to linear and nonlinear evolution equations in general Banach spaces. In the first part we introduce the general frame and explain the cornerstones of the widely developed theory of linear evolution equations. Besides the classical approach to linear evolution equations based on $C_0$-semigroups, we also give a brief introduction to the more recent theory of maximal regularity. The linear theory is not only important in its own right (which we illustrate by discussing applications to the heat equation, the Schrödinger equation, the wave equation, and the Maxwell equations), but it is also the indispensable basis for the theory of nonlinear evolution equations, which we present in the second part.
Linear Evolution Equations

Let $E_0$ be a Banach space, $T > 0$, and assume that $\mathcal{A} := \{A(t);\ t \in [0, T]\}$ is a family of closed linear operators in $E_0$. By this we mean that, given $t \in [0, T]$, there is a linear subspace $D(A(t))$ of $E_0$ and a linear mapping $A(t) : D(A(t)) \subset E_0 \to E_0$ such that the graph $\{(x, A(t)x);\ x \in D(A(t))\}$ of $A(t)$ is a closed subspace of $E_0 \times E_0$. Given a mapping $f : [0, T] \to E_0$ and a vector $u_0 \in E_0$, we study the following initial-value problem for $(\mathcal{A}, f, u_0)$: find a function $u \in C^1((0, T], E_0)$ such that $u(t) \in D(A(t))$ for $t \in (0, T]$ and
\[ u'(t) = A(t)u(t) + f(t), \quad t \in (0, T], \qquad u(0) = u_0 \qquad [1] \]
Sometimes we also call [1] the Cauchy problem of the linear evolution equation $u'(t) = A(t)u(t) + f(t)$. In the following, we will specify different conditions on $(\mathcal{A}, f, u_0)$ which guarantee the well-posedness of [1], and we shall discuss several examples of equations of type [1] which are relevant in mathematical physics.

Autonomous Homogeneous Equations

As in the case of ordinary differential equations in finite-dimensional spaces, it is convenient to consider first the autonomous version of [1], that is, we assume that $\mathcal{A}$ is trivial in the sense that $T = \infty$ and that $A(0) = A(t)$ for all $t \ge 0$. In order to simplify our notation, we set $A := A(0)$. We consider first the homogeneous problem
\[ u'(t) = Au(t), \quad t \in (0, \infty), \qquad u(0) = u_0 \qquad [2] \]
where $u_0 \in E_0$ is given. The question of the well-posedness of [2] is closely tied to the notion of a $C_0$-semigroup in $E_0$. Let $\mathcal{L}(E_0)$ denote the Banach space of all bounded linear operators on $E_0$, endowed with the usual operator norm. A one-parameter family $\mathcal{T} = \{T(t) \in \mathcal{L}(E_0);\ t \ge 0\}$ is called a "$C_0$-semigroup" in $\mathcal{L}(E_0)$ iff
1. $T(0) = \mathrm{id}_{E_0}$ (normalization),
2. $T(s + t) = T(s)T(t)$ for all $s, t \ge 0$ (semigroup property), and
3. $\lim_{t \to 0} T(t)x = x$ for all $x \in E_0$ (strong continuity at 0).
Given a $C_0$-semigroup $\mathcal{T}$, we define its (infinitesimal) generator $B$ by setting
\[ \mathrm{dom}(B) := \Big\{x \in E_0;\ \lim_{t \to 0} \frac{T(t)x - x}{t} \text{ exists in } E_0\Big\} \]
and by defining
\[ Bx := \lim_{t \to 0} \frac{T(t)x - x}{t} \quad \text{for } x \in \mathrm{dom}(B) \]
This clearly defines a linear operator in $E_0$, and it is well known that $B$ is closed and densely defined. Moreover, we have

Theorem 1 Assume that $A : D(A) \subset E_0 \to E_0$ is the generator of a $C_0$-semigroup $\{T(t);\ t \ge 0\}$. Then, given $u_0 \in D(A)$, problem [2] possesses a unique solution $u$ in $C^1([0, \infty), E_0)$, which is given by $u(t) = T(t)u_0$.

Under suitable additional assumptions it can be shown that the converse of Theorem 1 also holds true. We shall not go into these details; instead, we present the following characterization of generators of $C_0$-semigroups:

Theorem 2 (Hille–Yosida). The operator $A : D(A) \subset E_0 \to E_0$ generates a $C_0$-semigroup iff it is closed, densely defined, and there exist $\omega, M \in \mathbb{R}$ such that the resolvent set $\rho(A)$ of $A$ contains the ray $(\omega, \infty)$ and such that $\|(\lambda - \omega)^n (\lambda - A)^{-n}\| \le M$ for all $\lambda > \omega$ and all $n \in \mathbb{N}$.

In applications, it is in general rather difficult to derive a uniform estimate of powers of the resolvent of an unbounded operator. Luckily, generators of $C_0$-semigroups of contractions (i.e., $\|T(t)\|_{\mathcal{L}(E_0)} \le 1$ for all $t \ge 0$) can be characterized in a rather useful way. To formulate this result we call an operator $B : D(B) \subset E_0 \to E_0$ "dissipative" iff for any $x \in D(B)$ there is an $x' \in E_0'$ with $\langle x', x\rangle = \|x\|_{E_0}^2 = \|x'\|_{E_0'}^2$ such that $\mathrm{Re}\,\langle x', Bx\rangle \le 0$. Here $\langle\cdot,\cdot\rangle$ denotes the duality pairing between $E_0'$ and $E_0$. The operator $B$ is called "m-dissipative" if it is dissipative and $\mathrm{im}(\lambda_0 - B) = E_0$ for some $\lambda_0 > 0$.
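The semigroup axioms and the definition of the generator can be made concrete in finite dimensions, where every matrix $A$ generates the semigroup $T(t) = \exp(tA)$. The following is a minimal sketch, assuming $E_0 = \mathbb{R}^2$ and an arbitrarily chosen matrix (a damped oscillator); the helper names `mat_mul`, `mat_exp`, and `max_abs_diff` are hypothetical, and the exponential is computed by a truncated Taylor series, which is adequate only for small $\|tA\|$.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, t, terms=40):
    """T(t) = exp(t*a) via a truncated Taylor series (fine for small |t|*||a||)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        scaled = [[t * a[i][j] / k for j in range(n)] for i in range(n)]
        term = mat_mul(term, scaled)          # term = (t*a)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def max_abs_diff(a, b):
    """Largest entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

A = [[0.0, 1.0], [-1.0, -0.5]]   # assumed example generator
s, t = 0.7, 1.3

# Semigroup property: T(s + t) = T(s) T(t).
semigroup_error = max_abs_diff(mat_exp(A, s + t), mat_mul(mat_exp(A, s), mat_exp(A, t)))

# Generator: (T(h) x - x) / h should approach A x as h -> 0.
x0 = [1.0, 0.0]
h = 1e-6
Th = mat_exp(A, h)
quotient = [(sum(Th[i][j] * x0[j] for j in range(2)) - x0[i]) / h for i in range(2)]
Ax0 = [sum(A[i][j] * x0[j] for j in range(2)) for i in range(2)]
generator_error = max(abs(q - v) for q, v in zip(quotient, Ax0))

print(semigroup_error, generator_error)
```

In this bounded setting $\mathrm{dom}(B)$ is the whole space; the difficulties addressed by the Hille–Yosida and Lumer–Phillips theorems arise only for unbounded generators.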
Theorem 3 (Lumer–Phillips). Let $A : D(A) \subset E_0 \to E_0$ be a closed and densely defined operator. Then $A$ generates a $C_0$-semigroup of contractions in $\mathcal{L}(E_0)$ iff $A$ is m-dissipative.

Before we discuss examples of $C_0$-semigroups and their infinitesimal generators, let us introduce the following definition: given $\vartheta \in (0, \pi]$, let $\Sigma_\vartheta := \{z \in \mathbb{C};\ |\arg(z)| < \vartheta\}$ denote the sector in $\mathbb{C}$ of angle $2\vartheta$. A family of operators $\mathcal{T} = \{T(z) \in \mathcal{L}(E_0);\ z \in \Sigma_\vartheta\}$ is called a "holomorphic $C_0$-semigroup" in $\mathcal{L}(E_0)$ iff
1. $[z \mapsto T(z)] : \Sigma_\vartheta \to \mathcal{L}(E_0)$ is holomorphic,
2. $T(0) = \mathrm{id}_{E_0}$ and $\lim_{z \to 0} T(z)x = x$ for all $x \in E_0$, and
3. $T(w + z) = T(w)T(z)$ for all $w, z \in \Sigma_\vartheta$.
Generators of holomorphic $C_0$-semigroups can be characterized in the following way:

Theorem 4 A densely defined closed linear operator $A : D(A) \subset E_0 \to E_0$ generates a holomorphic $C_0$-semigroup iff there exist $M > 0$ and $\omega_0 \ge 0$ such that $\lambda \in \rho(A)$ and $\|\lambda(\lambda - A)^{-1}\| \le M$ for all $\lambda \in \mathbb{C}$ with $\mathrm{Re}\,\lambda > \omega_0$.

Examples 5

(i) Self-adjoint generators. Let $E_0$ be a Hilbert space and assume that $A$ is self-adjoint and that there exists an $\alpha_0 \in \mathbb{R}$ such that $A \le \alpha_0$. Then $A$ generates a holomorphic $C_0$-semigroup $\{T(t);\ t \ge 0\}$. If $\{E_A(\lambda);\ \lambda \in \mathbb{R}\}$ denotes the spectral resolution of $A$, then $T(t) = \int_{\mathbb{R}} \exp(t\lambda)\,dE_A(\lambda)$ for $t \ge 0$.

(ii) Dissipative operators in Hilbert spaces. Assume again that $E_0$ is a Hilbert space. Then, by Riesz' representation formula, an operator $A$ is dissipative iff $\mathrm{Re}\,(u|Au) \le 0$ for all $u \in D(A)$.

(iii) The heat semigroup. Let $M$ be either a smooth compact closed Riemannian manifold or $\mathbb{R}^m$ with the Euclidean metric, and write $\Delta$ for the Laplace–Beltrami operator on $M$. Then it is known that $\Delta \in \mathcal{L}(\mathcal{D}'(M))$, where $\mathcal{D}'(M)$ is the space of all distributions on $M$. Given $1 \le p < \infty$, let
\[ D(\Delta_p) := \{u \in L_p(M);\ \Delta u \in L_p(M)\} \]
and set $\Delta_p u = \Delta u$ for $u \in D(\Delta_p)$. Then $\Delta_p$ generates a holomorphic $C_0$-semigroup on $L_p(M)$, the so-called "diffusion" or "heat semigroup" on $M$. If $1 < p < \infty$, then it can be shown that $D(\Delta_p) = W_p^2(M)$, where $W_p^k(M)$ denotes the Sobolev space of order $k \in \mathbb{N}$, built over $L_p(M)$.
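If $M = \mathbb{R}^m$, then the operators $T(t)$ of the semigroup generated by the Laplacian on $\mathbb{R}^m$ are given by
\[ T(t)u(x) = \frac{1}{(4\pi t)^{m/2}} \int_{\mathbb{R}^m} \exp\Big(-\frac{|x - y|^2}{4t}\Big)\, u(y)\,dy \]
for all $t > 0$ and almost all $x \in \mathbb{R}^m$.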
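The kernel formula above can be tested numerically in one space dimension. The sketch below assumes $m = 1$ and the initial datum $u_0(y) = \exp(-y^2)$, for which the Gaussian convolution has the closed form $(1 + 4t)^{-1/2}\exp(-x^2/(1 + 4t))$; both the choice of datum and the helper names `heat_apply`, `u0`, and `exact` are illustrative assumptions, and the integral is approximated by a trapezoidal rule on a truncated interval.

```python
import math

def heat_apply(u, t, x, half_width=12.0, n=6000):
    """Approximate (T(t)u)(x) = (4 pi t)^(-1/2) * int exp(-(x-y)^2/(4t)) u(y) dy for m = 1."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        y = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-(x - y) ** 2 / (4.0 * t)) * u(y)
    return total * h / math.sqrt(4.0 * math.pi * t)

def u0(y):
    """Assumed initial datum: a Gaussian."""
    return math.exp(-y * y)

def exact(t, x):
    """Closed form of T(t)u0 for the Gaussian datum above."""
    return math.exp(-x * x / (1.0 + 4.0 * t)) / math.sqrt(1.0 + 4.0 * t)

t, s, x = 0.5, 0.25, 0.3

# The kernel formula reproduces the closed-form solution of the heat equation.
kernel_error = abs(heat_apply(u0, t, x) - exact(t, x))

# Semigroup property T(s) T(t) u0 = T(s + t) u0, evaluated at the point x.
semigroup_error = abs(heat_apply(lambda y: exact(t, y), s, x) - exact(s + t, x))

print(kernel_error, semigroup_error)
```

The second check is the semigroup property $T(s)T(t) = T(s + t)$ applied to this particular datum.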
Observe that the case $L_\infty(M)$ is excluded in Example (iii). In fact, it is known that if a linear operator $A$ generates a $C_0$-semigroup on $L_\infty(M)$, then $A$ must be bounded. However, it can be shown that suitable realizations of the Laplace–Beltrami operator on spaces of continuous and Hölder continuous functions generate holomorphic semigroups. For more details on that topic the reader is referred to the "Further reading" section.

(iv) Stone's theorem and the Schrödinger equation. Let $E_0$ be a Hilbert space and assume that $A$ is self-adjoint. Then Theorem 3 and Remark (ii) imply that $iA$ generates a $C_0$-group $\{U(t);\ t \in \mathbb{R}\}$ of unitary operators. In fact, Stone's theorem ensures that every generator of a $C_0$-group of unitary operators is of the form $iA$ with a self-adjoint operator $A$. As an example of particular interest, let us consider the Schrödinger equation
\[ \frac{1}{i}\frac{\partial u}{\partial t} = \Delta u - Vu \qquad [3] \]
with a bounded potential $V : \mathbb{R}^m \to \mathbb{R}$. Letting $D(A) := H^2(\mathbb{R}^m)$ and $Au := \Delta u - Vu$, it follows that $A$ is self-adjoint in $L_2(\mathbb{R}^m)$. Hence, the evolution of [3] is governed by the group of unitary operators generated by $iA$. Of course, the assumption that $V$ be bounded is rather restrictive. In fact, there are numerous contributions which show that this assumption can be weakened considerably. Again the reader is referred to the "Further reading" section for more details in this direction.

(v) The wave equation. Let us consider the following initial-value problem
\[ \Box u(t, x) = 0, \quad t > 0,\ x \in \mathbb{R}^m; \qquad u(0, x) = \varphi_1(x), \quad \frac{\partial u}{\partial t}(0, x) = \varphi_2(x), \quad x \in \mathbb{R}^m \qquad [4] \]
for the d'Alembert operator $\Box = \partial^2/\partial t^2 - \Delta_{\mathbb{R}^m}$ in $m + 1$ dimensions. In order to associate with [4] a semigroup, let us formally re-express [4] as the following first-order system:
\[ \frac{dU}{dt} = AU, \quad t > 0, \qquad U(0) = \Phi \]
where
\[ U = (u, u'), \qquad A = \begin{pmatrix} 0 & \mathrm{id} \\ \Delta & 0 \end{pmatrix}, \qquad \Phi = (\varphi_1, \varphi_2) \]
Letting now $E_0 := H^1(\mathbb{R}^m) \times L_2(\mathbb{R}^m)$ and $D(A) := H^2(\mathbb{R}^m) \times H^1(\mathbb{R}^m)$, it can be shown that $A$ generates a $C_0$-group of linear operators in $\mathcal{L}(E_0)$. Hence, given any initial datum $(\varphi_1, \varphi_2) \in H^2(\mathbb{R}^m) \times H^1(\mathbb{R}^m)$, there exists a unique solution $u \in C^1([0, \infty), L_2(\mathbb{R}^m))$ to the initial-value problem [4]. It can be shown that this solution possesses the following additional regularity:
\[ u \in C^2([0, \infty), L_2(\mathbb{R}^m)) \cap C([0, \infty), H^2(\mathbb{R}^m)) \]
Hence, eqns [4] are satisfied for all $t \in [0, \infty)$ and for almost all $x \in \mathbb{R}^m$.

(vi) Maxwell equations. Let $\mathbf{E}$ and $\mathbf{H}$ denote the electric and magnetic field vector, respectively, $\varepsilon$ and $\mu$ the electrical permittivity and magnetic permeability, respectively, and consider the initial-value problem for Maxwell equations in vacuum and without charges and currents: given sufficiently smooth vector fields $(\mathbf{E}^0, \mathbf{H}^0)$, find a pair $(\mathbf{E}, \mathbf{H})$ such that
\[ \varepsilon\frac{\partial \mathbf{E}}{\partial t} - \mathrm{rot}\,\mathbf{H} = 0 \ \text{ in } (0, \infty) \times \mathbb{R}^3, \qquad \mu\frac{\partial \mathbf{H}}{\partial t} + \mathrm{rot}\,\mathbf{E} = 0 \ \text{ in } (0, \infty) \times \mathbb{R}^3, \qquad \mathbf{E}(0, \cdot) = \mathbf{E}^0,\ \mathbf{H}(0, \cdot) = \mathbf{H}^0 \ \text{ in } \mathbb{R}^3 \qquad [5] \]
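We assume that $\varepsilon$ and $\mu$ belong to $L_\infty(\mathbb{R}^3, \mathcal{L}_{\mathrm{sym}}(\mathbb{R}^3))$ and are uniformly positive definite, that is, we assume that there are $\varepsilon_0 > 0$ and $\mu_0 > 0$ such that
\[ (\varepsilon(x)y|y) \ge \varepsilon_0 |y|^2, \qquad (\mu(x)y|y) \ge \mu_0 |y|^2 \]
for all $x, y \in \mathbb{R}^3$. Based on these assumptions we endow the space $L_2(\mathbb{R}^3) \times L_2(\mathbb{R}^3)$ with the inner product
\[ ((u_1, u_2)|(v_1, v_2)) := (\varepsilon u_1|v_1)_{L_2} + (\mu u_2|v_2)_{L_2} \]
for $(u_1, u_2), (v_1, v_2) \in L_2(\mathbb{R}^3) \times L_2(\mathbb{R}^3)$, and call this Hilbert space $E_0$. We further set
\[ E_1 := \{(u_1, u_2) \in E_0;\ (\mathrm{rot}\,u_1, \mathrm{rot}\,u_2) \in E_0\} \]
Finally, given $u = (u_1, u_2) \in E_1$, let
\[ Au := (\varepsilon^{-1}\,\mathrm{rot}\,u_2,\ -\mu^{-1}\,\mathrm{rot}\,u_1) \]
It can be shown that $iA$ is self-adjoint in $E_0$. Hence, Stone's theorem ensures that $A$ generates a $C_0$-group of unitary operators in $\mathcal{L}(E_0)$. Therefore, given $(\mathbf{E}^0, \mathbf{H}^0) \in E_1$, there exists a unique solution $(\mathbf{E}(\cdot), \mathbf{H}(\cdot))$ of [5]. For this solution, the energy functional
\[ \mathcal{E}(t) = \frac{1}{2}\int_{\mathbb{R}^3} \big[(\varepsilon \mathbf{E}(t)|\mathbf{E}(t))_{\mathbb{R}^3} + (\mu \mathbf{H}(t)|\mathbf{H}(t))_{\mathbb{R}^3}\big]\,dx \]
is constant on $[0, \infty)$.

Autonomous Inhomogeneous Equations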
Next, we study problem [1] in the case $A(t) = A$ for all $t \in [0, T)$. Throughout this section we assume that the following minimal hypotheses
1. $A$ generates a $C_0$-semigroup in $\mathcal{L}(E_0)$,
2. $f \in L_1((0, T), E_0)$, and
3. $u_0 \in E_0$
are satisfied. Later on we shall discuss several more restrictive assumptions on $(A, f, u_0)$. A function $u : [0, T] \to E_0$ is called a "(classical) solution" of
\[ u'(t) = Au(t) + f(t), \quad t \in (0, T], \qquad u(0) = u_0 \qquad [6] \]
iff $u \in C([0, T], E_0) \cap C^1((0, T], E_0)$, $u(t) \in D(A)$ for all $t \in (0, T]$, and $u$ satisfies [6] pointwise on $[0, T]$. It can be shown that [6] has at most one solution. If it has a solution, this solution is represented by the following variation-of-constants formula:
\[ u(t) = T(t)u_0 + \int_0^t T(t - s)f(s)\,ds, \quad t \in [0, T] \qquad [7] \]
where $\{T(t);\ t \ge 0\}$ denotes the semigroup generated by $A$. Observe that the function $u : [0, T] \to E_0$, defined by [7], is continuous, but in general not differentiable on $(0, T]$. For this reason one calls [7] the "mild solution" of [6]. It is not difficult to see that if $u_0 \in D(A)$ and $f \in C^1([0, T], E_0)$, then the mild solution is a classical solution, that is, [6] is uniquely solvable in the classical sense. In applications to nonlinear problems, the assumption $f \in C^1([0, T], E_0)$ is often too restrictive. Fortunately, in the case of generators of holomorphic semigroups, this assumption on $f$ can be weakened in two different directions. Let $\|x\|_A := \|x\|_{E_0} + \|Ax\|_{E_0}$ denote the graph norm on $D(A)$. Then the closedness of $A$ implies that $(D(A); \|\cdot\|_A)$ is a Banach space. In the following, we call this Banach space $E_1$. Moreover, given $\theta \in (0, 1)$, we write $E_\theta = (E_0, E_1)_\theta$ for the complex interpolation space between $E_0$ and $E_1$. Then we have the following result.

Theorem 6 Let $A$ generate a holomorphic $C_0$-semigroup in $\mathcal{L}(E_0)$ and assume that there is a constant $\theta \in (0, 1)$ such that
\[ f \in C^\theta([0, T], E_0) + C([0, T], E_\theta) \]
Then, given $u_0 \in E_0$, the Cauchy problem [6] possesses a unique classical solution. It is given by
\[ u(t) = T(t)u_0 + \int_0^t T(t - s)f(s)\,ds, \quad t \in [0, T] \]
where $\{T(t);\ t \ge 0\}$ stands for the semigroup generated by $A$.

In the following, we discuss an alternative approach to the Cauchy problem [6], which is based on the so-called theory of maximal regularity. There are several different types of results on maximal regularity, which we cannot discuss in full detail here. We decided to give a brief introduction to the theory of the so-called "maximal $L_p$-regularity." For further results on maximal regularity, we again draw the reader's attention to the "Further reading" section.

The Banach space $E_0$ is called an unconditionality of martingale "differences" (UMD) space if the Hilbert transform is bounded on $L_q(\mathbb{R}, E_0)$ for some $q \in (1, \infty)$. It is known that Hilbert spaces, the Lebesgue spaces $L_p(X, d\mu)$ with $1 < p < \infty$ and with a $\sigma$-finite measure space $(X, \mu)$, and closed subspaces of UMD spaces are UMD spaces. Furthermore, UMD spaces are without exception reflexive. Thus, the spaces $L_1(X, d\mu)$, $L_\infty(X, d\mu)$, and spaces of continuous or Hölder continuous functions are not UMD spaces.

Next, assume that $A$ generates a holomorphic $C_0$-semigroup in $\mathcal{L}(E_0)$ and that $[0, \infty) \subset \rho(A)$. Then, it is known that, given $z \in \mathbb{C}$, the fractional power $A^z$ of $A$ is a densely defined closed operator in $E_0$. We say that $A$ has bounded imaginary powers (BIP) of angle $\theta \ge 0$ if there exist positive constants $M$ and $\varepsilon$ such that
\[ A^{it} \in \mathcal{L}(E_0) \quad \text{and} \quad \|A^{it}\|_{\mathcal{L}(E_0)} \le M \exp(\theta |t|), \qquad t \in (-\varepsilon, \varepsilon) \qquad [8] \]
In order to have a neat notation, we write $A \in \mathrm{BIP}(\theta)$ if [8] holds true.

Remarks 7 In the following, we assume that $A$ generates a holomorphic $C_0$-semigroup in $\mathcal{L}(E_0)$ and that $[0, \infty) \subset \rho(A)$.

(i) If $\mathrm{Re}\,z < 0$, then $A^z$ is bounded on $E_0$.

(ii) There are several representation formulas for the fractional powers of $A$. Among them we picked the following: if $\mathrm{Re}\,z \in (-1, 1)$ and $x \in D(A)$, then
\[ A^z x = \frac{\sin(\pi z)}{\pi z} \int_0^\infty s^z (s + A)^{-2} A x\,ds \]

(iii) Assume that $E_0$ is a Hilbert space, that $A$ is self-adjoint, and that there is a positive constant $\alpha$ such that $A \ge \alpha$. Further, let $\{E_A(\lambda);\ \lambda \in \mathbb{R}\}$ be the spectral resolution of $A$; then
\[ A^z := \int_0^\infty \lambda^z\,dE_A(\lambda), \quad z \in \mathbb{C} \]
Moreover, $A \in \mathrm{BIP}(0)$.

(iv) Let again $E_0$ be a Hilbert space and assume that $A$ is m-dissipative and satisfies $0 \in \rho(A)$. Then $A \in \mathrm{BIP}(\pi/2)$.

Given $p \in (1, \infty)$, Sobolev's embedding theorem ensures that $W_p^1((0, T), E_0)$ is continuously injected into $C([0, T], E_0)$. Consequently, given any function $u \in W_p^1((0, T), E_0)$ and $t \in [0, T]$, the pointwise evaluation $u(t)$ is well defined. In particular, the trace at 0 with respect to time
\[ \mathrm{tr} : W_p^1((0, T), E_0) \to E_0, \quad u \mapsto u(0) \]
is a well-defined and bounded linear operator. In order to formulate the next result, let $E_{s,p} = (E_0, E_1)_{s,p}$, with $p \in (1, \infty)$ and $s \in (0, 1)$, denote the real interpolation space between the basic space $E_0$ and $E_1$, the domain $D(A)$ of $A$, endowed with the graph norm. Furthermore, we set
\[ \mathbb{E}_0 := L_p((0, T), E_0), \qquad \mathbb{E}_1 := L_p((0, T), E_1) \cap W_p^1((0, T), E_0) \]
and we write $\mathrm{Isom}(E, F)$ for the set of all topological isomorphisms mapping the Banach space $E$ onto the Banach space $F$.

Theorem 8 (Dore and Venni). Suppose that $E_0$ is a UMD space and that $A \in \mathrm{BIP}(\theta)$ for some $\theta \in [0, \pi/2)$. Then, given $p \in (1, \infty)$, we have
\[ (\partial_t + A, \mathrm{tr}) \in \mathrm{Isom}(\mathbb{E}_1, \mathbb{E}_0 \times E_{1-1/p,p}) \]
This means that, given $(f, u_0) \in L_p((0, T), E_0) \times E_{1-1/p,p}$, there exists a unique solution $u \in L_p((0, T), E_1) \cap W_p^1((0, T), E_0)$ of the Cauchy problem [6]. Moreover, $u$ depends continuously on $(f, u_0)$ and fulfills the following a priori estimate:
\[ \|u\|_{\mathbb{E}_1} \le c\,\big(\|f\|_{\mathbb{E}_0} + \|u_0\|_{E_{1-1/p,p}}\big) \]
where $c := \|(\partial_t + A, \mathrm{tr})^{-1}\|_{\mathcal{L}(\mathbb{E}_0 \times E_{1-1/p,p},\, \mathbb{E}_1)}$.
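As a sanity check on the Cauchy problem [6] and the variation-of-constants formula [7], one can look at the simplest possible case. The sketch below assumes the one-dimensional space $E_0 = \mathbb{R}$, the generator $A = a = -2$ (so that $T(t) = e^{at}$), the forcing $f(s) = \cos s$, and $u_0 = 1$; all of these values and the helper names `mild_solution` and `closed_form` are illustrative choices, not taken from the text. The integral in [7] is approximated by the composite Simpson rule and compared with the explicit solution of the scalar ODE.

```python
import math

a = -2.0      # assumed scalar generator, T(t) = exp(a t)
u0 = 1.0      # assumed initial value

def f(s):
    """Assumed inhomogeneity."""
    return math.cos(s)

def semigroup(t):
    return math.exp(a * t)

def mild_solution(t, n=2000):
    """Formula [7]: T(t) u0 + int_0^t T(t - s) f(s) ds, integral by composite Simpson (n even)."""
    if t == 0.0:
        return u0
    h = t / n
    total = semigroup(t) * f(0.0) + semigroup(0.0) * f(t)   # endpoint values of the integrand
    for k in range(1, n):
        s = k * h
        weight = 4.0 if k % 2 else 2.0
        total += weight * semigroup(t - s) * f(s)
    return semigroup(t) * u0 + total * h / 3.0

def closed_form(t):
    """Explicit solution of u' = a u + cos t, u(0) = u0."""
    particular = lambda r: (-a * math.cos(r) + math.sin(r)) / (a * a + 1.0)
    return (u0 - particular(0.0)) * math.exp(a * t) + particular(t)

t = 1.5
formula_error = abs(mild_solution(t) - closed_form(t))

# Residual of the differential equation, using a central difference for u'.
d = 1e-4
derivative = (mild_solution(t + d) - mild_solution(t - d)) / (2.0 * d)
ode_residual = abs(derivative - (a * mild_solution(t) + f(t)))

print(formula_error, ode_residual)
```

Since $f$ is smooth here, the mild solution is also a classical one, which is why the ODE residual is small; for merely integrable $f$ only formula [7] itself would be available.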
is stable, since any m-dissipative operator B satisfies the estimate k( B)1 k 1= for all > 0. It turns out that the stability of a family of generators is not sufficient to construct a solution of [1] even in the case f 0. We also need a certain time regularity of the mapping t 7! A(t). For this we say that the family {A(t); t 2 [0, T]} has a common domain D iff D is a dense subspace of E0 such that D(A(t)) = D for all t 2 [0, T]. The family {A(t); t 2 [0, T]} is called ‘‘strongly differentiable’’ iff it has a common domain D and, given v 2 D, the function t 7! A(t)v belongs to C1 ([0, T], E0 ). We are now prepared to formulate the following result. Theorem 9 (Kato). Let {A(t); t 2 [0, T]} be a stable and strongly differentiable family of generators of C0 -semigroups with common domain D. If f 2 C1 ([0, T], E0 ) and u0 2 D then [1] possesses a unique classical solution. The above result is based on the construction of an evolution operator U(t, s), which can be considered as the generalization of the notion of a C0 -semigroup for autonomous equations to the case evolution equations of the form u0 ðtÞ ¼ AðtÞuðtÞ;
t 2 ðs; T;
uðsÞ ¼ v
for fixed s 2 [0, T). Once an evolution operator is available, the solution of [1] is given by Z t uðtÞ ¼ Uðt; 0Þu0 þ Uðt; sÞf ðsÞ ds; t 2 ½0; T 0
Nonautonomous Equations of Hyperbolic Type
According to Theorem 1 and the corresponding remark, it is reasonable to impose in the study of the Cauchy problem [1] the minimal hypothesis that, given $s \in [0,T]$, each individual operator A(s) be the generator of a $C_0$-semigroup $\{T_s(t); t \ge 0\}$ in $\mathcal{L}(E_0)$. If this semigroup is holomorphic, we call [1] of ‘‘parabolic type.’’ Otherwise the evolution equation [1] is said to be of ‘‘hyperbolic type.’’ A family $\{A(t); t \in [0,T]\}$ of generators of $C_0$-semigroups in $\mathcal{L}(E_0)$ is called ‘‘stable’’ iff there exist positive constants M and $\omega$ such that $(\omega, \infty) \subset \rho(A(t))$ for all $t \in [0,T]$ and such that

$$\Big\| \prod_{j=1}^{k} (\lambda - A(t_j))^{-1} \Big\| \le M(\lambda - \omega)^{-k} \quad \text{for } \lambda > \omega$$

and every finite sequence $0 \le t_1 \le t_2 \le \cdots \le t_k \le T$ with $k \in \mathbb{N}$. Observe that the resolvent operators $(\lambda - A(t_j))^{-1}$ do not commute in general. Therefore, the order of the terms on the left-hand side of the above estimate has to be obeyed. Assume that $A = \{A(t); t \in [0,T]\}$ is a family of m-dissipative operators. Then, A
Of course, this generalizes [7], and if A(t) is independent of t, then $U(t,s) = T(t-s)$, where $\{T(t); t \ge 0\}$ is the semigroup generated by A(0). Furthermore, there are several extensions of Kato's result. The most interesting among them weaken the time regularity of f and the assumption that $\{A(t); t \in [0,T]\}$ be strongly differentiable. In particular, it is possible to study [1] for families without a common domain. For the construction of evolution operators as well as generalizations of Theorem 9, the reader is again referred to the ‘‘Further reading’’ section.

Nonautonomous Equations of Parabolic Type
Throughout this section we assume that $E_0$ and $E_1$ are Banach spaces such that $E_1$ is dense and continuously injected in $E_0$. In the study of parabolic evolution equations, the class of all operators in $\mathcal{L}(E_1, E_0)$, considered as unbounded operators in $E_0$ with common domain $E_1$, which generate holomorphic $C_0$-semigroups in $\mathcal{L}(E_0)$ has turned out to be very useful. In the following, we call this class $H(E_1, E_0)$. It is known that $A \in H(E_1, E_0)$ iff there exist constants $\omega > 0$ and $\kappa \ge 1$ such that $\omega - A \in \mathrm{Isom}(E_1, E_0)$ and such that

$$\kappa^{-1} \le \frac{\|(\lambda - A)x\|_0}{|\lambda|\,\|x\|_0 + \|x\|_1} \le \kappa, \quad x \in E_1 \setminus \{0\}, \quad \mathrm{Re}\,\lambda \ge \omega$$
where $\|\cdot\|_j$ denotes the norm of $E_j$. Using the above characterization, it can be shown that $H(E_1, E_0)$ is an open subset of $\mathcal{L}(E_1, E_0)$. In the following, we always endow $H(E_1, E_0)$ with the topology induced by the norm of $\mathcal{L}(E_1, E_0)$. As a consequence of this convention it is meaningful to consider, for example, continuous mappings from [0, T] into $H(E_1, E_0)$. Observe that if $A \in C([0,T], H(E_1, E_0))$, then $A = \{A(t); t \in [0,T]\}$ is a family of generators of holomorphic semigroups with the common domain $E_1$. Then we have the following result.

Theorem 10 (Sobolevskii, Tanabe). Assume that there is a $\rho \in (0,1)$ such that

$$(A, f) \in C^{\rho}([0,T]; H(E_1, E_0) \times E_0)$$

Then, given $u_0 \in E_0$, the Cauchy problem [1] possesses a unique classical solution u. This solution has the additional regularity

$$u \in C^{\rho}((0,T]; E_1) \cap C^{1+\rho}((0,T]; E_0)$$

Finally, if $u_0 \in E_1$, then $u \in C^1([0,T], E_0)$.

As in the hyperbolic case, the proof of Theorem 10 is based on the evolution operator U(t, s) for the homogeneous problem, although the constructions of the corresponding evolution operators are completely different. In addition, there are several extensions and generalizations of Theorem 10. In particular, the assumption that the family $\{A(t); t \in [0,T]\}$ possesses a common domain can be weakened considerably. Furthermore, it is possible to look at parabolic evolution equations in the so-called interpolation and extrapolation scales. This offers great flexibility in the study of nonlinear problems. Further details in this direction can be found in the ‘‘Further reading’’ section.
Nonlinear Evolution Equations
Let $E_0$, $E_1$ be Banach spaces such that $E_1$ is dense and continuously embedded in $E_0$. Assume further that $u_0 \in E_1$ and that we are given a nonlinear operator $F \in C([0,T] \times V, E_0)$, where V is an open neighborhood of $u_0$ in $E_1$. In this section, we will discuss the well-posedness of the Cauchy problem for the following nonlinear evolution equation

$$u'(t) = F(t, u(t)), \quad t \in (0,T], \qquad u(0) = u_0 \qquad [9]$$

in the Banach space $E_0$. We will always assume that the nonlinear operator F either carries a quasilinear structure or is of fully nonlinear parabolic type. By a ‘‘quasilinear structure,’’ we mean that there is a mapping $A \in C([0,T] \times V, \mathcal{L}(E_1, E_0))$ and a suitable ‘‘lower-order term’’ $f \in C([0,T] \times V, E_0)$ such that

$$F(t,v) = A(t,v)v + f(t,v) \quad \text{for all } (t,v) \in [0,T] \times V$$

Problem [9] is of fully nonlinear parabolic type if $F \in C^1([0,T] \times V, E_0)$ and if the Fréchet derivative $D_2F(0, u_0)$ of F with respect to v at $(0, u_0)$ belongs to the class $H(E_1, E_0)$.

Quasilinear Evolution Equations of Hyperbolic Type
Assume that $E_0$ is a reflexive Banach space and let $u_0 \in V \subset E_1$ be chosen as above. We consider the following abstract quasilinear evolution equation of hyperbolic type:

$$u'(t) = A(t, u(t))u(t) + f(t, u(t)), \quad t \in (0,T], \qquad u(0) = u_0 \qquad [10]$$

and assume that the following hypotheses are satisfied:

(H1) $A \in C([0,T] \times V, \mathcal{L}(E_1, E_0))$ is bounded on bounded subsets of V and, given $(t,v) \in [0,T] \times V$, the operator A(t, v) is m-dissipative and there is a constant $\mu_A$ such that $\|A(t,v) - A(t,w)\|_{\mathcal{L}(E_1,E_0)} \le \mu_A \|v - w\|_{E_0}$ for all $t \in [0,T]$ and all $v, w \in V$.

(H2) There is a $Q \in \mathrm{Isom}(E_1, E_0)$ such that $QA(t,v)Q^{-1} = A(t,v) + B(t,v)$, where $B(t,v) \in \mathcal{L}(E_0)$ is bounded, uniformly on bounded subsets of V. Moreover, $\|B(t,v) - B(t,w)\|_{\mathcal{L}(E_0)} \le \mu_B \|v - w\|_{E_1}$ for all $t \in [0,T]$ and all $v, w \in V$.

(H3) $f \in C([0,T] \times V, E_1)$ is bounded on bounded subsets of V and there are $\mu_0$ and $\mu_1$ such that
$$\|f(t,v) - f(t,w)\|_{E_j} \le \mu_j \|v - w\|_{E_j} \quad \text{for all } v, w \in V,\ j \in \{0,1\}$$

Then we have the following result.

Theorem 11 (Kato). Assume that (H1), (H2), and (H3) are satisfied. Then there is a maximal $t^+ \in (0,T]$, depending only on $\|u_0\|_{E_1}$, and a unique solution u to [10] such that

$$u = u(\cdot, u_0) \in C([0,t^+), V) \cap C^1([0,t^+), E_0)$$

Moreover, the mapping $u_0 \mapsto u(\cdot, u_0)$ is continuous from V to $C([0,t^+), V) \cap C^1([0,t^+), E_0)$.

There are many applications of Theorem 11 to different concrete partial differential equations (PDEs), including symmetric hyperbolic first-order systems, the Korteweg–de Vries equation, nonlinear elastodynamics, quasilinear wave equations, Navier–Stokes and Euler equations, and coupled Maxwell–Dirac equations. We decided to explain in some detail an application to the so-called periodic Camassa–Holm equation:

$$u_t - u_{xxt} + 3uu_x = 2u_x u_{xx} + u u_{xxx}, \quad t > 0,\ x \in S^1 \qquad [11]$$

where $S^1$ stands for the unit circle. In the above model, the function u is the height of a unidirectional water wave over a flat bottom. Set $X := L_2(S^1)$, $V := H^1(S^1)$, and $Q := (I - \partial_x^2)^{1/2}$. With $y := u - u_{xx}$, eqn [11] can be re-expressed as

$$y_t + (Q^{-2}y)\,y_x = -2y\,(Q^{-2}y)_x \quad \text{in } L_2(S^1)$$

which is of type [10] with

$$A(y) = -(Q^{-2}y)\,\partial_x, \qquad f(y) = -2y\,(Q^{-2}y)_x, \qquad y \in V$$

where $\mathrm{dom}(A(y)) := \{v \in L_2(S^1);\ (Q^{-2}y)v \in H^1(S^1)\}$.

Quasilinear Evolution Equations of Parabolic Type
Assume that $E_0$ and $E_1$ are Banach spaces such that $E_1$ is dense and continuously injected in $E_0$. Moreover, let $(\cdot,\cdot)_\theta$ for each $\theta \in (0,1)$ be an admissible interpolation functor (e.g., the real or complex interpolation functor) and set $E_\theta := (E_0, E_1)_\theta$ for $\theta \in (0,1)$. Given a subset $X_\beta \subset E_\beta$ for some $\beta \in (0,1)$, we set $X_\alpha := X_\beta \cap E_\alpha$ for $\alpha \in [0,1]$, equipped with the topology induced by $E_\alpha$. Finally, we write $C^{1-}(M, N)$ for the class of all locally Lipschitz continuous functions mapping the metric space M into the metric space N.

Theorem 12 (Amann). Suppose that $0 < \gamma \le \beta < \alpha < 1$, that $X_\beta$ is open in $E_\beta$, and that

$$(A, f) \in C^{1-}([0,T] \times X_\beta,\ H(E_1, E_0) \times E_\gamma)$$

Then, given $u_0 \in X_\alpha$, there exists a unique maximal $t^+ \in (0,T]$, such that the quasilinear parabolic Cauchy problem

$$u'(t) = A(t, u(t))u(t) + f(t, u(t)), \quad t \in (0,T], \qquad u(0) = u_0$$

possesses a unique classical solution

$$u := u(\cdot, u_0) \in C([0,t^+), X_\alpha) \cap C^1((0,t^+), E_0)$$

Assume A and f are independent of t and let $u(\cdot, u_0)$ be the solution to the corresponding autonomous problem

$$u'(t) = A(u(t))u(t) + f(u(t)), \quad t \in (0,\infty), \qquad u(0) = u_0$$

Then the mapping $(t, u_0) \mapsto u(t, u_0)$ is a semiflow on $X_\alpha$.

Due to its clarity and flexibility, Theorem 12 has found a plethora of applications, which we cannot discuss in detail here. Let us at least mention the following: reaction–diffusion systems, population dynamics, phase transition models, flows through porous media, Stefan problems, and nonlinear and dynamic boundary conditions in boundary-value problems. In addition, many geometric evolution equations fall into the scope of Theorem 12. Consider, for example, the volume-preserving gradient flow of the area functional of a compact hypersurface M in $\mathbb{R}^{m+1}$ with respect to $L_2(M)$ and $W_2^{-1}(M)$, respectively. These flows are known as the averaged mean curvature flow and the surface diffusion flow, respectively, and have been investigated on the basis of Theorem 12.

Fully Nonlinear Evolution Equations of Parabolic Type
Based on the theory of maximal regularity for linear evolution equations, it is possible to investigate abstract fully nonlinear parabolic problems of type [9]. As there are different techniques of maximal regularity, there are also different approaches to [9]. We present here a result which uses maximal regularity properties in singular Hölder spaces $C^\beta_\beta$. Let $E_0$ and $E_1$ be Banach spaces such that $E_1$ is continuously embedded into $E_0$ (density of $E_1$ in $E_0$ is not needed here). As before, V is an open subset of $E_1$ and $D_2F$ stands for the Fréchet derivative of F(t, v) with respect to the second variable.

Theorem 13 (Lunardi). Assume that $F \in C^2([0,T] \times V, E_0)$ such that $D_2F \in C^1([0,T] \times V, H(E_1, E_0))$. Then, given $u_0 \in V$, there is a maximal $t^+ \in (0,T]$ such that problem [9] has a solution $u \in C([0,t^+), E_1) \cap C^1([0,t^+), E_0)$. This solution is unique in the class

$$\bigcup_{0<\beta<1} C^\beta_\beta((0, t^+ - \varepsilon]; E_1) \cap C([0, t^+ - \varepsilon]; E_1)$$

One finds that $S(k) = S_0(k)(1 + A_\lambda(k))$ with $|A_\lambda(k)| \le C|\lambda|$, that is, the interaction has essentially no influence on the physical properties of the system at high temperatures.
Landau Fermi Liquids
We consider next an intermediate region of temperatures, that is, $e^{-a/|\lambda|} \le T \le |\lambda|^{\alpha}$ for some constants a, $\alpha$. In this region, the naive expansion in power series of $\lambda$ fails and other techniques, such as the renormalization group, are necessary. Such a method allows us to perform a suitable resummation of the naive power series in $\lambda$, and one gets, for $\lambda$ small enough, $T \ge e^{-a/|\lambda|}$ and $\varepsilon(k) = |k|^2/2m$,

$$\hat{S}(k) = \frac{1}{Z(\lambda)}\,\frac{1 + A_\lambda(k)}{-ik_0 + v_F(\lambda)\,[|k| - k_F(\lambda)]} \qquad [9]$$

where $Z(\lambda) = 1 + z(\lambda)$, $v_F(\lambda) = \hbar k_F/m + v(\lambda)$, and $k_F(\lambda) = k_F + \nu(\lambda)$, with $z(\lambda) = O(\lambda^2)$, $\nu(\lambda) = O(\lambda)$, $v(\lambda) = O(\lambda^2)$, and $z(\lambda)$, $\nu(\lambda)$, $v_F(\lambda)$ essentially temperature independent; moreover, $|A_\lambda(k)|$ is $O(\lambda)$. The above formula has been proved rigorously for d = 2 (see Rivasseau (1994), and references therein); for d = 3, it has been proved at the level of formal perturbation theory (Benfatto and Gallavotti 1995). The case $\varepsilon(k) = |k|^2/2m$ is quite special, as the shape of the interacting Fermi surface is fixed by the rotation-invariant symmetry; it is necessarily circular (d = 2) or spherical (d = 3), whereas in general the interaction can also modify its shape. For d = 2, if the interacting Fermi surface is symmetric, smooth and convex, a formula like [9] still holds (with a function $k_F(\lambda, k)$ replacing $k_F(\lambda)$) up to exponentially small temperatures (see references in Gentile and Mastropietro (2001)).

It is apparent from [9] that one cannot derive such a formula from a power-series expansion in $\lambda$; by expanding [9] as a series in $\lambda$, one immediately finds that the nth term is $O(\lambda^n \beta^n)$, which means that the naive perturbative expansion cannot be convergent up to exponentially small temperatures. It can be derived only by selecting and resumming some special class of terms in the original expansion. A peculiar property of [9] is that the wave function renormalization $Z(\lambda)$ is essentially independent of the temperature. Such temperature independence is a consequence of cancellations in the perturbative series essentially due to the curvature of the Fermi surface. For d = 1, a formula similar to [9] is also valid; however, such cancellations are not present and one finds $Z(\lambda) = 1 + O(\lambda^2 \log \beta)$. Comparing S(k) given by [9] with the Fourier transform $S_0(k)$ of [6], we note that the Schwinger function of the interacting system is still very similar to the Schwinger function of a free Fermi gas, with physical parameters (e.g., the Fermi momentum, the wave function renormalization, or the Fermi velocity) which are changed by the interaction.
This property is quite remarkable: the eigenstates cannot be constructed when $\lambda \ne 0$ starting from the single-particle states but, nevertheless, the physical properties of the interacting system (which can be deduced from the Schwinger functions) are qualitatively very similar to the ones of the free Fermi gas, although with different parameters; this explains why the free Fermi gas model works so well to explain the properties of crystals, although one neglects the interactions between fermions which are, of course, quite relevant. A fermionic system with such a property is called a Landau Fermi liquid (see, e.g., Abrikosov et al. 1965, Mahan 1990, Pines 1961), after Landau, who postulated in the 1950s that interacting systems may evolve continuously from the free system in many cases. It was generally accepted that metals in this range of temperatures were all Landau Fermi liquids (except one-dimensional systems). However, the experimental discovery of the high-$T_c$ superconductors (see, e.g., Anderson (1997)) has changed this belief, as such metals in their normal state, that is, above $T_c$, are not Landau Fermi liquids; their wave function renormalization behaves like $1 + O(\lambda^2 \log \beta)$ instead of $1 + O(\lambda^2)$ as in a Landau Fermi liquid. This behavior has been called marginal-Fermi-liquid behavior and many attempts have been made to derive such behavior from [7]. In order to see deviations from Fermi liquid behavior, one could consider Fermi surfaces with flat or almost flat sides or corners (which are quite possible; e.g., in a square lattice with one conduction electron per atom, such as in the ‘‘half-filled Hubbard model’’).

Let us finally consider the last regime, that is, temperatures lower than $O(e^{-a/|\lambda|})$. Except for very exceptional cases (e.g., asymmetric Fermi surfaces, i.e., such that $\varepsilon(k) \ne \varepsilon(-k)$ except for a finite number of points, in which Fermi liquid behavior is found down to T = 0 (Feldman et al. 2002)), a strong deviation from Fermi liquid behavior is observed; the interacting Schwinger function is not similar to the free one and the physical properties in this regime are totally new.
One-Dimensional Systems up to T = 0
The only case in which the Schwinger functions of the Hamiltonian [3] can be really computed down to T = 0 occurs for d = 1; in such a case, an expression like [9] is not valid anymore and the system is not a Fermi liquid. On the contrary, when u = 0 and for small repulsive $\lambda > 0$, one can prove, for spinning fermions (see Benfatto and Gallavotti (1995), Gentile and Mastropietro (2001) and references therein) that

$$\hat{S}(k) = \frac{\big[k_0^2 + v_F^2(\lambda)(|k| - k_F(\lambda))^2\big]^{\eta(\lambda)}}{-ik_0 + v_F(\lambda)\,[|k| - k_F(\lambda)]}\,[1 + A_\lambda(k)] \qquad [10]$$

where $k_F(\lambda) = k_F + O(\lambda)$ and $\eta(\lambda) = a\lambda^2 + O(\lambda^3)$ is a critical index. This means that the interaction changes qualitatively the nature of the singularity at the Fermi surface; S(k) is still diverging at the Fermi surface but with an exponent which is no longer 1 but is $1 - 2\eta(\lambda)$, with $\eta(\lambda)$ a nonuniversal (i.e., $\lambda$-dependent) critical index. As a consequence, the physical properties are different with respect to the free Fermi gas; for instance, the occupation number $n_k$ is not discontinuous at $k = k_F(\lambda)$ when T = 0. Nonuniversal critical indices appear in all the other response functions. Fermionic systems behaving in this way are called Luttinger liquids, as they behave like the exactly solvable Luttinger model describing relativistic spinless fermions with linear dispersion relation. The solvability of this model, due to Mattis and Lieb (1966), relies on the possibility of mapping its Hamiltonian into a system of free bosons. Such a mapping is not possible for the Hamiltonian [3], which is not solvable; however, one can use renormalization group methods and suitable Ward identities to show that its behavior is similar to the Luttinger model (in a sense, one makes perturbation theory not around the free Fermi gas, but around the Luttinger model).

If we take into account the interaction with an external periodic potential with period a, that is, consider $u \ne 0$, we find that if $k_F \ne n\pi/a$, then the Schwinger function behaves essentially like [8]. On the contrary, in the filled-band case, $k_F = n\pi/a$, one finds that there is still an energy gap which becomes $O(u^{1+\eta_1})$ with $\eta_1 = O(\lambda)$; this means that the renormalization of the gap is described by a critical index; moreover, $S(x) \simeq O(e^{-|u|^{1+\eta_1}|x|})$. A similar behavior is also observed in the presence of a quasiperiodic potential. In the attractive case, $\lambda < 0$, u = 0, the behavior is much less understood; it is believed that the interaction produces a gap in the spectrum which is nonanalytic in $\lambda$, that S(x) shows an exponential decay rather than a power-law decay, and that the interaction converts the system from a metal to an insulator.

Finally, it is remarkable that a large variety of models, like Heisenberg spin chains or two-dimensional classical statistical mechanics models, such as the eight-vertex or the Ashkin–Teller model, can be mapped into interacting d = 1 fermionic systems, and consequently their critical behavior can be understood by using fermionic techniques (see Gentile and Mastropietro (2001), and references therein).
Superconductors
The theory up to T = 0 for d = 2, 3 systems with dispersion relation $|k|^2/2m$ is based only on approximate computations, predicting the phenomenon of superconductivity. According to the theory of Bardeen, Cooper, and Schrieffer (BCS theory), the interaction between fermions leads to the formation of a gap in the energy spectrum, below the critical temperature. There are many ways to derive the BCS theory. One is based on the fact that one verifies, by perturbative computations, that the effective interaction is stronger when the four momenta of the fermions are such that $k_1 \simeq k_3$ and $k_2 \simeq k_4$. This suggests, heuristically, replacing the interaction in [7] with

$$V_{\mathrm{BCS}} = \frac{1}{L^{3d}} \sum_{k,k'} \psi^+_{k,\sigma}\,\psi^+_{-k,-\sigma}\,\psi^-_{k',\sigma'}\,\psi^-_{-k',-\sigma'}$$

which is an interaction between pairs of electrons with opposite spin and momenta, which are called Cooper pairs. Replacing the interaction with $V_{\mathrm{BCS}}$ has the great advantage that it makes the Schwinger functions exactly computable and explains the mechanism of superconductivity in many metals (but not in the recently discovered high-$T_c$ superconductors). On the other hand, proving that [7] with the original interaction or with $V_{\mathrm{BCS}}$ has a similar behavior is still an important open problem. The two-point Schwinger function in the model with $V_{\mathrm{BCS}}$ can be written, after the so-called Hubbard–Stratonovich transformation, as

$$\hat{S}(k_0, k) = \frac{\displaystyle\int \frac{-ik_0 - \varepsilon(k) + \mu}{k_0^2 + (\varepsilon(k) - \mu)^2 + u^2}\; e^{-L^d \mathcal{F}(u)}\,du}{\displaystyle\int e^{-L^d \mathcal{F}(u)}\,du} \qquad [11]$$

where $\mathcal{F}(u)$ is a function with a global minimum in u = 0 for repulsive interactions $\lambda > 0$, whereas for $\lambda < 0$ and sufficiently small temperatures (for $T \le T_c$, with $T_c = O(e^{-a/|\lambda|})$), it has the form of a double well with two minima at $u = \pm\Delta$ with $\Delta = O(e^{-a/|\lambda|})$; for T greater than $T_c$, there is only a global minimum at u = 0. By the saddle-point theorem, we find, for $T \le T_c$ and $\lambda < 0$,

$$\lim_{L \to \infty} \hat{S}(k) = \frac{-ik_0 - \varepsilon(k) + \mu}{k_0^2 + (\varepsilon(k) - \mu)^2 + \Delta^2} \qquad [12]$$

The physical properties predicted by [12] are completely different with respect to the free case: the occupation number is continuous, there is an energy gap in the spectrum, the specific heat is $O(e^{-\Delta/T})$ and the phenomenon of superconductivity appears. The fact that the interaction generates a gap is called mass generation; a similar mechanism appears in particle theory.
Conclusions
Many other physical phenomena, observed experimentally, can be essentially understood by studying fermionic systems, but a clear mathematical comprehension is still lacking. We mention: the Kondo effect, that is, the resistance minimum observed in some metals due to magnetic impurities; the Mott transition, in which a strong interaction produces an insulating state in a system which should be a conductor; antiferromagnetism; the fractional quantum Hall effect, and many others. We can say that the situation in this area of study is reminiscent of classical mechanics at the end of the nineteenth century; there is agreement on the models to consider, which are believed to be able to take into account the marvelous properties of matter found experimentally, but extracting information from them requires deeper and more complex analytical and mathematical investigations.

See also: Falicov–Kimball Model; Fractional Quantum Hall Effect; Quantum Statistical Mechanics: Overview; Renormalization: Statistical Mechanics and Condensed Matter.
Further Reading
Abrikosov AA, Gorkov LP, and Dzyaloshinskii IE (1965) Methods of Quantum Field Theory in Statistical Physics. New York: Dover.
Anderson PW (1985) Basic Principles in Solid State Physics. Menlo Park, CA: Benjamin Cummings.
Anderson PW (1997) The Theory of Superconductivity in High Tc Cuprates. Princeton, NJ: Princeton University Press.
Bardeen J, Cooper LN, and Schrieffer JR (1957) Theory of superconductivity. Physical Review 108: 1175–1204.
Benfatto G and Gallavotti G (1995) Renormalization Group. Princeton: Princeton University Press.
Berezin FA (1966) The Method of Second Quantization. New York: Academic Press.
Bratteli O and Robinson D (1979) Operator Algebras and Quantum Statistical Mechanics. Berlin: Springer.
Feldman J, Knörrer H, and Trubowitz E (2002) Fermionic Functional Integrals and Renormalization Group, CRM Monograph Series, vol. 16. Providence: American Mathematical Society.
Gallavotti G (1985) Renormalization group and ultraviolet stability for scalar fields via renormalization group methods. Reviews of Modern Physics 57: 471–562.
Gentile G and Mastropietro V (2001) Renormalization group for one dimensional fermions. A review of mathematical results. Physics Reports 352: 273–437.
Mahan GD (1990) Many-Particle Physics. New York: Plenum.
Mattis DC and Lieb E (1966) Mathematical Physics in One Dimension. New York: Academic Press.
Negele JW and Orland H (1988) Quantum Many Particle Systems. New York: Addison-Wesley.
Pastur L and Figotin A (1991) Spectra of Random and Almost Periodic Operators. Berlin: Springer.
Pines D (1961) The Many Body Problem. New York: Addison-Wesley.
Rivasseau V (1994) From Perturbative to Constructive Renormalization. Princeton, NJ: Princeton University Press.
Feynman Path Integrals
S Mazzucchi, Università di Trento, Povo, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction
In nonrelativistic quantum mechanics, the state of a d-dimensional particle is represented by a unit vector $\psi$ in the complex separable Hilbert space $L^2(\mathbb{R}^d)$, the so-called ‘‘wave function,’’ while its time evolution is described by the Schrödinger equation:

$$i\hbar \frac{\partial}{\partial t}\psi = -\frac{\hbar^2}{2m}\Delta\psi + V\psi, \qquad \psi(0,x) = \psi_0(x) \qquad [1]$$

where $\hbar$ is the reduced Planck constant, m > 0 is the mass of the particle, and $F = -\nabla V$ is an external force. In 1942 R P Feynman, following a suggestion by Dirac, proposed an alternative (Lagrangian) formulation of quantum mechanics, and a heuristic but very suggestive representation for the solution of eqn [1]. According to Feynman, the wave function of the system at time t evaluated at the point $x \in \mathbb{R}^d$ is given as an ‘‘integral over histories,’’ or as an integral over all possible paths $\gamma$ in the configuration space of the system with finite energy passing at the point x at time t:

$$\psi(t,x) = \Big( \int_{\{\gamma|\gamma(t)=x\}} e^{(i/\hbar)S_t(\gamma)}\,D\gamma \Big)^{-1} \int_{\{\gamma|\gamma(t)=x\}} e^{(i/\hbar)S_t(\gamma)}\,\psi_0(\gamma(0))\,D\gamma \qquad [2]$$

$S_t(\gamma)$ is the classical action of the system evaluated along the path $\gamma$,

$$S_t(\gamma) \equiv S^0_t(\gamma) - \int_0^t V(\gamma(s))\,ds \qquad [3]$$

$$S^0_t(\gamma) \equiv \frac{m}{2}\int_0^t |\dot{\gamma}(s)|^2\,ds \qquad [4]$$

$D\gamma$ is a heuristic Lebesgue ‘‘flat’’ measure on the space of paths and $\big(\int_{\{\gamma|\gamma(t)=x\}} e^{(i/\hbar)S_t(\gamma)}\,D\gamma\big)^{-1}$ is a normalization constant. Some time later, Feynman himself extended formula [2] to more general quantum systems, including the case of quantum fields. The Feynman path-integral formulation of quantum mechanics is particularly suggestive, as it provides a spacetime visualization of quantum dynamics, reintroducing in quantum mechanics the concept of trajectory (which was banned in the ‘‘orthodox interpretation’’ of the theory) and creating a connection between the classical description of
the physical world and the quantum one. Indeed, it provides a quantization method that allows one, at least heuristically, to associate a quantum evolution to each classical Lagrangian. Moreover, the application of the stationary-phase method for oscillatory integrals allows the study of the semiclassical limit of the Schrödinger equation, that is, the study of the detailed behavior of the solution when the Planck constant is regarded as a parameter converging to 0. Indeed, when $\hbar$ is small, the integrand in [2] is strongly oscillating and the main contributions to the integral should come from those paths $\gamma$ that make the phase function $S_t(\gamma)$ stationary. These, by Hamilton's least action principle, are exactly the classical orbits of the system. Feynman path integrals also allow a heuristic calculus in path space, leading to variational calculations of quantities of physical and mathematical interest. An interesting application can be found in topological field theories, for instance, Chern–Simons models. In this case, heuristic calculations based on the Feynman path-integral formulation of the theory, where the integration is performed on a space of geometrical objects, lead to the computation of topological invariants. Even if, from a physical point of view, formula [2] is a source of important results, from a mathematical point of view it lacks rigor: indeed, neither the ‘‘infinite-dimensional Lebesgue measure’’ nor the normalization constant in front of the integral is well defined. In this article, we shall describe the main approaches to the rigorous mathematical realization of Feynman path integrals, as well as their most important applications.
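The stationary-phase heuristic invoked above can be illustrated in one dimension. The following is a minimal sketch under assumptions chosen purely for illustration (they do not come from the article): the phase $\Phi(x) = 1 - \cos x$ and the amplitude $f(x) = (1 + \cos x)/2$ on $[-\pi, \pi]$, so that the only critical point with nonzero amplitude is $x = 0$, where $\Phi(0) = 0$, $\Phi''(0) = 1$, and $f(0) = 1$. The leading stationary-phase term is then $\sqrt{2\pi\hbar}\,e^{i\pi/4}$, and the code compares it with a direct quadrature of the oscillatory integral.

```python
import cmath
import math

def oscillatory_integral(hbar, n=20000):
    """Periodic trapezoid rule for I(hbar) = int_{-pi}^{pi} exp(i*Phi(x)/hbar) * f(x) dx,
    with the assumed phase Phi(x) = 1 - cos(x) and amplitude f(x) = (1 + cos(x)) / 2."""
    h = 2.0 * math.pi / n
    total = 0j
    for j in range(n):
        x = -math.pi + j * h
        phase = (1.0 - math.cos(x)) / hbar
        total += cmath.exp(1j * phase) * 0.5 * (1.0 + math.cos(x))
    return total * h

def stationary_phase_leading_term(hbar):
    """Contribution of the nondegenerate critical point x = 0:
    sqrt(2*pi*hbar / |Phi''(0)|) * exp(i*pi/4) * exp(i*Phi(0)/hbar) * f(0)."""
    return math.sqrt(2.0 * math.pi * hbar) * cmath.exp(1j * math.pi / 4.0)

def relative_error(hbar):
    exact = oscillatory_integral(hbar)
    approx = stationary_phase_leading_term(hbar)
    return abs(exact - approx) / abs(approx)

if __name__ == "__main__":
    for hbar in (0.05, 0.01):
        print(hbar, relative_error(hbar))
```

As $\hbar$ decreases, the relative error of the leading term is expected to shrink roughly in proportion to $\hbar$, which is the finite-dimensional counterpart of the statement that only paths near the classical orbits contribute in the semiclassical limit.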
Possible Mathematical Definitions of Feynman's Measure
In the rigorous mathematical definition of Feynman's complex measure

$$\mu_F := \Big( \int_{\{\gamma|\gamma(t)=x\}} e^{(i/\hbar)S_t(\gamma)}\,D\gamma \Big)^{-1} e^{(i/\hbar)S_t(\gamma)}\,D\gamma \qquad [5]$$

one has to face mainly two problems. First of all, the integral is defined on a space of paths, that is, on an infinite-dimensional space. The implementation of an integration theory is nontrivial: for instance, it is well known that a Lebesgue-type measure cannot be defined on infinite-dimensional Hilbert spaces. Indeed, the assumption of the existence of a $\sigma$-additive measure $\mu$ which is invariant under rotations and translations and assigns a positive finite measure to all bounded open sets leads to a contradiction. In fact, by taking an orthonormal system $\{e_i\}_{i \in \mathbb{N}}$ in an infinite-dimensional Hilbert space H and by considering the open balls $B_i = \{x \in H, \|x - e_i\| < 1/2\}$, one has that they are pairwise disjoint and their union is contained in the open ball $B(0,2) = \{x \in H, \|x\| < 2\}$. By the Euclidean invariance of the Lebesgue-type measure $\mu$, one can deduce that $\mu(B_i) = a$, $0 < a < \infty$, for all $i \in \mathbb{N}$. By the $\sigma$-additivity, one has

$$\mu(B(0,2)) \ge \mu(\cup_i B_i) = \sum_i \mu(B_i) = \infty$$

but, on the other hand, $\mu(B(0,2))$ should be finite as B(0, 2) is bounded. As a consequence, we can also deduce that the term $D\gamma$ in [2] does not make sense.

The second problem is the fact that the exponent in the density $e^{(i/\hbar)S_t(\gamma)}$ is imaginary, so that the exponential oscillates. Even in finite dimensions, integrals of the form $\int_{\mathbb{R}^N} e^{i\Phi(x)} f(x)\,dx$, where $\Phi, f : \mathbb{R}^N \to \mathbb{R}$ are continuous functions and f is not summable, have to be suitably defined, in order to exploit the cancellations in the integral due to the oscillatory behavior of the exponential.

The study of the rigorous foundation of Feynman path integrals began in the 1960s, when Cameron proved that Feynman's heuristic complex measure [5] cannot be realized as a complex bounded variation $\sigma$-additive measure, even on very nice subsets of the space $(\mathbb{R}^d)^{[0,t]}$ of paths, contrary to the case of complex measures on $\mathbb{R}^n$ of the form $e^{(i/2)|x|^2}\,dx$. In other words, it is not possible to implement an integration theory in the traditional (Lebesgue) sense. As a consequence, mathematicians tried to realize [5] as a linear continuous functional on a sufficiently rich Banach algebra of functions, inspired by the fact that a bounded measure can be regarded as a continuous functional on the space of bounded continuous functions. In order to mirror the features of the heuristic Feynman's measure, such a functional should have some properties:

1. it should behave in a simple way under ‘‘translations and rotations in path space,’’ as $D\gamma$ denotes a ‘‘flat’’ measure;
2. it should satisfy a Fubini-type theorem, concerning iterated integrations in path space (allowing the construction, in physical applications, of a one-parameter group of unitary operators);
3. it should be approximable by finite-dimensional oscillatory integrals, allowing a sequential approach in the spirit of Feynman's original work; and
4. it should be sufficiently flexible to allow a rigorous mathematical implementation of an infinite-dimensional version of the stationary-phase method and the corresponding study of the semiclassical limit of quantum mechanics.

Nowadays, several implementations of this program can be found in the literature of physics and mathematics, for instance, by means of analytic continuation of Wiener integrals, or as an infinite-dimensional distribution in the framework of Hida calculus, or via ‘‘complex Poisson measures,’’ or via nonstandard analysis, or as an infinite-dimensional oscillatory integral. The last of these methods is particularly interesting as it allows the systematic implementation of an infinite-dimensional version of the stationary-phase method, which can be applied to the study of the semiclassical limit of the solution of the Schrödinger equation [1].

Analytic Continuation
In one of the first approaches to the definition of Feynman path integrals, formula [2] was realized as the analytic continuation in a suitable complex parameter of a (nonoscillatory) Gaussian integral on the space of paths. In 1949, inspired by Feynman's work, M Kac observed that by considering the heat equation

$$-\frac{\partial}{\partial t}u = -\frac{1}{2m}\Delta u + V(x)u, \qquad u(0,x) = \psi_0(x) \qquad [6]$$

instead of the Schrödinger equation [1] and by replacing the oscillatory term $e^{(i/\hbar)S^0_t(\gamma)}$ in Feynman's complex measure with the fast decreasing one $e^{-(1/\hbar)S^0_t(\gamma)}$, it is possible to give a well-defined mathematical meaning to Feynman's heuristic formula [2] in terms of a well-defined integral on the space of continuous paths $W_{t,x} = \{w \in C(0,t;\mathbb{R}^d) : w(0) = x\}$ with respect to the Wiener Gaussian measure $P_{t,x}$:

$$u(t,x) = \int_{W_{t,x}} e^{-\int_0^t V(\sqrt{1/m}\,w(\tau))\,d\tau}\,\psi_0\big(\sqrt{1/m}\,w(t)\big)\,dP_{t,x}(w) \qquad [7]$$
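Formula [7] lends itself to a simple Monte Carlo check. The sketch below rests on assumptions that are not part of the article: it works in dimension d = 1, discretizes the path as a random walk started at x with increments of variance $\Delta t/m$ (one common normalization of the scaled Wiener path), approximates the time integral of V by a left-endpoint Riemann sum, and uses hypothetical function names. For a constant potential $V \equiv c$ and $\psi_0(y) = y^2$, the heat equation [6] has the closed-form solution $u(t,x) = e^{-ct}(x^2 + t/m)$, which serves as a reference value.

```python
import math
import random

def feynman_kac(t, x, mass, potential, psi0, n_paths=20000, n_steps=20, seed=0):
    """Monte Carlo estimate of E[exp(-int_0^t V(path) ds) * psi0(path(t))]
    for a Brownian path started at x with variance s/mass at time s (assumed normalization)."""
    rng = random.Random(seed)          # fixed seed so the estimate is reproducible
    dt = t / n_steps
    sigma = math.sqrt(dt / mass)       # standard deviation of one increment
    total = 0.0
    for _ in range(n_paths):
        pos = x
        integral_v = 0.0
        for _ in range(n_steps):
            integral_v += potential(pos) * dt   # left-endpoint Riemann sum of V along the path
            pos += sigma * rng.gauss(0.0, 1.0)
        total += math.exp(-integral_v) * psi0(pos)
    return total / n_paths

def reference_solution(t, x, mass, c):
    """Closed form for V = c (constant) and psi0(y) = y**2: u = exp(-c t) (x^2 + t/m)."""
    return math.exp(-c * t) * (x * x + t / mass)

if __name__ == "__main__":
    estimate = feynman_kac(1.0, 0.5, 2.0, lambda y: 0.5, lambda y: y * y)
    print(estimate, reference_solution(1.0, 0.5, 2.0, 0.5))
```

The estimate carries a statistical error of order $n_{\text{paths}}^{-1/2}$; for nonconstant potentials the Riemann sum adds a time-discretization error as well.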
The path-integral representation [7] for the solution of the heat equation [6] is called the Feynman–Kac formula. The underlying idea of the analytic continuation approach comes from the fact that by introducing in [6] a suitable parameter $\lambda$, proportional, for instance, to the time t as in the case $\lambda = \lambda_1$,

$$-\lambda_1 \hbar \frac{\partial}{\partial t}u = -\frac{1}{2m}\hbar^2 \Delta u + V(x)u$$

$$u(t,x) = \int_{W_{t,x}} e^{-(1/\lambda_1\hbar)\int_0^t V(\sqrt{\hbar/(m\lambda_1)}\,w(\tau))\,d\tau}\,\psi_0\big(\sqrt{\hbar/(m\lambda_1)}\,w(t)\big)\,dP_{t,x}(w)$$

or to the Planck constant, as in the case $\lambda = \lambda_2$,

$$\lambda_2 \frac{\partial}{\partial t}u = \frac{1}{2m}\lambda_2^2 \Delta u + V(x)u$$

$$u(t,x) = \int_{W_{t,x}} e^{(1/\lambda_2)\int_0^t V(\sqrt{\lambda_2/m}\,w(\tau))\,d\tau}\,\psi_0\big(\sqrt{\lambda_2/m}\,w(t)\big)\,dP_{t,x}(w)$$

or to the mass, as in the case $\lambda = \lambda_3$,

$$\frac{\partial}{\partial t}u = \frac{1}{2\lambda_3}\Delta u - iV(x)u$$

$$u(t,x) = \int_{W_{t,x}} e^{-i\int_0^t V(\sqrt{1/\lambda_3}\,w(\tau))\,d\tau}\,\psi_0\big(\sqrt{1/\lambda_3}\,w(t)\big)\,dP_{t,x}(w)$$

and by allowing $\lambda$ to assume complex values, then one gets, at least heuristically, the Schrödinger equation and its solution by substituting, respectively, $\lambda_1 = -i$, $\lambda_2 = i\hbar$, or $\lambda_3 = -im$. These procedures can be made completely rigorous under suitable conditions on the potential V and initial datum $\psi_0$.

The Approach via Fourier Transform
This approach has its roots in a couple of papers by K Ito in the 1960s and was extensively developed by S Albeverio and R Høegh-Krohn in the 1970s. The main idea is the definition of oscillatory integrals with quadratic phase function on a real separable Hilbert space $(H, \langle\,,\rangle)$, the Fresnel integrals,
$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2} f(x)\,dx \tag{8}$$
as the distributional pairing between $e^{(i/2\hbar)\|x\|^2}$ and a complex-valued function $f$ belonging to the space $\mathcal{F}(H)$ of functions that are Fourier transforms of complex bounded variation measures on $H$, that is,
$$f = \hat\mu_f, \qquad f(x) = \int_H e^{i\langle x,y\rangle}\,d\mu_f(y)$$
$\mathcal{F}(H)$ is a Banach algebra, where the product is the pointwise one and the identity is the function $f(x) = 1\ \forall x \in H$. The norm of an element $f$ is the total variation of the corresponding measure $\mu_f$, that is, $\|f\| = \sup \sum_i |\mu_f(E_i)|$, where the supremum is taken over all sequences $\{E_i\}$ of pairwise-disjoint Borel subsets of $H$, such that $\cup_i E_i = H$. Given a function $f \in \mathcal{F}(H)$, $f = \hat\mu_f$, its Fresnel integral is defined by the Parseval formula:
$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2} f(x)\,dx := \int_H e^{-(i\hbar/2)\|x\|^2}\,d\mu_f(x) \tag{9}$$
where the right-hand side is a well-defined absolutely convergent integral with respect to a $\sigma$-additive measure on $H$. It is important to recall that this approach provides the implementation of a method of stationary phase for the expansion of the integral in powers of the small parameter $\hbar$ occurring in the integrand. We postpone the discussion of these results, as well as the application to the solution of the Schrödinger equation, to the next section where a generalization of the present approach is described.
Infinite-Dimensional Oscillatory Integrals
The main idea of this approach is the extension of the definition of oscillatory integrals with quadratic phase function [8] to infinite-dimensional Hilbert spaces by means of a twofold limiting procedure. The study of integrals of the form
$$I(\hbar) := \int_{\mathbb{R}^N} e^{(i/\hbar)\Phi(x)} f(x)\,dx \tag{10}$$
where $\Phi : \mathbb{R}^N \to \mathbb{R}$ is the phase function and $f : \mathbb{R}^N \to \mathbb{C}$ a complex-valued continuous function, is a classical topic, largely developed in connection with various problems in mathematics (such as the theory of pseudodifferential operators) and physics (such as optics). Particular effort has been devoted to the study of the detailed behavior of the above integral in the limit of "strong oscillations," that is, when $\hbar \to 0$, by means of the method of stationary phase. Thanks to the cancellations due to the oscillatory term $e^{(i/2\hbar)\Phi(x)}$, the integral can still be defined, even if the function $f$ is not summable, as the limit of a sequence of regularized, hence absolutely convergent, integrals. According to Hörmander's proposal, the oscillatory integral of a function $f : \mathbb{R}^N \to \mathbb{C}$ is well defined if, for each test function $\phi \in \mathcal{S}(\mathbb{R}^N)$ such that $\phi(0) = 1$, the limit
$$\lim_{\epsilon \to 0} \int_{\mathbb{R}^N} e^{(i/2\hbar)\Phi(x)}\,\phi(\epsilon x)\,f(x)\,dx$$
exists and is independent of $\phi$. This definition was generalized in the 1980s by D Elworthy and A Truman to the case where the underlying space $\mathbb{R}^N$ is replaced by a real separable infinite-dimensional Hilbert space $(H, \langle\,,\rangle)$, under the assumption that the phase function is quadratic, that is, $\Phi(x) = \|x\|^2/2$. The "infinite-dimensional oscillatory integral"
$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2} f(x)\,dx$$
is defined as the limit of a sequence of finite-dimensional approximations. More precisely, a function $f : H \to \mathbb{C}$ is "integrable" if, for each increasing sequence $\{P_n\}_{n \in \mathbb{N}}$ of finite-dimensional projection operators in $H$ converging strongly to the identity operator as $n \to \infty$, the limit
$$\lim_{n\to\infty} \left(\int_{P_nH} e^{(i/2\hbar)\|P_nx\|^2}\,d(P_nx)\right)^{-1} \int_{P_nH} e^{(i/2\hbar)\|P_nx\|^2} f(P_nx)\,d(P_nx) \tag{11}$$
exists and is independent of the sequence $\{P_n\}_{n \in \mathbb{N}}$. In this case, the limit is denoted by
$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2} f(x)\,dx$$
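In one dimension the normalized integral of definition [11] can be evaluated directly and compared with the Parseval formula [9]. The following is a minimal numerical sketch, not part of the original article: it assumes the illustrative choice $f(x) = e^{-ax^2/2}$, which is the Fourier transform of the centred Gaussian measure of variance $a$, and the function names and grid parameters are hypothetical.

```python
import numpy as np

def normalized_fresnel_integral(a, hbar, half_width=12.0, n=400001):
    """(2 pi i hbar)^(-1/2) * integral of exp(i x^2 / (2 hbar)) f(x) dx
    for the illustrative choice f(x) = exp(-a x^2 / 2), via a Riemann sum."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    integrand = np.exp(1j * x**2 / (2 * hbar)) * np.exp(-a * x**2 / 2)
    return np.sum(integrand) * dx / np.sqrt(2j * np.pi * hbar)

def parseval_side(a, hbar):
    """Integral of exp(-i hbar y^2 / 2) against the centred Gaussian measure
    of variance a (the measure whose Fourier transform is f), in closed form."""
    return 1.0 / np.sqrt(1.0 + 1j * hbar * a)

lhs = normalized_fresnel_integral(a=1.0, hbar=1.0)
rhs = parseval_side(a=1.0, hbar=1.0)
print(abs(lhs - rhs))  # difference between the two sides of the Parseval formula
```

Here the function $f$ is summable, so no regularization is needed; the point of the sketch is only that the normalization by $(2\pi i\hbar)^{1/2}$ is what makes the two sides agree.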
The description of the largest class of integrable functions is still an open problem, even in finite dimension, but it is possible to find some interesting subsets of it. In particular, any function belonging to $\mathcal{F}(H)$, the Banach algebra considered in the approach by Fourier transform, is integrable. Indeed, by assuming that the function $f$ in [11] is of the type
$$f(x) = e^{-(i/2\hbar)\langle x, Lx\rangle}\,g(x)$$
where $L : H \to H$ is a linear self-adjoint trace-class operator on $H$ such that $(I - L)$ is invertible and $g \in \mathcal{F}(H)$, that is, $g(x) = \int_H e^{i\langle x,y\rangle}\,d\mu_g(y)$, then it is possible to prove that $f$ is integrable in the sense of definition [11] and the corresponding infinite-dimensional oscillatory integral can be explicitly computed in terms of a well-defined integral with respect to the bounded variation measure $\mu_g$ by means of the following Parseval-type equality:
$$\widetilde{\int_H} e^{(i/2\hbar)\|x\|^2}\,e^{-(i/2\hbar)\langle x, Lx\rangle}\,g(x)\,dx = \det(I - L)^{-1/2} \int_H e^{-(i\hbar/2)\langle x, (I-L)^{-1}x\rangle}\,d\mu_g(x) \tag{12}$$
$\det(I - L)$ being the Fredholm determinant of the operator $I - L$, that is, the product of its eigenvalues, counted with their multiplicity. If $L = 0$, then we obtain eqn [9], so that we can look at the infinite-dimensional oscillatory integrals approach as a generalization of the Fourier transform approach, since it allows, at least in principle, the integration of a class of functions larger than $\mathcal{F}(H)$. In fact, recently this feature has been used by S Albeverio and S Mazzucchi in the proof of a Parseval-type equality similar to [12] for infinite-dimensional oscillatory integrals with polynomially growing phase functions. Feynman's heuristic formula [2] for the representation of the solution of the Schrödinger equation [1] can
be realized as an infinite-dimensional oscillatory integral on the Hilbert space $H_t$ of absolutely continuous paths $\gamma : [0,t] \to \mathbb{R}^d$ with fixed endpoint $\gamma(t) = 0$ and finite kinetic energy $\int_0^t \dot\gamma(\tau)^2\,d\tau < \infty$, endowed with the inner product $\langle\gamma_1, \gamma_2\rangle = \int_0^t \dot\gamma_1(\tau)\,\dot\gamma_2(\tau)\,d\tau$. One has to take an initial datum $\psi_0 \in L^2(\mathbb{R}^d)$ that is the Fourier transform of a complex bounded variation measure on $\mathbb{R}^d$, that is, $\psi_0(x) = \int_{\mathbb{R}^d} e^{ikx}\,d\mu_0(k)$. Moreover, one has to assume that the potential $V$ in [1] is the sum of a harmonic oscillator part plus a bounded perturbation $V_1$ that is the Fourier transform of a complex bounded variation measure $\mu_v$ on $\mathbb{R}^d$:
$$V(x) = \tfrac12\,x\Omega^2x + V_1(x), \qquad V_1(x) = \int_{\mathbb{R}^d} e^{ikx}\,d\mu_v(k)$$
($\Omega^2$ being a symmetric positive $d \times d$ matrix). In this case, it is possible to prove that the linear operator $L$ on $H_t$ defined by
$$(\gamma, L\gamma) \equiv \int_0^t \gamma(\tau)\,\Omega^2\gamma(\tau)\,d\tau$$
is self-adjoint and trace class, and $(I - L)$ is invertible. Moreover, by considering the function $v : H_t \to \mathbb{C}$
$$v(\gamma) \equiv \int_0^t V_1(\gamma(\tau) + x)\,d\tau + 2x\Omega^2\int_0^t \gamma(\tau)\,d\tau, \qquad \gamma \in H_t$$
it is possible to prove that the function $f : H_t \to \mathbb{C}$ given by
$$f(\gamma) = e^{-(i/\hbar)v(\gamma)}\,\psi_0(\gamma(0) + x)$$
is the Fourier transform of a complex bounded variation measure $\mu_f$ on $H_t$ and the infinite-dimensional Fresnel integral of the function $g(\gamma) = e^{-(i/2\hbar)(\gamma, L\gamma)} f(\gamma)$, that is,
$$\widetilde{\int_{\gamma(t)=0}} e^{(i/2\hbar)\int_0^t \dot\gamma^2(\tau)\,d\tau - (i/\hbar)\int_0^t V(\gamma(\tau)+x)\,d\tau}\,\psi_0(\gamma(0)+x)\,d\gamma = \widetilde{\int_{H_t}} e^{(i/2\hbar)(\gamma,(I-L)\gamma)}\,e^{-(i/\hbar)v(\gamma)}\,\psi_0(\gamma(0)+x)\,d\gamma \tag{13}$$
is well defined and it is equal to
$$\det(I - L)^{-1/2} \int_{H_t} e^{-(i\hbar/2)(\gamma,(I-L)^{-1}\gamma)}\,d\mu_f(\gamma)$$
Moreover, it is a representation of the solution of equation [1] evaluated at $x \in \mathbb{R}^d$ at time $t$. Recently, solutions of the Schrödinger equation with quartic anharmonic potential via infinite-dimensional oscillatory integrals have been provided by S Albeverio and S Mazzucchi using a combination of the Parseval formula and a new analytic method (the inclusion of such potentials had been a stumbling block for many years).
In this framework, it is possible to implement an infinite-dimensional version of the stationary-phase method and study the asymptotic behavior of the oscillatory integrals in the limit $\hbar \to 0$. The method of stationary phase was originally proposed by Stokes, who noted that when $\hbar \to 0$ the oscillatory integral [10] is $O(\hbar^n)$ for any $n \in \mathbb{N}$, provided that there are no critical points of the phase function in the support of the function $f$. As a consequence, one can deduce that the leading contribution to the integral [10] should come from a neighborhood of those points $c \in \mathbb{R}^N$ such that $\nabla\Phi(c) = 0$. More precisely, by assuming that the set $C$ of critical points is finite, that is, $C = \{c_1, \ldots, c_k\}$, and that every critical point is nondegenerate, that is, $\det D^2\Phi(c_i) \neq 0\ \forall c_i \in C$, then one has
$$I(\hbar) \sim \sum_{c_i \in C} e^{(i/\hbar)\Phi(c_i)}\,I_i^*(\hbar) \tag{14}$$
where $I_i^* : \mathbb{R} \to \mathbb{C}$ are $C^\infty$ functions of $\hbar$, such that
$$I_i^*(0) = f(c_i)\,(2\pi i\hbar)^{N/2}\,(\det D^2\Phi(c_i))^{-1/2}$$
If some critical point is degenerate, the situation is more complicated: one has to take into account the type of degeneracy and apply the theory of unfoldings of singularities. These results can be generalized to infinite-dimensional oscillatory integrals of the form
$$I(\hbar) = \widetilde{\int_H} e^{(i/2\hbar)\langle x,(I-L)x\rangle}\,e^{-(i/\hbar)v(x)}\,g(x)\,dx \tag{15}$$
with $v(x) = \int_H e^{i\langle x,y\rangle}\,d\mu(y)$, $g(x) = \int_H e^{i\langle x,y\rangle}\,d\nu(y)$, $\mu, \nu$ being complex bounded variation measures on $H$ satisfying suitable assumptions and $L : H \to H$ a self-adjoint and trace-class linear operator such that $(I - L)$ is invertible. Under suitable growth conditions on the moments of the measures $\mu, \nu$, and by assuming that the phase function $\Phi(x) = \tfrac12\langle x,(I-L)x\rangle - v(x)$ has a finite number of nondegenerate critical points $c_1, \ldots, c_s$, it is possible to prove that the integral $I(\hbar)$ in [15] is equal to
$$I(\hbar) = \sum_{k=1}^{s} e^{(i/\hbar)\Phi(c_k)}\,I_k^*(\hbar) + I_0^*(\hbar)$$
for some $C^\infty$ functions $I_k^*$ satisfying:
$$I_k^*(0) = [\det(I - L - D^2v(c_k))]^{-1/2}\,g(c_k), \qquad k = 1, \ldots, s$$
$$I_0^{*(j)}(0) = 0, \qquad j = 0, 1, 2, \ldots$$
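The finite-dimensional statement [14] can be checked numerically in the simplest situation. The sketch below is an illustration under stated assumptions rather than anything taken from the article: it takes $N = 1$, the phase $\Phi(x) = x^2/2$ with its single nondegenerate critical point $c = 0$, and the amplitude $f(x) = e^{-x^2}$; the function names and grid sizes are hypothetical.

```python
import numpy as np

def oscillatory_integral(hbar, half_width=8.0, n=800001):
    """I(hbar) = integral over R of exp(i Phi(x) / hbar) f(x) dx with the
    illustrative choices Phi(x) = x^2 / 2 and f(x) = exp(-x^2),
    approximated by a Riemann sum on a fine grid."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    return np.sum(np.exp(1j * x**2 / (2 * hbar)) * np.exp(-x**2)) * dx

def leading_term(hbar):
    """f(c) (2 pi i hbar)^(N/2) det(D^2 Phi(c))^(-1/2) for N = 1, c = 0,
    where f(0) = 1, Phi(0) = 0 and Phi''(0) = 1."""
    return np.sqrt(2j * np.pi * hbar)

for hbar in (0.1, 0.01):
    ratio = oscillatory_integral(hbar) / leading_term(hbar)
    print(hbar, abs(ratio - 1.0))  # relative deviation from the leading term
```

The relative deviation shrinks roughly in proportion to $\hbar$, which is what one expects if the next term of the expansion is of relative order $\hbar$.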
Moreover, under some additional smallness assumptions on $v$, it has been proved that the phase function $\Phi$ has a unique stationary point $c$ and, as $\hbar \to 0$,
$$I(\hbar) \sim e^{(i/\hbar)\Phi(c)}\,I^*(\hbar)$$
for some $C^\infty$ function $I^*$. Each term of the asymptotic expansion in powers of $\hbar$ of the function $I^*$ can be explicitly computed, and it is possible to prove that such an asymptotic expansion is Borel-summable and determines $I^*$ uniquely.
The application of these results to the infinite-dimensional oscillatory integral representation [13] for the solution of the Schrödinger equation allows the study of its semiclassical limit. One has to consider a potential $V$ that is the Fourier transform of a complex bounded variation measure $\mu_v$ on $\mathbb{R}^d$, such that $\int_{\mathbb{R}^d} e^{\epsilon|\alpha|}\,d|\mu_v|(\alpha) < \infty$ for some $\epsilon > 0$, and a particular form for the initial wave function $\psi_0(x) = e^{(i/\hbar)\phi(x)}\chi(x)$, where $\phi$ is real and $\phi, \chi \in C_0^\infty(\mathbb{R}^d)$ are independent of $\hbar$. This initial datum corresponds to an initial particle distribution $\rho_0(x) = |\chi|^2(x)$ and to a limiting value of the probability current $J_{\hbar=0} = \nabla\phi(x)\,\rho_0(x)/m$, giving an initial particle flux associated to the velocity field $\nabla\phi(x)/m$. One also has to assume that the Lagrange manifold $L_\phi \equiv (y, \nabla\phi)$ intersects transversally the subset $\Lambda_V$ of the phase space made of all points $(y, p)$ such that $p$ is the momentum at $y$ of a classical particle that starts at time zero from $x$, moves under the action of $V$, and ends at $y$ at time $t$. In this case, the Feynman path integral [13] has an asymptotic expansion in powers of $\hbar$ for $\hbar \to 0$, whose leading term is the sum of the values of the function
$$e^{-(i\pi/2)m^{(j)}}\left|\det\left(\frac{\partial\bar\gamma_k^{(j)}}{\partial y_l^{(j)}}(y^{(j)}, t)\right)\right|^{-1/2} e^{(i/\hbar)S}\,e^{(i/\hbar)\phi}\,\chi$$
taken at the points $y^{(j)}$ such that a classical particle starting at $y^{(j)}$ at time zero with momentum $\nabla\phi(y^{(j)})$ is at $x$ at time $t$. $S$ is the classical action along this classical path $\bar\gamma^{(j)}$ and $m^{(j)}$ is the Maslov index of the path $\bar\gamma^{(j)}$, that is, $m^{(j)}$ is the number of zeros of
$$\det\left(\frac{\partial\bar\gamma_k^{(j)}}{\partial y_l^{(j)}}(y^{(j)}, \tau)\right)$$
as $\tau$ varies on the interval $(0, t)$.
White-Noise Calculus
The leading idea of the present approach, which was originally proposed by C DeWitt-Morette and P Krée and is presently realized in the framework of white-noise calculus by T Hida, L Streit, and many other authors, is the realization of the Feynman integrand $e^{(i/\hbar)S_t(\gamma)}$ as an infinite-dimensional distribution. This idea is similar to the one of the approach via Fourier transform, where the expression $(2\pi i)^{-d/2}\int_{\mathbb{R}^d} e^{(i/2)(x,x)} f(x)\,dx$ is realized as a distributional pairing between $e^{(i/2)(x,x)}/(2\pi i)^{d/2}$ and the function $f \in \mathcal{F}(\mathbb{R}^d)$ by means of the Parseval-type equality [9] and generalized to infinite-dimensional spaces. In white-noise calculus, the pairing is realized in a different measure space. Indeed, by manipulating the integrand in
$$(2\pi i)^{-d/2}\int_{\mathbb{R}^d} e^{(i/2)(x,x)} f(x)\,dx$$
one has
$$\int_{\mathbb{R}^d} \frac{e^{(i/2)(x,x)}}{(2\pi i)^{d/2}}\,f(x)\,dx = \int_{\mathbb{R}^d} \frac{e^{(i/2)(x,x)+(1/2)(x,x)}}{i^{d/2}}\,f(x)\,\frac{e^{-(1/2)(x,x)}}{(2\pi)^{d/2}}\,dx \tag{16}$$
where the latter line can be interpreted as the distributional pairing of
$$\frac{e^{(i/2)(x,x)+(1/2)(x,x)}}{i^{d/2}}$$
and $f$ not with respect to Lebesgue measure but rather with respect to the standard Gaussian measure
$$\frac{e^{-(1/2)(x,x)}}{(2\pi)^{d/2}}\,dx$$
on $\mathbb{R}^d$. The RHS of [16] can be generalized to the case in which $\mathbb{R}^d$ is replaced by a path space, thanks to the fact that on infinite-dimensional spaces, even if Lebesgue measure is meaningless, Gaussian measures are well defined and can be used as reference measures. The detailed realization of this idea as well as its application to the mathematical realization of the Feynman integrand are rather technical and we do not give details here. We recall that this approach has been successfully applied to the rigorous realization of the Feynman path-integral formulation of Chern–Simons models.
Other Possible Approaches
Another possible mathematical definition of Feynman path integrals is based on Poisson measures. It was originally proposed by A M Chebotarev and V P Maslov and further developed by several authors such as S Albeverio, Ph Blanchard, Ph Combe, R Høegh-Krohn, M Sirugue, and V Kolokol'tsov. It can be applied to "phase-space integrals," to the Dirac equation and in particular algebraic settings, as well as to the Schrödinger equation, with potentials of the same type ("Fourier transform of bounded measure") discussed in the subsection "Infinite-dimensional oscillatory integrals." Another possible definition of Feynman path integrals is based on a "time-slicing" approximation and a limiting procedure, rather close to Feynman's original work and based on the Trotter product formula. The "sequential approach" was proposed originally by A Truman and further extensively developed by D Fujiwara and N Kumano-go. The paths in formula [2] are approximated by piecewise linear paths and the Feynman path integral is correspondingly approximated by a finite-dimensional integral. In particular, D Fujiwara and N Kumano-go proved that the integrals defined in this way have some important properties, such as invariance under translations and orthogonal transformations. It is also possible to interchange the order of integration with Riemann–Stieltjes integrals and study the semiclassical approximation. Finally, it is worthwhile to recall a very interesting and intuitive approach to Feynman integration which is based on nonstandard analysis. It was introduced by S Albeverio, J E Fenstad, R Høegh-Krohn, and T Lindstrøm in the 1980s, but it has not been systematically developed yet.
Abbreviations
$D\gamma$: Heuristic Lebesgue-type measure on the space of paths
$P_{t,x}$: Wiener Gaussian measure on $W_{t,x}$
$S_t$: Action functional
$S_t^0$: Action functional for the free particle
$V$: Potential
$W_{t,x}$: Space of continuous paths with fixed initial point, $W_{t,x} = \{w \in C(0,t;\mathbb{R}^d) : w(0) = x\}$
$\hbar$: Reduced Planck constant
$\Phi$: Phase function
$\gamma$: Path, $\gamma : [0,t] \to \mathbb{R}^d$
$\hat\mu$: Fourier transform of the measure $\mu$
$\psi$: Wave function, solution of the Schrödinger equation
$H$: Hilbert space
$\widetilde{\int_H}$: Fresnel integral on the Hilbert space $H$
$\widetilde{\int_H}$: Infinite-dimensional oscillatory integral on the Hilbert space $H$
$\langle\,,\rangle$: inner product
$\|\cdot\|$: norm
See also: Chern–Simons Models: Rigorous Results; Euclidean Field Theory; Functional Integration in Quantum Physics; Path Integrals in Noncommutative Geometry; Quillen Determinant; Singularity and Bifurcation Theory; Stationary Phase Approximation.
Further Reading
Albeverio S (1997) Wiener and Feynman – path integrals and their applications. Proceedings of the Norbert Wiener Centenary Congress 1994 (East Lansing, MI, 1994), 163–194, Proc. Sympos. Appl. Math., 52. Providence, RI: American Mathematical Society.
Albeverio S and Høegh-Krohn R (1977) Oscillatory integrals and the method of stationary phase in infinitely many dimensions, with applications to the classical limit of quantum mechanics. Inventiones Mathematicae 40(1): 59–106.
Albeverio S and Høegh-Krohn R (2005) Mathematical Theory of Feynman Path Integrals, 2nd edn., with Mazzucchi S. Lecture Notes in Mathematics, vol. 523. Berlin: Springer.
Elworthy D and Truman A (1984) Feynman maps, Cameron–Martin formulae and anharmonic oscillators. Annales de l'Institut Henri Poincaré, Physique Théorique 41(2): 115–142.
Hida T, Kuo HH, Potthoff J, and Streit L (1995) White Noise. Dordrecht: Kluwer.
Johnson GW and Lapidus ML (2000) The Feynman Integral and Feynman's Operational Calculus. New York: Oxford University Press.
Special issue of Journal of Mathematical Physics on functional integration, vol. 36, no. 5 (1995).
Finite-Dimensional Algebras and Quivers
A Savage, University of Toronto, Toronto, ON, Canada
© 2006 A Savage. Published by Elsevier Ltd. All rights reserved.
Introduction
Algebras and their representations are ubiquitous in mathematics. It turns out that representations of finite-dimensional algebras are intimately related to quivers, which are simply oriented graphs. Quivers arise naturally in many areas of mathematics, including representation theory, algebraic and differential geometry, Kac–Moody algebras, and quantum groups. In this article, we give a brief overview of some of these topics. We start by giving the basic definitions of associative algebras and their representations. We then introduce quivers and their representation theory, mentioning the connection to the representation theory of associative algebras. We also discuss in some detail the relationship between quivers and the theory of Lie algebras.
Associative Algebras
An "algebra" is a vector space $A$ over a field $k$ equipped with a multiplication which is distributive and such that
$$a(xy) = (ax)y = x(ay), \qquad \forall a \in k,\ x, y \in A$$
When we wish to make the field explicit, we call $A$ a $k$-algebra. An algebra is "associative" if $(xy)z = x(yz)$ for all $x, y, z \in A$. $A$ has a "unit," or "multiplicative identity," if it contains an element $1_A$ such that $1_Ax = x1_A = x$ for all $x \in A$. From now on, we will assume all algebras are associative with unit. $A$ is said to be "commutative" if $xy = yx$ for all $x, y \in A$ and finite dimensional if the underlying vector space of $A$ is finite dimensional.
A vector subspace $I$ of $A$ is called a "left (resp. right) ideal" if $xy \in I$ for all $x \in A$, $y \in I$ (resp. $x \in I$, $y \in A$). If $I$ is both a right and a left ideal, it is called a two-sided ideal of $A$. If $I$ is a two-sided ideal of $A$, then the factor space $A/I$ is again an algebra. An algebra homomorphism is a linear map $f : A_1 \to A_2$ between two algebras such that
$$f(1_{A_1}) = 1_{A_2}, \qquad f(xy) = f(x)f(y), \quad \forall x, y \in A_1$$
A representation of an algebra $A$ is an algebra homomorphism $\rho : A \to \mathrm{End}_k(V)$ for a $k$-vector space $V$. Here $\mathrm{End}_k(V)$ is the space of endomorphisms of the vector space $V$ with multiplication given by composition. Given a representation of an algebra $A$ on a vector space $V$, we may view $V$ as an $A$-module with the action of $A$ on $V$ given by
$$a \cdot v = \rho(a)v, \qquad a \in A,\ v \in V$$
A morphism $\psi : V \to W$ of two $A$-modules (or equivalently, representations of $A$) is a linear map commuting with the action of $A$. That is, it is a linear map satisfying
$$a \cdot \psi(v) = \psi(a \cdot v), \qquad \forall a \in A,\ v \in V$$
Let $G$ be a commutative monoid (a set with an associative multiplication and a unit element). A $G$-graded $k$-algebra is a $k$-algebra which can be expressed as a direct sum $A = \bigoplus_{g \in G} A_g$ such that $aA_g \subset A_g$ for all $a \in k$ and $A_{g_1}A_{g_2} \subset A_{g_1+g_2}$ for all $g_1, g_2 \in G$. A morphism $\psi : A \to B$ of $G$-graded algebras is a $k$-algebra morphism respecting the grading, that is, satisfying $\psi(A_g) \subset B_g$ for all $g \in G$.

Quivers and Path Algebras
A "quiver" is simply an oriented graph. More precisely, a quiver is a pair $Q = (Q_0, Q_1)$ where $Q_0$ is a finite set of vertices and $Q_1$ is a finite set of arrows (oriented edges) between them. For $a \in Q_1$, we let $h(a)$ denote the "head" of $a$ and $t(a)$ denote the "tail" of $a$. A path in $Q$ is a sequence $x = \rho_1\rho_2\ldots\rho_m$ of arrows such that $h(\rho_{i+1}) = t(\rho_i)$ for $1 \le i \le m-1$. We let $t(x) = t(\rho_m)$ and $h(x) = h(\rho_1)$ denote the initial and final vertices of the path $x$. For each vertex $i \in Q_0$, we let $e_i$ denote the trivial path which starts and ends at the vertex $i$.
Fix a field $k$. The path algebra $kQ$ associated to a quiver $Q$ is the $k$-algebra whose underlying vector space has basis the set of paths in $Q$, and with the product of paths given by concatenation. Thus, if $x = \rho_1\ldots\rho_m$ and $y = \sigma_1\ldots\sigma_n$ are two paths, then $xy = \rho_1\ldots\rho_m\sigma_1\ldots\sigma_n$ if $h(y) = t(x)$ and $xy = 0$ otherwise. We also have
$$e_ie_j = \begin{cases} e_i & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \qquad e_ix = \begin{cases} x & \text{if } h(x) = i \\ 0 & \text{if } h(x) \neq i \end{cases} \qquad xe_i = \begin{cases} x & \text{if } t(x) = i \\ 0 & \text{if } t(x) \neq i \end{cases}$$
for $x \in kQ$. This multiplication is associative. Note that $e_iA$ and $Ae_i$ have bases given by the set of paths ending and starting at $i$, respectively. The path algebra has a unit given by $\sum_{i \in Q_0} e_i$.
Example 1
Let $Q$ be the following quiver:

1 --ρ--> 2 --σ--> 3 <--λ-- 4

Then $kQ$ has a basis given by the set of paths $\{e_1, e_2, e_3, e_4, \rho, \sigma, \lambda, \sigma\rho\}$. Some sample products are $\rho\sigma = 0$, $\lambda\lambda = 0$, $\lambda\sigma = 0$, $e_3\sigma = \sigma e_2 = \sigma$, $e_2\sigma = 0$.
Example 2 Let $Q$ be the following quiver (the so-called "Jordan quiver"): a single vertex 1 with a single loop $\rho$ from 1 to itself.
Then $kQ \cong k[t]$, the algebra of polynomials in one variable. Note that the path algebra $kQ$ is finite dimensional if and only if $Q$ has no oriented cycles (paths with the same head and tail vertex).
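The multiplication rule for paths is easy to mechanize. The following is a small sketch for the quiver of Example 1, not taken from the article: the arrow directions (1 to 2, 2 to 3, 4 to 3) are read off from the Euler matrix given for this quiver in Example 6, and the representation of a path as a `(head, tail, arrows)` tuple and all function names are assumptions made for illustration.

```python
# Arrows of the quiver 1 --rho--> 2 --sigma--> 3 <--lambda-- 4, as name -> (tail, head).
ARROWS = {"rho": (1, 2), "sigma": (2, 3), "lambda": (4, 3)}

def trivial(i):
    """Trivial path e_i at vertex i, stored as (head, tail, arrows)."""
    return (i, i, ())

def arrow(name):
    """Path of length one consisting of the named arrow."""
    tail, head = ARROWS[name]
    return (head, tail, (name,))

def multiply(x, y):
    """Product xy of two paths (traverse y first, then x).
    Returns None for the zero element of the path algebra."""
    head_x, tail_x, arrows_x = x
    head_y, tail_y, arrows_y = y
    if head_y != tail_x:  # xy = 0 unless h(y) = t(x)
        return None
    return (head_x, tail_y, arrows_x + arrows_y)

print(multiply(arrow("sigma"), arrow("rho")))  # the path sigma rho from 1 to 3
print(multiply(arrow("rho"), arrow("sigma")))  # zero: h(sigma) = 3 differs from t(rho) = 1
```

Only basis elements (paths) are multiplied here; extending the product bilinearly to formal linear combinations would give the full path algebra.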
Example 3 Let $Q$ be the following quiver:

1 ---> 2 ---> 3 ---> ... ---> n−2 ---> n−1 ---> n

Then for every $1 \le i \le j \le n$, there is a unique path from $i$ to $j$. Let $f : kQ \to M_n(k)$ be the linear map from the path algebra to the $n \times n$ matrices with entries in the field $k$ that sends the unique path from $i$ to $j$ to the matrix $E_{ji}$ with $(j, i)$ entry 1 and all other entries zero. Then one can show that $f$ is an isomorphism onto the algebra of lower triangular matrices.
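The multiplicativity of the map $f$ on paths can be spot-checked numerically. This is a minimal sketch for $n = 4$ under the conventions of the text (the product $xy$ of paths means "first $y$, then $x$"); the helper name `f` mirrors the text, while the choice of $n$ and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def f(i, j, n):
    """Image of the unique path from i to j (1 <= i <= j <= n):
    the matrix unit E_ji, with a 1 in row j, column i."""
    m = np.zeros((n, n), dtype=int)
    m[j - 1, i - 1] = 1
    return m

n = 4
for i in range(1, n + 1):
    for j in range(i, n + 1):
        # every image is lower triangular
        assert np.array_equal(np.tril(f(i, j, n)), f(i, j, n))
        for l in range(j, n + 1):
            # (path j -> l)(path i -> j) = path i -> l, and f respects this product
            assert np.array_equal(f(j, l, n) @ f(i, j, n), f(i, l, n))
```

Products of paths that do not compose, such as (path 1 to 2)(path 2 to 3), are zero in the path algebra, and the corresponding matrix product $E_{21}E_{32}$ vanishes as well.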
Representations of Quivers
Fix a field $k$. A representation of a quiver $Q$ is an assignment of a vector space to each vertex and to each arrow a linear map between the vector spaces assigned to its tail and head. More precisely, a representation $V$ of $Q$ is a collection $\{V_i \mid i \in Q_0\}$ of finite-dimensional $k$-vector spaces together with a collection $\{V_\rho : V_{t(\rho)} \to V_{h(\rho)} \mid \rho \in Q_1\}$ of $k$-linear maps. Note that a representation $V$ of a quiver $Q$ is equivalent to a representation of the path algebra $kQ$. The dimension of $V$ is the map $d_V : Q_0 \to \mathbb{Z}_{\ge 0}$ given by $d_V(i) = \dim V_i$ for $i \in Q_0$. If $V$ and $W$ are two representations of a quiver $Q$, then a morphism $\psi : V \to W$ is a collection of $k$-linear maps $\{\psi_i : V_i \to W_i \mid i \in Q_0\}$ such that
$$W_\rho\,\psi_{t(\rho)} = \psi_{h(\rho)}\,V_\rho, \qquad \forall \rho \in Q_1$$
Proposition 1 Let $A$ be a finite-dimensional $k$-algebra. Then the category of representations of $A$ is equivalent to the category of representations of the algebra $kQ/I$ for some quiver $Q$ and some two-sided ideal $I$ of $kQ$.
It is for this reason that the study of finite-dimensional associative algebras is intimately related to the study of quivers. We define the direct sum $V \oplus W$ of two representations $V$ and $W$ of a quiver $Q$ by
$$(V \oplus W)_i = V_i \oplus W_i, \qquad i \in Q_0$$
and $(V \oplus W)_\rho : V_{t(\rho)} \oplus W_{t(\rho)} \to V_{h(\rho)} \oplus W_{h(\rho)}$ by
$$(V \oplus W)_\rho((v, w)) = (V_\rho(v), W_\rho(w))$$
for $v \in V_{t(\rho)}$, $w \in W_{t(\rho)}$, $\rho \in Q_1$. A representation $V$ is "trivial" if $V_i = 0$ for all $i \in Q_0$ and "simple" if its only subrepresentations are the zero representation and $V$ itself. We say that $V$ is "decomposable" if it is isomorphic to $W \oplus U$ for some nontrivial representations $W$ and $U$. Otherwise, we call $V$ "indecomposable." Every representation of a quiver has a decomposition into indecomposable representations that is unique up to isomorphism and permutation of the components. Thus, to classify all representations of a quiver, it suffices to classify the indecomposable representations.
Example 4 Let $Q$ be the following quiver:

1 --ρ--> 2

Then $Q$ has three indecomposable representations $U$, $V$, and $W$ given by:
$$U_1 = k, \quad U_2 = 0, \quad U_\rho = 0$$
$$V_1 = 0, \quad V_2 = k, \quad V_\rho = 0$$
$$W_1 = k, \quad W_2 = k, \quad W_\rho = 1$$
Then any representation $Z$ of $Q$ is isomorphic to
$$Z \cong U^{d_1 - r} \oplus V^{d_2 - r} \oplus W^r$$
where $d_1 = \dim Z_1$, $d_2 = \dim Z_2$, $r = \operatorname{rank} Z_\rho$.
Example 5 Let $Q$ be the Jordan quiver. Then representations $V$ of $Q$ are classified up to isomorphism by the Jordan normal form of $V_\rho$ where $\rho$ is the single arrow of the quiver. Indecomposable representations correspond to single Jordan blocks. These are parametrized by a discrete parameter $n$ (the size of the block) and a continuous parameter $\lambda$ (the eigenvalue of the block).
A quiver is said to be of "finite type" if it has only finitely many indecomposable representations (up to isomorphism). If a quiver has infinitely many isomorphism classes of indecomposable representations but they can be split into families, each parametrized by a single continuous parameter, then we say the quiver is of "tame" (or "affine") type. If a quiver is of neither finite nor tame type, it is of "wild type." It turns out that there is a rather remarkable relationship between the classification of quivers and their representations and the theory of Kac–Moody algebras.
The "Euler form" or "Ringel form" of a quiver $Q$ is defined to be the asymmetric bilinear form on $\mathbb{Z}^{Q_0}$ given by
$$\langle\alpha, \beta\rangle = \sum_{i \in Q_0} \alpha(i)\beta(i) - \sum_{\rho \in Q_1} \alpha(t(\rho))\,\beta(h(\rho))$$
In the standard coordinate basis of $\mathbb{Z}^{Q_0}$, the Euler form is represented by the matrix $E = (a_{ij})$ where
$$a_{ij} = \delta_{ij} - \#\{\rho \in Q_1 \mid t(\rho) = i,\ h(\rho) = j\}$$
Here $\delta_{ij}$ is the Kronecker delta symbol. We define the "Cartan form" of the quiver $Q$ to be the symmetric bilinear form given by
$$(\alpha, \beta) = \langle\alpha, \beta\rangle + \langle\beta, \alpha\rangle$$
Note that the Cartan form is independent of the orientation of the arrows in $Q$. In the standard coordinate basis of $\mathbb{Z}^{Q_0}$, the Cartan form is represented by the Cartan matrix $C = (c_{ij})$ where $c_{ij} = a_{ij} + a_{ji}$.
Example 6 For the quiver in Example 1, the Euler matrix is
$$E = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix}$$
and the Cartan matrix is
$$C = \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix}$$
The "Tits form" $q$ of a quiver $Q$ is defined by
$$q(\alpha) = \langle\alpha, \alpha\rangle = \tfrac12(\alpha, \alpha)$$
It is known that the number of continuous parameters describing representations of dimension $\alpha$ for $\alpha \neq 0$ is greater than or equal to $1 - q(\alpha)$.
Let $\mathfrak{g}$ be the Kac–Moody algebra associated to the Cartan matrix of a quiver $Q$. By forgetting the orientation of the arrows of $Q$, we obtain the underlying (undirected) graph. This is the Dynkin graph of $\mathfrak{g}$. Associated to $\mathfrak{g}$ is a root system and a set of simple roots $\{\alpha_i \mid i \in Q_0\}$ indexed by the vertices of the Dynkin graph.
Theorem 1 (Gabriel's theorem).
(i) A quiver is of finite type if and only if the underlying graph is a union of Dynkin graphs of type $A$, $D$, or $E$.
(ii) A quiver is of tame type if and only if the underlying graph is a union of Dynkin graphs of type $A$, $D$, or $E$ and extended Dynkin graphs of type $\hat A$, $\hat D$, or $\hat E$ (with at least one extended Dynkin graph).
(iii) The isomorphism classes of indecomposable representations of a quiver $Q$ of finite type are in one-to-one correspondence with the positive roots of the root system associated to the underlying graph of $Q$. The correspondence is given by
$$V \mapsto \sum_{i \in Q_0} d_V(i)\,\alpha_i$$
The Dynkin graphs of type $A$, $D$, and $E$ are as follows.
[Figure: Dynkin graphs of types $A_n$, $D_n$, $E_6$, $E_7$, $E_8$]
Here the subscript indicates the number of vertices in the graph. The extended Dynkin graphs of type $\hat A$, $\hat D$, and $\hat E$ are as follows.
[Figure: extended Dynkin graphs of types $\hat A_n$, $\hat D_n$, $\hat E_6$, $\hat E_7$, $\hat E_8$]
Here we have used an open dot to denote the vertex that was added to the corresponding Dynkin graph of type $A$, $D$, or $E$.
Theorem 2 (Kac's theorem). Let $Q$ be an arbitrary quiver. The dimension vectors of indecomposable representations of $Q$ correspond to positive roots
of the root system associated to the underlying graph of $Q$ (and are thus independent of the orientation of the arrows of $Q$). The correspondence is given by
$$d_V \mapsto \sum_{i \in Q_0} d_V(i)\,\alpha_i$$
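The Euler, Cartan, and Tits forms are straightforward to compute from the list of arrows. The sketch below does this for the quiver of Example 1 and then lists the nonzero dimension vectors with entries in $\{0, 1, 2\}$ on which the Tits form takes the value 1. It is an illustration only: the arrow list (1 to 2, 2 to 3, 4 to 3) is the one consistent with the Euler matrix of Example 6, the search box is an arbitrary choice, and the function names are hypothetical.

```python
import itertools
import numpy as np

def euler_matrix(n_vertices, arrows):
    """E = (a_ij), a_ij = delta_ij - #{arrows with tail i and head j}; vertices are 1..n."""
    E = np.eye(n_vertices, dtype=int)
    for tail, head in arrows:
        E[tail - 1, head - 1] -= 1
    return E

def tits_form(E, alpha):
    """q(alpha) = <alpha, alpha>, computed from the Euler matrix."""
    a = np.array(alpha)
    return int(a @ E @ a)

arrows = [(1, 2), (2, 3), (4, 3)]   # assumed orientation of the quiver of Example 1
E = euler_matrix(4, arrows)
C = E + E.T                          # Cartan matrix, c_ij = a_ij + a_ji

# Nonzero vectors with entries in {0, 1, 2} and q(alpha) = 1.
roots = [a for a in itertools.product(range(3), repeat=4)
         if any(a) and tits_form(E, a) == 1]
print(E)
print(C)
print(len(roots), roots)
```

For this quiver, whose underlying graph is the Dynkin graph $A_4$, the vectors found are the sums of consecutive simple roots, in line with the correspondence of Gabriel's theorem.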
Note that in Kac's theorem, it is not asserted that the isomorphism classes are in one-to-one correspondence with the roots as in the finite case considered in Gabriel's theorem. It turns out that in the general case, dimension vectors for which there is exactly one isomorphism class correspond to real roots while imaginary roots correspond to dimension vectors for which there are families of representations.
Example 7 Let $Q$ be the quiver of type $A_n$, oriented as follows.

1 --ρ1--> 2 --ρ2--> 3 ---> ... ---> n−2 --ρ(n−2)--> n−1 --ρ(n−1)--> n

It is known that the set of positive roots of the simple Lie algebra of type $A_n$ is
$$\left\{\sum_{i=j}^{l} \alpha_i \;\middle|\; 1 \le j \le l \le n\right\} \sqcup \{0\}$$
The zero root corresponds to the trivial representation. The root $\sum_{i=j}^{l} \alpha_i$ for some $1 \le j \le l \le n$ corresponds to the unique (up to isomorphism) representation $V$ with
$$V_i = \begin{cases} k & \text{if } j \le i \le l \\ 0 & \text{otherwise} \end{cases}$$
and
$$V_{\rho_i} = \begin{cases} 1 & \text{if } j \le i \le l-1 \\ 0 & \text{otherwise} \end{cases}$$
Example 8 Let $Q$ be the quiver of type $\hat A_n$, with all arrows oriented in the same direction (for instance, counter-clockwise). The positive root $\sum_{i=0}^{n} \alpha_i$ (where $\{0, 1, 2, \ldots, n\}$ are the vertices of the quiver) is imaginary. There is a one-parameter family of isomorphism classes of indecomposable representations where the maps assigned to each arrow are nonzero. The parameter is the composition of the maps around the loop.
If a quiver $Q$ has no oriented cycles, then the only simple $kQ$-modules are the modules $S^i$ for $i \in Q_0$ where
$$S^i_j = \begin{cases} k & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
and $S^i_\rho = 0$ for all $\rho \in Q_1$.
Ringel–Hall Algebras
Let $k$ be the finite field $\mathbb{F}_q$ with $q$ elements and let $Q$ be a quiver with no oriented cycles. Let $\mathcal{P}$ be the set of all isomorphism classes of $kQ$-modules which are finite as sets (since $k$ is finite dimensional, these are just the quiver representations we considered above). Let $A$ be a commutative integral domain containing $\mathbb{Z}$ and elements $v, v^{-1}$ such that $v^2 = q$. The Ringel–Hall algebra $H = H_{A,v}(kQ)$ is the free $A$-module with basis $\{[V]\}$ indexed by the isomorphism classes of representations of the quiver $Q$, with an $A$-bilinear multiplication defined by
$$[V^1] \cdot [V^2] = v^{\langle\dim V^1, \dim V^2\rangle} \sum_V g^V_{V^1,V^2}\,[V]$$
Here $\langle\dim V^1, \dim V^2\rangle$ is the Euler form and $g^V_{V^1,V^2}$ is the number of submodules $W$ of $V$ such that $V/W \cong V^1$ and $W \cong V^2$. $H$ is an associative $\mathbb{Z}_{\ge 0}^{Q_0}$-graded algebra, with identity element $[0]$, the isomorphism class of the trivial representation. The grading $H = \bigoplus_\alpha H_\alpha$ is given by letting $H_\alpha$ be the $A$-span of the set of isomorphism classes $[V]$ such that $\dim V = \alpha$.
Let $C = C_{A,v}(kQ)$ be the $A$-subalgebra of $H$ generated by the isomorphism classes $[S^i]$ of the simple $kQ$-modules. $C$ is called the "composition algebra." If the underlying graph of $Q$ is of finite type, then $C = H$.
Now let $K$ be a set of finite fields $k$ such that the set $\{|k| \mid k \in K\}$ is infinite. Let $A$ be an integral domain containing $\mathbb{Q}$ and, for each $k \in K$, an element $v_k$ such that $v_k^2 = |k|$. For each $k \in K$, we have the corresponding composition algebra $C_k$, generated by the elements $[{}^kS^i]$ (here we make the field $k$ explicit). Now let $\mathcal{C}$ be the subring of $\prod_{k \in K} C_k$ generated by $\mathbb{Q}$ and the elements
$$t = (t_k)_{k \in K}, \qquad t_k = v_k$$
$$t^{-1} = (t_k^{-1})_{k \in K}, \qquad t_k^{-1} = (v_k)^{-1}$$
$$u_i = (u_i^k)_{k \in K}, \qquad u_i^k = [{}^kS^i],\ i \in Q_0$$
Now, $t$ lies in the center of $\mathcal{C}$ and if $p(t) = 0$ for some polynomial $p$, then $p$ must be the zero polynomial since the set of $v_k$ is infinite. Thus, we may think of $\mathcal{C}$ as the $\mathcal{A}$-algebra generated by the $u_i$, $i \in Q_0$, with $\mathcal{A} = \mathbb{Q}[t, t^{-1}]$ and $t$ an indeterminate. Let $\mathcal{C}^* = \mathbb{Q}(t) \otimes_{\mathcal{A}} \mathcal{C}$. We call $\mathcal{C}^*$ the "generic composition algebra."
Let $\mathfrak{g}$ be the Kac–Moody algebra associated to the Cartan matrix of the quiver $Q$ and let $U$ be the quantum group associated by Drinfeld and Jimbo to $\mathfrak{g}$. It has a triangular decomposition $U = U^- \otimes U^0 \otimes U^+$.
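Specifically, $U^+$ is the $\mathbb{Q}(t)$-algebra with generators $E_i$, $i \in Q_0$ and relations
$$\sum_{p=0}^{1-c_{ij}} (-1)^p \begin{bmatrix} 1 - c_{ij} \\ p \end{bmatrix} E_i^p E_j E_i^{1-c_{ij}-p} = 0, \qquad i \neq j$$
where $c_{ij}$ are the entries of the Cartan matrix and
$$\begin{bmatrix} m \\ p \end{bmatrix} = \frac{[m]!}{[p]!\,[m-p]!}, \qquad [n] = \frac{t^n - t^{-n}}{t - t^{-1}}, \qquad [n]! = [1][2]\ldots[n]$$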
Theorem 3 There is a $\mathbb{Q}(t)$-algebra isomorphism $\mathcal{C}^* \to U^+$ sending $u_i \mapsto E_i$ for all $i \in Q_0$.
The proof of Theorem 3 is due to Ringel in the case that the underlying graph of $Q$ is of finite or affine type. The more general case presented here is due to Green.
All of the Kac–Moody algebras considered so far have been simply-laced. That is, their Cartan matrices are symmetric. There is a way to deal with non-simply-laced Kac–Moody algebras using species. We will not treat this subject in this article.
Quiver Varieties One can use varieties associated to quivers to yield a geometric realization of the upper half of the universal enveloping algebra of a Kac–Moody algebra g and its irreducible highest-weight representations. Lusztig’s Quiver Varieties
We first introduce the quiver varieties, first defined by Lusztig, which yield a geometric realization of the upper half $U^+$ of the universal enveloping algebra of a simply laced Kac–Moody algebra $\mathfrak{g}$. Let $Q = (Q_0, Q_1)$ be the quiver whose vertices $Q_0$ are the vertices of the Dynkin diagram of $\mathfrak{g}$ and whose set of arrows $Q_1$ consists of all the edges of the Dynkin diagram with both orientations. By definition, $U^+$ is the $\mathbb{Q}$-algebra defined by generators $e_i$, $i \in Q_0$, subject to the Serre relations
$$\sum_{p=0}^{1-c_{ij}} (-1)^p \binom{1-c_{ij}}{p} e_i^p e_j e_i^{1-c_{ij}-p} = 0$$
for all $i \neq j$ in $Q_0$, where $c_{ij}$ are the entries of the Cartan matrix associated to $Q$. For any $\nu = \sum_{i \in Q_0} \nu_i i$, $\nu_i \in \mathbb{N}$, let $U^+_\nu$ be the subspace of $U^+$ spanned by the monomials $e_{i_1} e_{i_2} \cdots e_{i_n}$ for various sequences $i_1, i_2, \dots, i_n$ in which $i$ appears $\nu_i$ times for each $i \in Q_0$. Thus, $U^+ = \bigoplus_\nu U^+_\nu$. Let $U^+_{\mathbb{Z}}$ be the subring of $U^+$ generated by the elements $e_i^p/p!$ for $i \in Q_0$, $p \in \mathbb{N}$. Then $U^+_{\mathbb{Z}} = \bigoplus_\nu U^+_{\mathbb{Z},\nu}$, where $U^+_{\mathbb{Z},\nu} = U^+_{\mathbb{Z}} \cap U^+_\nu$.
We define the involution $\bar{\ } : Q_1 \to Q_1$ to be the function which takes $\rho \in Q_1$ to the element $\bar\rho$ of $Q_1$ consisting of the same edge with opposite orientation. An orientation of our graph/quiver is a choice of a subset $\Omega \subset Q_1$ such that $\Omega \cup \bar\Omega = Q_1$ and $\Omega \cap \bar\Omega = \emptyset$.
Let $\mathcal{V}$ be the category of finite-dimensional $Q_0$-graded vector spaces $V = \bigoplus_{i \in Q_0} V_i$ over $\mathbb{C}$ with morphisms being linear maps respecting the grading. Then $V \in \mathcal{V}$ shall denote that $V$ is an object of $\mathcal{V}$. The dimension of $V \in \mathcal{V}$ is given by $\mathbf{v} = \dim V = (\dim V_0, \dots, \dim V_n)$. Given $V \in \mathcal{V}$, let $E_V$ be the space of representations of $Q$ with underlying vector space $V$. That is,
$$E_V = \bigoplus_{\rho \in Q_1} \mathrm{Hom}(V_{t(\rho)}, V_{h(\rho)})$$
For any subset $Q_1'$ of $Q_1$, let $E_{V,Q_1'}$ be the subspace of $E_V$ consisting of all vectors $x = (x_\rho)$ such that $x_\rho = 0$ whenever $\rho \notin Q_1'$. The algebraic group $G_V = \prod_i \mathrm{Aut}(V_i)$ acts on $E_V$ and $E_{V,Q_1'}$ by
$$(g, x) = ((g_i), (x_\rho)) \mapsto gx = (x'_\rho) = (g_{h(\rho)}\, x_\rho\, g_{t(\rho)}^{-1})$$
Define the function $\varepsilon : Q_1 \to \{-1, 1\}$ by $\varepsilon(\rho) = 1$ for all $\rho \in \Omega$ and $\varepsilon(\rho) = -1$ for all $\rho \in \bar\Omega$. Let $\langle \cdot , \cdot \rangle$ be the nondegenerate, $G_V$-invariant, symplectic form on $E_V$ with values in $\mathbb{C}$ defined by
$$\langle x, y \rangle = \sum_{\rho \in Q_1} \varepsilon(\rho)\, \mathrm{tr}(x_\rho\, y_{\bar\rho})$$
Note that $E_V$ can be considered as the cotangent space of $E_{V,\Omega}$ under this form. The moment map associated to the $G_V$-action on the symplectic vector space $E_V$ is the map $\psi : E_V \to \mathfrak{gl}_V = \prod_i \mathrm{End}\, V_i$, the Lie algebra of $GL_V$, with $i$-component $\psi_i : E_V \to \mathrm{End}\, V_i$ given by
$$\psi_i(x) = \sum_{\rho \in Q_1,\ h(\rho) = i} \varepsilon(\rho)\, x_\rho\, x_{\bar\rho}$$
Definition 1 An element $x \in E_V$ is said to be nilpotent if there exists an $N \geq 1$ such that for any sequence $\rho_1, \rho_2, \dots, \rho_N$ in $Q_1$ satisfying $t(\rho_1) = h(\rho_2)$, $t(\rho_2) = h(\rho_3)$, ..., $t(\rho_{N-1}) = h(\rho_N)$, the composition $x_{\rho_1} x_{\rho_2} \cdots x_{\rho_N} : V_{t(\rho_N)} \to V_{h(\rho_1)}$ is zero.
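To make the moment map and the nilpotency condition concrete, here is a minimal sketch for the doubled quiver of type $A_2$: two vertices 0 and 1, one arrow from 0 to 1 taken to lie in the orientation (sign $+1$), and its reverse (sign $-1$). The arrow names, the dictionary layout and the helper functions are illustrative assumptions, not notation from the article. Matrices are plain lists of rows, and all graded pieces are assumed to have positive dimension. The nilpotency test enumerates composable paths of length $\sum_i \dim V_i$, which suffices because a nilpotent algebra of operators on an $n$-dimensional space vanishes in degree $n$.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]


# Hypothetical encoding of the doubled A_2 quiver.
ARROWS = {
    "rho":     {"tail": 0, "head": 1, "bar": "rho_bar", "eps": 1},
    "rho_bar": {"tail": 1, "head": 0, "bar": "rho",     "eps": -1},
}


def moment_map(x, dims, arrows=ARROWS):
    """psi_i(x) = sum over arrows rho with head i of eps(rho) * x_rho * x_{bar rho}."""
    psi = {i: [[0] * d for _ in range(d)] for i, d in dims.items()}
    for name, a in arrows.items():
        term = mat_mul(x[name], x[a["bar"]])
        psi[a["head"]] = mat_add(psi[a["head"]], term, a["eps"])
    return psi


def is_nilpotent(x, dims, arrows=ARROWS):
    """Check that every composable product of n = sum(dims) arrow maps is zero."""
    n = sum(dims.values())
    # Each entry: (tail of the rightmost arrow, matrix of the product so far).
    paths = [(a["tail"], x[name]) for name, a in arrows.items()]
    for _ in range(n - 1):
        paths = [(b["tail"], mat_mul(m, x[s]))
                 for tail, m in paths
                 for s, b in arrows.items() if b["head"] == tail]
    return all(v == 0 for _, m in paths for row in m for v in row)
```

With $\dim V_0 = \dim V_1 = 1$, writing $x_\rho = a$ and $x_{\bar\rho} = b$, both conditions reduce to $ab = 0$, so the set of nilpotent points in the zero level of the moment map is the union of the two coordinate lines in $\mathbb{C}^2$.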
Definition 2 Let $\Lambda_V$ be the set of all nilpotent elements $x \in E_V$ such that $\psi_i(x) = 0$ for all $i \in I$ (here $I = Q_0$ denotes the set of vertices).
A subset of an algebraic variety is said to be “constructible” if it is obtained from subvarieties by a finite number of the usual set-theoretic operations. A function $f : A \to \mathbb{Q}$ on an algebraic variety $A$ is said to be a constructible function if $f^{-1}(a)$ is a constructible set for all $a \in \mathbb{Q}$ and is empty for all but finitely many $a$. Let $M(\Lambda_V)$ denote the $\mathbb{Q}$-vector space of all constructible functions on $\Lambda_V$. Let $\tilde{M}(\Lambda_V)$ denote the $\mathbb{Q}$-subspace of $M(\Lambda_V)$ consisting of those functions that are constant on any $G_V$-orbit in $\Lambda_V$.
Let $V, V', V'' \in \mathcal{V}$ such that $\dim V = \dim V' + \dim V''$. Now, suppose that $S$ is an $I$-graded subspace of $V$. For $x \in \Lambda_V$ we say that $S$ is $x$-stable if $x(S) \subset S$. Let $\Lambda_{V;V',V''}$ be the variety consisting of all pairs $(x, S)$ where $x \in \Lambda_V$ and $S$ is an $I$-graded $x$-stable subspace of $V$ such that $\dim S = \dim V''$. Now, if we fix some isomorphisms $V/S \cong V'$, $S \cong V''$, then $x$ induces elements $x' \in \Lambda_{V'}$ and $x'' \in \Lambda_{V''}$. We then have the maps
$$\Lambda_{V'} \times \Lambda_{V''} \xleftarrow{\ p_1\ } \Lambda_{V;V',V''} \xrightarrow{\ p_2\ } \Lambda_V$$
where $p_1(x, S) = (x', x'')$, $p_2(x, S) = x$. For a holomorphic map $\pi$ between complex varieties $A$ and $B$, let $\pi_!$ denote the map between the spaces of constructible functions on $A$ and $B$ given by
$$(\pi_! f)(y) = \sum_{a \in \mathbb{Q}} a\, \chi(\pi^{-1}(y) \cap f^{-1}(a))$$
where $\chi$ denotes the Euler characteristic.
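The pushforward of constructible functions becomes very simple in a toy setting. The sketch below assumes, purely for illustration, that the source variety is a finite set of points, so that the Euler characteristic of a subset is just its cardinality; the pushforward then sums the values of the function over each fibre. The function names are hypothetical.

```python
from collections import defaultdict


def pushforward(pi, f, domain):
    """(pi_! f)(y) = sum_a a * chi(pi^{-1}(y) ∩ f^{-1}(a)), with chi = cardinality.

    Points y with empty fibre are omitted from the result (their value is 0).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for x in domain:
        counts[pi(x)][f(x)] += 1  # size of pi^{-1}(y) ∩ f^{-1}(a)
    return {y: sum(a * n for a, n in level_sets.items())
            for y, level_sets in counts.items()}


def pullback(pi, g):
    """(pi^* g)(x) = g(pi(x))."""
    return lambda x: g(pi(x))
```

For example, pushing the function $f(x) = x$ on $\{0, \dots, 5\}$ forward along $x \mapsto x \bmod 2$ gives the value 6 at 0 and 9 at 1.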
Let $\pi^*$ be the pullback map from functions on $B$ to functions on $A$ acting as $\pi^* f(y) = f(\pi(y))$. We then define a map
$$\tilde{M}(\Lambda_{V'}) \times \tilde{M}(\Lambda_{V''}) \to \tilde{M}(\Lambda_V) \qquad [1]$$
by $(f', f'') \mapsto f' * f''$ where
$$f' * f'' = (p_2)_!\, p_1^* (f' \times f'')$$
Here $f' \times f'' \in \tilde{M}(\Lambda_{V'} \times \Lambda_{V''})$ is defined by $(f' \times f'')(x', x'') = f'(x') f''(x'')$. The map [1] is bilinear and defines an associative $\mathbb{Q}$-algebra structure on $\bigoplus_\nu \tilde{M}(\Lambda_{V^\nu})$ where $V^\nu$ is the object of $\mathcal{V}$ defined by $V^\nu_i = \mathbb{C}^{\nu_i}$. There is a unique algebra homomorphism $\kappa : U^+ \to \bigoplus_\nu \tilde{M}(\Lambda_{V^\nu})$ such that $\kappa(e_i)$ is the function on the point $\Lambda_{V^i}$ with value 1. Then $\kappa$ restricts to a map $\kappa_\nu : U^+_\nu \to \tilde{M}(\Lambda_{V^\nu})$. It can be shown that $\kappa_{pi}(e_i^p/p!)$ is the function 1 on the point $\Lambda_{V^{pi}}$ for $i \in Q_0$, $p \in \mathbb{Z}_{\geq 0}$.
Let $\tilde{M}_{\mathbb{Z}}(\Lambda_{V^\nu})$ be the set of all functions in $\tilde{M}(\Lambda_{V^\nu})$ that take on only integer values. One can show that if $f' \in \tilde{M}_{\mathbb{Z}}(\Lambda_{V'})$ and $f'' \in \tilde{M}_{\mathbb{Z}}(\Lambda_{V''})$, then $f' * f'' \in \tilde{M}_{\mathbb{Z}}(\Lambda_V)$ in the setup of [1]. Thus $\kappa_\nu(U^+_{\mathbb{Z},\nu}) \subset \tilde{M}_{\mathbb{Z}}(\Lambda_{V^\nu})$.
Let $\mathrm{Irr}\,\Lambda_V$ denote the set of irreducible components of $\Lambda_V$. The following proposition was conjectured by Lusztig and proved by him in the affine (and finite) case. The general case was proved by Kashiwara and Saito.
Proposition 2 For any $\nu \in (\mathbb{Z}_{\geq 0})^{Q_0}$, we have $\dim U^+_\nu = \#\mathrm{Irr}\,\Lambda_{V^\nu}$.
We then have the following important result due to Lusztig.
Theorem 4 Let $\nu \in (\mathbb{Z}_{\geq 0})^{Q_0}$. Then,
(i) For any $Z \in \mathrm{Irr}\,\Lambda_{V^\nu}$, there exists a unique $f_Z \in \kappa_\nu(U^+_{\mathbb{Z},\nu})$ such that $f_Z$ is equal to 1 on an open dense subset of $Z$ and equal to zero on an open dense subset of $Z' \in \mathrm{Irr}\,\Lambda_{V^\nu}$ for all $Z' \neq Z$.
(ii) $\{f_Z \mid Z \in \mathrm{Irr}\,\Lambda_{V^\nu}\}$ is a $\mathbb{Q}$-basis of $\kappa_\nu(U^+_\nu)$.
(iii) $\kappa_\nu : U^+_\nu \to \kappa_\nu(U^+_\nu)$ is an isomorphism.
(iv) Define $[Z] \in U^+_\nu$ by $\kappa_\nu([Z]) = f_Z$. Then $B_\nu = \{[Z] \mid Z \in \mathrm{Irr}\,\Lambda_{V^\nu}\}$ is a $\mathbb{Q}$-basis of $U^+_\nu$.
(v) $\kappa_\nu(U^+_{\mathbb{Z},\nu}) = \kappa_\nu(U^+_\nu) \cap \tilde{M}_{\mathbb{Z}}(\Lambda_{V^\nu})$.
(vi) $B_\nu$ is a $\mathbb{Z}$-basis of $U^+_{\mathbb{Z},\nu}$.
From this theorem, we see that $B = \bigsqcup_\nu B_\nu$ is a $\mathbb{Q}$-basis of $U^+$, which is called the “semicanonical basis.” This basis has many remarkable properties. One of these properties is as follows. Via the algebra involution of the entire universal enveloping algebra $U$ of $\mathfrak{g}$ given on the Chevalley generators by $e_i \mapsto f_i$, $f_i \mapsto e_i$ and $h \mapsto -h$ for $h$ in the Cartan subalgebra of $\mathfrak{g}$, one obtains from the results of this section a semicanonical basis of $U^-$, the lower half of the universal enveloping algebra of $\mathfrak{g}$. For any irreducible highest-weight integrable representation $V$ of $U$ (or, equivalently, $\mathfrak{g}$), let $v \in V$ be a nonzero highest-weight vector. Then the set
$$\{bv \mid b \in B,\ bv \neq 0\}$$
is a $\mathbb{Q}$-basis of $V$, called the semicanonical basis of $V$. Thus, the semicanonical basis of $U^-$ is simultaneously compatible with all irreducible highest-weight integrable modules. There is also a way to define the semicanonical basis of a representation directly in a geometric way. This is the subject of the next subsection.
One can also obtain a geometric realization of the upper part $U^+$ of the quantum group in a similar manner using perverse sheaves instead of constructible functions. This construction yields the canonical basis of the associated quantum group (a q-deformation of the universal enveloping algebra) which also has many remarkable properties and is closely related to the theory of crystal bases.
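Nakajima’s Quiver Varieties
We introduce here a description of the quiver varieties first presented by Nakajima. They yield a geometric realization of the irreducible highest-weight representations of simply-laced Kac–Moody algebras. The construction was motivated by the work of Kronheimer and Nakajima on solutions to the anti-self-dual Yang–Mills equations on ALE gravitational instantons (see Instantons: Topological Aspects).
Definition 3 For $\mathbf{v}, \mathbf{w} \in \mathbb{Z}^I_{\geq 0}$, choose $I$-graded vector spaces $V$ and $W$ of graded dimensions $\mathbf{v}$ and $\mathbf{w}$, respectively. Then define
$$\Lambda \equiv \Lambda(\mathbf{v}, \mathbf{w}) = \Lambda_V \times \bigoplus_{i \in I} \mathrm{Hom}(V_i, W_i)$$
Definition 4 Let $\Lambda^{st} = \Lambda(\mathbf{v}, \mathbf{w})^{st}$ be the set of all $(x, t) \in \Lambda(\mathbf{v}, \mathbf{w})$ satisfying the following condition: if $S = (S_i)$ with $S_i \subset V_i$ is $x$-stable and $t_i(S_i) = 0$ for $i \in I$, then $S_i = 0$ for $i \in I$.
The group $G_V$ acts on $\Lambda(\mathbf{v}, \mathbf{w})$ via
$$(g, (x, t)) \mapsto ((g_{h(\rho)}\, x_\rho\, g_{t(\rho)}^{-1}), (t_i\, g_i^{-1}))$$
and the stabilizer of any point of $\Lambda(\mathbf{v}, \mathbf{w})^{st}$ in $G_V$ is trivial. We then make the following definition.
Definition 5 Let $\mathfrak{L} \equiv \mathfrak{L}(\mathbf{v}, \mathbf{w}) = \Lambda(\mathbf{v}, \mathbf{w})^{st} / G_V$.
We should note that while the above definition and other constructions in this article are algebraic, there are also more geometric ways of looking at quiver varieties. In particular, the space
$$\mathbf{M}(\mathbf{v}, \mathbf{w}) = \Bigl( \bigoplus_{\rho \in Q_1} \mathrm{Hom}(V_{t(\rho)}, V_{h(\rho)}) \Bigr) \oplus \Bigl( \bigoplus_{i \in I} \mathrm{Hom}(W_i, V_i) \oplus \mathrm{Hom}(V_i, W_i) \Bigr)$$
has a natural hyper-Kähler metric and one can consider a hyper-Kähler quotient by the group $\prod_i U(V_i)$. The variety $\mathfrak{L}(\mathbf{v}, \mathbf{w})$ is a Lagrangian subvariety of (and is homotopic to) this hyper-Kähler quotient. In the case $\mathfrak{g} = \mathfrak{sl}_n$, the varieties involved are closely related to flag varieties.
Let $\mathbf{w}, \mathbf{v}, \mathbf{v}', \mathbf{v}'' \in \mathbb{Z}^I_{\geq 0}$ be such that $\mathbf{v} = \mathbf{v}' + \mathbf{v}''$. Consider the maps
$$\Lambda(\mathbf{v}'', 0) \times \Lambda(\mathbf{v}', \mathbf{w}) \xleftarrow{\ p_1\ } \tilde{F}(\mathbf{v}, \mathbf{w}; \mathbf{v}'') \xrightarrow{\ p_2\ } F(\mathbf{v}, \mathbf{w}; \mathbf{v}'') \xrightarrow{\ p_3\ } \Lambda(\mathbf{v}, \mathbf{w}) \qquad [2]$$
where the notation is as follows. A point of $F(\mathbf{v}, \mathbf{w}; \mathbf{v}'')$ is a point $(x, t) \in \Lambda(\mathbf{v}, \mathbf{w})$ together with an $I$-graded, $x$-stable subspace $S$ of $V$ such that $\dim S = \mathbf{v}' = \mathbf{v} - \mathbf{v}''$. A point of $\tilde{F}(\mathbf{v}, \mathbf{w}; \mathbf{v}'')$ is a point $(x, t, S)$ of $F(\mathbf{v}, \mathbf{w}; \mathbf{v}'')$ together with a collection of isomorphisms $R'_i : V'_i \cong S_i$ and $R''_i : V''_i \cong V_i / S_i$ for each $i \in I$. Then we define $p_2(x, t, S, R', R'') = (x, t, S)$, $p_3(x, t, S) = (x, t)$ and $p_1(x, t, S, R', R'') = (x'', x', t')$ where $x'', x', t'$ are determined by
$$R'_{h(\rho)}\, x'_\rho = x_\rho\, R'_{t(\rho)} : V'_{t(\rho)} \to S_{h(\rho)}$$
$$t'_i = t_i\, R'_i : V'_i \to W_i$$
$$R''_{h(\rho)}\, x''_\rho = x_\rho\, R''_{t(\rho)} : V''_{t(\rho)} \to V_{h(\rho)} / S_{h(\rho)}$$
It follows that $x'$ and $x''$ are nilpotent.
Lemma 1 One has
$$(p_3 \circ p_2)^{-1}(\Lambda(\mathbf{v}, \mathbf{w})^{st}) \subset p_1^{-1}(\Lambda(\mathbf{v}'', 0) \times \Lambda(\mathbf{v}', \mathbf{w})^{st})$$
Thus, we can restrict [2] to $\Lambda^{st}$, forget the $\Lambda(\mathbf{v}'', 0)$-factor and consider the quotient by $G_V$ and $G_{V'}$. This yields the diagram
$$\mathfrak{L}(\mathbf{v}', \mathbf{w}) \xleftarrow{\ \pi_1\ } \mathfrak{F}(\mathbf{v}, \mathbf{w}; \mathbf{v} - \mathbf{v}') \xrightarrow{\ \pi_2\ } \mathfrak{L}(\mathbf{v}, \mathbf{w}) \qquad [3]$$
where
$$\mathfrak{F}(\mathbf{v}, \mathbf{w}; \mathbf{v} - \mathbf{v}') \overset{\mathrm{def}}{=} \{(x, t, S) \in F(\mathbf{v}, \mathbf{w}; \mathbf{v} - \mathbf{v}') \mid (x, t) \in \Lambda(\mathbf{v}, \mathbf{w})^{st}\} / G_V$$
Let $M(\mathfrak{L}(\mathbf{v}, \mathbf{w}))$ be the vector space of all constructible functions on $\mathfrak{L}(\mathbf{v}, \mathbf{w})$. Then define maps
$$h_i : M(\mathfrak{L}(\mathbf{v}, \mathbf{w})) \to M(\mathfrak{L}(\mathbf{v}, \mathbf{w}))$$
$$e_i : M(\mathfrak{L}(\mathbf{v}, \mathbf{w})) \to M(\mathfrak{L}(\mathbf{v} - \mathbf{e}^i, \mathbf{w}))$$
$$f_i : M(\mathfrak{L}(\mathbf{v} - \mathbf{e}^i, \mathbf{w})) \to M(\mathfrak{L}(\mathbf{v}, \mathbf{w}))$$
by
$$h_i f = u_i f, \qquad e_i f = (\pi_1)_! (\pi_2^* f), \qquad f_i g = (\pi_2)_! (\pi_1^* g)$$
Here
$$\mathbf{u} = {}^t(u_0, \dots, u_n) = \mathbf{w} - C\mathbf{v}$$
where $C$ is the Cartan matrix of $\mathfrak{g}$ and we are using diagram [3] with $\mathbf{v}' = \mathbf{v} - \mathbf{e}^i$ where $\mathbf{e}^i$ is the vector whose components are given by $\mathbf{e}^i_j = \delta_{ij}$.
Now let $\varphi$ be the constant function on $\mathfrak{L}(0, \mathbf{w})$ with value 1. Let $\mathcal{L}(\mathbf{w})$ be the vector space of functions generated by acting on $\varphi$ with all possible combinations of the operators $f_i$. Then let $\mathcal{L}(\mathbf{v}, \mathbf{w}) = M(\mathfrak{L}(\mathbf{v}, \mathbf{w})) \cap \mathcal{L}(\mathbf{w})$.
Proposition 3 The operators $e_i$, $f_i$, $h_i$ on $\mathcal{L}(\mathbf{w})$ provide it with the structure of the irreducible highest-weight integrable representation of $\mathfrak{g}$ with highest weight $\sum_{i \in Q_0} w_i \omega_i$. Each summand of the decomposition $\mathcal{L}(\mathbf{w}) = \bigoplus_{\mathbf{v}} \mathcal{L}(\mathbf{v}, \mathbf{w})$ is a weight space with weight $\sum_{i \in Q_0} w_i \omega_i - \sum_{i \in Q_0} v_i \alpha_i$. Here the $\omega_i$ and $\alpha_i$ are the fundamental weights and simple roots of $\mathfrak{g}$, respectively.
Let $Z \in \mathrm{Irr}\,\mathfrak{L}(\mathbf{v}, \mathbf{w})$ and define a linear map $T_Z : \mathcal{L}(\mathbf{v}, \mathbf{w}) \to \mathbb{C}$ that associates to a constructible function $f \in \mathcal{L}(\mathbf{v}, \mathbf{w})$ the (constant) value of $f$ on a suitable open dense subset of $Z$. The fact that $\mathcal{L}(\mathbf{v}, \mathbf{w})$ is finite dimensional allows us to take such an open set on which any $f \in \mathcal{L}(\mathbf{v}, \mathbf{w})$ is constant. So we have a linear map
$$\Phi : \mathcal{L}(\mathbf{v}, \mathbf{w}) \to \mathbb{C}^{\mathrm{Irr}\,\mathfrak{L}(\mathbf{v}, \mathbf{w})}$$
Then we have the following proposition.
Proposition 4 The map $\Phi$ is an isomorphism; for any $Z \in \mathrm{Irr}\,\mathfrak{L}(\mathbf{v}, \mathbf{w})$, there is a unique function $g_Z \in \mathcal{L}(\mathbf{v}, \mathbf{w})$ such that for some open dense subset $O$ of $Z$ we have $g_Z|_O = 1$ and for some closed $G_V$-invariant subset $K \subset \mathfrak{L}(\mathbf{v}, \mathbf{w})$ of dimension $< \dim \mathfrak{L}(\mathbf{v}, \mathbf{w})$ we have $g_Z = 0$ outside $Z \cup K$. The functions $g_Z$ for $Z \in \mathrm{Irr}\,\mathfrak{L}(\mathbf{v}, \mathbf{w})$ form a basis of $\mathcal{L}(\mathbf{v}, \mathbf{w})$.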
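The vector $\mathbf{u} = \mathbf{w} - C\mathbf{v}$ records the eigenvalues of the operators $h_i$, and equivalently the coefficients of the weight $\sum_i w_i \omega_i - \sum_i v_i \alpha_i$ in the basis of fundamental weights (since $\alpha_j = \sum_i c_{ij} \omega_i$). A minimal sketch of this bookkeeping follows; the function name and the choice of the type $A_2$ Cartan matrix as an example are assumptions made for illustration.

```python
def weight_coefficients(cartan, v, w):
    """Return u = w - C v, the coefficients of the weight in fundamental weights."""
    n = len(v)
    return [w[i] - sum(cartan[i][j] * v[j] for j in range(n)) for i in range(n)]


# Cartan matrix of type A_2 (the Lie algebra sl_3), used here as an example.
CARTAN_A2 = [[2, -1],
             [-1, 2]]
```

For $\mathbf{w} = (1, 0)$, the choices $\mathbf{v} = (0,0), (1,0), (1,1)$ give $\mathbf{u} = (1,0), (-1,1), (0,-1)$, which are the three weights of the three-dimensional representation of $\mathfrak{sl}_3$ with highest weight $\omega_1$.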
Additional Topics
To conclude, we have given here a brief overview of some topics related to finite-dimensional algebras and quivers. There is much more to be found in the literature. For basics on associative algebras and their representations, the reader may consult introductory texts on abstract algebra such as Lang (2002). For further results (and their proofs) on Ringel–Hall algebras see the papers of Ringel (1990a, b, 1993, 1995, 1996) and of Green (1995) and the references cited therein. The reader interested in species, which extend many of these results to non-simply-laced Lie algebras, should consult Dlab and Ringel (1976). The book by Lusztig (1993) covers the quiver varieties of Lusztig and canonical bases. Canonical bases are closely related to crystal bases and crystal graphs (see Hong and Kang (2002) for an overview of these topics). In fact, the set of irreducible components of the quiver varieties of Lusztig and Nakajima can be endowed with the structure of a crystal graph in a purely geometric way (see Kashiwara and Saito (1997) and Saito (2002)). Many results on Nakajima’s quiver varieties can be found in the original papers (Nakajima 1994, 1998). The overview article (Nakajima 1996) is also useful.
Quiver varieties can also be used to give geometric realizations of tensor products of representations (see Malkin (2002, 2003), Nakajima (2001b), and Savage (2003)) and finite-dimensional representations of quantum affine Lie algebras (see Nakajima (2001a)). These are just a few of the many applications of quiver varieties. Much more can be found in the literature.
Acknowledgments
This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.
See also: Instantons: Topological Aspects.
Further Reading
Dlab V and Ringel CM (1976) Indecomposable representations of graphs and algebras. Memoirs of the American Mathematical Society 6(173): v+57.
Green JA (1995) Hall algebras, hereditary algebras and quantum groups. Inventiones Mathematicae 120(2): 361–377.
Hong J and Kang S-J (2002) Introduction to Quantum Groups and Crystal Bases, Graduate Studies in Mathematics, vol. 42. Providence, RI: American Mathematical Society.
Kashiwara M and Saito Y (1997) Geometric construction of crystal bases. Duke Mathematical Journal 89(1): 9–36.
Lang S (2002) Algebra, Graduate Texts in Mathematics, vol. 211. New York: Springer.
Lusztig G (1993) Introduction to Quantum Groups, Progress in Mathematics, vol. 110. Boston, MA: Birkhäuser.
Malkin A (2002) Tensor product varieties and crystals: GL case. Transactions of the American Mathematical Society 354(2): 675–704 (electronic).
Malkin A (2003) Tensor product varieties and crystals: the ADE case. Duke Mathematical Journal 116(3): 477–524.
Nakajima H (1994) Instantons on ALE spaces, quiver varieties, and Kac–Moody algebras. Duke Mathematical Journal 76(2): 365–416.
Nakajima H (1996) Varieties associated with quivers. In: Bautista R, Martinez-Villa R, and de la Peña JA (eds.) Representation Theory of Algebras and Related Topics (Mexico City, 1994), CMS Conf. Proc., vol. 19, pp. 139–157. Providence, RI: American Mathematical Society.
Nakajima H (1998) Quiver varieties and Kac–Moody algebras. Duke Mathematical Journal 91(3): 515–560.
Nakajima H (2001a) Quiver varieties and finite-dimensional representations of quantum affine algebras. Journal of the American Mathematical Society 14(1): 145–238 (electronic).
Nakajima H (2001b) Quiver varieties and tensor products. Inventiones Mathematicae 146(2): 399–449.
Ringel CM (1990a) Hall algebras. In: Balcerzyk S, Józefiak T, Krempa J, Simson D, and Vogel W (eds.) Topics in Algebra, Part 1 (Warsaw, 1988), Banach Center Publ., vol. 26, pp. 433–447. Warsaw: PWN.
Ringel CM (1990b) Hall algebras and quantum groups. Inventiones Mathematicae 101(3): 583–591.
Ringel CM (1993) Hall algebras revisited. In: Joseph A and Shnider S (eds.) Quantum Deformations of Algebras and Their Representations (Ramat-Gan, 1991/1992; Rehovot, 1991/1992), Israel Math. Conf. Proc., vol. 7, pp. 171–176. Ramat Gan: Bar-Ilan Univ.
Ringel CM (1995) The Hall algebra approach to quantum groups. In: Gomez-Mont X, Montejano L, de la Peña JA, and Seade J (eds.) XI Latin American School of Mathematics (Spanish) (Mexico City, 1993), Aportaciones Mat. Comun., vol. 15, pp. 85–114. México: Soc. Mat. Mexicana.
Ringel CM (1996) Green’s theorem on Hall algebras. In: Bautista R, Martinez-Villa R, and de la Peña JA (eds.) Representation Theory of Algebras and Related Topics (Mexico City, 1994), CMS Conf. Proc., vol. 19, pp. 185–245. Providence, RI: American Mathematical Society.
Saito Y (2002) Crystal bases and quiver varieties. Mathematische Annalen 324(4): 675–688.
Savage A (2003) The tensor product of representations of $U_q(\mathfrak{sl}_2)$ via quivers. Advances in Mathematics 177(2): 297–340.
Finite Group Symmetry Breaking
G Gaeta, Università di Milano, Milan, Italy
© 2006 Elsevier Ltd. All rights reserved.
Introduction It is a commonplace situation that symmetric laws of Nature give rise to physical states which are not symmetric. States related by symmetry operations are equivalent, but still nature selects one of them. As an example, consider a ferromagnetic system of interacting spins with no external magnetic field. The ‘‘up’’ and ‘‘down’’ states are equivalent, but one of the two is chosen: the interaction makes states with agreeing spin orientation (and therefore macroscopic magnetization) energetically preferred, and fluctuations will decide which state is actually chosen by a given sample. Finite group symmetry is also commonplace in physics, in particular through crystallographic groups occurring in condensed matter physics – but also through the inversions (C, P, T and their combinations) occurring in high-energy physics and field theory. The breaking of finite group symmetry has thus been thoroughly studied, and general approaches exist to investigate it in mathematically precise terms with physical counterparts. In particular, a widely applicable approach is provided by the Landau theory of phase transitions – whose mathematical counterpart resides in the realm of equivariant singularity and bifurcation theory. In Landau theory, the state of a system is described by a finite-dimensional variable (the ‘‘order parameter’’), and physical states correspond to minima of a potential, invariant under a group. In this article we describe the basics of symmetry breaking analysis for systems described by a symmetric polynomial; in particular, we
discuss generic symmetry breakings, that is, those determined by the symmetry properties themselves and independent of the details of the polynomial describing a concrete system. We also discuss how the plethora of invariant polynomials can be to some extent reduced by means of changes of coordinates, that is, how one can reduce to consider certain types of polynomials with no loss of generality. Finally, we will give some indications on extension of this theory, that is, on how one deals with symmetry breakings for more general groups and/or more general physical systems.
Basic Notions
Finite Groups
A finite group $(G, \circ)$ is a finite set $G$ of elements $\{g_0, \dots, g_N\}$ equipped with a composition law $\circ$, and such that the following conditions hold:
1. for all $g, h \in G$ the composition $g \circ h$ belongs to $G$, that is, $g \circ h \in G$;
2. the composition is associative, that is, $(g \circ h) \circ k = g \circ (h \circ k)$ for all $g, h, k \in G$;
3. there is an element in $G$ – which we will denote as $e$ – which is the identity for the action of $\circ$ on $G$, that is, $e \circ g = g = g \circ e$ for all $g \in G$; and
4. for each $g \in G$ there is an element $g^{-1}$ which is the inverse of $g$, that is, $g^{-1} \circ g = e = g \circ g^{-1}$.
In the following, we omit the symbol $\circ$, that is, we write $gh$ to mean $g \circ h$. Similarly, we usually write simply $G$ for the group, rather than $(G, \circ)$.
A subset $H \subseteq G$ is a subgroup of $(G, \circ)$ if $(H, \circ)$ satisfies the group axioms (1)–(4) above. Note that this implies that $e \in H$ whenever $H$ is a subgroup, and that $\{e\}$ is a subgroup. Subgroups coinciding neither with the whole $G$ nor with $\{e\}$ are said to be “proper.”
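For a small finite set, the four axioms and the subgroup condition can be checked by brute force. The sketch below is only an illustration, with hypothetical function names; it takes the set as an iterable and the composition law as a two-argument function.

```python
from itertools import product


def is_group(elements, op):
    """Brute-force check of closure, associativity, identity and inverses."""
    elements = list(elements)
    # Axiom 1: closure.
    if not all(op(g, h) in elements for g, h in product(elements, repeat=2)):
        return False
    # Axiom 2: associativity.
    if not all(op(op(g, h), k) == op(g, op(h, k))
               for g, h, k in product(elements, repeat=3)):
        return False
    # Axiom 3: a two-sided identity.
    identities = [e for e in elements
                  if all(op(e, g) == g == op(g, e) for g in elements)]
    if not identities:
        return False
    e = identities[0]
    # Axiom 4: every element has a two-sided inverse.
    return all(any(op(g, h) == e == op(h, g) for h in elements)
               for g in elements)


def is_subgroup(subset, elements, op):
    """A subset is a subgroup if it satisfies the group axioms on its own."""
    return set(subset) <= set(elements) and is_group(subset, op)


def add_mod4(g, h):
    """Composition law of the cyclic group Z_4."""
    return (g + h) % 4
```

With this, `is_group(range(4), add_mod4)` holds, `{0, 2}` is a proper subgroup, and `{0, 1}` is not a subgroup because it is not closed.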
Given two elements $g, h$ we say that $g h g^{-1}$ is the conjugate of $h$ by $g$. The conjugate of a subgroup $H \subseteq G$ by $g \in G$ is the subgroup of elements conjugated to elements of $H$, $gHg^{-1} = \{ghg^{-1},\ h \in H\}$.
Group Action
In physics, one is usually interested in a realization of an abstract group as a group of transformations in some set $X$; in physical applications, this is usually a (possibly, function) space or a manifold, and we refer to elements of $X$ as “points.” That is, there is a map $\rho : G \to \mathrm{End}(X)$ from $G$ to the group of endomorphisms of $X$, such as to preserve the composition law:
$$\rho(g) \circ \rho(h) = \rho(g \circ h) \qquad \forall g, h \in G$$
In this case, we say that we have a “representation” of the abstract group $G$ acting in the “carrier” space or manifold $X$; we also say that $X$ is a G-space or G-manifold. We often denote by the same letter the abstract element and its representation, that is, write simply $g$ for $\rho(g)$ and $G$ for $\rho(G)$. (In many physically relevant cases, but not necessarily, $X$ has a linear structure and we consider linear endomorphisms. In this case, we sometimes write $T_g$ for the linear operator representing $g$.)
If $x \in X$ is a point in $X$, the G-orbit $G(x)$ is the set of points to which $x$ is mapped under $G$, that is,
$$G(x) = \{y \in X : y = gx,\ g \in G\} \subseteq X$$
Belonging to the same orbit is obviously an equivalence relation, and partitions $X$ into equivalence classes. The “orbit space” for the $G$ action on $X$, also denoted as $\Omega = X/G$, is the set of these equivalence classes. It corresponds, in physical terms, to considering $X$ modulo identification of elements related by the group action.
For any point $x \in X$, the “isotropy (sub)group” $G_x$ is the set of elements leaving $x$ fixed,
$$G_x = \{g \in G : gx = x\} \subseteq G$$
Points on the same G-orbit have conjugated isotropy subgroups: indeed, $y = gx$ implies immediately that $G_y = g G_x g^{-1}$.
When a topology is defined on $X$, the question arises whether the G-action preserves it; if this is the case, we say that the G-action is “regular.” In the case of a compact Lie group (and a fortiori for a finite group) we are guaranteed the action is regular. (A physically relevant example of nonregular action is provided by the irrational flow on a torus. In this case $G = \mathbb{R}$, realized as the time $t$ irrational flow on the torus $X = T^k$.)
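Orbits and isotropy subgroups can be computed directly for a small example. The following sketch, which is an illustration rather than anything taken from the article, lets the symmetric group on three letters act on integer triples by permuting coordinates, and checks the conjugacy relation between isotropy subgroups of points on the same orbit. All names are hypothetical.

```python
from itertools import permutations

# S_3 encoded as tuples g, where g[i] is the image of the index i.
GROUP = list(permutations(range(3)))


def act(g, x):
    """Left action on triples: the entry at position i moves to position g[i]."""
    y = [None] * len(x)
    for i, xi in enumerate(x):
        y[g[i]] = xi
    return tuple(y)


def compose(g, h):
    """Group law: (g h)(i) = g(h(i)), so that act(compose(g, h), x) == act(g, act(h, x))."""
    return tuple(g[h[i]] for i in range(len(h)))


def inverse(g):
    """Inverse permutation."""
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)


def orbit(x):
    """G(x) = {g x : g in G}."""
    return {act(g, x) for g in GROUP}


def isotropy(x):
    """G_x = {g in G : g x = x}."""
    return {g for g in GROUP if act(g, x) == x}
```

For the point $(1, 1, 2)$ the orbit has three elements and the isotropy subgroup has two, and for each $y = gx$ one finds that the isotropy subgroup of $y$ equals the conjugate $g G_x g^{-1}$.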
Spontaneous Symmetry Breaking
Let us now consider the case of physical systems whose state is described by a point $x$ in the G-space or G-manifold $X$, with $G$ a group acting by smooth mappings $g : X \to X$. In physical problems, $G$ quite often acts by linear and orthogonal transformations. (If this is not the case, the Palais–Mostow theorem guarantees that, for suitable groups (including in particular the finite ones), we can reduce to this case upon embedding $X$ into a suitably larger carrier space $Y$.) Usually, $G$ represents physical equivalence of states, and G-orbits are collections of physically equivalent states. A point which is G-invariant, that is, such that $G_x = G$, is called “symmetric” for short.
Let $\Phi$ be a scalar function (potential) defined on $X$, $\Phi : X \to \mathbb{R}$, possibly depending on some parameter $\lambda$, such that the physical state corresponds to critical points – usually the (local) minima – of $\Phi$. A concrete example is provided by the case where $\Phi$ is the Gibbs free energy; more generally, this is the framework met in the Landau theory of phase transitions (Landau 1937, Landau and Lifshitz 1958). We are interested in the case where $\Phi$ is invariant under the group action, or briefly G-invariant, that is, where
$$\Phi(gx) = \Phi(x) \qquad \forall x \in X,\ \forall g \in G \qquad [1]$$
A critical point $x$ such that $G_x = G$ is a symmetrical critical point. If $G_x$ is strictly smaller than $G$, then $x$ is a symmetry-breaking critical point. If a physical system corresponds to a nonsymmetric critical point, we have a spontaneous symmetry breaking: although the physical laws (the potential function $\Phi$) are symmetric, the physical state (the critical point for $\Phi$) breaks the symmetry and chooses one of the G-equivalent critical points.
It follows from [1] that the gradient of $\Phi$ is covariant under $G$. If $y = g(x)$, then the differential $(Dg)$ of the map $g : X \to X$ is a linear map between the corresponding tangent spaces, $(Dg) : T_x X \to T_y X$. The covariance amounts, with $\eta$ the Riemannian metric in $X$, to $(\eta^{ij} \partial_j \Phi)(gx) = [(Dg)^i{}_k\, \eta^{km}\, \partial_m \Phi](x)$; this is also written compactly, with obvious notation, as
$$(\nabla \Phi)(gx) = (Dg)\,[(\nabla \Phi)(x)] \qquad [2]$$
(in the case of euclidean spaces ($\eta = \delta$) and linear actions described by matrices $T_g$, the covariance condition reduces to $(\nabla \Phi)^i(T_g x) = (T_g)^i{}_j\, [(\nabla \Phi)^j(x)]$).
As $(Dg)$ is a linear map, $(\nabla \Phi)(x) = 0$ implies the vanishing of $\nabla \Phi$ at all points on the G-orbit of $x$. We conclude that critical points of a G-invariant potential come in G-orbits: if $x$ is a critical point for $\Phi$, then each $y \in G(x)$ is also a critical point for $\Phi$. We speak therefore of critical orbits for $\Phi$. It is thus possible (thanks to the regularity of the G-action), and actually convenient, to study spontaneous symmetry breaking in the orbit space $\Omega = X/G$ rather than in the carrier manifold $X$ (Michel 1971).
If $G$ describes physical equivalence, physical states whose symmetries are G-conjugated should be seen as physically equivalent. An equivalence class of isotropy types under conjugation will be said to be a symmetry type. We are thus interested, given a G-invariant polynomial $\Phi$, to know the symmetry types of its critical points. We denote symmetry types as $[H] = \{gHg^{-1}\}$, and say that $[H] < [K]$ if a group conjugated to $H$ is strictly contained in a group conjugated to $K$.
As we have seen, points on the same G-orbit have the same symmetry type. On the other hand, points on different G-orbits can have the same isotropy type (e.g., for the standard action of $O(n)$ in $\mathbb{R}^n$, all collinear nonzero points will have the same isotropy subgroup but will lie on distinct group orbits).
G-Invariant Polynomials
Consider a finite group $G$ acting in $X$. (Many of the notions and results mentioned in this section have a much wider range of applicability.) We look at the ring of G-invariant scalar polynomials in $x_1, \dots, x_n$. By the Hilbert basis theorem, there is a set $\{J_1(x), \dots, J_k(x)\}$ of G-invariant homogeneous polynomials of degrees $\{d_1, \dots, d_k\}$ such that any G-invariant polynomial $\Phi(x)$ can be written as a polynomial in the $\{J_1, \dots, J_k\}$, that is,
$$\Phi(x) = \Psi[J_1(x), \dots, J_k(x)] \qquad [3]$$
with $\Psi$ a polynomial. (A similar theorem holds for smooth functions.)
The algebra of G-invariant polynomials is finitely generated, that is, we can choose $k$ finite. When the $J_a$ are chosen so that none of them can be written as a polynomial of the others and $k$ has the smallest possible value (this value depends on $G$), we say that they are a minimal integrity basis (MIB). (Note that some of the $J_a$ could be written as nonpolynomial functions of the others, and the $J$ could satisfy polynomial relations. For example, consider the group $\mathbb{Z}_2$ acting in $\mathbb{R}^2$ via $g : (x, y) \to (-x, -y)$; an MIB is made of $J_1(x, y) = x^2$, $J_2(x, y) = y^2$, and $J_3(x, y) = xy$. None of these can be written as a polynomial function of the others, but $J_1 J_2 = J_3^2$.) In this case, we say that the $\{J_a\}$ are a set of basic invariants for $G$. There is obviously some arbitrariness in the choice of the $J_a$ in an MIB, but the degrees $\{d_1, \dots, d_k\}$ of $\{J_1, \dots, J_k\}$ are fixed by $G$. (In mathematical terms, they are determined through the Poincaré series of the graded algebra $P_G$ of G-invariant polynomials.)
We will henceforth assume that we have chosen an MIB, with elements $\{J_1, \dots, J_k\}$ of degrees $\{d_1, \dots, d_k\}$ in $x$, say with $d_1 \leq d_2 \leq \dots \leq d_k$.
When the elements of an MIB for $G$ are algebraically independent, we say that the MIB is regular; if $G$ admits a regular MIB we say that $G$ is coregular.
An algebraic relation between elements $J$ of the MIB is said to be a relation of the first kind. The algebraic relations among the $J$ are a set of polynomials in $\{J_1, \dots, J_k\}$, which are identically zero when seen as polynomials in $x$. If there are algebraic relations among these, they are called relations of the second kind, and so on. A theorem by Hilbert guarantees that the chain of relations has finite maximal length. (This is the homological dimension of the graded algebra $P_G$ mentioned above.)
In the following, we will consider a matrix built with the gradients of basic invariants, the $\mathcal{P}$-matrix (Sartori). This is defined as
$$\mathcal{P}_{ih}(x) := \langle \nabla J_i(x), \nabla J_h(x) \rangle \qquad [4]$$
with $\langle \cdot , \cdot \rangle$ the scalar product in $TX$. The gradient of an invariant is necessarily a covariant quantity; the scalar product of two covariant quantities is an invariant one, and thus can be expressed again in terms of the basic invariants. Thus, the $\mathcal{P}$-matrix can always be written in terms of the basic invariants themselves.
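For the example of the group of order two acting on the plane by $(x, y) \to (-x, -y)$, with basic invariants $x^2$, $y^2$ and $xy$, the matrix of scalar products of gradients can be computed by hand and compared with its expression in terms of the invariants. The sketch below does this at sample points; the function names are illustrative assumptions, and the closed form in `p_matrix_from_invariants` is worked out here, not quoted from the article.

```python
def invariants(x, y):
    """Basic invariants J1 = x^2, J2 = y^2, J3 = x y."""
    return (x * x, y * y, x * y)


def gradients(x, y):
    """Gradients of J1, J2, J3 with respect to (x, y)."""
    return ((2 * x, 0), (0, 2 * y), (y, x))


def p_matrix(x, y):
    """P_ih(x) = <grad J_i, grad J_h> with the Euclidean scalar product."""
    grads = gradients(x, y)
    return [[sum(a * b for a, b in zip(gi, gh)) for gh in grads] for gi in grads]


def p_matrix_from_invariants(j1, j2, j3):
    """The same matrix written purely in terms of the basic invariants."""
    return [[4 * j1, 0,      2 * j3],
            [0,      4 * j2, 2 * j3],
            [2 * j3, 2 * j3, j1 + j2]]
```

The two functions agree at every point, which illustrates the statement that the matrix can be expressed through the basic invariants; the relation $J_1 J_2 = J_3^2$ holds identically as well.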
Geometry of Group Action
The use of an MIB allows one to introduce a map $J : x \to \{J_1(x), \dots, J_k(x)\}$ from $X$ to a subset $P$ of $\mathbb{R}^k$. If the MIB is regular, $P = \mathbb{R}^k$, while if the $J_i$ satisfy some relation then $P \subset \mathbb{R}^k$ is the submanifold satisfying the corresponding relations.
The manifold $P$ is isomorphic to the orbit space $\Omega = X/G$ (the isomorphism being realized by the $J$ map) and provides a more convenient framework to study $\Omega$.
As mentioned above, in physical terms we are mainly interested in the orbit space up to equivalence of symmetry type. The set of points in $X$ (of orbits in $\Omega$) with the same symmetry type will be called a G-stratum in $X$ (a G-stratum in $\Omega$); the G-stratum of the point $x$ will be denoted as $\sigma(x) \subset X$ (the G-stratum of the orbit $\omega$ as $\Sigma(\omega) \subset \Omega$). (The notion of stratum was introduced by Whitney in topology; a stratified manifold is a set which can be decomposed as the disjoint union of smooth manifolds of different dimensions, the topological (or Whitney) strata: $M = \bigcup M_k$, with $M_k \subseteq \partial M_j$ for all $k < j$.)
It results that the G-stratification is compatible with the topological stratification. Indeed, $P$ is a semialgebraic (i.e., it is defined by algebraic equalities and inequalities) stratified manifold in $\mathbb{R}^k$; the image of any G-stratum in $\Omega$ belongs to a single topological stratum in $P$, and topological strata in $P$ are the union of images of G-strata in $\Omega$.
Moreover, the subgroup relations correspond to bordering relations between G-strata: if $[G_x] < [G_y]$, then $\sigma(y) \subset \partial \sigma(x)$ and (with $\omega_x$ the orbit of $x$) $\Sigma(\omega_y) \subset \partial \Sigma(\omega_x)$. There is a stratum, called the principal stratum $\sigma_0$, which corresponds to minimal isotropy, open and dense in $X$; similarly, the principal stratum $\Sigma_0$ is open and dense in $\Omega$.
Landau Polynomial
In the Landau (1937) theory of phase transitions, the state of the system under study is described by a G-invariant polynomial $\Phi : X \to \mathbb{R}$ having a critical point in the origin, with at least some of its coefficients – in particular those controlling the stability of the zero critical point – depending on external control parameters (usually, $X = \mathbb{R}^n$ and $G \subseteq O(n)$; in particular, in solid-state physics $G$ is a crystallographic group). This should be chosen as the most general G-invariant polynomial of the lowest degree $\ell$ sufficient to ensure thermodynamic stability; in mathematical terms, this amounts to the requirement that there is some open set $B$ containing the origin and such that – for all values of the control parameters – $-\nabla \Phi$ points inwards at all points of $\partial B$ (i.e., $B$ is invariant under the gradient flow of $\Phi$). If the polynomials in the MIB are of degree $d_1 \leq d_2 \leq \dots \leq d_k$, then usually $\ell = 2 d_k$.
The G-invariance of $\Phi$ and the results recalled above mean that we can always write it in terms of the polynomials in an MIB for $G$ as in [3], $\Phi(x) = \Psi[J(x)]$.
The discussion of previous sections shows that we can study symmetry breakings for $\Phi : X \to \mathbb{R}$ by studying critical points of $\Psi : P \to \mathbb{R}$; in other words, Landau theory can be worked out in the G-orbit space $\Omega := X/G$. The polynomial $\Psi$ – providing a representation of the Landau polynomial in the orbit space – will also be called Landau–Michel polynomial. (Louis Michel (1923–1999) pioneered the use of orbit space techniques in physics and nonlinear dynamics, originally motivated by the study of hadronic interactions.)
In this way, the evaluation of the map $\Phi : X \to \mathbb{R}$ is, in principle, substituted by evaluation of two maps, $J : X \to P$ and $\Psi : P \to \mathbb{R}$. However, if, as in Landau theory, we have to consider the most general G-invariant polynomial on $X$, we can just consider the most general polynomial on $P$.
Critical Points of the Landau Polynomial and Geometry of Orbit Space The G-invariance has consequences on the critical points of . We have already seen one such consequence: critical points come in G-orbits. However, this is not all. Indeed, G-invariance enforces the presence of a certain set (G) 2 X of critical points, and conversely if we look for points which are critical under any G-invariant potential, these are precisely the points in (G); the critical points on (G) correspond to critical orbits which we call principal critical orbits. The set (G) can be determined on the basis of the geometry of the G-action. (A trivial example is provided by X = R and G = Z2 acting via g : x ! x; any even function has a critical point in zero, and albeit even functions can, and in general will, have nonzero critical points, this is the only critical point common to all the even functions.) Indeed (Michel 1971): an orbit ! is a principal critical orbit if and only if it is isolated in its stratum. For the linear orthogonal group actions in Rn often occurring in physics, no nonzero point or orbit can be isolated in its stratum. However, we can quotient out the radial degeneracy and work on X = Sn1 Rn . In this case, a G-orbit !1 in Sn1 which is isolated in its stratum corresponds to a one-dimensional family {!r } of G-orbits in Rn (call X0 the corresponding submanifold in X); the gradient of at x 2 X0 points along Tx X0 . We can thus reduce to consider the restriction 0 of the potential to X0 . (See also the reduction lemma of Golubitsky and Stewart in this context.) Correspondingly, if P0 P is the submanifold in P image of X0 , that is, P0 = J(X0 ), we can reduce to consider the restriction 0 of to P0 . As these become one-dimensional problems, general results are available. 
In particular, one can provide general conditions ensuring the existence of one-dimensional branches of symmetry-breaking solutions bifurcating from zero along any such X0 or P0 ; this is also known as the equivariant branching lemma of Cicogna and Vanderbauwhede.
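The trivial $\mathbb{Z}_2$ example mentioned above can be checked numerically. The following is a minimal sketch, assuming the single basic invariant $J(x) = x^2$ and three arbitrarily chosen (hypothetical) functions of J; it illustrates that x = 0 is critical for every invariant potential, while nonzero critical points depend on the particular potential.

```python
def derivative_at(phi, x, h=1e-6):
    """Central finite-difference approximation of d(phi)/dx at x."""
    return (phi(x + h) - phi(x - h)) / (2.0 * h)


def make_invariant(psi):
    """Build a Z2-invariant potential Phi(x) = Psi(J(x)) with J(x) = x**2."""
    return lambda x: psi(x * x)


# Three hypothetical choices of Psi(J); none is taken from the text.
potentials = [
    make_invariant(lambda J: J - 0.5 * J**2),   # Phi = x^2 - x^4/2
    make_invariant(lambda J: -3.0 * J + J**3),  # Phi = -3x^2 + x^6
    make_invariant(lambda J: 1.0 + 2.0 * J),    # Phi = 1 + 2x^2
]

# x = 0 is a critical point of every even potential.
slopes_at_zero = [derivative_at(phi, 0.0) for phi in potentials]

# x = 1 is critical for the first potential but not for the third one.
slope_first_at_one = derivative_at(potentials[0], 1.0)
slope_third_at_one = derivative_at(potentials[2], 1.0)

print(slopes_at_zero, slope_first_at_one, slope_third_at_one)
```

The origin is the only point where the derivative vanishes for all three choices, in line with the statement that it is the only critical point common to all even functions.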
Reduction of the Landau Potential
In realistic problems, $\Phi$ quickly becomes extremely complicated, that is, it includes a large number of terms and therefore of coefficients. A thorough study of the different symmetry-breaking patterns, that is, of the symmetry type of the minima of $\Phi$ for different values of these coefficients and of the external control parameter, is in this case a prohibitive task. It is possible to reduce the generality of the Landau polynomial with no loss of generality for the corresponding physical problem. Indeed, a change of coordinates in the X space will produce a formally different – but obviously equivalent – Landau polynomial; it is convenient to use coordinates in which the Landau polynomial is simpler. A systematic and algorithmic reduction procedure – based on perturbative expansion near the origin – is well known in dynamical systems theory (Poincaré–Birkhoff normal forms), and can be adapted to the reduction of Landau polynomials. (An alternative and more general – but also much more demanding – approach is provided by the spectral sequence approach, also originating in normal-form theory.) We work near the origin, so that we can assume $X = \mathbb{R}^n$ (with metric $\delta$), and for simplicity we also take the case where G acts via a linear representation $T_g$. We consider changes of coordinates of the (Poincaré) form
$$x^i = y^i + h^i(y) \qquad [5]$$
generated by a G-invariant function H: $h^i(y) = \delta^{ij} (\partial H(y)/\partial y^j)$; this guarantees that [5] preserves the G-invariance of $\Phi$. The action of [5] on $\Phi$ can be read from its action on the basic invariants $J_a$. The result is
$$J_a(x) = J_a(y) + (\delta J_a)(y), \qquad \delta J_a := \sum_b P_{ab}\, (\partial H/\partial J_b) \qquad [6]$$
Let us now consider the reduction of an invariant polynomial $\Phi(x) = \Psi(J)$. We write $D_\alpha := \partial/\partial J_\alpha$, and understand that summation over repeated indices is implied. In general,
$$\Psi(J) \to \Psi(J + \delta J) = \Psi(J) + \sum_{\alpha=1}^{r} \frac{\partial \Psi(J)}{\partial J_\alpha}\, \delta J_\alpha + \cdots \qquad [7]$$
where the ellipsis means higher-order terms. Disregarding higher-order terms and using [6] and [4], we get
$$\delta\Psi = \frac{\partial \Psi}{\partial J_\alpha}\, P_{\alpha\beta}\, \frac{\partial H}{\partial J_\beta} \equiv (D_\alpha \Psi)\, P_{\alpha\beta}\, (D_\beta H)$$
We expand $\Phi$ as a sum of homogeneous polynomials, and write $\Phi(x) = \sum_{k=0}^{\ell} \Phi_k(x)$, where $\Phi_k(ax) = a^{k+1} \Phi_k(x)$. Also, write $\Psi = \sum_k \Psi_k$, where $\Phi_k(x) := \Psi_k[J(x)]$. It follows that under a change of coordinates [5] generated by $H = H_m$ homogeneous of degree m + 1, the terms $\Psi_k$ with $k \le m$ are not changed, while the terms $\Psi_{m+p}$ change according to
$$\Psi_{m+p} \to \widetilde{\Psi}_{m+p} = \Psi_{m+p} + (D_\alpha \Psi_p)\, P_{\alpha\beta}\, (D_\beta H_m) + \cdots \qquad [8]$$
We can then operate sequentially with $H_m$ of degree 3, 4, ...; at each stage (generator $H_m$), we are not affecting the terms $\Psi_k$ with $k \le m$. Moreover, we can just consider [8], as higher-order terms are generic and will be taken care of in subsequent steps. (This procedure requires determining suitable generating functions $H_m$; these are obtained as solutions to homological equations.) In the above, we disregarded the dependence on the control parameters, such as temperature, pressure, magnetic field, etc.; that is, we implicitly considered fixed values for these. However, they have to change for a phase transition to take place. If we consider a full range of values – including in particular the critical ones – for the control parameters, say $\lambda \in \Lambda$, we should take care that the concerned quantities and operators are nonsingular uniformly in $\Lambda$. This leads to reduction criteria for the Landau and Landau–Michel polynomials (Gufan). Define, for i = 1, ..., k the quantities $U_i(J_1, \ldots, J_k) := (\partial F/\partial J_s) P_{si}$.
Reduction Criterion
For $\Phi(x) = \Psi(J_1, \ldots, J_k) : \mathbb{R}^n \to \mathbb{R}$ a G-invariant potential depending on physical parameters $\lambda \in \Lambda$, there is a sequence of Poincaré changes of coordinates such that $\Phi$ is expressed in the new coordinates y as $\hat{\Phi}(y) = \hat{\Psi}(J)$, where terms which can be written (up to higher-order terms) uniformly in $\Lambda$ as $\sum_{\alpha=1}^{k} Q_\alpha(J_1, \ldots, J_k)\, U_\alpha(J_1, \ldots, J_k)$, with $Q_\alpha$ polynomials in $J_1, \ldots, J_k$ satisfying the compatibility condition $(\partial Q_\alpha/\partial J_\beta) = (\partial Q_\beta/\partial J_\alpha)$, are not present in $\hat{\Psi}$.
Nonstationary and Nonvariational Problems
So far we have considered stationary physical states. In some cases, one is not satisfied with such a description, and wants to study time evolution. A model framework for this is provided by the Ginzburg–Landau equation
$$\dot{x} = f(x) \qquad [9]$$
where $f = -(\nabla \Phi) : X \to TX$ (see above for notation). In this case, G-invariance of $\Phi$ implies equivariance of [9]. More generally, we can consider [9] for an equivariant smooth f (not necessarily a gradient), that is, $f^i(gx) = (Dg)^i{}_j f^j(x)$. In this case, one shows that f(x) is tangent to the stratum $\sigma(x)$ through x,
$$f(x) \in T_x \sigma(x) \qquad [10]$$
so that closures of G-strata are dynamically invariant, and the dynamics can be reduced to them. This is of special interest for the "most singular" strata, that is, those of lower dimension. The reduction lemma and the equivariant branching lemma mentioned above also hold (and were originally formulated) in this context. The relation [10] also implies that one can project the dynamics [9] in X to a smooth dynamics $\dot{p} = F(p)$ in the orbit space; this satisfies $F[J(x)] = (DJ)[f(x)]$. In the gradient case, this (together with initial conditions) embodies the full dynamics in X, while in the generic case one loses all information about motions along group orbits (note that these correspond to phonon modes). An orbit $\omega$ isolated in its stratum is still an orbit of fixed points for any G-equivariant dynamics in X in the gradient case, while in the generic case it corresponds to a fixed point for F and to relative equilibria (dynamical orbits which belong to a single group orbit) in X. In this case, time averages of physical quantities can be G-invariant for nontrivial relative equilibria.
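The dynamical invariance of strata can be illustrated with a small numerical experiment. The sketch below assumes a hypothetical, non-gradient vector field on $\mathbb{R}^2$ that is equivariant under the reflection $g : (x, y) \to (x, -y)$, so that $G = \mathbb{Z}_2$; the stratum of points fixed by g is the line y = 0. Neither the vector field nor the integrator is taken from the text.

```python
def f(x, y):
    """Hypothetical Z2-equivariant vector field (not a gradient)."""
    return (x - x**3 + y**2, y * (x - 1.0 - y**2))


def g(x, y):
    """Reflection generating the Z2 action; its differential is diag(1, -1)."""
    return (x, -y)


def euler_flow(x, y, dt=0.01, steps=2000):
    """Integrate dx/dt = f(x) with the explicit Euler scheme."""
    for _ in range(steps):
        fx, fy = f(x, y)
        x, y = x + dt * fx, y + dt * fy
    return x, y


# Equivariance f(g x) = (Dg) f(x), checked at a few sample points.
samples = [(0.3, 0.7), (-1.2, 0.4), (2.0, -0.5)]
equivariant = all(f(*g(x, y)) == g(*f(x, y)) for (x, y) in samples)

# A trajectory starting on the fixed line y = 0 never leaves it.
x_end, y_end = euler_flow(0.5, 0.0)
print(equivariant, x_end, y_end)
```

Because the second component of the field is odd in y, it vanishes identically on the line y = 0, which is the mechanism behind relation [10] in this toy case.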
Extensions and Physical Applications
We have discussed finite group symmetry breaking and focused on polynomial potentials (which can be thought of as Taylor expansions around critical points). For nonfinite groups, and in particular noncompact ones, the situation can be considerably more complicated.
1. An extension of the theory sketched here is provided by Palais' theory, and in particular by his "symmetric criticality principle," which applies in Hilbert or Banach spaces of sections of a fiber bundle satisfying certain conditions. This is especially relevant in connection with field theory and gauge groups.
2. We focused on the situation discussed in classical physics. Finite group symmetry breaking is of course also relevant in quantum mechanics; this is discussed, for example, in the classical books by Weyl (1931) and Wigner (1959), and in the review by Michel et al. (2004).
3. One speaks of "explicit symmetry breaking" when a nonsymmetric perturbation is introduced in a symmetric problem. In the Hamiltonian case (or in the Lagrangian one for Noether symmetries), Hamiltonian symmetries correspond to conserved quantities, and nonsymmetric perturbations make these become approximate constants of motion.
4. The symmetry of differential equations – as well as symmetric and symmetry-breaking solutions for symmetric equations – can be studied in general mathematical terms (see, e.g., Olver (1986)).
5. Physical applications of the theory discussed here abound in the literature, in particular through the Landau theory of phase transitions. A number of these, together with a deeper discussion of the underlying theory, are given in the monumental review paper by Michel et al. (2004).
See also: Central Manifolds, Normal Forms; Compact Groups and Their Representations; Electroweak Theory; Finite Group Symmetry Breaking; Phase Transitions in Continuous Systems; Quasiperiodic Systems; Symmetry and Symmetry Breaking in Dynamical Systems; Symmetry Breaking in Field Theory.
Further Reading
Abud M and Sartori G (1983) The geometry of spontaneous symmetry breaking. Annals of Physics (NY) 150: 307–372.
Gaeta G (1990) Bifurcation and symmetry breaking. Physics Reports 189: 1–87.
Gaeta G (2004) Lie–Poincaré transformations and a reduction criterion in Landau theory. Annals of Physics (NY) 312: 511–540.
Golubitsky M, Stewart I, and Schaeffer DG (1988) Singularities and Groups in Bifurcation Theory. New York: Springer.
Landau LD (1937) On the theory of phase transitions I & II. Zhurnal Eksperimentalnoi i Teoreticheskoi Fiziki 7: 19 and 627.
Landau LD and Lifshitz EM (1958) Statistical Physics. London: Pergamon.
Michel L (1971) Points critiques des fonctions invariantes sur une G-variété. Comptes Rendus de l'Académie des Sciences de Paris 272: 433–436.
Michel L (1980) Symmetry, defects, and broken symmetry. Reviews of Modern Physics 52: 617–650.
Michel L, Kim JS, Zak J, and Zhilinskii B (2004) Symmetry, invariants, topology. Physics Reports 341: 1–395.
Olver PJ (1986) Applications of Lie Groups to Differential Equations. Berlin: Springer.
Palais RS (1979) The principle of symmetric criticality. Communications in Mathematical Physics 69: 19–30.
Sartori G (2002) Geometric invariant theory in a model-independent analysis of spontaneous symmetry and supersymmetry breaking. Acta Applicandae Mathematicae 70: 183–207.
Toledano JC and Toledano P (1987) The Landau Theory of Phase Transitions. Singapore: World Scientific.
Weyl H (1931) The Theory of Groups and Quantum Mechanics. New York: Dover.
Wigner EP (1959) Group Theory and Its Application to the Quantum Mechanics of Atomic Spectra. New York: Academic Press.
Finite Weyl Systems
D-M Schlingemann, Technical University of Braunschweig, Braunschweig, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Finite Weyl systems have applications in various branches of quantum information theory. They help to tame the growth of complexity for a large class of quantum systems: a key discrepancy between classical and quantum systems is the difference in the growth of complexity as one goes to larger and larger systems. This is encountered when simulating a quantum spin system on a computer, for example, with the aim of determining the ground state of a solid-state model of magnetism. For a model of N classical spins, this involves checking the energy for $2^N$ different configurations, but for a model with quantum spins it requires the solution of an eigenvalue equation in a Hilbert space of dimension $2^N$, which is a vastly more difficult problem for large N. For a three-dimensional lattice with three sites each way (N = 27), this is a problem in about $10^8$ dimensions, and lattice size 4 (N = 64) leads to an utterly intractable $10^{19}$ dimensions. It is therefore highly desirable to find ways of treating at least some aspects of large, complex quantum systems without actually having to write out state vectors component by component. States which are invariant under a suitable discrete abelian symmetry group satisfy this condition. They can be characterized by simple combinatorial data, which do not grow exponentially with the system size N. At the same time, the class of these so-called stabilizer states is sufficiently complex to capture some of the key features needed for computation, especially the quantum correlation (entanglement) between subsystems. They have also been shown to be sufficient to generate large quantum error correcting codes. A further motivation for finite Weyl systems is directly based on constructing quantum error correcting codes from classical coding procedures (see Quantum Error Correction and Fault Tolerance). The "quantization" technique which is used there naturally leads to the structure of finite Weyl systems.
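The dimension counts quoted above follow from simple arithmetic. As a quick check, assuming one spin-1/2 per lattice site on a cubic lattice (the function name below is introduced only for this illustration):

```python
def hilbert_dim(num_spins):
    """Dimension of the Hilbert space of num_spins spin-1/2 particles."""
    return 2 ** num_spins


for side in (3, 4):
    n_sites = side ** 3  # number of sites of a cubic lattice with `side` sites each way
    dim = hilbert_dim(n_sites)
    print(f"side {side}: N = {n_sites}, dimension = {dim} (about {dim:.1e})")
```

A classical search over configurations visits the same number of items, but the quantum problem requires vectors with that many complex components, which is what makes direct diagonalization impractical already for N = 64.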
Finite Weyl systems precisely represent quantum versions of discrete abelian symmetry groups. It is a standard procedure to build the quantum version of a symmetry group by an appropriate central extension, or equivalently, to study all its projective representations: the composition of two symmetry transformations is only preserved up to a phase on the representation Hilbert space. The unitary operators which represent the symmetry transformations are called Weyl operators. The simplest and most prominent example of a finite Weyl system is given by the three Pauli matrices and the identity. These four unitary operators build a projective representation of the symmetry group of binary vectors (0, 0), (0, 1), (1, 0), (1, 1), where the group law is addition modulo two. The null vector (0, 0) corresponds to the identity, the vector (0, 1) is assigned to X, (1, 0) corresponds to Z, and (1, 1) is mapped to iY. It is not difficult to verify that the product of two Pauli operators preserves the addition of binary vectors up to a phase. Discrete Weyl systems are deeply related to symplectic geometry for vector spaces over finite fields. The additive structure of the vector space is the underlying abelian symmetry group. The exchange of two Weyl operators within a product produces a phase that is the exponential of an antisymmetric bilinear form, as explained in the next section. For irreducible Weyl systems, this antisymmetric form must be symplectic because the Weyl operators generate a full matrix algebra. In particular, this requires that the dimension of the underlying vector space is even. The Pauli matrices are also an example of this more special structure: the binary vectors (p, q), p, q = 0, 1, form a two-dimensional vector space over the field with two elements {0, 1}. The commutation relations for Pauli operators imply that the symplectic form can be evaluated for two binary vectors (p, q), (p′, q′) according to pq′ − qp′ mod 2. It is natural to interpret the binary vectors (p, q) as points in a discrete phase space, where the first entry corresponds to the momentum and the second to the position. In view of this, discrete Weyl systems serve as a finite-dimensional analog of the canonical commutation relations. For the generic situation in quantum information theory, an irreducible Weyl system is represented on the Hilbert space describing a system of several single particles. Stabilizer states are left unchanged under the action of a so-called isotropic subgroup which consists of mutually commuting Weyl operators: this kind of invariance is precisely the type of constraint that reduces the complexity of the parametrization of the state. For an efficient description of such states, combinatorial techniques are available, e.g., graph theory.
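The statements about the Pauli matrices can be verified directly. The following sketch uses plain nested lists for 2×2 matrices and the assignment of binary vectors to operators given above; the helper names are introduced only for this illustration.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
iY = [[0, 1], [-1, 0]]  # i times the Pauli matrix Y

# Assignment of binary vectors (p, q) to operators as stated in the text.
W = {(0, 0): I2, (0, 1): X, (1, 0): Z, (1, 1): iY}


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scalar_ratio(a, b):
    """Return c with a == c * b entrywise, or None if no such scalar exists."""
    c = None
    for i in range(2):
        for j in range(2):
            if b[i][j] == 0:
                if a[i][j] != 0:
                    return None
            else:
                r = a[i][j] / b[i][j]
                if c is None:
                    c = r
                elif r != c:
                    return None
    return c


def add(u, v):
    """Addition of binary vectors modulo two."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)


def symplectic(u, v):
    """The form p q' - q p' mod 2 for u = (p, q) and v = (p', q')."""
    (p, q), (pp, qq) = u, v
    return (p * qq - q * pp) % 2


vectors = list(product((0, 1), repeat=2))

# W(u) W(v) equals W(u + v) up to a phase.
phases = {(u, v): scalar_ratio(matmul(W[u], W[v]), W[add(u, v)])
          for u in vectors for v in vectors}

# Exchanging two operators produces the sign (-1)^(p q' - q p').
commutation_ok = all(
    matmul(W[u], W[v])
    == [[(-1) ** symplectic(u, v) * x for x in row] for row in matmul(W[v], W[u])]
    for u in vectors for v in vectors
)
print(phases[((0, 1), (1, 0))], commutation_ok)
```

With this real-valued choice of representatives all phases turn out to be signs; other choices of representatives (for example Y instead of iY) would give phases that are powers of i.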
Operations that preserve the class of stabilizer states (for a particular symmetry group) must be covariant with respect to this symmetry. These operations are called Clifford channels and have far-reaching applications in the theory of quantum error correction. They also allow one to take classical coding procedures and turn them into quantum codes: on the classical level, the encoding operation acts on classical phase space as a linear map (additive code). Up to a choice of phases, this induces a quantum channel that preserves the structure of Weyl systems. These codes are called stabilizer codes and have been investigated by many authors (Calderbank et al. 1997, Cleve and Gottesman 1996, 1997) (see Quantum Error Correction and Fault Tolerance). In particular, the first quantum error correcting codes belong to this class. This article is organized as follows. In the next section, the basic mathematical notions are provided, such as projective representations, Weyl systems, and irreducibility. Moreover, statements on the main structure of Weyl systems are presented. Next, the notion of Weyl covariant channels (Clifford channels) is introduced and their basic properties are stated. In particular, stronger results for the reversible case are given. The relation between symplectic geometry and reversible Clifford operations on finite Weyl systems is explained. Results on the general structure of stabilizer states and stabilizer codes are given in the penultimate section. Finally, the representation of stabilizer codes in terms of graphs is described.
Finite Weyl Systems
A projective representation of a group $\Xi$ assigns to each group element $\xi$ a unitary operator $w(\xi)$ on a Hilbert space H such that the group law is preserved up to a phase, that is, the relation
$$w(\xi_1 + \xi_2) = f(\xi_1, \xi_2)\, w(\xi_1)\, w(\xi_2) \qquad [1]$$
is fulfilled for all $\xi_1, \xi_2 \in \Xi$, with a phase-valued function f on $\Xi \times \Xi$. In the following, we denote a projective representation by a triple (w, f, H). A finite Weyl system is a projective representation of a finite abelian group. The operators $w(\xi)$ are called Weyl operators and the function f is called the factor system. We refer to the work by Zmud (1971, 1972) for an analysis of projective representations for general abelian groups. The Weyl algebra $\mathcal{A}(w, f, H)$ associated with a Weyl system (w, f, H) is the smallest norm-closed subalgebra in the space of bounded operators B(H) which contains all Weyl operators. If the Weyl algebra coincides with the algebra of all bounded operators, then the Weyl system is called irreducible.
This is equivalent to the fact that each operator that commutes with all Weyl operators must be a multiple of the identity. In order to analyze the properties of factor systems systematically, we introduce here a few pieces of the cohomology theory of groups. For each positive integer k = 1, 2, 3, ... we introduce the abelian group $C^k(\Xi)$ of k-cochains, which consists of all phase-valued functions on $\Xi^k$. The product and the inverse of k-cochains are defined pointwise. Factor systems are special 2-cochains. Namely, if we consider a Weyl system (w, f, H), then associativity implies that the so-called 2-cocycle condition,
$$f(\xi_1 + \xi_2, \xi_3)\, f(\xi_2, \xi_3)^{-1} f(\xi_1, \xi_2 + \xi_3)^{-1} f(\xi_1, \xi_2) = 1 \qquad [2]$$
holds. This property can also be expressed by a coboundary map $\delta$, which is a group homomorphism from k-cochains to (k + 1)-cochains. We consider here the action of the coboundary map on a 1-cochain $\varphi$ and a 2-cochain f:
$$(\delta\varphi)(\xi_1, \xi_2) := \varphi(\xi_1 + \xi_2)\, \varphi(\xi_1)^{-1} \varphi(\xi_2)^{-1} \qquad [3]$$
$$(\delta f)(\xi_1, \xi_2, \xi_3) := f(\xi_1 + \xi_2, \xi_3)\, f(\xi_2, \xi_3)^{-1} f(\xi_1, \xi_2 + \xi_3)^{-1} f(\xi_1, \xi_2) \qquad [4]$$
The group of 2-cocycles $Z^2(\Xi)$ consists of all 2-cochains f with $\delta f = 1$, and the group of all 2-coboundaries $B^2(\Xi)$ contains all 2-cochains of the form $f = \delta\varphi$. The 2-fold concatenation of the coboundary map is the trivial homomorphism $\delta \circ \delta = 1$, which implies that each 2-coboundary is a 2-cocycle. The converse is in general not the case and the 2-cohomology group $H^2(\Xi) := Z^2(\Xi)/B^2(\Xi)$ is nontrivial. The analysis of Zmud (1971, 1972) shows that the set of Weyl systems is characterized by elements of the 2-cohomology $H^2(\Xi)$. The multiplication of a Weyl system (w, f, H) by a 1-cochain $\varphi$ yields a new family of Weyl operators $(\varphi w)(\xi) = \varphi(\xi) w(\xi)$. The 2-cocycle f is altered by multiplication with the 2-coboundary $\delta\varphi$, and the new Weyl system is given by $(\varphi w, (\delta\varphi) f, H)$. This kind of transformation does not change the cohomology class of the factor system and the corresponding Weyl algebras coincide: $\mathcal{A}(w, f, H) = \mathcal{A}(\varphi w, (\delta\varphi) f, H)$. Thus, the fundamental properties of a Weyl system only depend on the cohomology class of the factor system. In particular, if the factor system $f = \delta\varphi$ is a 2-coboundary, then we can trivialize the Weyl system $(w, \delta\varphi, H)$ by multiplying with the inverse 1-cochain $\varphi^{-1}$ and we obtain a true unitary representation $(\varphi^{-1} w, 1, H)$. The corresponding Weyl algebra $\mathcal{A}(w, \delta\varphi, H)$ is abelian. The relation between cohomology and Weyl systems
can be made even more precise by the following theorem:
Theorem 1 (Zmud 1971, 1972). Let $\tau$ be the group homomorphism on 2-cochains that exchanges the variables: $(\tau f)(\xi_1, \xi_2) = f(\xi_2, \xi_1)$.
(i) The antisymmetric part $f^{-1}(\tau f)$ of a factor system (2-cocycle) is an antisymmetric bicharacter, that is, a group homomorphism in each argument with the other variable kept fixed.
(ii) Each symmetric 2-cocycle $f = \tau f$ is a 2-coboundary $f = \delta\varphi$.
(iii) The group of antisymmetric bicharacters on $\Xi$ is isomorphic to the 2-cohomology group $H^2(\Xi)$. For each antisymmetric bicharacter $\sigma$ the corresponding 2-cohomology class is uniquely determined by $\sigma = f^{-1}(\tau f)$ for some representative $f \in Z^2(\Xi)$.
Example 2 The following Weyl system describes n quantum digits (in short, qudits). The system's Hilbert space is spanned by orthonormal vectors $|a\rangle = |a_1, a_2, \ldots, a_n\rangle$ which are labeled by vectors a of the additive group $F^n$, where $F = \mathbb{Z}_d$ is the cyclic field of prime order. A projective representation $(w, \beta, \mathbb{C}^{d^n})$ of the additive group $F^{2n}$ is given by
$$w(p, q)|a\rangle := e^{-(2\pi i/d)\, p^t a}\, |a + q\rangle \qquad [5]$$
where $p^t$ is the transposed vector. The factor system $\beta$ assigns to each pair (p, q), (p′, q′) the phase
$$\beta(p, q \,|\, p', q') := e^{(2\pi i/d)\, p^t q'} \qquad [6]$$
The finite vector space $F^{2n}$ is interpreted as a finite phase space with a multiplicative symplectic form $\sigma$. It assigns to a pair of vectors (p, q), (a, b) the phase
$$\sigma(p, q \,|\, a, b) := e^{(2\pi i/d)(p^t b - a^t q)} \qquad [7]$$
The commutation relations for Weyl operators involve the symplectic form:
$$w(p, q)\, w(a, b) = \sigma(a, b \,|\, p, q)\, w(a, b)\, w(p, q) \qquad [8]$$
The $d^{2n}$ Weyl operators w(p, q) are a basis of the algebra of all operators acting on the Hilbert space $\mathbb{C}^{d^n}$, hence $(w, \beta, \mathbb{C}^{d^n})$ is irreducible. In particular, this Weyl system is a nice error basis in the sense of Klappenecker and Roetteler (2002, 2005). Namely, the Weyl operators form a projective representation, on the one hand, and a unitary basis (Werner 2001) on the other. For d = 2 and n = 4, we obtain a system of four qubits and the Weyl operators are tensor products of four Pauli matrices including the identity. For instance, the Weyl operator of the binary vector (p, q) = (0011, 1010) can be expressed in terms of Pauli matrices (see Introduction) as follows:
$$w(0011, 1010) = w(0, 1) \otimes 1 \otimes w(1, 1) \otimes w(1, 0) = i\, X \otimes 1 \otimes Y \otimes Z \qquad [9]$$
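The relations of Example 2 can be checked exhaustively for a small case. The sketch below assumes the convention $w(p, q)|a\rangle = e^{-(2\pi i/d) p^t a}|a + q\rangle$ and represents each Weyl operator as a monomial map (basis label to coefficient and image label) instead of a full matrix; the parameters d = 3, n = 1 and all helper names are choices made only for this illustration. Sign conventions for the phases differ between references, so the checks apply to the convention coded here.

```python
import cmath
from itertools import product

d, n = 3, 1  # prime qudit dimension and number of qudits (kept small on purpose)
omega = cmath.exp(2j * cmath.pi / d)
basis = list(product(range(d), repeat=n))
phase_space = [(p, q) for p in basis for q in basis]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v)) % d


def add(u, v):
    return tuple((x + y) % d for x, y in zip(u, v))


def plus(xi, eta):
    """Addition of phase-space vectors xi = (p, q), eta = (p', q')."""
    return (add(xi[0], eta[0]), add(xi[1], eta[1]))


def weyl(xi):
    """Monomial form of w(p, q): label a -> (coefficient, label a + q)."""
    p, q = xi
    return {a: (omega ** (-dot(p, a)), add(a, q)) for a in basis}


def compose(A, B):
    """Operator product A B (B acts first)."""
    out = {}
    for a in basis:
        cb, b = B[a]
        ca, c = A[b]
        out[a] = (ca * cb, c)
    return out


def ratio(A, B):
    """Return the scalar c with A = c B, or None if A is not a multiple of B."""
    c = None
    for a in basis:
        (ca, ta), (cb, tb) = A[a], B[a]
        if ta != tb:
            return None
        r = ca / cb
        if c is None:
            c = r
        elif abs(r - c) > 1e-9:
            return None
    return c


def beta(xi, eta):
    """Factor system as in [6]."""
    return omega ** dot(xi[0], eta[1])


def sigma(xi, eta):
    """Multiplicative symplectic form as in [7]."""
    return omega ** ((dot(xi[0], eta[1]) - dot(eta[0], xi[1])) % d)


def close(z, w):
    return abs(z - w) < 1e-9


# Relation [1] with f = beta.
factor_ok = all(
    close(ratio(weyl(plus(xi, eta)), compose(weyl(xi), weyl(eta))), beta(xi, eta))
    for xi in phase_space for eta in phase_space)

# 2-cocycle condition [2] for beta.
cocycle_ok = all(
    close(beta(plus(x1, x2), x3) * beta(x1, x2), beta(x1, plus(x2, x3)) * beta(x2, x3))
    for x1 in phase_space for x2 in phase_space for x3 in phase_space)

# Commutation relation [8].
commutation_ok = all(
    close(ratio(compose(weyl(xi), weyl(eta)), compose(weyl(eta), weyl(xi))), sigma(eta, xi))
    for xi in phase_space for eta in phase_space)

print(factor_ok, cocycle_ok, commutation_ok)
```

The monomial representation avoids building $d^n \times d^n$ matrices, which keeps the check cheap; increasing n mainly affects the triple loop of the cocycle test.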
Clifford Channels
Weyl systems can be seen as quantized symmetries corresponding to finite abelian groups. In the Heisenberg picture the symmetry transformations act on operators $A \in B(H)$ of the observable algebra by automorphisms (reversible quantum channels):
$$\mathrm{Ad}[w(\xi)](A) := w(\xi)\, A\, w(\xi)^* \qquad [10]$$
Since a projective representation preserves the group law up to a phase, the corresponding automorphisms preserve the group law:
$$\mathrm{Ad}[w(\xi)] \circ \mathrm{Ad}[w(\eta)] = \mathrm{Ad}[w(\xi + \eta)] \qquad [11]$$
A quantum channel T is called a Clifford channel if it is covariant with respect to Weyl systems $(w_1, f_1, H_1)$ and $(w_2, f_2, H_2)$, that is, the intertwiner relation
$$T \circ \mathrm{Ad}[w_2(\xi)] = \mathrm{Ad}[w_1(\xi)] \circ T \qquad [12]$$
holds. It is required that the antisymmetric parts of the factor systems $f_1$ and $f_2$ coincide, that is, $\sigma = f_1^{-1}(\tau f_1) = f_2^{-1}(\tau f_2)$. We call $(w_1, f_1, H_1)$ the input and $(w_2, f_2, H_2)$ the output system. We refer to the article by Scutaru (1979), which is concerned with the general properties of covariant channels. It is a natural question to ask how Clifford channels act on Weyl operators. As shown by Holevo (n.d.), a Clifford channel maps Weyl operators of the output system to multiples of Weyl operators of the input system, provided the input system is irreducible.
Theorem 3 (Holevo (n.d.)). Let T be a Clifford channel such that the input system $(w_1, f_1, H_1)$ is irreducible. Then there exists a function $\varphi : \Xi \to \mathbb{C}$ such that
$$T(w_2(\xi)) = \varphi(\xi)\, w_1(\xi) \qquad [13]$$
holds for all $\xi \in \Xi$. The function $\varphi$ is of positive type, that is, for all complex functions f on $\Xi$ the inequality
$$0 \le \sum_{\xi, \eta \in \Xi} \varphi(\xi - \eta)\, \overline{f(\xi)}\, f(\eta) \qquad [14]$$
holds. Conversely, if the factor systems $f_1 = f_2$ coincide, then a well-defined channel is determined
by [13] for any function $\varphi$ of positive type with $\varphi(0) = 1$. We apply Theorem 3 to a reversible Clifford channel T. Each output Weyl operator $w_2(\xi)$ is mapped to a multiple of an input Weyl operator
$$T(w_2(\xi)) = \varphi(\xi)\, w_1(\xi) \qquad [15]$$
where $\varphi$ is phase-valued (a 1-cochain) owing to the reversibility of T. We focus now on the converse problem: construct all reversible Clifford channels for irreducible Weyl systems that have a common antisymmetric part of the factor system. The following theorem gives a useful characterization of reversible Clifford channels.
Theorem 4 (Schlingemann and Werner 2001). If $(w_1, f_1, H_1)$ and $(w_2, f_2, H_2)$ are irreducible Weyl systems with $f_1^{-1}(\tau f_1) = f_2^{-1}(\tau f_2)$, then there exists a 1-cochain $\varphi$ with coboundary $\delta\varphi = f_1^{-1} f_2$, and a reversible Clifford channel $T_\varphi$ is determined by
$$T_\varphi(w_2(\xi)) = \varphi(\xi)\, w_1(\xi) \qquad [16]$$
If $\psi$ is a 1-cochain that also satisfies $\delta\psi = f_1^{-1} f_2$, then there exists $\eta \in \Xi$ such that
$$\psi(\xi) = \sigma(\eta \,|\, \xi)\, \varphi(\xi) \qquad [17]$$
$$T_\psi = \mathrm{Ad}[w_1(\eta)] \circ T_\varphi = T_\varphi \circ \mathrm{Ad}[w_2(\eta)] \qquad [18]$$
holds. In other words, two irreducible Weyl systems determine a reversible Clifford channel up to a "phase space translation $\eta$."
We consider the Weyl system (w, f, H) over a discrete phase space $F^{2n}$, where F is a finite field of prime order. The group of symplectic transformations Sp(n, F) consists of all F-linear maps s on the phase space $F^{2n}$ that preserve the symplectic form $\sigma = f^{-1}(\tau f)$. A further Weyl system $(w \circ s, f \circ s, H)$ is obtained for each symplectic transformation s. Here the factor system $f \circ s$ is defined according to $(f \circ s)(\xi, \eta) := f(s\xi, s\eta)$ and the corresponding Weyl operators are $(w \circ s)(\xi) = w(s\xi)$. Obviously, the antisymmetric part of the factor system $f \circ s$ is the symplectic form $\sigma \circ s = \sigma$. The following statement is a direct consequence of Theorem 4.
Corollary 5 For each symplectic transformation $s \in \mathrm{Sp}(n, F)$ there exists a 1-cochain $\varphi$ with coboundary $\delta\varphi = f^{-1}(f \circ s)$ and the corresponding reversible Clifford channel $T_{[\varphi, s]}$ is given by
$$T_{[\varphi, s]}(w(\xi)) = \varphi(\xi)\, w(s\xi) \qquad [19]$$
with $\xi \in F^{2n}$.
Example 6 We consider a finite field F. To a symmetric matrix $\Gamma \in M_n(F)$ we associate the symplectic transformation on $F^{2n}$ that maps a phase space vector (p, q) to $(p - \Gamma q, q)$. This shear transformation is viewed as one elementary step of a discrete dynamics. The quantized version of this dynamics is given by the unitary multiplication operator
$$u(\Gamma)|q\rangle = \epsilon_d^{\,q^t \Gamma q}\, |q\rangle \qquad [20]$$
with the root of unity $\epsilon_d = \exp(\pi i (d + 1)/d)$ for $d \neq 2$ and $\epsilon_2 = i$. The unitary operator $u(\Gamma)$ implements a reversible Clifford operation for the symplectic transformation $(p, q) \mapsto (p - \Gamma q, q)$ since the relation
$$u(\Gamma)\, w(p, q)\, u(\Gamma)^* = \epsilon_d^{\,q^t \Gamma q}\, w(p - \Gamma q, q) \qquad [21]$$
holds. The symmetric matrix $\Gamma$ describes a pattern of two-qudit interactions. This can be visualized by a graph whose vertices are the positions x, y = 1, ..., n. Two vertices x, y are connected by an edge if the matrix element $\Gamma_{xy} \neq 0$ is nonvanishing. The value of the matrix element $\Gamma_{xy}$ is interpreted as the strength of the interaction.
Example 7 The second type of symplectic transformation which is relevant here is determined by an invertible matrix $C \in M_n(F)$. It induces a symplectic transformation which maps the vector (p, q) to $(-Cq, \tilde{C}p)$, where $\tilde{C}$ is the inverse of the transpose of C. This is implemented by a unitary transformation $F_{[C]}$. It is called the Fourier transform associated with the invertible matrix C:
$$F_{[C]}|p\rangle = \frac{1}{\sqrt{d^n}} \sum_{q \in F^n} e^{(2\pi i/d)\, p^t C q}\, |q\rangle \qquad [22]$$
By construction, the relation
$$F_{[C]}\, w(p, q)\, F_{[C]}^* = e^{(2\pi i/d)\, p^t q}\, w(-Cq, \tilde{C}p) \qquad [23]$$
follows. If $C = \mathrm{diag}(c_1, \ldots, c_n)$ is a diagonal matrix, then $F_{[C]}$ is a local unitary transformation. In fact, the Fourier transform is a tensor product
$$F_{[C]} = F_{[c_1]} \otimes F_{[c_2]} \otimes \cdots \otimes F_{[c_n]} \qquad [24]$$
with $c_x \in F \setminus \{0\}$, where the tensor product structure is determined by $|q\rangle = |q_1\rangle \otimes \cdots \otimes |q_n\rangle$.
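The shear of Example 6 lends itself to a direct numerical check. The sketch below assumes the conventions $w(p, q)|a\rangle = e^{-(2\pi i/d) p^t a}|a + q\rangle$ and $u(\Gamma)|q\rangle = \epsilon_d^{q^t \Gamma q}|q\rangle$ with $\epsilon_d = \exp(\pi i (d + 1)/d)$, and verifies that conjugation by $u(\Gamma)$ maps w(p, q) to a multiple of $w(p - \Gamma q, q)$ for d = 3 and n = 2. The matrix `gamma` and all helper names are hypothetical choices for this illustration; with a different sign convention for the Weyl operators the shear direction changes sign.

```python
import cmath
from itertools import product

d, n = 3, 2
omega = cmath.exp(2j * cmath.pi / d)
eps = cmath.exp(1j * cmath.pi * (d + 1) / d)  # eps**2 equals omega
basis = list(product(range(d), repeat=n))
gamma = [[1, 2], [2, 0]]  # hypothetical symmetric matrix over Z_3


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def mat_vec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(n)) for i in range(n))


def quad(v):
    """Integer value of the quadratic form v^t Gamma v."""
    return dot(v, mat_vec(gamma, v))


def weyl(p, q):
    """Monomial form of w(p, q): label a -> (coefficient, label a + q)."""
    return {a: (omega ** (-dot(p, a)), tuple((x + y) % d for x, y in zip(a, q)))
            for a in basis}


def shear(sign=+1):
    """Diagonal operator u(Gamma) for sign=+1 and its adjoint for sign=-1."""
    return {a: (eps ** (sign * quad(a)), a) for a in basis}


def compose(*ops):
    """Operator product ops[0] ops[1] ... (the rightmost factor acts first)."""
    out = {a: (1.0 + 0j, a) for a in basis}
    for op in reversed(ops):
        out = {a: (c * op[t][0], op[t][1]) for a, (c, t) in out.items()}
    return out


def same_operator(A, B, scale):
    """Check A == scale * B for monomial operators."""
    return all(A[a][1] == B[a][1] and abs(A[a][0] - scale * B[a][0]) < 1e-9
               for a in basis)


shear_ok = True
for p in basis:
    for q in basis:
        lhs = compose(shear(+1), weyl(p, q), shear(-1))
        gq = mat_vec(gamma, q)
        p_new = tuple((x - y) % d for x, y in zip(p, gq))
        shear_ok = shear_ok and same_operator(lhs, weyl(p_new, q), eps ** quad(q))
print(shear_ok)
```

The check relies on the symmetry of `gamma`; for a nonsymmetric matrix the cross terms of the quadratic form no longer combine into a single shift of p.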
The Stabilizer Formalism
This section is dedicated to the stabilizer formalism, which has been widely discussed in the literature (Calderbank et al. 1997, Gottesman 1996, 1997). We investigate here stabilizer codes from the point of view of symmetries and show how they can be characterized by Clifford channels. We verify that stabilizer codes are specific Clifford channels in the sense described in the last section. To begin with, we consider an irreducible Weyl system (w, f, H) of an even-dimensional F-vector space $\Xi$ such that the antisymmetric part of the factor system $\sigma := f^{-1}(\tau f)$ is a symplectic form on $\Xi$. Furthermore, we need to introduce the following notions: The symplectic complement of a subspace Q is the subspace
$$Q^\sigma = \{\xi \in \Xi \mid \sigma(\xi \,|\, q) = 1 \ \ \forall q \in Q\} \qquad [25]$$
Furthermore, a subspace Q of $\Xi$ is isotropic if it is contained in its symplectic complement, $Q \subset Q^\sigma$. In other words, for all pairs of vectors $q, q' \in Q$ we have $\sigma(q \,|\, q') = 1$. We consider an isotropic subspace Q and we denote by $(w|_Q, f|_Q, H)$ the corresponding restriction of the Weyl system (w, f, H). Since Q is isotropic, it follows that the restriction $f|_Q$ is symmetric. Hence, the Weyl algebra for the restricted system $\mathcal{A}_Q := \mathcal{A}(w|_Q, f|_Q, H)$ is an abelian subalgebra of B(H). As a consequence, all the operators in $\mathcal{A}_Q$ can be diagonalized simultaneously. To obtain the joint spectral resolution for all operators in $\mathcal{A}_Q$, we employ some facts from the theory of finite-dimensional abelian C*-algebras:
1. $\mathcal{A}_Q$ is a finite-dimensional abelian C*-algebra and can be identified with the algebra of complex functions $C(\hat{Q})$ on a finite set $\hat{Q}$.
2. Each element $\varpi \in \hat{Q}$ is a character (pure state), that is, a linear functional such that $\varpi(AB) = \varpi(A)\varpi(B)$ and $\varpi(A^*) = \overline{\varpi(A)}$.
3. For each operator $A \in \mathcal{A}_Q$ there exists precisely one function $f_A$ on $\hat{Q}$ which is uniquely determined by $\varpi(A) = f_A(\varpi)$. The isomorphism $A \to f_A$ is called the Gelfand isomorphism.
4. A character $\varpi \in \hat{Q}$ is an irreducible representation of $\mathcal{A}_Q$ and there is a unique projection $e_\varpi$ onto the subspace in H which carries this irreducible representation.
From these facts we derive a joint spectral resolution for all operators in $\mathcal{A}_Q$. Namely, each $A \in \mathcal{A}_Q$ can be written as
$$A = \sum_{\varpi \in \hat{Q}} e_\varpi\, \varpi(A) \qquad [26]$$
We are now prepared to introduce the notion of stabilizer codes in accordance with Calderbank et al. (1997) and Gottesman (1996, 1997): Let Q be an isotropic subspace in $\Xi$ and let $\varpi \in \hat{Q}$ be a character of $\mathcal{A}_Q$. The projection $e_\varpi$ is called a stabilizer code. The abelian group that is generated by the Weyl operators w(q), $q \in Q$, is called the stabilizer group. The
abelian C*-algebra $\mathcal{A}_Q$ is called the stabilizer algebra. According to the following theorem, each stabilizer code is uniquely associated with a Clifford channel:
Theorem 8 (Schlingemann 2002, 2004). Let Q be an isotropic subspace of $\Xi$ and let $e_\varpi$ be the stabilizer code of a character $\varpi$. Then there exists a unique Clifford channel $E_\varpi$ with input system $(w_\varpi, f_\varpi, H_\varpi)$ and output system $(w|_Q, f|_Q, H)$ such that the following is true:
(i) For each $\xi \in \Xi$ the identity
$$E_\varpi(w(\xi)) = \chi_{Q^\sigma}(\xi)\, w_\varpi(\xi) \qquad [27]$$
is fulfilled, where $\chi_{Q^\sigma}$ denotes the indicator function of $Q^\sigma$.
(ii) Let $v_\varpi : H_\varpi \to H$ be the isometry which embeds $H_\varpi$ into H, then
$$E_\varpi(A) = v_\varpi^*\, A\, v_\varpi \qquad [28]$$
holds for all $A \in B(H)$.
(iii) The channel $E_\varpi$ is invariant under translations in the isotropic subspace Q, that is, the identity
$$E_\varpi \circ \mathrm{Ad}[w(q)] = E_\varpi \qquad [29]$$
holds for all $q \in Q$.
Stabilizer codes for maximally isotropic subspaces $Q = Q^\sigma$ are special, since the projection $e_\varpi$ onto the eigenspace of the character $\varpi$ is one-dimensional. Thus, $e_\varpi$ is the density matrix of a pure state, which is called a stabilizer state. In view of Theorem 8, the expectation value of a Weyl operator $w(\xi)$ is given by
$$\mathrm{tr}(e_\varpi\, w(\xi)) = \varpi(w(\xi))\, \chi_Q(\xi) \qquad [30]$$
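Isotropy and maximal isotropy can be tested by brute force for small systems. The following sketch works over the field with two elements, stores a phase-space vector as $(p_1, \ldots, p_n, q_1, \ldots, q_n)$, and uses the additive form $p^t q' - q^t p'$ mod 2, so that the multiplicative condition $\sigma = 1$ corresponds to the value 0. The two-qubit examples and the function names are assumptions made for this illustration; with X assigned to (p, q) = (0, 1) and Z to (1, 0), the vectors `XX` and `ZZ` generate the stabilizer group usually associated with a Bell state.

```python
from itertools import product


def symp(u, v, n):
    """Additive symplectic form p.q' - q.p' mod 2 for u = (p, q), v = (p', q')."""
    p, q, pp, qq = u[:n], u[n:], v[:n], v[n:]
    return (sum(a * b for a, b in zip(p, qq)) - sum(a * b for a, b in zip(q, pp))) % 2


def span(gens, n):
    """All F_2-linear combinations of the generators."""
    vecs = set()
    for coeffs in product((0, 1), repeat=len(gens)):
        v = tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % 2 for i in range(2 * n))
        vecs.add(v)
    return vecs


def complement(Q, n):
    """Symplectic complement of Q in F_2^(2n), found by exhaustive search."""
    return {v for v in product((0, 1), repeat=2 * n)
            if all(symp(v, q, n) == 0 for q in Q)}


def is_isotropic(Q, n):
    return Q <= complement(Q, n)


n = 2
XX = (0, 0, 1, 1)  # X on both qubits
ZZ = (1, 1, 0, 0)  # Z on both qubits
X1 = (0, 0, 1, 0)  # X on the first qubit
Z1 = (1, 0, 0, 0)  # Z on the first qubit

bell = span([XX, ZZ], n)
print(is_isotropic(bell, n), complement(bell, n) == bell)
print(len(complement(span([XX], n), n)), is_isotropic(span([X1, Z1], n), n))
```

The subspace generated by `XX` and `ZZ` equals its own complement and therefore corresponds to a stabilizer state, the subspace generated by `XX` alone is isotropic but not maximal, and the one generated by `X1` and `Z1` is not isotropic because the two operators anticommute.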
Representation by Graphs
As described in the previous section (Theorem 8), each stabilizer code is a pure Clifford channel which is completely determined by an isotropic subspace and a character of the corresponding stabilizer algebra. A constructive characterization of isotropic subspaces can be given in terms of graphs, as has been shown in Schlingemann (2002, 2004). The complete description of a stabilizer code requires in addition the choice of a character of the stabilizer algebra. Both data, the isotropic subspace and the character, can be encoded in a single graph $\Gamma$. The set of vertices N is partitioned into four different types: the input vertices I, the output vertices J, the measurement vertices K, and the syndrome vertices L (see Figure 1). The edges of the graph are undirected, and a pair of vertices can be connected by at most d − 1 edges, where self-links are also allowed. The adjacency matrix (also denoted by $\Gamma$) is a symmetric matrix with entries $\Gamma_{xy} = 0, 1, \ldots, d - 1$ according to the number of edges between x and y. Thus, the adjacency matrix can be seen as a linear operator on $F^N$ with cyclic field $F = \mathbb{Z}_d$. Each subset $A \subset N$ corresponds to a linear projection onto the subspace $F^A \subset F^N$, which we denote by $\pi_A$. For a convenient description we introduce the following notation: the union of two sets of vertices is written without the symbol $\cup$, that is, instead of $I \cup J$ we write IJ.
Figure 1 (a) A graphical representation of a Weyl operator $Y \otimes Y \otimes Z \otimes Z \otimes 1$ of the stabilizer algebra of a quantum error correcting code, encoding one qubit into five (see 00273). The input vertex is gray, the output vertices are black. Each binary vector represents a Pauli matrix sitting at a tensor position of the output system (vertex labels: (1, 1) ~ iY, (1, 0) ~ Z, (0, 0) ~ 1). (b) The expectation values, which are products over all edges, where to each edge with labels q, q′ the value $(-1)^{qq'}$ is assigned. The character corresponds to the syndrome configuration (1110) (blank vertices).
Theorem 9 (Schlingemann 2002, 2004). Let $Q \subset F^J \oplus F^J$ be an isotropic subspace and let $\varpi$ be a character of the stabilizer algebra $\mathcal{A}_Q$. Then there exists a graph $\Gamma$ with input vertices I, output vertices J, measurement vertices K, and syndrome vertices L such that the following holds:
(i) The linear operator $\Gamma^{JK}_{IKL}$ is invertible.
(ii) The isotropic subspace Q consists of the vectors $(\Gamma^{J}_{JK} q, \pi_J q)$ with $q \in \ker(\Gamma^{IK}_{JK})$.
(iii) There is a unique vector a in the syndrome subspace $F^L$ such that the expectation values of the character $\varpi$ are given by
$$\varpi(w(\Gamma^{J}_{JK} q, \pi_J q)) = \epsilon_d^{\,(q + a)^t \Gamma (q + a)} \qquad [31]$$
with $q \in \ker(\Gamma^{IK}_{JK})$.
Theorems 8 and 9 provide different useful characterizations of stabilizer codes, namely in terms of eigenspaces, Clifford channels, and graphs.
The original definition of stabilizer codes in terms of eigenspaces goes back to Calderbank, Gottesman, Rains, Shor, and Sloane (see, e.g., Calderbank et al. (1997) and Gottesman (1996, 1997)). They developed an approach to derive quantum codes from classical binary codes.
Stabilizer codes can also be characterized by specific Clifford channels (see Theorem 8). The condition for a channel to be a stabilizer code is covariance with respect to a subgroup of phase space translations. This characterizes stabilizer codes in terms of symmetries.
Theorem 9 yields a characterization of stabilizer codes in terms of graphs, providing an explicit expression for the isotropic subspace and the character of the stabilizer code. This graphical representation provides a suggestive encoding of various properties such as error-correcting capabilities, multipartite entanglement, and the effects of specific local operations. In fact, it has been shown in Briegel and Raussendorf (2001), Dür et al. (2003), and Hein et al. (2004) that the entanglement present in a graph state can be derived from its shape.

See also: Capacities Enhanced by Entanglement; Quantum Error Correction and Fault Tolerance.
Further Reading

Briegel H-J and Raussendorf R (2001) Persistent entanglement in arrays of interacting particles. Physical Review Letters 86: 910.
Calderbank AR, Rains EM, Shor PW, and Sloane NJA (1997) Quantum error correction and orthogonal geometry. Physical Review Letters 78: 405–408.
Cleve R and Gottesman D (1997) Efficient computations of encodings for quantum error correction. Physical Review A 56: 76–83.
Dür W, Aschauer H, and Briegel H-J (2003) Multiparticle entanglement purification for graph states. Physical Review Letters 91: 107903.
Gottesman D (1996) Class of quantum error-correcting codes saturating the quantum Hamming bound. Physical Review A 54: 1862.
Gottesman D (1997) Stabilizer Codes and Quantum Error Correction. Ph.D. thesis, Caltech.
Hein M, Eisert J, Dür W, and Briegel H-J (2004) Multi-party entanglement in graph states. Physical Review A 69: 062311.
Holevo AS (n.d.) Remarks on the classical capacity of quantum channel, quant-ph/0212025.
Holevo AS (2004) Additivity conjecture and covariant channels. In Proc. Conference Foundations of Quantum Information. Camerino.
Klappenecker A and Roetteler M (2002) Beyond stabilizer codes I: Nice error bases. IEEE Transactions on Information Theory 48(8): 2392.
Klappenecker A and Roetteler M (2005) On the monomiality of nice error bases. IEEE Transactions on Information Theory 51(3): 1.
Schlingemann D-M (2002) Stabilizer codes can be realized as graph codes. Quant. Inf. Comp. 2(4): 307–323.
Schlingemann D-M (2004) Cluster states, graphs and algorithms. Quant. Inf. Comp. 4(4): 287–324.
Schlingemann D-M and Werner RF (2001) Quantum error-correcting codes associated with graphs. Physical Review A 65: 012308.
Schlingemann D-M and Werner RF (2005) On the structure of Clifford quantum cellular automata. Preprint (in preparation).
Scutaru H (1979) Some remarks on covariant completely positive linear maps. Rep. Math. Phys. 16: 79–87.
Werner RF (2001) All teleportation and dense coding schemes. Journal of Physics A 35: 7081.
Zmud EM (1971) Symplectic geometries over finite abelian groups. Mathematics of the USSR Sbornik 15: 7–29.
Zmud E (1972) Symplectic geometry and projective representations of finite abelian groups. Mathematics of the USSR Sbornik 16: 1–16.
Finitely Correlated States
R F Werner, Technische Universität Braunschweig, Braunschweig, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction

A typical problem of quantum statistical mechanics is to compute equilibrium states of quantum dynamical systems. However, there is a strange difficulty inherent in this task, namely how to describe the solution: if we try to describe the quantum state by specifying all matrix elements of all local density operators, we have a job which grows exponentially with the system size. This approach is obviously out of the question for the large systems statistical mechanics is interested in. Luckily, in practice nobody really wants to see all those numbers anyway, and one is content with determining a few correlation functions, or other easily parametrized characteristics of the state. But for computing a state in the first place, we cannot restrict the state description to a few such parameters. So the problem is again: how can we efficiently parametrize the states of interest?

In this article we collect some results on a particular way of addressing this problem. It originated in the early 1990s (Fannes et al. 1992b) in ideas for quantizing the notion of Markov chains (Accardi and Frigerio 1983). Recently, there has been a new surge of interest in such ideas, because they turned out to be very useful for numerical work on quantum spin chains. The typical feature of this approach is that one does not directly describe expectation values of the state, but instead generates the state from a description of its correlations between neighboring sites. In the language of quantum information theory, it could be said that the method focuses on the entanglement between different parts of the system.
The Basic Construction

Notation
We consider a quantum spin chain, that is, a system of infinitely many subsystems, labeled by the integers, each of which is a quantum-mechanical $d$-level system. Let us denote the observable algebra at site $x \in \mathbb{Z}$ by $\mathcal{A}_x$. Each $\mathcal{A}_x$ is hence isomorphic to the $d \times d$ matrices. The observables of the whole (infinite) system lie in the infinite tensor product $\mathcal{A}_{\mathbb{Z}} = \bigotimes_{x \in \mathbb{Z}} \mathcal{A}_x$. This is defined as a quasilocal algebra (Bratteli and Robinson 1987, 1997), which is to say that it is the algebra generated by all finite tensor products of elements of the $\mathcal{A}_x$, say $\bigotimes_{x \in \Lambda} A_x$ with $A_x \in \mathcal{A}_x$ and $\Lambda \subset \mathbb{Z}$ finite. Such an element is said to be localized in $\Lambda$, and we denote by $\mathcal{A}_\Lambda$ the corresponding algebra. For $\Lambda_1 \subset \Lambda_2$, we identify $\mathcal{A}_{\Lambda_1}$ with a subalgebra of $\mathcal{A}_{\Lambda_2}$, by tensoring with the identity operator on all sites in $\Lambda_2 \setminus \Lambda_1$. $\mathcal{A}_{\mathbb{Z}}$ is the completion of the union of all $\mathcal{A}_\Lambda$, with $\Lambda$ finite, under the $C^*$-norm.

A state $\omega$ on $\mathcal{A}_{\mathbb{Z}}$ is uniquely specified by its expectations on the subalgebras $\mathcal{A}_\Lambda$. Since these are finite-dimensional matrix algebras, we can write $\omega(A) = \mathrm{tr}(\rho_\Lambda A)$ for $A \in \mathcal{A}_\Lambda$, with a "local density operator" $\rho_\Lambda$. The system of local density operators must be consistent with respect to restrictions (partial traces).

So far we have not used the structure of the underlying lattice $\mathbb{Z}$ in any way. This enters via the translation automorphisms $\alpha_n$ of $\mathcal{A}_{\mathbb{Z}}$, which identify $\mathcal{A}_x$ with $\mathcal{A}_{x+n}$. A state is called translationally invariant if $\omega \circ \alpha_n = \omega$. The translationally invariant states form a weakly compact convex subset of the state space of $\mathcal{A}_{\mathbb{Z}}$, whose extreme points are called ergodic states.

How to Generate Correlations
Correlations between parts of a system typically have their origin in an interaction in the past. Even if the subsystems are dynamically separated later on, the correlation persists, and one can take this as a motivation to model correlations from two ingredients: a simplified prototype of a correlated system, and some evolution taking the parts of the simplified system to the parts of the given system. Let us consider a composite system, whose parts have observable algebras $\mathcal{A}_1$ and $\mathcal{A}_2$, respectively, so that the whole system has algebra $\mathcal{A}_1 \otimes \mathcal{A}_2$. We can build a state $\omega$ on this system from a simpler one, say a state $\eta$ on some $\mathcal{B}_1 \otimes \mathcal{B}_2$, and two completely positive unit preserving maps $T_i : \mathcal{A}_i \to \mathcal{B}_i$ such that

$$\omega(A_1 \otimes A_2) = \eta(T_1(A_1) \otimes T_2(A_2))$$

Some features of $\eta$ are inherited by $\omega$. For example, when $\eta$ is separable (a convex combination of products), which is always the case if either $\mathcal{B}_1$ or $\mathcal{B}_2$ is classical (i.e., an abelian algebra), then the same holds for $\omega$. Hence, if we want to describe quantum correlated ("entangled") states, we have to build the correlations on an entangled state $\eta$. Similarly, the "size" of the model system $\mathcal{B}_1 \otimes \mathcal{B}_2$ limits the strength of correlations in $\omega$. As for every correlated state, we can look at the linear functionals on $\mathcal{A}_2$ which are of the form $A \mapsto \omega(A_1 \otimes A)$ with fixed $A_1 \in \mathcal{A}_1$. The dimension of the space of such functionals might be called the correlation dimension of $\omega$. This dimension is 1 for product states, and clearly cannot increase by passing from $\eta$ to $\omega$. Hence, it is bounded by the dimensions of $\mathcal{B}_1$ and $\mathcal{B}_2$, even if $\mathcal{A}_1$ and $\mathcal{A}_2$ are infinite dimensional. "Finite correlation" in the sense of the title of this article refers to the finiteness of the correlation dimension between the two halves of a spin chain.

The VBS Construction, and Matrix Product States
The so-called valence bond solid (VBS) states on a chain are constructed by applying these ideas to the correlations across every link of a spin chain. Let us introduce a correlated model state $\eta_x$ on some algebra $\mathcal{B}^-_x \otimes \mathcal{B}^+_x$ for every bond $(x, x+1)$. Then the state at site $x$ is a function of contributions from both bonds connecting it, and we express this by a completely positive map $T_x : \mathcal{A}_x \to \mathcal{B}^+_{x-1} \otimes \mathcal{B}^-_x$. Then an observable $A_1 \otimes \cdots \otimes A_L$ on a chain piece of length $L$ is first mapped by $\bigotimes_{x=1}^{L} T_x$ to an element of $\mathcal{B}^+_0 \otimes \mathcal{B}^-_1 \otimes \cdots \otimes \mathcal{B}^+_{L-1} \otimes \mathcal{B}^-_L$. Evaluating with the states $\eta_1 \otimes \cdots \otimes \eta_{L-1}$, we are left with an element of $\mathcal{B}^+_0 \otimes \mathcal{B}^-_L$, which we can evaluate with yet another state $\eta_{0L}$ describing the boundary conditions for the construction (see Figure 1).

Figure 1 (labels: the maps $T_x$, $T_{x+1}$, the model state $\eta_x$, and the bond algebras $\mathcal{B}^{\pm}$).

Clearly, if we take the algebras $\mathcal{B}^{\pm}_x$ large enough, and the model states $\eta_x$ sufficiently highly entangled, we can generate every state on the finite chain. However, we can get an interesting class of states even for fixed finite dimensions of the $\mathcal{B}^{\pm}_x$. By restricting this correlation dimension, we can set a level of complexity for the state description. We can then try to handle a given physical problem first with simple states of low correlation dimension, and increase this parameter only as needed. A typical problem here is to determine the ground state of a finite-range Hamiltonian. We can then optimize each $T_x$ and $\eta_x$ separately, minimizing the ground state energy with all other elements fixed. This is a semidefinite programming problem, for which very efficient methods are known. The global minimization is then done by letting the optimization site $x$ sweep over the whole chain as often as needed.

In a ground-state problem one is looking for a pure state, and it is therefore sufficient to choose both the model states $\eta_x$ and the operations $T_x$ to be pure, that is, without decomposition into sums of similar objects. The scheme is thus run at the vector level rather than the operator level: we take the algebras $\mathcal{B}^+_x = \mathcal{B}^-_x$ as the operators on a Hilbert space $\mathcal{K}_x$, and $\eta_x = (\dim \mathcal{K}_x)^{-1} |\Phi_x\rangle\langle\Phi_x|$ with the (unnormalized) maximally entangled vector

$$\Phi_x = \sum_j |j\rangle \otimes |j\rangle \in \mathcal{K}_x \otimes \mathcal{K}_x \qquad [1]$$
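The maps $T_x$ will be implemented by a single operator $V_x : \mathcal{K}_{x-1} \otimes \mathcal{K}_x \to \mathcal{H}$ as $T_x(A) = V_x^* A V_x$. Then the vectors $\Psi \in \mathcal{H}^{\otimes L}$ contributing to the state on the chain of length $L$ are of the form

$$\Psi = (V_1 \otimes \cdots \otimes V_L)\, |j_0\rangle \otimes \Phi_1 \otimes \cdots \otimes \Phi_{L-1} \otimes |j_L\rangle = \sum_{j_1, \ldots, j_{L-1}} (V_1 \otimes \cdots \otimes V_L)\, |j_0, j_1, j_1, \ldots, j_{L-1}, j_{L-1}, j_L\rangle$$

where $j_0$, $j_L$ are labels for bases in $\mathcal{K}_0$ and $\mathcal{K}_L$, describing the possible choices at the boundary, and we have used the special form of $\Phi_x$. We write out the operators $V_x$ in components, so that

$$V_x |j j'\rangle = \sum_\mu |\mu\rangle\, V^{\mu}_{x;jj'}$$

with suitable $(\dim \mathcal{K}_{x-1} \times \dim \mathcal{K}_x)$-dimensional matrices $V^{\mu}_x$, in terms of which the above expression can be interpreted as a matrix product. The components of $\Psi$ in a product basis $\{|\mu\rangle\}$ become

$$\langle \mu_1, \ldots, \mu_L | \Psi \rangle = \langle j_0 | V_1^{\mu_1} V_2^{\mu_2} \cdots V_L^{\mu_L} | j_L \rangle \qquad [2]$$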
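The matrix product in eqn [2] is straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions, not the article's own implementation: the helper names (`matmul`, `chain_product`, `component_open`, `component_periodic`) and the diagonal example matrices are chosen for this example. It computes the components of $\Psi$ for open boundary labels $(j_0, j_L)$ and for contracted (periodic) boundary indices, where the matrix element becomes a trace.

```python
from itertools import product

def matmul(a, b):
    """Product of two matrices stored as nested lists (shapes must be compatible)."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def chain_product(site_mats, mus):
    """Ordered product of the matrices site_mats[x][mus[x]] over the sites x."""
    acc = site_mats[0][mus[0]]
    for x in range(1, len(mus)):
        acc = matmul(acc, site_mats[x][mus[x]])
    return acc

def component_open(site_mats, mus, j0, jL):
    """Component of Psi for fixed boundary labels: entry (j0, jL) of the product."""
    return chain_product(site_mats, mus)[j0][jL]

def component_periodic(site_mats, mus):
    """Component with both boundary indices contracted: the trace of the product."""
    prod = chain_product(site_mats, mus)
    return sum(prod[j][j] for j in range(len(prod)))

# Hypothetical example with d = 2 and 2 x 2 matrices, identical at every site.
V = [[[1, 0], [0, 0]],   # matrix for mu = 0
     [[0, 0], [0, 1]]]   # matrix for mu = 1
L = 3
sites = [V] * L

periodic = {mus: component_periodic(sites, mus)
            for mus in product(range(2), repeat=L)}
nonzero = sorted(mus for mus, value in periodic.items() if value != 0)
print(nonzero)
```

With these diagonal matrices only the strings (0, 0, 0) and (1, 1, 1) have nonzero components, so the periodic version yields an unnormalized GHZ-type vector. The storage cost is $L \cdot d \cdot k^2$ numbers instead of $d^L$ components.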
Due to this form the states generated in this way have also been called "matrix product states" (Klümper et al. 1991). If one wants to consider periodic boundary conditions, the indices $j_0$ and $j_L$ can also be contracted, and the expression becomes a trace. For some simulations it is also convenient to choose $\dim \mathcal{K}_0 = \dim \mathcal{K}_L = 1$, so there is only one matrix element to be considered.

The scheme for getting ground-state vectors described here is essentially the same as the density matrix renormalization group method (Verstraete et al. 2004). The version given here appears to be more transparent, more flexible, and in some cases (e.g., periodic boundary conditions) vastly more efficient. However, it may be too early for such a judgment, since this is very much work in progress (Verstraete et al. 2005).

In the sequel, we will focus not so much on the numerical aspects, but on the possibility this construction offers to explicitly construct nontrivial translationally invariant states on the infinite chain. Numerically, even in a translation invariant situation the matrices $V_x$ obtained by optimization may turn out to depend on $x$ (Wolf, private communication), that is, one has to admit the possibility of a spontaneous symmetry breaking. However, for the construction of states on the infinite chain we will simply fix all $V_x$ to be equal. In some sense this turns the matrix product into a matrix power, which could be analyzed by methods familiar from the transfer matrix formalism of statistical mechanics. In eqn [2] this does not work, because of the $\mu$-dependence of the matrices involved. Nevertheless, a slight reorganization of the construction will lead to a transfer-matrix-like formalism.

The Evolution Operator Construction
Fixing all $T_x$ to be the same in Figure 1 still does not fix the state uniquely, since both in the mixed state version and in the pure state version of the construction some boundary information enters as well. This boundary information then has to be chosen in such a way that a consistent family of local density operators is generated. It turns out that by rearranging the construction a little one can trivially solve one boundary condition, and reduce the other to finding a fixed point of a linear operator. This rearrangement was first carried out in Fannes et al. (1992b), where the term "finitely correlated state" was also coined.

The basic element of the VBS construction was the operator $T : \mathcal{A} \to \mathcal{B}^+ \otimes \mathcal{B}^-$ (here already taken independent of $x$). This is specified by $\dim \mathcal{A} \cdot \dim \mathcal{B}^+ \cdot \dim \mathcal{B}^-$ matrix elements. However, assuming we can identify the algebras $\mathcal{B}^{\pm}$, we can also consider these matrix elements as those of an "evolution operator" $E : \mathcal{A} \otimes \mathcal{B} \to \mathcal{B}$. This operator is once again taken to be completely positive and unit preserving. We introduce its $n$th iterate $E^{(n)} : \mathcal{A}^{\otimes n} \otimes \mathcal{B} \to \mathcal{B}$ by the recursion

$$E^{(1)} = E, \qquad E^{(n+1)} = E \circ (\mathrm{id}_{\mathcal{A}} \otimes E^{(n)}) \qquad [3]$$
Clearly, these operators are again completely positive and unit preserving. Another way to express this iteration is to look at $E$ as a family of maps on $\mathcal{B}$, parametrized by $A \in \mathcal{A}$: we set $E_A(B) = E(A \otimes B)$, and find

$$E^{(n)}(A_1 \otimes \cdots \otimes A_n \otimes B) = E_{A_1} \cdots E_{A_n}(B) \qquad [4]$$
An important special role is played by the operator $\hat{E} = E_{1}$, that is, $\hat{E}(B) = E(1 \otimes B)$, which is again completely positive and unit preserving. Now given any state $\rho$ on $\mathcal{B}$, we get a state $\omega_n$ on $\mathcal{A}^{\otimes n}$ by setting

$$\omega_n(A_1 \otimes \cdots \otimes A_n) = \rho\big(E^{(n)}(A_1 \otimes \cdots \otimes A_n \otimes 1)\big) \qquad [5]$$

Since $\hat{E}(1) = 1$, this family of states is consistent with respect to increasing $n$ by adding sites on the right, that is, $\omega_{n+1}(A \otimes 1) = \omega_n(A)$. In other words, the family $\omega_n$ defines a state on the infinite right half-chain. This state can be extended to the full chain as a translationally invariant state if and only if consistency also holds for adding sites on the left, that is, if $\omega_{n+1}(1 \otimes A) = \omega_n(A)$ for all $A \in \mathcal{A}^{\otimes n}$. For this we need a condition on the state $\rho$: it must be invariant under the map $\hat{E}$ (i.e., $\rho(\hat{E}(B)) = \rho(B)$ for all $B \in \mathcal{B}$). This is the only requirement, and we call $\omega$ the state on $\mathcal{A}_{\mathbb{Z}}$ generated by $E$ and $\rho$. Note that since $\hat{E}$ has the invariant vector 1, its transpose also has an invariant vector, which can also be chosen as a state. We will often look at the case of a unique invariant state, in which case we can call $\omega$ the state generated by $E$, without having to mention $\rho$.

The valence bond picture was very much suggested by trying to describe correlations in a spatially distributed quantum system (the chain). The construction given here is perhaps more readily suggested by a process in time, rather than space. In fact, the paper by Fannes et al. (1992b) was partly motivated by an attempt to define a quantum analog of Markov processes (Accardi and Frigerio 1983). We can think of the construction as a general form for a repeated measurement in quantum theory. The object on which the measurements are performed has observable algebra $\mathcal{B}$, whereas $\mathcal{A}$ describes the successive outputs. Choosing $\mathcal{A}$ to be classical (abelian), we would find in $\omega$ the joint probability distribution of the sequence of measured values, when the initial state of the object is $\rho$ (not necessarily invariant). Allowing nonabelian $\mathcal{A}$ would then correspond to a family of delayed choice experiments: while $E$ describes the interaction of the system with the measurement apparatus (including the overall state change $\hat{E}$), we are still free to make correlated and even entangled measurements on the successive output systems. This interpretation suggests many extensions, in particular, to continuous time (where the case of abelian outputs is discussed extensively in the classic book by Davies 1976), or to cases allowing an external quantum input in each step, in which case we are looking at a quantum channel with memory $\mathcal{B}$ (Kretschmann and Werner 2005). In spite of the different natural interpretations, however, the constructions in this and previous paragraphs give exactly the same class of translationally invariant states on the chain, as was shown in Fannes et al. (1992b).
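The two consistency conditions (adding a site on the right, and adding one on the left for an invariant state) can be checked on a small example. The following is a sketch under explicit assumptions, not the article's own construction: the evolution operator is taken to be purely generated, $E(A \otimes B) = V^*(A \otimes B)V$ with $V|j\rangle = \sum_\mu |\mu\rangle \otimes V_\mu |j\rangle$, the matrices $V_\mu = \sigma_\mu/\sqrt{3}$ are built from Pauli matrices (so $d = 3$ and $\mathcal{B}$ is the $2 \times 2$ matrices), and the invariant state is the normalized trace. All function and variable names are chosen for this example.

```python
# Pauli matrices; the hypothetical generating matrices are V_mu = sigma_mu / sqrt(3).
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
ONE2 = [[1, 0], [0, 1]]
ONE3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def E(A, B):
    """E_A(B) = sum over mu, nu of A[mu][nu] * V_mu^* B V_nu (sigma_mu is Hermitian)."""
    out = [[0, 0], [0, 0]]
    for mu in range(3):
        for nu in range(3):
            if A[mu][nu] == 0:
                continue
            term = matmul(matmul(SIGMA[mu], B), SIGMA[nu])
            for i in range(2):
                for j in range(2):
                    out[i][j] += A[mu][nu] * term[i][j] / 3
    return out

def rho(B):
    """Normalized trace on the 2 x 2 matrices; invariant under E_hat in this example."""
    return (B[0][0] + B[1][1]) / 2

def omega(observables):
    """omega_n of a product observable, evaluated via eqns [4] and [5]."""
    B = ONE2
    for A in reversed(observables):   # the innermost map belongs to the last site
        B = E(A, B)
    return rho(B)

# Two arbitrary Hermitian single-site observables.
A1 = [[1, 2, 0], [2, 0, 1j], [0, -1j, 3]]
A2 = [[0, -1j, 0], [1j, 0, 1], [0, 1, 2]]

print(omega([A1, A2]), omega([A1, A2, ONE3]), omega([ONE3, A1, A2]))
```

The three printed numbers agree: padding with the identity on either side does not change the expectation, which is the consistency required for a translationally invariant state on the full chain.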
Ergodic Decomposition

A state on $\mathcal{A}_{\mathbb{Z}}$ is called ergodic if it is an extreme point of the compact convex subset of translationally invariant states. Often in statistical mechanics, one finds states which may be ergodic, but nevertheless contain a breaking of translation symmetry. Such states can be decomposed into periodic states, that is, states which are invariant with respect to some power of the shift. In general, new decompositions may become possible for any period. If no decomposition into periodic states is possible, the state is called completely ergodic. In this section we consider the question of how to decompose a finitely correlated state into ergodic components, using a well-established connection between ergodicity and clustering properties (Bratteli and Robinson 1987, 1997), that is, the decay of correlation functions.

Correlation functions are very easily evaluated for finitely correlated states: let $A^{\pm}$ be two observables localized on $n_{\pm}$ sites, and suppose that these sites are separated by $L$ sites. Then eqn [5] gives

$$\omega(A^- \otimes 1^{\otimes L} \otimes A^+) = \rho\big(E^{(n_-)}_{A^-}\, \hat{E}^{L}\, E^{(n_+)}_{A^+}(1)\big) \qquad [6]$$

The $L$-dependence of this expression is clearly governed by the matrix powers of $\hat{E}$. By assumption this operator always has the eigenvalue 1, because $\hat{E}(1) = 1$, and has norm 1, because it is also completely positive. The spectrum is hence contained in the closed unit disk. Each eigenvalue with modulus [...] 0 between zero and the next eigenvalue. This property is of considerable interest for models in solid-state physics and statistical mechanics. It was shown for all ergodic pure finitely correlated states in Fannes et al. (1992b).
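The statement that the $L$-dependence of correlation functions is governed by the powers of $\hat{E}$ can be made concrete on a small example. The sketch below is an illustration, not taken from the article: it assumes a purely generated state with matrices $V_\mu = \sigma_\mu/\sqrt{3}$ built from Pauli matrices and the normalized trace as invariant state, and all names are chosen here. In this example $\hat{E}$ has the eigenvalue 1 on the identity and $-1/3$ on each Pauli matrix, so two-point correlations decay like $(-1/3)^L$.

```python
# Pauli matrices; the hypothetical generating matrices are V_mu = sigma_mu / sqrt(3).
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
ONE2 = [[1, 0], [0, 1]]
ONE3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def E(A, B):
    """E_A(B) = (1/3) * sum over mu, nu of A[mu][nu] * sigma_mu B sigma_nu."""
    out = [[0, 0], [0, 0]]
    for mu in range(3):
        for nu in range(3):
            term = matmul(matmul(SIGMA[mu], B), SIGMA[nu])
            for i in range(2):
                for j in range(2):
                    out[i][j] += A[mu][nu] * term[i][j] / 3
    return out

def E_hat(B):
    """E_hat = E_1, the transfer operator acting on the 2 x 2 matrices."""
    return E(ONE3, B)

def rho(B):
    """Normalized trace, the invariant state of E_hat in this example."""
    return (B[0][0] + B[1][1]) / 2

def correlation(A_minus, A_plus, L):
    """Two-point function as in eqn [6] with single-site observables, L sites apart."""
    B = E(A_plus, ONE2)
    for _ in range(L):
        B = E_hat(B)
    return rho(E(A_minus, B))

# A Hermitian single-site observable with vanishing one-site expectation.
A = [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]]

for L in range(5):
    print(L, correlation(A, A, L).real)
```

Successive values differ by the factor $-1/3$, the subleading eigenvalue of $\hat{E}$; since the one-site expectation of this observable vanishes, the printed numbers are already the connected correlation functions.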
Density

Density of Finitely Correlated Pure States
The natural topology in which to consider the approximation between states on the chain is the weak topology. A sequence $\omega_n$ converges weakly to $\omega$ if for all local $A$ the expectations converge, that is, $\omega_n(A) \to \omega(A)$. Let us start from an arbitrary translationally invariant state $\omega$, and see how we can approximate it. First, we can split the chain into intervals of length $L$, and replace $\omega$ by the tensor product of the restrictions of $\omega$ to each of these intervals. This state is not translationally invariant, so we average it over the $L$ translations, and call the resulting state $\omega_L$. Consider a local observable $A$, whose localization region has length $R$. Then for $L - R$ out of the $L$ translates contributing to $\omega_L$ the expectation will be the same as for $\omega$, and we get

$$\omega_L(A) = \Big(1 - \frac{R}{L}\Big)\,\omega(A) + \frac{R}{L}\,\tilde{\omega}(A),$$

where the error term $\tilde{\omega}$ is again a state. Hence, $\omega_L$ converges weakly to $\omega$ as $L \to \infty$. One can show easily that $\omega_L$ is finitely correlated, with an algebra $\mathcal{B}$ essentially equal to $\mathcal{A}^{\otimes L}$. Hence, the finitely correlated states are weakly dense in the set of translationally invariant states.

We can make the approximating states purer by a very simple trick. In the previous construction we always take two intervals together, and replace the tensor product of the two restrictions by a purification, that is, by a pure state on an interval of length $2L$, whose restrictions to the two length-$L$ subintervals coincide with $\omega$. We average this over $2L$ translates, and call the result $\phi_L$. The estimates showing that $\phi_L \to \omega$ weakly are exactly the same as before. Moreover, one can show (Fannes et al. 1992a) that $\phi_L$ is purely generated. Being defined as a convex combination of other states, $\phi_L$ is not pure, and the peripheral spectrum of $\hat{E}$ will contain all the $2L$th roots of unity. However, we can use that such a rich peripheral spectrum is not generic for $\hat{E}$ constructed from an isometry $V$. Therefore, if we choose an isometry $V_\varepsilon$ close to the isometry $V$ generating $\phi_L$, we obtain a purely generated state $\phi_L^{\varepsilon}$ with trivial peripheral spectrum. Since the expression for expectations of such states depends continuously on the generating isometry, we have that $\phi_L^{\varepsilon} \to \phi_L$ as $\varepsilon \to 0$. But we know from the previous section that such states are pure. Hence, the pure finitely correlated states are weakly dense in the set of all translationally invariant states (Fannes et al. 1992a).

This has implications for the geometry of the compact convex set of translationally invariant states, which are rather counter-intuitive for intuitions trained on finite-dimensional convex bodies. To begin with, the extreme points (the ergodic states) are dense in the whole body. This is not such a rare occurrence in infinite-dimensional convex sets, and is shared, for example, by the set of operators $F$ with $0 \le F \le 1$ on an infinite-dimensional Hilbert space (Davies 1976). Together with the property that the translationally invariant states form a simplex, it actually fixes the structure of this compact convex set to be the so-called Poulsen simplex. This was known also without looking at finitely correlated states. The rather surprising result of the above density argument is that even the small subclass of states which are extremal, not only in the translationally invariant subset but even in the whole state space, is still dense.
Finitely Correlated Pure States with Bounded Memory Dimension
It is clear in the above construction that the dimension of the algebra $\mathcal{B}$ goes to infinity for an approximating sequence. How many states can we get with a fixed memory algebra $\mathcal{B}$? The dimension of this manifold can be estimated easily from the number of parameters needed to describe the map $E$, and this dimension is certainly small compared to the dimension of the state space of the length-$L$ piece of the chain as $L \to \infty$. However, since this is an infinite set, and not a linear subspace, we do not get an immediate bound on the dimension of the linear span of these states. What we want to show in this section is that the space of finitely correlated states with fixed $\mathcal{B}$ nevertheless generates a low-dimensional subspace of states on any large interval of the chain. To this end we will have to exhibit many observables $A$, localized on $L$ sites, whose expectation is the same for all finitely correlated states with given $\mathcal{B}$.

Let us look first at the case of purely generated states, or rather at the vectors $\Psi \in \mathcal{H}^{\otimes L}$ which can be written in the form [2], which in the translation invariant case becomes

$$\langle \mu_1, \ldots, \mu_L | \Psi \rangle = \langle j_0 | V^{\mu_1} V^{\mu_2} \cdots V^{\mu_L} | j_L \rangle \qquad [12]$$

for some collection $V^1, \ldots, V^d$ of $k \times k$ matrices, and some basis labels $j_0, j_L \in \{1, \ldots, k\}$. The span of all such vectors will be denoted by $\mathcal{V}_L(k, d)$, and we would like to analyze the growth of $\dim \mathcal{V}_L(k, d)$ as $L \to \infty$. Now a vector with components $a(\mu_1, \ldots, \mu_L)$ lies in the orthogonal complement of $\mathcal{V}_L(k, d)$ if and only if

$$\sum_{\mu_1, \ldots, \mu_L} a(\mu_1, \ldots, \mu_L)\, V^{\mu_1} V^{\mu_2} \cdots V^{\mu_L} = 0$$

for any collection of matrices $V^{\mu}$. In other words, this expression, considered as a noncommutative polynomial in $d$ variables, is a polynomial identity for $k \times k$ matrices. The simplest such identity, for $k = 2$, $d = 3$, $L = 5$, is $[A, [B, C]^2] = 0$. (For the proof observe that $[B, C]$ is traceless, so its square is a multiple of the identity by the Cayley–Hamilton theorem.) This identity alone implies the existence of many more identities. For example, we can substitute higher-order polynomials for $A$, $B$, $C$, and multiply the identity with arbitrary polynomials from the right or from the left. There is a well-developed theory for such identities, called the theory of polynomial identity (PI) rings. In that context, the precise growth we are looking for has been worked out (Drensky 1998):

$$\lim_{L \to \infty} \frac{\log \dim \mathcal{V}_L(k, d)}{\log L} = (d - 1)k^2 + 1 \qquad [13]$$
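The polynomial identity $[A, [B, C]^2] = 0$ quoted above can be checked exactly with integer matrices. The sketch below is an illustration only; the helper names and the $3 \times 3$ counterexample built from matrix units are chosen for this example. It verifies the identity on random $2 \times 2$ matrices and shows that it fails for $3 \times 3$ matrices, so it is specific to $k = 2$.

```python
import random

def matmul(a, b):
    """Product of two square integer matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def pi_expression(A, B, C):
    """The noncommutative polynomial [A, [B, C]^2] of degree 5."""
    bc = commutator(B, C)
    return commutator(A, matmul(bc, bc))

def is_zero(m):
    return all(entry == 0 for row in m for entry in row)

def unit(n, i, j):
    """Matrix unit with a single 1 in row i, column j."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

rng = random.Random(0)

def random_matrix(n):
    return [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]

# Exact integer arithmetic: the expression vanishes for every 2 x 2 sample.
vanishes_on_2x2 = all(
    is_zero(pi_expression(random_matrix(2), random_matrix(2), random_matrix(2)))
    for _ in range(100)
)

# For 3 x 3 matrices the same expression is not an identity.
fails_on_3x3 = not is_zero(pi_expression(unit(3, 0, 2), unit(3, 0, 1), unit(3, 1, 0)))

print(vanishes_on_2x2, fails_on_3x3)
```

Each such identity of degree $L$ in $d$ variables contributes a vector to the orthogonal complement of $\mathcal{V}_L(k, d)$, which is why the span grows only polynomially in $L$.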
Thus, $\dim \mathcal{V}_L(k, d)$ only grows like a polynomial in $L$, of known degree, and the joint support of all purely generated finitely correlated states is exponentially small compared to $\mathcal{H}^{\otimes L}$. We can apply the same idea to the set of all finitely correlated states with $\mathcal{B}$ equal to the $k \times k$ matrices. The joint support in this case is the full space, since the trace state on the chain, which is a product state generated with $k = 1$, already has full support. However, it is still true that all but a polynomial number of expectation values of $\omega$ are already fixed by specifying $k$. Indeed, formula [5] for a general state is precisely of the form [12], with the difference that the arguments $A$ replace the indices $\mu$, and the matrices $E_A$ are now operators on the $k^2$-dimensional space $\mathcal{B}$. If we only want an upper bound, we can ignore subtleties coming from Hermiticity and normalization constraints on $E$, and we get that the dimension of all finitely correlated states generated from the $k \times k$ matrices, restricted to a subchain of length $L$, grows at most like $L^{\alpha}$, with $\alpha \le (d^2 - 1)k^2 + 1$.

See also: Ergodic Theory; Quantum Spin Systems; Quantum Statistical Mechanics: Overview.
Further Reading

Accardi L and Frigerio A (1983) Markovian cocycles. Proceedings of the Royal Irish Academy 83A: 251–263.
Bratteli O and Robinson DW (1987, 1997) Operator Algebras and Quantum Statistical Mechanics I, II, 2nd edn. Springer.
Davies EB (1976) Quantum Theory of Open Systems. Academic Press.
Drensky V (1998) Gelfand–Kirillov dimension of PI-algebras. In: Methods in Ring Theory (Levico Terme, 1997), Lecture Notes in Pure and Appl. Math., vol. 198, pp. 97–113. New York: Dekker.
Fannes M, Nachtergaele B, and Werner RF (1992a) Abundance of translation invariant pure states on quantum spin chains. Letters in Mathematical Physics 25: 249–258.
Fannes M, Nachtergaele B, and Werner RF (1992b) Finitely correlated states on quantum spin chains. Communications in Mathematical Physics 144: 443–490.
Fannes M, Nachtergaele B, and Werner RF (1994) Finitely correlated pure states. Journal of Functional Analysis 120: 511–534.
Klümper A, Schadschneider A, and Zittartz J (1991) Journal of Physics A 24: L955.
Kretschmann D and Werner RF (2005) Quantum channels with memory, quant-ph/0502106.
Verstraete F, Porras D, and Cirac JI (2004) DMRG and periodic boundary conditions: a quantum information perspective. Physical Review Letters 93: 227205.
Verstraete F, Weichselbaum A, Schollwöck U, Cirac JI, and von Delft J (2005) Variational matrix product state approach to quantum impurity models, cond-mat/0504305.
Wolf M, private communication.
Finite-Type Invariants
D Bar-Natan, University of Toronto, Toronto, ON, Canada
© 2006 D Bar-Natan. Published by Elsevier Ltd. All rights reserved.
Introduction

Knots belong to sailors and climbers and, upon further reflection, perhaps also to geometers, topologists, or combinatorialists. Surprisingly, throughout the 1980s, it became apparent that knots are also closely related to several other branches of mathematics in general and mathematical physics in particular. Many of these connections (though not all!) factor through the notion of "finite-type invariants" (aka "Vassiliev" or "Goussarov–Vassiliev" invariants) (Goussarov 1991, 1993, Vassiliev 1990, 1992, Birman-Lin 1993, Kontsevich 1993, Bar-Natan 1995).

Let $V$ be an arbitrary invariant of oriented knots in oriented space with values in some abelian group $A$. Extend $V$ to be an invariant of 1-singular knots, knots that may have a single singularity that locally looks like a double point, using the formula

$$V(\text{double point}) = V(\text{overcrossing}) - V(\text{undercrossing}) \qquad [1]$$

Further extend $V$ to the set $\mathcal{K}^m$ of $m$-singular knots (knots with $m$ double points) by repeatedly using [1].

Definition 1 We say that $V$ is of type $m$ if its extension $V|_{\mathcal{K}^{m+1}}$ to $(m+1)$-singular knots vanishes identically. We also say that $V$ is of finite type if it is of type $m$ for some $m$.

Repeated differences are similar to repeated derivatives; hence, it is fair to think of the definition of $V|_{\mathcal{K}^m}$ as repeated differentiation. With this in mind, the above definition imitates the definition of polynomials of degree $m$. Hence, finite-type invariants can be considered as "polynomials" on the space of knots.

As described in the section "Basic facts", finite-type invariants are plentiful and powerful, they carry a rich algebraic structure, and they are deeply related to Lie algebras. There are several constructions for a "universal finite-type invariant" and those are related to conformal field theory, the Chern–Simons–Witten topological quantum field theory, and Drinfel'd's theory of associators and quasi-Hopf algebras (see the section "The proofs of the fundamental theorem"). Finite-type invariants have been studied extensively (see the section "Some further directions") and generalized in several directions (see the section "Beyond knots"). But the first question on finite-type invariants remains unanswered:

Problem 2 Honest polynomials are dense in the space of functions. Are finite-type invariants dense within the space of all knot invariants? Do they separate knots?

In a similar way, one may define finite-type invariants of framed knots (and ask the same questions).
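The analogy between Definition 1 and polynomials can be illustrated with ordinary finite differences. The sketch below is only an analogy on functions of an integer variable, not a computation with knots; the names `difference`, `iterate_difference`, and `poly` are chosen for this example. For a polynomial of degree $m$, the $m$th difference is a constant (the analog of a "top derivative") and the $(m+1)$st difference vanishes identically.

```python
def difference(f):
    """Forward difference (Delta f)(n) = f(n + 1) - f(n), a discrete derivative."""
    return lambda n: f(n + 1) - f(n)

def iterate_difference(f, times):
    """Apply the forward difference operator the given number of times."""
    for _ in range(times):
        f = difference(f)
    return f

def poly(n):
    """A sample polynomial of degree m = 3."""
    return 2 * n ** 3 - n + 5

third = iterate_difference(poly, 3)    # constant: 2 * 3! = 12
fourth = iterate_difference(poly, 4)   # vanishes identically

print([third(n) for n in range(-2, 3)])
print([fourth(n) for n in range(-2, 3)])
```

In the same spirit, a type $m$ invariant is one whose $(m+1)$-fold "difference" along double points vanishes.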
Basic Facts
Classical Knot Polynomials
The first (nontrivial!) thing to notice is that there are plenty of finite-type invariants and they are at least as powerful as all the standard knot polynomials combined (finite-type invariants are like polynomials on the space of knots; the standard phrase ‘‘knot polynomials’’ refers to a different thing: knot invariants with polynomial values):

Theorem 3 (Bar-Natan 1995, Birman–Lin 1993). Let $J(K)(q)$ be the Jones polynomial of a knot $K$ (it is a Laurent polynomial in a variable $q$). Consider the power series expansion $J(K)(e^x) = \sum_{m=0}^{\infty} V_m(K)\,x^m$. Then each coefficient $V_m(K)$ is a finite-type knot invariant (thus, the Jones polynomial can be reconstructed from finite-type information).

A similar theorem holds for the Alexander–Conway, HOMFLY-PT, and Kauffman polynomials (Bar-Natan 1995), and indeed, for arbitrary Reshetikhin–Turaev invariants (Reshetikhin and Turaev 1990, Lin 1991), although it is still unknown whether the signature of a knot can be expressed in terms of its finite-type invariants.

Chord Diagrams and the Fundamental Theorem
The top derivatives of a multivariable polynomial form a system of constants which determine the polynomial up to polynomials of lower degree. Likewise the $m$th derivative $V^{(m)} := V|_{\mathcal{K}^m}$ of a type $m$ invariant $V$ is a constant (for the difference of the values of $V^{(m)}$ on two $m$-singular knots that differ by a crossing change is the value of $V$ on an $(m+1)$-singular knot, which is 0, so $V^{(m)}$ is blind to 3D topology), and likewise $V^{(m)}$ determines $V$ up to invariants of lower type. Hence, a primary tool in the study of finite-type invariants is the study of the ‘‘top derivative’’ $V^{(m)}$, also known as ‘‘the weight system of $V$.’’

Blind to 3D topology, $V^{(m)}$ only sees the combinatorics of the circle that parametrizes an $m$-singular knot. On this circle, there are $m$ pairs of points that are pairwise identified in the image; one indicates those by drawing a circle with $m$ chords marked (an ‘‘$m$-chord diagram’’) (see Figure 1).

Definition 4 Let $\mathcal{D}_m$ denote the space of all formal linear combinations with rational coefficients of $m$-chord diagrams. Let $\mathcal{A}^r_m$ be the quotient of $\mathcal{D}_m$ by all 4T and FI relations as drawn in Figure 2 (full details are given in, e.g., Bar-Natan (1995)), and let $\hat{\mathcal{A}}^r$ be the graded completion of $\mathcal{A}^r := \bigoplus_m \mathcal{A}^r_m$. Let $\mathcal{A}_m$, $\mathcal{A}$, and $\hat{\mathcal{A}}$ be the same as $\mathcal{A}^r_m$, $\mathcal{A}^r$, and $\hat{\mathcal{A}}^r$ but without imposing the FI relations.

Theorem 5 (The fundamental theorem)
Figure 1 A 4-singular knot and its corresponding chord diagram.
Figure 2 The 4T and FI relations.
(Easy part). If $V$ is a rational-valued type $m$ invariant, then $V^{(m)}$ defines a linear functional on $\mathcal{A}^r_m$. If in addition $V^{(m)} \equiv 0$, then $V$ is of type $m-1$.
(Hard part). For any linear functional $W$ on $\mathcal{A}^r_m$, there is a rational-valued type $m$ invariant $V$ so that $V^{(m)} = W$.

Thus, to a large extent, the study of finite-type invariants is reduced to the finite (though superexponential in $m$) algebraic study of $\mathcal{A}^r_m$. A similar theorem reduces the study of finite-type invariants of framed knots to the study of $\mathcal{A}_m$.

The Structure of $\mathcal{A}$
Knots can be multiplied (the ‘‘connected sum’’ operation) and knot invariants can be multiplied. This structure interacts well with finite-type invariants and induces the following structure on $\mathcal{A}^r$ and $\mathcal{A}$:

Theorem 6 (Kontsevich 1993, Bar-Natan 1995, Willerton 1996, Chmutov et al. 1994). $\mathcal{A}^r$ and $\mathcal{A}$ are commutative and cocommutative graded bialgebras (i.e., each carries a commutative product and a compatible cocommutative coproduct). Thus, both $\mathcal{A}^r$ and $\mathcal{A}$ are graded polynomial algebras over their spaces of primitives, $\mathcal{P}^r = \bigoplus_m \mathcal{P}^r_m$ and $\mathcal{P} = \bigoplus_m \mathcal{P}_m$.

Framed knots differ from knots only by a single integer parameter (the ‘‘self-linking,’’ itself a type 1 invariant). Thus, $\mathcal{P}^r$ and $\mathcal{P}$ are also closely related.

Theorem 7 (Bar-Natan 1995). $\mathcal{P} = \mathcal{P}^r \oplus \langle\theta\rangle$, where $\theta$ is the unique 1-chord diagram.

Bounds and Computational Results
Table 1 shows the number of type $m$ invariants of knots and framed knots modulo type $m-1$ invariants ($\dim \mathcal{A}^r_m$ and $\dim \mathcal{A}_m$) and the number of multiplicative generators of the algebra $\mathcal{A}$ in degree $m$ ($\dim \mathcal{P}_m$) for $m \le 12$. Some further tabulated results are in Bar-Natan (1996).

Little is known about these dimensions for large $m$. There is an explicit conjecture in Broadhurst (1997), but no progress has been made in the direction of proving or disproving it. The best asymptotic bounds available are the following.

Theorem 8 For large $m$, $\dim \mathcal{P}_m > e^{c\sqrt{m}}$ (for any fixed $c < \pi\sqrt{2/3}$) and $\dim \mathcal{A}_m < 6^m\, m!\, \sqrt{m}/\pi^{2m}$ (Stoimenow 1998, Zagier 2001).

Jacobi Diagrams and the Relation with Lie Algebras
Much of the richness of finite-type invariants stems from their relationship with Lie algebras. Theorem 9 below suggests this relationship on an abstract level, Theorem 10 makes that relationship concrete, and Theorem 12 makes it a bit deeper.

Theorem 9 (Bar-Natan 1995). The algebra $\mathcal{A}$ is isomorphic to the algebra $\mathcal{A}^t$ generated by ‘‘Jacobi diagrams in a circle’’ (chord diagrams that are also allowed to have oriented internal trivalent vertices) modulo the AS, STU, and IHX relations (see Figure 3).

Thinking of trivalent vertices as graphical analogs of the Lie bracket, the AS relation becomes the anti-commutativity of the bracket, STU becomes the equation $[x, y] = xy - yx$, and IHX becomes the Jacobi identity. This analogy is made concrete within the proof of the following:

Theorem 10 (Bar-Natan 1995). Given a finite-dimensional metrized Lie algebra $\mathfrak{g}$ (e.g., any semisimple Lie algebra), there is a map $\mathcal{T}_{\mathfrak{g}} : \mathcal{A} \to U(\mathfrak{g})^{\mathfrak{g}}$ defined on $\mathcal{A}$ and taking values in the invariant part $U(\mathfrak{g})^{\mathfrak{g}}$ of the universal enveloping algebra $U(\mathfrak{g})$ of $\mathfrak{g}$. Given also a finite-dimensional representation $R$ of $\mathfrak{g}$ there is a linear functional $W_{\mathfrak{g},R} : \mathcal{A} \to \mathbb{Q}$.
Figure 3 A Jacobi diagram in a circle and the AS, STU, and IHX relations.
Table 1 Some dimensions of spaces of finite-type invariants

m            0   1   2   3   4   5   6   7   8    9   10   11   12
dim A^r_m    1   0   1   1   3   4   9  14  27   44   80  132  232
dim A_m      1   1   2   3   6  10  19  33  60  104  184  316  548
dim P_m      0   1   1   1   2   3   5   8  12   18   27   39   55

Source: Bar-Natan (1995); Kneissler (1997).
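The rows of Table 1 are not independent. As a hedged consistency check, the Python sketch below reads Theorems 6 and 7 as saying that $\mathcal{A}$ is a graded polynomial algebra on $\dim \mathcal{P}_m$ generators in each degree $m$, and that $\mathcal{A}$ is obtained from $\mathcal{A}^r$ by adjoining the single degree-1 generator $\theta$. Under that reading, the $\dim \mathcal{A}_m$ row should follow from the $\dim \mathcal{P}_m$ row, and should also be the running sum of the $\dim \mathcal{A}^r_m$ row. The function names are assumptions made for this illustration.

```python
# Rows of Table 1 for m = 0..12.
DIM_AR = [1, 0, 1, 1, 3, 4, 9, 14, 27, 44, 80, 132, 232]     # dim A^r_m
DIM_A = [1, 1, 2, 3, 6, 10, 19, 33, 60, 104, 184, 316, 548]  # dim A_m
DIM_P = [0, 1, 1, 1, 2, 3, 5, 8, 12, 18, 27, 39, 55]         # dim P_m


def polynomial_algebra_dims(generators, max_degree):
    """Graded dimensions of a polynomial algebra with generators[m]
    generators in each degree m >= 1 (generators[0] is ignored)."""
    dims = [1] + [0] * max_degree
    for m in range(1, max_degree + 1):
        for _ in range(generators[m]):
            # Each generator multiplies the generating series by 1/(1 - t^m).
            for n in range(m, max_degree + 1):
                dims[n] += dims[n - m]
    return dims


def cumulative_sums(values):
    """Running totals: adjoining one degree-1 generator sums lower degrees."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out


dims_from_primitives = polynomial_algebra_dims(DIM_P, 12)
dims_from_unframed = cumulative_sums(DIM_AR)
```

Both computations reproduce the $\dim \mathcal{A}_m$ row, which is a useful sanity check when transcribing the table.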
The last assertion along with Theorem 5 shows that associated with any $\mathfrak{g}$, $R$, and $m$, there is a weight system and hence a knot invariant. Thus, knots are unexpectedly linked with Lie algebras. The hope (Bar-Natan 1995) that all finite-type invariants arise in this way was dashed by Vogel (1997, 1999) and Lieberum (1999). But finite-type invariants that do not arise in this way remain rare and not well understood.

The Poincaré–Birkhoff–Witt (PBW) theorem of the theory of Lie algebras says that the obvious ‘‘symmetrization’’ map $\chi_{\mathfrak{g}} : S(\mathfrak{g}) \to U(\mathfrak{g})$ from the symmetric algebra $S(\mathfrak{g})$ of a Lie algebra $\mathfrak{g}$ to its universal enveloping algebra $U(\mathfrak{g})$ is a $\mathfrak{g}$-module isomorphism. The following definition and theorem form a diagrammatic counterpart of this theorem:

Definition 11 Let $\mathcal{B}$ be the space of formal linear combinations of ‘‘free Jacobi diagrams’’ (Jacobi diagrams as before, but with unmarked univalent ends (‘‘legs’’) replacing the circle; see an example in Figure 4), modulo the AS and IHX relations of before. Let $\chi : \mathcal{B} \to \mathcal{A}$ be the symmetrization map which maps a $k$-legged free Jacobi diagram to the average of the $k!$ ways of planting these legs along a circle.

Figure 4 A free Jacobi diagram.

Theorem 12 (Diagrammatic PBW; Kontsevich 1993, Bar-Natan 1995). $\chi$ is an isomorphism of vector spaces. Furthermore, fixing a metrized $\mathfrak{g}$ there is a commutative square as in Figure 5.

Note that $\mathcal{B}$ can be graded (by half the number of vertices in a Jacobi diagram) and that $\chi$ respects degrees so it extends to an isomorphism $\chi : \hat{\mathcal{B}} \to \hat{\mathcal{A}}$ of graded completions.

Proofs of the Fundamental Theorem
The heart of all known proofs of Theorem 5 is always a construction of a ‘‘universal finite-type invariant’’ (see below); it is simple to show that the existence of a universal finite-type invariant is equivalent to Theorem 5.
Definition 13 A universal finite-type invariant is a map $Z : \{\text{knots}\} \to \hat{\mathcal{A}}^r$ whose extension to singular knots satisfies $Z(K) = D + (\text{higher degrees})$ whenever a singular knot $K$ and a chord diagram $D$ are related as discussed before.

The Kontsevich Integral
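The first construction of a universal finite-type invariant was given by Kontsevich (1993) (see also Bar-Natan (1995) and Chmutov and Duzhin (2001)). It is known as ‘‘the Kontsevich integral’’ and up to a normalization factor it is given by

$$Z_1(K) = \sum_{m=0}^{\infty} \frac{1}{(2\pi i)^m} \int_{t_1 < \cdots < t_m,\; P = \{(z_i, z'_i)\}} (-1)^{\#P_{\downarrow}}\, D_P \bigwedge_{i=1}^{m} \frac{dz_i - dz'_i}{z_i - z'_i}$$

[...]

$$\langle C, D \rangle = \begin{cases} 0 & \text{if the numbers of legs of } C \text{ and } D \text{ are different} \\ \text{sum of all ways to glue legs of } C \text{ and } D \text{ together} & \text{otherwise} \end{cases}$$

One defines

$$\int^{FG} Z(K) := \langle \exp_{\sqcup}(-w_1/2b),\, Y \rangle$$

Then

$$\int^{FG} Z(K) = \sum_{n=0}^{\infty} \frac{\mathrm{Gr}_n(\iota_n Z(K))}{(-b)^n}$$

Hence,

$$\hat{Z}^{LMO}(S^3_K) = \frac{\int^{FG} Z(K)}{\int^{FG} Z(U^{\mathrm{sign}(b)})}$$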
Other Approaches
Another construction of a universal perturbative invariant, based on integrations over configuration spaces, is closer to the original physics approach but harder to calculate because of the lack of a surgery formula. It was developed by Axelrod and Singer, Kontsevich, Bott and Cattaneo, and Kuperberg and Thurston (see Axelrod and Singer (1992), Bott and Cattaneo (1998)).
Quantum Invariants and Perturbative Expansion
Fix a simple (complex) Lie algebra $\mathfrak{g}$ of finite dimension. Using the quantized enveloping algebra of $\mathfrak{g}$ one can define quantum link and 3-manifold invariants. We recall here the definition, adapted for the case of the root lattice (projective group case). Here our $q$ is equal to $q^2$ in the textbook (Jantzen 1995). Fix a root system of $\mathfrak{g}$. Let $X$, $X_+$, $Y$ denote respectively the weight lattice, the set of dominant weights, and the root lattice. We normalize the invariant scalar product in the real vector space of the weight lattice so that the length of any short root is $\sqrt{2}$.

Finite-Type Invariants of 3-Manifolds
Quantum Link Invariants
Suppose $L$ is a framed oriented link with $m$ ordered components, then the quantum invariant $J_L(\lambda_1, \ldots, \lambda_m)$ is a Laurent polynomial in $q^{1/2D}$, where $\lambda_1, \ldots, \lambda_m$ are dominant weights, standing for the simple $\mathfrak{g}$-modules of highest weights $\lambda_1, \ldots, \lambda_m$, and $D$ is the determinant of the Cartan matrix of $\mathfrak{g}$ (see, e.g., Turaev (1994) and Lê (1996)). The Jones polynomial is the case when $\mathfrak{g} = sl_2$ and all the $\lambda_i$’s are the highest weights of the fundamental representation. For the unknot $U$ with zero framing, one has (here $\rho$ is the half-sum of all positive roots)

$$J_U(\lambda) = \prod_{\text{positive roots } \alpha} \frac{q^{(\lambda+\rho|\alpha)/2} - q^{-(\lambda+\rho|\alpha)/2}}{q^{(\rho|\alpha)/2} - q^{-(\rho|\alpha)/2}}$$

We will also use another normalization of the quantum invariant:

$$Q_L(\lambda_1, \ldots, \lambda_m) := J_L(\lambda_1, \ldots, \lambda_m) \times \prod_{j=1}^{m} J_U(\lambda_j)$$

This definition is good only for $\lambda_j \in X_+$. Note that each $\lambda \in X$ is either fixed by an element of the Weyl group under the dot action (see Humphreys (1978)) or can be moved to $X_+$ by the dot action. We define $Q_L(\lambda_1, \ldots, \lambda_m)$ for arbitrary $\lambda_j \in X$ by requiring that $Q_L(\lambda_1, \ldots, \lambda_m) = 0$ if one of the $\lambda_j$’s is fixed by an element of the Weyl group, and that $Q_L(\lambda_1, \ldots, \lambda_m)$ is component-wise invariant under the dot action of the Weyl group, that is, for every $w_1, \ldots, w_m$ in the Weyl group,

$$Q_L(w_1 \cdot \lambda_1, \ldots, w_m \cdot \lambda_m) = Q_L(\lambda_1, \ldots, \lambda_m)$$

Proposition 2 (Lê 1996). Suppose $\lambda_1, \ldots, \lambda_m$ are in the root lattice $Y$.
(i) (Integrality) Then $Q_L(\lambda_1, \ldots, \lambda_m) \in \mathbb{Z}[q^{\pm 1}]$ (no fractional power).
(ii) (Periodicity) When $q$ is an $r$th root of 1, then $Q_L(\lambda_1, \ldots, \lambda_m)$ is invariant under the action of the lattice group $rY$, that is, for $y_1, \ldots, y_m \in Y$, $Q_L(\lambda_1, \ldots, \lambda_m) = Q_L(\lambda_1 + r y_1, \ldots, \lambda_m + r y_m)$.

Quantum 3-Manifold Invariants
Although the infinite sum $\sum_{\lambda_j \in Y} Q_L(\lambda_1, \ldots, \lambda_m)$ does not have a meaning, heuristic ideas show that it is invariant under the second Kirby move, and hence almost defines a 3-manifold invariant. The problem is to regularize the infinite sum. One solution is based on the fact that at $r$th roots of unity, $Q_L(\lambda_1, \ldots, \lambda_m)$ is periodic, so we should use the sum with the $\lambda_j$’s running over a fundamental set $P_r$ of the action of $rY$, where

$$P_r := \{x = c_1\alpha_1 + \cdots + c_\ell\alpha_\ell \mid 0 \le c_1, \ldots, c_\ell < r\}$$

Here $\alpha_1, \ldots, \alpha_\ell$ are basis roots. For a root of unity $\xi$ of order $r$, let

$$F_L(\xi) = \sum_{\lambda_j \in (P_r \cap Y)} Q_L(\lambda_1, \ldots, \lambda_m)\big|_{q=\xi}$$

If $F_{U^{\pm}}(\xi) \ne 0$, define

$$\tau_L(\xi) := \frac{F_L(\xi)}{(F_{U^+}(\xi))^{\sigma_+}\,(F_{U^-}(\xi))^{\sigma_-}}$$

Recall that $D$ is the determinant of the Cartan matrix. Let $d$ be the maximum of the absolute values of entries of the Cartan matrix outside the diagonal.

Theorem 6 (Lê 2003)
(i) If the order $r$ of $\xi$ is coprime with $dD$, then $F_{U^{\pm}}(\xi) \ne 0$.
(ii) If $F_{U^{\pm}}(\xi) \ne 0$ then $\tau^{P\mathfrak{g}}_M(\xi) := \tau_L(\xi)$ is an invariant of the 3-manifold $M = S^3_L$.

Remark 1 The version presented here corresponds to projective groups. It was defined by Kirby and Melvin for $sl_2$, Kohno and Takata for $sl_n$, and by Lê (2003) for arbitrary simple Lie algebras. When $r$ is coprime with $dD$, there is also an associated modular category that generates a topological quantum field theory. In most texts in the literature, say Kirillov (1996) and Turaev (1994), another version $\tau^{\mathfrak{g}}$ was defined. The reason we choose $\tau^{P\mathfrak{g}}$ is that it has nice integrality and eventually a perturbative expansion. For relations between the version $\tau^{P\mathfrak{g}}$ and the usual $\tau^{\mathfrak{g}}$, see Lê (2003).

Examples
When $M$ is the Poincaré sphere and $\mathfrak{g} = sl_2$,

$$\tau^{Psl_2}_M(q) = \frac{1}{1-q} \sum_{n=0}^{\infty} q^n (1-q^{n+1})(1-q^{n+2}) \cdots (1-q^{2n+1})$$

Here $q$ is a root of unity, and the sum is easily seen to be finite.

Integrality
The following theorem was proved for $\mathfrak{g} = sl_2$ by Murakami (1995), for $\mathfrak{g} = sl_n$ by Takata–Yokota and Masbaum–Wenzl (using ideas of J Roberts), and for arbitrary simple Lie algebras by Lê (2003).

Theorem 7 Suppose the order $r$ of $\xi$ is a big enough prime; then $\tau^{P\mathfrak{g}}_M(\xi)$ is in $\mathbb{Z}[\xi] = \mathbb{Z}[\exp(2\pi i/r)]$.
Perturbative Expansion
Unlike the link case, quantum 3-manifold invariants can be defined only at certain roots of unity. In general, there is no analytic extension of the function $\tau^{P\mathfrak{g}}_M$ around $q = 1$. In perturbative theory, we want to expand the function $\tau^{P\mathfrak{g}}_M$ around $q = 1$ into a power series. For QHS, Ohtsuki (for $\mathfrak{g} = sl_2$) and then the present author (for all other simple Lie algebras) showed that there is a number-theoretical expansion of $\tau^{P\mathfrak{g}}_M$ around $q = 1$ in the following sense. Suppose $r$ is a big enough prime, and $\xi = \exp(2\pi i/r)$. By the integrality (Theorem 7),

$$\tau^{P\mathfrak{g}}_M(\xi) \in \mathbb{Z}[\xi] = \mathbb{Z}[q]/(1 + q + q^2 + \cdots + q^{r-1})$$

Choose a representative $f(q) \in \mathbb{Z}[q]$ of $\tau^{P\mathfrak{g}}_M(\xi)$. Formally substitute $q = (q-1) + 1$ in $f(q)$:

$$f(q) = c_{r,0} + c_{r,1}(q-1) + \cdots + c_{r,n-2}(q-1)^{n-2}$$

The integers $c_{r,n}$ depend on $r$ and the representative $f(q)$. It is easy to see that $c_{r,n} \pmod r$ does not depend on the representative $f(q)$ and hence is an invariant of QHS. The dependence on $r$ is a big drawback. The theorem below says that there is a rational number $c_n$, not depending on $r$, such that $c_{r,n} \pmod r$ is the reduction of either $c_n$ or $-c_n$ modulo $r$, for sufficiently large prime $r$. It is easy to see that if such $c_n$ exists, it must be unique. Let $s$ be the number of positive roots of $\mathfrak{g}$. Recall that $\ell$ is the rank of $\mathfrak{g}$.

Theorem 8 For every QHS $M$, there is a sequence of numbers

$$c_n \in \mathbb{Z}\left[\frac{1}{(2n+2s)!\,|H_1(M;\mathbb{Z})|}\right]$$

such that for sufficiently large prime $r$

$$c_{r,n} \equiv \left(\frac{|H_1(M;\mathbb{Z})|}{r}\right)^{\ell} c_n \pmod r$$

where $\left(\frac{|H_1(M;\mathbb{Z})|}{r}\right) = \pm 1$ is the Legendre symbol. Moreover, $c_n$ is an invariant of order $2n$.

The series $t^{P\mathfrak{g}}_M(q-1) := \sum_{n=0}^{\infty} c_n (q-1)^n$, called the Ohtsuki series, can be considered as the perturbative expansion of the function $\tau^{P\mathfrak{g}}_M$ at $q = 1$. For actual calculation of $t^{P\mathfrak{g}}_M(q-1)$, see Lê (2003), Ohtsuki (2002), and Rozansky (1997).

Recovery from the LMO invariant
It is known that for any metrized Lie algebra $\mathfrak{g}$, there is a linear map $W_{\mathfrak{g}} : \mathrm{Gr}_n \mathcal{A}(\emptyset) \to \mathbb{Q}$ (see Bar-Natan (1995)).

Theorem 9 One has

$$\sum_{n=0}^{\infty} W_{\mathfrak{g}}(\mathrm{Gr}_n Z^{LMO})\, h^n = t^{P\mathfrak{g}}_M(q-1)\big|_{q=e^h}$$

This shows that the Ohtsuki series $t^{P\mathfrak{g}}_M(q-1)$ can be recovered from, and hence is totally determined by, the LMO invariant. The theorem was proved by Ohtsuki for $sl_2$. For other simple Lie algebras, the theorem follows from the Århus integral (see Bar-Natan et al. (2002a, b) and Ohtsuki (2002)).

Rozansky’s Gaussian Integral
Rozansky (1997) gave a definition of the Ohtsuki series using a formal Gaussian integral. That work treats only $sl_2$, but it can be generalized to other Lie algebras; it is closer to the original physics ideas of perturbative invariants.
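The formal substitution $q = (q-1) + 1$ used in the section on the perturbative expansion can be sketched in a few lines of Python. The example below is a toy computation under stated assumptions: the polynomial `f` and the multiplier `[1, 1]` are hypothetical, and no actual quantum invariant is computed. It expands a polynomial in powers of $(q-1)$ and then checks, for $r = 5$, that changing the representative by a multiple of $1 + q + \cdots + q^{r-1}$ does not change the coefficients of $(q-1)^n$ modulo $r$ for $n \le r-2$.

```python
from math import comb


def expand_at_one(coeffs):
    """Given f(q) = sum coeffs[k] q^k, return c with f(q) = sum c[n] (q-1)^n,
    using q^k = ((q-1)+1)^k = sum_n C(k, n) (q-1)^n."""
    c = [0] * len(coeffs)
    for k, a in enumerate(coeffs):
        for n in range(k + 1):
            c[n] += a * comb(k, n)
    return c


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]


r = 5
f = [2, 0, 0, 1]            # hypothetical representative f(q) = q^3 + 2
cyclotomic = [1] * r        # 1 + q + q^2 + q^3 + q^4
other = poly_add(f, poly_mul(cyclotomic, [1, 1]))  # another representative

low_f = [c % r for c in expand_at_one(f)[: r - 1]]
low_other = [c % r for c in expand_at_one(other)[: r - 1]]
```

The two lists agree because, modulo the prime $r$, the polynomial $1 + q + \cdots + q^{r-1}$ is congruent to $(q-1)^{r-1}$, so it only affects coefficients from degree $r-1$ onward.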
Cyclotomic Expansion The Habiro Ring
d by Let us define the Habiro ring Z[q] d :¼ lim Z½q=ðð1 qÞð1 q2 Þ . . . ð1 qn ÞÞ Z½q n
Habiro (2002) called it the cyclotomic completion d is the set of all series of the of Z[q]. Formally, Z[q] form f ðqÞ ¼
1 X
fn ðqÞð1 qÞð1 q2 Þ . . . ð1 qn Þ
n¼0
fn ðqÞ 2 Z½q Suppose U is the set of roots of 1. If 2 U then (1 )(1 2 ) (1 n ) = 0 if n is big enough; d One can hence, one can define f ( ) for f 2 Z[q]. d as a function with domain U. consider every f 2 Z[q] Note that f ( ) 2 Z[ ] is always an algebraic integer. d has remarkable properties, It turns out that Z[q] and plays an important role in quantum topology. Note that the formal derivative of (1 q) (1 q2 ) . . . (1 qn ) is divisible by (1 q) (1 q2 ) . . . (1 qk ) with k > (n 1)=2. This means every d has a derivative f 0 2 Z[q], d and hence element f 2 Z[q] d derivatives of all orders in Z[q]. One can then d its Taylor series at a root of 1: associate to f 2 Z[q] T ðf Þ :¼
1 X f ðnÞ ð Þ ðq Þn n! n¼0
which can also be obtained by noticing that (1 q) (1 q2 ) . . . (1 qn ) is divisible by (q )k if n is bigger than k times the order of . Thus, one has a d ! Z[ ][[q ]]. map T : Z[q]
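The reason a series in the Habiro ring can be evaluated at a root of unity is that the products $(1-q)(1-q^2)\cdots(1-q^n)$ eventually vanish there. The Python sketch below illustrates this with exact integer arithmetic. As a simplifying assumption of this example, it works modulo $q^r - 1$, which handles all $r$th roots of unity at once rather than a single primitive root; the function names are hypothetical.

```python
def mul_mod_cyclic(a, b, r):
    """Multiply two polynomials (coefficient lists of length r) modulo q^r - 1."""
    out = [0] * r
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[(i + j) % r] += ai * bj
    return out


def q_factorial_mod(n, r):
    """(1 - q)(1 - q^2)...(1 - q^n) reduced modulo q^r - 1."""
    prod = [1] + [0] * (r - 1)
    for k in range(1, n + 1):
        factor = [0] * r
        factor[0] += 1
        factor[k % r] -= 1  # 1 - q^k; this is the zero polynomial when r divides k
        prod = mul_mod_cyclic(prod, factor, r)
    return prod


r = 5
last_nonzero = q_factorial_mod(r - 1, r)  # still nonzero modulo q^5 - 1
first_zero = q_factorial_mod(r, r)        # contains the factor 1 - q^5 = 0
```

So at a fifth root of unity only the terms with $n \le 4$ of a series $\sum_n f_n(q)(1-q)\cdots(1-q^n)$ can contribute, and the evaluation is a finite sum.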
Theorem 10 (Habiro 2004)
(i) For each root of unity $\xi$, the map $T_\xi$ is injective, that is, a function in $\widehat{\mathbb{Z}[q]}$ is determined by its Taylor expansion at a point in the domain $U$.
(ii) If $f(\xi) = g(\xi)$ at infinitely many roots $\xi$ of prime power orders, then $f = g$ in $\widehat{\mathbb{Z}[q]}$.

One important consequence is that $\widehat{\mathbb{Z}[q]}$ is an integral domain, since we have the embedding $T_1 : \widehat{\mathbb{Z}[q]} \hookrightarrow \mathbb{Z}[[q-1]]$.

In general the Taylor series $T_1 f$ has 0 convergence radius. However, one can speak about $p$-adic convergence to $f(\xi)$ in the following sense. Suppose the order $r$ of $\xi$ is a power of a prime, $r = p^k$. Then it is known that $(\xi - 1)^n$ is divisible by $p^m$ if $n > mk$. Hence, $T_1 f(\xi)$ converges in the $p$-adic topology, and it can be easily shown that the limit is exactly $f(\xi)$. The above properties suggest considering $\widehat{\mathbb{Z}[q]}$ as a class of ‘‘analytic functions’’ with domain $U$.
Quantum Invariants as an Element of $\widehat{\mathbb{Z}[q]}$
It was proved, by Habiro for $sl_2$ and by Habiro with the present author for general simple Lie algebras, that quantum invariants of ZHSs belong to $\widehat{\mathbb{Z}[q]}$ and thus have remarkable integrality properties:

Theorem 11
(i) For every ZHS $M$, there is an invariant $I^{\mathfrak{g}}_M \in \widehat{\mathbb{Z}[q]}$ such that if $\xi$ is a root of unity for which the quantum invariant $\tau^{P\mathfrak{g}}_M(\xi)$ can be defined, then $I^{\mathfrak{g}}_M(\xi) = \tau^{P\mathfrak{g}}_M(\xi)$.
(ii) The Ohtsuki series is equal to the Taylor series of $I^{\mathfrak{g}}_M$ at 1.

Corollary 1 Suppose $M$ is a ZHS.
(i) For every root of unity $\xi$, the quantum invariant at $\xi$ is an algebraic integer, $\tau^{\mathfrak{g}}_M(\xi) \in \mathbb{Z}[\xi]$. (No restriction on the order of $\xi$ is required.)
(ii) The Ohtsuki series $t^{P\mathfrak{g}}_M(q-1)$ has integer coefficients. If $\xi$ is a root of order $r = p^k$, where $p$ is prime, then the Ohtsuki series at $\xi$ converges $p$-adically to the quantum invariant at $\xi$.
(iii) The quantum invariant $\tau^{P\mathfrak{g}}_M$ is determined by values at infinitely many roots of prime power orders and also determined by its Ohtsuki series.
(iv) The LMO invariant totally determines the quantum invariants $\tau^{P\mathfrak{g}}_M$.

Part (ii) was conjectured by R Lawrence for $sl_2$ and first proved by Rozansky (also for $sl_2$). Part (iv) follows from the fact that the LMO invariant determines the Ohtsuki series; it exhibits another universality property of the LMO invariant.
See also: Finite-Type Invariants; Knot Invariants and Quantum Gravity; Lie Groups: General Theory; Quantum 3-Manifold Invariants.
Further Reading
Axelrod S and Singer IM (1992) Chern–Simons perturbation theory. In: Proceedings of the XXth International Conference on Differential Geometric Methods in Theoretical Physics, New York, 1991, vols. 1 and 2, pp. 3–45. River Edge, NJ: World Scientific.
Bar-Natan D (1995) On the Vassiliev knot invariants. Topology 34: 423–472.
Bar-Natan D, Garoufalidis S, Rozansky L, and Thurston DP (2002a) The Århus integral of rational homology 3-spheres. Selecta Mathematica (NS) 8: 315–339.
Bar-Natan D, Garoufalidis S, Rozansky L, and Thurston DP (2002b) The Århus integral of rational homology 3-spheres. Selecta Mathematica (NS) 8: 341–371.
Bar-Natan D, Garoufalidis S, Rozansky L, and Thurston DP (2004) The Århus integral of rational homology 3-spheres. Selecta Mathematica (NS) 10: 305–324.
Bar-Natan D, Lê TTQ, and Thurston DP (2003) Two applications of elementary knot theory to Lie algebras and Vassiliev invariants. Geometry and Topology 7: 1–31 (electronic).
Bott R and Cattaneo AS (1998) Integral invariants of 3-manifolds. Journal of Differential Geometry 48: 91–133.
Cochran TD and Melvin P (2000) Finite type invariants of 3-manifolds. Inventiones Mathematicae 140: 45–100.
Garoufalidis S and Levine J (1997) Finite type 3-manifold invariants, the mapping class group and blinks. Journal of Differential Geometry 47: 257–320.
Garoufalidis S, Goussarov M, and Polyak M (2001) Calculus of clovers and finite type invariants of 3-manifolds. Geometry and Topology 5: 75–108 (electronic).
Goussarov M (1999) Finite type invariants and n-equivalence of 3-manifolds. Comptes Rendus des Séances de l’Académie des Sciences. Série I. Mathématique 329: 517–522.
Habiro K (2000) Claspers and finite type invariants of links. Geometry and Topology 4: 1–83 (electronic).
Habiro K (2002) On the quantum sl2 invariants of knots and integral homology spheres. In: Invariants of Knots and 3-Manifolds (Kyoto, 2001), Geometry and Topology Monographs (electronic), vol. 4, pp. 55–68. Coventry: Geometry and Topology Publisher.
Habiro K (2004) Cyclotomic completions of polynomial rings. Publications of the Research Institute for Mathematical Sciences, Kyoto University 40: 1127–1146.
Humphreys J (1978) Introduction to Lie Algebras and Representation Theory. Graduate Texts in Mathematics, vol. 6. Berlin: Springer.
Jantzen JC (1995) Lecture on Quantum Groups. Graduate Studies in Mathematics, vol. 6. Providence, RI: American Mathematical Society.
Kirillov A (1996) On an inner product in modular categories. Journal of American Mathematical Society 9: 1135–1169.
Lê TTQ (1997) In: Buchstaber and Novikov S (eds.) An Invariant of Integral Homology 3-Spheres which is Universal for all Finite Type Invariants. AMS Translation Series 2, vol. 179, pp. 75–100. Providence, RI: American Mathematical Society.
Lê TTQ (2000) Integrality and symmetry of quantum link invariants. Duke Mathematical Journal 102: 273–306.
Lê TTQ (2003) Quantum invariants of 3-manifolds: integrality, splitting, and perturbative expansion. Topology and Its Applications 127: 125–152.
Lê TTQ, Murakami J, and Ohtsuki T (1998) On a universal perturbative invariant of 3-manifolds. Topology 37: 539–574.
Murakami H (1995) Quantum SO(3)-invariants dominate the SU(2)-invariant of Casson and Walker. Math Proceedings Cambridge Philosophical Society 117: 237–249.
Ohtsuki T (2002) Quantum Invariants. A Study of Knots, 3-Manifolds and their Sets. Series on Knots and Everything, vol. 29. River Edge, NJ: World Scientific.
Rozansky L (1997) The trivial connection contribution to Witten’s invariant and finite type invariants of rational homology spheres. Communications in Mathematical Physics 183(1): 23–54.
Rozansky L (1998) On p-adic properties of the Witten–Reshetikhin–Turaev invariant. Preprint math.QA/9806075.
Turaev VG (1994) Quantum Invariants of Knots and 3-Manifolds. de Gruyter Studies in Mathematics, vol. 18. Berlin: Walter de Gruyter.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 121: 360–379.
Floer Homology
P B Kronheimer, Harvard University, Cambridge, MA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction
Morse theory allows one to reconstruct the homology of a compact manifold $B$ from data obtained from the gradient flow of a function $f : B \to \mathbb{R}$, the Morse function. The term ‘‘Floer homology’’ is used to describe homology groups that arise from carrying out the same construction, but in a setting where the space $B$ is replaced by an infinite-dimensional manifold (a space of maps, or a space of configurations for a gauge theory), and where the gradient trajectories of the Morse function correspond to solutions of an elliptic differential equation.

There are two important types of such homology theories that have been extensively developed, and the study of both was initiated in the 1980s by Andreas Floer. In the first type, the elliptic equation that arises is a Cauchy–Riemann equation, whose solutions are pseudoholomorphic maps from a two-dimensional domain into a symplectic manifold. In the second type, the elliptic equation is an equation of gauge theory on a 4-manifold: either the anti-self-dual Yang–Mills equations or the Seiberg–Witten equations. Important antecedents of Floer’s work included work of Conley, Zehnder, and others on the symplectic fixed-point problem, and Witten’s ideas about Morse theory.

This article describes the background material from Morse theory before discussing Floer homology of Cauchy–Riemann type and its application to the Arnol’d conjecture in symplectic topology. Floer homology in the context of four-dimensional gauge theories is discussed more briefly.
Morse Theory
Let $B$ be a smooth, compact manifold and $f : B \to \mathbb{R}$ a smooth function. A critical point $p$ of $f$ is said to be nondegenerate if the Hessian of $f$ is a nonsingular operator on $T_pB$. The function $f$ is a Morse function if all its critical points are nondegenerate. In the presence of a Riemannian metric $g$ on $B$, the derivative $df$ becomes a vector field, the gradient $\nabla f$, and we can consider the downward gradient-flow equation for a path $x(s)$ in $B$:

$$\frac{dx}{ds} = -\nabla f(x)$$

If $p$ and $q$ are nondegenerate critical points, let us write $M(p, q)$ for the space of solutions $x(s)$ satisfying

$$\lim_{s \to -\infty} x(s) = p, \qquad \lim_{s \to +\infty} x(s) = q$$

To understand the structure of $M(p, q)$, consider the linearization of the gradient-flow equation at a solution $x \in M(p, q)$. This is a linear equation for a vector field $X$ along the path $x$ in $B$, and takes the form

$$\nabla_{\partial/\partial s} X = -\nabla\nabla f(X) \qquad [1]$$

where $\nabla\nabla f$ is the covariant derivative of the gradient $\nabla f$, an operator on tangent vectors. Let $\delta_x$ be the dimension of the space of solutions $X$ to this linear equation, with the boundary conditions $\lim_{s \to \pm\infty} X(s) = 0$, and let $\delta'_x$ be the dimension of the space of solutions to the adjoint equation

$$\nabla_{\partial/\partial s} X = +\nabla\nabla f(X)$$

We say that the trajectory $x$ is ‘‘regular’’ if $\delta'_x = 0$. In this case, the trajectory space $M(p, q)$ has the structure of a smooth manifold near $x$: its dimension is $\delta_x$ and its tangent space is the space of solutions $X$ to [1]. The gradient flow is said to be Morse–Smale if all trajectories between critical points are regular. If $f$ is any Morse function, one can always choose the metric $g$ so that the corresponding flow is Morse–Smale. (It is also the case that one can leave $g$ fixed and perturb $f$ to achieve the same effect.)
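As a worked illustration of the definitions above (a standard textbook example, not taken from this article), consider the height function on the circle with its usual metric. The choice of $f$, the coordinate $\theta$, and the metric are assumptions of this sketch.

```latex
% Height function on the circle S^1 = R / 2 pi Z with the standard metric.
\[
  f(\theta) = \cos\theta, \qquad f'(\theta) = -\sin\theta, \qquad f''(\theta) = -\cos\theta .
\]
% Critical points: p = 0 (maximum, Hessian -1) and q = pi (minimum, Hessian +1);
% both are nondegenerate, so f is a Morse function.
%
% Downward gradient flow:
\[
  \frac{d\theta}{ds} = -f'(\theta) = \sin\theta
  \quad\Longrightarrow\quad
  \tan\frac{\theta(s)}{2} = C e^{s}, \qquad C \neq 0 .
\]
% Every such solution satisfies theta(s) -> 0 = p as s -> -infinity and
% theta(s) -> +pi or -pi, i.e. q, as s -> +infinity. The sign of C selects one of
% the two arcs of the circle, and |C| only shifts the parameter s.
%
% Linearized equation along a solution:
\[
  \frac{dX}{ds} = -f''(\theta(s))\,X = \cos\theta(s)\,X
  \quad\Longrightarrow\quad
  X(s) = c\,\sin\theta(s),
\]
% which decays at both ends, so the decaying solutions form a 1-dimensional space.
% The adjoint equation dX/ds = -cos(theta(s)) X has solutions c / sin(theta(s)),
% which blow up at both ends, so its only decaying solution is X = 0.
```

In this example both trajectories are therefore regular, and $M(p, q)$ is a 1-dimensional manifold consisting of two copies of $\mathbb{R}$, one for each arc from the maximum to the minimum.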
In the Morse–Smale case, each $M(p, q)$ is a smooth manifold. The dimension of $M(p, q)$ in the neighborhood of a trajectory $x$ depends only on $p$ and $q$, not otherwise on $x$. Indeed, even without the regularity condition, the index of eqn [1], namely the difference $\delta_x - \delta'_x$, is given by

$$\delta_x - \delta'_x = \mathrm{index}(p) - \mathrm{index}(q)$$

where $\mathrm{index}(p)$ denotes the number of negative eigenvalues (counting multiplicity) of the Hessian at $p$. In the Morse–Smale case therefore, the dimension of $M(p, q)$ is given by $\mathrm{index}(p) - \mathrm{index}(q)$.

If $x(s)$ is a solution of the gradient-flow equation, then so is the reparametrized trajectory $x(s + c)$; and this is different from $x(s)$ as long as $p \ne q$. Let us denote by $\breve{M}(p, q)$ the quotient of $M(p, q)$ by the action of $\mathbb{R}$ given by these reparametrizations. We have

$$\dim \breve{M}(p, q) = \mathrm{index}(p) - \mathrm{index}(q) - 1 \qquad (p \ne q)$$

as long as the trajectory space is nonempty.

Let $\mathbb{F}_2$ denote the field with two elements. The Morse complex of a Morse–Smale gradient flow, with coefficients in $\mathbb{F}_2$, is defined as follows. For each $i$, let $C_i(f)$ be the finite-dimensional vector space over $\mathbb{F}_2$ having a basis $e_{p_1}, \ldots, e_{p_{r_i}}$ indexed by the critical points $p_1, \ldots, p_{r_i}$ with index $i$. For each pair of critical points $p$ and $q$ with indices $i$ and $i - 1$ respectively, let $\epsilon_{pq} \in \mathbb{F}_2$ denote the number of points in the zero-dimensional manifold $\breve{M}(p, q)$, counted mod 2:

$$\epsilon_{pq} = \#\breve{M}(p, q) \pmod 2$$

The Morse–Smale condition ensures that the zero-dimensional space $\breve{M}(p, q)$ is finite, so this definition is satisfactory. Define a differential $\partial : C_i(f) \to C_{i-1}(f)$ by

$$\partial(e_p) = \sum_{\mathrm{index}(q) = i-1} \epsilon_{pq}\, e_q$$

The first important fact is that $\partial$ really is a differential: as long as the flow is Morse–Smale, we have

$$\text{the composite } \partial \circ \partial : C_i(f) \to C_{i-2}(f) \text{ is zero} \qquad [2]$$

We can therefore construct the homology of the complex $(C_*(f), \partial)$. This is the Morse homology:

$$H_i(f) = \frac{\ker(\partial : C_i(f) \to C_{i-1}(f))}{\mathrm{im}(\partial : C_{i+1}(f) \to C_i(f))} \qquad [3]$$
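The passage from trajectory counts to Morse homology over $\mathbb{F}_2$ is plain linear algebra, and can be sketched as follows. This is a minimal illustration under stated assumptions: the input format (a list of critical-point counts per index and a dictionary of trajectory-count matrices) and the function names are hypothetical, and the two sample flows on the circle are standard examples rather than ones discussed in this article.

```python
def rank_f2(matrix):
    """Rank over F_2 of a matrix given as a list of rows of integers."""
    rows = [[entry % 2 for entry in row] for row in matrix]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def morse_betti_numbers(counts, trajectory_counts):
    """counts[i] is the number of critical points of index i.
    trajectory_counts[i] is the matrix of the differential C_i -> C_{i-1}:
    entry [a][b] counts unparametrized trajectories from the a-th index-i
    point to the b-th index-(i-1) point (only its parity matters)."""
    ranks = {i: rank_f2(m) for i, m in trajectory_counts.items()}
    # dim H_i = dim ker(d_i) - dim im(d_{i+1}) = counts[i] - rank d_i - rank d_{i+1}
    return [counts[i] - ranks.get(i, 0) - ranks.get(i + 1, 0)
            for i in range(len(counts))]


# Circle with one maximum and one minimum joined by two trajectories.
circle = morse_betti_numbers([1, 1], {1: [[2]]})

# Circle with two maxima and two minima; each maximum flows to both minima.
bumpy_circle = morse_betti_numbers([2, 2], {1: [[1, 1], [1, 1]]})
```

Both flows give one-dimensional homology in degrees 0 and 1, which illustrates that the result does not depend on the particular Morse function chosen on the circle.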
The proof of [2] is as follows. Suppose that $p$ has index $i$ and $r$ is a critical point with index $i - 2$, and consider $\breve{M}(p, r)$, which has dimension 1. The key step is to understand that $\breve{M}(p, r)$ is noncompact, and that its ends correspond to ‘‘broken trajectories’’: pairs $(x_1, x_2)$ (modulo reparametrization), where $x_1$ is a gradient trajectory from $p$ to some $q$ of index $i - 1$, and $x_2$ is a trajectory from $q$ to $r$. The number of ends is thus $\sum_q \epsilon_{qr}\epsilon_{pq}$. Since the number of ends of a 1-manifold is even, this sum is zero in $\mathbb{F}_2$. This sum is also the matrix entry of $\partial \circ \partial$ from $e_p$ to $e_r$; so $\partial \circ \partial = 0$.

The main result about Morse homology in finite dimensions is the following:

Theorem 1 The Morse homology $H_i(f)$ is isomorphic to the ordinary homology of the compact manifold $B$ with coefficients $\mathbb{F}_2$: the group $H_i(B; \mathbb{F}_2)$.

This result can be proved by first showing that $H_i(f)$ depends only on $B$, not on the choice of $f$ or the metric. (This step can be accomplished by examining a nonautonomous flow of the form $dx/ds = -\nabla f(s, x)$.) Then one can examine the Morse complex in the case of a self-indexing Morse function (where the value of $f$ at the critical points is a monotone-increasing function of their index). In the self-indexing case, the unstable manifolds of the critical points give rise to a cell decomposition of the manifold $B$, and the Morse complex is easily identified with the cellular chain complex for this cell decomposition.

The sum of the dimensions of the Morse homology groups cannot be larger than the sum of the dimensions of the chain groups $C_i(f)$, which is the total number of critical points. The above theorem therefore implies the following basic version of the ‘‘Morse inequalities’’:

Corollary 2 The number of critical points of a Morse function $f : B \to \mathbb{R}$ cannot be less than $\sum_i \dim H_i(B; \mathbb{F}_2)$.

The Morse complex can be refined in various ways. For example, one can use integer coefficients in place of coefficients $\mathbb{F}_2$ by taking account of orientations of the spaces of trajectories. One can also introduce Morse theory with coefficients in a local system, and in both these cases a version of the above theorem continues to hold. One can also study the Morse complex of a multivalued Morse function: that is, one can start with a closed 1-form $\alpha$ on $B$, with nontrivial periods, and study the flow generated by the corresponding vector field $g^{-1}\alpha$. Such a theory was developed by Novikov.

The Morse complex can be generalized in a different direction, replacing $f$ by a functional
related to a geometric problem. The canonical example of this (and one of the very few cases in which the theory works as in the finite-dimensional case) is the case that $B = LW$ is the space of loops $u : S^1 \to W$ in a Riemannian manifold $W$ and $f$ is the ‘‘energy function,’’ $f_E(u) = \int (du/dt)^2\, dt$. If the Morse–Smale condition holds, then the Morse homology $H_i(f_E)$ computes the homology of $LW$, as expected. Critical points of $f_E$ are geodesics, and the relationship between geodesics and the topology of $LW$, for which Corollary 2 provides a prototype, is an idea with many applications.

For the energy functional, the downward gradient-flow equation is a parabolic equation (the ordinary heat equation if the target space is Euclidean), and a solution to the flow exists for each choice of initial condition. Floer homology can be loosely characterized as the Morse theory of certain variational problems for which the gradient-flow equation is not parabolic, but elliptic of first order: the important models are the Cauchy–Riemann equation in dimension 2, the anti-self-dual Yang–Mills equations in dimension 4, or the closely related Seiberg–Witten equations. For an elliptic equation, one does not expect to solve the Cauchy problem with arbitrary initial condition; so with Floer homology, one is studying a functional for which the gradient flow is not everywhere defined. However, to define the Morse complex, the important thing is only that we have a good understanding of the trajectory spaces $M(p, q)$, which will now be solution spaces for an elliptic problem of geometric origin. The proof of Theorem 1 depends very much on the fact that the flow is everywhere defined: this theorem will therefore fail for the Morse complexes arising in Floer theory, and one must look elsewhere for a means to compute the Morse homology groups.

Before discussing Floer homology in more specific terms, we shall describe the problem in symplectic geometry that motivated its development.
The Arnol’d Conjecture
A symplectic manifold of dimension $2n$ is a smooth manifold $W$ equipped with a 2-form $\omega$ which is closed and nondegenerate. On a symplectic manifold, one can associate to each smooth function $H : W \to \mathbb{R}$ a vector field $X_H$ on $W$: the vector field is characterized by the property that
\[ \omega(X_H, V) = dH(V) \]
for all vector fields $V$. In this situation, one refers to $H$ as the Hamiltonian and $X_H$ as the corresponding Hamiltonian vector field. If $W$ is compact, or if $X_H$ is otherwise complete, then this vector field generates a flow $\phi_t : W \to W$ ($t \in \mathbb{R}$). We also wish to consider the case that $H$ is time dependent: we suppose that $H_t : W \to \mathbb{R}$ is a Hamiltonian which varies smoothly with $t \in \mathbb{R}$ and is periodic, in that $H_{t+1} = H_t$. In this case, there is a time-dependent Hamiltonian vector field $X_t$, and we can consider the flow $\phi_t$ that it generates: so for $x \in W$, the path $\phi_t(x)$ will be the solution to
\[ \frac{d}{dt}\phi_t(x) = X_t(\phi_t(x)) \tag{4} \]
with initial condition $\phi_0(x) = x$. The Arnol’d conjecture, in one formulation, concerns the 1-periodic solutions to this equation, or equivalently the fixed points of $\phi_1 : W \to W$. A fixed point $x$ with $\phi_1(x) = x$ is called nondegenerate if $d\phi_1 : T_x W \to T_x W$ does not have 1 as an eigenvalue. With this understood, one version of the conjecture states:
Conjecture 3 Suppose $W$ is compact and let $H_t$ be any 1-periodic, time-dependent Hamiltonian. If the fixed points of $\phi_1$ are all nondegenerate, then the number of fixed points is not less than the sum of the Betti numbers of the manifold $W$.
There is another, more general version of this conjecture. Let $L \subset W$ be a closed Lagrangian submanifold: that is, an $n$-dimensional submanifold such that the restriction of $\omega$ to $L$ as a 2-form is identically zero. Let $L' \subset W$ be another Lagrangian, obtained from $L$ by a Hamiltonian isotopy: that is, $L'$ is $\phi_1(L)$, for some flow $\phi_t$ generated by a time-dependent Hamiltonian $H_t$ as above.
Question 4 If $L$ and $L'$ intersect transversely, is it always true that the number of intersection points of $L$ and $L'$ is at least the sum of the Betti numbers of the manifold $L$:
\[ \#(L \cap L') \ge \sum_i \operatorname{rank} H_i(L)\,? \]
This is phrased as a question rather than a conjecture, because the answer is certainly “no” in some cases. For example, $L$ might be a circle contained in a small disk in a symplectic 2-manifold, in which case there is no reason why $\phi_1$ should not move the disk to be completely disjoint from itself. Nevertheless, with extra hypotheses, it is known that the answer is often “yes.”
We can exhibit Conjecture 3 as a special case of Question 4, as follows. Given a symplectic manifold $(V, \omega)$, we can form the product $W = V \times V$, with the symplectic form $\omega_W = -p_1^*\omega + p_2^*\omega$, where the $p_i$ are the two projections. The result of this definition is that the diagonal in $V \times V$ is a Lagrangian submanifold, $L \subset W = V \times V$, for this symplectic form. Let $H_t$ be a time-dependent Hamiltonian on $V$, and let $\phi_t : V \to V$ be the flow. Then $H_t \circ p_2$ is a time-dependent Hamiltonian generating a flow on $W$. For the flow on $W$, the image $L'$ of the diagonal $L \subset W$ at time 1 is the graph of $\phi_1 : V \to V$. Thus, $L \cap L'$ can be identified with the set of fixed points of $\phi_1$ in $V$, and an affirmative answer to Question 4 for $L \subset W$ implies Conjecture 3 for $V$.
Conjecture 3 and Question 4 can both be extended to the case of isolated degenerate fixed points of $\phi_1$ for Conjecture 3, or to the case of isolated, nontransverse intersections for Question 4. For example, one can ask whether, in the nontransverse case, the sum of the intersection multiplicities can ever be less than the sum of the Betti numbers.
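As a small illustration of the count in Conjecture 3, the following sketch looks at the 2-torus, whose Betti numbers are 1, 2, 1. It assumes a hypothetical time-independent Hamiltonian, $H(x, y) = \cos 2\pi x + \cos 2\pi y$, which is not taken from the article. Every critical point of a time-independent Hamiltonian is a fixed point of $\phi_1$, because $X_H$ vanishes there; the code only checks that this particular $H$ has four nondegenerate critical points, one of Morse index 0, two of index 1, and one of index 2, so the bound of Conjecture 3 is met with equality by the critical points alone.

```python
import itertools
import math

# Hypothetical example (not from the article): the time-independent Hamiltonian
#   H(x, y) = cos(2*pi*x) + cos(2*pi*y)
# on the torus T^2 = R^2 / Z^2.

def grad_H(x, y):
    """Gradient of H at (x, y)."""
    return (-2 * math.pi * math.sin(2 * math.pi * x),
            -2 * math.pi * math.sin(2 * math.pi * y))

def hessian_eigenvalues(x, y):
    """The Hessian of H is diagonal, so its eigenvalues are the diagonal entries."""
    return (-4 * math.pi ** 2 * math.cos(2 * math.pi * x),
            -4 * math.pi ** 2 * math.cos(2 * math.pi * y))

def critical_points_by_index():
    """Count the critical points of H on T^2, grouped by Morse index."""
    counts = {0: 0, 1: 0, 2: 0}
    # sin(2*pi*x) vanishes on [0, 1) only at x = 0 and x = 1/2.
    for x, y in itertools.product((0.0, 0.5), repeat=2):
        assert all(abs(g) < 1e-9 for g in grad_H(x, y))
        eigenvalues = hessian_eigenvalues(x, y)
        assert all(abs(e) > 1e-9 for e in eigenvalues)  # nondegenerate
        morse_index = sum(1 for e in eigenvalues if e < 0)
        counts[morse_index] += 1
    return counts

BETTI_NUMBERS_T2 = (1, 2, 1)  # Betti numbers of the 2-torus
counts = critical_points_by_index()
print(counts, sum(counts.values()), sum(BETTI_NUMBERS_T2))
```

The candidate points are entered by hand rather than searched for; the comment in the loop records why no other critical points exist.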
Morse Theory and the Arnol’d Conjecture
The Arnol’d conjecture, and the related Question 4, can both be studied by reformulating them as questions about the number of critical points of a carefully chosen functional. We begin with the situation addressed by Conjecture 3. For simplicity, we suppose that $\pi_2(W)$ is zero. Let $B$ be the space of smooth, null-homotopic loops in $W$:
\[ B = \{\, u : S^1 \to W \mid u \text{ is smooth and null homotopic} \,\} \]
This is a smooth, infinite-dimensional manifold. There is a natural functional $f_0 : B \to \mathbb{R}$, the symplectic action, defined as
\[ f_0(u) = \int_{D^2} v^*(\omega) \]
where $v : D^2 \to W$ is any extension of the map $u : S^1 \to W$. The extension $v$ exists because $u$ is null homotopic, and the value of $f_0$ is independent of the choice of $v$ because $\pi_2(W) = 0$. This functional can be modified in the presence of a periodic Hamiltonian. Introduce a coordinate $t$ on $S^1$ with period 1, and so regard $u$ as a periodic function of $t$. Write the Hamiltonian as $H_t$ as before, and define
\[ f(u) = f_0(u) + \int_0^1 H_t(u(t))\,dt \]
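The definition of $f$ can be made concrete in the simplest case. The sketch below assumes $W = \mathbb{R}^2$ with $\omega = dx \wedge dy$ (so that $\pi_2(W) = 0$ holds trivially), where by Stokes' theorem $f_0(u)$ is the signed area enclosed by the loop. The function name `symplectic_action`, the polygonal discretization, and the sample Hamiltonian $H = -\pi(x^2 + y^2)$ are choices made for this illustration only.

```python
import math

def symplectic_action(loop, hamiltonian=None, n=1000):
    """Approximate f(u) = f_0(u) + int_0^1 H_t(u(t)) dt for a loop in R^2.

    loop: function t -> (x, y) with period 1.
    hamiltonian: optional function (t, x, y) -> H_t(x, y).
    With omega = dx ^ dy, f_0(u) is the signed area enclosed by u,
    computed here by the shoelace formula on an n-gon.
    """
    pts = [loop(k / n) for k in range(n)]
    area = 0.0
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        area += 0.5 * (x0 * y1 - x1 * y0)
    if hamiltonian is None:
        return area
    # Riemann sum for the Hamiltonian term.
    h_term = sum(hamiltonian(k / n, *pts[k]) for k in range(n)) / n
    return area + h_term

def unit_circle(t):
    """Counterclockwise unit circle, traversed once per unit time."""
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))

def reversed_circle(t):
    """The same circle traversed clockwise."""
    return unit_circle(-t)

def rotation_hamiltonian(t, x, y):
    """Sample time-independent Hamiltonian H = -pi (x^2 + y^2)."""
    return -math.pi * (x * x + y * y)

f0_circle = symplectic_action(unit_circle)
f0_reversed = symplectic_action(reversed_circle)
f_circle = symplectic_action(unit_circle, rotation_hamiltonian)
print(f0_circle, f0_reversed, f_circle)
```

Reversing the orientation of the loop changes the sign of $f_0$, as expected for a signed area. For the sample Hamiltonian the two terms of $f$ cancel on the unit circle up to discretization error.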
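To compute the first variation of $f$, consider a one-parameter family of loops $u_s(t) = u(s, t)$ parametrized by $s \in \mathbb{R}$. We compute
\[ \frac{d}{ds} f(u_s) = \int_0^1 \omega\!\left(\frac{\partial u}{\partial s}, \frac{\partial u}{\partial t}\right) dt + \int_0^1 dH_t\!\left(\frac{\partial u}{\partial s}\right) dt = \int_0^1 \omega\!\left(\frac{\partial u}{\partial s},\, \frac{\partial u}{\partial t} - X_t(u)\right) dt \]
using the relationship between $dH_t$ and $X_t$. Thus, a loop $u \in B$ is a critical point of $f : B \to \mathbb{R}$ if and only if it is a solution of the equation
\[ \frac{du}{dt} = X_t(u(t)) \tag{5} \]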
This means that there is a one-to-one correspondence between these critical points and certain 1-periodic solutions of eqn [4]: these in turn correspond to fixed points $p$ of $\phi_1$ with the additional property that the path $\phi_t(p)$ from $p$ to $p$ is null homotopic.
To consider the formal gradient flow of the functional $f$, one must introduce a metric on $B$. A Riemannian metric $g$ on the symplectic manifold $(W, \omega)$ is compatible with $\omega$ if there is an almost-complex structure $J : TW \to TW$ such that $\omega(X, Y) = g(JX, Y)$ for all tangent vectors $X$ and $Y$ at any point of $W$. Let $g_t$ be a 1-periodic family of compatible Riemannian metrics on $W$. Using these, one can define an inner product on the tangent bundle of $B$ by the formula
\[ \langle U, V \rangle = \int_0^1 g_t(U(t), V(t))\,dt \]
in which $U$ and $V$ are tangent vectors at $u \in B$, regarded as vector fields along the loop $u$ in $W$. We can rewrite the above formula for the variation of $f$ in terms of this inner product:
\[ \left\langle \frac{\partial u}{\partial s},\; J_t\!\left(\frac{\partial u}{\partial t} - X_t(u)\right) \right\rangle \]
where $J_t$ is the almost-complex structure corresponding to $g_t$. Formally then, a one-parameter family of loops $u(s, t)$ is a solution of the downward gradient-flow equations for the functional $f$ with respect to this metric, if $u$ satisfies the differential equation
\[ \frac{\partial u}{\partial s} + J_t\!\left(\frac{\partial u}{\partial t} - X_t(u)\right) = 0 \tag{6} \]
In the absence of the term $X_t$, and with $W$ replaced by $\mathbb{C}^n$ with the standard $J$, this equation becomes the Cauchy–Riemann equation $\partial u/\partial \bar{z} = 0$, for a function $u$ of the complex variable $z = s + it$, periodic in $t$.
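The special case just described can be checked numerically. The sketch below assumes $W = \mathbb{C}$ with $J$ equal to multiplication by $i$ and no Hamiltonian term, and tests the sample map $u(s, t) = e^{2\pi k (s + it)}$ (an illustrative choice, 1-periodic in $t$ for integer $k$) against eqn [6] by central differences. The step size and tolerances are assumptions of the sketch.

```python
import cmath
import math

def u_holomorphic(s, t, k=1):
    """u(s, t) = exp(2*pi*k*(s + i t)): holomorphic in z = s + i t, 1-periodic in t."""
    return cmath.exp(2 * math.pi * k * complex(s, t))

def u_antiholomorphic(s, t, k=1):
    """exp(2*pi*k*(s - i t)): periodic in t but not a solution of eqn [6]."""
    return cmath.exp(2 * math.pi * k * complex(s, -t))

def cauchy_riemann_residual(func, s, t, h=1e-5):
    """Central-difference approximation of du/ds + i du/dt at (s, t)."""
    du_ds = (func(s + h, t) - func(s - h, t)) / (2 * h)
    du_dt = (func(s, t + h) - func(s, t - h)) / (2 * h)
    return du_ds + 1j * du_dt

residual = abs(cauchy_riemann_residual(u_holomorphic, 0.1, 0.3))
anti_residual = abs(cauchy_riemann_residual(u_antiholomorphic, 0.1, 0.3))
periodicity_gap = abs(u_holomorphic(0.1, 0.3) - u_holomorphic(0.1, 1.3))
print(residual, anti_residual, periodicity_gap)
```

The holomorphic map gives a residual at the level of the finite-difference error, while the antiholomorphic one does not, which is the distinction eqn [6] draws in this flat model.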
Let us now suppose we are in the situation of Conjecture 3, so $W$ is closed, and the fixed points of $\phi_1$ are nondegenerate. As we have seen, each fixed point $p$ of $\phi_1$ corresponds to a 1-periodic solution $u_p$ of eqn [5], a critical point of $f$. For each pair of fixed points $p$ and $q$, introduce $M(p, q)$ as the space of solutions of the formal gradient-flow equations of $f$, running from $p$ to $q$: that is, $M(p, q)$ is the space of maps $u : \mathbb{R} \times S^1 \to W$ satisfying eqn [6], with
\[ \lim_{s \to -\infty} u(s, t) = u_p(t), \qquad \lim_{s \to +\infty} u(s, t) = u_q(t) \]
With these definitions in place, one can follow the same sequence of steps that we outlined previously in the context of finite-dimensional Morse theory, to construct the Morse complex. First, if $u$ belongs to $M(p, q)$, we can consider the linearization at $u$ of eqn [6], to obtain the counterpart of eqn [1]. These are linear equations for a vector field $U(s, t)$ along $u$ in $W$, and take the form
\[ \nabla_{\partial/\partial s} U + J_t \nabla_{\partial/\partial t} U + h(U) = 0 \tag{7} \]
where $h$ is a linear operator of order zero. Let $\mu_u$ denote the dimension of the space of solutions $U$ which decay at $s = \pm\infty$, and let $\mu'_u$ denote the dimension of the space of solutions of the formal adjoint equation. Elliptic theory for the Cauchy–Riemann equation, and the nondegeneracy condition for $u_p$ and $u_q$, mean that the operator that appears on the left-hand side of the equation is Fredholm: so both $\mu_u$ and $\mu'_u$ are finite, and the index $\mu_u - \mu'_u$ is deformation invariant. This index depends only on $p$ and $q$: we give it a name,
\[ \mu_u - \mu'_u = \operatorname{index}(p, q) \]
As before, $u$ is said to be regular if $\mu'_u$ is zero. For suitable choice of the almost-complex structures $J_t$ (or equivalently the metrics $g_t$), the Morse–Smale condition will hold: that is, the trajectories in all spaces $M(p, q)$ are regular. In this case, each $M(p, q)$ is a smooth manifold and has dimension $\operatorname{index}(p, q)$ if it is nonempty. The “relative index” $\operatorname{index}(p, q)$ plays the role of the difference of the Morse indices in the finite-dimensional case. It can be defined whether or not $M(p, q)$ is empty by considering an equation such as [7] along an arbitrary path $u(s, t)$. In general, there is no natural way to define the “index” of $p$: if we wish, we can select one fixed point $p_0$ and declare it to have index zero; we can then define $\operatorname{index}(p)$ as $\operatorname{index}(p, p_0)$. Alternatively, we can regard the critical points as indexed by an affine copy of $\mathbb{Z}$ (without a preferred zero).
Imitating the construction of the Morse complex, we define a vector space $CF_*$ over $\mathbb{F}_2$ as having a basis consisting of elements $e_p$ indexed by the fixed points $p$. We then define $\partial : CF_* \to CF_*$ by
\[ \partial e_p = \sum_{\operatorname{index}(p, q) = 1} \epsilon_{pq}\, e_q \]
where $\epsilon_{pq}$ is defined by counting points in $M(p, q)$ as before. The vector space $CF_*$ is $\mathbb{Z}$-graded if we make a choice of critical point $p_0$ to have index zero; otherwise, $CF_*$ has an “affine” $\mathbb{Z}$-grading. The map $\partial$ maps $CF_i$ into $CF_{i-1}$.
To show that $\partial$ is well defined, and to show that $\partial^2 = 0$, one must show that the zero-dimensional spaces $M(p, q)$ are compact, and that the ends of the one-dimensional spaces $M(p, r)$ correspond bijectively to broken trajectories, as in the finite-dimensional case. Both of these desired properties hold, under the Morse–Smale conditions; but this is a very special feature of the specific problem. Without the hypothesis that $\pi_2(W)$ is zero, additional noncompactness can arise from the following “bubbling” phenomenon. There could be a sequence of solutions $u_i \in M(p, q)$ to eqn [6], and a point $(s_0, t_0)$ in $\mathbb{R} \times S^1$, such that for suitable constants $\lambda_i$ converging to zero, the rescaled solutions
\[ \tilde{u}_i(\sigma, \tau) = u_i(s_0 + \lambda_i \sigma,\; t_0 + \lambda_i \tau) \]
converge on compact subsets of the plane $\mathbb{R}^2$ to a nonconstant pseudoholomorphic map $\tilde{u} : \mathbb{CP}^1 \to W$, or more precisely a solution of the equation
\[ \frac{\partial \tilde{u}}{\partial \sigma} + J_{t_0} \frac{\partial \tilde{u}}{\partial \tau} = 0 \]
(In the original coordinates, the derivatives of the $u_i$ would grow like $1/\lambda_i$ near $(s_0, t_0)$.) A pseudoholomorphic sphere always has nontrivial homology class (and therefore nontrivial homotopy class); so this sort of noncompactness does not occur when $\pi_2(W) = 0$.
Granted the compactness results, the proof that $\partial^2 = 0$ runs as before, and we can construct a Floer homology group,
\[ HF_* = \ker(\partial)/\operatorname{im}(\partial) \]
Unlike the Morse homology of the energy functional, the Floer homology does not yield the ordinary homology of $B$. To compute it, one first shows that it depends only on the symplectic manifold $(W, \omega)$, not on the choice of Hamiltonian $H_t$ or metrics $g_t$: this step is similar to the proof that the finite-dimensional Morse homology $H_*(f)$ does not depend on the Morse function. Once one has
established this independence, $HF_*$ can be computed by examining a special case. Floer did this by taking the Hamiltonian to be independent of $t$ and equal to a small negative multiple $-\epsilon h$ of a fixed Morse function $h : W \to \mathbb{R}$ on the symplectic manifold. If the multiple $\epsilon \in \mathbb{R}$ is small enough, the only fixed points of $\phi_1$ are the stationary points of the flow, and these are exactly the critical points of $h$. Furthermore the only index-1 solutions of eqn [6] for small $\epsilon$ are the solutions $u(s, t)$ with no $t$ dependence; and these are the solutions of $du/ds = -\epsilon \nabla h$, the downward gradient flow of $h$, scaled by $\epsilon$. In this case therefore, the Floer complex $CF_*$ is precisely the Morse complex $C_*(h)$ of the Morse function $h$, and Theorem 1 yields:
Theorem 5 For a periodic, time-dependent Hamiltonian $H_t$ on a closed symplectic manifold $(W, \omega)$ with $\pi_2(W) = 0$, the Floer homology $HF_*$ is isomorphic to the ordinary homology of $W$ with $\mathbb{F}_2$ coefficients, $H_*(W; \mathbb{F}_2)$.
Because the generators of $CF_*$ correspond to fixed points $p$ of $\phi_1$ such that the path $\phi_t(p)$ is null homotopic, the number of these fixed points is not less than the dimension of $HF_*$, and therefore not less than $\sum_i \dim H_i(W; \mathbb{F}_2)$ because of the above result. The sum of the mod 2 Betti numbers is at least as large as the sum of the ordinary Betti numbers (the dimensions of the rational homology groups); so one deduces, following Floer,
Corollary 6 The Arnol’d conjecture (Conjecture 3) holds for symplectic manifolds $(W, \omega)$ satisfying the additional condition $\pi_2(W) = 0$.
Orientations can be introduced rather as in the case of finite-dimensional Morse theory, allowing one to define Floer groups with arbitrary coefficients.
The Arnol’d conjecture is now known to hold in complete generality, without the hypothesis on $\pi_2$. The proof has been achieved by successive extensions of the Floer homology technique. When $\pi_2(W)$ is nonzero, the space $B$ is not simply connected.
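Since the Floer complex reduces in this special case to a finite-dimensional Morse complex, the algebra of the construction (a basis of critical points, a differential counting trajectories mod 2, and homology $\ker \partial / \operatorname{im} \partial$) can be sketched on a toy example. The code below assumes two hypothetical Morse functions on the circle, with trajectory counts entered by hand; the helper names `rank_mod2` and `betti_numbers_mod2` are inventions of this sketch, not notation from the article.

```python
def rank_mod2(matrix):
    """Rank over F_2 of a matrix given as a list of rows of integers."""
    rows = [[entry % 2 for entry in row] for row in matrix]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a pivot row with a 1 in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [(a + b) % 2 for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def betti_numbers_mod2(dims, boundaries):
    """Dimensions of ker(d_i)/im(d_{i+1}) over F_2.

    dims[i] is the number of critical points of index i.
    boundaries[i] is the matrix of d_i : C_i -> C_{i-1}, with rows indexed
    by the basis of C_{i-1} and columns by the basis of C_i.
    """
    ranks = [0] * (len(dims) + 1)
    for i, matrix in boundaries.items():
        ranks[i] = rank_mod2(matrix)
    return [dims[i] - ranks[i] - ranks[i + 1] for i in range(len(dims))]

# Circle with one maximum and one minimum: two trajectories join them,
# and 2 = 0 mod 2, so the differential vanishes.
simple_circle = betti_numbers_mod2([1, 1], {1: [[2]]})

# Circle with two maxima a, b and two minima c, d: each maximum flows to
# each minimum along exactly one trajectory.
bumpy_circle = betti_numbers_mod2([2, 2], {1: [[1, 1], [1, 1]]})

print(simple_circle, bumpy_circle)
```

Both functions give the mod 2 Betti numbers of the circle, illustrating in miniature the independence of the Morse function that the text invokes.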
The first complication that arises is that the symplectic action functional $f_0$, and therefore $f$ also, is multivalued. This is not an obstacle initially, because $\nabla f$ is still well defined, and the spaces $M(p, q)$ of gradient trajectories can still be assumed to satisfy the Morse–Smale condition: this is the type of Morse theory considered by Novikov, as mentioned above. Because $\pi_1(B)$ is nontrivial, $M(p, q)$ is a union of parts $M_z(p, q)$, one for each homotopy class of paths from $p$ to $q$. For each homotopy class $z$, we have the index $\operatorname{index}_z(p, q)$, which is the dimension of $M_z(p, q)$.
The spaces $M_z(p, q)$ may now have additional noncompactness, due to the presence of pseudoholomorphic spheres $\tilde{u} : \mathbb{CP}^1 \to W$. The simplest manifestation is when a sequence $u_i$ in $M_z(p, q)$ “bubbles off” a single such sphere at a point $(s_0, t_0)$, and converges elsewhere to a smooth trajectory $u'$ in $M_{z'}(p, q)$, belonging to a different homotopy class. Let $\alpha$ be the homology class of the sphere $\tilde{u}$. Because the sphere has positive area, the pairing of $\alpha$ with the de Rham class $[\omega]$ is positive: $\langle [\omega], \alpha \rangle > 0$. The indices are related by
\[ \operatorname{index}_{z'}(p, q) = \operatorname{index}_z(p, q) - 2\langle c_1(W), \alpha \rangle \]
where $c_1(W) \in H^2(W; \mathbb{Z})$ is the first Chern class of a compatible almost-complex structure.
The symplectic manifold is said to be “monotone” if, in real cohomology, $c_1(W)$ is a positive multiple of $[\omega]$. In the monotone case, we always have $\operatorname{index}_{z'}(p, q) < \operatorname{index}_z(p, q)$, and no bubbling off can occur for trajectory spaces $M_z(p, q)$ of index 2 or less: the above formula either makes $M_{z'}(p, q)$ a space of negative dimension (in which case it is empty) or a zero-dimensional space (in which case one has to exploit an additional transversality argument, to show that the holomorphic spheres belonging to classes $\alpha$ with $\langle c_1(W), \alpha \rangle = 1$ cannot intersect one of the loops $u_p$ in $W$). Since the construction of $HF_*$ involves only the trajectories of indices 1 and 2, the construction goes through with minor changes. Because $\operatorname{index}_z(p, q)$ depends on the path $z$, the group $HF_*$ will no longer be $\mathbb{Z}$-graded: the grading is defined only modulo $2d$, where $d$ is the smallest nonzero value of $\langle c_1(W), \alpha \rangle$ for spherical classes $\alpha$.
In the case that $W$ is not monotone, additional techniques are needed to deal with the essential noncompactness of the trajectory spaces. These techniques involve (amongst other things) multivalued perturbations on orbifolds – a strategy that requires the use of rational coefficients in order to perform the necessary averaging.
For this reason, in the non-monotone case, the Arnol’d conjecture is known to hold only in its original form: with the ordinary (rational) Betti numbers.
To address Question 4 for Lagrangian intersections, a closely related Floer homology theory is used. Assume $L$ is connected, and introduce the space of smooth paths joining $L$ to $L'$:
\[ \Omega(W; L, L') = \{\, u : [0, 1] \to W \mid u(0) \in L,\; u(1) \in L' \,\} \]
Fix a point $x_0$ in $L$, and let $u_0$ be the path $u_0(t) = \phi_t(x_0)$. Let $B$ be the connected component
of $\Omega(W; L, L')$ containing $u_0$. On $B$ we have a symplectic action functional, defined as
\[ f(u) = \int_{[0,1] \times [0,1]} v^*(\omega) \]
where $v : [0, 1] \times [0, 1] \to W$ is a path in $B$ with $v(0, t) = u_0(t)$ and $v(1, t) = u(t)$. The symplectic action is single valued if $\pi_2(W, L)$ is trivial (even though this condition does not guarantee that $B$ is simply connected). The critical points of $f$ correspond to constant paths whose image in $W$ is an intersection point of $L$ and $L'$ (though not all such constant paths belong to the connected component $B$). If we fix a one-parameter family of compatible metrics $g_t$ and almost-complex structures $J_t$ on $W$, then we can consider the downward gradient trajectories of the functional. These are maps $u : \mathbb{R} \times [0, 1] \to W$ satisfying the Cauchy–Riemann equation
\[ \frac{\partial u}{\partial s} + J_t \frac{\partial u}{\partial t} = 0 \]
with boundary conditions $u(s, 0) \in L$ and $u(s, 1) \in L'$. With coefficients $\mathbb{F}_2$, a Morse complex can be constructed much as in the case just considered. If $\pi_2(W, L)$ is trivial, then the Floer homology group $HF_*$ obtained as the homology of this Morse complex is isomorphic to $H_*(L; \mathbb{F}_2)$; and as a corollary, Question 4 has an affirmative answer in this case.
Without the hypothesis that $\pi_2(W, L)$ is trivial, one does not expect an affirmative answer to Question 4 in all cases. There is a “monotone” case, in which $HF_*$ can always be defined; but it is not always isomorphic to $H_*(L; \mathbb{F}_2)$: instead, there is a spectral sequence relating the two. In the general case, there is once again the need to use rational coefficients in place of mod 2 coefficients, in order to deal with the orbifold nature of the trajectory spaces that appear. This raises the question of orientability for the trajectory spaces. In contrast to the Morse theory for Hamiltonian diffeomorphisms, there is an obstruction to orientability, involving spin structures on $L$ and $W$. Even when the trajectory spaces are orientable, there are further obstructions to the existence of a Morse differential satisfying $\partial^2 = 0$. The theory of these obstructions is developed in Fukaya et al. (2000). There are still open questions in this area.
Instanton Floer Homology
A “Floer homology theory” for 3-manifolds should assign to each 3-manifold $Y$ (satisfying perhaps some additional topological requirements) a group, say $HF(Y)$. Furthermore, given a four-dimensional cobordism $W$ from $Y_1$ to $Y_2$, the theory should provide a corresponding homomorphism of groups, from $HF(Y_1)$ to $HF(Y_2)$. These homomorphisms should satisfy the natural composition law for composite cobordisms. One can formulate this by considering the category in which an object is a closed, connected, oriented 3-manifold $Y$, and in which the morphisms from $Y_1$ to $Y_2$ are the oriented four-dimensional cobordisms, considered up to diffeomorphism. A Floer homology theory is then a functor from this category (perhaps with some additional decorations or restrictions) to the category of groups.
Such a functor was constructed by Floer (1988a), at least for the full subcategory of homology 3-spheres (manifolds $Y$ with $H_1(Y; \mathbb{Z}) = 0$). We outline the construction. Let $P \to Y$ be a principal SU(2) bundle (necessarily trivial). Let $\mathcal{A}$ denote the space of SU(2) connections in the bundle $P$, and let $A_0$ be any chosen basepoint in $\mathcal{A}$. Any other $A \in \mathcal{A}$ can be written as $A_0 + a$, for some 1-form $a$ with values in the adjoint bundle $\mathrm{ad}(P)$ whose fiber is the Lie algebra $\mathfrak{su}(2)$. So $\mathcal{A}$ is an affine space,
\[ \mathcal{A} = A_0 + \Omega^1(Y; \mathrm{ad}(P)) \]
and we can identify the tangent space $T_A\mathcal{A}$ at any $A$ with $\Omega^1(Y; \mathrm{ad}(P))$. The Chern–Simons functional is a smooth function $CS : \mathcal{A} \to \mathbb{R}$ depending on our choice of a reference connection $A_0$. It can be defined by stating that its derivative at $A \in \mathcal{A}$ is the linear map $T_A\mathcal{A} \to \mathbb{R}$ given by
\[ a \mapsto -\int_Y \operatorname{tr}(a \wedge F_A) \]
where $F_A$ denotes the curvature of $A$, as an $\mathrm{ad}(P)$-valued 2-form on $Y$, and tr denotes the trace of a matrix-valued 3-form. If we equip $Y$ with a Riemannian metric, then we have the $L^2$ inner product on $\Omega^1(Y; \mathrm{ad}(P))$, with respect to which we can consider the gradient of $CS$. The formal downward gradient-flow equation on $\mathcal{A}$ is then
\[ \frac{d}{ds} A = -{*}F_A \tag{8} \]
where $*$ is the Hodge star on $Y$. If $A(s)$ is a solution defined on an interval $[s_1, s_2]$, then we can form the corresponding four-dimensional connection $\mathbf{A}$ on $[s_1, s_2] \times Y$, and eqn [8] implies that $\mathbf{A}$ is a solution of the anti-self-dual Yang–Mills equation, $F_{\mathbf{A}}^+ = 0$. Here $F_{\mathbf{A}}^+$ is the self-dual part of the curvature 2-form on the cylinder. The critical points of $CS$ are the flat connections on $Y$, with $F_A = 0$.
Let $\mathcal{G}$ denote the gauge group, by which we mean the group of automorphisms of $P$. When a trivialization of $P$ is chosen, $\mathcal{G}$ becomes the group of smooth maps $g : Y \to SU(2)$. A connection $A \in \mathcal{A}$ is irreducible if its stabilizer in $\mathcal{G}$ consists only of the constant gauge transformations $\pm 1$. The functional $CS$ is invariant only under the identity component of $\mathcal{G}$: it descends to a function $CS : \mathcal{A}/\mathcal{G} \to \mathbb{R}/(4\pi^2\mathbb{Z})$. If we choose a basepoint in $Y$, then the gauge-equivalence classes of flat connections in $\mathcal{A}$ are in one-to-one correspondence with conjugacy classes of representations,
\[ \rho : \pi_1(Y) \to SU(2) \]
Given representations $\alpha$ and $\beta$, we write $M(\alpha, \beta)$ for the quotient by $\mathcal{G}$ of the space of trajectories $A(s)$ which satisfy the gradient-flow equation [8] and which are asymptotic to flat connections belonging to the classes $\alpha$ and $\beta$ as $s \to -\infty$ and $s \to +\infty$, respectively. There is a purely four-dimensional interpretation of $M(\alpha, \beta)$: it can be identified with the moduli space of solutions $\mathbf{A}$ to the anti-self-dual Yang–Mills equation, or “instantons,” on $\mathbb{R} \times Y$, satisfying the same asymptotic conditions.
One defines the “instanton Floer homology” of $Y$, roughly speaking, as the Morse homology arising from the functional $CS$. In the case that $Y$ is a homology 3-sphere, Floer defined $I_*(Y)$ as the homology $H_*(C, \partial)$ of a complex $C$ whose generators correspond to the irreducible representations $\rho$, and whose differential $\partial$ is defined in terms of the one-dimensional components of the moduli spaces $M(\alpha, \beta)$. To carry out the construction of $I_*(Y)$, it is necessary to perturb the functional $CS$ to achieve a Morse–Smale condition: this is done by adding a function $f : \mathcal{A} \to \mathbb{R}$ defined in terms of the holonomy of connections along families of loops in $Y$. The group $\mathcal{G}$ is not connected, and for given $\alpha$ and $\beta$, the moduli space $M(\alpha, \beta)$ has components differing in dimension by multiples of 8. For this reason, $I_*(Y)$ is a $\mathbb{Z}/8$-graded homology theory. It is a topological invariant of $Y$, and is functorial for cobordisms, in the manner outlined at the beginning of this section.
Various extensions have been made, to allow the definition of $I_*(Y)$ for 3-manifolds with nontrivial $H_1$, and to incorporate the reducible representations. Although there have been some successes (Donaldson 2002), a completely satisfactory general theory has not been constructed. The main difficulties stem from the noncompactness of the instanton moduli spaces (a bubbling phenomenon) and the interaction of this bubbling with the reducible solutions.
The instanton Floer theory for 3-manifolds is closely tied up with Donaldson’s polynomial invariants of closed 4-manifolds, which are also defined using the anti-self-dual Yang–Mills equations.
Seiberg–Witten Floer Homology
Seiberg–Witten Floer homology can be defined in a manner very similar to the instanton case. Again, we start with a Riemannian 3-manifold $Y$, equipped now with a spin$^c$ structure $\mathfrak{s}$: a rank-2 Hermitian vector bundle $S \to Y$ together with a Clifford multiplication $\rho : \Lambda^*(Y) \to \operatorname{End}(S)$. The configuration space $\mathcal{C}$ is defined as the space of pairs $(A, \phi)$, where $A$ is a spin$^c$ connection and $\phi$ is a section of $S$. In place of the Chern–Simons functional considered above, we have the Chern–Simons–Dirac functional $CSD : \mathcal{C} \to \mathbb{R}$ defined by
\[ CSD(A, \phi) = \frac{1}{4}\, CS(\operatorname{tr}(A)) + \frac{1}{2} \int_Y \langle \phi, D_A \phi \rangle \, d\mu \]
where $\operatorname{tr}(A)$ denotes the connection induced by $A$ on the line bundle $\Lambda^2 S$ and $D_A$ is the Dirac operator for the connection $A$. The functional is invariant again under the identity component of the gauge group $\mathcal{G}$, which this time is the group of maps $g : Y \to S^1$, acting as automorphisms of $S$. The critical points are the solutions $(A, \phi)$ to the three-dimensional “Seiberg–Witten equations,”
\[ \tfrac{1}{2}\rho(F_{\operatorname{tr}(A)}) - (\phi\phi^*)_0 = 0, \qquad D_A \phi = 0 \]
in which the subscript 0 denotes the traceless part of the endomorphism. If $\alpha$ and $\beta$ are gauge-equivalence classes of critical points, then we write $M(\alpha, \beta)$ for the quotient by $\mathcal{G}$ of the space of gradient trajectories from $\alpha$ to $\beta$. As in the instanton case, $M(\alpha, \beta)$ has a four-dimensional interpretation: it is the quotient by the four-dimensional gauge group of a space of solutions $(A, \Phi)$ on $\mathbb{R} \times Y$ to the four-dimensional Seiberg–Witten equations:
\[ \tfrac{1}{2}\rho(F^+_{\operatorname{tr}(A)}) - (\Phi\Phi^*)_0 = 0, \qquad D_A^+ \Phi = 0 \]
Here $\Phi$ is a section of the summand $S^+$ of the four-dimensional spin$^c$ bundle $S = S^+ \oplus S^-$, and $D_A^+ : \Gamma(S^+) \to \Gamma(S^-)$ is the four-dimensional Dirac operator.
The action of the gauge group on $\mathcal{C}$ is free except at configurations with $\phi = 0$. These reducible configurations have an $S^1$ stabilizer. Reducible critical points of $CSD$ correspond to flat connections in the line bundle $\Lambda^2 S$. We can now distinguish two cases, according to whether $c_1(S)$ is a torsion class or not. If $c_1(S)$ is not a torsion class, then there are no flat connections in $\Lambda^2 S$, so all critical points are irreducible. In this case, there is a straightforward Floer-type Morse theory for the functional $CSD$ on
the space $\mathcal{C}/\mathcal{G}$: for generators of our complex we take the gauge-equivalence classes of critical points, and we use the one-dimensional trajectory spaces $M(\alpha, \beta)$ to define the boundary map. The resulting Morse homology group is denoted $HM_*(Y, \mathfrak{s})$. It has a canonical $\mathbb{Z}/2$-grading, and is a topological invariant of $Y$ and its spin$^c$ structure.
If $c_1(S)$ is torsion, the theory is more complex. There will be reducible critical points, and one cannot exclude these from the Morse complex and still obtain a topological invariant of $Y$. One may incorporate the reducible critical points in two different ways, that are in a sense dual to one another; and there is a third homology theory that one can define, using the reducibles alone. Thus, one can construct three Floer groups associated to $Y$ with the spin$^c$ structure $\mathfrak{s}$. The resulting theory closely resembles the Heegaard Floer homology that is described next.
Heegaard Floer Homology and Other Floer Theories
Heegaard Floer homology is a Floer homology theory for 3-manifolds that is formally similar to Seiberg–Witten Floer homology, and conjecturally isomorphic to it. Unlike the instanton and Seiberg–Witten theories, its construction, due to Ozsváth and Szabó, does not use gauge theory. Instead, one begins with a decomposition of the 3-manifold into two handlebodies with common boundary $\Sigma$, and one studies a symplectic manifold $s^g\Sigma$, the configuration space of $g$-tuples of points on $\Sigma$, where $g$ denotes the genus. The Heegaard Floer groups are then defined by a variant of the construction used for Lagrangian intersections (see the section “Morse theory and the Arnol’d conjecture”), applied to a particular pair of Lagrangian tori in $s^g\Sigma$.
As in the case of Seiberg–Witten theory, Heegaard Floer homology assigns to each oriented 3-manifold $Y$ three different Floer groups, $HF^+(Y)$, $HF^-(Y)$, and $HF^\infty(Y)$, related by a long exact sequence:
\[ \cdots \to HF^+(Y) \to HF^-(Y) \to HF^\infty(Y) \to HF^+(Y) \to \cdots \]
The first two groups are dual, in that there is a nondegenerate pairing between $HF^+(Y)$ and $HF^-(-Y)$, where $-Y$ denotes the same 3-manifold with opposite orientation. If $W$ is an oriented four-dimensional cobordism from $Y_1$ to $Y_2$, then there are associated functorial maps
\[ F^+(W) : HF^+(Y_1) \to HF^+(Y_2) \]
\[ F^-(W) : HF^-(Y_1) \to HF^-(Y_2) \]
\[ F^\infty(W) : HF^\infty(Y_1) \to HF^\infty(Y_2) \]
In addition, if the intersection form of $W$ is not negative semidefinite, there is a map
\[ F(W) : HF^-(Y_1) \to HF^+(Y_2) \]
As a special case, one can start with a closed 4-manifold $X$, and consider the cobordism $W$ from $S^3$ to $S^3$ obtained from $X$ by removing two 4-balls. In this case, the map
\[ F(W) : HF^-(S^3) \to HF^+(S^3) \]
encodes a diffeomorphism invariant of the original 4-manifold $X$. This invariant is conjectured to be equivalent to the Seiberg–Witten invariants of $X$.
Heegaard Floer homology, and its cousin Seiberg–Witten Floer homology, have been applied successfully to settle long-standing problems in topology, particularly questions related to surgery on knots. An example of such an application is the theorem of Kronheimer et al. that one cannot obtain the projective space $\mathbb{RP}^3$ by surgery on a nontrivial knot in the 3-sphere. In these and other applications of both Heegaard and Seiberg–Witten Floer homology, two key properties of the homology groups play an important part. The first is a nonvanishing theorem, which shows, for example, that these Floer groups can distinguish $S^1 \times S^2$ from any other manifold with the same homology. The second is a long exact sequence, which relates the Floer groups of the manifolds obtained by three different surgeries on a knot. The latter property is shared by the instanton Floer groups, as was shown by Floer (Braam and Donaldson 1995).
Other Floer-type theories have been considered, not all of which arise from a gradient flow, but in which the boundary map of the complex is obtained by counting solutions to a geometric differential equation. At the time of writing, Floer homology is an area of very active development.
See also: Four-Manifold Invariants and Physics; Gauge Theoretic Invariants of 4-Manifolds; Gauge Theory: Mathematical Applications; Knot Homologies; Ljusternik–Schnirelman Theory; Minimax Principle in the Calculus of Variations; Moduli Spaces: An Introduction; Seiberg–Witten Theory; Topological Quantum Field Theory: Overview.
Further Reading
Braam PJ and Donaldson SK (1995) Floer’s work on instanton homology, knots and surgery. In: The Floer Memorial Volume, Progr. Math., vol. 133, pp. 195–256. Basel: Birkhäuser.
Conley C and Zehnder E (1986) A global fixed point theorem for symplectic maps and subharmonic solutions of Hamiltonian equations on tori. In: Nonlinear Functional Analysis and Its Applications, Part 1 (Berkeley, Calif., 1983), Proc. Sympos. Pure Math., vol. 45, pp. 283–299. Providence, RI: American Mathematical Society.
Donaldson SK (2002) Floer Homology Groups in Yang–Mills Theory, Cambridge Tracts in Mathematics, vol. 147. (With the assistance of Furuta M and Kotschick D.) Cambridge: Cambridge University Press.
Floer A (1988a) An instanton-invariant for 3-manifolds. Communications in Mathematical Physics 118(2): 215–240.
Floer A (1988b) Morse theory for Lagrangian intersections. Journal of Differential Geometry 28(3): 513–547.
Floer A (1989a) Symplectic fixed points and holomorphic spheres. Communications in Mathematical Physics 120(4): 575–611.
Floer A (1989b) Witten’s complex and infinite-dimensional Morse theory. Journal of Differential Geometry 30(1): 207–221.
Fukaya K, Oh Y-G, Ohta H, and Ono K (2000) Lagrangian intersection Floer theory – anomaly and obstruction. Preprint.
Kronheimer PB and Mrowka TS Floer homology of Seiberg–Witten monopoles (to appear).
Kronheimer PB, Mrowka TS, Ozsváth P and Szabó Z Monopoles and lens space surgeries. Annals of Mathematics (to appear).
Milnor J (1963) Morse Theory. Based on lecture notes by Spivak M and Wells R, Annals of Mathematics Studies, vol. 51. Princeton, NJ: Princeton University Press.
Ozsváth P and Szabó Z Holomorphic triangles and invariants for smooth four-manifolds. Advances in Mathematics (to appear).
Ozsváth P and Szabó Z (2004) Holomorphic disks and topological invariants for closed three-manifolds. Annals of Mathematics (2) 159(3): 1027–1158.
Poincaré H (1993) New Methods of Celestial Mechanics, vol. 1. History of Modern Physics and Astronomy, vol. 13. Periodic and asymptotic solutions, Translated from the French, Revised reprint of the 1967 English translation, With end-notes by V. I. Arnol’d, Edited and with an introduction by Daniel L. Goroff. New York: American Institute of Physics.
Witten E (1982) Supersymmetry and Morse theory. Journal of Differential Geometry 17(4): 661–692.
Fluid Mechanics: Numerical Methods
J-L Guermond, Université de Paris Sud, Orsay, France
© 2006 Elsevier Ltd. All rights reserved.
The objective of this article is to give an overview of some advanced numerical methods commonly used in fluid mechanics. The focus is set primarily on finite-element methods and finite-volume methods.
Fluid Mechanics Models
Let $\Omega$ be a domain in $\mathbb{R}^d$ ($d = 2, 3$) with boundary $\partial\Omega$ and outer unit normal $n$. $\Omega$ is assumed to be occupied by a fluid. The basic equations governing fluid flows are derived from three conservation principles: conservation of mass, momentum, and energy. Denoting the density by $\rho$, the velocity by $u$, and the mass specific internal energy by $e_i$, these equations are
\[ \partial_t \rho + \nabla \cdot (\rho u) = 0 \tag{1} \]
\[ \partial_t (\rho u) + \nabla \cdot (\rho u \otimes u) = \nabla \cdot \sigma + \rho f \tag{2} \]
\[ \partial_t (\rho e_i) + \nabla \cdot (\rho u e_i) = \sigma : \varepsilon + q_T - \nabla \cdot j_T \tag{3} \]
where $\sigma$ is the stress tensor, $\varepsilon = \tfrac{1}{2}(\nabla u + (\nabla u)^T)$ is the strain tensor, $f$ is a body force per unit mass (gravity is a typical example), $q_T$ is a volume source (it may model chemical reactions, Joule effects, radioactive decay, etc.), and $j_T$ is the heat flux.
In addition to the above three fundamental conservation equations, one may also have to add $L$ equations that account for the conservation of other quantities, say $\phi_\ell$, $1 \le \ell \le L$. These quantities may, for example, be the concentration of constituents in an alloy, the turbulent kinetic energy, the mass fractions of various chemical species by unit volume, etc. All these conservation equations take the following form:
\[ \partial_t (\rho \phi_\ell) + \nabla \cdot (\rho u \phi_\ell) = q_{\phi_\ell} - \nabla \cdot j_{\phi_\ell}, \qquad 1 \le \ell \le L \tag{4} \]
Henceforth, the index $\ell$ is dropped to alleviate the notation.
The above set of equations must be supplemented with initial and boundary conditions. Typical initial conditions are $\rho|_{t=0} = \rho_0$, $u|_{t=0} = u_0$, and $\phi|_{t=0} = \phi_0$. Boundary conditions are usually classified into two types: the essential boundary conditions and the natural boundary conditions. Natural conditions impose fluxes at the boundary. Typical examples are
\[ (\sigma \cdot n + R u)|_{\partial\Omega} = a_u, \qquad (j_T \cdot n + r_T e_i)|_{\partial\Omega} = a_T, \qquad \text{and} \qquad (j_\phi \cdot n + r_\phi \phi)|_{\partial\Omega} = a_\phi \]
The quantities R, rT , r , au , aT , a are given. Essential boundary conditions consist of enforcing boundary values on the dependent variables. One typical example is the so-called no-slip boundary condition: uj@ = 0. The above system of conservation laws is closed by adding three constitutive equations whose purpose is to relate each field s, jT , and j to the fields
, u, and . They account for microscopic properties of the fluid and thus must be frame-independent. Depending on the constitutive equations and
adequate hypotheses on time and space scales, various models are obtained. An important class of fluid model is one for which the stress tensor is a linear function of the strain tensor, yielding the so-called Newtonian fluid model:
$$\sigma = (-p + \lambda \nabla\cdot u) I + 2\mu\varepsilon \qquad [5]$$
Here $p$ is the pressure, $I$ is the identity matrix, and $\lambda$ and $\mu$ are viscosity coefficients. Still assuming linearity, common models for heat and solute fluxes consist of assuming
$$j_T = -\kappa\nabla T, \qquad j_\phi = -D\nabla\phi \qquad [6]$$
where $T$ is the temperature, $\kappa$ is the thermal conductivity, and $D$ is the diffusion coefficient. These are the so-called Fourier's law and Fick's law, respectively. Having introduced two new quantities, namely the pressure $p$ and the temperature $T$, two new scalar relations are needed to close the system. These are the state equations. One admissible assumption consists of setting $\rho = \rho(p, T)$. Another usual additional hypothesis consists of assuming that the variations in the internal energy are proportional to those in the temperature, that is, $\partial e_i = c_P\,\partial T$.
Let us now simplify the above models by assuming that $\rho$ is constant. Then, mass conservation implies that the flow is incompressible, that is, $\nabla\cdot u = 0$. Let us further assume that neither $\lambda$, $\mu$, nor $p$ depend on $e_i$. Then, upon abusing the notation and still denoting by $p$ the ratio $p/\rho$, and setting $\nu = \mu/\rho$ (the kinematic viscosity), the above set of assumptions yields the so-called incompressible Navier–Stokes equations:
$$\nabla\cdot u = 0 \qquad [7]$$
$$\partial_t u + u\cdot\nabla u - \nu\Delta u + \nabla p = f \qquad [8]$$
As a result, the mass and momentum conservation equations are independent of that of the energy and those of the solutes:
$$\rho c_P(\partial_t T + u\cdot\nabla T) - \nabla\cdot(\kappa\nabla T) = 2\mu\,\varepsilon : \varepsilon + q_T \qquad [9]$$
$$\partial_t \phi + u\cdot\nabla\phi - \frac{1}{\rho}\nabla\cdot(D\nabla\phi) = \frac{1}{\rho} q_\phi \qquad [10]$$
Another model allowing for a weak dependency of $\rho$ on the temperature, while still enforcing incompressibility, consists of setting $\rho = \rho_0(1 - \beta(T - T_0))$, where $\beta$ is a thermal expansion coefficient. If buoyancy effects induced by gravity are important, it is then possible to account for them by setting $f = \rho_0 g(1 - \beta(T - T_0))$, where $g$ is the gravitational acceleration, yielding the so-called Boussinesq model. Variations on these themes are numerous and a wide range of fluids can be modeled by using nonlinear constitutive laws and nonlinear state laws. For the purpose of numerical simulations, however, it is important to focus on simplified models.
The Building Blocks
From the above considerations we now extract a small set of elementary problems which constitute the building blocks of most numerical methods in fluid mechanics.
Elliptic Equations
By taking the divergence of the momentum equation [8] and assuming $u$ to be known and renaming $p$ to $\phi$, one obtains the Poisson equation
$$-\Delta\phi = f \qquad [11]$$
where $f$ is a given source term. This equation plays a key role in the computation of the pressure when solving the Navier–Stokes equations; see [54b]. Assuming that adequate boundary conditions are enforced, this model equation is the prototype for the class of the so-called elliptic equations. A simple generalization of the Poisson equation consists of the advection–diffusion equation
$$u\cdot\nabla\phi - \nabla\cdot(\kappa\nabla\phi) = f \qquad [12]$$
where $\kappa > 0$. Admissible boundary conditions are $(\kappa\partial_n\phi + r\phi)|_{\partial\Omega} = a$, $r \ge 0$, or $\phi|_{\partial\Omega} = a$. This type of equation is obtained by neglecting the time derivative in the heat equation [9] or in the solute conservation equation [10]. Mathematically speaking, [12] is also elliptic since its properties (in particular, the way the boundary conditions must be enforced) are controlled by the second-order derivatives. For the sake of simplicity, assume that $u = 0$ in the above equation and that the boundary condition is $\phi|_{\partial\Omega} = 0$; then it is possible to show that $\phi$ solves [12] if and only if $\phi$ minimizes the functional
$$J(\psi) = \int_\Omega \left(\tfrac{1}{2}\kappa|\nabla\psi|^2 - f\psi\right) dx$$
where $|\cdot|$ is the Euclidean norm and $\psi$ spans
$$H = \left\{\psi;\ \int_\Omega |\nabla\psi|^2\,dx < \infty;\ \psi|_{\partial\Omega} = 0\right\} \qquad [13]$$
Writing the first-order optimality condition for this optimization problem yields
$$\int_\Omega \kappa\nabla\phi\cdot\nabla\psi\,dx = \int_\Omega f\psi\,dx$$
for all $\psi \in H$. This is the so-called variational formulation of [12]. When $u$ is not zero, no variational principle holds, but a similar way to reformulate [12] consists of multiplying the equation by arbitrary functions $\psi$ in $H$ and integrating by parts the second-order term to give
$$\int_\Omega \big((u\cdot\nabla\phi)\psi + \kappa\nabla\phi\cdot\nabla\psi\big)\,dx = \int_\Omega f\psi\,dx, \qquad \forall\psi \in H \qquad [14]$$
This is the so-called weak formulation of [12]. Weak and variational formulations are the starting point for finite-element approximations.
Stokes Equations
Another elementary building block is deduced from [8] by assuming that the time derivative and the nonlinear term are both small. The corresponding model is the so-called Stokes equations,
$$-\nu\Delta u + \nabla p = f \qquad [15]$$
$$\nabla\cdot u = 0 \qquad [16]$$
Assume for the sake of simplicity that the no-slip boundary condition is enforced: $u|_{\partial\Omega} = 0$. Introduce the Lagrangian functional
$$L(v, q) = \int_\Omega \left(\tfrac{\nu}{2}\nabla v : \nabla v - q\,\nabla\cdot v - f\cdot v\right) dx$$
Set
$$X = \left\{v;\ \int_\Omega |\nabla v|^2\,dx < \infty;\ v|_{\partial\Omega} = 0\right\}, \qquad M = \left\{q;\ \int_\Omega q^2\,dx < \infty\right\}$$
Then, the pair $(u, p) \in X \times M$ solves the Stokes equations if and only if it is a saddle point of $L$, that is,
$$L(u, q) \le L(u, p) \le L(v, p), \qquad \forall (v, q) \in X \times M \qquad [17]$$
In other words, the pressure $p$ is the Lagrange multiplier of the incompressibility constraint $\nabla\cdot u = 0$. Realizing this fact helps to understand the nature of the Stokes equations, especially when it comes to constructing discrete approximations. A variational formulation of the Stokes equations is obtained by writing the first-order optimality condition, namely:
$$\int_\Omega (\nu\nabla u : \nabla v - p\,\nabla\cdot v - f\cdot v)\,dx = 0 \qquad \forall v \in X$$
$$\int_\Omega q\,\nabla\cdot u\,dx = 0 \qquad \forall q \in M$$
When the nonlinear term is not zero in the momentum equation, or when this term is linearized, there is no saddle point, but a weak formulation is obtained by multiplying the momentum equation by arbitrary functions $v$ in $X$ and integrating by parts the Laplacian, and by multiplying the mass equation by arbitrary functions $q$ in $M$:
$$\int_\Omega \big((u\cdot\nabla u)\cdot v + \nu\nabla u : \nabla v - p\,\nabla\cdot v\big)\,dx = \int_\Omega f\cdot v\,dx \qquad [18]$$
$$\int_\Omega q\,\nabla\cdot u\,dx = 0 \qquad [19]$$
Parabolic Equations
The class of elliptic equations generalizes to that of the parabolic equations when time is accounted for:
$$\partial_t\phi + u\cdot\nabla\phi - \nabla\cdot(\kappa\nabla\phi) = f, \qquad \phi|_{t=0} = \phi_0 \qquad [20]$$
Fundamentally, this equation has many similarities with the elliptic equation
$$\mu\phi + u\cdot\nabla\phi - \nabla\cdot(\kappa\nabla\phi) = f \qquad [21]$$
where $\mu > 0$. In particular, the sets of boundary conditions that are admissible for [20] and [21] are identical, that is, it is legitimate to enforce $(\kappa\partial_n\phi + r\phi)|_{\partial\Omega} = a$, $r \ge 0$, or $\phi|_{\partial\Omega} = a$. Moreover, solving [21] is always a building block of any algorithm solving [20]; for instance, one implicit Euler time step of size $\Delta t$ applied to [20] is a problem of the form [21] with $\mu = 1/\Delta t$. The important fact to remember here is that if a good approximation technique for solving [21] is at hand, then extending it to solve [20] is usually straightforward.
Hyperbolic Equations
When $\kappa/UL \to 0$, where $U$ is the reference velocity scale and $L$ is the reference length scale, [20] degenerates into the so-called transport equation
$$\partial_t\phi + u\cdot\nabla\phi = f \qquad [22]$$
This is the prototypical example for the class of hyperbolic equations. For this equation to be well-posed, it is necessary to enforce an initial condition $\phi|_{t=0} = \phi_0$ and an inflow boundary condition, that is, $\phi|_{\partial\Omega^-} = a$, where $\partial\Omega^- = \{x \in \partial\Omega;\ (u\cdot n)(x) < 0\}$ is the so-called inflow boundary of the domain. To better understand the nature of this equation, introduce the characteristic lines $X(x, s; t)$ of $u(x, t)$ defined as follows:
$$d_t X(x, s; t) = u(X(x, s; t), t), \qquad X(x, s; s) = x \qquad [23]$$
If $u$ is continuous with respect to $t$ and Lipschitz with respect to $x$, this ordinary differential equation has a unique solution. Furthermore, [22] becomes
$$d_t[\phi(X(x, s; t), t)] = f(X(x, s; t), t) \qquad [24]$$
Then
$$\phi(x, t) = \phi_0(X(x, t; 0)) + \int_0^t f(X(x, t; \tau), \tau)\,d\tau$$
provided $X(x, t; \tau) \in \Omega$ for all $\tau \in [0, t]$. This shows that the concept of characteristic curves is important to construct an approximation to [22].
Meshes
The starting point of every approximation technique for solving any of the above model problems consists of defining a mesh of $\Omega$ on which the approximate solution is defined. To avoid having to account for curved boundaries, let us assume that the domain $\Omega$ is a two-dimensional polygon (resp. three-dimensional polyhedron). A mesh of $\Omega$, say $\mathcal{T}_h$, is a partition of $\Omega$ into small cells, hereafter assumed to be simple convex polygons in two dimensions (resp. polyhedrons in three dimensions), say triangles or quadrangles (resp. tetrahedrons or cuboids). Moreover, this partition is usually assumed to be such that if two different cells have a nonempty intersection, then the intersection is a vertex, or an entire edge, or an entire face. The left panel of Figure 1 shows a mesh satisfying the above requirement. The mesh in the right panel is not admissible.
Figure 1 Admissible (left) and nonadmissible (right) meshes.
Finite Elements: Interpolation
The finite-element method is foremost an interpolation technique. The goal of this section is to illustrate this idea by giving examples. Let $\mathcal{T}_h = \{K_m\}_{1 \le m \le N_{el}}$ be a mesh composed of $N_{el}$ simplices, that is, triangles in two dimensions or tetrahedrons in three dimensions. Consider the following vector space of functions:
$$V_h = \{v_h \in C^0(\bar\Omega);\ v_h|_{K_m} \in P_k,\ 1 \le m \le N_{el}\} \qquad [25]$$
where $P_k$ denotes the space of polynomials of global degree at most $k$. $V_h$ is called a finite-element approximation space. We now construct a basis for $V_h$. Given a simplex $K_m$ in $\mathbb{R}^d$, let $v_n$ be a vertex of $K_m$, let $F_n$ be the face of $K_m$ opposite to $v_n$, and define $n_n$ to be the outward normal to $F_n$, $1 \le n \le d + 1$. Define the barycentric coordinates
$$\lambda_n(x) = 1 - \frac{(x - v_n)\cdot n_n}{(v_l - v_n)\cdot n_n}, \qquad 1 \le n \le d + 1 \qquad [26]$$
where $v_l$ is an arbitrary vertex in $F_n$ (the definition of $\lambda_n$ is clearly independent of $v_l$ provided $v_l$ belongs to $F_n$). The barycentric coordinate $\lambda_n$ is an affine function; it is equal to 1 at $v_n$ and vanishes on $F_n$; its level sets are hyperplanes parallel to $F_n$. The barycenter of $K_m$ has barycentric coordinates
$$\left(\frac{1}{d+1}, \dots, \frac{1}{d+1}\right)$$
The barycentric coordinates satisfy the following properties: for all $x \in K_m$, $0 \le \lambda_n(x) \le 1$, and for all $x \in \mathbb{R}^d$,
$$\sum_{n=1}^{d+1} \lambda_n(x) = 1 \qquad \text{and} \qquad \sum_{n=1}^{d+1} \lambda_n(x)(x - v_n) = 0$$
Consider the set of nodes $\{a_{n,m}\}_{1 \le n \le n_{sh}}$ of $K_m$ with barycentric coordinates
$$\left(\frac{i_0}{k}, \dots, \frac{i_d}{k}\right), \qquad 0 \le i_0, \dots, i_d \le k, \qquad i_0 + \dots + i_d = k$$
These points are called the Lagrange nodes of $K_m$. It is clear that there are $n_{sh} = \frac{1}{2}(k + 1)(k + 2)$ of these points in two dimensions and $n_{sh} = \frac{1}{6}(k + 1)(k + 2)(k + 3)$ in three dimensions. It is remarkable that $n_{sh} = \dim P_k$. Let $\{b_1, \dots, b_N\} = \bigcup_{K_m \in \mathcal{T}_h} \{a_{1,m}, \dots, a_{n_{sh},m}\}$ be the set of all the Lagrange nodes in the mesh. For $K_m \in \mathcal{T}_h$ and $n \in \{1, \dots, n_{sh}\}$, let $j(n, m) \in \{1, \dots, N\}$ be the integer such that $a_{n,m} = b_{j(n,m)}$; $j(n, m)$ is the global index of the Lagrange node $a_{n,m}$. Let $\{\varphi_1, \dots, \varphi_N\}$ be the set of functions in $V_h$ defined by $\varphi_i(b_j) = \delta_{ij}$; then it can be shown that
$$\{\varphi_1, \dots, \varphi_N\} \text{ is a basis for } V_h \qquad [27]$$
The functions $\varphi_i$ are called global shape functions. An important property of global shape functions is that their supports are small sets of cells. More precisely, let $i \in \{1, \dots, N\}$ and let $\mathcal{V}_i = \{m;\ \exists n;\ i = j(n, m)\}$ be the set of cell indices to which the node $b_i$ belongs; then the support of $\varphi_i$ is $\bigcup_{m \in \mathcal{V}_i} K_m$. For $k = 1$, it is clear that $\varphi_i|_{K_m} = \lambda_n$ for all $m \in \mathcal{V}_i$ and all $n$ such that $i = j(n, m)$, and $\varphi_i|_{K_m} = 0$ otherwise. The graph of such a shape function in two dimensions is shown in the left panel of Figure 2. For $k = 2$, enumerate from 1 to $d + 1$ the vertices of $K_m$, and enumerate from $d + 2$ to $n_{sh}$ the Lagrange nodes located at the midedges. For a midedge node of index $d + 2 \le n \le n_{sh}$, let $b(n), e(n) \in \{1, \dots, d + 1\}$ be the two indices of the two Lagrange nodes at the extremities of the edge in question. Then, the restriction to $K_m$ of a $P_2$ shape function $\varphi_i$ is
$$\varphi_i|_{K_m} = \begin{cases} \lambda_n(2\lambda_n - 1), & \text{if } 1 \le n \le d + 1 \\ 4\lambda_{b(n)}\lambda_{e(n)}, & \text{if } d + 2 \le n \le n_{sh} \end{cases} \qquad [28]$$
Figure 2 shows the graph of two $P_2$ shape functions in two dimensions.
Figure 2 Two-dimensional Lagrange shape functions: piecewise P1 (left) and piecewise P2 (center and right).
Once the space $V_h$ is introduced, it is natural to define the interpolation operator
$$\Pi_h : C^0(\bar\Omega) \ni v \mapsto \sum_{i=1}^{N} v(b_i)\varphi_i \in V_h \qquad [29]$$
This operator is such that for all continuous functions $v$, the restriction of $\Pi_h(v)$ to each mesh cell is a polynomial in $P_k$ and $\Pi_h(v)$ takes the same values as $v$ at the Lagrange nodes. Moreover, setting $h = \max_{K_m \in \mathcal{T}_h} \mathrm{diam}(K_m)$, and defining
$$\|r\|_{L^p} = \left(\int_\Omega |r|^p\,dx\right)^{1/p} \qquad \text{for } 1 \le p < \infty$$
the following approximation result holds:
$$\|v - \Pi_h(v)\|_{L^p} + h\|\nabla(v - \Pi_h(v))\|_{L^p} \le c\,h^{k+1}\|v\|_{C^{k+1}(\bar\Omega)} \qquad [30]$$
where $c$ is a constant that depends on the quality of the mesh. More precisely, for $K_m \in \mathcal{T}_h$, let $\rho_{K_m}$ be the diameter of the largest ball that can be inscribed into $K_m$ and let $h_{K_m}$ be the diameter of $K_m$. Then, $c$ depends on $\sigma_h = \max_{K_m \in \mathcal{T}_h} h_{K_m}/\rho_{K_m}$. Hence, for the mesh to have good interpolation properties, the cells should not be too flat. Families of meshes for which $\sigma_h$ is bounded uniformly with respect to $h$ as $h \to 0$ are said to be shape-regular families.
The above example of finite-element approximation space generalizes easily to meshes composed of quadrangles or cuboids. In this case, the shape functions are piecewise polynomials of partial degree at most $k$. These spaces are usually referred to as $Q_k$ approximation spaces.
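The barycentric coordinates [26] and the P2 shape functions [28] can be checked numerically on a single triangle. The following is a minimal sketch in Python with NumPy, not part of the original article: the function names, the sample triangle, and the local numbering of the midedge nodes (`EDGES`) are assumptions made for illustration only. It evaluates the six local shape functions at the six Lagrange nodes, where the defining property $\varphi_i(b_j) = \delta_{ij}$ predicts the identity matrix.

```python
import numpy as np

EDGES = [(0, 1), (1, 2), (2, 0)]  # assumed local numbering (b(n), e(n)) of the three midedge nodes


def barycentric(tri, x):
    """Barycentric coordinates of x in the triangle tri (3x2 array), following [26]."""
    tri = np.asarray(tri, dtype=float)
    x = np.asarray(x, dtype=float)
    lam = np.empty(3)
    for n in range(3):
        v_n = tri[n]
        v_l, v_other = tri[(n + 1) % 3], tri[(n + 2) % 3]  # the two vertices of the face F_n
        edge = v_other - v_l
        normal = np.array([edge[1], -edge[0]])  # a normal to F_n; its sign and length cancel in the ratio
        lam[n] = 1.0 - np.dot(x - v_n, normal) / np.dot(v_l - v_n, normal)
    return lam


def p2_shape_functions(lam):
    """Values of the six local P2 shape functions of [28]: vertices first, then midedges."""
    vertex = [lam[n] * (2.0 * lam[n] - 1.0) for n in range(3)]
    midedge = [4.0 * lam[b] * lam[e] for b, e in EDGES]
    return np.array(vertex + midedge)


def p2_lagrange_nodes(tri):
    """The six P2 Lagrange nodes of the triangle: three vertices, then three edge midpoints."""
    tri = np.asarray(tri, dtype=float)
    mids = [0.5 * (tri[b] + tri[e]) for b, e in EDGES]
    return np.vstack([tri, np.array(mids)])


triangle = np.array([[0.0, 0.0], [2.0, 0.5], [0.5, 1.5]])  # arbitrary nondegenerate triangle
nodes = p2_lagrange_nodes(triangle)
# Row i holds the six shape functions evaluated at node i.
kronecker = np.array([p2_shape_functions(barycentric(triangle, a)) for a in nodes])
print(np.round(kronecker, 12))
```

Because the shape functions are written in terms of $\lambda_n$ only, the same two formulas serve every cell of the mesh; only the vertex coordinates passed to `barycentric` change.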
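Finite Elements: Approximation
We show in this section how finite-element approximation spaces can be used to approximate some model problems exhibited in the section "The Building Blocks."
Advection–Diffusion
Consider the model problem [21] supplemented with the boundary condition $(\kappa\partial_n\phi + r\phi)|_{\partial\Omega} = g$. Assume $\kappa > 0$, $\mu - \frac{1}{2}\nabla\cdot u \ge 0$, and $r \ge 0$. Define
$$a(\phi, \psi) = \int_\Omega \big((\mu\phi + u\cdot\nabla\phi)\psi + \kappa\nabla\phi\cdot\nabla\psi\big)\,dx + \int_{\partial\Omega} r\phi\psi\,ds$$
Then, the weak formulation of [21] is: seek $\phi \in H^1(\Omega)$ (the space $H$ defined in [13] without the constraint $\psi|_{\partial\Omega} = 0$, since the boundary condition is natural) such that for all $\psi \in H^1(\Omega)$
$$a(\phi, \psi) = \int_\Omega f\psi\,dx + \int_{\partial\Omega} g\psi\,ds \qquad [31]$$
Using the approximation space $V_h$ defined in [25] together with the basis defined in [27], we seek an approximate solution to the above problem in the form $\phi_h = \sum_{i=1}^{N} U_i\varphi_i \in V_h$. Then, a simple way of approximating [31] consists of seeking $U = (U_1, \dots, U_N)^T \in \mathbb{R}^N$ such that for all $1 \le i \le N$
$$a(\phi_h, \varphi_i) = \int_\Omega f\varphi_i\,dx + \int_{\partial\Omega} g\varphi_i\,ds \qquad [32]$$
This problem finally amounts to solving the following linear system:
$$AU = F \qquad [33]$$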
where $A_{ij} = a(\varphi_j, \varphi_i)$ and
$$F_i = \int_\Omega f\varphi_i\,dx + \int_{\partial\Omega} g\varphi_i\,ds$$
The above approximation technique is usually referred to as the Galerkin method. The following error estimate can be proved:
$$\|\phi - \phi_h\|_{L^p} + h\|\nabla(\phi - \phi_h)\|_{L^p} \le c\,h^{k+1}\|\phi\|_{C^{k+1}(\bar\Omega)} \qquad [34]$$
where, in addition to depending on the shape regularity of the mesh, the constant $c$ also depends on the problem data, in particular $\kappa$ and $\mu$.
Stokes Equations
The line of thought developed above can be used to approximate the Stokes problem [15]–[16] augmented with the nonlinear term of the Navier–Stokes equations [8]. Let us assume that the nonlinear term $u\cdot\nabla u$ is linearized in the form $v\cdot\nabla u$, where $v$ is known. Let $\mathcal{T}_h$ be a mesh of $\Omega$, and assume that finite-element approximation spaces have been constructed to approximate the velocity and the pressure, say $X_h$ and $M_h$. Assume for the sake of simplicity that $X_h \subset X$ and $M_h \subset M$. Assume that bases for $X_h$ and $M_h$ are at hand, say $\{\boldsymbol\varphi_1, \dots, \boldsymbol\varphi_{N_u}\}$ and $\{\psi_1, \dots, \psi_{N_p}\}$, respectively. Set
$$a(u, \boldsymbol\varphi) = \int_\Omega \big((v\cdot\nabla u)\cdot\boldsymbol\varphi + \nu\nabla u : \nabla\boldsymbol\varphi\big)\,dx, \qquad b(v, \psi) = -\int_\Omega \psi\,\nabla\cdot v\,dx$$
Then, we seek an approximate velocity $u_h = \sum_{i=1}^{N_u} U_i\boldsymbol\varphi_i$ and an approximate pressure $p_h = \sum_{k=1}^{N_p} P_k\psi_k$ such that for all $i \in \{1, \dots, N_u\}$ and all $k \in \{1, \dots, N_p\}$ the following holds:
$$a(u_h, \boldsymbol\varphi_i) + b(\boldsymbol\varphi_i, p_h) = \int_\Omega f\cdot\boldsymbol\varphi_i\,dx \qquad [35]$$
and
$$b(u_h, \psi_k) = 0 \qquad [36]$$
Define the matrix $A \in \mathbb{R}^{N_u \times N_u}$ such that $A_{ij} = a(\boldsymbol\varphi_j, \boldsymbol\varphi_i)$. Define the matrix $B \in \mathbb{R}^{N_p \times N_u}$ such that $B_{ki} = b(\boldsymbol\varphi_i, \psi_k)$. Then, the above problem can be recast into the following partitioned linear system:
$$\begin{pmatrix} A & B^T \\ B & 0 \end{pmatrix} \begin{pmatrix} U \\ P \end{pmatrix} = \begin{pmatrix} F \\ 0 \end{pmatrix} \qquad [37]$$
where the vector $F \in \mathbb{R}^{N_u}$ is such that $F_i = \int_\Omega f\cdot\boldsymbol\varphi_i\,dx$. An important aspect of the above approximation technique is that, for the linear system to be invertible, the matrix $B^T$ must have full column rank (i.e., $B$ has full row rank), so that no nonzero pressure vector lies in the kernel of $B^T$. This amounts to
$$\exists\beta_h > 0, \qquad \inf_{q_h \in M_h}\ \sup_{v_h \in X_h} \frac{\int_\Omega q_h\,\nabla\cdot v_h\,dx}{\|v_h\|_X\,\|q_h\|_M} \ge \beta_h \qquad [38]$$
where
$$\|v_h\|_X^2 = \int_\Omega |\nabla v_h|^2\,dx, \qquad \|q_h\|_M^2 = \int_\Omega q_h^2\,dx$$
This nontrivial condition is called the Ladyženskaja–Babuška–Brezzi (LBB) condition in the literature. For instance, if P1 finite elements are used to approximate both the velocity and the pressure, the above condition does not hold, since there are nonzero pressure fields $q_h$ in $M_h$ such that $\int_\Omega q_h\,\nabla\cdot v_h\,dx = 0$ for all $v_h$ in $X_h$. Such fields are called spurious pressure modes. An example is shown in Figure 3. The spurious function alternately takes the values $-1$, $0$, and $+1$ at the vertices of the mesh so that its mean value on each cell is zero.
Figure 3 The P1/P1 finite element: the mesh (left); one pressure spurious mode (right).
Couples of finite-element spaces satisfying the LBB condition are numerous. For instance, assuming $k \ge 2$, using $P_k$ finite elements to approximate the velocity and $P_{k-1}$ finite elements to approximate the pressure is acceptable. Likewise, using $Q_k$ elements for the velocity and $Q_{k-1}$ elements for the pressure on meshes composed of quadrangles or cuboids is admissible. Approximation techniques for which the pressure and the velocity degrees of freedom are not associated with the same nodes are usually called staggered approximations. Staggering pressure and velocity unknowns is common in solution methods for the incompressible Stokes and Navier–Stokes equations; see also the subsection "Stokes Equations" of the section "Finite Volumes: Examples."
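The algebraic content of the rank condition on $B$ in [37] can be illustrated without any mesh. The sketch below, in Python with NumPy, is not taken from the article and does not assemble a finite-element problem: the matrices are random stand-ins, and the sizes, the seed, and all names are assumptions. It builds the block matrix of [37] once with a full-row-rank $B$ and once with linearly dependent rows, and exhibits the resulting null vector $(0, q)$ with $B^T q = 0$, which plays the role of a spurious pressure mode.

```python
import numpy as np


def saddle_point_matrix(A, B):
    """Assemble the block matrix [[A, B^T], [B, 0]] of system [37]."""
    n_p = B.shape[0]
    return np.block([[A, B.T], [B, np.zeros((n_p, n_p))]])


rng = np.random.default_rng(0)
n_u, n_p = 6, 3                       # illustrative sizes, not tied to any mesh
G = rng.standard_normal((n_u, n_u))
A = G @ G.T + n_u * np.eye(n_u)       # symmetric positive definite stand-in for the velocity block

B_good = rng.standard_normal((n_p, n_u))   # generic B: full row rank
B_bad = B_good.copy()
B_bad[2] = B_bad[0] - B_bad[1]             # rows now dependent: B^T q = 0 for q = (1, -1, -1)

K_good = saddle_point_matrix(A, B_good)
K_bad = saddle_point_matrix(A, B_bad)

q_spurious = np.array([1.0, -1.0, -1.0])
null_vector = np.concatenate([np.zeros(n_u), q_spurious])  # zero velocity, nonzero "pressure"

print("rank with full-row-rank B:", np.linalg.matrix_rank(K_good), "of", n_u + n_p)
print("rank with dependent rows :", np.linalg.matrix_rank(K_bad), "of", n_u + n_p)
print("residual of the spurious mode:", np.linalg.norm(K_bad @ null_vector))
```

In an actual discretization the dependency among the rows of $B$ comes from the choice of the pair $(X_h, M_h)$ rather than from an explicit row operation, but the consequence for the linear system is the same: the pressure is determined only up to the kernel of $B^T$.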
Finite Volumes: Principles
The finite-volume method is an approximation technique whose primary goal is to approximate conservation equations, whether time dependent or not. Given a mesh, say $\mathcal{T}_h = \{K_m\}_{1 \le m \le N_{el}}$, and a conservation equation
$$\alpha\,\partial_t\phi + \nabla\cdot F(\phi, \nabla\phi, x, t) = f \qquad [39]$$
($\alpha = 0$ if the problem is time independent and $\alpha = 1$ otherwise), the main idea underlying every finite-volume method is to represent the approximate solution by its mean values over the mesh cells $(\phi_{K_1}, \dots, \phi_{K_{N_{el}}})^T \in \mathbb{R}^{N_{el}}$ and to test the conservation equation by the characteristic functions of the mesh cells $\{1_{K_1}, \dots, 1_{K_{N_{el}}}\}$. For each cell $K_m \in \mathcal{T}_h$, denote by $n_{K_m}$ the outward unit normal vector and denote by $\mathcal{F}_m$ the set of the faces of $K_m$. The finite-volume approximation to [39] consists of seeking $(\phi_{K_1}, \dots, \phi_{K_{N_{el}}})^T \in \mathbb{R}^{N_{el}}$ such that the function $\phi_h = \sum_{m=1}^{N_{el}} \phi_{K_m} 1_{K_m}$ satisfies the following: for all $1 \le m \le N_{el}$
$$\alpha\,|K_m|\,d_t\phi_{K_m}(t) + \sum_{\sigma \in \mathcal{F}_m} F_h^{m,\sigma}(\phi_h, \nabla_h\phi_h, t) = \int_{K_m} f\,dx \qquad [40]$$
where $|K_m| = \int_{K_m} dx$, $\nabla_h\phi_h$ is an approximation of $\nabla\phi$, and $F_h^{m,\sigma}$ is an approximation of
$$\int_\sigma F(\phi, \nabla\phi, x, t)\cdot n_{K_m}\,ds$$
The precise definition of the so-called approximate flux $F_h^{m,\sigma}$ depends on the nature of the problem (e.g., elliptic, parabolic, hyperbolic, saddle point) and the desired accuracy. In general, the approximate fluxes are required to satisfy the following two important properties:
1. Conservativity: for $K_m, K_l \in \mathcal{T}_h$ such that $\sigma = K_m \cap K_l$, $F_h^{m,\sigma} = -F_h^{l,\sigma}$.
2. Consistency: let $\phi$ be the solution to [39], and set
$$\bar\phi_h = \frac{1_{K_1}}{|K_1|}\int_{K_1}\phi\,dx + \dots + \frac{1_{K_{N_{el}}}}{|K_{N_{el}}|}\int_{K_{N_{el}}}\phi\,dx$$
then
$$F_h^{m,\sigma}(\bar\phi_h, \nabla_h\bar\phi_h, t) \to \int_\sigma F(\phi, \nabla\phi, x, t)\cdot n_{K_m}\,ds \qquad \text{as } h \to 0$$
The quantity
$$\left| F_h^{m,\sigma}(\bar\phi_h, \nabla_h\bar\phi_h, t) - \int_\sigma F(\phi, \nabla\phi, x, t)\cdot n_{K_m}\,ds \right|$$
is called the consistency error. Note that [40] is a system of ordinary differential equations. This system is usually discretized in time by using standard time-marching techniques such as explicit Euler, Runge–Kutta, etc.
The discretization technique described above is sometimes referred to as the cell-centered finite-volume method. Another method, called the vertex-centered finite-volume method, consists of using the characteristic functions associated with the vertices of the mesh instead of those associated with the cells.
Finite Volumes: Examples
In this section we illustrate the ideas introduced above. Three examples are developed: the Poisson equation, the transport equation, and the Stokes equations.
Poisson Problem
Consider the Poisson equation [11] equipped with the boundary condition $\partial_n\phi|_{\partial\Omega} = a$. To avoid technical details, assume that $\Omega = [0, 1]^d$. Let $\mathcal{T}_h$ be a mesh of $\Omega$ composed of rectangles (or cuboids in three dimensions). The flux function is $F(\phi, \nabla\phi, x) = -\nabla\phi$; hence, $F_h^{m,\sigma}$ must be a consistent conservative approximation of $-\int_\sigma n_{K_m}\cdot\nabla\phi\,ds$. Let $\sigma$ be an interior face of the mesh and let $K_m$, $K_l$ be the two cells such that $\sigma = K_m \cap K_l$. Let $x_{K_m}$, $x_{K_l}$ be the barycenters of $K_m$ and $K_l$, respectively. Then, an admissible formula for the approximate flux is
$$F_h^{m,\sigma} = -\frac{|\sigma|}{|x_{K_m} - x_{K_l}|}(\phi_{K_l} - \phi_{K_m}) \qquad [41]$$
where $|\sigma| = \int_\sigma ds$. The consistency error is $O(h)$ in general, and is $O(h^2)$ if the mesh is composed of identical cuboids. The conservativity is evident. If $\sigma$ is part of $\partial\Omega$, an admissible formula for the approximate flux is $F_h^{m,\sigma} = -\int_\sigma a\,ds$. Then, upon defining $\mathcal{F}^i_{K_m} = \mathcal{F}_{K_m} \setminus \partial\Omega$ and $\mathcal{F}^\partial_{K_m} = \mathcal{F}_{K_m} \cap \partial\Omega$, the finite-volume approximation of the Poisson problem is: seek $\phi_h \in \mathbb{R}^{N_{el}}$ such that for all $1 \le m \le N_{el}$
$$\sum_{\sigma \in \mathcal{F}^i_{K_m}} F_h^{m,\sigma} = \int_{K_m} f\,dx + \sum_{\sigma \in \mathcal{F}^\partial_{K_m}} \int_\sigma a\,ds \qquad [42]$$
Transport Equation
Consider the transport equation
$$\partial_t\phi + \nabla\cdot(u\phi) = f \qquad [43]$$
$$\phi|_{t=0} = \phi_0, \qquad \phi|_{\partial\Omega^-} = a \qquad [44]$$
where $u(x, t)$ is a given field in $C^1(\bar\Omega \times [0, T])$. Let $\mathcal{T}_h$ be a mesh of $\Omega$. For the sake of simplicity, let us use the explicit Euler time-stepping to approximate [40].
Let $N$ be a positive integer, set $\Delta t = T/N$, set $t^n = n\Delta t$ for $0 \le n \le N$, and partition $[0, T]$ as follows:
$$[0, T] = \bigcup_{n=0}^{N-1} [t^n, t^{n+1}]$$
Denote by $\phi_h^n \in \mathbb{R}^{N_{el}}$ the finite-volume approximation of $\bar\phi_h(t^n)$. Then, [40] is approximated as follows:
$$\frac{|K_m|}{\Delta t}(\phi_{K_m}^{n+1} - \phi_{K_m}^n) + \sum_{\sigma \in \mathcal{F}_m} F_h^{m,\sigma}(\phi_h^n, \nabla_h\phi_h^n, t^n) = \int_{K_m} f(x, t^n)\,dx \qquad [45]$$
where $\phi_{K_m}^0 = \frac{1}{|K_m|}\int_{K_m}\phi_0\,dx$. The approximate flux $F_h^{m,\sigma}$ must be a consistent conservative approximation of $\int_\sigma (u\cdot n_{K_m})\phi\,ds$. Let $\sigma$ be a face of the mesh and let $K_m$, $K_l$ be the two cells such that $\sigma = K_m \cap K_l$ (note that if $\sigma$ is on $\partial\Omega$, $\sigma$ belongs to one cell only and we set $K_m = K_l$). If $\sigma$ is on $\partial\Omega^-$, set
$$F_h^{m,\sigma} = \int_\sigma (u\cdot n_{K_m})\,a\,ds \qquad [46]$$
If $\sigma$ is not on $\partial\Omega^-$, set $u^n_{m,\sigma} = \int_\sigma (u\cdot n_{K_m})\,ds$ and define
$$F_h^{m,\sigma} = \begin{cases} \phi^n_{K_m}\,u^n_{m,\sigma} & \text{if } u^n_{m,\sigma} \ge 0 \\ \phi^n_{K_l}\,u^n_{m,\sigma} & \text{if } u^n_{m,\sigma} < 0 \end{cases} \qquad [47]$$
The above choice for the approximate flux is usually called the upwind flux. It is consistent with the analysis that has been done for [22], that is, information flows along the characteristic lines of the field $u$; see [24]. In other words, the updating of $\phi^{n+1}_{K_m}$ must be done by using the approximate values $\phi^n_h$ coming from the cells that are upstream of the flow field. An important feature of the above approximation technique is that it is $L^\infty$-stable, in the sense that
$$\max_{0 \le n \le N,\ 1 \le m \le N_{el}} |\phi^n_{K_m}| \le c(\phi_0, f)$$
if the two mesh parameters $\Delta t$ and $h$ satisfy the so-called Courant–Friedrichs–Lewy (CFL) condition $\|u\|_{L^\infty}\Delta t/h \le c(\sigma_h)$, where $c(\sigma_h)$ is a constant that depends on the mesh regularity parameter $\sigma_h = \max_{K_m \in \mathcal{T}_h} h_{K_m}/\rho_{K_m}$. In one dimension, $c(\sigma_h) = 1$.
Stokes Equations
To finish this short review of finite-volume methods, we turn our attention to the Stokes problem [15]–[16] equipped with the homogeneous Dirichlet boundary condition $u|_{\partial\Omega} = 0$. Let $\mathcal{T}_h$ be a mesh of $\Omega$ composed of triangles (or tetrahedrons). All the angles in the triangulation are assumed to be acute so that, for all $K \in \mathcal{T}_h$, the intersection of the orthogonal bisectors of the sides of $K$, say $x_K$, is in $K$. We propose a finite-volume approximation for the velocity and a finite-element approximation for the pressure. Let $\{e_1, \dots, e_d\}$ be a Cartesian basis for $\mathbb{R}^d$. Set $1^k_{K_m} = 1_{K_m}e_k$ for all $1 \le m \le N_{el}$ and $1 \le k \le d$; then define
$$X_h = \mathrm{span}\left\{1^1_{K_1}, \dots, 1^d_{K_1}, \dots, 1^1_{K_{N_{el}}}, \dots, 1^d_{K_{N_{el}}}\right\}$$
Let $\{b_1, \dots, b_{N_v}\}$ be the vertices of the mesh, and let $\{\varphi_1, \dots, \varphi_{N_v}\}$ be the associated piecewise linear global shape functions. Then, set (see the section "Finite Elements: Interpolation")
$$N_h = \mathrm{span}\{\varphi_1, \dots, \varphi_{N_v}\}, \qquad M_h = \left\{q \in N_h;\ \int_\Omega q\,dx = 0\right\}$$
The approximate problem consists of seeking $(u_{K_1}, \dots, u_{K_{N_{el}}}) \in \mathbb{R}^{dN_{el}}$ and $p_h \in M_h$ such that, with $u_h = \sum_{m=1}^{N_{el}} u_{K_m}1_{K_m}$, for all $1 \le m \le N_{el}$, $1 \le k \le d$, and all $1 \le i \le N_v$,
$$\sum_{\sigma \in \mathcal{F}_m} 1^k_{K_m}\cdot F_h^{m,\sigma} + c(1^k_{K_m}, p_h) = \int_{K_m} 1^k_{K_m}\cdot f\,dx \qquad [48]$$
$$c(u_h, \varphi_i) = 0 \qquad [49]$$
where, for $v_h = \sum_{m=1}^{N_{el}} v_{K_m}1_{K_m} \in X_h$,
$$c(v_h, p_h) = \sum_{m=1}^{N_{el}} \int_{K_m} v_{K_m}\cdot\nabla p_h\,dx$$
Moreover,
$$F_h^{m,\sigma} = \begin{cases} \nu\,\dfrac{|\sigma|}{|x_{K_m} - x_{K_l}|}(u_{K_m} - u_{K_l}) & \text{if } \sigma = K_m \cap K_l \\[1ex] \nu\,\dfrac{|\sigma|}{d(x_{K_m}, \sigma)}\,u_{K_m} & \text{if } \sigma = K_m \cap \partial\Omega \end{cases}$$
where $d(x_{K_m}, \sigma)$ is the Euclidean distance between $x_{K_m}$ and $\sigma$. This formulation yields a linear system with the same structure as in [37]. Note in particular that
$$\sup_{v_h \in X_h} \frac{c(v_h, p_h)}{\|v_h\|_{L^1}} = \|\nabla p_h\|_{L^\infty} \qquad [50]$$
Since the mean value of $p_h$ is zero, $\|\nabla p_h\|_{L^\infty}$ is a norm on $M_h$. As a result, an inequality similar to [38] holds. This inequality is a key step to proving that the linear system is well-posed and that the approximate solution converges to the exact solution of [15]–[16].
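Returning to the transport example of this section, the upwind scheme [45]–[47] and the role of the CFL condition can be sketched in one space dimension. The Python code below is an illustration under simplifying assumptions that are not in the article: a uniform mesh of cell size `h`, a constant velocity `u > 0` (so that the inflow boundary is the left end), and a zero source term. The function name and the test data are hypothetical.

```python
import numpy as np


def upwind_transport(phi0, u, h, dt, n_steps, inflow=0.0):
    """Explicit Euler / upwind finite volumes for d_t phi + d_x(u phi) = 0 in 1D.

    phi0 holds the initial cell averages on a uniform mesh of size h. The
    velocity u is assumed constant and positive, so the inflow boundary is
    the left end, where phi = inflow is imposed as in [46].
    """
    phi = np.array(phi0, dtype=float)
    for _ in range(n_steps):
        # Upwind value on the left face of each cell: left neighbour, or the inflow datum.
        left = np.concatenate(([inflow], phi[:-1]))
        # Flux balance [45] with the fluxes [47]: u*phi_m leaves through the
        # right face and u*left enters through the left face.
        phi = phi - (dt / h) * u * (phi - left)
    return phi


n_cells, h, u = 100, 0.01, 1.0
phi0 = np.zeros(n_cells)
phi0[20:40] = 1.0                      # square pulse with values in [0, 1]

stable = upwind_transport(phi0, u, h, dt=0.5 * h / u, n_steps=100)    # u*dt/h = 0.5
unstable = upwind_transport(phi0, u, h, dt=1.5 * h / u, n_steps=10)   # u*dt/h = 1.5

print("CFL 0.5: min, max =", stable.min(), stable.max())
print("CFL 1.5: max |phi| =", np.abs(unstable).max())
```

With $u\Delta t/h \le 1$ each update is a convex combination of the values in the cell and in its upstream neighbour, which is why the discrete maximum principle holds; beyond that bound one of the weights becomes negative and the bound is lost.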
Projection Methods for Navier–Stokes
In this section we focus on the time approximation of the Navier–Stokes problem:
$$\partial_t u - \nu\Delta u + u\cdot\nabla u + \nabla p = f \qquad [51a]$$
$$\nabla\cdot u = 0 \qquad [51b]$$
$$u|_{\partial\Omega} = 0 \qquad [51c]$$
$$u|_{t=0} = u_0 \qquad [51d]$$
where $f$ is a body force and $u_0$ is a solenoidal velocity field. There are numerous ways to discretize this problem in time, but, undoubtedly, one of the most popular strategies is to use projection methods, sometimes also referred to as Chorin–Temam methods.
A projection method is a fractional-step time-marching technique. It is a predictor–corrector strategy aiming at uncoupling viscous diffusion and incompressibility effects. One time step is composed of three substeps: in the first substep, the pressure is made explicit and a provisional velocity field is computed using the momentum equation; in the second substep, the provisional velocity field is projected onto the space of incompressible (solenoidal) vector fields; in the third substep, the pressure is updated.
Let $q > 0$ be an integer and approximate the time derivative of $u$ using a backward difference formula of order $q$. To this end, introduce a positive integer $N$, set $\Delta t = T/N$, set $t^n = n\Delta t$ for $0 \le n \le N$, and consider a partitioning of the time interval in the form
$$[0, T] = \bigcup_{n=0}^{N-1} [t^n, t^{n+1}]$$
For all sequences $v_{\Delta t} = (v^0, v^1, \dots, v^N)$, set
$$D^{(q)}v^{n+1} = \beta_q v^{n+1} - \sum_{j=0}^{q-1} \beta_j v^{n-j} \qquad [52]$$
where $q - 1 \le n \le N - 1$. The coefficients $\beta_j$ are such that
$$\frac{1}{\Delta t}\Big(\beta_q u(t^{n+1}) - \sum_{j=0}^{q-1} \beta_j u(t^{n-j})\Big)$$
is a $q$th-order backward difference formula approximating $\partial_t u(t^{n+1})$. For instance,
$$D^{(1)}v^{n+1} = v^{n+1} - v^n, \qquad D^{(2)}v^{n+1} = \tfrac{3}{2}v^{n+1} - 2v^n + \tfrac{1}{2}v^{n-1}$$
Furthermore, for all sequences $\phi_{\Delta t} = (\phi^0, \phi^1, \dots, \phi^N)$, define
$$\phi^{\star,n+1} = \sum_{j=0}^{q-1} \gamma_j\phi^{n-j} \qquad [53]$$
so that $\sum_{j=0}^{q-1} \gamma_j p(t^{n-j})$ is a $(q-1)$th-order extrapolation of $p(t^{n+1})$. For instance, $p^{\star,n+1} = 0$ for $q = 1$, $p^{\star,n+1} = p^n$ for $q = 2$, and $p^{\star,n+1} = 2p^n - p^{n-1}$ for $q = 3$. Finally, denote by $(u\cdot\nabla u)^{\star,n+1}$ a $q$th-order extrapolation of $(u\cdot\nabla u)(t^{n+1})$. For instance,
$$(u\cdot\nabla u)^{\star,n+1} = \begin{cases} u^n\cdot\nabla u^n & \text{if } q = 1 \\ 2u^n\cdot\nabla u^n - u^{n-1}\cdot\nabla u^{n-1} & \text{if } q = 2 \end{cases}$$
A general projection algorithm is as follows. Set $\tilde u^0 = u_0$ and $\phi^l = 0$ for $0 \le l \le q - 1$. If $q > 1$, assume that $\tilde u^1, \dots, \tilde u^{q-1}$, $p^{\star,q}$, and $(u\cdot\nabla u)^{\star,q}$ have been initialized properly. For $n \ge q - 1$, seek $\tilde u^{n+1}$ such that $\tilde u^{n+1}|_{\partial\Omega} = 0$ and
$$\frac{D^{(q)}}{\Delta t}\tilde u^{n+1} - \nu\Delta\tilde u^{n+1} + \nabla\Big(p^{\star,n+1} + \sum_{j=0}^{q-1} \frac{\beta_j}{\Delta t}\phi^{n-j}\Big) = S^{n+1} \qquad [54a]$$
where $S^{n+1} = f(t^{n+1}) - (u\cdot\nabla u)^{\star,n+1}$. Then solve
$$\Delta\phi^{n+1} = \nabla\cdot\tilde u^{n+1}, \qquad \partial_n\phi^{n+1}|_{\partial\Omega} = 0 \qquad [54b]$$
Finally, update the pressure as follows:
$$p^{n+1} = \frac{\beta_q}{\Delta t}\phi^{n+1} + p^{\star,n+1} - \nu\nabla\cdot\tilde u^{n+1} \qquad [54c]$$
The algorithm [54a–c] is known in the literature as the rotational form of the pressure-correction method. Upon denoting by $u_{\Delta t} = (u(t^0), \dots, u(t^N))$ and $p_{\Delta t} = (p(t^0), \dots, p(t^N))$ the exact sequences, and by $\tilde u_{\Delta t} = (\tilde u^0, \dots, \tilde u^N)$ and $(p^n)_{0 \le n \le N}$ the computed ones, the above algorithm has been proved to yield the following error estimates:
$$\|u_{\Delta t} - \tilde u_{\Delta t}\|_{\ell^2(L^2)} \le c\,\Delta t^2$$
$$\|\nabla(u_{\Delta t} - \tilde u_{\Delta t})\|_{\ell^2(L^2)} + \|p_{\Delta t} - (p^n)_{0 \le n \le N}\|_{\ell^2(L^2)} \le c\,\Delta t^{3/2}$$
where $\|\phi_{\Delta t}\|^2_{\ell^2(L^2)} = \Delta t\sum_{n=0}^{N} \int_\Omega |\phi^n|^2\,dx$. A simple strategy to initialize the algorithm consists of using $D^{(1)}u^1$ at the first step in [54a], then using $D^{(2)}u^2$ at the second step, and proceeding likewise until $\tilde u^1, \dots, \tilde u^{q-1}$ have all been computed.
At the present time, projection methods count among the few methods that are capable of solving the time-dependent incompressible Navier–Stokes equations in three dimensions on fine meshes within reasonable
computation times. The reason for this success is that the unsplit strategy, which consists of solving
$$\frac{D^{(q)}}{\Delta t}u^{n+1} - \nu\Delta u^{n+1} + \nabla p^{n+1} = S^{n+1} \qquad [55a]$$
$$\nabla\cdot u^{n+1} = 0, \qquad u^{n+1}|_{\partial\Omega} = 0 \qquad [55b]$$
yields a linear system similar to [37], which usually takes far more time to solve than sequentially solving [54a] and [54b]. It is commonly reported in the literature that the ratio of the CPU time for solving [55a]–[55b] to that for solving [54a–c] ranges from 10 to 30.
See also: Compressible Flows: Mathematical Theory; Computational Methods in General Relativity: The Theory; Geophysical Dynamics; Image Processing: Mathematics; Incompressible Euler Equations: Mathematical Theory; Interfaces and Multicomponent Fluids; Magnetohydrodynamics; Newtonian Fluids and Thermohydraulics; Non-Newtonian Fluids; Partial Differential Equations: Some Examples; Variational Methods in Turbulence.
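As a complement to the projection substep [54b], the following Python sketch shows how a Poisson solve removes the gradient part of a provisional velocity field. It is a simplified illustration and not the algorithm of the article: the domain is assumed to be a periodic square (replacing the Neumann condition of [54b]), the Poisson problem is solved with a Fourier transform, and the corrected velocity is taken to be $\tilde u - \nabla\phi$, the standard end-of-step velocity of pressure-correction schemes. All names and the test fields are hypothetical.

```python
import numpy as np


def project_periodic(u, v, length=2.0 * np.pi):
    """Solve Lap(phi) = div(u~) on a periodic square and return u~ - grad(phi), plus phi.

    u and v are the two components of the provisional velocity sampled on a
    uniform n x n grid, with axis 0 along x and axis 1 along y.
    """
    n = u.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                        # avoid 0/0; the mean of phi is arbitrary
    u_hat, v_hat = np.fft.fft2(u), np.fft.fft2(v)
    div_hat = 1j * kx * u_hat + 1j * ky * v_hat
    phi_hat = -div_hat / k2               # Fourier form of Lap(phi) = div(u~)
    phi_hat[0, 0] = 0.0
    u_new = np.real(np.fft.ifft2(u_hat - 1j * kx * phi_hat))
    v_new = np.real(np.fft.ifft2(v_hat - 1j * ky * phi_hat))
    return u_new, v_new, np.real(np.fft.ifft2(phi_hat))


n = 32
grid = 2.0 * np.pi * np.arange(n) / n
x, y = np.meshgrid(grid, grid, indexing="ij")

# Divergence-free part plus the gradient of g(x, y) = cos(x) sin(2y).
u_sol, v_sol = np.sin(x) * np.cos(y), -np.cos(x) * np.sin(y)
u_tilde = u_sol - np.sin(x) * np.sin(2.0 * y)
v_tilde = v_sol + 2.0 * np.cos(x) * np.cos(2.0 * y)

u_proj, v_proj, phi = project_periodic(u_tilde, v_tilde)
print("max deviation from the solenoidal part:",
      max(np.abs(u_proj - u_sol).max(), np.abs(v_proj - v_sol).max()))
```

The point of the fractional-step strategy is visible here: the correction only requires one scalar Poisson solve per time step instead of the coupled velocity–pressure system of [55a]–[55b].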
Further Reading
Doering CR and Gibbon JD (1995) Applied Analysis of the Navier–Stokes Equations, Cambridge Texts in Applied Mathematics. Cambridge: Cambridge University Press.
Ern A and Guermond J-L (2004) Theory and Practice of Finite Elements. Springer Series in Applied Mathematical Sciences, vol. 159. New York: Springer-Verlag.
Eymard R, Gallouët T, and Herbin R (2000) Finite volume methods. In: Ciarlet PG and Lions JL (eds.) Handbook of Numerical Analysis, vol. VII, pp. 713–1020. Amsterdam: North-Holland.
Karniadakis GE and Sherwin SJ (1999) Spectral/hp Element Methods for CFD, Numerical Mathematics and Scientific Computation. New York: Oxford University Press.
Rappaz M, Bellet M, and Deville M (2003) Numerical Modeling in Material Science and Engineering. Springer Series in Computational Mathematics, vol. 32. Berlin: Springer-Verlag.
Temam R (1984) Navier–Stokes Equations. Theory and Numerical Analysis. Studies in Mathematics and its Applications, vol. 2. Amsterdam: North-Holland.
Toro EF (1997) Riemann Solvers and Numerical Methods for Fluid Dynamics. A Practical Introduction. Berlin: Springer.
Wesseling P (2001) Principles of Computational Fluid Dynamics. Springer Series in Computational Mathematics, vol. 29. Berlin: Springer.
Fourier Law
F Bonetto, Georgia Institute of Technology, Atlanta, GA, USA
L Rey-Bellet, University of Massachusetts, Amherst, MA, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
In the famous 1822 treatise by Jean Baptiste Joseph Fourier, Théorie analytique de la chaleur, the Discours préliminaire opens with: "Primary causes are unknown to us; but are subject to simple and constant laws, which may be discovered by observation, the study of them being the subject of natural philosophy. Heat, like gravity, penetrates every substance of the universe, its rays occupy all parts of space. The object of our work is to set forth the mathematical laws which this element obeys. The theory of heat will hereafter form one of the most important branches of general physics." After a brief discussion of rational mechanics, he continues with the sentence: "But whatever may be the range of mechanical theories, they do not apply to the effects of heat. These make up a special order of phenomena, which cannot be explained by the principles of motion and equilibria." Fourier goes on with a thorough description of the phenomenology of heat transport and the derivation of the partial differential equation describing heat transport: the heat equation. A large part of the treatise is then devoted to solving the heat equation for various geometries and boundary conditions. Fourier's treatise marks the birth of Fourier analysis.
After Boltzmann, Gibbs, and Maxwell and the invention of statistical mechanics in the decades after Fourier's work, we believe that Fourier was wrong and that, in principle, heat transport can and should be explained "by the principles of motion and equilibria," that is, within the formalism of statistical mechanics. But well over a century after the foundations of statistical mechanics were laid down, we still lack a mathematically reasonable derivation of Fourier's law from first principles.
Fourier's law describes the macroscopic transport properties of heat, that is, energy, in nonequilibrium systems. Similar laws are valid for the transport of other locally conserved quantities, for example, charge, particle density, momentum, etc. We will not discuss these laws here, except to point out that in none of these cases have macroscopic transport laws been derived from microscopic dynamics. As Peierls once put it: "It seems there is no problem in modern physics for which there are on record as many false starts, and as many theories which overlook some essential feature, as in the problem of the thermal conductivity of [electrically] non-conducting crystals."
Macroscopic Law

Consider a macroscopic system characterized at some initial time, say $t = 0$, by a nonuniform temperature profile $T_0(r)$. This temperature profile will generate a heat, that is, energy current $J(r)$. Due to energy conservation and basic thermodynamics:

$$c_v(T)\,\frac{\partial}{\partial t}T(r,t) = -\nabla\cdot J \qquad [1]$$

where $c_v(T)$ is the specific heat per unit volume. On the other hand, we know that if the temperature profile is uniform, that is, if $T_0(r) \equiv T_0$, there is no current in the system. It is then natural to assume that, for small temperature gradients, the current is given by

$$J(r) = -\kappa(T(r))\,\nabla T(r) \qquad [2]$$

where $\kappa(T)$ is the conductivity. Here we have assumed that there is no mass flow or other mode of energy transport besides heat conduction (we also ignore, for simplicity, any variations in density or pressure). Equation [2] is usually called Fourier’s law. Putting together eqns [1] and [2], we get the heat equation:

$$c_v(T)\,\frac{\partial}{\partial t}T(r,t) = \nabla\cdot\left[\kappa(T)\,\nabla T\right] \qquad [3]$$
This equation must be completed with suitable boundary conditions. Let us consider two distinct situations in which the heat equation is observed to hold experimentally with high precision:

1. An isolated macroscopic system, for example, a fluid or solid in a domain $\Lambda$ surrounded by effectively adiabatic walls. In this case, eqn [3] is to be solved subject to the initial condition $T(r, 0) = T_0(r)$ and no heat flux across the boundary of $\Lambda$ (denoted by $\partial\Lambda$), that is, $n(r)\cdot\nabla T(r) = 0$ if $r \in \partial\Lambda$, with $n$ the normal vector to $\partial\Lambda$ at $r$. As $t \to \infty$, the system reaches a stationary state characterized by a uniform temperature $\bar T$ determined by the constancy of the total energy.
2. A system in contact with heat reservoirs. Each reservoir $\alpha$ fixes the temperature of some portion $(\partial\Lambda)_\alpha$ of the boundary $\partial\Lambda$. The rest of the boundary is insulated. When the system reaches a stationary state (again assuming no matter flow), its temperature will be given by the solution of eqn [3] with the left-hand side set equal to zero,

$$\nabla\cdot\tilde J(r) = -\nabla\cdot\left(\kappa\,\nabla\tilde T(r)\right) = 0 \qquad [4]$$

subject to the boundary condition $\tilde T(r) = T_\alpha$ for $r \in (\partial\Lambda)_\alpha$ and no flux across the rest of the boundary.

The simplest geometry for a conducting system is that of a cylindrical slab of height $h$ and cross-sectional area $A$. It can be either a cylindrical container filled with a fluid or a piece of crystalline solid. In both cases, one keeps the lateral surface of the cylinder insulated. If the top and the bottom of the cylinder are also insulated we are in case (1). If one keeps the top and the bottom in contact with thermostats at temperatures $T_h$ and $T_b$, respectively, this is (for a fluid) the usual setup for a Bénard experiment. To avoid convection, one has to make $T_h > T_b$ or keep $|T_h - T_b|$ small. Assuming uniformity in the direction perpendicular to the vertical $x$-axis one has, in the stationary state, a temperature profile $\tilde T(x)$ with $\tilde T(0) = T_b$, $\tilde T(h) = T_h$ and $\kappa(\tilde T)\,d\tilde T/dx = \text{const.}$ for $x \in (0, h)$.

In deriving the heat equation, we have implicitly assumed that the system is described fully by specifying its temperature $T(r, t)$ everywhere in $\Lambda$. What this means on the microscopic level is that we imagine the system to be in local thermal equilibrium (LTE). Heuristically, we might think of the system as being divided up (mentally) into many little cubes, each large enough to contain very many atoms yet small enough on the macroscopic scale to be accurately described, at a specified time $t$, as a system in equilibrium at temperature $T(r_i, t)$, where $r_i$ is the center of the $i$th cube. For slow variation in space and time, we can then use a continuous description $T(r, t)$.

The theory of the heat equation is highly developed and, together with its generalizations, plays a central role in modern analysis. In particular, one can consider more general boundary conditions. Here we are interested in the derivation of eqn [2] from first principles. This clearly presupposes, as a first fundamental step, a precise definition of the concept of LTE and its justification within the laws of mechanics.
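As an illustration of the stationary slab condition (conductivity times temperature gradient constant across the slab), the following minimal sketch assumes a conductivity of the form kappa(T) = kappa0 * sqrt(T). That functional form, the function name `stationary_profile`, and all numerical values are assumptions chosen for the example, not part of the article. Under this assumption T^(3/2) is linear in x, which gives a closed-form profile that can be checked numerically.

```python
import numpy as np

def stationary_profile(x, h, t_bottom, t_top, kappa0=1.0):
    """Stationary T(x) in a slab for the assumed law kappa(T) = kappa0*sqrt(T).

    kappa(T) dT/dx = const means that T**1.5 is linear in x.
    Returns the profile and the constant heat flux J = -kappa dT/dx.
    """
    x = np.asarray(x, dtype=float)
    a, b = t_bottom**1.5, t_top**1.5
    temperature = (a + (b - a) * x / h) ** (2.0 / 3.0)
    flux = -kappa0 * (2.0 / 3.0) * (b - a) / h
    return temperature, flux

# Hypothetical slab: unit height, bottom at T = 1, top at T = 2.
x = np.linspace(0.0, 1.0, 2001)
T, J = stationary_profile(x, h=1.0, t_bottom=1.0, t_top=2.0)

# Finite-difference estimate of -kappa(T) dT/dx at every grid point.
numerical_flux = -np.sqrt(T) * np.gradient(T, x)
```

Away from the two end points, where `np.gradient` falls back to one-sided differences, `numerical_flux` should agree with `J` to within discretization error. The profile is not linear in x, because the conductivity varies with temperature.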
Empirical Argument

A theory of heat conduction has as a goal the computation of the conductivity $\kappa(T)$ for realistic models, or, at the very least, the derivation of the behavior of $\kappa(T)$ as a function of $T$. The early analysis was based on “kinetic theory.” Its application to heat conduction goes back to the works of Clausius, Maxwell, and Boltzmann, who obtained a theoretical expression for the heat conductivity of gases, $\kappa \sim \sqrt{T}$, independent of the gas density. This agrees with experiment (when the density is not too high) and was a major early achievement of the atomic theory of matter.

Heat Conduction in Gases

Clausius and Maxwell used the concept of a “mean free path” $\lambda$: the average distance a particle (atom or molecule) travels between collisions in a gas with particle density $\rho$. Straightforward analysis gives $\lambda \sim 1/(\rho\pi\sigma^2)$, where $\sigma$ is an “effective” hard-core diameter of a particle. They considered a gas with temperature gradient in the $x$-direction and assumed that the gas is (approximately) in local equilibrium with density $\rho$ and temperature $T(x)$. Between collisions, a particle moves a distance $\lambda$ carrying a kinetic energy proportional to $T(x)$ from $x$ to $x + \lambda/\sqrt{3}$, while in the opposite direction the amount carried is proportional to $T(x + \lambda\sqrt{3})$. Taking into account the fact that the speed is proportional to $\sqrt{T}$, the amount of energy $J$ transported per unit area and time across a plane perpendicular to the $x$-axis is approximately

$$J \sim \rho\sqrt{T}\,\left[T(x) - T(x + \lambda\sqrt{3})\right] \sim -\sigma^{-2}\sqrt{T}\,\frac{dT}{dx} \qquad [5]$$

and so $\kappa \sim \sqrt{T}$ independent of $\rho$, in agreement with experiment. It was clear to the founding fathers that starting with a local equilibrium situation the process described above will produce, as time goes on, a deviation from LTE. They reasoned, however, that this deviation from local equilibrium will be small when $(\lambda/T)\,dT/dx \ll 1$, the regime in which Fourier’s law is expected to hold, and the above calculation should yield, up to some factor of order unity, the right heat conductivity.

To have a more precise theory, one can describe the state of the gas through the probability distribution $f(r, p, t)$ of finding a particle in the volume element $dr\,dp$ around the phase space point $(r, p)$. Here LTE means that

$$f(r, p, t) \simeq \exp\left(-\frac{p^2}{2mkT(r)}\right)$$

where $m$ is the mass of the particles. If one computes the heat flux at a point $r$ by averaging the microscopic energy current at $r$, $j = v\,(\tfrac{1}{2}mv^2)$, over $f(r, p, t)$, then it is only the deviation from local equilibrium which makes a contribution. The result, however, is essentially the same as eqn [5]. This was shown by Boltzmann, who derived an accurate formula for $\kappa$ in gases by using the Boltzmann equation.
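The density independence of the conductivity in the mean-free-path argument can be made concrete with a few lines of arithmetic. The sketch below is only an order-of-magnitude illustration: the function names are hypothetical, every numerical prefactor of order unity is set to 1, and the units are arbitrary.

```python
import math

def mean_free_path(density, diameter):
    """Clausius-Maxwell estimate lambda ~ 1/(rho*pi*sigma**2), order of magnitude only."""
    return 1.0 / (density * math.pi * diameter**2)

def kinetic_conductivity(density, diameter, temperature):
    """kappa ~ rho * lambda * sqrt(T), with all prefactors set to 1 (assumption)."""
    return density * mean_free_path(density, diameter) * math.sqrt(temperature)

# Doubling the density halves the mean free path, so kappa is unchanged.
k1 = kinetic_conductivity(density=1.0, diameter=0.5, temperature=4.0)
k2 = kinetic_conductivity(density=2.0, diameter=0.5, temperature=4.0)

# Quadrupling the temperature doubles kappa, reflecting the sqrt(T) law.
k3 = kinetic_conductivity(density=1.0, diameter=0.5, temperature=16.0)
```

The cancellation happens because the number of energy carriers grows like the density while the distance each one travels between collisions shrinks like its inverse.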
If one takes $\kappa$ from experiment, the above analysis yields a value for $\sigma$, the effective size of an atom or molecule, which turns out to be close to other determinations of the characteristic size of an atom. This provided evidence for the reality of atoms and the molecular theory of heat.

Heat Conduction in Insulating Crystals

In (electrically) conducting solids, heat is mainly transported by the conduction electrons. In this case, one can adapt the theory discussed in the previous section. In (electrically) insulating solids, on the other hand, heat is transmitted through the vibrations of the lattice. In order to use the concepts of kinetic theory, it is useful to picture a solid as a gas of phonons which can store and transmit heat. A perfectly harmonic crystal, due to the fact that phonons do not interact, has an infinite thermal conductivity: in the language of kinetic theory, the mean free path is infinite. In a real crystal, the anharmonic forces produce interactions between the phonons and therefore a finite mean free path. Another source of finite thermal conductivity may be the lattice imperfections and impurities which scatter the phonons.

Debye devised a kind of kinetic theory for phonons in order to describe thermal conductivity. One assumes that a small gradient of temperature is imposed and that the collisions between phonons maintain local equilibrium. An elementary argument gives a thermal conductivity analogous to eqn [5] obtained in the last subsection for gases (remembering, however, that the density of phonons is itself a function of $T$):

$$\kappa \sim c_v\,c^2\,\tau \qquad [6]$$

where, with respect to eqn [5], $\rho$ has been replaced by $c_v$, the specific heat of phonons, $\sqrt{T}$ by $c$, the (mean) velocity of the phonons, and $\lambda$ by $c\tau$, where $\tau$ is the effective mean free time between phonon collisions. The thermal conductivity depends on the temperature via $\tau$, and a more refined theory is needed to account for this dependence. This was done by Peierls via a Boltzmann equation for the phonons. In collisions among phonons, the momentum of phonons is conserved only modulo a vector of the reciprocal lattice. One calls “normal processes” those where the phonon momentum is conserved and “Umklapp processes” those where the initial and final momenta differ by a nonzero reciprocal lattice vector. Peierls’ theory may be summarized (very roughly) as follows: in the absence of Umklapp processes, the mean free path, and thus the thermal conductivity of an insulating solid, is infinite. A success of Peierls’ theory is that it correctly describes the temperature dependence of the thermal conductivity. Furthermore, on the basis of this theory, one does not expect a finite thermal conductivity in one-dimensional monoatomic lattices with pair interactions. This seems so far to be a correct prediction, at least according to the numerous numerical studies performed on various models.
Statistical Mechanics Paradigm: Rigorous Analysis

In a rigorous approach to the above arguments, we have to first formulate precisely the problem on a mathematical level. It is natural to adapt the standard formalism of statistical mechanics to our situation. To this end, we assume that our system is described by the positions $Q$ and momenta $P$ of a (very large) number of particles, $N$, with $Q = (q_1, \ldots, q_N) \in \Lambda^N$, $\Lambda \subset \mathbb{R}^d$, and $P = (p_1, \ldots, p_N) \in \mathbb{R}^{dN}$. The dynamics (in the bulk) is given by a Hamiltonian function $H(Q, P)$. A state of the system is a probability measure $\mu(P, Q)$ on phase space. As usual in statistical mechanics, the value of an observable $f(P, Q)$ will be given by the expected value of $f$ with respect to the measure $\mu$. In the case of a fluid contained in a region $\Lambda$, we can assume that the Hamiltonian has the form

$$H(P, Q) = \sum_{i=1}^{N}\left[\frac{p_i^2}{2m} + \sum_{j\neq i}\phi(q_j - q_i) + u(q_i)\right] = \sum_{i=1}^{N}\frac{p_i^2}{2m} + V(Q) \qquad [7]$$

where $\phi(q)$ is some short-range interparticle potential and $u(q_i)$ an external potential (e.g., the interaction of the particle with fixed obstacles such as a conduction electron interacting with the fixed crystalline ions). If we want to describe the case in which the temperature at the boundary is kept different in different regions $\partial\Lambda_\alpha$, we have to properly define the dynamics at the boundary of the system. A possibility is to use “Maxwell boundary conditions”: when a particle hits the wall in $\partial\Lambda_\alpha$, it gets reflected and re-emerges with a distribution of velocities

$$f_\alpha(dv) = \frac{m^2}{2\pi(kT_\alpha)^2}\,|v_x|\,\exp\left[-\frac{mv^2}{2kT_\alpha}\right]dv \qquad [8]$$

Several other ways to impose boundary conditions have been considered in the literature.

The notion of LTE can be made precise here in the so-called hydrodynamic scaling limit (HSL), where the ratio of microscopic to macroscopic scales goes to zero. The macroscopic coordinates $r$ and $t$ are related to the microscopic ones $q$ and $\tau$ by $r = \epsilon q$ and $t = \epsilon^\alpha\tau$, that is, if $\Lambda$ is a cube of macroscopic sides $l$, then its sides, now measured in microscopic length units, are of length $L = \epsilon^{-1}l$. We then suppose that at $t = 0$ our system of $N = \rho L^d$ particles is described by an equilibrium Gibbs measure with a temperature $T(r) = T(\epsilon q)$: roughly speaking, the phase-space ensemble density has the form

$$\mu_0(P, Q) \sim \exp\left\{-\sum_{i=1}^{N}\beta_0(\epsilon q_i)\left[\frac{p_i^2}{2m} + \sum_{j\neq i}\phi(q_j - q_i) + u(q_i)\right]\right\} \qquad [9]$$
where $\beta_0^{-1}(r) = T_0(r)$. In the limit $\epsilon \to 0$, $\rho$ fixed, the system at $t = 0$ will be macroscopically in LTE with a local temperature $T_0(r)$ (as already noted, here we suppress the variation in the particle density $n(r)$). We are interested in the behavior of a macroscopic system, for which $\epsilon \ll 1$, at macroscopic times $t \geq 0$, corresponding to microscopic times $\tau = \epsilon^{-\alpha}t$, with $\alpha = 2$ for heat conduction or other diffusive behavior. The implicit assumption then made in the macroscopic description given earlier is that, since the variations in $T_0(r)$ are of order $\epsilon$ on a microscopic scale, then for $\epsilon \ll 1$, the system will, also at time $t$, be in a state very close to LTE, with a temperature $T(r, t)$ that evolves in time according to Fourier’s law, eqn [1].

From a mathematical point of view, the difficult problem is to prove that the system stays in LTE for $t > 0$ when the dynamics are given by a Hamiltonian time evolution. This requires proving that the macroscopic system has some very strong ergodic properties, for example, that the only time-invariant measures locally absolutely continuous with respect to the Lebesgue measure are, for infinitely extended spatially uniform systems, of the Gibbs type. This has only been proved so far for systems evolving via stochastic dynamics (e.g., interacting Brownian particles or lattice gases). For such stochastic systems, one can sometimes prove the hydrodynamical limit and derive macroscopic transport equations for the particle or energy density and thus verify the validity of Fourier’s law.

Another possibility, as we already saw, is to use the Boltzmann equation. Using ideas of hydrodynamical space and time scaling described earlier, it is possible to derive a controlled expansion for the solution of the stationary Boltzmann equation describing the steady state of a gas coupled to temperature reservoirs at the top and bottom. One then shows that for $\epsilon \ll 1$, $\epsilon$ being now the ratio $\lambda/L$, the Boltzmann equation for $f$ in the slab has a time-independent solution which is close to a local Maxwellian, corresponding to LTE (apart from boundary layer terms) with a local temperature and density given by the solution of the Navier–Stokes equations, which incorporate Fourier’s law as expressed in eqn [2]. The main mathematical problem is in controlling the remainder in an asymptotic expansion of $f$ in powers of $\epsilon$. This requires that the macroscopic temperature gradient, that is, $|T_1 - T_2|/h$, where $h = \epsilon L$ is the thickness of the slab on the macroscopic scale, be small. Even if this apparently technical problem could be overcome, we would still be left with the question of justifying the Boltzmann equation for such steady states and, of course, it would not tell us anything
about dense fluids or crystals. In fact, the Boltzmann equation itself is really closer to a macroscopic than to a microscopic description. It is obtained in a well-defined kinetic scaling limit in which, in addition to rescaling space and time, the particle density goes to zero, that is, $\lambda \gg \sigma$.

A simplified model of a crystal is characterized by the fact that all atoms oscillate around given equilibrium positions. The equilibrium positions can be thought of as the points of a regular lattice in $\mathbb{R}^d$, say $\mathbb{Z}^d$. Although $d = 3$ is the physical situation, one can also be interested in the cases $d = 1, 2$. In this situation, $\Lambda \subset \mathbb{Z}^d$ with cardinality $N$, and each atom is identified by its position $x_i = i + q_i$, where $i \in \Lambda$ and $q_i \in \mathbb{R}^d$ is the displacement of the particle at lattice site $i$ from this equilibrium position. Since interatomic forces in real solids have short range, it is reasonable to assume that the atoms interact only with their nearest neighbors via a potential that depends only on the relative distance with respect to the equilibrium distance. Accordingly, the Hamiltonians that we consider have the general form

$$H(P, Q) = \sum_{i\in\Lambda}\frac{p_i^2}{2m} + \sum_{|i-j|=1}V(q_i - q_j) + \sum_{i}U_i(q_i) = \sum_{i\in\Lambda}\frac{p_i^2}{2m} + \mathcal{V}(Q) \qquad [10]$$

where $P = (p_i)_{i\in\Lambda}$ and analogously for $Q$. We shall further assume that as $|q| \to \infty$ so do $U_i(q)$ and $V(q)$. The addition of $U_i(q)$ pins down the crystal and ensures that $\exp[-\beta H(P, Q)]$ is integrable with respect to $dP\,dQ$, and thus the corresponding Gibbs measure is well defined. In this case, in order to fix the temperature at the boundary, one can add a Langevin term to the equation of particles on the boundaries, that is, if $i \in \partial\Lambda_\alpha$ the equation for the particle is

$$\dot p_i = -\partial_{q_i}H(P, Q) - \lambda p_i + \sqrt{\lambda T_\alpha}\,\dot w_i \qquad [11]$$

where $\dot w_i$ is a standard white noise. Other thermostatting mechanisms can be considered. In this case we can also define LTE using eqn [9], but we run into the same difficulties described above, although the problem is somewhat simpler due to the presence of the lattice structure and the fact that the particles oscillate close to their equilibrium points. We can obtain Fourier’s law only by adding stochastic terms, for example, terms like eqn [11], to the equation of motion of every particle and assuming that $U(q)$ and $V(q)$ are harmonic. These added noises can be thought of as an effective description of the chaotic motion generated by the anharmonic terms in $U(q)$ and $V(q)$.
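To make the lattice model with Langevin thermostats of the type of eqn [11] concrete, here is a minimal numerical sketch of a one-dimensional chain whose two end sites are coupled to heat baths. Everything specific in it is an assumption made for illustration: the quartic pinning potential, the harmonic nearest-neighbour coupling, the units m = k_B = 1, the noise amplitude sqrt(2*gamma*T) (the usual fluctuation-dissipation normalization, which may differ from the convention behind eqn [11]), the simple Euler-type integrator, and the function name `simulate_chain`. It shows how such a model is set up and says nothing about whether Fourier's law holds.

```python
import numpy as np

def simulate_chain(n=16, t_left=2.0, t_right=1.0, gamma=1.0,
                   dt=0.01, steps=20000, seed=0):
    """Time-averaged kinetic temperature <p_i^2> of a pinned anharmonic chain.

    Assumed model (m = k_B = 1): U(q) = q^2/2 + q^4/4 on every site,
    V(r) = r^2/2 between nearest neighbours, Langevin baths on sites 0 and n-1.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(n)
    p = np.zeros(n)
    friction = np.zeros(n)
    friction[0] = friction[-1] = gamma
    bath_temp = np.zeros(n)
    bath_temp[0], bath_temp[-1] = t_left, t_right
    # Noise is nonzero only on the two thermostatted end sites.
    noise_amp = np.sqrt(2.0 * friction * bath_temp * dt)

    def force(q):
        f = -q - q**3                 # on-site pinning force -U'(q)
        f[:-1] += q[1:] - q[:-1]      # pull from the right neighbour
        f[1:] += q[:-1] - q[1:]       # pull from the left neighbour
        return f

    p2_sum = np.zeros(n)
    for _ in range(steps):
        # Euler-Maruyama step for p, then position update with the new p.
        p += (force(q) - friction * p) * dt + noise_amp * rng.standard_normal(n)
        q += p * dt
        p2_sum += p**2
    return p2_sum / steps

profile = simulate_chain(n=8, steps=2000)
```

The returned array is a rough kinetic temperature profile along the chain. Studying how the stationary heat current depends on the chain length would require much longer runs and a more careful integrator than this sketch uses.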
Just how far we are from establishing rigorously the Fourier law is clear from our very limited mathematical understanding of the stationary nonequilibrium state (SNS) of mechanical systems whose ends are, as in the example of the Bénard problem, kept at fixed temperatures $T_1$ and $T_2$. Various models have been considered, for example, models with Hamiltonian [10] coupled at the boundaries with heat reservoirs described by eqn [11]. The best mathematical results one can prove are: the existence and uniqueness of the SNS; the existence of a stationary nontrivial heat flow; and properties of the fluctuations of the heat flow in the SNS, namely central-limit-theorem-type fluctuations (related to the Kubo formula and Onsager relations) and large-deviation-type fluctuations (related to the Gallavotti–Cohen fluctuation theorem). What is missing is information on how the relevant quantities depend on the size of the system, $N$.

In this context, the heat conductivity can be defined precisely without invoking LTE. To do this, we let $\tilde J$ be the expectation value in the SNS of the energy or heat current flowing from reservoir 1 to reservoir 2. We then define the conductivity $\kappa_L$ as $\tilde J/(A\,\delta T/L)$, where $\delta T/L = (T_1 - T_2)/L$ is the effective temperature gradient for a cylinder of microscopic length $L$ and uniform cross section $A$, and $\kappa(T)$ is the limit of $\kappa_L$ when $\delta T \to 0$ ($T_1 = T_2 = T$) and $L \to \infty$. The existence of such a limit with $\kappa$ positive and finite is what one would like to prove.

See also: Dynamical Systems and Thermodynamics; Ergodic Theory; Interacting Particle Systems and Hydrodynamic Equations; Kinetic Equations; Nonequilibrium Statistical Mechanics: Dynamical Systems Approach; Nonequilibrium Statistical Mechanics: Interaction Between Theory and Numerical Simulations.
Further Reading
Ashcroft NW and Mermin ND (1988) Solid State Physics. Philadelphia: Saunders College.
Berman R (1976) Thermal Conduction in Solids. Oxford: Clarendon.
Bonetto F, Lebowitz JL, and Rey-Bellet L (2000) Fourier’s law: a challenge to theorists. In: Fokas A, Grigoryan A, Kibble T, and Zegarlinsky B (eds.) Mathematical Physics 2000, pp. 128–150. London: Imperial College Press.
Brush S (1994) The Kind of Motion We Call Heat. Amsterdam: Elsevier.
Debye P (1914) Vorträge über die Kinetische Theorie der Wärme. Leipzig: Teubner.
Fourier JBJ (1822) Théorie Analytique de la Chaleur. Paris: Firmin Didot. (English translation in 1878 by Alexander Freeman. New York: Dover, reprinted in 2003.)
Kipnis C and Landim C (1999) Scaling Limits of Interacting Particle Systems. Berlin: Springer.
Lepri S, Livi R, and Politi A (2003) Thermal conduction in classical low-dimensional lattices. Physics Reports 377(1): 1–80.
Peierls RE (1955) Quantum Theory of Solids. Oxford: Clarendon.
Peierls RE (1960) Quantum theory of solids. In: Saint-Aubin Y and Vinet L (eds.) Theoretical Physics in the Twentieth Century, pp. 140–160. New York: Interscience.
Rey-Bellet L (2003) Statistical mechanics of anharmonic lattices. In: Advances in Differential Equations and Mathematical Physics (Birmingham, AL, 2002), Contemporary Mathematics, vol. 327, pp. 283–298. Providence, RI: American Mathematical Society.
Spohn H (1991) Large Scale Dynamics of Interacting Particles. Berlin: Springer.
Fourier–Mukai Transform in String Theory
B Andreas, Humboldt-Universität zu Berlin, Berlin, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction

The Fourier–Mukai transform was introduced in the study of abelian varieties by Mukai and can be thought of as a nontrivial algebro-geometric analog of the Fourier transform. Since its original introduction, the Fourier–Mukai transform has turned out to be a useful tool for studying various aspects of sheaves on varieties and their moduli spaces and, as a natural consequence, for learning about the varieties themselves. Various links between geometry and derived categories have been uncovered; for instance, Bondal and Orlov proved that Fano varieties, and certain varieties of general type, can be reconstructed from their derived categories. Moreover, Orlov proved a derived version of the Torelli theorem for K3 surfaces and also a structure theorem for derived categories of abelian varieties. Later, Kawamata gave evidence for the conjecture that two birational smooth projective varieties with trivial canonical sheaves have equivalent derived categories, which has been proved by Bridgeland in dimension 3.

The Fourier–Mukai transform also enters into string theory. The most prominent example is Kontsevich’s homological mirror-symmetry conjecture. The conjecture predicts (for mirror dual pairs of Calabi–Yau manifolds) an equivalence between the bounded derived category of coherent sheaves and the Fukaya category. The conjecture implies a correspondence between certain self-equivalences (given by Fourier–Mukai transforms) of the derived category and symplectic self-equivalences of the mirror manifold.

Besides their importance for geometrical aspects of mirror symmetry, the Fourier–Mukai transforms have also been important for heterotic string compactifications. The motivation for this came from the conjectured correspondence between the heterotic string and F-theory, which both rely on elliptically fibered Calabi–Yau manifolds. To give evidence for this correspondence, an explicit description of stable holomorphic vector bundles was necessary, and this inspired a series of publications by Friedman, Morgan, and Witten. Their bundle construction relies on two geometrical objects: a hypersurface in the Calabi–Yau manifold together with a line bundle on it; more precisely, they construct vector bundles using a relative Fourier–Mukai transform. Various aspects and refinements of this construction have been studied by now. For instance, a physical way to understand the bundle construction can be given using the fact that holomorphic vector bundles can be viewed as D-branes and that D-branes can be mapped under T-duality to new D-branes (of different dimensions).

We survey aspects of the Fourier–Mukai transform and its relative version, and outline the bundle construction of Friedman, Morgan, and Witten. The construction has led to many new insights; for instance, the presence of 5-branes in heterotic string vacua has been understood. The construction also inspired a tremendous amount of work towards a heterotic string phenomenology on elliptic Calabi–Yau manifolds. For the many topics omitted the reader should consult the “Further reading” section.
The Fourier–Mukai Transforms

Every object $E$ of the derived category on the product $X \times Y$ of two smooth algebraic varieties $X$ and $Y$ gives rise to a functor $\Phi^E$ from the bounded derived category $D(X)$ of coherent sheaves on $X$ to the similar category on $Y$:

$$\Phi^E : D(X) \to D(Y), \qquad F \mapsto \Phi^E(F) = R\hat\pi_*(\pi^*F \otimes E)$$

where $\pi$, $\hat\pi$ are the projections from $X \times Y$ to $X$ and $Y$, respectively, and $\otimes$ denotes the derived tensor product. $\Phi^E(F)$ is called the Fourier–Mukai transform with kernel $E \in D(X \times Y)$ (in analogy
with the definition of an integral transform with kernel). Note that given a Fourier–Mukai functor $\Phi^E$, $\Phi^E(F)$ is in general a complex having homology in several degrees even if $F$ is a sheaf. Furthermore, a result by Orlov states that if $X$ and $Y$ are smooth projective varieties then any fully faithful functor $D(X) \to D(Y)$ is a Fourier–Mukai functor.

In analogy with the Fourier transform, there is a kind of “convolution product” giving the composition of two such functors. More precisely, given smooth algebraic varieties $X$, $Y$, $Z$, and elements $E \in D(X \times Y)$ and $G \in D(Y \times Z)$, we can define $G * E \in D(X \times Z)$ by

$$G * E = R\pi_{XZ*}(\pi_{XY}^*E \otimes \pi_{YZ}^*G)$$

where $\pi_{XY}$, $\pi_{YZ}$, $\pi_{XZ}$ are the projections from $X \times Y \times Z$ to the pairwise products, giving a natural isomorphism of functors

$$\Phi^G \circ \Phi^E \simeq \Phi^{G * E}$$

Another analogy with the Fourier transform can be drawn. For this, assume that we have sheaves $F$ and $G$ which only have one nonvanishing Fourier–Mukai transform, the $i$th one $\Phi^i(F)$ (where $\Phi^i : D(X) \to \mathrm{Coh}(Y)$, $F \mapsto H^i(\Phi^E(F))$; cf. remarks below) in the case of $F$, and the $j$th one $\Phi^j(G)$ in the case of $G$. Given such sheaves, there is the Parseval formula

$$\mathrm{Ext}^h_X(F, G) = \mathrm{Ext}^{h+i-j}_Y(\Phi^i(F), \Phi^j(G))$$

which gives a correspondence between the extensions of $F$, $G$ and the extensions of their Fourier–Mukai transforms. This formula can be considered as the analog of the Parseval formula for the ordinary Fourier transform for functions on a torus. The Parseval formula can be proved using two facts. First, for arbitrary coherent sheaves $E$, $G$ the Ext groups can be computed in terms of the derived category, namely

$$\mathrm{Ext}^i(E, G) = \mathrm{Hom}_{D(X)}(E, G[i])$$

Second, the Fourier–Mukai transforms of $F$ and $G$ in the derived category $D(Y)$ are given by $\Phi(F) = \Phi^i(F)[-i]$ and $\Phi(G) = \Phi^j(G)[-j]$. Since the Fourier–Mukai transform is an equivalence of categories, we have

$$\mathrm{Hom}_{D(X)}(F, G[h]) = \mathrm{Hom}_{D(Y)}(\Phi^i(F), \Phi^j(G)[i - j + h])$$

implying the Parseval formula.

A first simple example of a Fourier–Mukai functor can be given: let $F$ be the complex in $D(X \times X)$ defined by the structure sheaf $O_\Delta$ of the diagonal $\Delta \subset X \times X$. Then it is easy to check that $\Phi^F : D(X) \to D(X)$ is isomorphic to the identity functor on $D(X)$. Moreover, if we shift degrees by $n$ taking $F = O_\Delta[n]$ (a complex with only the sheaf $O_\Delta$ placed in degree $n$), then $\Phi^F : D(X) \to D(X)$ is the degree shifting functor $G \mapsto G[n]$.

As we will be interested in relative Fourier–Mukai transforms for elliptic fibrations, let us consider the case of a Fourier–Mukai transform on an elliptic curve: consider an elliptic curve $E$ with a fixed origin $p_0$ and identify $E$ with $\hat E = \mathrm{Pic}^0(E)$ via $f : E \to \hat E$, $x \mapsto O_E(x - p_0)$. As kernel we take the normalized Poincaré line bundle $P := O_{E \times E}(\Delta - \{p_0\} \times E - E \times \{p_0\})$. The restriction of $P$ to $p_0 \times E$ or $E \times p_0$ is isomorphic to the trivial line bundle $O$. $P$ has the universal property which can be expressed by $\Phi^P(k(x)) = f(x)$, where $k(x)$ is the sheaf supported at a point $x \in E$; in particular, $\Phi^P(k(p_0)) = O_E$ and $\Phi^P(O_E) = k(p_0)[-1]$, where $O_E$ is the structure sheaf of $E$.

Relative Fourier–Mukai Transforms for Elliptic Fibrations
It is often convenient to study problems for families rather than for single varieties. The main advantage of the relative setting is that base-change properties (or parameter dependencies) are better encoded into the problem. We can do that for Fourier–Mukai functors as well. To this end, we consider two morphisms $p : X \to B$, $\hat p : \hat X \to B$ of algebraic varieties. We will assume that the morphisms are flat and so give nice families of algebraic varieties. We shall define relative Fourier–Mukai functors in this setting by means of a “kernel” $E$ in the derived category $D(X \times_B \hat X)$.

Let us make the relative setting explicit for elliptic fibrations: an elliptic fibration is a proper flat morphism $p : X \to B$ of schemes whose fibers are Gorenstein curves of arithmetic genus 1. We also assume that $p$ has a section $\sigma : B \hookrightarrow X$ taking values in the smooth locus $X' \to B$ of $p$. The generic fibers are then smooth elliptic curves, whereas some singular fibers are allowed. If the base $B$ is a smooth curve, elliptic fibrations were studied and classified by Kodaira, who described all the types of singular fibers that may occur, the so-called Kodaira curves. When the base is a smooth surface, more complicated configurations of singular curves can occur and have indeed been studied by Miranda.

First let us fix notation and setup. We denote by $\Theta = \sigma(B)$ the image of the section, by $X_t$ the fiber of $p$ over $t \in B$ (we assume, in what follows, that $B$ is either a smooth curve or surface) and by $i_t : X_t \hookrightarrow X$ the inclusion. Furthermore, $\omega_{X/B}$ is the relative dualizing
sheaf and $\omega = R^1p_*O_X \simeq (p_*\omega_{X/B})^*$, where the isomorphism is Grothendieck–Serre duality for $p$. The sheaf $L = p_*\omega_{X/B}$ is a line bundle whose first Chern class we denote by $K = c_1(L)$. The adjunction formula for $\Theta \hookrightarrow X$ gives that $\Theta^2 = -\Theta \cdot p^*K$ as cycles on $X$. Moreover, we will consider elliptic fibrations with a section whose fibers are all geometrically integral. This means that the fibration is isomorphic with its Weierstrass model.

From Kodaira’s classification of possible singular fibers one finds that the components of reducible fibers of $p$ which do not meet $\Theta$ form rational double point configurations disjoint from $\Theta$. Let $X \to \bar X$ be the result of contracting these configurations and let $\bar p : \bar X \to B$ be the induced map. Then all fibers of $\bar p$ are irreducible with at worst nodes or cusps as singularities. In this case, one refers to $\bar X$ as the Weierstrass model of $X$. The Weierstrass model can be constructed as follows: the divisor $3\Theta$ is relatively ample and, if $E = p_*O_X(3\Theta) \simeq O_B \oplus \omega^{\otimes 2} \oplus \omega^{\otimes 3}$ and $\bar p : P = \mathbb{P}(E^*) \to B$ is the associated projective bundle, there is a projective morphism $j : X \to P$ such that $j(X) = \bar X$.

Now special fibers of $X \to B$ can have at most one singular point, either a cusp or a simple node. Thus, in this case $3\Theta$ is relatively very ample and gives rise to a closed immersion $j : X \hookrightarrow P$ such that $j^*O_P(1) = O_X(3\Theta)$, where $j$ is locally a complete intersection whose normal sheaf is $N(X/P) \simeq p^*\omega^{-\otimes 6} \otimes O_X(9\Theta)$. This follows by relative duality since $\omega_{P/B} = \bigwedge\Omega_{P/B} \simeq \bar p^*\omega^5(-3)$, due to the Euler exact sequence

$$0 \to \Omega_{P/B} \to \bar p^*E(-1) \to O_P \to 0$$

The morphism $p : X \to B$ is then a local complete intersection morphism (cf. Fulton (1984)) and has a virtual relative tangent bundle $T_{X/B} = [j^*T_{P/B}] - [N_{X/P}]$ in the K-group $K^\bullet(X)$. The Todd class of $T_{X/B}$ is given by

$$\mathrm{Td}(T_{X/B}) = 1 - \tfrac{1}{2}\,p^{-1}K + \tfrac{1}{12}\left(12\,\Theta \cdot p^{-1}K + 13\,p^{-1}K^2\right) - \tfrac{1}{2}\,\Theta \cdot p^{-1}K^2 + \text{terms of higher degree}$$

Now if $\hat p : \hat X \to B$ denotes the dual elliptic fibration, defined as the relative moduli space of torsion-free rank-1 sheaves of relative degree 0, it is known that for $t \in B$ there is an isomorphism $\hat X_t \simeq X_t$ between the fibers of both fibrations. Since we assume that the original fibration $p : X \to B$ has a section $\sigma$, $p$ and $\hat p$ are globally isomorphic; hereafter we identify $X \simeq \hat X$, where $\hat X$ denotes the compactified relative Jacobian of $X$.

Note that $\hat X$ is the scheme representing the functor which, to any scheme morphism $\psi : S \to B$, associates the space of equivalence classes of $S$-flat
sheaves on $p_S : X \times_B S \to S$, whose restrictions to the fibers of $p_S$ are torsion-free (the usual definition of "torsion free" is only for integral varieties, i.e., varieties whose local rings have no zero-divisors. In this case, a sheaf $M$ is torsion free if for any open subset $U$, any nonzero section $m$ of $M$ on $U$ and any nonzero section $a$ of the sheaf of regular functions on $U$, one has $a \cdot m \neq 0$. When the variety is not integral (it is reducible or nonreduced) this definition has no real meaning; what replaces the notion of "torsion free" is then the Simpson definition of "pure of maximal dimension": a sheaf $M$ is "torsion free" in this sense if the support of any of its nonzero subsheaves is the whole variety (cf. Huybrechts and Lehn (1997))), of rank 1 and degree 0; two such sheaves $F$, $F'$ are considered to be equivalent if $F' \cong F \otimes p_S^* L$ for a line bundle $L$ on $S$ (cf. Altman and Kleiman (1980); note that the Altman–Kleiman compactification of the relative Jacobian applies to our situation since we consider elliptic fibrations with integral fibers).

Moreover, the natural morphism $X \to \hat{X}$, $x \mapsto \mathcal{I}_x \otimes \mathcal{O}_{X_t}(\sigma(t))$, is an isomorphism (of $B$-schemes); here $\mathcal{I}_x$ is the ideal sheaf of the point $x$ in $X_t$. Note also that if $\nu : Y \to X_t$ is the normalization of one of our fibers $X_t$ and $z$ is the exceptional divisor (the pre-image of the singular point $x$), then $\nu_*(\mathcal{O}_Y(-z))$ is the maximal ideal of $x$.

The variety $\hat{X}$ is a fine moduli space. This means that there exists a coherent sheaf $\mathcal{P}$ on $X \times_B \hat{X}$, flat over $\hat{X}$, whose restrictions to the fibers of $\hat{\pi}$ are torsion free, and of rank 1 and degree 0. The sheaf $\mathcal{P}$ is defined up to tensor product by the pullback of a line bundle on $\hat{X}$, and is called the universal Poincaré sheaf, which we will normalize by letting $\mathcal{P}|_{\Theta \times_B \hat{X}} \cong \mathcal{O}_X$. We shall henceforth assume that $\mathcal{P}$ is normalized in this way, so that

\[ \mathcal{P} = \mathcal{I}_\Delta \otimes \pi^* \mathcal{O}_X(\Theta) \otimes \hat{\pi}^* \mathcal{O}_X(\Theta) \otimes q^* \omega^{-1} \]

where $\pi$, $\hat{\pi}$ and $q = p \circ \pi = \hat{p} \circ \hat{\pi}$ refer to the diagram

\[
\begin{array}{ccc}
X \times_B X & \xrightarrow{\ \hat{\pi}\ } & X \\
{\scriptstyle \pi} \downarrow & \searrow {\scriptstyle q} & \downarrow {\scriptstyle \hat{p}} \\
X & \xrightarrow{\ p\ } & B
\end{array}
\]

and $\mathcal{I}_\Delta$ is the ideal sheaf of the diagonal immersion $X \hookrightarrow X \times_B X$. Starting with the diagram and with the kernel given by the normalized relative universal Poincaré sheaf $\mathcal{P}$ on the fibered product $X \times_B X$, we define the relative Fourier–Mukai transform as

\[ \Phi = \Phi^{\mathcal{P}} : D(X) \to D(X), \qquad F \mapsto \Phi(F) = R\hat{\pi}_*(\pi^* F \otimes \mathcal{P}). \]
Note that $\Phi(F)$ can be generalized if we allow changes in the base space $B$, that is, we consider base-change morphisms $g : S \to B$. We close this section with some remarks:

An important feature of Fourier–Mukai functors is that they are exact as functors of triangulated categories. In more familiar terms, we can say that for any exact sequence $0 \to N \to F \to G \to 0$ of coherent sheaves on $X$, we obtain a long exact sequence

\[ \cdots \to \Phi^{i-1}(G) \to \Phi^{i}(N) \to \Phi^{i}(F) \to \Phi^{i}(G) \to \Phi^{i+1}(N) \to \cdots \]

where we have written $\Phi = \Phi^E$ and $\Phi^i(F) = \mathcal{H}^i(\Phi(F))$ denotes the $i$th cohomology sheaf of the complex $\Phi(F)$.

Given a Fourier–Mukai functor $\Phi^E$, a complex $F$ in $D(X)$ satisfies the WIT$_i$ condition (or is WIT$_i$) if there is a coherent sheaf $G$ on $\hat{X}$ such that $\Phi^E(F) \simeq G[i]$ in $D(\hat{X})$, where $G[i]$ is the associated complex concentrated in degree $i$. Furthermore, we say that $F$ satisfies the IT$_i$ condition if, in addition, $G$ is locally free. When the kernel $E$ is simply a sheaf $Q$ on $X \times \hat{X}$ flat over $\hat{X}$, the cohomology and base-change theorem (cf. Hartshorne (1977)) allows one to show that a coherent sheaf $F$ on $X$ is IT$_i$ if and only if $H^j(X, F \otimes Q_\xi) = 0$ for all $\xi \in \hat{X}$ and for all $j \neq i$, where $Q_\xi$ denotes the restriction of $Q$ to $X \times \{\xi\}$, and that $F$ is WIT$_0$ if and only if it is IT$_0$. The acronym "IT" stands for "index theorem," while "W" stands for "weak." This terminology comes from Nahm transforms for connections on tori in complex differential geometry.

The Parseval formula for the relative Fourier–Mukai transform was proved by Mukai for his original Fourier–Mukai transform on abelian varieties and can be extended to any situation in which a Fourier–Mukai transform is fully faithful.

For physical applications, it is often convenient to work in cohomology $H^*(X, \mathbb{Q})$. The passage from $D(X)$ to $H^*(X, \mathbb{Q})$ can be described as follows. We first send a complex $Z \in D(X)$ to its natural class in the K-group; we then make use of the fact that the Chern character ch maps $K(X) \to CH^*(X) \otimes \mathbb{Q}$, and finally we apply the cycle map to $H^*(X, \mathbb{Q})$. This passage (by abuse of notation) is often denoted by $\mathrm{ch} : D(X) \to H^{\mathrm{even}}(X, \mathbb{Q})$; it commutes with pullbacks and transforms tensor products into dot products. Moreover, if we substitute the Mukai vector $v(Z) = \mathrm{ch}(Z) \sqrt{\mathrm{Td}(X)}$ for the Chern character $\mathrm{ch}(Z)$, then we find the commutative diagram

\[
\begin{array}{ccc}
D(X) & \xrightarrow{\ \Phi^{E}\ } & D(Y) \\
{\scriptstyle v} \downarrow & & \downarrow {\scriptstyle v} \\
H^*(X, \mathbb{Q}) & \xrightarrow{\ \Phi^{v(E)}\ } & H^*(Y, \mathbb{Q})
\end{array}
\]
This can be shown using the Grothendieck– Riemann–Roch theorem and the fact that the power series defining the Todd class starts with constant term 1 and thus is invertible.
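As a small worked case of this invertibility remark, the square root entering the Mukai vector can be written out explicitly. The specialization to a Calabi–Yau 3-fold (so that $c_1(X) = 0$ and classes of real degree above 6 vanish) is an assumption made here for illustration; the remark above holds in general.

```latex
\begin{align*}
\mathrm{Td}(X) &= 1 + \tfrac{1}{2} c_1 + \tfrac{1}{12}\bigl(c_1^2 + c_2\bigr) + \tfrac{1}{24} c_1 c_2
               \;=\; 1 + \tfrac{1}{12} c_2(X) \qquad (c_1(X) = 0), \\
\sqrt{\mathrm{Td}(X)} &= 1 + \tfrac{1}{24} c_2(X),
\qquad \text{since } \bigl(1 + \tfrac{1}{24} c_2\bigr)^2
  = 1 + \tfrac{1}{12} c_2 + \tfrac{1}{576} c_2^2
  \ \text{ and } \ c_2^2 \in H^8(X, \mathbb{Q}) = 0 .
\end{align*}
```

The series terminates because of the dimension of $X$, which is why the square root (and the inverse) of the Todd class always exists as a finite expression.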
Vector Bundles for Heterotic Strings

A compactification of the ten-dimensional heterotic string is given by a holomorphic, stable $G$-bundle $V$ (with $G$ some Lie group specified below) over a Calabi–Yau manifold $X$. The Calabi–Yau condition and the holomorphy and stability of $V$ are a direct consequence of the required supersymmetry in the uncompactified spacetime. We assume that the underlying ten-dimensional space $M_{10}$ is decomposed as $M_{10} = M_4 \times X$, where $M_4$ (the uncompactified spacetime) denotes the four-dimensional Minkowski space and $X$ a six-dimensional compact space given by a Calabi–Yau 3-fold. To be more precise: supersymmetry requires that the connection $A$ on $V$ satisfies

\[ F_A^{2,0} = F_A^{0,2} = 0, \qquad F^{1,1} \wedge J^2 = 0 \]

where $J$ denotes a Kähler form of $X$. It follows that the connection has to be a holomorphic connection on a holomorphic vector bundle and, in addition, satisfies the Donaldson–Uhlenbeck–Yau equation, which has a unique solution if and only if the vector bundle is polystable.

In addition to $X$ and $V$, we have to specify a $B$-field on $X$ of field strength $H$. In order to get an anomaly-free theory, the Lie group $G$ is fixed to be either $E_8 \times E_8$ or $\mathrm{Spin}(32)/\mathbb{Z}_2$ or one of their subgroups, and $H$ must satisfy the identity

\[ dH = \mathrm{tr}\, R \wedge R - \mathrm{Tr}\, F \wedge F \]

where $R$ and $F$ are, respectively, the associated curvature forms of the spin connection on $X$ and the gauge connection on $V$. Also tr refers to the trace of the composite endomorphism of the tangent bundle to $X$ and Tr denotes the trace in the adjoint representation of $G$. For any closed four-dimensional submanifold $X_4$ of the ten-dimensional spacetime $M_{10}$, the 4-form $\mathrm{tr}\, R \wedge R - \mathrm{Tr}\, F \wedge F$ must have trivial cohomology. Thus, a necessary topological condition $V$ has to satisfy is $\mathrm{ch}_2(TX) = \mathrm{ch}_2(V)$, which simplifies to $c_2(TX) = c_2(V)$ for Calabi–Yau manifolds, $V$ being an $SU(n)$ vector bundle.
A physical interpretation of the third Chern class can be given as a result of the decomposition of the ten-dimensional spacetime into a four-dimensional flat Minkowski space and $X$. The decomposition of the corresponding ten-dimensional Dirac operator with values in $V$ shows that massless four-dimensional fermions are in one-to-one correspondence with zero modes of the Dirac operator $D_V$ on $X$. The index of $D_V$ can be effectively computed using the Hirzebruch–Riemann–Roch theorem and is given by

\[ \mathrm{index}(D) = \int_X \mathrm{Td}(X)\, \mathrm{ch}(V) = \frac{1}{2} \int_X c_3(V); \]

equivalently, we can write the index as $\mathrm{index}(D) = \sum_{k=0}^{3} (-1)^k \dim H^k(X, V)$. For stable vector bundles, we have $H^0(X, V) = H^3(X, V) = 0$ and so the index computes the net number of fermion generations $N_{\mathrm{gen}}$ in the respective model.

Now it has been observed that the inclusion of background 5-branes changes the anomaly constraint. Various 5-brane solutions of the heterotic string equations of motion have been discussed: the gauge 5-brane, the symmetric 5-brane, and the neutral 5-brane. It has been shown that the gauge and symmetric 5-brane solutions involve finite-size instantons of an unbroken nonabelian gauge group. In contrast, the neutral 5-branes can be interpreted as zero-size instantons of the $SO(32)$ heterotic string. The magnetic 5-brane contributes a source term to the Bianchi identity for the 3-form $H$,

\[ dH = \mathrm{tr}\, R \wedge R - \mathrm{Tr}\, F \wedge F + n_5 \sum_{\text{five-branes}} \delta_5^{(4)} \]

and integration over a 4-cycle in $X$ gives the anomaly constraint

\[ c_2(TX) = c_2(V) + [W]. \]

The new term $\delta_5^{(4)}$ is a current that integrates to 1 in the direction transverse to a single 5-brane whose class is denoted by $[W]$. The class $[W]$ is the Poincaré dual of an integer sum of all these sources and thus $[W]$ should be an integral class, representing a class in $H_2(X, \mathbb{Z})$. $[W]$ can be further specified by taking into account that supersymmetry requires that 5-branes are wrapped on holomorphic curves, and thus $[W]$ must correspond to the homology class of holomorphic curves. This fact constrains $[W]$ to be an algebraic class. Further, algebraic classes include negative classes; however, these lead to negative magnetic charges, which are unphysical, and so they have to be excluded. This constrains $[W]$ to be an effective class. Thus, for a given Calabi–Yau 3-fold $X$ the effectivity of $[W]$ constrains the choice of vector bundles $V$.

The study of the correspondence between the heterotic string (on an elliptic Calabi–Yau 3-fold) and F-theory (on an elliptic Calabi–Yau fourfold) has led Friedman, Morgan, and Witten to introduce a new class of vector bundles which satisfy the anomaly constraint with $[W]$ nonzero. As a result, they prove that the number obtained by integration of $[W]$ over the elliptic fibers of the Calabi–Yau 3-fold agrees with the number of 3-branes given by the Euler characteristic of the Calabi–Yau fourfold divided by 24.

Fourier–Mukai Transforms and Spectral Covers

Let us now describe how the construction of vector bundles out of spectral data (first considered by Hitchin and by Beauville, Narasimhan, and Ramanan) can be easily described in the case of elliptic fibrations by means of the relative Fourier–Mukai transform. This construction was widely exploited by Friedman, Morgan, and Witten to construct stable vector bundles on elliptic Calabi–Yau 3-folds $X$, which we will summarize now.

If $V \to X$ is a vector bundle of rank $n$ which is semistable and of degree 0 on each fiber $f$ of $X \to B$, then its Fourier–Mukai transform $\Phi^1(V)$ is a torsion sheaf of pure dimension 2 on $X$. The support of $\Phi^1(V)$ is a surface $i : C \hookrightarrow X$, which is finite of degree $n$ over $B$. Moreover, $\Phi^1(V)$ is of rank 1 on $C$ and, if $C$ is smooth, then $\Phi^1(V) = i_* L$ is just the extension by zero of some line bundle $L \in \mathrm{Pic}(C)$. Conversely, given a sheaf $G \to X$ of pure dimension 2 which is flat over $B$, then $\Phi(G)$ is a vector bundle on $X$ of rank equal to the degree of $\mathrm{supp}(G)$ over $B$. This correspondence between vector bundles on $X$ and sheaves on $X$ supported on finite covers of $B$ is known as the spectral cover construction. The torsion sheaf $G$ is called the spectral sheaf (or line bundle) and the surface $C = \mathrm{supp}(G)$ is called the spectral cover.

For the description of vector bundles on elliptic Calabi–Yau 3-folds $X$ it is appropriate to take $i_* L$ with Chern characters given by ($\eta_E, \eta \in H^2(B, \mathbb{Q})$ and $a_E, s_E \in \mathbb{Z}$; here $\sigma$ denotes the class of the section and $\pi^*$ the pullback from $B$)

\[ \mathrm{ch}_0(i_* L) = 0, \qquad \mathrm{ch}_1(i_* L) = n\sigma + \pi^* \eta, \qquad \mathrm{ch}_2(i_* L) = \sigma\, \pi^* \eta_E + a_E f, \qquad \mathrm{ch}_3(i_* L) = s_E . \]

The characteristic classes of the rank-$n$ vector bundle $V$ can be obtained if we apply the Grothendieck–Riemann–Roch theorem to the projection $\hat{\pi}$:

\[ \mathrm{ch}(V) = \hat{\pi}_* \bigl[ \pi^*(\mathrm{ch}(i_* L)) \cdot \mathrm{ch}(\mathcal{P}) \cdot \pi^* \mathrm{Td}(T_{X/B}) \bigr] \]

with $\mathrm{Td}(T_{X/B})$ as given above.
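The index formula $\mathrm{index}(D) = \tfrac{1}{2} \int_X c_3(V)$ quoted earlier in this section can be checked in a few lines. The sketch below assumes an $SU(n)$ bundle, so $c_1(V) = 0$, on a Calabi–Yau 3-fold, so $c_1(X) = 0$; it uses only the standard expansions of the Chern character and the Todd class.

```latex
\begin{align*}
\mathrm{ch}(V) &= n + c_1 + \tfrac{1}{2}\bigl(c_1^2 - 2 c_2\bigr) + \tfrac{1}{6}\bigl(c_1^3 - 3 c_1 c_2 + 3 c_3\bigr)
              \;=\; n - c_2(V) + \tfrac{1}{2} c_3(V) \qquad (c_1(V) = 0), \\
\mathrm{Td}(X) &= 1 + \tfrac{1}{12} c_2(X) \qquad (c_1(X) = 0), \\
\bigl[\mathrm{Td}(X)\, \mathrm{ch}(V)\bigr]_{\deg 6}
  &= \tfrac{1}{2} c_3(V) + \tfrac{1}{12}\, c_2(X)\, c_1(V) \;=\; \tfrac{1}{2} c_3(V).
\end{align*}
```

Only the top-degree part contributes to the integral over $X$, which gives the stated result.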
To make sure that the construction leads to $SU(n)$ vector bundles we set $\eta_E = \tfrac{1}{2} n c_1$ (with $c_1 = c_1(B)$), giving $c_1(V) = 0$, and the remaining Chern classes are given by

\[ c_2(V) = \pi^*(\eta)\, \sigma + \pi^*(\varpi), \qquad c_3(V) = -2 \gamma|_S \]

where

\[ \varpi = -\frac{1}{24}\, c_1(B)^2 (n^3 - n) + \frac{1}{2} \Bigl( \lambda^2 - \frac{1}{4} \Bigr) n\, \eta\, (\eta - n c_1(B)) \]

and $\gamma \in H^{1,1}(C, \mathbb{Z})$ is some cohomology class satisfying $\pi_{C*} \gamma = 0 \in H^{1,1}(B, \mathbb{Z})$, with $\pi_C$ the projection of $C$ onto $B$. The general solution for $\gamma$ has been derived by Friedman, Morgan, and Witten and is given by $\gamma = \lambda (n \sigma|_C - \pi_C^* \eta + n \pi_C^* c_1(B))$ and $\gamma|_S = -\lambda\, \pi^* \eta (\eta - n c_1(B))$ with $S = C \cap \sigma$. The parameter $\lambda$ has to be determined such that $c_1(L)$ is an integer class. If $n$ is even, $\lambda = m$ ($m \in \mathbb{Z}$) and in addition we must impose $\eta = c_1(B)$ modulo 2. If $n$ is odd, $\lambda = m + 1/2$.

It remains to discuss the stability of $V$. The stability depends on the properties of the defining data $C$ and $L$. If $C$ is irreducible and $L$ a line bundle over $C$ then $V$ will be a vector bundle stable with respect to the polarization

\[ J = \epsilon J_0 + \pi^* H_B, \qquad \epsilon > 0 \]

if $\epsilon$ is sufficiently small. This has been proved by Friedman, Morgan, and Witten under the additional assumption that the restriction of $V$ to the generic fiber is regular and semistable. Here $J_0$ refers to some arbitrary Kähler class on $X$ and $H_B$ a Kähler class on the base $B$. It implies that the bundle $V$ can be taken to be stable with respect to $J$ while keeping the volume of the fiber $f$ of $X$ arbitrarily small compared to the volumes of effective curves associated with the base.

That $J$ is actually a good polarization can be seen by assuming $\epsilon = 0$. Now we observe that $\pi^* H_B$ is not a Kähler class on $X$: its integral is non-negative on each effective curve $C$ in $X$, but there is one curve, the fiber $f$, where the integral vanishes. This means that $\pi^* H_B$ is on the boundary of the Kähler cone and, to make $V$ stable, we have to move slightly into the interior of the Kähler cone, that is, into the chamber which is closest to the boundary point $\pi^* H_B$. Also we note that although $\pi^* H_B$ is in the boundary of the Kähler cone, we can still define the slope $\mu_{\pi^* H_B}(V)$ with respect to it. Since $(\pi^* H_B)^2$ is some positive multiple of the class of the fiber $f$, semistability with respect to $\pi^* H_B$ is implied by the semistability of the restrictions $V|_f$ to the fibers.

Assume that $V$ is not stable with respect to $J$; then there is a destabilizing sub-bundle $V' \subset V$ with $\mu_J(V') \geq \mu_J(V)$. But semistability along the fibers says that $\mu_{\pi^* H_B}(V') \leq \mu_{\pi^* H_B}(V)$. If we had equality, it would follow that $V'$ arises by the spectral construction from a proper subvariety of the spectral cover of $V$, contradicting the assumption that this cover is irreducible. So we must have a strict inequality $\mu_{\pi^* H_B}(V') < \mu_{\pi^* H_B}(V)$. Now taking $\epsilon$ small enough, we can ensure that $\mu_J(V') < \mu_J(V)$; thus $V'$ cannot destabilize $V$.
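The Chern class formulas of this section combine into a simple count. Reading $c_3(V) = -2\gamma|_S$ and $\gamma|_S = -\lambda\, \eta(\eta - n c_1(B))$ as intersection numbers on $B$ gives $\tfrac{1}{2}\int_X c_3(V) = \lambda\, \eta \cdot (\eta - n c_1(B))$ for the net number of generations. The following is a minimal sketch under that reading; the function names and the sample base $B = \mathbb{P}^2$ are illustrative assumptions, not part of the construction described above, and the extra condition $\eta = c_1(B)$ modulo 2 for even $n$ is not checked.

```python
from fractions import Fraction


def lambda_allowed(n, lam):
    """Parity rule quoted in the text: lam in Z for even n, lam in Z + 1/2 for odd n."""
    lam = Fraction(lam)
    shift = Fraction(0) if n % 2 == 0 else Fraction(1, 2)
    return (lam - shift).denominator == 1


def net_generations(n, lam, eta_sq, eta_c1):
    """Return (1/2) * integral of c_3(V) = lam * (eta.eta - n * eta.c_1(B)).

    eta_sq and eta_c1 are the intersection numbers eta.eta and eta.c_1(B) on B.
    """
    if not lambda_allowed(n, lam):
        raise ValueError("lam violates the parity rule for this n")
    return Fraction(lam) * (eta_sq - n * eta_c1)


# Hypothetical input: B = P^2 with hyperplane class H, so c_1(B) = 3H and H.H = 1.
# Take n = 5, eta = 16H, lam = 1/2, hence eta.eta = 256 and eta.c_1(B) = 48.
print(net_generations(5, Fraction(1, 2), 256, 48))  # 1/2 * (256 - 240) = 8
```

Exact rational arithmetic is used because $\lambda$ is half-integral for odd $n$, and a floating-point value could hide a violation of the parity rule.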
D-Branes and Homological Mirror Symmetry

Kontsevich proposed a homological mirror symmetry for a pair $(X, Y)$ of mirror dual Calabi–Yau manifolds; it is conjectured that there exists a categorical equivalence between the bounded derived category $D(X)$ and Fukaya's $A_\infty$ category $\mathcal{F}(Y)$, which is defined by using the symplectic structure on $Y$. A Lagrangian submanifold with a flat bundle gives an object of $\mathcal{F}(Y)$. If we consider a locally trivial family of symplectic manifolds $Y$ (i.e., the symplectic form is locally constant as we vary $Y$ in the family), the object of $\mathcal{F}(Y)$ undergoes monodromy transformations going round a loop in the base. On the other hand, the object of $D(X)$ is a complex of coherent sheaves on $X$, and under the categorical equivalence between $D(X)$ and $\mathcal{F}(Y)$ the monodromy (of 3-cycles) is mapped to certain self-equivalences in $D(X)$.

Since all elements in $D(X)$ can be represented by suitable complexes of vector bundles on $X$, we can consider the topological K-group and the image $K_{\mathrm{hol}}(X)$ of $D(X)$. The Fourier–Mukai transform $\Phi^E : D(X) \to D(X)$ then induces a corresponding automorphism $K_{\mathrm{hol}}(X) \to K_{\mathrm{hol}}(X)$ and also an automorphism on $H^{\mathrm{even}}(X, \mathbb{Q})$ if we use the Chern character ring homomorphism $\mathrm{ch} : K(X) \to H^{\mathrm{even}}(X, \mathbb{Q})$, as described above. With this in mind, we can introduce various kernels and their associated monodromy transformations.

For instance, let $D$ be the associated divisor defining the large-radius limit in the Kähler moduli space and consider the kernel $\mathcal{O}_\Delta(D)$, with $\Delta$ being the diagonal in $X \times X$. The corresponding Fourier–Mukai transform acts on an object $G \in D(X)$ as twisting by a line bundle, that is, $G \mapsto G \otimes \mathcal{O}(D)$. This automorphism is then identified with the monodromy about the large complex structure limit point (LCSL point) in the complex structure moduli space.

Furthermore, if we consider the kernel given by the ideal sheaf $\mathcal{I}_\Delta$ of $\Delta$, we find that the action of $\Phi^{\mathcal{I}_\Delta}$ on $H^{\mathrm{even}}(X)$ can be expressed by taking the Chern character ring homomorphism:

\[ \mathrm{ch}(\Phi^{\mathcal{I}_\Delta}(G)) = \mathrm{ch}_0(\Phi^{\mathcal{O}_{X \times X}}(G)) - \mathrm{ch}(G) = \Bigl( \int \mathrm{ch}(G) \cdot \mathrm{Td}(X) \Bigr) - \mathrm{ch}(G) \]
Kontsevich proposed that this automorphism should reproduce the monodromy about the principal component of the discriminant of the mirror family $Y$. At the principal component we have vanishing $S^3$ cycles (and the conifold singularity); thus the action of this monodromy on cohomology may be identified with the Picard–Lefschetz formula.

Now for a given pair of mirror dual Calabi–Yau 3-folds, it is generally assumed that A-type and B-type D-branes exchange under mirror symmetry. For such a pair, Kontsevich's correspondence between automorphisms of $D(X)$ and monodromies of 3-cycles can then be tested. More specifically, a comparison relies on the identification of two central charges associated to D-brane configurations on both sides of the mirror pair.

For this, we first have to specify a basis for the 3-cycles $\Sigma_i \in H^3(Y, \mathbb{Z})$ such that the intersection form takes the canonical form $\Sigma_i \cdot \Sigma_j = \delta_{j,\, i + b_{2,1} + 1} = \eta_{i,j}$ for $i = 0, \ldots, b_{2,1}$. It follows that a 3-brane wrapped about the cycle $\Sigma = \sum_i n_i \Sigma_i$ has an (electric, magnetic) charge vector $n = (n_i)$. The periods of the holomorphic 3-form $\Omega$ are then given by

\[ \Pi_i = \int_{\Sigma_i} \Omega \]

and can be used to provide projective coordinates on the complex structure moduli space. If we choose a symplectic basis $(A_i, B_j)$ of $H_3(Y, \mathbb{Z})$ then the $A_i$ periods serve as projective coordinates and the $B_j$ periods satisfy the relations $\Pi^j = \eta_{i,j}\, \partial F / \partial \Pi^i$, where $F$ is the prepotential which has, near the large-radius limit, the asymptotic form (as analyzed by Candelas, Klemm, Theisen, Yau, and Hosono, cf. "Further reading"):

\[ F = \frac{1}{6} \sum_{abc} k_{abc}\, t_a t_b t_c + \frac{1}{2} \sum_{ab} c_{ab}\, t_a t_b - \sum_a \frac{c_2(X) J_a}{24}\, t_a + \frac{\zeta(3)}{2 (2\pi i)^3}\, \chi(X) + \mathrm{const.} \]

where $\chi(X)$ is the Euler characteristic of $X$, $c_{ab}$ are rational constants (with $c_{ab} = c_{ba}$) reflecting an $Sp(2h^{1,1} + 2)$ ambiguity, and $k_{abc}$ is the classical triple intersection number given by

\[ k_{abc} = \int_X J_a \wedge J_b \wedge J_c . \]

The periods determine the central charge $Z(n)$ of a 3-brane wrapped about the cycle $\Sigma = \sum_i n_i [\Sigma_i]$:

\[ Z(n) = \int_\Sigma \Omega = \sum_i n_i \Pi_i . \]
On the other hand, the central charge associated with an object $E$ of $D(X)$ is given by

\[ Z(E) = -\int_X e^{-t_a J_a}\, \mathrm{ch}(E) \Bigl( 1 + \frac{c_2(X)}{24} \Bigr). \]

Now, physically it is assumed that the two central charges are to be identified under mirror symmetry. If we compare the two central charges $Z(n)$ and $Z(E)$, then we obtain a map relating the Chern characters $\mathrm{ch}(E)$ of $E$ to the D-brane charges $n$. If we insert the expressions for $\mathrm{ch}(E)$ in $\mathrm{ch}(\Phi^{\mathcal{I}_\Delta}(E))$, it yields a linear transformation acting on $n$, such that $n_6 \to n_6 + n_3$, which agrees with the monodromy transformation about the conifold locus. Similarly, the monodromy transformation about the LCSL point corresponding to automorphisms $[E] \to [E \otimes \mathcal{O}_X(D)]$ can be made explicit.

Using the central charge identification, the automorphism/monodromy correspondence has been made explicit for various dual pairs of mirror Calabi–Yau 3-folds (given as hypersurfaces in weighted projective spaces). This identification provides evidence for Kontsevich's proposal of homological mirror symmetry.

See also: Derived Categories; Mirror Symmetry: A Geometric Survey.
Further Reading

Altman AB and Kleiman SL (1980) Compactifying the Picard scheme. Advances in Mathematics 35: 50–112.
Bartocci C, Bruzzo U, Hernández Ruipérez D, and Jardim M (2006) Nahm and Fourier–Mukai transforms in geometry and mathematical physics. Progress in Mathematical Physics (to appear).
Bondal AI and Orlov DO (2001) Reconstruction of a variety from the derived category and groups of autoequivalences. Compositio Mathematica 125: 327–344.
Beauville A, Narasimhan MS, and Ramanan S (1989) Spectral curves and the generalised theta divisor. J. Reine Angew. Math. 398: 169–179.
Callan CG Jr., Harvey JA, and Strominger A (1991) Worldbrane actions for string solitons. Nuclear Physics B 367: 60–82.
Candelas P, Font A, Katz S, and Morrison DR (1994) Mirror symmetry for two-parameter models. II. Nuclear Physics B 429: 626–674.
Donagi RY (1997) Principal bundles on elliptic fibrations. Asian Journal of Mathematics 1: 214–223 (alg-geom/9702002).
Donagi RY (1998) Taniguchi lecture on principal bundles on elliptic fibrations, hep-th/9802094.
Friedman R, Morgan JW, and Witten E (1997) Vector bundles and F theory. Communications in Mathematical Physics 187: 679–743.
Fulton W (1984) Intersection Theory, Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 2. Berlin: Springer.
Hartshorne R (1977) Algebraic Geometry. Berlin: Springer.
Huybrechts D (2006) Fourier–Mukai transforms in algebraic geometry. Oxford: Oxford University Press (to appear).
Huybrechts D and Lehn M (1997) The geometry of moduli spaces of sheaves. Braunschweig/Wiesbaden: Vieweg & Sohn.
Hitchin NJ (1987) The self-duality equations on a Riemann surface. Proceedings of London Mathematical Society 55(3): 59–126.
Hosono S, Klemm A, Theisen S, and Yau S-T (1995) Mirror symmetry, mirror map and applications to complete intersection Calabi–Yau spaces. Nuclear Physics B 433: 501–552.
Hosono S (1998) GKZ systems, Gröbner fans, and moduli spaces of Calabi–Yau hypersurfaces. In: Topological Field Theory, Primitive Forms and Related Topics (Kyoto, 1996), Progr. Math., vol. 160, pp. 239–265. Boston: Birkhäuser.
Kawamata Y (2002) D-equivalence and K-equivalence. Journal of Differential Geometry 61: 147–171.
Kodaira K (1963a) On compact analytic surfaces. II, III. Annals of Mathematics 77(2): 563–626.
Kodaira K (1963b) On compact analytic surfaces. I, III. Annals of Mathematics 78: 1–40.
Kontsevich M (1995) Homological algebra of mirror symmetry. In: Proceedings of the International Congress of Mathematicians, vol. 1, 2 (Zürich, 1994), pp. 120–139. Basel: Birkhäuser.
Miranda R (1983) Smooth models for elliptic threefolds. In: The Birational Geometry of Degenerations (Cambridge, MA, 1981), Progr. Math., vol. 29, pp. 85–133. Boston: Birkhäuser.
Mukai S (1981) Duality between $D(X)$ and $D(\hat{X})$ with its application to Picard sheaves. Nagoya Math. J. 81: 153–175.
Orlov DO (1997) Equivalences of derived categories and K3 surfaces. Journal of Mathematical Sciences (New York) 84: 1361–1381 (Algebraic geometry, 7).
Witten E (1986) New issues in manifolds of SU(3) holonomy. Nuclear Physics B 268: 79–112.
Four-Manifold Invariants and Physics
C Nash, National University of Ireland, Maynooth, Ireland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Manifolds of dimension 4 play a distinguished role in physics and have done so ever since special and general relativity ushered in the celebrated four-dimensional spacetime. It is also the case that manifolds of dimension 4 play a distinguished role in mathematics: many generalities about manifolds of a general dimension do not apply in dimension 4; there are also phenomena in dimension 4 with no counterpart in other dimensions. This article describes some of the more important physical and mathematical properties of dimension 4. We begin with an account of some topological and geometric properties for manifolds in general, but avoiding dimension 4, and then embark on the dimension 4 discussion. The references at the end will serve to take the reader further into the subject.
Topological, Piecewise-Linear, and Differentiable Structures for Manifolds

In dealing with topological spaces which are manifolds, one distinguishes three types of manifolds $M$: topological, piecewise-linear, and differentiable (also called smooth). It is possible to describe the more important differences between these three types using topological techniques.

Consider then a manifold $M$ of dimension $n$; $M$ will always be assumed to be compact, connected and closed unless we indicate the contrary. The type of $M$ is determined by examining whether the transition functions $g_{\alpha\beta}$ are homeomorphisms, (invertible) piecewise-linear maps, or diffeomorphisms. Now, since the transition functions are maps from one subset of $\mathbb{R}^n$ to another, we introduce the groups $TOP_n$, $PL_n$, and $DIFF_n$, which are all the homeomorphisms, piecewise-linear maps, and diffeomorphisms of $\mathbb{R}^n$, respectively. We are naturally led to the three sets of inclusions:

\[
\begin{aligned}
TOP_1 &\subset TOP_2 \subset \cdots \subset TOP_n \subset \cdots \\
PL_1 &\subset PL_2 \subset \cdots \subset PL_n \subset \cdots \\
DIFF_1 &\subset DIFF_2 \subset \cdots \subset DIFF_n \subset \cdots
\end{aligned}
\qquad [1]
\]

For each of the three sets of inclusions we pass to the direct limit and construct the three limiting groups

\[ TOP, \qquad PL, \qquad DIFF \qquad [2] \]

With these three groups are associated the classifying spaces $BTOP$, $BPL$ and $BDIFF$. The transition functions $g_{\alpha\beta}$ are those of the tangent bundle to $M$; there are three possible tangent bundles depending on the type of $M$, and we denote these tangent bundles by $TM_{TOP}$, $TM_{PL}$, and $TM_{DIFF}$ in an obvious notation. Then to determine the tangent bundles $TM_{TOP}$, $TM_{PL}$, and $TM_{DIFF}$ one simply selects an element of the homotopy classes

\[ [M, BTOP], \qquad [M, BPL], \qquad \text{and} \qquad [M, BDIFF] \qquad [3] \]
respectively. Given this threefold hierarchy of manifold structures one wishes to know when one can straighten out a topological manifold to make it piecewise linear; and also, when can one smooth a piecewise-linear manifold to make it differentiable?
If $\dim M \geq 5$, these two questions can be formulated as lifting problems.

TOP versus PL for dim M ≠ 4

Taking the first of them, so that we are comparing piecewise-linear and topological structures on $M$, one can check that $BPL$ fibers over $BTOP$ with fiber $TOP/PL$, yielding

\[
\begin{array}{ccc}
TOP/PL & \longrightarrow & BPL \\
 & & \downarrow \\
 & & BTOP
\end{array}
\qquad [4]
\]

A method for straightening out a topological manifold is now apparent: a topological manifold is a choice of map $\alpha : M \to BTOP$, and a factorization of $\alpha$ through $BPL$ will give $M$ a PL structure. We show this below

\[
\begin{array}{ccc}
 & & BPL \\
 & {\scriptstyle \beta} \nearrow & \downarrow {\scriptstyle \pi} \\
M & \xrightarrow{\ \alpha\ } & BTOP
\end{array}
\qquad [5]
\]

The existence of the map $\beta : M \to BPL$ satisfying $\pi \circ \beta = \alpha$ provides $M$ with a PL structure and is a lifting of the map $\alpha$ from the base $BTOP$ to the total space $BPL$. This lifting method, for passing from TOP structures to PL structures, does work, provided $\dim M \geq 5$, since we have the stability result that

\[ \frac{TOP}{PL} \simeq \frac{TOP_n}{PL_n}, \qquad n \geq 5 \qquad [6] \]

For the map $\beta$ to exist the obstructions to the lifting, which are cohomology classes of the form

\[ H^{k+1}(M; \pi_k(TOP/PL)) \qquad [7] \]

must vanish. However, Kirby and Siebenmann have shown that

\[ TOP/PL \simeq K(\mathbb{Z}_2, 3) \qquad [8] \]

where $K(\mathbb{Z}_2, 3)$ is an Eilenberg–Mac Lane space, so that its sole nonvanishing homotopy group is in dimension 3, giving us

\[ \pi_n(TOP/PL) = \begin{cases} \mathbb{Z}_2 & \text{if } n = 3 \\ 0 & \text{otherwise} \end{cases} \qquad [9] \]

Any obstruction to $\beta$'s existence is a class $e(M)$, say, in

\[ H^4(M; \mathbb{Z}_2), \qquad \dim M \geq 5 \qquad [10] \]

When $e(M)$ vanishes, the map $\beta$ exists and furnishes $M$ with a PL structure; if $e(M) = 0$ it is natural to go on to ask how many (homotopy classes of) such $\beta$'s exist. Standard obstruction theory says the relevant homotopy classes are just the whole cohomology group

\[ H^k(M; \pi_k(TOP/PL)) \qquad [11] \]

which, since $k = 3$, is just

\[ H^3(M; \mathbb{Z}_2) \qquad [12] \]

So, for $\dim M \geq 5$, we see that when a closed topological manifold $M$ acquires a PL structure by the lifting process just described, then the possible distinct PL structures are isomorphic to

\[ H^3(M; \mathbb{Z}_2) \qquad [13] \]

which is not zero in general. Finally, if $\dim M \leq 3$, then the notions PL and TOP coincide, so we are left with the case $\dim M = 4$, which we shall come to below. Now we wish to describe the next step in the sequence TOP, PL, DIFF, which is the smoothing problem.

PL versus DIFF for dim M ≠ 4

Similar ideas are used to address the question of smoothing a piecewise-linear manifold; however, the results are different. Let us assume that $M$ is a closed PL manifold with $\dim M \geq 5$. This time the fibration is

\[
\begin{array}{ccc}
PL/DIFF & \longrightarrow & BDIFF \\
 & & \downarrow \\
 & & BPL
\end{array}
\qquad [14]
\]

The smoothing of a piecewise-linear $M$ can also be handled with obstruction theory and leads us immediately to the consideration of the homotopy groups $\pi_n(PL/DIFF)$. This time the nontrivial homotopy groups of the fiber are much more numerous than in the piecewise-linear case. In fact one has

\[
\pi_n(PL/DIFF) =
\begin{cases}
0 & \text{if } n \leq 6 \\
\mathbb{Z}_{28} & \text{if } n = 7 \\
\mathbb{Z}_2 & \text{if } n = 8 \\
\ \vdots & \quad \vdots \\
\mathbb{Z}_{992} & \text{if } n = 11 \\
\ \vdots & \quad \vdots
\end{cases}
\qquad [15]
\]

The obstructions to passing from a PL to a DIFF structure on $M$ now lie in

\[ H^{k+1}(M; \pi_k(PL/DIFF)) \qquad [16] \]

and the distinct liftings are classified by the cohomology group

\[ H^k(M; \pi_k(PL/DIFF)) \qquad [17] \]
As an illustration of all this, consider the case $M = S^7$; then the first nontriviality occurs when $n = 7$ and so the obstruction to smoothing $S^7$ lies in

\[ H^8(S^7; \pi_7(PL/DIFF)) \qquad [18] \]

which is of course zero. This means that $S^7$ can be smoothed, a fact which we know from first principles. However, by the obstruction theory introduced above, the resulting smooth structures are isomorphic to

\[ H^7(S^7; \pi_7(PL/DIFF)) = H^7(S^7; \mathbb{Z}_{28}) = \mathbb{Z}_{28} \qquad [19] \]

Hence, we have the celebrated result of Milnor and of Kervaire and Milnor that $S^7$ has 28 distinct differentiable structures, 27 of which correspond to what are known as exotic spheres. Lastly, if $\dim M \leq 3$, then PL and DIFF coincide; this leaves us with the case of greatest interest, namely $\dim M = 4$.
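The counting for $S^7$ can be retraced mechanically. The sketch below is illustrative only: it uses just the orders of $\pi_k(PL/DIFF)$ quoted in [15] (the elided entries are deliberately left out), together with the standard fact that $H^j(S^m; G)$ equals $G$ for $j = 0$ or $j = m$ and vanishes otherwise. The names `PI_ORDER`, `obstruction_orders`, and `lifting_count` are hypothetical helpers introduced here.

```python
# Orders of pi_k(PL/DIFF) as quoted in [15]; the entries elided there
# (k = 9, 10 and k >= 12) are intentionally omitted.
PI_ORDER = {k: 1 for k in range(7)}        # trivial groups for k <= 6
PI_ORDER.update({7: 28, 8: 2, 11: 992})


def h_order(m, j, coeff_order):
    """Order of H^j(S^m; G) for a finite abelian group G of the given order (m >= 1)."""
    return coeff_order if j in (0, m) else 1


def obstruction_orders(m):
    """Orders of the groups H^{k+1}(S^m; pi_k(PL/DIFF)) of [16], for k = 0..m-1."""
    return {k: h_order(m, k + 1, PI_ORDER[k]) for k in range(m)}


def lifting_count(m):
    """Product of the orders of the groups H^k(S^m; pi_k(PL/DIFF)) of [17], k = 0..m."""
    total = 1
    for k in range(m + 1):
        total *= h_order(m, k, PI_ORDER[k])
    return total


print(obstruction_orders(7))  # every group that could obstruct smoothing S^7 has order 1
print(lifting_count(7))       # 28, the order of H^7(S^7; Z_28)
```

A dimension $m$ whose table entries are elided in [15] raises a `KeyError` rather than guessing a value.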
The Strange Case of Four Dimensions

In four dimensions there are phenomena which have no counterpart in any other dimension. First of all, there are topological 4-manifolds which have no smooth structure, though if they have a PL structure, then they possess a unique smooth structure. Second, the impediment to the existence of a smooth structure is of a completely different type to that met in the standard obstruction theory: it is not the pullback of an element in the cohomology of a classifying space, that is, it is not a characteristic class. Also the four-dimensional story is far from completely known. Nevertheless, there are some very striking results dating from the early 1980s onwards.

We begin by disposing of the difference between PL and DIFF structures: our earlier results together with the vanishing statement

\[ \pi_n(PL/DIFF) = 0, \qquad n \leq 6 \qquad [20] \]

mean that every PL 4-manifold possesses a unique DIFF structure. Thus, we can take the crucial difference to be between DIFF and TOP.

In Freedman (1982) all simply connected topological 4-manifolds were classified by their intersection form $q$. We recall that $q$ is a quadratic form constructed from the cohomology of $M$ as follows: take two elements $\alpha$ and $\beta$ of $H^2(M; \mathbb{Z})$ and form their cup product $\alpha \cup \beta \in H^4(M; \mathbb{Z})$; then we define $q(\alpha, \beta)$ by

\[ q(\alpha, \beta) = (\alpha \cup \beta)[M] \qquad [21] \]

where $(\alpha \cup \beta)[M]$ denotes the integer obtained by evaluating $\alpha \cup \beta$ on the generating cycle $[M]$ of the top homology group $H_4(M; \mathbb{Z})$ of $M$. Poincaré duality ensures that such a form is always nondegenerate over $\mathbb{Z}$ and so has $\det q = \pm 1$; $q$ is then called unimodular. Also we refer to $q$ as "even" if all its diagonal entries are even, and as "odd" otherwise. Freedman's work yields the following:

Theorem (Freedman). A simply connected 4-manifold $M$ with even intersection form $q$ belongs to a unique homeomorphism class, while if $q$ is odd there are precisely two nonhomeomorphic manifolds $M$ with $q$ as their intersection form.

This is a very powerful result: the intersection form $q$ very nearly determines the homeomorphism class of a simply connected $M$, and actually only fails to do so in the odd case, where there are still just two possibilities. Further, every unimodular quadratic form occurs as the intersection form of some manifold.

As an illustration of the impressive nature of Freedman's work, choose $M$ to be the sphere $S^4$; since $H^2(S^4; \mathbb{Z})$ is trivial, $q$ is the zero quadratic form and is of course even; we write this as $q = \emptyset$. Now recollect that the Poincaré conjecture in four dimensions is the statement that any homotopy 4-sphere, $S^4_h$ say, is actually homeomorphic to $S^4$. Since $H^2(S^4_h; \mathbb{Z})$ is also trivial, any $S^4_h$ also has intersection form $q = \emptyset$. Applying Freedman's theorem to $S^4_h$ immediately asserts that $S^4_h$ belongs to a unique homeomorphism class, which must be that of $S^4$, thereby establishing the Poincaré conjecture.

Freedman's result combined with a much earlier result of Rohlin (1952) also gives us an example of a nonsmoothable 4-manifold: Rohlin's theorem asserts that, given a smooth, simply connected 4-manifold with even intersection form $q$, the signature $\sigma(q)$ of $q$ (defined to be the difference between the number of positive and negative eigenvalues of $q$) is divisible by 16.
Now write
$$q = \begin{pmatrix}
 2 & -1 &  0 &  0 &  0 &  0 &  0 &  0\\
-1 &  2 & -1 &  0 &  0 &  0 &  0 &  0\\
 0 & -1 &  2 & -1 &  0 &  0 &  0 &  0\\
 0 &  0 & -1 &  2 & -1 &  0 &  0 &  0\\
 0 &  0 &  0 & -1 &  2 & -1 &  0 & -1\\
 0 &  0 &  0 &  0 & -1 &  2 & -1 &  0\\
 0 &  0 &  0 &  0 &  0 & -1 &  2 &  0\\
 0 &  0 &  0 &  0 & -1 &  0 &  0 &  2
\end{pmatrix} = E_8 \qquad [22]$$
($E_8$ is actually the Cartan matrix for the exceptional Lie algebra $e_8$), then, by inspection, q is even, and by calculation, it has signature 8. By Freedman's theorem there is a single, simply connected, 4-manifold with intersection form $q = E_8$. However, by
Rohlin's theorem, it cannot be smoothed since its signature is 8, which is not divisible by 16. The next breakthrough was due to Donaldson (1983). Donaldson's theorem is applicable to definite forms q, which by appropriate choice of orientation on M we can take to be positive definite. One has:

Theorem (Donaldson). A simply connected, smooth 4-manifold with positive-definite intersection form q is always diagonalizable over the integers to $q = \mathrm{diag}(1,\ldots,1)$.

One can immediately deduce that no simply connected 4-manifold for which q is even and positive definite can be smoothed! For example, the manifold with $q = E_8\oplus E_8$ has signature 16, so Rohlin's theorem does not exclude it. But since $E_8$ is even, so is $E_8\oplus E_8$, and so Donaldson's theorem forbids such a manifold from existing smoothly. In fact, in contrast to Freedman's theorem, which allows all unimodular quadratic forms to occur as the intersection form of some topological manifold, Donaldson's theorem says that in the positive-definite, smooth case only one quadratic form is allowed, namely the identity I.

Donaldson's work makes contact with physics because it uses the Yang–Mills equations, as we now outline. Let A be a connection on a principal SU(2) bundle over a simply connected 4-manifold M with positive-definite intersection form. If the curvature 2-form of A is F, then F has an $L^2$ norm which is the Euclidean Yang–Mills action S. One has
$$S = \|F\|^2 = -\int_M \mathrm{tr}(F\wedge {*F}) \qquad [23]$$
where $*F$ is the usual dual 2-form to F. The minima of the action S are given by those A, called instantons, which satisfy the famous self-duality equations
$$F = {*F} \qquad [24]$$
Given one instanton A which minimizes S, one can perturb about A in an attempt to find more instantons. This process is successful and the space of all instantons can be fitted together to form a global moduli space of finite dimension. For the instanton which provides the absolute minimum of S, the moduli space $\mathcal{M}$ is a noncompact space of dimension 5.

We can now summarize the logic that is used to prove Donaldson's theorem: there are very strong relationships between M and the moduli space $\mathcal{M}$; for example, let q be regarded as an $n\times n$ matrix with precisely p unit eigenvalues (clearly $p\le n$ and Donaldson's theorem is just the statement that p = n), then $\mathcal{M}$ has precisely p singularities which look like cones on the space $CP^2$. These combine to produce the result that the 4-manifold M has the same topological signature Sign(M) as p copies of $CP^2$; and so they have signature a − b, where a of the $CP^2$'s are oriented as usual and b have the opposite orientation. Thus,
$$\mathrm{Sign}(M) = a - b \qquad [25]$$
Now by definition, Sign(M) is the signature $\sigma(q)$ of the intersection form q of M. But, by assumption, q is positive definite $n\times n$, so $\sigma(q) = n = \mathrm{Sign}(M)$. Hence,
$$n = a - b \qquad [26]$$
However, a + b = p and $p\le n$, so we can say that
$$n = a - b, \qquad p = a + b \le n \qquad [27]$$
but one always has $a + b \ge a - b$, so we have
$$n \le p \le n \;\Rightarrow\; p = n \qquad [28]$$
which is Donaldson's theorem.
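The arithmetic claims about $E_8$ and $E_8\oplus E_8$ used above (even, unimodular, signatures 8 and 16) can be checked by hand or by machine. The following is a minimal sketch, not a general-purpose routine: it assumes the off-diagonal entries of [22] are −1 (the Cartan matrix convention), and it reads off the signature from the signs of the exact elimination pivots, which is valid only when no zero pivot occurs. All function names are illustrative.

```python
from fractions import Fraction

def e8_form():
    """The 8x8 form of equation [22], assuming off-diagonal entries equal to -1."""
    q = [[2 if i == j else 0 for j in range(8)] for i in range(8)]
    # Chain 1-2-...-7 plus node 8 attached to node 5 (0-based indices below).
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7)]:
        q[i][j] = q[j][i] = -1
    return q

def direct_sum(a, b):
    """Block-diagonal sum of two square integer matrices."""
    n, m = len(a), len(b)
    out = [[0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        out[i][:n] = a[i]
    for i in range(m):
        out[n + i][n:] = b[i]
    return out

def pivots(q):
    """Exact pivots of Gaussian elimination without row swaps."""
    a = [[Fraction(x) for x in row] for row in q]
    n = len(a)
    result = []
    for k in range(n):
        if a[k][k] == 0:
            raise ValueError("zero pivot: this simple sketch does not apply")
        result.append(a[k][k])
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return result

def determinant(q):
    d = Fraction(1)
    for p in pivots(q):
        d *= p
    return d

def signature(q):
    """Number of positive minus number of negative pivots (symmetric q, no zero pivot)."""
    ps = pivots(q)
    return sum(1 for p in ps if p > 0) - sum(1 for p in ps if p < 0)

def is_even(q):
    return all(q[i][i] % 2 == 0 for i in range(len(q)))

E8 = e8_form()
E8E8 = direct_sum(E8, E8)
print(is_even(E8), determinant(E8), signature(E8), signature(E8E8))
```

Under these assumptions the signature of $E_8$ fails Rohlin's divisibility test, while that of $E_8\oplus E_8$ passes it, so only Donaldson's theorem rules out a smooth manifold with the latter form.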
Donaldson's Polynomial Invariants

Donaldson extended his work by introducing polynomial invariants also derived from Yang–Mills theory, and to discuss them we must introduce some notation. Let M be a smooth, simply connected, orientable Riemannian 4-manifold without boundary and A be an SU(2) connection which is anti-self-dual so that
$$F = -{*F} \qquad [29]$$
Then the space of all gauge-inequivalent solutions to this anti-self-duality equation, the moduli space $\mathcal{M}_k$, has a dimension given by the integer
$$\dim \mathcal{M}_k = 8k - 3(1 + b_2^+) \qquad [30]$$
Here k is the instanton number which gives the topological type of the solution A. The instanton number is minus the second Chern class $c_2(F)\in H^4(M;\mathbf{Z})$ of the bundle on which A is defined. This means that we have
$$k = -c_2(F)[M] = \frac{1}{8\pi^2}\int_M \mathrm{tr}(F\wedge F) \in \mathbf{Z} \qquad [31]$$
The number $b_2^+$ is defined to be the rank of the positive part of the intersection form q of M.
A Donaldson invariant $q^M_{d,r}$ is a symmetric integer polynomial of degree d in the 2-homology $H_2(M;\mathbf{Z})$ of M
$$q^M_{d,r} : \underbrace{H_2(M)\times\cdots\times H_2(M)}_{d\ \text{factors}} \to \mathbf{Z} \qquad [32]$$
Given a certain map $m_i$,
$$m_i : H_i(M) \to H^{4-i}(\mathcal{M}_k) \qquad [33]$$
if $\alpha\in H_2(M)$ and $*$ represents a point in M, we define $q^M_{d,r}(\alpha)$ by writing
$$q^M_{d,r}(\alpha) = m_2^d(\alpha)\, m_0^r(*)[\mathcal{M}_k] \qquad [34]$$
The evaluation on $[\mathcal{M}_k]$ on the RHS of the above equation means that
$$2d + 4r = \dim \mathcal{M}_k \qquad [35]$$
so that $\mathcal{M}_k$ is even dimensional; this is achieved by requiring $b_2^+$ to be odd.

Now the Donaldson invariants $q^M_{d,r}$ are differential topological invariants rather than topological invariants, but they are difficult to calculate as they require detailed knowledge of the instanton moduli space $\mathcal{M}_k$. However, they are nontrivial and their values are known for a number of 4-manifolds M. For example, if M is a complex algebraic surface, a positivity argument shows that they are nonzero when d is large enough. Conversely, if M can be written as the connected sum $M = M_1 \# M_2$ where both $M_1$ and $M_2$ have $b_2^+ > 0$, then they all vanish.
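The bookkeeping in equations [30] and [35] is simple enough to tabulate. The sketch below, with illustrative function names, evaluates the dimension formula and lists the pairs (d, r) of non-negative integers allowed by $2d + 4r = \dim\mathcal{M}_k$; it says nothing about whether the corresponding invariants are nonzero.

```python
def instanton_moduli_dim(k, b2_plus):
    """Dimension 8k - 3(1 + b2+) of the moduli space, equation [30]."""
    return 8 * k - 3 * (1 + b2_plus)

def admissible_degrees(k, b2_plus):
    """All (d, r) with d, r >= 0 and 2d + 4r = dim M_k, equation [35].

    Returns an empty list when the dimension is negative or odd.
    """
    dim = instanton_moduli_dim(k, b2_plus)
    if dim < 0 or dim % 2 != 0:
        return []
    return [(dim // 2 - 2 * r, r) for r in range(dim // 4 + 1)]

# k = 1, b2+ = 0 gives dimension 5, the five-dimensional space met earlier.
print(instanton_moduli_dim(1, 0))
# b2+ odd gives an even dimension; for k = 2, b2+ = 1 the dimension is 10.
print(admissible_degrees(2, 1))
```

Since 8k is always even, the parity of the dimension is that of $3(1 + b_2^+)$, which is why the text requires $b_2^+$ to be odd.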
Topological Quantum Field Theories

Turning now to physics, it is time to point out that the $q^M_{d,r}$ can also be obtained (Witten 1988) as the correlation functions of a twisted N = 2 supersymmetric topological quantum field theory. The action S for this theory is given by
$$S = \int_M d^4x\,\sqrt{g}\;\mathrm{tr}\Big\{ \tfrac14 F_{\mu\nu}F^{\mu\nu} + \tfrac14 F^{*}_{\mu\nu}F^{\mu\nu} + \tfrac12 \phi D_\mu D^\mu\lambda + i D_\mu\psi_\nu\chi^{\mu\nu} - i\eta D_\mu\psi^\mu - \tfrac{i}{8}\phi[\chi_{\mu\nu},\chi^{\mu\nu}] - \tfrac{i}{2}\lambda[\psi_\mu,\psi^\mu] - \tfrac{i}{2}\phi[\eta,\eta] - \tfrac18[\phi,\lambda]^2 \Big\} \qquad [36]$$
where F is the curvature of a connection A and $(\phi,\lambda,\eta,\psi_\mu,\chi_{\mu\nu})$ are a collection of fields introduced in order to construct the right supersymmetric theory; $\phi$ and $\lambda$ are both spinless while the multiplet $(\eta,\psi_\mu,\chi_{\mu\nu})$ contains the components of a 0-form, a 1-form, and a self-dual 2-form, respectively. The significance of this choice of multiplet is that the instanton deformation complex used to calculate $\dim\mathcal{M}_k$ contains precisely these fields.

Even though S contains a metric, its correlation functions are independent of the metric g, so that S can still be regarded as a topological quantum field theory. This is because both S and its associated energy–momentum tensor $T_{\mu\nu}$ ($\equiv \delta S/\delta g^{\mu\nu}$) can be written as BRST commutators $S = \{Q, V\}$, $T = \{Q, V'\}$ for suitable V and V'.

With this theory, it is possible to show that the correlation functions are independent of the gauge coupling, and hence we can evaluate them in a small coupling limit. In this limit, the functional integrals are dominated by the classical minima of S, which for A are just the instantons
$$F = -{*F} \qquad [37]$$
We also need $\phi$ and $\lambda$ to vanish for irreducible connections. If we expand all the fields around the minima up to quadratic terms and do the resulting Gaussian integrals, the correlation functions may be formally evaluated. A general correlation function of this theory is given by
$$\langle P\rangle = \int \mathcal{D}\mathcal{F}\,\exp[-S]\,P(\mathcal{F}) \qquad [38]$$
where $\mathcal{F}$ denotes the collection of fields present in S and $P(\mathcal{F})$ is some polynomial in the fields. S has been constructed so that the zero modes in the expansion about the minima are the tangents to the moduli space $\mathcal{M}_k$. This suggests doing the $\mathcal{D}\mathcal{F}$ integration as follows: express the integral as an integral over modes, then integrate out all the nonzero modes first, leaving a finite-dimensional integration over the compactified moduli space $\mathcal{M}_k$. The Gaussian integration over the nonzero modes is a boson–fermion ratio of determinants, which supersymmetry constrains to be ±1, bosonic and fermionic eigenvalues being equal in pairs. This amounts to writing
$$\langle P\rangle = \int_{\mathcal{M}_k} P_n \qquad [39]$$
where $P_n$ denotes some n-form over $\mathcal{M}_k$ and $n = \dim\mathcal{M}_k$. If the original polynomial $P(\mathcal{F})$ is judiciously chosen, then calculation of $\langle P\rangle$ reproduces evaluation of the Donaldson polynomials $q^M_{d,r}$.
The Seiberg–Witten Equations

The Seiberg–Witten equations constitute another breakthrough in the work on the topology of 4-manifolds, since they greatly simplify the calculation of the data supplied by the Donaldson polynomial invariants. We shall discuss this below but turn now to the equations themselves. If we choose an oriented, compact, closed, Riemannian manifold M, then the data we need for the Seiberg–Witten equations are a connection A on a line bundle L over M and a "local spinor" field $\psi$. The Seiberg–Witten equations are then
$$\not\partial_A\psi = 0, \qquad F^+ = -\tfrac12\,\bar\psi\,\Gamma\,\psi \qquad [40]$$
where $\not\partial_A$ is the Dirac operator and $\Gamma$ is made from the gamma matrices $\Gamma_i$ according to $\Gamma = \tfrac12[\Gamma_i,\Gamma_j]\,dx^i\wedge dx^j$. We call $\psi$ a local spinor because global spinors may not exist on M; however, in dimension 4, orientability guarantees that a spin$^c$ structure exists on M (a choice of spin$^c$ structure on M is an extra piece of data in the Seiberg–Witten case); $\psi$ is then the appropriate section for the spin$^c$ bundle and behaves locally like a spinor coupled to the U(1) connection A. Let $\mathrm{Spin}^c(M)$ denote the set of isomorphism classes of spin$^c$ structures on M; then, for the case $b_2^+ > 1$ (the case $b_2^+ = 1$ has some technicalities), the Seiberg–Witten invariants determine a map SW of the form
$$SW : \mathrm{Spin}^c(M) \to \mathbf{Z} \qquad [41]$$
We emphasize that A is just a U(1) abelian connection and so F = dA, with $F^+$ denoting the self-dual part of F.

We shall now have a look at an example of a new result obtained directly from the Seiberg–Witten equations. The equations clearly provide the absolute minima for the action
$$S = \int_M \Big\{ |\not\partial_A\psi|^2 + \tfrac12\,\big|F^+ + \tfrac12\bar\psi\Gamma\psi\big|^2 \Big\} \qquad [42]$$
If we use a Weitzenböck formula to relate the Laplacian $\nabla_A^*\nabla_A$ to $\not\partial_A^*\not\partial_A$ plus curvature terms, we find that S satisfies
$$\int_M \Big\{ |\not\partial_A\psi|^2 + \tfrac12\,\big|F^+ + \tfrac12\bar\psi\Gamma\psi\big|^2 \Big\} = \int_M \Big\{ |\nabla_A\psi|^2 + \tfrac12|F^+|^2 + \tfrac18|\psi|^4 + \tfrac14 R|\psi|^2 \Big\} \qquad [43]$$
$$= \int_M \Big\{ |\nabla_A\psi|^2 + \tfrac14|F|^2 + \tfrac18|\psi|^4 + \tfrac14 R|\psi|^2 \Big\} + \pi^2 c_1^2(L) \qquad [44]$$
where R is the scalar curvature of M and $c_1(L)$ is the Chern class of L. We notice that the action now looks like one for monopoles. But now suppose that R is positive and that the pair $(A,\psi)$ is a solution to the Seiberg–Witten equations; then the left-hand side (LHS) of this last expression is zero and all the integrands on the RHS of [43] are non-negative, so the solution must obey $\psi = 0$ and $F^+ = 0$. A technical point is that if M has $b_2^+ > 1$, then a perturbation of the metric can preserve the positivity of R but perturb $F^+ = 0$ to be simply F = 0, rendering the connection A flat. Hence, in these circumstances, the solution $(A,\psi)$ is the trivial one. This means that we have a new kind of vanishing theorem in four dimensions.

Theorem (Witten 1994). No 4-manifold with $b_2^+ > 1$ and nontrivial solution to the Seiberg–Witten equations admits a metric of positive scalar curvature.

Now, for technical reasons, we assume that the $q^M_{d,r}$ have the property that
$$q^M_{d,r+2} = 4\,q^M_{d,r} \qquad [45]$$
A simply connected M with this property is called of simple type. We also define $\tilde q^M_d$ by writing
$$\tilde q^M_d = \begin{cases} q^M_{d,0}, & \text{if } d = (b_2^+ + 1) \bmod 2\\[2pt] \tfrac12\,q^M_{d,1}, & \text{if } d = b_2^+ \bmod 2 \end{cases} \qquad [46]$$
The generating function $G_M(\alpha)$ is now given by
$$G_M(\alpha) = \sum_{d=0}^{\infty} \frac{1}{d!}\,\tilde q^M_d(\alpha) \qquad [47]$$
According to Kronheimer and Mrowka (1994), $G_M(\alpha)$ can be expressed in terms of a finite number of classes (known as basic classes) $\kappa_i$ ($\kappa_i\in H^2(M)$) with rational coefficients $a_i$ (the Seiberg–Witten invariants), resulting in the formula
$$G_M(\alpha) = \exp[\alpha\cdot\alpha/2]\,\sum_i a_i \exp[\kappa_i\cdot\alpha] \qquad [48]$$
Hence, for M of simple type, the polynomial invariants are determined by a (finite) number of basic classes and the Seiberg–Witten invariants. Returning now to the physics, we find that the quantum field theory approach to the polynomial invariants relates them to properties of the moduli space for the Seiberg–Witten equations rather than to properties of the instanton moduli space $\mathcal{M}_k$. The moduli space for the Seiberg–Witten equations, unlike the instanton case, is compact and generically has dimension
$$\frac{c_1^2(L) - 2\chi(M) - 3\sigma(M)}{4} \qquad [49]$$
$\chi(M)$ and $\sigma(M)$ being the Euler characteristic and signature of M, respectively. When
$$c_1^2(L) = 2\chi(M) + 3\sigma(M) \qquad [50]$$
we get a zero-dimensional moduli space consisting of a finite collection of points
$$\{P_1,\ldots,P_N\} \qquad [51]$$
Now each point $P_i$ has a sign $\epsilon_i = \pm 1$ associated with it, coming from the sign of the determinant of the elliptic operator whose index gave the dimension of the moduli space. The sum of these signs is an integer topological invariant denoted by $n_L$, that is,
$$n_L = \sum_{i=1}^{N} \epsilon_i \qquad [52]$$
Returning now to our formula for $G_M(\alpha)$, one finds that
$$G_M(\alpha) = 2^{p(M)}\exp[\alpha\cdot\alpha/2]\,\sum_L n_L \exp[c_1(L)\cdot\alpha] \qquad [53]$$
where
$$p(M) = 1 + \tfrac14\big(7\chi(M) + 11\sigma(M)\big) \qquad [54]$$
and the sum over L on the RHS of the formula is over line bundles L that satisfy
$$c_1^2(L) = 2\chi(M) + 3\sigma(M) \qquad [55]$$
that is, it is a sum over L with zero-dimensional Seiberg–Witten moduli spaces. Comparison of the two formulas for $G_M(\alpha)$, the first mathematical in origin and the second physical, allows one to identify the Seiberg–Witten invariants $a_i$ and the Kronheimer–Mrowka basic classes $\kappa_i$ as the $c_1(L)$'s. The results described thus far are for simply connected 4-manifolds, but this condition is not obligatory and there is also a theory in the non-simply-connected case (Mariño and Moore 1999).

The physics underlying these topological results is of great importance since many of the ideas originate there. It is known that the computation of the Donaldson invariants there uses the fact that the N = 2 gauge theory is asymptotically free. This means that the ultraviolet limit, being one of weak coupling, is tractable. However, the less tractable infrared or strong-coupling limit would do just as well to calculate the Donaldson invariants, since the latter are metric independent. In Seiberg and Witten's work, this infrared behavior is actually determined and it is found that, in the strong-coupling infrared limit, the theory is equivalent to a weakly coupled theory of abelian fields and monopoles. There is also a duality between the original theory and the theory with monopoles, which is expressed by the fact that the (abelian) gauge group of the monopole theory is the dual of the maximal torus of the group of the nonabelian theory.

We recall that the Yang–Mills gauge group in this discussion is SU(2). Seiberg and Witten's results mean the replacement of SU(2) instantons used to compute the Donaldson invariants by the counting of U(1) monopoles. This calculation of the nonabelian Donaldson data by abelian Seiberg–Witten data is much like the representation theory of a nonabelian Lie group G, where everything is determined by an abelian object: the maximal torus.

The theory considered by Seiberg and Witten possesses a collection of quantum vacua labeled by a complex parameter u which turns out to parametrize a family of elliptic curves. A central part is played by a function $\tau(u)$ on which there is a modular action of $SL(2,\mathbf{Z})$. The successful determination of the infrared limit involves an electric–magnetic duality, and the whole matter is of very considerable independent interest for quantum field theory, quark confinement, and string theory in general.
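The dimension formula [49] and the exponent p(M) of [54] depend only on $c_1^2(L)$, the Euler characteristic and the signature, so they are easy to evaluate. The sketch below uses exact rational arithmetic; the function names are illustrative, and the values quoted for the K3 surface (Euler characteristic 24, signature −16) are standard facts that are assumed here rather than taken from the text.

```python
from fractions import Fraction

def sw_moduli_dim(c1_squared, euler_char, signature):
    """Generic dimension (c1(L)^2 - 2*chi - 3*sigma)/4 of equation [49]."""
    return Fraction(c1_squared - 2 * euler_char - 3 * signature, 4)

def zero_dimensional(c1_squared, euler_char, signature):
    """True when c1(L)^2 = 2*chi + 3*sigma, the condition of equations [50] and [55]."""
    return c1_squared == 2 * euler_char + 3 * signature

def p_exponent(euler_char, signature):
    """Exponent p(M) = 1 + (7*chi + 11*sigma)/4 of equation [54]."""
    return 1 + Fraction(7 * euler_char + 11 * signature, 4)

# Assumed example: K3 surface with chi = 24 and sigma = -16.
K3_CHI, K3_SIGMA = 24, -16
print(sw_moduli_dim(0, K3_CHI, K3_SIGMA))      # line bundles with c1^2 = 0
print(zero_dimensional(0, K3_CHI, K3_SIGMA))
print(p_exponent(K3_CHI, K3_SIGMA))
```

For these assumed values $2\chi + 3\sigma = 0$, so only line bundles with $c_1^2(L) = 0$ contribute to the sum in [53].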
Seiberg–Witten Theory and Exotic Structures on 4-Manifolds

We saw earlier that, when $\dim M \ne 4$, a manifold may possess a finite number of differentiable structures, $S^7$ having 28 distinct smooth structures. However, in dimension 4, Seiberg–Witten theory has been used to show that there are many 4-manifolds with a countable infinity of smooth structures. We just mention two: the K3 surface has infinitely many smooth structures, as does the manifold $CP^2 \# 5\overline{CP}^2$. This is another instance of how dimension 4 differs from all other dimensions. This infinite variety of exotic smooth structures in four dimensions is also of great interest to physics.

An outstanding four-dimensional matter still is the smooth Poincaré conjecture, which asks whether a smooth 4-manifold M homotopy equivalent to $S^4$ is diffeomorphic to $S^4$. Such an M is certainly homeomorphic to $S^4$, because this is the standard Poincaré conjecture proved by Freedman, and if the answer to this question is yes then $S^4$ would be an example of a 4-manifold with no exotic smooth structures. There is at present no consensus on the answer to this question.
Exotic Structures on Open 4-Manifolds

If M is an open manifold, that is, a noncompact manifold without boundary, and $M = \mathbf{R}^n$ then, for $n \ne 4$, there is only one smooth structure; but for n = 4, there are exotic differentiable structures on $\mathbf{R}^4$. In fact, Gompf showed that there is a continuum of exotic differentiable structures that can be placed on $\mathbf{R}^4$.

Symplectic and Kähler 4-Manifolds

Many 4-manifolds are symplectic, and symplectic manifolds are central in physics; there are many results obtained using Seiberg–Witten theory concerning the topology and geometry of symplectic manifolds. The exotic K3 structures referred to above are all symplectic, and so there is no shortage of symplectic structures even within one homeomorphism class. Taubes obtained far-reaching new results for symplectic 4-manifolds, including establishing an equivalence between the Seiberg–Witten invariants in the symplectic case and the Gromov invariants. Kähler manifolds possess, simultaneously, compatible Riemannian, symplectic and complex structures and, beginning with Witten's work, there are many results to be found for Kähler 4-manifolds using Seiberg–Witten techniques.
4-Manifolds with Boundary

There is a very important extension of the Donaldson–Seiberg–Witten theory to 4-manifolds M with boundary $\partial M = N$. When $\partial M \ne \emptyset$, the Donaldson invariants are not numerical invariants but take values in HF(N), where HF(N) denotes what is called the Floer homology of the 3-manifold N. Topological quantum field theory is the ideal setting for this theory since it naturally treats manifolds with boundaries. The Floer homology groups HF(N) act as Hilbert spaces for the quantum fields defined on the boundary. There is now a full interplay of 4-manifold theory and 3-manifold theory as well as Yang–Mills theory in three and four dimensions. This interplay is often realized by taking two 4-manifolds $M_1$ and $M_2$ with the same boundary N and joining them along N to obtain a closed 4-manifold M so that
$$M = M_1 \cup_N M_2 \qquad [56]$$
Given a 3-manifold N and an SU(2) connection A, Floer studied the critical points of the Chern–Simons function f(A) defined by
$$f(A) = \frac{1}{8\pi^2}\int_N \mathrm{tr}\Big(A\wedge dA + \tfrac23 A\wedge A\wedge A\Big) \qquad [57]$$
where f(A) is regarded as a function on the infinite-dimensional space $\mathcal{A}$ of connections. The function f(A) changes by an integer under a gauge transformation and so descends to a single-valued gauge-invariant function on the space of gauge orbits $\mathcal{A}/\mathcal{G}$ if one considers $\exp(2\pi i k f(A))$ where $k\in\mathbf{Z}$ ($\mathcal{G}$ being the group of gauge transformations). Morse theory applied to this infinite-dimensional setting gives an infinite Morse index to each critical point, a pathology which is avoided by only defining the difference of the index between two critical points using spectral flow. The critical points correspond, via gradient flow and a consideration of the instanton equations
$$F = -{*F} \qquad [58]$$
on the 4-manifold $N\times\mathbf{R}$, to the flat connections on the 3-manifold N. The latter are identifiable as the set of equivalence classes of representations of the fundamental group $\pi_1(N)$ in the gauge group SU(2), that is, with
$$\mathrm{Hom}(\pi_1(N), SU(2))/\mathrm{Ad}\,SU(2) \qquad [59]$$
For the Seiberg–Witten formulation, let $\hat A$ denote a connection on the 3-manifold N with curvature $F(\hat A)$. Then the Chern–Simons function f(A) is replaced by the abelian Chern–Simons function together with a quadratic fermion term, resulting in the function $f_{SW}(\hat A)$ defined by
$$f_{SW}(\hat A) = \int_N \Big\{ \bar\psi\,\not D_{\hat A}\psi + \hat A\wedge F(\hat A) \Big\} \qquad [60]$$
where $\not D_{\hat A}$ denotes the self-adjoint Dirac operator in three dimensions acting on a spinor $\psi$ on N; because of the presence of the Chern–Simons function, $f_{SW}(\hat A)$ is only defined up to a multiple of $8\pi^2$, in a manner similar to the case for f(A). Gradient flow together with the Seiberg–Witten equations on the 4-manifold $N\times\mathbf{R}$ result in critical points corresponding to the solutions to
$$\not D_{\hat A}\psi = 0, \qquad F(\hat A) = \tfrac12\,\bar\psi\,\Gamma\,\psi \qquad [61]$$
which is a three-dimensional version of the Seiberg–Witten equations.

The critical point theory of these two functions f(A) and $f_{SW}(\hat A)$ permits the construction of the instanton and Seiberg–Witten Floer homology groups $HF_{inst}(N)$ and $HF_{SW}(N)$, respectively. In fact, there are several kinds of Floer homology (Lagrangian Floer homology, instanton Floer homology, Heegaard Floer homology, Seiberg–Witten–Floer homology) and conjectures concerning their relations to one another.

There are still many unanswered questions of joint interest to mathematicians and physicists in the entire area of 4-manifold theory.

See also: Electric–Magnetic Duality; Gauge Theoretic Invariants of 4-Manifolds; Floer Homology; Topological Quantum Field Theory: Overview.
Further Reading

Atiyah MF (1989) Topological quantum field theories. Institut des Hautes Etudes Scientifiques Publications Mathematiques 68: 175–186.
Atiyah MF (1995) Floer homology. In: Hofer H, Weinstein A, Taubes C, and Zehnder E (eds.) Floer Memorial Volume. Boston: Birkhäuser.
Donaldson SK (1984) An application of gauge theory to four dimensional topology. Journal of Differential Geometry 96: 387–407.
Donaldson SK (1996) The Seiberg–Witten equations and 4-manifold topology. Bulletin of the American Mathematical Society 33: 45–70.
Donaldson SK (2002) Floer Homology Groups in Yang–Mills Theory. Cambridge: Cambridge University Press.
Donaldson SK and Kronheimer PB (1990) The Geometry of Four Manifolds. Oxford: Oxford University Press.
Fintushel R and Stern RJ (1998) Knots, links and 4-manifolds. Inventiones Mathematicae 134: 363–400.
Fintushel R and Stern RJ (2004) Double node neighborhoods and families of simply connected 4-manifolds with $b^+ = 1$; arXiv:math.GT/0412126.
Freedman MH (1982) The topology of 4-dimensional manifolds. Journal of Differential Geometry 17: 357–453.
Gadgil S (2004) Open manifolds, Ozsvath–Szabo invariants and exotic $\mathbf{R}^4$s, arXiv:math.GT/0408379.
Gompf R (1985) An infinite set of exotic $\mathbf{R}^4$'s. Journal of Differential Geometry 21: 283–300.
Gromov M (1985) Pseudo-holomorphic curves in symplectic manifolds. Inventiones Math. 82: 307–347.
Kronheimer PB and Mrowka TS (1994) Recurrence relations and asymptotics for four manifold invariants. Bulletin of the American Mathematical Society 30: 215–221.
Mandelbaum R (1980) Four-dimensional topology: an introduction. Bulletin of the American Mathematical Society 2: 1–159.
Mariño M and Moore G (1999) Donaldson invariants for nonsimply connected manifolds. Communications in Mathematical Physics 203: 249–267.
Morgan J and Mrowka T (1992) A note on Donaldson's polynomial invariants. International Mathematics Research Notices 10: 223–230.
Nash C (1992) Differential Topology and Quantum Field Theory. London–San Diego: Academic Press.
Park J, Stipsicz AI, and Szabo Z (2004) Exotic smooth structures on $CP^2 \# 5\overline{CP}^2$, arXiv:math.GT/041221.
Rohlin VA (1952) New results in the theory of 4 dimensional manifolds. Doklady Akademii Nauk SSSR 84: 221–224.
Seiberg N and Witten E (1994a) Electric–magnetic duality, monopole condensation, and confinement in N = 2 supersymmetric Yang–Mills theory. Nuclear Physics B 426: 19–52.
Seiberg N and Witten E (1994b) Electric–magnetic duality, monopole condensation, and confinement in N = 2 supersymmetric Yang–Mills theory – erratum. Nuclear Physics B 430: 485–486.
Seiberg N and Witten E (1994c) Monopoles, duality and chiral symmetry breaking in N = 2 supersymmetric QCD. Nuclear Physics B 431: 484–550.
Taubes CH (1995a) More constraints on symplectic forms from Seiberg–Witten invariants. Mathematical Research Letters 2: 9–13.
Taubes CH (1995b) The Seiberg–Witten and Gromov invariants. Mathematical Research Letters 2: 221–238.
Witten E (1988) Topological quantum field theory. Communications in Mathematical Physics 117: 353–386.
Witten E (1994) Monopoles and four manifolds. Mathematical Research Letters 1: 769–796.
Fractal Dimensions in Dynamics
V Županović and D Žubrinić, University of Zagreb, Zagreb, Croatia
© 2006 Elsevier Ltd. All rights reserved.
Introduction

Since the 1970s, dimension theory for dynamics has evolved into an independent field of mathematics. Its main goal is to measure complexity of invariant sets and measures using fractal dimensions. The history of fractal dimensions is closely related to the names of H Minkowski (Minkowski content, 1903), H Hausdorff (Hausdorff dimension, 1919), G Bouligand (Bouligand dimension, 1928), L S Pontryagin and L G Schnirelmann (metric order, 1932), P Moran (Moran geometric constructions, 1946), A S Besicovitch and S J Taylor (Besicovitch–Taylor index, 1954), A Rényi (Rényi spectrum for dimensions, 1957), A N Kolmogorov and V M Tihomirov (metric dimension, Kolmogorov complexity, 1959), Ya G Sinai, D Ruelle, R Bowen (thermodynamic formalism, Bowen's equation, 1972, 1973, 1979), B Mandelbrot (fractals and multifractals, 1974), J L Kaplan and J A Yorke (Lyapunov dimension, 1979), J E Hutchinson (fractals and self-similarity, 1981), C Tricot, D Sullivan (packing dimension, 1982, 1984), H G E Hentschel and I Procaccia (Hentschel–Procaccia spectrum for dimensions, 1983), Ya Pesin (Carathéodory–Pesin dimension, 1988), M Lapidus and M van Frankenhuysen (complex dimensions for fractal strings, 2000), etc.

Fractal dimensions enable us to have a better insight into the dynamics appearing in various problems in physics, engineering, chemistry, medicine, geology, meteorology, ecology, economics, computer science, image processing, and, of course, in many branches of mathematics. Concentrating on box and Hausdorff dimensions only, we describe basic methods of fractal analysis in dynamics, sketch their applications, and indicate some trends in this rapidly growing field.
Fractal Dimensions

Box Dimensions
Let A be a bounded set in $\mathbf{R}^N$, and let d(x, A) be the Euclidean distance from x to A. The Minkowski sausage of radius $\varepsilon$ around A (a term coined by B Mandelbrot) is defined as the $\varepsilon$-neighborhood of A, that is, $A_\varepsilon := \{y\in\mathbf{R}^N : d(y,A) < \varepsilon\}$. By the upper s-dimensional Minkowski content of A, $s\ge 0$, we mean
$$\mathcal{M}^{*s}(A) := \limsup_{\varepsilon\to 0}\frac{|A_\varepsilon|}{\varepsilon^{N-s}} \in [0,\infty]$$
Here $|\cdot|$ denotes N-dimensional Lebesgue measure. The corresponding upper box dimension is defined by
$$\overline{\dim}_B A := \inf\{s\ge 0 : \mathcal{M}^{*s}(A) = 0\}$$
The lower s-dimensional Minkowski content $\mathcal{M}_*^s(A)$ and the corresponding lower box dimension $\underline{\dim}_B A$ are defined analogously. The name of box dimension stems from the following: if we have an $\varepsilon$-grid in $\mathbf{R}^N$ composed of closed N-dimensional boxes with side $\varepsilon$, and if $N(A,\varepsilon)$ is the number of boxes of the grid intersecting A, then
$$\overline{\dim}_B A = \limsup_{\varepsilon\to 0}\frac{\log N(A,\varepsilon)}{\log(1/\varepsilon)}$$
and analogously for $\underline{\dim}_B A$. It suffices to take any geometric subsequence $\varepsilon_k = b^{-k}$ in the limit, where b > 1 (H Furstenberg, 1970). There are many other names for the upper box dimension appearing in the literature, like the Cantor–Minkowski order, Minkowski dimension, Bouligand dimension, Borel logarithmic rarefaction, Besicovitch–Taylor index, entropy dimension, Kolmogorov dimension, fractal dimension, capacity dimension, and limit capacity.

If A is such that $\underline{\dim}_B A = \overline{\dim}_B A$, the common value is denoted by $d := \dim_B A$, and we call it the box dimension of A. If, in addition to this, both $\mathcal{M}_*^d(A)$ and $\mathcal{M}^{*d}(A)$ are in $(0,\infty)$, we say that A is Minkowski nondegenerate. If, moreover, $\mathcal{M}_*^d(A) = \mathcal{M}^{*d}(A) =: \mathcal{M}^d(A) \in (0,\infty)$, then A is said to be Minkowski measurable.

Assume that A is such that $d := \dim_B A$ and $\mathcal{M}^d(A)$ exist. Then the value of $\mathcal{M}^d(A)^{-1}$ is called the lacunarity of A (B Mandelbrot, 1982). A bounded set $A\subset\mathbf{R}^N$ is said to be porous (A Denjoy, 1920) if there exist $\alpha > 0$ and $\delta > 0$ such that for every $x\in A$ and $r\in(0,\delta)$ there is $y\in\mathbf{R}^N$ such that the open ball $B_{\alpha r}(y)$ is contained in $B_r(x)\setminus A$. If A is porous then it is easy to see that $\overline{\dim}_B A < N$ (O Martio and M Vuorinen, 1987, A Salli, 1991).

We proceed with two examples. Let $A := C^{(a)}$, $a\in(0,1/2)$, be the Cantor set obtained from [0, 1] by consecutive deletion of $2^k$ middle open intervals of length $a^k(1-2a)$ in step $k\in\mathbf{N}\cup\{0\}$. Then $\dim_B A = (\log 2)/(\log(1/a))$ (G Bouligand, 1928), and A is nondegenerate, but not Minkowski measurable (Lapidus and Pomerance, 1993). For the spiral $\Gamma$ of focus type defined by $r = m\varphi^{-\alpha}$ in polar coordinates, where $\alpha\in(0,1)$ and m > 0 are fixed, $\varphi\ge\varphi_1 > 0$, we have $\dim_B\Gamma = 2/(1+\alpha)$ (Y Dupain, M Mendès-France, C Tricot, 1983). It is Minkowski measurable (Žubrinić and Županović, 2005), and the larger m, the smaller the lacunarity; see Figure 1.

Figure 1 Spirals of equal box dimensions (4/3) and different lacunarities (0.43 and 0.05).

Hausdorff Dimension

For a given subset A of $\mathbf{R}^N$ (not necessarily bounded) and $s\ge 0$ we define $\mathcal{H}^s(A) := \lim_{\varepsilon\to 0}\inf\{\sum_{i=1}^{\infty} r_i^s\} \in [0,\infty]$, where the infimum is taken over all finite or countable coverings of A by open balls of radii $r_i\le\varepsilon$. The value of $\mathcal{H}^s(A)$ is called the s-dimensional Hausdorff outer measure of A. The Hausdorff dimension of A, sometimes called the Hausdorff–Besicovitch dimension, is defined by
$$\dim_H A := \inf\{s\ge 0 : \mathcal{H}^s(A) = 0\}$$
If A is bounded then $\dim_H A \le \underline{\dim}_B A \le \overline{\dim}_B A \le N$. We say that A is Hausdorff nondegenerate (or a d-set) if $\mathcal{H}^d(A)\in(0,\infty)$ for some $d\ge 0$. Cantor sets share this property, and $\dim_H C^{(a)} = (\log 2)/(\log(1/a))$, where $a\in(0,1/2)$ (Hausdorff, 1919).

Gauge Functions
The notions of Minkowski contents and Hausdorff measure can be generalized using gauge functions $h : [0,\varepsilon_0)\to\mathbf{R}$ that are assumed to be continuous, increasing, and h(0) = 0. For example,
$$\mathcal{M}^{*h}(A) := \limsup_{\varepsilon\to 0}\frac{|A_\varepsilon|}{\varepsilon^N}\,h(\varepsilon)$$
and similarly for $\mathcal{M}_*^h(A)$ (M Lapidus and C He, 1997), while for $\mathcal{H}^h(A)$ it suffices to change $r_i^s$ with $h(r_i)$ in the above definition of the Hausdorff outer measure (Besicovitch, 1934). Gauge functions are used for sets that are Minkowski or Hausdorff degenerate. The aim, if possible, is to find an explicit gauge function so that the corresponding generalized Minkowski contents or Hausdorff measure of A be nondegenerate.
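The box-counting formula from the Box Dimensions subsection can be tried out on the middle-third Cantor set, that is, the case a = 1/3 of $C^{(a)}$. The sketch below is one possible illustration under stated assumptions: the set is approximated by the left endpoints of its level-n intervals, stored as exact integers scaled by $3^n$, and the grid uses half-open rather than closed boxes of side $3^{-k}$, which changes the count by at most a bounded factor and so does not affect the limit. Function names are illustrative.

```python
import math
from itertools import product

def cantor_points(level):
    """Left endpoints of the 2**level intervals of the middle-third Cantor
    construction, scaled by 3**level so that they are exact integers."""
    points = []
    for digits in product((0, 2), repeat=level):
        m = 0
        for d in digits:
            m = 3 * m + d  # base-3 expansion using only the digits 0 and 2
        points.append(m)
    return points

def box_count(points, level, k):
    """Number of half-open grid boxes of side 3**-k hit by the points (k <= level)."""
    scale = 3 ** (level - k)
    return len({m // scale for m in points})

def box_dimension_estimate(level, k):
    """log N(A, eps) / log(1/eps) for eps = 3**-k."""
    n_boxes = box_count(cantor_points(level), level, k)
    return math.log(n_boxes) / math.log(3 ** k)

print(box_dimension_estimate(level=10, k=6))
print(math.log(2) / math.log(3))
```

Because the grid is aligned with the construction, each scale $\varepsilon = 3^{-k}$ sees exactly $2^k$ boxes, so the quotient agrees with $(\log 2)/(\log 3)$ at every k; for a set without this self-similar alignment one would instead fit a slope over a range of scales.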
Figure 2 Cantor-like set (basic sets Δ1, Δ2 and their subsets Δ11, Δ12, Δ21, Δ22).

Methods of Fractal Analysis in Dynamics
Thermodynamic Formalism
Thermodynamic formalism has been developed by Sinai (1972), Ruelle (1973), and Bowen (1975), using methods of statistical mechanics in order to study dynamics and to find dimensions of various fractal sets. We first describe a "dictionary" for explicit geometric constructions of Cantor-like sets. Let $X_p$ be the set of all sequences $i = (i_1, i_2, \ldots)$ of elements $i_k$ from a given set of p symbols, say $\{1,\ldots,p\}$, $p\ge 2$. We endow $X_p$ with the metric $d(i,j) := \sum_k 2^{-k}|i_k - j_k|$ and introduce the one-sided shift operator (or left shift) $\sigma : X_p\to X_p$ defined by $(\sigma(i))_n = i_{n+1}$, that is, $\sigma(i_1,i_2,i_3,\ldots) = (i_2,i_3,i_4,\ldots)$. A set $Q\subseteq X_p$ is called the symbolic dynamics if it is compact and $\sigma$-invariant, that is, $\sigma(Q)\subseteq Q$. Hence, $(Q,\sigma)$ is a symbolic dynamical system. Denote $i[n] := (i_1,\ldots,i_n)$. Given a continuous function $\varphi : Q\to\mathbf{R}$, let us define the topological pressure of $\varphi$ with respect to $\sigma$ by
$$P(\varphi) := \lim_{n\to\infty}\frac1n\log\sum_{\{i[n] :\, i\in Q\}} E(i[n]), \qquad E(i[n]) := \exp\Big(\sup_{\{j\in Q :\, j[n]=i[n]\}}\sum_{k=0}^{n-1}\varphi(\sigma^k(j))\Big)$$
The topological entropy of $\sigma|Q$ is defined by $h(\sigma|Q) := P(0)$, that is,
$$h(\sigma|Q) = \lim_{n\to\infty}\frac1n\log\#\{i[n] : i\in Q\}$$
where # denotes the cardinal number of a set. The above function $\varphi_n := \sum_{k=0}^{n-1}\varphi\circ\sigma^k$ has the property $\varphi_{n+m} = \varphi_n + \varphi_m\circ\sigma^n$, and therefore we speak about additive thermodynamic formalism. Topological pressure was introduced by D Ruelle (1973) and extended by P Walters (1976). Bowen's equation (1979) has a very important role in the computation of the Hausdorff dimension of various sets. For the unknown $s\in\mathbf{R}$, and with a suitably chosen function $\varphi$, this equation reads
$$P(s\varphi) = 0$$
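The definition of topological pressure can be made concrete in the simplest case. The sketch below assumes the full shift $Q = X_p$ and a potential that depends only on the first symbol, $\varphi(i) = \log w_{i_1}$ for hypothetical positive weights $w_1,\ldots,w_p$ (these weights are an illustration, not part of the text). Under that assumption the supremum in $E(i[n])$ is attained by every sequence with the given initial word, so the partition sum can be enumerated by brute force and compared with the closed form $\log\sum_k w_k^s$.

```python
import math
from itertools import product

def pressure_brute_force(weights, s, n):
    """(1/n) * log of the sum of E(i[n]) over all words of length n, for the
    potential s*phi with phi(i) = log(weights[i_1]) on the full shift.
    Symbols are indexed from 0 here instead of from 1."""
    total = 0.0
    for word in product(range(len(weights)), repeat=n):
        # Birkhoff sum of s*phi along the word; it depends only on the word.
        birkhoff = sum(s * math.log(weights[a]) for a in word)
        total += math.exp(birkhoff)
    return math.log(total) / n

def pressure_closed_form(weights, s):
    """log(sum_k w_k**s), the value the brute-force sum reproduces for every n."""
    return math.log(sum(w ** s for w in weights))

weights = (1 / 3, 1 / 3)          # hypothetical weights
s = math.log(2) / math.log(3)     # candidate root of Bowen's equation
print(pressure_brute_force(weights, s, n=8))   # close to 0
print(pressure_brute_force(weights, 0.0, n=8)) # topological entropy log 2
```

With s = 0 the same routine returns the topological entropy $\log p$ of the full shift, in line with $h(\sigma|Q) = P(0)$.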
Geometric Constructions
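A geometric construction $(Q, \Delta)$ in $\mathbb{R}^m$ indexed by symbolic dynamics $Q$ is a family of compact sets $\Delta_{i[n]} \subset \mathbb{R}^m$, $i \in Q$, $n \in \mathbb{N}$, such that $\operatorname{diam} \Delta_{i[n]} \to 0$ as $n \to \infty$, $\Delta_{i[n+1]} \subseteq \Delta_{i[n]}$, $\Delta_{i[n]} = \overline{\operatorname{int} \Delta_{i[n]}}$ for every $i \in Q$ and all $n$, and $\operatorname{int} \Delta_{i[n]} \cap \operatorname{int} \Delta_{j[n]} = \emptyset$ whenever $i[n] \ne j[n]$ (Moran's open set condition). This family induces the Cantor-like set

$$F := \bigcap_{n=1}^{\infty} \Big( \bigcup_{i \in Q} \Delta_{i[n]} \Big)$$

(see Figure 2). The mapping $h : Q \to F$ defined by $h(i) := \bigcap_{n=1}^{\infty} \Delta_{i[n]}$ is called the coding map of $F$. The above geometric construction includes well-known iterated function systems of similarities as a special case. If $\lambda_1, \dots, \lambda_p$ are given numbers in (0, 1), and $\Delta_{i[n]}$ are balls of radii $r_{i[n]} := \lambda_{i_1} \cdots \lambda_{i_n}$, then $s := \dim_H F$ is the unique solution of Bowen's equation $P(s\varphi) = 0$, where $\varphi$ is defined by $\varphi(i) := \log \lambda_{i_1}$ (Ya Pesin and H Weiss, 1996). In this case Bowen's equation is equivalent to Moran's equation (1946),

$$\sum_{k=1}^{p} \lambda_k^s = 1$$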
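When the ratios differ, Moran's equation generally has to be solved numerically. Since every ratio lies in (0, 1), the left-hand side decreases strictly in $s$, so bisection is enough. The following is a minimal sketch under that observation; the function name `moran_dimension` and the tolerance parameter are illustrative choices, not part of the original article.

```python
import math


def moran_dimension(ratios, tol=1e-12):
    """Solve sum(r**s for r in ratios) == 1 for s by bisection.

    `ratios` plays the role of lambda_1, ..., lambda_p, each in (0, 1).
    The name and the tolerance are illustrative assumptions.
    """
    if not ratios or any(not (0.0 < r < 1.0) for r in ratios):
        raise ValueError("every ratio must lie strictly between 0 and 1")

    def g(s):
        # Strictly decreasing in s because every ratio is below 1.
        return sum(r ** s for r in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:        # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Middle-third Cantor set: two similarities of ratio 1/3, so s = log 2 / log 3.
print(moran_dimension([1 / 3, 1 / 3]), math.log(2) / math.log(3))
```

For equal ratios $\lambda_k = \lambda$ the equation reduces to $p\lambda^s = 1$, which gives a quick sanity check on the numerical answer.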
This result has been generalized by L Barreira (1996) using the Carathéodory–Pesin construction (1988). Let us illustrate Barreira's theory of nonadditive thermodynamic formalism with a special case. Assume that $(Q, \Delta)$ is a geometric construction for which the sets $\Delta_{i[n]}$ are balls, and let there exist $\delta > 0$ such that $r_{i[n+1]} \ge \delta\, r_{i[n]}$ and $r_{i[n+m]} \le r_{i[n]}\, r_{\sigma^n(i)[m]}$ for all $i \in Q$, $n, m \in \mathbb{N}$. Then $\dim_H F = \dim_B F = s$, where $s$ is the unique real number such that

$$\lim_{n\to\infty} \frac{1}{n} \log \sum_{\{i[n]\,:\, i \in Q\}} r_{i[n]}^{\,s} = 0 \qquad [1]$$

This is a special case of Barreira's extension of Bowen's equation to nonadditive thermodynamic formalism. Moran's equation can be deduced from [1] by defining $r_{i[n]} := \lambda_{i_1} \cdots \lambda_{i_n}$, where $i = (i_1, i_2, \dots)$, and $\lambda_1, \dots, \lambda_p \in (0, 1)$ are given numbers. Pesin and Weiss (1996) showed that Moran's open set condition can be weakened so that partial intersections of interiors of pairs of basic sets in the family are allowed. Thermodynamic formalism has been used to study the Hausdorff dimension of Julia sets (Ruelle, 1982), horseshoes (H McCluskey and A Manning, 1983), etc.

An important example of symbolic dynamics is the topological Markov chain $X_A$ generated by a $p \times p$ matrix $A$ with entries $a_{ij} \in \{0, 1\}$:

$$X_A := \{ i = (i_1, i_2, \dots) \in X_p : a_{i_k i_{k+1}} = 1 \text{ for all } k \in \mathbb{N} \}$$

It is a compact, $\sigma$-invariant subset of $X_p$. The map $\sigma|X_A$ is called the subshift of finite type (Bowen, 1975). A construction of a Cantor-like set $F$ using dynamics $Q = X_p$ is called a simple geometric construction, while a geometric construction is said to be a Markov geometric construction if $Q = X_A$. If $F$ is obtained by a Markov geometric construction such that all $\Delta_{i[n]}$ are balls of radii $r_{i[n]} := \lambda_{i_1} \cdots \lambda_{i_n}$, where $\lambda_{i_j} \in (0, 1)$, $i_j \in \{1, \dots, p\}$, then $\dim_B F = \dim_H F = s$, where $s$ is the unique solution of the equation $\rho(A M_s) = 1$. Here $M_s := \operatorname{diag}(\lambda_1^s, \dots, \lambda_p^s)$ and $\rho(A M_s)$ is the spectral radius of the matrix $A M_s$. This and more general results have been obtained by Pesin and Weiss (1996). Any Cantor-like set $F$ obtained via an iterated function system of similarities satisfying Moran's open set condition is Hausdorff nondegenerate (Moran, 1946). If $F$ is of nonlattice type, that is, the set $\{\log \lambda_1, \dots, \log \lambda_p\}$ is not contained in $r\mathbb{Z}$ for any $r > 0$, then $F$ is Minkowski measurable (D Gatzouras, 1999).

Hyperbolic Measures
Let $X$ be a complete metric space and assume that $f : X \to X$ is continuous. Let $\mu$ be an $f$-invariant Borel probability measure on $X$ (i.e., $\mu(f^{-1}(A)) = \mu(A)$ for measurable sets $A$) with compact support. The Hausdorff dimension of $\mu$, and the lower and upper box dimensions of $\mu$ (L-S Young, 1982) are defined by

$$\dim_H \mu := \inf\{ \dim_H Z : Z \subseteq X,\ \mu(Z) = 1 \}$$

$$\underline{\dim}_B \mu := \lim_{\delta \to 0} \inf\{ \underline{\dim}_B Z : Z \subseteq X,\ \mu(Z) \ge 1 - \delta \}$$

$$\overline{\dim}_B \mu := \lim_{\delta \to 0} \inf\{ \overline{\dim}_B Z : Z \subseteq X,\ \mu(Z) \ge 1 - \delta \}$$

It is natural to introduce the lower and upper pointwise dimensions of $\mu$ at $x \in X$ by

$$\underline{d}_\mu(x) := \liminf_{r \to 0} \frac{\log \mu(B_r(x))}{\log r}$$

and similarly $\overline{d}_\mu(x)$. It has been shown by Young (1982) that if $X$ has finite topological dimension and if $\mu$ is exact dimensional, that is, $\underline{d}_\mu(x) = \overline{d}_\mu(x) =: d$ for $\mu$-a.e. $x \in X$, then

$$\dim_H \mu = \dim_B \mu = d$$

She also proved that hyperbolic measures (ergodic measures with nonzero Lyapunov exponents), invariant under a $C^{1+\alpha}$-diffeomorphism, $\alpha > 0$, are exact dimensional. F Ledrappier (1986) derived exact dimensionality for hyperbolic Bowen–Ruelle–Sinai measures. This result was extended by Ya Pesin and Ch Yue (1996) to hyperbolic measures with semilocal product structure. J-P Eckmann and D Ruelle (1985) conjectured that exact dimensionality holds for general hyperbolic measures, and this was proved by Barreira, Pesin, and Schmeling (1996). More precisely, if $f$ is a $C^{1+\alpha}$-diffeomorphism on a smooth Riemannian manifold $X$ without boundary, and if $\mu$ is an $f$-invariant, compactly supported Borel probability measure, then its hyperbolicity implies that

$$\underline{d}_\mu(x) = \overline{d}_\mu(x) = d^s_\mu(x) + d^u_\mu(x)$$

for $\mu$-a.e. $x \in X$, where $d^s_\mu(x)$ and $d^u_\mu(x)$ are the stable and unstable pointwise dimensions of $\mu$ at $x$ introduced by Ledrappier and Young (1985).

Multifractal Analysis of Functions and Measures
Invariant sets of many dynamical systems are not self-similar. Roughly speaking, the aim of multifractal analysis is to decompose the invariant set with respect to desired fractal properties and then to study a fractal dimension of each set of the decomposition. Some dynamical systems have invariant sets equal to graphs of Hölderian functions $f : \mathbb{R}^N \to \mathbb{R}$, so that wavelet methods can be used. One of the goals of multifractal analysis of functions is to study the spectrum of singularities of $f$ defined by

$$d_f(\alpha) := \dim_H H^\alpha(f)$$

introduced by U Frisch and G Parisi (1985) in the context of fully developed turbulence. Here $H^\alpha(f)$ is the set of points at which the corresponding pointwise Hölder exponent of $f$ is equal to $\alpha$. If the function $f$ is self-similar then $d_f(\alpha)$ is real analytic and strictly concave (first increasing and then decreasing) on an explicit interval $(\underline{a}, \overline{a})$ (S Jaffard, 1997). It is natural to consider the set $C^{\alpha,\beta}(f)$ of points $x_0$ called chirps of order $(\alpha, \beta)$ (Y Meyer 1996), at which $f$ behaves roughly like $|x - x_0|^\alpha \sin(1/|x - x_0|^\beta)$, $\beta > 0$. The function $D_f(\alpha, \beta) := \dim_H C^{\alpha,\beta}(f)$ is called the chirp spectrum of $f$ (S Jaffard 2000). Wavelet methods have found applications in the study of evolution equations and in modeling and detection of chirps in turbulent flows (S Jaffard, Y Meyer, R D Robert, 2001).

Basic ideas of multifractal analysis have been introduced by physicists T Halsey, M H Jensen, L P Kadanoff, I Procaccia, and B I Shraiman (1988). In applications it often deals with an invariant ergodic probability measure $\mu$ associated with the dynamical system considered. Multifractal analysis of a Borel finite measure $\mu$ defined on $\mathbb{R}^N$ consists in the study of the function

$$d_\mu(\alpha) := \dim_H K_\mu(\alpha), \qquad \alpha \ge 0$$

called the spectrum of pointwise dimensions of $\mu$. Here $K_\mu(\alpha)$ is the set of points where the pointwise dimension of $\mu$ is equal to $\alpha$:

$$K_\mu(\alpha) := \{ x \in \mathbb{R}^N : \underline{d}_\mu(x) = \overline{d}_\mu(x) = \alpha \}$$

It is also of interest to study the Hausdorff dimension of the irregular set $K(\mu) := \{ x \in \mathbb{R}^N : \underline{d}_\mu(x) < \overline{d}_\mu(x) \}$. These sets are pairwise disjoint and constitute a multifractal decomposition of $\mathbb{R}^N$, that is,

$$\mathbb{R}^N = K(\mu) \cup \Big( \bigcup_{\alpha \in \mathbb{R}} K_\mu(\alpha) \Big)$$

The function $d_\mu(\alpha)$ provides important information about the complexity of the multifractal decomposition. In many situations, there is an open interval $(\underline{\alpha}, \overline{\alpha})$ on which the function $d_\mu(\alpha)$ is analytic and strictly concave (first increasing and then decreasing), and equal to the Legendre transform of an explicit convex function. We thus obtain an uncountable family of sets $K_\mu(\alpha)$ with positive Hausdorff dimension, which shows the enormous complexity of the multifractal decomposition of $\mathbb{R}^N$. These and related questions have been studied by L Olsen (1995), K Falconer (1996), Pesin and Weiss (1996), Barreira and Schmeling (2000), and many other authors.

Local Lyapunov Dimension
Let $\Omega$ be an open set in $\mathbb{R}^N$ and let $f : \Omega \to \mathbb{R}^N$ be a $C^1$-map. To any fixed $x \in \Omega$ we assign $N$ singular values $a_1 \ge a_2 \ge \dots \ge a_N \ge 0$ of $f$ at $x$, defined as square roots of eigenvalues of the matrix $f'(x)^\top f'(x)$, where $f'(x)$ is the Jacobian of $f$ at $x$, and $f'(x)^\top$ its transpose. The local Lyapunov dimension of $f$ at $x$ is defined by

$$\dim_L(f, x) := j + s$$

where $j$ is the largest integer in $[0, N]$ such that $a_1 \cdots a_j \ge 1$ (if there is no such $j$ we let $j = 0$), and $s \in [0, 1)$ is the unique solution of $a_1 \cdots a_j\, a_{j+1}^{\,s} = 1$ (except for $j = N$, when we define $s = 0$). This definition, due to B R Hunt (1996), is close to that of Kaplan and Yorke (1979). The Jacobian $f'(x)$ contracts $k$-dimensional volumes (that is, $a_1 \cdots a_k < 1$) if and only if $\dim_L(f, x) < k$. In this case, we say that $f$ is $k$-contracting at $x$. Furthermore, the function $x \mapsto \dim_L(f, x)$ is upper-semicontinuous, so that for any compact subset $A$ of $\Omega$ the Lyapunov dimension of $f$ on $A$,

$$\dim_L(f, A) := \max_{x \in A} \dim_L(f, x)$$

is well defined. Yu S Ilyashenko conjectured that if $f$ locally contracts $k$-dimensional volumes then the upper box dimension of any compact invariant set is $< k$. Hunt (1996) proved that if $A$ is a compact, strictly invariant set of $f$ (i.e., $f(A) = A$) then

$$\overline{\dim}_B A \le \dim_L(f, A) \qquad [2]$$

This is an improvement of $\dim_H A \le \dim_L(f, A)$ obtained by A Douady and J Oesterlé (1980), and independently by Ilyashenko (1982). M A Blinchevskaya and Yu S Ilyashenko (1999) proved that if $A$ is any attractor of a smooth map in a Hilbert space that contracts $k$-dimensional volumes then $\overline{\dim}_B A \le k$. See [3] below. A continuous variant of this method is used in order to obtain estimates of fractal dimensions of global attractors of dynamical systems $(X, S)$ on a Hilbert space $X$. Here $S(t)$, $t \ge 0$, is a semigroup of continuous operators on $X$, that is, $S(t + s) = S(t)S(s)$ and $S(0) = I$. A set $A$ in $X$ is called a global attractor of the dynamical system if it is compact, attracting (i.e., for any bounded set $B$ and $\varepsilon > 0$ there exists $t_0$ such that for $t \ge t_0$ we have $S(t)B \subseteq A_\varepsilon$), and strictly invariant (i.e., $S(t)A = A$ for all $t \ge 0$).
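The definition of the local Lyapunov dimension given earlier in this section translates directly into a few lines of code once the singular values at a point are known. The sketch below is illustrative only: the function name `local_lyapunov_dimension` is hypothetical, the singular values are assumed to be supplied by the caller, and the handling of a vanishing singular value (returning $s = 0$) is a convention chosen here, not one stated in the article.

```python
import math


def local_lyapunov_dimension(singular_values):
    """Return j + s for singular values a_1 >= ... >= a_N >= 0.

    j is the largest index with a_1 * ... * a_j >= 1 (j = 0 if none),
    and s in [0, 1) solves a_1 * ... * a_j * a_{j+1}**s == 1.
    """
    a = sorted(singular_values, reverse=True)
    if any(v < 0.0 for v in a):
        raise ValueError("singular values must be nonnegative")

    j, product_j = 0, 1.0          # the empty product equals 1
    running = 1.0
    for k, value in enumerate(a, start=1):
        running *= value
        if running >= 1.0:
            j, product_j = k, running

    if j == len(a):
        return float(j)            # the case j = N, where s = 0 by definition
    next_value = a[j]              # this is a_{j+1}; it is below 1 here
    if next_value == 0.0:
        return float(j)            # assumed convention: take s = 0
    s = math.log(product_j) / -math.log(next_value)
    return j + s


# One expanding and one contracting direction: 2 * 0.25**s = 1 gives s = 1/2.
print(local_lyapunov_dimension([2.0, 0.25]))
```

In this example the product $a_1 a_2 = 0.5$ is below 1, so the map contracts areas at that point, in agreement with the criterion that $\dim_L(f, x) < 2$.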
Applications in Dynamics

Logistic Map

M Feigenbaum, a mathematical physicist, introduced and studied the dynamics of the logistic map $f_\lambda : [0, 1] \to [0, 1]$, $f_\lambda(x) := \lambda x(1 - x)$, $\lambda \in (0, 4]$. Taking $\lambda = \lambda_\infty \approx 3.570$ the corresponding invariant set $A \subset [0, 1]$ (i.e., $S_1(A) \cup S_2(A) = A$, where $S_i$ are the two branches of $f_{\lambda_\infty}^{-1}$) has both Hausdorff and box dimensions equal to $\approx 0.538$ (P Grassberger 1981, P Grassberger and I Procaccia, 1983). The set $A$ has Cantor-like structure, but is not self-similar. Its multifractal properties have been studied by U Frisch, K Khanin, and T Matsumoto (2004).

Smale Horseshoe
In the early 1960s S Smale defined his famous horseshoe map and showed that it has a strange invariant set resulting in chaotic dynamics. The notion of strange attractor was introduced in 1971 by Ruelle and Takens in their study of turbulence. Let $S$ be a square in the plane and let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be a map transforming $S$ as indicated in Figure 3, such that on both components of $S \cap f^{-1}(S)$ the map $f$ is affine and preserves both horizontal and vertical directions, and such that points 1, 2, 3, and 4 are mapped to 1′, 2′, 3′, and 4′. Iterating $f$ we get the backward invariant set $\Lambda_- := \bigcap_{j=0}^{\infty} f^{-j}(S)$, the forward invariant set $\Lambda_+ := \bigcap_{j=0}^{\infty} f^{j}(S)$, and the invariant set (horseshoe) $\Lambda_f := \Lambda_+ \cap \Lambda_-$. These sets have the Cantor set structure. More precisely, assuming that the contraction parameter of $f$ in the vertical direction is $a \in (0, 1/2)$, and the expansion parameter in the horizontal direction is $b > 2$, then $\Lambda_+ = [0, 1] \times C^{(a)}$, where $C^{(a)}$ is the Cantor set, $\Lambda_- = C^{(1/b)} \times [0, 1]$, and $\Lambda_f = C^{(1/b)} \times C^{(a)}$, so that $\dim_B \Lambda_+ = \dim_H \Lambda_+ = 1 + (\log 2)/(\log(1/a))$ and

$$\dim_B \Lambda_f = \dim_H \Lambda_f = \frac{\log 2}{\log b} + \frac{\log 2}{\log(1/a)}$$

Figure 3 The Smale horseshoe. (Panels: $S$; $S \cap f^{-1}(S)$ and $f(S)$; $S \cap f^{-1}(S) \cap f^{-2}(S)$ and $f^2(S)$; $f^{-2}(S) \cap f^{-1}(S) \cap S \cap f(S) \cap f^2(S)$.)

This is a special case of a general result about horseshoes in $\mathbb{R}^2$ (not necessarily affine), due to McCluskey and Manning (1983), stated in terms of the pressure function. An analogous result can be obtained for Smale solenoids. In $\mathbb{R}^3$ it is possible to construct affine horseshoes $f$ such that $\dim_H \Lambda_f < \dim_B \Lambda_f$ (M Pollicott and H Weiss, 1994). Smale discovered a connection between homoclinic orbits and the horseshoe map. It has been noticed that fractal dimensions have an important role in the study of homoclinic bifurcations of nonconservative dynamical systems. Since the 1970s the relationship between invariants of hyperbolic sets and the typical dynamics appearing in the unfolding of a homoclinic tangency by a parametrized family of surface diffeomorphisms has been studied by J Newhouse, J Palis, F Takens, J-C Yoccoz, C G Moreira and M Viana. The main result is that if the Hausdorff dimension of the hyperbolic set involved in the tangency is $< 1$ then the parameter set where hyperbolicity prevails has full Lebesgue density. If the Hausdorff dimension is $> 1$, then hyperbolicity is not prevalent. This result and its proof were inspired by previous work of J M Marstrand (1954) about arithmetic differences of Cantor sets on the real line. According to the result by Moreira, Palis, and Viana (2001) the paradigm "hyperbolicity prevails if and only if the Hausdorff dimension is $< 1$" extends to homoclinic bifurcations in any dimension. Using methods of thermodynamic formalism McCluskey and Manning (1983) proved that if $f$ is the above horseshoe map, then there exists a $C^1$-neighborhood $U$ of $f$ on which the mapping $f \mapsto \dim_H \Lambda_f$ is continuous. Continuity of box and Hausdorff dimensions for horseshoes has been studied also by Takens, Palis, and Viana (1988).

Lorenz Attractor
E N Lorenz (1963), a meteorologist and student of G Birkhoff, showed by numerical experiments that for certain values of positive parameters $\sigma$, $r$, $b$, the quadratic system

$$\dot{x} = \sigma(y - x), \qquad \dot{y} = rx - y - xz, \qquad \dot{z} = xy - bz$$

has the global attractor $A$, for example, for $\sigma = 10$, $r = 28$, $b = 8/3$. In this case $\dim_B A \approx 2.06$, which is a numerical result (Grassberger and Procaccia, 1983). Using the analysis of local Lyapunov dimension along the flow in $A$, G A Leonov (2001) showed that if $\sigma + 1 \ge b \ge 2$ and $r\sigma^2(4 - b) + 2\sigma(b - 1)(2\sigma - 3b) > b(b - 1)^2$ then

$$\dim_B A \le 3 - \frac{2(\sigma + b + 1)}{\sigma + 1 + \sqrt{(\sigma - 1)^2 + 4r\sigma}}$$
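To see how Leonov's estimate compares with the numerical value quoted above, the bound can be evaluated for the classical parameters. The sketch below is illustrative: the function name `leonov_lorenz_bound` is hypothetical, and the hypotheses are checked in the form $\sigma + 1 \ge b \ge 2$ and $r\sigma^2(4 - b) + 2\sigma(b - 1)(2\sigma - 3b) > b(b - 1)^2$, which is an assumption about how the conditions in the text are to be read.

```python
import math


def leonov_lorenz_bound(sigma, r, b):
    """Evaluate 3 - 2(sigma + b + 1) / (sigma + 1 + sqrt((sigma - 1)^2 + 4 r sigma)).

    Raises ValueError when the (assumed) hypotheses of the estimate fail.
    """
    if not (sigma + 1.0 >= b >= 2.0):
        raise ValueError("requires sigma + 1 >= b >= 2")
    lhs = r * sigma ** 2 * (4.0 - b) + 2.0 * sigma * (b - 1.0) * (2.0 * sigma - 3.0 * b)
    if not lhs > b * (b - 1.0) ** 2:
        raise ValueError("second hypothesis of the estimate is not satisfied")
    root = math.sqrt((sigma - 1.0) ** 2 + 4.0 * r * sigma)
    return 3.0 - 2.0 * (sigma + b + 1.0) / (sigma + 1.0 + root)


# Classical parameters sigma = 10, r = 28, b = 8/3.
print(leonov_lorenz_bound(10.0, 28.0, 8.0 / 3.0))
```

For the classical parameters the bound evaluates to roughly 2.40, which sits above the numerical estimate 2.06 as an upper bound should.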
Hénon Attractor

M Hénon (1976), a theoretical astronomer, discovered the map $f : \mathbb{R}^2 \to \mathbb{R}^2$, $f(x, y) := (a + by - x^2, x)$, capturing several essential properties of the Lorenz system. In the case of $a = 1.4$ and $b = 0.3$, Hunt (1996) derived from [2] that for any compact, strictly $f$-invariant set $A$ in the trapping region $[-1.8, 1.8]^2$ there holds $\overline{\dim}_B A < 1.5$. Numerical experiments show that $\dim_B A \approx 1.28$ (Grassberger, 1983). Assuming $a > 0$, $b \in (0, 1)$, and $P_\pm(x_\pm, x_\pm) \in A$, where $P_\pm$ are fixed points of $f$, Leonov (2001) obtained that

$$\dim_B A \le 1 + \frac{1}{1 - \ln b / \ln\big( \sqrt{x_-^2 + b} - x_- \big)}$$

Here

$$x_\pm := \frac{1}{2} \Big[ b - 1 \pm \sqrt{(b - 1)^2 + 4a} \Big]$$
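Leonov's bound for the Hénon map involves only the fixed point coordinate $x_-$ and the parameter $b$, so it is easy to evaluate. The following sketch computes $x_\pm$ from the fixed point equation $x = a + bx - x^2$ and then the bound; the function names `henon_fixed_points` and `leonov_henon_bound` are hypothetical and introduced only for illustration.

```python
import math


def henon_fixed_points(a, b):
    """Return (x_minus, x_plus), the roots of x**2 + (1 - b) * x - a = 0."""
    root = math.sqrt((b - 1.0) ** 2 + 4.0 * a)
    return 0.5 * (b - 1.0 - root), 0.5 * (b - 1.0 + root)


def leonov_henon_bound(a, b):
    """Evaluate 1 + 1 / (1 - ln b / ln(sqrt(x_minus**2 + b) - x_minus))."""
    if not (a > 0.0 and 0.0 < b < 1.0):
        raise ValueError("requires a > 0 and 0 < b < 1")
    x_minus, _ = henon_fixed_points(a, b)
    # Largest eigenvalue modulus of the Jacobian [[-2x, b], [1, 0]] at x = x_minus.
    expansion = math.sqrt(x_minus ** 2 + b) - x_minus
    return 1.0 + 1.0 / (1.0 - math.log(b) / math.log(expansion))


# Classical parameters a = 1.4, b = 0.3.
print(leonov_henon_bound(1.4, 0.3))
```

For $a = 1.4$, $b = 0.3$ the result is just below 1.5, consistent with Hunt's estimate and above the numerical value 1.28.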
The proof is based on the study of the local Lyapunov dimension of $f$ and its iterates on $A$.

Embedology
The physical relevance of box dimensions in the study of attractors is related to the problem of finding the smallest possible dimension $n$ sufficient to "embed" an attractor into $\mathbb{R}^n$. If $A \subset \mathbb{R}^k$ is a compact set and if $n > 2\dim_B A$, then almost every map from $\mathbb{R}^k$ into $\mathbb{R}^n$, in the sense of prevalence, is one-to-one on $A$ and, moreover, it is an embedding on smooth manifolds contained in $A$ (T Sauer, J A Yorke, and M Casdagli, 1991). If $A$ is a strange attractor then the same is true for almost every delay-coordinate map from $\mathbb{R}^k$ to $\mathbb{R}^n$. This improves an earlier result by H Whitney (1936) and F Takens (Takens' embedology, 1981). The above notion of prevalence means the following: a property holds almost everywhere in the sense of prevalence if it holds on a subset $S$ of the space $V := C^1(\mathbb{R}^k, \mathbb{R}^n)$ for which there exists a finite-dimensional subspace $E \subset V$ (probe space) such that for each $v \in V$ we have that $v + e \in S$ for Lebesgue a.e. $e \in E$.
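The condition $n > 2\dim_B A$ above fixes the smallest admissible target dimension once a box dimension is known. The short sketch below makes that arithmetic explicit for the numerical dimension values quoted earlier for the Lorenz and Hénon attractors; the function name `min_embedding_dimension` is hypothetical.

```python
import math


def min_embedding_dimension(box_dimension):
    """Return the smallest integer n with n > 2 * box_dimension."""
    if box_dimension < 0.0:
        raise ValueError("a box dimension cannot be negative")
    return math.floor(2.0 * box_dimension) + 1


# Numerical box dimensions quoted in the text.
for name, dim in [("Lorenz", 2.06), ("Henon", 1.28)]:
    print(name, min_embedding_dimension(dim))
```

Note that the inequality is strict, so a set of box dimension exactly 1 already requires $n = 3$.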
Julia and Mandelbrot Sets

M Shishikura (1998) proved that the boundary of the Mandelbrot set $M$ generated by $f_c(z) := z^2 + c$ has Hausdorff dimension equal to 2, thus answering positively the conjecture by B Mandelbrot, J Milnor, and other mathematicians. Also for Julia sets there holds $\dim_H J(f_c) = 2$ for generic $c$ in $\partial M$ (i.e., on a set of second Baire category). The proof is based on the study of the bifurcation of parabolic periodic points. Also, each baby Mandelbrot set sitting inside of $M$ has boundary of Hausdorff dimension 2 (L Tan, 1998). Shishikura's results hold for more general functions $f(z) := z^d + c$, where $d \ge 2$. For Julia sets $J(f_c)$ generated by $f_c(z) := z^2 + c$ there holds $d(c) := \dim_H J(f_c) = 1 + |c|^2/(4 \log 2) + o(|c|^2)$ for $c \to 0$. This and more general results have been obtained by Ruelle (1982). He also proved that the function $d(c)$ when restricted to the interval $[0, \infty)$ is real analytic in $[0, 1/4) \cup (1/4, \infty)$. Furthermore, it is left continuous at 1/4 (O Bodart and M Zinsmeister, 1996), but not continuous (A Douady, P Sentenac, and M Zinsmeister, 1997). Discontinuity of this map is related to the phenomenon of parabolic implosion at $c = 1/4$. The derivative $d'(c)$ tends to $+\infty$ from the left at $c = 1/4$ like $(1/4 - c)^{d(1/4) - 3/2}$ (G Havard and M Zinsmeister, 2000). Here $d(1/4) \approx 1.07$, which is a numerical result. Analysis of dimensions is based on methods of thermodynamic formalism. C McMullen (1998) showed that if $\theta$ is an irrational number of bounded type (i.e., its continued fraction expansion $[a_1, a_2, \dots]$ is such that the sequence $(a_i)$ is bounded from above) and $f(z) := z^2 + e^{2\pi i\theta} z$, then the Julia set $J(f)$ is porous. In particular, $\dim_B J(f) < 2$. Y C Yin (2000) showed that if all critical points in $J(f)$ of a rational map $f : \overline{\mathbb{C}} \to \overline{\mathbb{C}}$ are nonrecurrent (a point is nonrecurrent if it is not contained in its $\omega$-limit set) then $J(f)$ is porous, hence $\dim_B J(f) < 2$. Urbański and Przytycki (2001) described more general rational maps such that $\dim_B J(f) < 2$.

Spiral Trajectories

A standard planar model where the Hopf–Takens bifurcation occurs is $\dot{r} = r\big( r^{2l} + \sum_{i=0}^{l-1} a_i r^{2i} \big)$, $\dot{\varphi} = 1$, where $l \in \mathbb{N}$. If $\Gamma$ is a spiral tending to the limit cycle $r = a$ of multiplicity $m$ (i.e., $r = a$ is a zero of order $m$ of the right-hand side of the first equation in the system) then $\dim_B \Gamma = 2 - 1/m$. Furthermore, for $m > 1$ the spiral is Minkowski measurable (Žubrinić and Županović, 2005). For $m = 1$ the spiral is Minkowski nondegenerate with respect to the gauge function $h(\varepsilon) := \varepsilon (\log(1/\varepsilon))^{-1}$.
Infinite-Dimensional Dynamical Systems
In many situations the dynamics of the global attractor $A$ of the flow corresponding to an autonomous Navier–Stokes system is finite-dimensional (Ladyzhenskaya, 1972). This means that there exists a positive integer $N$ such that any trajectory in $A$ is completely determined by its orthogonal projection onto an $N$-dimensional subspace of a Hilbert space $X$. The aim is to find estimates of box and Hausdorff dimensions of the global attractor, in order to understand some of the basic and challenging problems of turbulence theory. If $A$ is a subset of a Hilbert space $X$, its Hausdorff dimension is defined analogously as for $A \subset \mathbb{R}^N$. The definition of the upper box dimension can be extended from $A \subset \mathbb{R}^N$ to

$$\overline{\dim}_B A := \limsup_{\varepsilon \to 0} \frac{\log m(A, \varepsilon)}{\log(1/\varepsilon)} \qquad [3]$$

where $m(A, \varepsilon)$ is the minimal number of balls of radius $\varepsilon$ sufficient to cover a given compact set $A \subset X$. The value of $\log m(A, \varepsilon)$ is called the $\varepsilon$-entropy of $A$. Foiaş and Temam (1979), Ladyzhenskaya (1982), A V Babin and M I Vishik (1982), Ruelle (1983), and E Lieb (1984) were among the first who obtained explicit upper bounds of Hausdorff and box dimensions of attractors of infinite-dimensional systems. For global attractors $A$ associated with some classes of two-dimensional Navier–Stokes equations with nonhomogeneous boundary conditions it can be shown that $\dim_B A \le c_1 G + c_2 \mathrm{Re}^{3/2}$, where $G$ is the Grashof number, Re is the Reynolds number, and $c_i$ are positive constants (R M Brown, P A Perry, and Z Shen, 2000). V V Chepyzhov and A A Ilyin (2004) obtained that $\dim_B A \le \frac{1}{\sqrt{2}\,\pi} (\lambda_1 |\Omega|)^{1/2} G$ for equations with homogeneous boundary conditions, where $\Omega \subset \mathbb{R}^2$ is a bounded domain, and $\lambda_1$ is the first eigenvalue of $-\Delta$. In the case of periodic boundary conditions Constantin, Foiaş, and Temam (1988) proved that $\dim_B A \le c_1 G^{2/3} (1 + \log G)^{1/3}$, while for a special class of external forces there holds $\dim_H A \ge c_2 G^{2/3}$ (V X Liu, 1993). Let us mention an open problem by V I Arnol'd: is it true that the Hausdorff dimension of any attracting set of the Navier–Stokes equation on the two-dimensional torus grows with the Reynolds number?

In their study of partial regularity of solutions of three-dimensional Navier–Stokes equations, L Caffarelli, R Kohn, and L Nirenberg (1982) proved that the one-dimensional Hausdorff measure in space and time (defined by parabolic cylinders) of the singular set of any "suitable" weak solution is equal to zero. A weak solution is said to be singular at a point $(x_0, t_0)$ if it is essentially unbounded in any of its neighborhoods. Dimensions of attractors of many other classes of partial differential equations (PDEs) have been studied, such as reaction–diffusion systems, wave equations with dissipation, complex Ginzburg–Landau equations, etc. Related questions for nonautonomous PDEs have been considered by V V Chepyzhov and M I Vishik since 1992.

Probability
Important examples of trajectories appearing in physics are provided by Brownian motions. Brownian motions $\omega$ in $\mathbb{R}^N$, $N \ge 2$, have paths $\omega([0, 1])$ of Hausdorff dimension 2 with probability 1, and they are almost surely Hausdorff degenerate, since $H^2(\omega([0, 1])) = 0$ for a.e. $\omega$ (S J Taylor, 1953). Defining gauge functions $h(\varepsilon) := \varepsilon^2 \log(1/\varepsilon) \log\log\log(1/\varepsilon)$ when $N = 2$, and $h(\varepsilon) := \varepsilon^2 \log\log(1/\varepsilon)$ when $N \ge 3$, there holds $H^h(\omega([0, 1])) \in (0, \infty)$ for a.e. $\omega$ (D Ray, 1963, S J Taylor, 1964). If $N = 1$ then for a.e. $\omega$ the box and Hausdorff dimensions of the graph of $\omega|_{[0, 1]}$ are equal to 3/2 (Taylor, 1953), and for the gauge function $h(\varepsilon) := \varepsilon^{3/2} \log\log(1/\varepsilon)$ the corresponding generalized Hausdorff measure is nondegenerate. In the case of $N \ge 2$ we have the uniform dimension doubling property (R Kaufman, 1969). This means that for a.e. Brownian motion $\omega$ there holds $\dim_H \omega(A) = 2 \dim_H A$ for all subsets $A \subset [0, \infty)$. There are also results concerning the almost sure Hausdorff dimension of double, triple, and multiple points of a Brownian motion and of more general Lévy stable processes. Fractal dimensions also appear in the study of stochastic differential equations, like

$$dx_t = X_0(x_t)\, dt + \sum_{k=1}^{d} X_k(x_t)\, d\theta_k(t), \qquad x_0 = x \in \mathbb{R}^N$$

The stochastic flow $(x_t)_{t \ge 0}$ in $\mathbb{R}^N$ is driven by a Brownian motion $(\theta(t))_{t \ge 0}$ in $\mathbb{R}^d$. Let us assume that $X_k$, $k = 0, \dots, d$, are $C^\infty$-smooth $T$-periodic divergence-free vector fields on $\mathbb{R}^N$. Then for almost every realization of the Brownian motion $(\theta(t))_{t \ge 0}$, the set of initial points $x$ generating the flow $(x_t)_{t \ge 0}$ with linear escape to infinity (i.e., $\lim_{t \to \infty} (|x_t|/t) > 0$) is dense and of full Hausdorff dimension $N$ (D Dolgopyat, V Kaloshin, and L Koralov, 2002).

Other Directions
There are many other fractal dimensions important for dynamics, like the Rényi spectrum for dimensions, correlation dimension, information dimension, Hentschel–Procaccia spectrum for dimensions, packing dimension, and effective fractal dimension. Relations between dimension, entropy, Lyapunov exponents, Gibbs measures, and multifractal rigidity have been investigated by Pesin, Weiss, Barreira, Schmeling, etc. Fractal dimensions are used to study dynamics appearing in Kleinian groups (D Sullivan, C J Bishop, P W Jones, C McMullen, B O Stratmann, etc.), quasiconformal mappings and quasiconformal groups (F W Gehring, J Väisälä, K Astala, C J Bishop, P Tukia, J W Anderson, P Bonfert-Taylor, E C Taylor, etc.), graph directed Markov systems (R D Mauldin, M Urbański, etc.), random walks on fractal graphs (J Kigami, A Telcs, etc.), billiards (H Masur, Y Cheung, P Bálint, S Tabachnikov, N Chernov, D Szász, I P Tóth, etc.), quantum dynamics (J-M Barbaroux, J-M Combes, H Schulz-Baldes, I Guarneri, etc.), quantum gravity (M Aizenman, A Aharony, M E Cates, T A Witten, G F Lawler, B Duplantier, etc.), harmonic analysis (R S Strichartz, Z M Balogh, J T Tyson, etc.), number theory (L Barreira, M Pollicott, H Weiss, B Stratmann, B Saussol, etc.), Markov processes (R M Blumenthal, R Getoor, S J Taylor, S Jaffard, C Tricot, Y Peres, Y Xiao, etc.), and theoretical computer science (B Ya Ryabko, L Staiger, J H Lutz, E Mayordomo, etc.), and so on.

See also: Bifurcations of Periodic Orbits; Chaos and Attractors; Dissipative Dynamical Systems of Infinite Dimension; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Ergodic Theory; Generic Properties of Dynamical Systems; Holomorphic Dynamics; Homoclinic Phenomena; Hyperbolic Dynamical Systems; Image Processing: Mathematics; Lyapunov Exponents and Strange Attractors; Partial Differential Equations: Some Examples; Polygonal Billiards; Quantum Ergodicity and Mixing of Eigenfunctions; Stochastic Differential Equations; Synchronization of Chaos; Universality and Renormalization; Wavelets: Applications; Wavelets: Mathematical Theory.
Further Reading

Barreira L (2002) Hyperbolicity and recurrence in dynamical systems: a survey of recent results. Resenhas 5(3): 171–230.
Chepyzhov VV and Vishik MI (2002) Attractors for Equations of Mathematical Physics, Colloquium Publications, vol. 49. Providence, RI: American Mathematical Society.
Chueshov ID (2002) Introduction to the Theory of Infinite-Dimensional Dissipative Systems. Kharkiv: ACTA Scientific Publ. House.
Falconer K (1990) Fractal Geometry. Chichester: Wiley.
Falconer K (1997) Techniques in Fractal Geometry. Chichester: Wiley.
Ladyzhenskaya OA (1991) Attractors for Semigroups of Evolution Equations. Cambridge: Cambridge University Press.
Lapidus M and van Frankenhuijsen M (eds.) (2004) Fractal Geometry and Applications: A Jubilee of Benoît Mandelbrot, Proc. Sympos. Pure Math., vol. 72, Parts 1 and 2. Providence, RI: American Mathematical Society.
Mandelbrot B and Frame M (2002) Fractals. In: Encyclopedia of Physical Science and Technology, 3rd edn., vol. 6, pp. 185–207. San Diego, CA: Academic Press.
Mauldin RD and Urbański M (2003) Graph Directed Markov Systems: Geometry and Dynamics of Limit Sets, Cambridge Tracts in Mathematics, 148. Cambridge: Cambridge University Press.
Palis J and Takens F (1993) Hyperbolicity & Sensitive Chaotic Dynamics at Homoclinic Bifurcations, Cambridge Studies in Advanced Mathematics, 35. Cambridge: Cambridge University Press.
Pesin Ya (1997) Dimension Theory in Dynamical Systems: Contemporary Views and Applications, Chicago Lecture Notes in Mathematics. Chicago, IL: University of Chicago Press.
Schmeling J and Weiss H (2001) An overview of the dimension theory of dynamical systems. In: Katok A, de la Llave R, Pesin Ya, and Weiss H (eds.) Smooth Ergodic Theory and Its Applications (Seattle, 1999), Proceedings of Symposia in Pure Mathematics, vol. 69, pp. 429–488. Providence, RI: American Mathematical Society.
Tan L (ed.) (2000) The Mandelbrot Set, Theme and Variations. London Mathematical Society Lecture Note Series, 274. Cambridge: Cambridge University Press.
Temam R (1997) Infinite-Dimensional Dynamical Systems in Mechanics and Physics, 2nd edn. Applied Mathematical Sciences, 68. New York: Springer.
Zinsmeister M (2000) Thermodynamic Formalism and Holomorphic Dynamical Systems, SMF/AMS Texts and Monographs 2. Providence, RI: American Mathematical Society; Paris: Société Mathématique de France.
Fractional Quantum Hall Effect
J K Jain, The Pennsylvania State University, University Park, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Interacting particles sometimes collectively behave in ways that take us by complete surprise. In a superfluid, $^4$He atoms flow without viscosity, and in a superconductor electrons flow without resistance. Such behaviors announce emergent structures and principles which have often found applications in other areas. This article concerns the surprising collective effects that occur when electrons are confined in two dimensions and subjected to a strong transverse magnetic field. At low temperatures, the Hall resistance (defined below) exhibits plateaus on which it is precisely quantized at

$$R_H = \frac{h}{f e^2} \qquad [1]$$

where $h$ and $e$ are fundamental constants and $f$ is a plateau-specific rational fraction. This phenomenon is known as the "fractional quantum Hall effect" (FQHE), or, after its discoverers, the "Tsui–Stormer–Gossard" (TSG) effect. The underlying state provides a new paradigm for collective behavior in nature, and is understood in terms of a new class of quasiparticles known as "composite fermions," which are topological bound states of electrons and quantized vortices. This article will outline the basics of the experimental phenomenology and our theoretical understanding of this effect.
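To get a feeling for the magnitudes involved in eqn [1], the plateau resistances can be evaluated numerically. The sketch below is illustrative and works in SI units, which is an assumption made here for convenience (the article itself uses Gaussian units, but eqn [1] has the same form in both); the function name `hall_plateau_resistance` is hypothetical, and the constants are the exact SI values of the Planck constant and the elementary charge.

```python
from fractions import Fraction

PLANCK_H = 6.62607015e-34             # J s, exact in the SI
ELEMENTARY_CHARGE = 1.602176634e-19   # C, exact in the SI


def hall_plateau_resistance(f):
    """Return R_H = h / (f e^2) in ohms for the plateau labelled by f."""
    f = Fraction(f)
    if f <= 0:
        raise ValueError("the fraction f must be positive")
    return PLANCK_H / (float(f) * ELEMENTARY_CHARGE ** 2)


# A few plateaus: the integer f = 1 and the fractions 1/3 and 2/5.
for f in (Fraction(1), Fraction(1, 3), Fraction(2, 5)):
    print(f, round(hall_plateau_resistance(f), 2), "ohm")
```

The value for $f = 1$, about 25.8 kΩ, sets the scale of all plateaus; the plateau at $f = 1/3$ is three times larger.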
The Hall Effect

Ohm's law, $I = V/R$, tells us that the current through a resistor is proportional to the applied voltage. The local form of the law is

$$\mathbf{J} = \sigma \mathbf{E} \qquad [2]$$

where $\sigma$ is the conductivity, and $\mathbf{J} = q\rho\mathbf{v}$ is the current density for particles of charge $q$ and density $\rho$ moving with a velocity $\mathbf{v}$. In 1879, E H Hall discovered that in the presence of crossed electric and magnetic fields ($\mathbf{E}$ and $\mathbf{B}$), the current flows in a direction "perpendicular" to the plane containing the two fields. Alternatively, the passage of current induces a voltage perpendicular to the direction of the current flow. This is known as the Hall effect (see Figure 1). The phenomenon has a classical origin. A consequence of the Lorentz force law of electrodynamics,

$$\mathbf{F} = q \Big( \mathbf{E} + \frac{1}{c}\, \mathbf{v} \times \mathbf{B} \Big) \qquad [3]$$

which gives the force on a particle of charge $q$ moving with a velocity $\mathbf{v}$, is that for crossed electric and magnetic fields the particle drifts in the direction $\mathbf{E} \times \mathbf{B}$ with a velocity $v = cE/B$. The current density is therefore given by $\mathbf{J} = q\rho\mathbf{v}$, where $\rho$ is the (three-dimensional) density of particles. That produces the Hall resistivity

$$\rho_H = \frac{E_y}{J_x} = \frac{B}{\rho q c} \qquad [4]$$
The von Klitzing Effect Molecular beam epitaxy allows controllable layer by layer growth in which one type of semiconductor, say GaAs, can be grown on top of another, say Alx Ga1x As, to produce an atomically sharp interface. By appropriately doping such structures, electrons can
B I I VL
VH
Figure 1 Schematics of magnetotransport measurement. I, VL , and VH are the current, longitudinal voltage, and the Hall voltage, respectively. The longitudinal and Hall resistances are defined as RL VL =I and RH VH =I.
be captured at the interface, thus producing a twodimensional electron system (2DES). We note that these are three-dimensional electrons confined to move in two dimensions. The interaction has the standard Coulomb form V(r) = e2 =r, where is the dielectric constant of the host material. (In a hypothetical world which has only two space dimensions, the interaction would be logarithmic.) The ‘‘integral quantum Hall effect’’ (IQHE) or the ‘‘von Klitzing effect’’ was discovered unexpectedly by von Klitzing and collaborators in 1980, in their study of Hall effect in a 2DES. In two dimensions, one defines the Hall resistance as RH ¼
VH I
½5
which, from classical electrodynamics, is expected to be proportional to the magnetic field B. That is indeed the case at small magnetic fields. At sufficiently high B, however, quantum mechanical effects appear in a dramatic manner. The essential observations are as follows. 1. When plotted as a function of the magnetic field B, the Hall resistance exhibits numerous plateaus. On any given plateau, RH is precisely quantized with values given by RH ¼
h ne2
½6
where n is an integer (hence the name ‘‘integral quantum Hall effect’’). The plateau occurs in the vicinity of Be=hc = n, where is the ‘‘filling factor’’ (defined below). 2. In the plateau region, the longitudinal resistance exhibits an Arrhenius behavior: RL exp 2kB T
½7
This gives a filling-factor dependent energy scale , which indicates the presence of a gap in the excitation spectrum. RL vanishes in the limit T ! 0. The absolute accuracy of the quantization has been established to a few parts in 108 for 1 uncertainty, and the relative accuracy to a few parts in 1010 . There is presently no known ‘‘intrinsic’’ correction to the quantization. Perhaps, the most remarkable aspect of the effect is its universality. It is independent of the sample type, geometry, various material parameters (the band mass of the electron or the dielectric constant of the semiconductor), and disorder. The combination h=e2 also occurs in the definition of the fine
404 Fractional Quantum Hall Effect
structure constant $\alpha = e^2/\hbar c$, the value of which is approximately 1/137. The Hall effect measurements in dirty, solid-state systems thus provide one of the most accurate values for $\alpha$. Finally, the lack of resistance at $T = 0$ is to be contrasted with ordinary metals, for which the resistance at $T \to 0$, called the residual resistance, is finite and proportional to disorder.
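As a quick numerical illustration of eqn [6], the plateau resistances can be evaluated in SI units. This is a minimal sketch: the function name `hall_plateau_resistance` is hypothetical, and the values of h and e are the exact SI defining constants rather than numbers quoted in this article.

```python
# Planck constant (J s) and elementary charge (C): exact SI defining values.
H_PLANCK = 6.62607015e-34
E_CHARGE = 1.602176634e-19


def hall_plateau_resistance(n):
    """Hall resistance R_H = h / (n e^2) in ohms on the plateau labelled n (eqn [6])."""
    if n <= 0:
        raise ValueError("plateau index n must be positive")
    return H_PLANCK / (n * E_CHARGE ** 2)


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"n = {n}: R_H = {hall_plateau_resistance(n):.3f} ohm")
```

For n = 1 this evaluates the combination $h/e^2$, roughly 25.8 kilo-ohm; the higher plateaus are this value divided by n.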
The TSG Effect

The next revolution occurred in 1982 with the discovery of the TSG effect, that is, plateaus on which the Hall resistance is quantized at values given by eqn [1] (see Figure 2). The observation of the $R_H = h/fe^2$ plateau is often referred to as the observation of the fraction f. Improvement of experimental conditions has led to the observation of a large number of fractions over the years, revealing the richness of the TSG effect. At the time of the writing of this article, the number of observed fractions is more than 50 if one counts only fractions below unity. As in the von Klitzing effect, the longitudinal resistance exhibits an Arrhenius behavior, vanishing in the limit $T \to 0$.

[Figure 2: plot of $R_H$ (in units of $h/e^2$) and R against magnetic field (T), with plateaus labelled by their fractions.]
Figure 2 The TSG effect. The Hall resistance ($R_H$) exhibits many precisely quantized plateaus, concurrent with minima in the longitudinal resistance (R). Reproduced with permission from Perspectives in Quantum Hall Effects; HL Stormer and DC Tsui; SD Sarma and A Pinczuk (eds.); Copyright © 1997, Wiley. Reprinted with permission of John Wiley & Sons, Inc.

Landau Levels

The Hamiltonian for a nonrelativistic electron moving in two space dimensions in a perpendicular magnetic field is given by
$$H = \frac{1}{2m_b}\left(\mathbf{p} + \frac{e\mathbf{A}}{c}\right)^2 \qquad [8]$$
Here, $m_b$ is the electron's band mass and $-e$ its charge. For a uniform magnetic field, the vector potential $\mathbf{A}$ satisfies
$$\nabla \times \mathbf{A} = B\hat{\mathbf{z}} \qquad [9]$$
Because $\mathbf{A}$ is a linear function of the spatial coordinates, it follows that H is a generalized two-dimensional harmonic oscillator Hamiltonian which is quadratic in both the spatial coordinates and in the canonical momentum $\mathbf{p} = -i\hbar\nabla$, and therefore can be diagonalized exactly. A convenient gauge choice is the symmetric gauge:
$$\mathbf{A} = \frac{\mathbf{B} \times \mathbf{r}}{2} = \frac{B}{2}(-y, x, 0) \qquad [10]$$
With the magnetic length $\ell = \sqrt{\hbar c/eB}$ and the cyclotron energy $\hbar\omega_c = \hbar eB/m_b c$ chosen as the units for length and energy, the Hamiltonian can be expressed as
$$H = \frac{1}{2}\left[\left(-i\frac{\partial}{\partial x} - \frac{y}{2}\right)^2 + \left(-i\frac{\partial}{\partial y} + \frac{x}{2}\right)^2\right] \qquad [11]$$
Choosing as independent variables
$$z \equiv x - iy, \qquad \bar{z} \equiv x + iy \qquad [12]$$
we get
$$H = \frac{1}{2}\left[-4\frac{\partial^2}{\partial z\,\partial\bar{z}} + \frac{1}{4} z\bar{z} - z\frac{\partial}{\partial z} + \bar{z}\frac{\partial}{\partial\bar{z}}\right] \qquad [13]$$
Now define the following sets of ladder operators:
$$b = \frac{1}{\sqrt{2}}\left(\frac{\bar{z}}{2} + 2\frac{\partial}{\partial z}\right) \qquad [14]$$
$$b^\dagger = \frac{1}{\sqrt{2}}\left(\frac{z}{2} - 2\frac{\partial}{\partial\bar{z}}\right) \qquad [15]$$
$$a^\dagger = \frac{1}{\sqrt{2}}\left(\frac{\bar{z}}{2} - 2\frac{\partial}{\partial z}\right) \qquad [16]$$
$$a = \frac{1}{\sqrt{2}}\left(\frac{z}{2} + 2\frac{\partial}{\partial\bar{z}}\right) \qquad [17]$$
which have the property that
$$[a, a^\dagger] = 1, \qquad [b, b^\dagger] = 1 \qquad [18]$$
and all the other commutators are zero. In terms of these operators, the Hamiltonian can be written as
$$H = a^\dagger a + \frac{1}{2} \qquad [19]$$
The eigenvalue of $a^\dagger a$ is an integer, n, called the Landau level (LL) index. The z-component of the canonical angular momentum operator, the only relevant component for the two-dimensional problem, is defined as
$$L_z = -i\frac{\partial}{\partial\theta} = \bar{z}\frac{\partial}{\partial\bar{z}} - z\frac{\partial}{\partial z} = a^\dagger a - b^\dagger b \qquad [20]$$
Exploiting the property $[H, L_z] = 0$, the eigenfunctions will be chosen to diagonalize H and $L_z$ simultaneously. The eigenvalue of $L_z$ will be denoted by $-m$. The analogy to the harmonic oscillator problem immediately gives the solution
$$H|m, n\rangle = E_n |m, n\rangle \qquad [21]$$
where
$$E_n = n + \frac{1}{2} \qquad [22]$$
and
$$|m, n\rangle = \frac{(b^\dagger)^{m+n}}{\sqrt{(m+n)!}}\,\frac{(a^\dagger)^n}{\sqrt{n!}}\,|0, 0\rangle \qquad [23]$$
where $m = -n, -n+1, \ldots$. The single-particle orbital at the bottom of the two ladders defined by the two sets of raising and lowering operators is
$$\langle \mathbf{r}|0, 0\rangle \equiv \eta_{0,0}(\mathbf{r}) = \frac{1}{\sqrt{2\pi}}\, e^{-z\bar{z}/4} \qquad [24]$$
which satisfies
$$a|0, 0\rangle = b|0, 0\rangle = 0 \qquad [25]$$
The single-particle states are particularly simple in the lowest Landau level (n = 0):
$$\eta_{0,m}(\mathbf{r}) = \langle \mathbf{r}|0, m\rangle = \frac{z^m e^{-z\bar{z}/4\ell^2}}{\sqrt{2\pi\ell^2\, 2^m m!}} \qquad [26]$$
Aside from the ubiquitous Gaussian factor, a general state in the lowest Landau level is given by a polynomial of z; it does not involve any $\bar{z}$. In other words, apart from the Gaussian factor, the lowest Landau level wave functions are analytic functions of z.

Landau Level Degeneracy

The state $\eta_{0,m}(\mathbf{r})$ is peaked strongly at $r = \sqrt{2m}\,\ell$. Neglecting order-1 effects, there are m states in the lowest Landau level in a disk of radius $r = \sqrt{2m}\,\ell$, giving a degeneracy of $(2\pi\ell^2)^{-1}$ per unit area per Landau level. (The same degeneracy is obtained for higher Landau levels as well.) It is equal to $B/\phi_0$, where $\phi_0 = hc/e$ is called the flux quantum, that is, there is one state per flux quantum in each Landau level.

Filling Factor

The number of filled Landau levels, called the filling factor, is given by
$$\nu = 2\pi\ell^2\rho = \frac{\rho\phi_0}{B} \qquad [27]$$
where $\rho$ is the two-dimensional electron density.
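The degeneracy count and eqn [27] can be checked numerically. The sketch below assumes SI units, in which the magnetic length reads $\ell = \sqrt{\hbar/eB}$ and the flux quantum is $h/e$ (the article itself uses Gaussian units). The function names and the sample density are illustrative and not taken from the article.

```python
import math

H_PLANCK = 6.62607015e-34        # Planck constant, J s (exact SI value)
HBAR = H_PLANCK / (2 * math.pi)  # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19       # elementary charge, C (exact SI value)


def magnetic_length(b_tesla):
    """Magnetic length l = sqrt(hbar / (e B)) in metres (SI form of the definition)."""
    return math.sqrt(HBAR / (E_CHARGE * b_tesla))


def states_per_area(b_tesla):
    """Landau-level degeneracy per unit area, 1 / (2 pi l^2) = B / phi_0, in m^-2."""
    return 1.0 / (2 * math.pi * magnetic_length(b_tesla) ** 2)


def filling_factor(density_m2, b_tesla):
    """Filling factor nu = 2 pi l^2 rho (eqn [27]) for a 2D density given in m^-2."""
    return density_m2 / states_per_area(b_tesla)


def lll_peak_radius(m, steps=20000, r_max=20.0):
    """Grid search for the maximum of |eta_{0,m}|^2 ~ r^(2m) exp(-r^2 / 2), r in units of l."""
    best_r, best_val = 0.0, -1.0
    for i in range(1, steps + 1):
        r = r_max * i / steps
        val = r ** (2 * m) * math.exp(-r * r / 2)
        if val > best_val:
            best_r, best_val = r, val
    return best_r


if __name__ == "__main__":
    b = 10.0       # magnetic field in tesla
    rho = 1.0e15   # hypothetical 2D electron density in m^-2
    print(magnetic_length(b))                     # about 8 nm at 10 T
    print(filling_factor(rho, b))
    print(lll_peak_radius(8), math.sqrt(2 * 8))   # peak position versus sqrt(2m)
```

The grid search is deliberately naive; it only confirms that the probability density of the m-th lowest-Landau-level orbital peaks at $r = \sqrt{2m}\,\ell$, which is the observation the degeneracy argument rests on.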
The Origin of Plateaus The von Klitzing effect can be explained in terms of a model which neglects the interactions between electrons. It occurs because the ground state at an integral filling is unique and nondegenerate, separated from excitations by a gap. Laughlin (1981) showed that the disorder-induced Anderson localization also plays a crucial role in the establishment of the Hall plateaus. To see this, imagine changing the filling away from an integer by adding some
electrons or holes. In a perfect system, the additional particles would also be free to carry current, but in the actual, disordered sample, they are immobilized by impurities (which create localized states in the energy gap), and do not contribute to transport. The transport properties therefore remain unaffected as the filling factor is varied slightly away from an integer, and the system continues to behave as though it had filled shells.
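The Lowest Landau Level Problem

The TSG effect arises due to interelectron interaction. We wish to obtain solutions for the Schrödinger equation $H\Psi = E\Psi$ at an arbitrary filling $\nu$, where
$$H = \sum_j \frac{1}{2m_b}\left[\frac{\hbar}{i}\nabla_j + \frac{e}{c}\mathbf{A}(\mathbf{r}_j)\right]^2 + \frac{e^2}{\epsilon}\sum_{j \ldots} \frac{1}{\ldots}$$

$\ldots\ |Du_n|(\Omega) \to |Du|(\Omega)$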
The space $BV(\Omega)$ is clearly a vector space and, with the norm
$$\|u\|_{BV(\Omega)} = \|u\|_{L^1(\Omega)} + |Du|(\Omega) \qquad [2]$$
it becomes a Banach space. The total variation $|Du|(\Omega)$ appearing above is intended as
$$|Du|(\Omega) = \sup\left\{\sum_{i=1}^N \int_\Omega \phi_i\, dD_i u : \phi \in C_c^1(\Omega; \mathbb{R}^N),\ |\phi| \le 1\right\} = \sup\left\{\int_\Omega u\, \mathrm{div}\,\phi\, dx : \phi \in C_c^1(\Omega; \mathbb{R}^N),\ |\phi| \le 1\right\}$$
and is sometimes indicated by $\int_\Omega |Du|$. The space $BV_{loc}(\Omega)$ is defined in a similar way, requiring that $u \in BV(\Omega')$ for every $\Omega' \subset\subset \Omega$. From the point of view of functional analysis, the space $BV(\Omega)$ does not enjoy the nice properties of Sobolev spaces. In particular: the Banach space $BV(\Omega)$ is not separable; the Banach space $BV(\Omega)$ is not reflexive; and the class of smooth functions is not dense in $BV(\Omega)$ for the norm [2].

Further properties of the space $BV(\Omega)$ concern the embeddings into Lebesgue spaces, traces, and Poincaré-type inequalities. More precisely, we have:

Embeddings The space $BV(\Omega)$ is embedded continuously into $L^{N/(N-1)}(\Omega)$ and compactly into $L^p(\Omega)$ for every $p < N/(N-1)$.

Traces Every function $u \in BV(\Omega)$ has a boundary trace which belongs to $L^1(\partial\Omega)$, and the trace operator from $BV(\Omega)$ into $L^1(\partial\Omega)$ is continuous.

Poincaré inequalities There exist suitable constants $c_1$ and $c_2$ such that for every $u \in BV(\Omega)$
$$\int_\Omega |u|\,dx \le c_1\left[|Du|(\Omega) + \int_{\partial\Omega} |u|\,d\mathcal{H}^{N-1}\right]$$
$$\int_\Omega |u - u_\Omega|\,dx \le c_2\,|Du|(\Omega), \qquad \text{where } u_\Omega = \frac{1}{|\Omega|}\int_\Omega u\,dx$$
Free Interfaces and Free Discontinuities: Variational Problems
Sets of Finite Perimeter

An important class of functions with bounded variation are those that can be written as $1_E$, the characteristic function of a set E, taking the value 1 on E and 0 elsewhere. This is the natural class where many phase-transition problems with sharp interfaces may be framed.

Definition 3 For a measurable set $E \subset \mathbb{R}^N$ the perimeter of E in $\Omega$ is defined as
$$\mathrm{Per}(E, \Omega) = |D1_E|(\Omega)$$
The equality above is intended as $\mathrm{Per}(E, \Omega) = +\infty$ whenever $1_E \notin BV(\Omega)$. If $\mathrm{Per}(E, \Omega) < +\infty$ then the set E is called a set of finite perimeter in $\Omega$.

Note that by the compactness property above for BV functions, a family of characteristic functions of sets with finite perimeter in a bounded open set $\Omega$ with equibounded perimeter is weakly*-precompact, and its limit is of the same form. For a set E of finite perimeter in $\Omega$, we may define the inner normal versor and the reduced boundary as follows.

Definition 4 Let E be a set of finite perimeter in $\Omega$. We call reduced boundary $\partial^* E$ the set of all points $x \in \Omega \cap \mathrm{spt}\,|D1_E|$ such that the limit
$$\nu_E(x) = \lim_{r \to 0} \frac{D1_E(B_r(x))}{|D1_E|(B_r(x))}$$
exists and satisfies $|\nu_E(x)| = 1$. The vector $\nu_E(x)$ is called the generalized inner normal versor to E.

In order to link the measure-theoretical objects introduced above with some structure property of sets of finite perimeter, we introduce, for every $t \in [0, 1]$ and every measurable set $E \subset \mathbb{R}^N$, the set $E^t$ defined by
$$E^t = \left\{x \in \mathbb{R}^N : \lim_{r \to 0} \frac{|E \cap B_r(x)|}{|B_r(x)|} = t\right\} \qquad [3]$$
For instance, if E is a smooth domain of $\mathbb{R}^N$, $E^1$ is the interior part of E, $E^0$ is its exterior part, while $E^{1/2}$ is the boundary $\partial E$. The main properties of the reduced boundary and of the generalized inner normal versor are stated in the following result.

Theorem 5 Let E be a set of finite perimeter in $\Omega$. Then its reduced boundary $\partial^* E$ coincides $\mathcal{H}^{N-1}$-a.e. with the set $E^{1/2}$ introduced in [3], and we have the equality
$$\mathrm{Per}(E, \Omega) = \mathcal{H}^{N-1}(\Omega \cap \partial^* E) = \mathcal{H}^{N-1}(\Omega \cap E^{1/2})$$
Moreover, the generalized inner normal versor $\nu_E(x)$ exists for $\mathcal{H}^{N-1}$-a.e. $x \in \partial^* E$, and we have
$$D1_E = \nu_E(x)\,\mathcal{H}^{N-1} \llcorner \partial^* E$$
Note that the lower-semicontinuity of $|D1_E|(\Omega)$ entails the lower-semicontinuity of $E \mapsto \mathcal{H}^{N-1}(\Omega \cap \partial^* E)$ with respect to the weak*-convergence of $1_E$. As a consequence, we may apply the direct methods of the calculus of variations to obtain, for example, existence of minimizers of
$$\min\left\{\mathrm{Per}(E, \mathbb{R}^N) - \int_E g\,dx\right\}$$
that are sets with prescribed mean curvature g. This lower-semicontinuity property can be further generalized, for example, as in the following result for anisotropic perimeters.

Theorem 6 Let $\varphi : S^{N-1} \to \mathbb{R}$ be a Borel function. The energy
$$\int_{\Omega \cap \partial^* E} \varphi(\nu_E)\,d\mathcal{H}^{N-1}$$
is lower-semicontinuous with respect to the weak* convergence of $1_E$ in $BV(\Omega)$ if and only if the positively one-homogeneous extension of $\varphi$ from $S^{N-1}$ to $\mathbb{R}^N$ is convex.

This result immediately implies the existence of solutions of isovolumetric problems of the form
$$\min\left\{\int_{\partial^* E} \varphi(\nu_E)\,d\mathcal{H}^{N-1} : |E| = c\right\}$$
whose solutions are obtained by suitably scaling the Wulff shape of $\varphi$.
The Structure of BV Functions

The simplest situation occurs when N = 1 and so $\Omega$ is an interval of the real line. In this case, decomposing the derivative $u'$ into positive and negative parts, and taking their primitives, we obtain that $u \in BV(\Omega)$ if and only if u is the sum of two bounded monotone functions (one increasing and one decreasing). Therefore, in the one-dimensional case, the BV functions share all the properties of monotone functions. The situation is more delicate when N > 1, for which we need the notion of approximate limit.

Definition 7 Let $u \in BV(\Omega)$. We say that u has the approximate limit z at x if
$$\lim_{r \to 0} \frac{1}{|B_r(x)|}\int_{B_r(x)} |u(y) - z|\,dy = 0$$
The set where no approximate limit exists is called the approximate discontinuity set, and is denoted by $S_u$. In a similar way, when $x \in S_u$ we may define the approximate values $z^+$ and $z^-$, by requiring that
$$\lim_{r \to 0} \frac{1}{|B_r^+(x, \nu)|}\int_{B_r^+(x, \nu)} |u(y) - z^+|\,dy = 0$$
$$\lim_{r \to 0} \frac{1}{|B_r^-(x, \nu)|}\int_{B_r^-(x, \nu)} |u(y) - z^-|\,dy = 0$$
where
$$B_r^+(x, \nu) = \{y \in B_r(x) : (y - x)\cdot\nu > 0\}, \qquad B_r^-(x, \nu) = \{y \in B_r(x) : (y - x)\cdot\nu < 0\}$$
Analogous definitions can be given in the vector-valued case, when $u \in BV(\Omega; \mathbb{R}^m)$. The triplet $(z^+, z^-, \nu)$ in Definition 7 is unique up to interchanging $z^+$ with $z^-$ and changing sign to $\nu$, and is denoted by $(u^+(x), u^-(x), \nu_u(x))$.

We are now in a position to describe the structure of the measure Du when $u \in BV(\Omega)$, or more generally $u \in BV(\Omega; \mathbb{R}^m)$. We first apply the Radon–Nikodym theorem to Du and we decompose it into absolutely continuous and singular parts: $Du = (Du)^a + (Du)^s$. We denote by $\nabla u$ the density of the absolutely continuous part, so that we have
$$Du = \nabla u\,\mathcal{L}^N + (Du)^s$$
The singular part $(Du)^s$ can be further decomposed into an (N − 1)-dimensional part, concentrated on the approximate discontinuity set $S_u$, and the remaining part, which vanishes on all sets with finite $\mathcal{H}^{N-1}$ measure. More precisely, if $u \in BV(\Omega; \mathbb{R}^m)$, we have
$$Du = \nabla u\,\mathcal{L}^N + (u^+(x) - u^-(x)) \otimes \nu_u(x)\,\mathcal{H}^{N-1} \llcorner S_u + (Du)^c \qquad [4]$$
The three terms on the right-hand side are mutually singular and are, respectively, called the absolutely continuous part, the jump part, and the Cantor part of the gradient measure Du. In the vector-valued case, Du is an $m \times N$ matrix of finite Borel measures, $\nabla u$ is an $m \times N$ matrix of functions in $L^1(\Omega)$, and the jump term in [4] is an (N − 1)-dimensional measure of rank 1. The structure of the Cantor part $(Du)^c$ is described by Alberti's rank-1 theorem (see Alberti (1993)).

Theorem 8 For every $u \in BV(\Omega; \mathbb{R}^m)$ the Cantor part $(Du)^c$ is a measure with values in the $m \times N$ matrices of rank 1.
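The one-dimensional characterization at the start of this section (a BV function is the sum of an increasing and a decreasing bounded function) has a simple discrete analogue. The sketch below works on sampled values only and is not a statement about the continuum theory; the function names are hypothetical.

```python
def total_variation(samples):
    """Discrete total variation: sum of |u[i+1] - u[i]| over consecutive samples."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def monotone_decomposition(samples):
    """Split samples into a nondecreasing part and a nonincreasing part.

    Returns (inc, dec) with inc[i] + dec[i] == samples[i], where inc accumulates
    the positive increments and dec accumulates the negative ones.
    """
    inc = [samples[0]]
    dec = [0]
    for a, b in zip(samples, samples[1:]):
        step = b - a
        inc.append(inc[-1] + max(step, 0))
        dec.append(dec[-1] + min(step, 0))
    return inc, dec


if __name__ == "__main__":
    u = [0, 1, 1, -2, 3]
    inc, dec = monotone_decomposition(u)
    print(inc, dec, total_variation(u))
    # Samples of the characteristic function of an interval: two unit jumps.
    print(total_variation([0, 0, 1, 1, 1, 0]))
```

In this discrete setting the total variation equals the rise of the increasing part plus the fall of the decreasing part, and the sampled characteristic function of an interval has total variation 2, one unit for each endpoint.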
Convex Functionals on BV

Many problems of the calculus of variations deal with the minimization of energies of the form
$$F(u) = \int_\Omega f(x, u, Du)\,dx \qquad [5]$$
The direct methods to obtain the existence of at least a minimizer require some coercivity hypotheses on F, as well as its lower-semicontinuity. This last issue, already rather delicate when working in Sobolev spaces (see, e.g., Buttazzo (1989) and Dacorogna (1989)), presents additional difficulties when the unknown function u varies in the space $BV(\Omega)$, due to the fact that Du is a measure, and the precise meaning of the integral in [5] has to be clarified. In this section, we limit ourselves to consider the simpler situation of convex functionals, and we also assume that the integrand f(x, u, Du) depends only on x and Du. It is then convenient to study the problem in the framework of functionals defined on the space of finite Borel vector measures $M(\Omega; \mathbb{R}^k)$. Let $f : \mathbb{R}^N \times \mathbb{R}^k \to [0, +\infty]$ be a Borel function such that f is lower-semicontinuous, and $f(x, \cdot)$ is convex for every $x \in \mathbb{R}^N$. We denote by $f^\infty(x, z)$ the recession function associated with f, given by
$$f^\infty(x, z) = \lim_{t \to +\infty} \frac{f(x, z_0 + tz)}{t}$$
where $z_0$ is any point in $\mathbb{R}^k$ such that $f(x, z_0) < +\infty$ (in fact, the definition above is independent of the choice of $z_0$). Then we may consider the functional
$$F(\lambda) = \int_\Omega f(x, \lambda^a(x))\,dx + \int_\Omega f^\infty\left(x, \frac{d\lambda^s}{d|\lambda^s|}\right) d|\lambda^s| \qquad [6]$$
where $\lambda = \lambda^a\,dx + \lambda^s$ is the Lebesgue–Nikodym decomposition of $\lambda$ into absolutely continuous and singular parts, and the notation $d\lambda^s/d|\lambda^s|$ stands for the density of $\lambda^s$ with respect to its total variation $|\lambda^s|$. For simplicity, the last term on the right-hand side of [6] is often denoted by $\int_\Omega f^\infty(x, \lambda^s)$. For the functional F, the following lower-semicontinuity result holds (see, e.g., Buttazzo (1989)).

Theorem 9 Under the assumptions above the functional [6] is sequentially lower-semicontinuous for the weak* convergence on $M(\Omega; \mathbb{R}^k)$. Moreover, if
$$f(x, z) \ge c_0|z| - a(x) \quad \text{with } c_0 > 0 \text{ and } a \in L^1(\Omega) \qquad [7]$$
then the functional F turns out to be coercive for the same topology.

From Theorem 9 we deduce immediately a lower-semicontinuity result for functionals defined on $BV(\Omega; \mathbb{R}^m)$.

Corollary 10 Under the assumptions above on the integrand f (with k = mN) the functional defined on $BV(\Omega; \mathbb{R}^m)$ by
$$F(u) = \int_\Omega f(x, (Du)^a)\,dx + \int_\Omega f^\infty\left(x, \frac{d(Du)^s}{d|Du|^s}\right) d|Du|^s \qquad [8]$$
is sequentially lower-semicontinuous for the weak* convergence. Moreover, under the assumption [7] the functional F is coercive with respect to the same topology.

For some extensions of the result above to the case when $f(x, \cdot)$ is quasiconvex (in the vector-valued situation m > 1), we refer the interested reader to Fonseca and Müller (1992) and references therein.

Fixing boundary data is another difference between variational problems on Sobolev spaces and on BV spaces. Due to the fact that the class $\{u \in BV(\Omega) : u = u_0 \text{ on } \partial\Omega\}$ is not weakly* closed, to set in a correct way a minimum problem of Dirichlet type on $BV(\Omega)$ with datum $u_0 \in BV(\mathbb{R}^N)$ it is convenient to consider a larger domain $\Omega'$ and for every $u \in BV(\Omega)$ the extended function
$$\tilde{u} = \begin{cases} u & \text{on } \Omega \\ u_0 & \text{on } \Omega' \setminus \Omega \end{cases}$$
whose distributional gradient is
$$D\tilde{u} = Du \llcorner \Omega + Du_0 \llcorner (\Omega' \setminus \bar{\Omega}) + (u_0 - u)\,\nu\,\mathcal{H}^{N-1} \llcorner \partial\Omega$$
$\nu$ being the exterior normal versor to $\Omega$. We have then the following functional on $BV(\Omega')$:
$$\tilde{F}(\tilde{u}) = \int_{\Omega'} f(x, (D\tilde{u})^a)\,dx + \int_{\Omega'} f^\infty(x, (D\tilde{u})^s)$$
$$= \int_\Omega f(x, (Du)^a)\,dx + \int_{\Omega' \setminus \bar{\Omega}} f(x, (Du_0)^a)\,dx + \int_\Omega f^\infty(x, (Du)^s) + \int_{\Omega' \setminus \bar{\Omega}} f^\infty(x, (Du_0)^s) + \int_{\partial\Omega} f^\infty(x, (u_0 - u)\nu)\,d\mathcal{H}^{N-1}$$
If we drop the constant term
$$\int_{\Omega' \setminus \bar{\Omega}} f(x, (Du_0)^a)\,dx + \int_{\Omega' \setminus \bar{\Omega}} f^\infty(x, (Du_0)^s)$$
irrelevant for the minimization, we end up with the functional
$$F_{u_0}(u) = F(u) + \int_{\partial\Omega} f^\infty(x, (u_0 - u)\nu)\,d\mathcal{H}^{N-1}$$
where F is as in [8]. The Dirichlet problem we consider is then
$$\min\left\{F(u) + \int_{\partial\Omega} f^\infty(x, (u_0 - u)\nu)\,d\mathcal{H}^{N-1} : u \in BV(\Omega)\right\} \qquad [9]$$
For instance, if f(z) = |z|, problem [9] becomes
$$\min\left\{\int_\Omega |Du| + \int_{\partial\Omega} |u - u_0|\,d\mathcal{H}^{N-1} : u \in BV(\Omega)\right\}$$
Under the assumptions considered, the problem above admits a solution $u \in BV(\Omega)$, but in general we do not have $u = u_0$ on $\partial\Omega$ in the sense of BV traces.

Nonconvex Functionals on BV

In order to introduce the class of nonconvex functionals on $BV(\Omega)$, let us denote v = Du so that every functional $\Phi(v)$ provides an energy F(u). If we work in the setting of Sobolev spaces, we have $u \in W^{1,p}(\Omega)$ ($p \ge 1$), which implies $v \in L^p(\Omega; \mathbb{R}^N)$; now, it happens that in this case all "interesting" functionals are convex. More precisely, it can be proved that a functional $\Phi : L^p(\Omega; \mathbb{R}^N) \to [0, +\infty]$, which is sequentially lower-semicontinuous for the weak convergence of $L^p(\Omega; \mathbb{R}^N)$, and local on $L^p(\Omega; \mathbb{R}^N)$ in the sense that $\Phi(v + w) = \Phi(v) + \Phi(w)$ whenever $v \cdot w \equiv 0$ in $\Omega$, has to be necessarily convex, and of the form
$$\Phi(v) = \int_\Omega \beta(x, v(x))\,dx$$
for a suitable integrand $\beta$ such that $\beta(x, \cdot)$ is convex. Then the energies F(u) defined on Sobolev spaces and obtained by a functional $\Phi(v)$ through the identification v = Du are necessarily convex. This is no longer true if $\Phi$ is defined on the space $M(\Omega; \mathbb{R}^N)$ of measures, and hence F is defined on $BV(\Omega)$. The first example of a nonconvex functional on $M(\Omega; \mathbb{R}^N)$ in the literature comes from the so-called Mumford–Shah model for computer vision (see below) and is given by
$$\Phi(\lambda) = \int_\Omega |\lambda^a(x)|^2\,dx + \#(A_\lambda)$$
where $\lambda^a$ is the absolutely continuous part of $\lambda$, $A_\lambda$ is the set of atoms of $\lambda$, and # is the counting measure. The functional $\Phi$ is set equal to $+\infty$ on all measures whose singular part $\lambda^s$ is nonatomic.

A general representation result (see Bouchitté and Buttazzo (1992) and references therein) establishes that a functional $\Phi : M(\Omega; \mathbb{R}^N) \to [0, +\infty]$, which is sequentially lower-semicontinuous for the weak* convergence of $M(\Omega; \mathbb{R}^N)$, and local on $M(\Omega; \mathbb{R}^N)$ in the sense that $\Phi(\lambda_1 + \lambda_2) = \Phi(\lambda_1) + \Phi(\lambda_2)$ whenever $\lambda_1$ and $\lambda_2$ are mutually singular in $\Omega$, has to be of the form
$$\Phi(\lambda) = \int_\Omega \beta(x, \lambda^a)\,d\mu + \int_\Omega \beta^\infty(x, \lambda^c) + \int_\Omega \gamma(x, \lambda^\#(x))\,d\#$$
where $\mu$ is a non-negative measure, $\lambda = \lambda^a\,dx + \lambda^c + \lambda^\#$ is the decomposition of $\lambda$ into absolutely continuous, Cantor, and atomic parts, $\beta(x, v)$ is an integrand convex in v, and $\beta^\infty$ is its recession function. The novelty is now represented by the integrand $\gamma(x, v)$ which has to be subadditive in v and satisfy the compatibility condition
$$\lim_{t \to +\infty} \frac{\beta(x, tv)}{t} = \lim_{t \to 0^+} \frac{\gamma(x, tv)}{t}$$
When $\beta$ has a superlinear growth the condition above gives that the slope of $\gamma(x, \cdot)$ at the origin has to be infinite. For instance, in the Mumford–Shah case we have
$$\beta(x, v) = |v|^2, \qquad \gamma(x, v) = \begin{cases} 1 & \text{if } v \ne 0 \\ 0 & \text{if } v = 0 \end{cases} \qquad [10]$$
Coming back to the case $u \in BV(\Omega)$, we have the decomposition (see [4]):
$$Du = \nabla u\,\mathcal{L}^N + (Du)^c + [u]\,\nu_u(x)\,\mathcal{H}^{N-1} \llcorner S_u$$
where we considered, for simplicity, only the scalar case m = 1 and denoted by [u] the jump $u^+ - u^-$. We have then the functional
$$F(u) = \int_\Omega \beta(x, \nabla u)\,dx + \int_\Omega \beta^\infty(x, (Du)^c) + \int_{S_u} \gamma(x, [u]\nu_u)\,d\mathcal{H}^{N-1}$$
For instance, in the homogeneous–isotropic case, when $\beta(x, v)$ and $\gamma(x, v)$ are independent of x and depend only on |v|, the formula above reduces to
$$F(u) = \int_\Omega \beta(|\nabla u|)\,dx + \sigma\,|Du|^c(\Omega) + \int_{S_u} \gamma(|[u]|)\,d\mathcal{H}^{N-1} \qquad [11]$$
where $\beta$, $\gamma$, $\sigma$ satisfy the compatibility condition
$$\sigma = \beta^\infty(1) = \lim_{t \to 0^+} \frac{\gamma(t)}{t} \qquad [12]$$
In the original Mumford–Shah model for computer vision, $\Omega$ is a rectangle of the plane, $u_0 : \Omega \to [0, 1]$ represents the gray level of a picture, $c_1$ and $c_2$ are positive scale and contrast parameters, and the variational problem under consideration is
$$\min\left\{\int_\Omega |\nabla u|^2\,dx + c_1\int_\Omega |u - u_0|^2\,dx + c_2\,\mathcal{H}^{N-1}(S_u) : (Du)^c \equiv 0\right\} \qquad [13]$$
The solution u then represents the reconstructed image, whose contours are given by the jump set $S_u$. We refer to De Giorgi and Ambrosio (1988) and to the book by Morel and Solimini (1995) for further details about this model. Analogously, in the case of the study of fractures of an elastic membrane, a problem similar to [13] provides the vertical displacement u of the membrane, together with its fracture set $S_u$. We refer to some recent papers (see Dal Maso and Toader (2002) and Francfort and Marigo (1998), and references therein) for a more detailed description of fracture mechanics problems, even in the more delicate vectorial setting of elasticity. Using the functional F in [11] we have the generalized Mumford–Shah problem,
$$\min\left\{F(u) + c_1\int_\Omega |u - u_0|^2\,dx : u \in BV(\Omega)\right\}$$
where $\beta$ is convex, $\gamma$ is subadditive, and the compatibility condition [12] is fulfilled. If we set $K = S_u$ and assume that it is closed, the Mumford–Shah problem can be rewritten as
$$\min\left\{\int_{\Omega \setminus K} |\nabla u|^2\,dx + c_1\int_{\Omega \setminus K} |u - u_0|^2\,dx + c_2\,\mathcal{H}^{N-1}(K \cap \Omega) : K \subset \Omega \text{ closed},\ u \in H^1(\Omega \setminus K)\right\}$$
and this justifies the name "free discontinuity problems," which is often used in this setting. The regularity properties of optimal pairs (u, K) are far from being fully understood; some partial results are available, but the Mumford–Shah conjecture, that in the case N = 2 for an optimal pair (u, K) the set K is locally the finite union of $C^{1,1}$ arcs, remains open. We refer to Ambrosio et al. (2000) for a list of the regularity results on the problem above that are known thus far.
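To make the competition between the gradient term and the jump term in [13] concrete, here is a minimal one-dimensional discrete analogue. It is a sketch under stated assumptions, not the model as used in practice: the grid, the finite-difference gradient, the representation of the jump set as a set of grid indices, and the function name are all choices made for illustration. In one dimension $\mathcal{H}^{N-1}$ is the counting measure, so the jump term simply counts jump points.

```python
def mumford_shah_energy_1d(u, u0, jumps, h, c1, c2):
    """Discrete 1D analogue of the Mumford-Shah energy [13].

    u, u0 : lists of samples on a uniform grid with spacing h
    jumps : set of indices i meaning a discontinuity sits between u[i] and u[i+1]
    The gradient term skips the jump set, which is charged c2 per point instead.
    """
    if len(u) != len(u0):
        raise ValueError("u and u0 must have the same length")
    gradient = sum(((u[i + 1] - u[i]) / h) ** 2 * h
                   for i in range(len(u) - 1) if i not in jumps)
    fidelity = c1 * sum((a - b) ** 2 * h for a, b in zip(u, u0))
    return gradient + fidelity + c2 * len(jumps)


if __name__ == "__main__":
    step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    # Same reconstruction u = u0, with and without declaring the step a jump.
    print(mumford_shah_energy_1d(step, step, {2}, 0.5, 1.0, 1.0))
    print(mumford_shah_energy_1d(step, step, set(), 0.5, 1.0, 1.0))
```

With grid spacing h, a unit step treated as a steep gradient costs 1/h, which grows as the grid is refined, while declaring it a jump costs the fixed amount $c_2$. This is the discrete shadow of the infinite slope at the origin required by the compatibility condition.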
Further Reading

Alberti G (1993) Rank one properties for derivatives of functions with bounded variation. Proc. Roy. Soc. Edinburgh A 123: 239–274.
Ambrosio L, Fusco N, and Pallara D (2000) Functions of Bounded Variation and Free Discontinuity Problems. Oxford Mathematical Monographs. Oxford: Clarendon.
Bouchitté G and Buttazzo G (1992) Integral representation of nonconvex functionals defined on measures. Ann. Inst. H. Poincaré Anal. Non Linéaire 9: 101–117.
Buttazzo G (1989) Semicontinuity, Relaxation and Integral Representation in the Calculus of Variations. Pitman Res. Notes Math. Ser., vol. 207. Harlow: Longman.
Dacorogna B (1989) Direct Methods in the Calculus of Variations. Appl. Math. Sciences, vol. 78. Berlin: Springer.
Dal Maso G and Toader R (2002) A model for the quasi-static growth of brittle fractures: existence and approximation results. Arch. Ration. Mech. Anal. 162: 101–135.
De Giorgi E and Ambrosio L (1988) New functionals in the calculus of variations. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 82(2): 199–210.
Evans LC and Gariepy RF (1992) Measure Theory and Fine Properties of Functions. Studies in Advanced Mathematics. Ann Arbor: CRC Press.
Federer H (1969) Geometric Measure Theory. Berlin: Springer.
Fonseca I and Müller S (1992) Quasi-convex integrands and lower semicontinuity in $L^1$. SIAM J. Math. Anal. 23: 1081–1098.
Francfort G and Marigo J-J (1998) Revisiting brittle fracture as an energy minimization problem. J. Mech. Phys. Solids 46: 1319–1342.
Giusti E (1984) Minimal Surfaces and Functions of Bounded Variation. Boston: Birkhäuser.
Massari U and Miranda M (1984) Minimal Surfaces of Codimension One. Amsterdam: North-Holland.
Morel J-M and Solimini S (1995) Variational Methods in Image Segmentation. Progress in Nonlinear Differential Equations and their Applications, vol. 14. Boston: Birkhäuser.
Ziemer WP (1989) Weakly Differentiable Functions. Berlin: Springer.
Free Probability Theory
D-V Voiculescu, University of California at Berkeley, Berkeley, CA, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction Free probability is a probability theory adapted to quantities with the highest degree of noncommutativity. A basic feature of this is that the definition of independence is modified in such a way that the freely independent random variables will not commute in general. The exploration of this notion of independence, which was initially motivated by questions about operator algebras (Voiculescu 1985), has produced a theory that runs parallel to an unexpectedly large part of classical probability theory. The applications of the theory have also gone into unexpected directions, once it turned out that the large-N limit of systems of random matrices is a key asymptotic model in the theory (Voiculescu 1991). There are several signs like the connections to large N for random matrices and to the combinatorics of noncrossing partitions (Speicher 1998) (which correspond to certain planar diagrams), that perhaps these
connections may go even further towards the large-N limit of models in gauge theory. In this article the noncommutative probability and the random matrix angle will be emphasized and very little will be said about the operator algebras and the combinatorics. After discussing free independence and models based on free products of groups and creation and annihilation operators on the Boltzmann full Fock space, we continue with the semicircle law, which is the substitute for the Gauss law in this context, and with the nonlinear free harmonic analysis arising from addition and multiplication of free random variables. We then devote two longer sections to the asymptotic free independence of large random matrices and to free entropy, the free probability analog of Shannon’s information-theoretic entropy for continuous random variables.
Freeness of Noncommutative Random Variables

Classical probability deals with expectation values of numerical random variables, that is, with numerical functions on a space of events and with their integrals with respect to a probability measure on the space of events. In noncommutative probability, the random variables, like quantum-mechanical quantities, are elements of a noncommutative algebra $\mathcal{A}$ over $\mathbb{C}$, with unit $1 \in \mathcal{A}$, which is endowed with a linear expectation functional $\varphi : \mathcal{A} \to \mathbb{C}$, so that $\varphi(1) = 1$. Frequently, $\mathcal{A}$ is a *-algebra of operators on some Hilbert space $\mathcal{H}$ and $\varphi(T) = \langle T\xi, \xi\rangle$ for some unit vector $\xi \in \mathcal{H}$. We call $(\mathcal{A}, \varphi)$ a noncommutative probability space and the elements $a \in \mathcal{A}$ noncommutative random variables. In this section we shall discuss the basics around the notion of freeness (Voiculescu 1985), which plays the role of independence in free probability.

If $\alpha = (a_i)_{i \in I} \subset \mathcal{A}$ is a family of noncommutative random variables, the role of joint distribution is played by the collection of noncommutative moments $\varphi(a_{i_1} \ldots a_{i_n})$. This can also be extended by linearity to a distribution functional $\mu_\alpha : \mathbb{C}\langle X_i \mid i \in I\rangle \to \mathbb{C}$, where $\mathbb{C}\langle X_i \mid i \in I\rangle$ is the ring of polynomials in noncommutative indeterminates $X_i$ ($i \in I$) and
$$\mu_\alpha(P(X_i \mid i \in I)) = \varphi(P(a_i \mid i \in I))$$
If $\mathcal{A}$ is a C*-algebra of operators on $\mathcal{H}$, $a = a^* \in \mathcal{A}$ and $\varphi(\cdot) = \langle \cdot\,\xi, \xi\rangle$, the distribution $\mu_a$ of a can also be identified with the probability measure on $\mathbb{R}$
$$\mu_a(\omega) = \langle E(\omega; a)\xi, \xi\rangle$$
where $E(\cdot\,; a)$ is the spectral measure of a. Indeed, then
$$\mu_a(P(X)) = \int P(t)\,d\mu_a(t)$$
A family $(\mathcal{A}_i)_{i \in I} \subset \mathcal{A}$, $1 \in \mathcal{A}_i$, of subalgebras is "free" (which is short for freely independent) if
$$\varphi(a_1 \ldots a_n) = 0$$
whenever $a_j \in \mathcal{A}_{i_j}$, $1 \le j \le n$, $i_j \ne i_{j+1}$ and $\varphi(a_j) = 0$. (Here it is only required that consecutive $a_j$'s be in different $\mathcal{A}_i$'s. Thus, we may have $i_1 = i_3$, provided $i_1 \ne i_2$.) A family of sets of random variables $(\omega_i)_{i \in I}$, $\omega_i \subset \mathcal{A}$, is free if the algebras $\mathcal{A}_i$ generated by $\{1\} \cup \omega_i$ are free in $(\mathcal{A}, \varphi)$. Except for rather trivial situations, free random variables in $(\mathcal{A}, \varphi)$ do not commute.

Note also that, as in the case of classical independence, if $(\omega_i)_{i \in I}$ are disjoint freely independent sets of random variables, then, if the distributions $\mu_{\omega_i}$ ($i \in I$) are given, the distribution $\mu_\omega$ of $\omega = \bigcup_{i \in I} \omega_i$ is completely determined.

Example 1 Let the group G be the free product of its subgroups $(G_i)_{i \in I}$, that is, G is generated by these subgroups and there is no nontrivial relation among elements of different $G_i$'s. Further, let $\lambda$ be the regular representation $\lambda(g)e_h = e_{gh}$ of G on the Hilbert space $\ell^2(G)$ with orthonormal basis $(e_g)_{g \in G}$. Then, with respect to the expectation functional $\tau(T) = \langle Te_e, e_e\rangle$ on operators on $\ell^2(G)$, the sets $(\lambda(G_i))_{i \in I}$ are freely independent.

Example 2 If $\mathcal{H}$ is a complex Hilbert space, let $T\mathcal{H} = \bigoplus_{k \ge 0} \mathcal{H}^{\otimes k}$ denote the full Boltzmann Fock space, with vacuum vector 1 so that $\mathcal{H}^{\otimes 0} = \mathbb{C}1$. If $h \in \mathcal{H}$ and $\xi \in T\mathcal{H}$, let $l(h)\xi = h \otimes \xi$ denote the left creation operator and $\varphi(X) = \langle X1, 1\rangle$ the vacuum expectation. Then, if the $\mathcal{H}_i$ ($i \in I$) are pairwise orthogonal subspaces in $\mathcal{H}$, the *-subalgebras of operators generated by $l(\mathcal{H}_i) \cup l^*(\mathcal{H}_i)$, indexed by $i \in I$, are freely independent with respect to $\varphi$.
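Example 1 can be explored by direct computation in the simplest case, the free group on two generators viewed as the free product of two copies of $\mathbb{Z}$. The sketch below is an illustration under assumptions of our own: group-algebra elements are stored as dictionaries from reduced words to coefficients, generators are encoded as nonzero integers, and all function names are hypothetical.

```python
def reduce_word(word):
    """Freely reduce a word given as a tuple of nonzero ints; -k is the inverse of generator k."""
    out = []
    for letter in word:
        if out and out[-1] == -letter:
            out.pop()           # cancel an adjacent generator/inverse pair
        else:
            out.append(letter)
    return tuple(out)


def multiply(x, y):
    """Product in the group algebra; elements are dicts {reduced word: coefficient}."""
    result = {}
    for wx, cx in x.items():
        for wy, cy in y.items():
            w = reduce_word(wx + wy)
            result[w] = result.get(w, 0) + cx * cy
    return result


def product(*factors):
    """Multiply several group-algebra elements from left to right."""
    result = {(): 1}
    for f in factors:
        result = multiply(result, f)
    return result


def tau(x):
    """Expectation tau(T) = <T e_e, e_e>: the coefficient of the identity (empty word)."""
    return x.get((), 0)


# Centered elements taken from the two free factors (generators 1 and 2).
a = {(1,): 1, (-1,): 1}   # u + u^{-1}
b = {(2,): 1, (-2,): 1}   # v + v^{-1}

if __name__ == "__main__":
    print(tau(a), tau(b))               # both elements are centered
    print(tau(product(a, b, a, b)))     # alternating product of centered elements
    print(tau(product(a, a, b, b)), tau(product(a, a)) * tau(product(b, b)))
```

The alternating moment vanishes, as the definition of freeness requires, and the moment of $a^2b^2$ factorizes as a product of second moments. Note also that $ab \ne ba$ in this algebra, in line with the remark that free random variables do not commute.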
Free Independence with Amalgamation over a Subalgebra

The classical notion of conditional independence also has a free counterpart based on the notion of free independence with amalgamation over a subalgebra. This subject is technically more complicated and we will only aim at giving an idea about what kind of concepts are involved.

In the classical context, if $(X, \Sigma, \mu)$ is a probability space with a $\sigma$-algebra $\Sigma$, then the conditional independence with respect to a $\sigma$-subalgebra of events, $\Sigma_0 \subset \Sigma$, amounts to replacing in the definition of independence the expectation functional (which is the integral with respect to $\mu$) by the conditional expectation functional $E : L^1(X, \Sigma, \mu) \to L^1(X, \Sigma_0, \mu|_{\Sigma_0})$.

In free probability, one considers an extension of the theory, from the $(\mathcal{A}, \varphi)$ framework to an $(\mathcal{A}, \Phi, \mathcal{B})$ framework (Voiculescu 1995), where $\mathcal{A}$ is an algebra with unit over $\mathbb{C}$, $\mathcal{B} \ni 1$ is a subalgebra, and $\Phi : \mathcal{A} \to \mathcal{B}$ is $\mathcal{B}$–$\mathcal{B}$-bilinear and $\Phi|_{\mathcal{B}} = \mathrm{id}_{\mathcal{B}}$. Then the definition of $\mathcal{B}$-freeness (or free independence with amalgamation over $\mathcal{B}$) of a family of subalgebras $(\mathcal{A}_i)_{i \in I}$, $\mathcal{B} \subset \mathcal{A}_i \subset \mathcal{A}$, requires that
$$\Phi(a_1 \ldots a_n) = 0$$
whenever $a_j \in \mathcal{A}_{i_j}$, $i_j \ne i_{j+1}$ ($1 \le j \le n$), and $\Phi(a_j) = 0$.

In the case of a unital *-algebra of bounded operators M with an expectation functional $\tau(\cdot) = \langle \cdot\,\xi, \xi\rangle$ which is tracial (i.e., $\tau([m_1, m_2]) = 0$ if $m_1, m_2 \in M$) and given a subalgebra $1 \in N \subset M$, as in the classical theory, there is a certain canonical construction in operator algebra theory of a "conditional expectation" $\Phi : \bar{M} \to \bar{N}$, where $\bar{M}$, $\bar{N}$ are algebras of operators obtained from M and N by separation and completion. With this construction, in the trace-state setting there is complete analogy with the classical notion of conditional independence.

Several other constructions of free probability have been extended to the $\mathcal{B}$-valued $(\mathcal{A}, \Phi, \mathcal{B})$ context. A group-theoretic example similar to Example 1 can be constructed from a group G which is a free product with amalgamation over a subgroup $H \subset G$ of subgroups $H \subset G_i \subset G$, $i \in I$. Then $\mathcal{A}$ is the algebra constructed from the left-regular representation of G, whereas $\mathcal{B}$ is an algebra constructed from the left-regular representation of H.
The Semicircle Law

In free probability the semicircle law appears as the limit law in the free central limit theorem (Voiculescu 1985). Here is a weak, rather algebraic, version of this fact: if $(a_n)_{n \in \mathbb{N}}$ are freely independent in $(A, \varphi)$ and satisfy the conditions that

$$\varphi(a_n) = 0 \quad (n \in \mathbb{N})$$

$$\lim_{N \to \infty} N^{-1} \sum_{1 \le n \le N} \varphi(a_n^2) = 1$$

$$\sup_{n \in \mathbb{N}} |\varphi(a_n^k)| = C_k < \infty \quad (k \in \mathbb{N})$$

then, if $S_N = N^{-1/2} \sum_{1 \le n \le N} a_n$, we have the convergence of moments of the distribution of $S_N$ to the semicircle distribution:

$$\lim_{N \to \infty} \varphi(S_N^k) = (2\pi)^{-1} \int_{-2}^{2} t^k (4 - t^2)^{1/2}\, dt$$

Thus, the semicircle law, given by the density $(2\pi)^{-1}(4 - t^2)^{1/2}$ on $[-2, 2]$, is the free analog of the (0,1) Gauss law.

Two coincidences involving the semicircle law should be noted. The field operators $s(h) = 2^{-1}(l(h) + l(h)^*)$ on the Boltzmann Fock space (Example 2) have semicircle distributions with respect to the vacuum expectation $\varphi(\cdot) = \langle \cdot\, 1, 1 \rangle$. It turns out that this goes further: if $H = H_{\mathbb{R}} \otimes_{\mathbb{R}} \mathbb{C}$ is the complexification of a real Hilbert space, then the map $H_{\mathbb{R}} \ni h \to s(h)$ is the analog in free probability of the Gaussian process over the Hilbert space $H_{\mathbb{R}}$ (Voiculescu 1985). It is often called the semicircular process over $H_{\mathbb{R}}$. This points to an important connection of free probability to the full Boltzmann statistics.

The other coincidence is that the semicircle law is well known as the Wigner limit distribution of eigenvalues of large Gaussian random matrices. As we shall see, this is a clue to a deep connection of free probability to the large-N limit of random matrices (Voiculescu 1991).
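As a numerical illustration of the limit moments, the following minimal Python sketch integrates $t^k$ against the semicircle density with a midpoint rule and prints the result next to the Catalan numbers, which are the known even moments of this law. The function names, the quadrature rule, and the step count are choices made for this sketch only.

```python
import math

def semicircle_moment(k, steps=100_000):
    """k-th moment of the density (2*pi)^-1 * sqrt(4 - t^2) on [-2, 2] (midpoint rule)."""
    h = 4.0 / steps
    total = 0.0
    for i in range(steps):
        t = -2.0 + (i + 0.5) * h          # midpoint of the i-th subinterval
        total += t ** k * math.sqrt(4.0 - t * t)
    return total * h / (2.0 * math.pi)

def catalan(m):
    """m-th Catalan number, the 2m-th moment of the semicircle law."""
    return math.comb(2 * m, m) // (m + 1)

# Even moments should agree with the Catalan numbers; odd moments vanish by symmetry.
for m in range(4):
    print(2 * m, round(semicircle_moment(2 * m), 4), catalan(m))
```

The same quadrature can be reused for any compactly supported density by replacing the integrand.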
Free Convolution Operations

In classical probability theory, the distribution of the sum of two independent random variables is computed by the convolution product of their distributions. This has a free probability analog. If $a, b$ are free random variables in $(A, \varphi)$ with distributions $\mu_a, \mu_b : \mathbb{C}[X] \to \mathbb{C}$, then the joint distribution $\mu_{\{a,b\}}$ is completely determined by $\mu_a$, $\mu_b$ and in particular $\mu_{a+b}$, the distribution of $a + b$, also depends only on $\mu_a$, $\mu_b$. It follows that there is an additive free convolution operation $\boxplus$ on distributions so that $\mu_a \boxplus \mu_b = \mu_{a+b}$ whenever $a, b$ are free (Voiculescu 1985). The same can be done with multiplication replacing addition, and this defines the multiplicative free convolution operation $\boxtimes$ by the equation $\mu_a \boxtimes \mu_b = \mu_{ab}$, when $a, b$ are free (Voiculescu 1985). A slightly surprising feature of $\boxtimes$ is that in spite of the noncommutativity of $a$ and $b$, the multiplicative operation turns out to be commutative, which of course is obvious for $\boxplus$.

In the classical context, convolutions are bilinear operations which can be computed using integrals. The free convolutions are quite nonlinear and their computation is via another route, which can also be explained by a classical analogy. Classically, the logarithm of the Fourier transform linearizes convolution, that is,

$$\log \mathcal{F}(\mu * \nu) = \log \mathcal{F}(\mu) + \log \mathcal{F}(\nu)$$

and we may compute $\mu * \nu$ as the $(\log \mathcal{F})^{-1}$ of $\log \mathcal{F}(\mu) + \log \mathcal{F}(\nu)$.

The linearizing transform for $\boxplus$ is the R-transform (Voiculescu 1986), which is obtained by the following procedure. If $\mu : \mathbb{C}[X] \to \mathbb{C}$ is a distribution, let $G_\mu(z) = z^{-1} + \sum_{n \ge 1} \mu(X^n) z^{-n-1}$, which, in case $\mu$ is a compactly supported probability measure on $\mathbb{R}$, is the Laurent series at $\infty$ of the Cauchy transform

$$\int \frac{d\mu(t)}{z - t}$$

From this, one obtains, by inversion at $\infty$, the series $K_\mu$, so that $G_\mu(K_\mu(z)) = z$, and one defines $R_\mu(z) = K_\mu(z) - z^{-1}$, which is a power series in $z$. Then

$$R_{\mu \boxplus \nu} = R_\mu + R_\nu$$

In case the distribution corresponds to a measure, the formal inversion amounts to inverting an analytic function.
For the multiplicative operation $\boxtimes$, it is more convenient to describe an analog of the Mellin transform, that is, no logarithm will be taken. This is the S-transform (Voiculescu 1991), obtained as follows. If $\mu : \mathbb{C}[X] \to \mathbb{C}$ is a distribution with $\mu(X) \neq 0$, one forms $\psi_\mu(z) = \sum_{n \ge 1} \mu(X^n) z^n$ and its inverse $\chi_\mu$, so that $\chi_\mu(\psi_\mu(z)) = z$. Then

$$S_\mu(z) = z^{-1}(1 + z)\chi_\mu(z)$$

has the property that

$$S_{\mu \boxtimes \nu} = S_\mu S_\nu$$

The free central limit theorem can be easily proved using the R-transform. Another easy application of the R-transform is to find the free analog of the Poisson law, that is,

$$\lim_{n \to \infty} \big((1 - a/n)\delta_0 + (a/n)\delta_1\big)^{\boxplus n}$$

where $a > 0$. The free Poisson law is

$$\mu = \begin{cases} (1 - a)\delta_0 + \nu & \text{if } 0 \le a \le 1 \\ \nu & \text{if } a > 1 \end{cases}$$

where $\nu$ has support $[(1 - a^{1/2})^2, (1 + a^{1/2})^2]$ and density $(2\pi t)^{-1}(4a - (t - (1 + a))^2)^{1/2}$. This distribution is well known in random matrix theory as the Marchenko–Pastur distribution, again a coincidence pointing to a random matrix theory connection.

Because probability measures on $\mathbb{R}$ are distributions of self-adjoint operators and a sum of self-adjoint operators is again such an operator, the additive free convolution $\boxplus$ yields an operation on probability measures on $\mathbb{R}$. Similarly, it can be shown that $\boxtimes$ gives rise to operations on probability measures on $\{z \in \mathbb{C} \mid |z| = 1\}$ and on probability measures on $[0, \infty)$.

With the R-transform machinery at hand, the free analogs of many of the classical results around addition of independent random variables have been developed (we recommend Voiculescu (1998c) for a survey of these developments). This includes the classification of infinitely divisible laws (Lévy–Khintchine type theorem), classification of stable laws, domains of attraction, and convolution semigroups. Note that the free laws are rather different from the classical ones, but the classification results are quite parallel, that is, the indexing parameters are almost the same. The situation is similar in the multiplicative context. As in the classical case, these results about laws yield in particular processes with independent increments, which in the free framework are free increments.

As in the classical setting, convolution semigroups in the free setting are connected to differential equations. In the additive free case, a semigroup is a family $(\mu_t)_{t \ge 0}$ of probability measures on $\mathbb{R}$, so that $\mu_{t+s} = \mu_t \boxplus \mu_s$. If $G(t, z)$ is the Cauchy transform of $\mu_t$ (which is an analytic function on the half-plane $\operatorname{Im} z > 0$), the equation (Voiculescu 1986) is a semilinear complex PDE:

$$\frac{\partial G}{\partial t} + R_{\mu_1}(G)\,\frac{\partial G}{\partial z} = 0$$

where $R_{\mu_1}$ is the R-transform of $\mu_1$. In particular, when $\mu_1$ is the semicircle law, $R_{\mu_1}(z) = z$ and the PDE is a complex Burgers equation in the upper half-plane.
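The Burgers case can be checked numerically. The sketch below assumes the closed form $G(t, z) = (z - \sqrt{z^2 - 4t})/(2t)$ for the Cauchy transform of the semicircle law of variance $t$ (obtained by solving $tG^2 - zG + 1 = 0$, which is not spelled out in the text above) and evaluates the residual of $\partial G/\partial t + G\,\partial G/\partial z = 0$ by central differences. The function names and the step size are illustrative.

```python
import cmath

def cauchy_semicircle(t, z):
    """Cauchy transform of the semicircle law of variance t, for Im z > 0.

    Assumed closed form: the root of t*G^2 - z*G + 1 = 0 that behaves like 1/z at infinity.
    """
    r = 2.0 * t ** 0.5
    # Product of principal square roots picks the branch that behaves like z at infinity.
    root = cmath.sqrt(z - r) * cmath.sqrt(z + r)
    return (z - root) / (2.0 * t)

def burgers_residual(t, z, eps=1e-5):
    """Central-difference estimate of dG/dt + G * dG/dz at the point (t, z)."""
    dG_dt = (cauchy_semicircle(t + eps, z) - cauchy_semicircle(t - eps, z)) / (2 * eps)
    dG_dz = (cauchy_semicircle(t, z + eps) - cauchy_semicircle(t, z - eps)) / (2 * eps)
    return dG_dt + cauchy_semicircle(t, z) * dG_dz

# The residual should be close to zero anywhere in the upper half-plane.
print(abs(burgers_residual(1.0, 1.0 + 1.0j)))
```

The residual is only as small as the finite-difference error allows, so it should be read as a consistency check rather than a proof.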
Noncrossing Partitions

The series expansion of the R-transform

$$R_\mu(z) = \sum_{n \ge 0} R_{n+1}(\mu) z^n$$

has as coefficients polynomials $R_n(\mu)$ in the moments $\mu(X^k)$. More precisely, assigning to $\mu(X^k)$ a degree $k$, $R_n(\mu)$ is a polynomial of degree $n$ and $R_n(\mu) - \mu(X^n)$ is a polynomial in the $\mu(X^k)$ with $k < n$. The linearization property of the R-transform implies that

$$R_n(\mu \boxplus \nu) = R_n(\mu) + R_n(\nu)$$

For classical convolution, polynomials with similar properties satisfying

$$C_n(\mu * \nu) = C_n(\mu) + C_n(\nu)$$

are called cumulants and satisfy

$$\log \mu(e^{zX}) = \sum_{n \ge 1} C_n(\mu) z^n$$

There are combinatorial formulas involving the lattice of all partitions of the set $\{1, \ldots, n\}$ which give the classical cumulants. For free cumulants, like $R_n(\mu)$ and generalizations of these, there are similar formulas provided the lattice of all partitions is replaced by the lattice $NC(n)$ of noncrossing partitions (Speicher 1998). A partition $\pi = (V_1, \ldots, V_m)$ of $\{1, \ldots, n\}$ is noncrossing if there are no $a < b < c < d$ so that $\{a, c\} \subset V_k$, $\{b, d\} \subset V_l$ and $k \neq l$.

More generally, a family $R^{(n)}(a_1, \ldots, a_n)$ of free cumulants, where $a_1, \ldots, a_n$ are in some $(A, \varphi)$, is defined recursively as follows (Speicher 1998). For $n = 1$, one has $R^{(1)}(a) = \varphi(a)$. If $\pi = (V_1, \ldots, V_m) \in NC(n)$, where $V_k = \{i(1, k) < \cdots < i(n_k, k)\}$, we define

$$R[\pi](a_1, \ldots, a_n) = \prod_{1 \le k \le m} R^{(|V_k|)}(a_{i(1,k)}, \ldots, a_{i(n_k,k)})$$

The recurrence relation for cumulants is then

$$\varphi(a_1 \cdots a_n) = \sum_{\pi \in NC(n)} R[\pi](a_1, \ldots, a_n)$$

Note that the right-hand side involves only $R^{(k)}$'s with $k \le n$ and that actually $R^{(n)}$ appears only in, and is equal to, $R[(\{1, \ldots, n\})](a_1, \ldots, a_n)$ (the coarsest partition). A key property of $R^{(n)}(a_1, \ldots, a_n)$ is that if $\{1, \ldots, n\} = \alpha \sqcup \beta$ with $\alpha$, $\beta$ nonempty and $(a_k)_{k \in \alpha}$, $(a_l)_{l \in \beta}$ are freely independent, then $R^{(n)}(a_1, \ldots, a_n) = 0$. If $\mu$ is the distribution of $a \in (A, \varphi)$, then the cumulants $R_n(\mu)$ are given by

$$R_n(\mu) = R^{(n)}(a, \ldots, a)$$

The noncrossing condition on partitions corresponds to a planarity requirement for diagrams and as such is very suggestive of connections to planar diagrams occurring in the constant term of large-N expansions from random matrix theory and more generally gauge theory.

For more details on the subject of noncrossing partitions, we refer the reader to the memoir by Speicher (1998).
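For a single variable, the sum over noncrossing partitions can be organized by the block containing the first element, which gives the standard recursion $m_n = \sum_{s=1}^{n} R_s \,[z^{n-s}]\,M(z)^s$ with $M(z) = \sum_k m_k z^k$ and $m_k = \mu(X^k)$. The Python sketch below uses this recursion (a reformulation assumed here, not stated in the text above) to pass between moments and free cumulants. All function names are hypothetical. The free Poisson example relies on the known fact that all its free cumulants equal $a$.

```python
def power_coeffs(m, s, d):
    """Coefficients of z^0..z^d in (m[0] + m[1] z + ...)^s; needs len(m) > d."""
    out = [1] + [0] * d
    for _ in range(s):
        new = [0] * (d + 1)
        for i, a in enumerate(out):
            if a == 0:
                continue
            for j in range(d + 1 - i):
                new[i + j] += a * m[j]
        out = new
    return out

def moments_from_free_cumulants(kappa, n_max):
    """kappa[s] is the s-th free cumulant R_s (kappa[0] unused). Returns [m_0..m_n_max]."""
    m = [1]
    for n in range(1, n_max + 1):
        # Block containing element 1 has size s; the gaps contribute M(z)^s at order n - s.
        m.append(sum(kappa[s] * power_coeffs(m, s, n - s)[n - s] for s in range(1, n + 1)))
    return m

def free_cumulants_from_moments(m):
    """Inverse of the recursion above; m[0] must equal 1."""
    n_max = len(m) - 1
    kappa = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        lower = sum(kappa[s] * power_coeffs(m, s, n - s)[n - s] for s in range(1, n))
        kappa[n] = m[n] - lower
    return kappa

# Semicircle law: only the second free cumulant is nonzero.
print(moments_from_free_cumulants([0, 0, 1, 0, 0, 0, 0], 6))
# Free Poisson law with a = 1: all free cumulants equal 1, moments count NC(n).
print(moments_from_free_cumulants([0, 1, 1, 1, 1], 4))
```

Integer inputs keep the arithmetic exact, which makes it easy to compare the output against Catalan numbers.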
Asymptotic Freeness of Random Matrices

The explanation for the coincidences between certain laws in free probability and in random matrix theory is that freeness occurs asymptotically among random matrices in the large-N limit (Voiculescu 1991). Random matrices can be put in a noncommutative probability framework $(A_N, \varphi_N)$, where $A_N = L^{\infty - 0}(\Omega, M_N; d\sigma)$ (the $N \times N$ complex matrix-valued functions on the probability space $(\Omega, d\sigma)$ which are $p$-integrable for all $p \in [1, \infty)$) and the expectation functional is

$$\varphi_N(X) = N^{-1} \int_\Omega \operatorname{tr} X(\omega)\, d\sigma(\omega)$$

The basic example is provided by an n-tuple of Gaussian random matrices (Voiculescu 1991). Let

$$T_j^{(N)} = \big(a_{p,q;j}^{(N)}\big)_{1 \le p,q \le N} \in A_N, \quad 1 \le j \le n$$

where $a_{p,q;j}^{(N)} = a_{q,p;j}^{(N)}$ and the $a_{p,q;j}^{(N)}$, $1 \le p \le q \le N$, $1 \le j \le n$, are $(0, N^{-1})$-Gaussian and independent. Then $(T_j^{(N)})_{1 \le j \le n}$ as $N \to \infty$ converges in noncommutative distribution to the freely independent n-tuple $(l(e_j) + l^*(e_j))_{1 \le j \le n}$ in the Boltzmann Fock space context of Example 2 for an orthonormal system $e_1, \ldots, e_n \in H$, that is, convergence of moments:

$$\lim_{N \to \infty} \varphi_N\big(T_{i_1}^{(N)} \cdots T_{i_k}^{(N)}\big) = \big\langle (l(e_{i_1}) + l^*(e_{i_1})) \cdots (l(e_{i_k}) + l^*(e_{i_k}))\, 1, 1 \big\rangle$$

In particular, the limit variables $(l(e_j) + l^*(e_j))_{1 \le j \le n}$ are free. More generally, asymptotic freeness of variables or sets of variables in $(A_N, \varphi_N)$ can be defined without the existence of a limit distribution, that is, by requiring only that the freeness relations among noncommutative moments hold asymptotically as $N \to \infty$.

Note that in these random matrix questions, the joint classical distribution of an n-tuple of random matrices $(X_1^{(N)}, \ldots, X_n^{(N)})$ in $A_N$ is a probability measure on $(M_N)^n$ which contains more information than the collection of noncommutative moments, which is the distribution of the noncommutative variables in $(A_N, \varphi_N)$. In particular, for one random matrix the classical distribution gives the joint distribution of all entries, whereas the noncommutative distribution gives information only about the distribution of eigenvalues.

From the Gaussian n-tuple, using operator techniques, much more general asymptotic freeness results have been obtained. For instance (Voiculescu 1998b): Let $(X_1^{(N)}, \ldots, X_m^{(N)}, Y_1^{(N)}, \ldots, Y_n^{(N)})$ be $(m + n)$-tuples of self-adjoint $N \times N$ random matrices with classical joint distribution $\mu_N$ on $(M_N^{sa})^{m+n}$. Assume that $\mu_N$ is invariant under the action of the unitary group $U(N)$ which takes $(X_1, \ldots, X_m, Y_1, \ldots, Y_n)$ into $(X_1, \ldots, X_m, UY_1U^*, \ldots, UY_nU^*)$ and assume that there is a bound $R$ on the operator norms $\|X_j^{(N)}\|$ and $\|Y_j^{(N)}\|$ independent of $N$. Then the sets $\{X_1^{(N)}, \ldots, X_m^{(N)}\}$ and $\{Y_1^{(N)}, \ldots, Y_n^{(N)}\}$ are asymptotically free as $N \to \infty$. Note that the uniform bound on the operator norms can be easily replaced by weaker conditions.

Once we know that certain random matrices are asymptotically free and that the large-N limit in noncommutative distribution exists, the results of free probability apply. For instance, if $X^{(N)}$ and $Y^{(N)}$ are asymptotically free and have limit distributions $\mu$ and $\nu$, then the limit distributions of $X^{(N)} + Y^{(N)}$ and of $X^{(N)}Y^{(N)}$ are the free convolutions $\mu \boxplus \nu$ and, respectively, $\mu \boxtimes \nu$.

Free probability techniques have also been successful in dealing with other questions about the asymptotic behavior of random matrices. If $T_1^{(N)}, \ldots, T_n^{(N)}$ is an n-tuple of i.i.d. Hermitian Gaussian random matrices, then the uniform operator norms of polynomials in noncommutative indeterminates have the property that

$$\lim_{N \to \infty} \big\|P\big(T_1^{(N)}, \ldots, T_n^{(N)}\big)\big\| = \big\|P\big(l(e_1) + l(e_1)^*, \ldots, l(e_n) + l(e_n)^*\big)\big\|$$

almost surely (Haagerup and Thorbjoernsen 2005). This result is a far-reaching generalization of the results about largest eigenvalues of one Gaussian random matrix. The use of operator-valued free random variables (with respect to a certain subalgebra) was an essential ingredient in the proof. Also, in another direction, freeness of operator-valued free random variables was used to obtain a free probability treatment of Gaussian random band matrices and generalizations of these (Shlyakhtenko 1996).

Finally, quite recently, extensions of the free probability framework have appeared which are adapted to the study of fluctuations of systems of random matrices in the large-N limit.
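The asymptotic freeness of independent Gaussian matrices described in this section can be observed at modest matrix size. The sketch below is an illustrative experiment in pure Python: it samples two independent real symmetric matrices with $(0, N^{-1})$-Gaussian entries and compares a few normalized traces with the values expected for two free standard semicircular variables, namely $\varphi(x^2) = 1$, $\varphi(x^4) = 2$, $\varphi(xyxy) = 0$ and $\varphi(x^2y^2) = 1$. The function names, the size $N = 60$, and the seed are arbitrary choices. Agreement is only up to finite-size corrections of order $1/N$.

```python
import random

def gaussian_symmetric(n, rng):
    """Real symmetric n x n matrix with independent N(0, 1/n) entries for p <= q."""
    a = [[0.0] * n for _ in range(n)]
    sd = n ** -0.5
    for p in range(n):
        for q in range(p, n):
            a[p][q] = a[q][p] = rng.gauss(0.0, sd)
    return a

def matmul(x, y):
    """Plain matrix product of two square lists of lists."""
    y_cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]

def normalized_trace(x):
    """N^-1 tr X, the finite-N analog of the expectation functional."""
    return sum(x[i][i] for i in range(len(x))) / len(x)

def mixed_moments(n=60, seed=0):
    """Normalized traces of a few words in two independent Gaussian matrices."""
    rng = random.Random(seed)
    x, y = gaussian_symmetric(n, rng), gaussian_symmetric(n, rng)
    xx, yy, xy = matmul(x, x), matmul(y, y), matmul(x, y)
    return {
        "x2": normalized_trace(xx),
        "x4": normalized_trace(matmul(xx, xx)),
        "xyxy": normalized_trace(matmul(xy, xy)),
        "x2y2": normalized_trace(matmul(xx, yy)),
    }

print(mixed_moments())
```

The contrast between the alternating word $xyxy$ and the word $x^2y^2$ is what distinguishes freeness from classical independence of commuting variables, for which both moments would coincide.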
Free Entropy

There are free probability analogs also for information-theoretic quantities (Voiculescu 1994, 1998a). Let $(f_1, \ldots, f_n)$ be an n-tuple of classical numerical random variables the joint distribution of which has density $p(t_1, \ldots, t_n)$ with respect to the n-dimensional Lebesgue measure $\lambda_n$ on $\mathbb{R}^n$. The entropy quantity associated by Shannon to $(f_1, \ldots, f_n)$ is

$$H(f_1, \ldots, f_n) = -\int_{\mathbb{R}^n} p \log p \, d\lambda_n$$

The free analog of $H(f_1, \ldots, f_n)$ is the free entropy quantity $\chi(X_1, \ldots, X_n)$. Here $X_j = X_j^*$, $1 \le j \le n$, are noncommutative self-adjoint random variables in $(M, \tau)$, where $M$ is a $*$-algebra of bounded operators on a Hilbert space $H$. The expectation functional $\tau$, in addition to the positivity properties, equivalent to the requirement that it can be defined by a unit vector $\xi$, $\tau(\cdot) = \langle \cdot\,\xi, \xi \rangle$, also has the property of a trace, $\tau(XY) = \tau(YX)$ for all $X, Y \in M$. For instance, the noncommutative random variables arising from the large-N limit of n-tuples of self-adjoint random matrices live in noncommutative probability frameworks $(M, \tau)$ of this kind.

There are two approaches to defining free entropy and, since there are only partial results about the equivalence of these approaches, the quantities obtained are denoted by $\chi(X_1, \ldots, X_n)$ (Voiculescu 1994) and $\chi^*(X_1, \ldots, X_n)$ (Voiculescu 1998a). The quantity $\chi$ is often referred to as the "microstates free entropy," its definition being inspired by the Boltzmann formula $S = k \log W$, whereas the other entropy $\chi^*$, sometimes called "microstates-free free entropy," is obtained via a free probability analog of the Fisher information (Voiculescu 1998a).

The microstates used to define $\chi$ are matricial and the reason why this choice produced a quantity with the right behavior with respect to free independence can be found in the asymptotic freeness properties of random matrices. Given $X_j = X_j^* \in M$, $1 \le j \le n$, and $m \in \mathbb{N}$, $k \in \mathbb{N}$, $\varepsilon > 0$, the microstates $\Gamma(X_1, \ldots, X_n; m, k, \varepsilon)$ are n-tuples $(A_1, \ldots, A_n)$ of self-adjoint $k \times k$ matrices, such that, for noncommutative moments of order up to $m$, we have

$$|k^{-1} \operatorname{tr}_k(A_{i_1} \cdots A_{i_p}) - \tau(X_{i_1} \cdots X_{i_p})| < \varepsilon$$

where $1 \le p \le m$, $1 \le i_j \le n$, $1 \le j \le p$. One obtains $\chi(X_1, \ldots, X_n)$ by taking the infimum over $\varepsilon > 0$ and $m \in \mathbb{N}$ of

$$\limsup_{k \to \infty} \Big( k^{-2} \log \operatorname{vol} \Gamma(\ldots) + \frac{n}{2} \log k \Big)$$

where vol is the volume on $(M_k^{sa})^n$ corresponding to the Hilbert–Schmidt norm Hilbert space structure (Voiculescu 1994).

When $n = 1$, there is a simple formula for $\chi(X)$. If $\mu$ is the probability measure on $\mathbb{R}$ which represents the distribution of $X = X^* \in M$ with respect to the expectation $\tau$, then

$$\chi(X) = \iint \log |s - t| \, d\mu(s)\, d\mu(t) + C$$

where the exact value of the constant $C$ is $3/4 + \tfrac{1}{2} \log 2\pi$.

For $n > 1$ there is no simple formula for $\chi(X_1, \ldots, X_n)$, but there are several properties which provide a better understanding of this quantity. If the $X_j$ are such that $\chi(X_j) > -\infty$, then

$$\chi(X_1, \ldots, X_n) = \chi(X_1) + \cdots + \chi(X_n)$$

if and only if $X_1, \ldots, X_n$ are freely independent in $(M, \tau)$. Clearly, this property of $\chi$ with respect to free independence is analogous to the property of $H(f_1, \ldots, f_n)$ with respect to classical independence.

Further, if $F_1, \ldots, F_n$ are power series in $n$ noncommuting indeterminates, there is a change-of-variable formula

$$\chi(F_1(X_1, \ldots, X_n), \ldots, F_n(X_1, \ldots, X_n)) = \log |\det|(\mathcal{J}(F)) + \chi(X_1, \ldots, X_n)$$

involving the Kadison–Fuglede positive determinant $|\det|$ and a certain noncommutative Jacobian $\mathcal{J}(F)$, $F = (F_1, \ldots, F_n)$, defined in $M_n \otimes M \otimes M^{op}$, where $M^{op}$ is the opposite algebra of $M$. (For definitions and the many technical conditions under which this formula holds, see Voiculescu (1994).)

The free entropy also satisfies semicontinuity, subadditivity, and a semicircular bound (analogous to the classical Gaussian bound) properties.

An unexpected feature of $\chi$ is a degeneration of convexity. If the trace state $\tau$ is a convex combination $\tau = \theta\tau' + (1 - \theta)\tau''$ with $0 < \theta < 1$, where $\tau'$, $\tau''$ are trace states and where $\tau' \neq \tau''$ on the algebra generated by $X_1, \ldots, X_n$, and $n > 1$, then $\chi(X_1, \ldots, X_n) = -\infty$ (for a reference consult the survey Voiculescu (2002)).

With the free entropy at hand, an important variational problem can be formulated for the noncommutative distribution of an n-tuple of self-adjoint noncommutative random variables $T_1, \ldots, T_n$ in the tracial context. The quantity to be maximized is

$$\chi(T_1, \ldots, T_n) - \tau(P(T_1, \ldots, T_n))$$

where $P$ is a given self-adjoint polynomial in noncommutative indeterminates (see Voiculescu (2002) for comments on this problem). If $n = 1$, this is a classical problem for the logarithmic energy

$$\iint \log |s - t| \, d\mu(s)\, d\mu(t) - \int P(t)\, d\mu(t)$$

where $\mu$ is a probability measure on $\mathbb{R}$.

To explain the second approach, based on Fisher information, we begin by recalling some facts about Fisher information in the classical context. If $f$ is a numerical random variable with distribution given by the density $p(t)$ on $\mathbb{R}$, then

$$\mathrm{Fisher}(f) = \int \Big(\frac{p'}{p}\Big)^2 p \, dt = \Big\| \Big(\frac{d}{dt}\Big)^{*} 1 \Big\|^2_{L^2(\mathbb{R},\, p\, dt)}$$

Here $d/dt$ is the differential operator defined on test functions in $L^2(\mathbb{R}, p\, dt)$. Then

$$\Big(\frac{d}{dt}\Big)^{*} 1 = -\frac{p'}{p}$$

The classical connection to entropy is that the Fisher information is a derivative of the entropy when the variable becomes the starting point of a Brownian motion. This can be written as

$$\tfrac{1}{2}\,\mathrm{Fisher}(f) = \frac{d}{dt} H(f + t^{1/2} g)\Big|_{t=0}$$

where $g$ and $f$ are independent and $g$ is (0, 1) Gaussian. The several-variables version is treated by using partial derivatives.

The analog in free probability of the Fisher information (Voiculescu 1998a) is obtained by using the free difference quotient derivations, which are the appropriate derivations in this maximally noncommutative setting. On the polynomials in $n$ noncommutative indeterminates, the $k$th partial free difference quotient

$$\partial_k : \mathbb{C}\langle X_1, \ldots, X_n \rangle \to \mathbb{C}\langle X_1, \ldots, X_n \rangle^{\otimes 2}$$

is defined on noncommutative monomials by the formula

$$\partial_k X_{i_1} \cdots X_{i_p} = \sum_{\{j \mid i_j = k\}} X_{i_1} \cdots X_{i_{j-1}} \otimes X_{i_{j+1}} \cdots X_{i_p}$$

If $X_j = X_j^*$, $1 \le j \le n$, are noncommutative random variables in $(M, \tau)$ which do not satisfy any nontrivial algebraic relations, to simplify matters we can assume that $M$ is generated by $X_1, \ldots, X_n$ and identify $M$ with $\mathbb{C}\langle X_1, \ldots, X_n \rangle$. The trace state $\tau$ gives rise to a scalar product $\langle m_1, m_2 \rangle = \tau(m_2^* m_1)$ on $M$. Let $L^2(M, \tau)$ denote the Hilbert space obtained from $M$. Then, skipping some technicalities, $\partial_k$ will give rise to a densely defined operator of $L^2(M, \tau)$ into $L^2(M, \tau) \otimes L^2(M, \tau)$. If $1 \otimes 1$ is in the domain of the adjoints $\partial_k^*$, the free Fisher information of the n-tuple $X_1, \ldots, X_n$ is defined to be

$$\Phi^*(X_1, \ldots, X_n) = \sum_{1 \le k \le n} \|\partial_k^*(1 \otimes 1)\|^2_{L^2(M, \tau)}$$
In case $1 \otimes 1$ is not in the domain of some $\partial_k^*$, the free Fisher information is given the value $\infty$. The "microstates-free free entropy" is then defined by

$$\chi^*(X_1, \ldots, X_n) = \frac{n}{2} \log 2\pi e + \frac{1}{2} \int_0^\infty \Big( \frac{n}{1 + t} - \Phi^*(X_1 + t^{1/2} S_1, \ldots, X_n + t^{1/2} S_n) \Big)\, dt$$

where $S_1, \ldots, S_n$ are (0, 1)-semicircular and freely independent and also freely independent of $\{X_1, \ldots, X_n\}$.

For $n = 1$ it is known that $\chi^*(X) = \chi(X)$ and the free Fisher information is

$$\Phi^*(X) = \frac{4\pi^2}{3} \int p^3(t)\, dt$$

if $p(t)$ is the density with respect to the Lebesgue measure of the distribution of $X$. The computation of $\partial^*(1 \otimes 1)$ is possible in the one-variable case and up to a factor the result is $(Hp)(X)$, where $Hp$ is the Hilbert transform of $p$.
Several of the classical inequalities for the Fisher information have free probability analogs (Voiculescu 1998a) (Cramer–Rao inequality, Stam inequality, information-log-Sobolev inequality, and others).

For $n > 1$ only $\chi \le \chi^*$, the easier of the inequalities among $\chi$ and $\chi^*$, has been established (Biane et al. 2003). This result was obtained based on an important connection of $\chi$ and $\chi^*$ to large deviations. The deviations studied are for the noncommutative distributions of n-tuples of matrices in the case of an n-tuple of Gaussian random matrices. In this context $\chi$ is related to the quantity to be estimated and $\chi^*$ is related to the rate function.

For more details on free entropy, the reader is referred to the survey articles by Voiculescu (1998c, 2002).
Concluding Comments

For more details, additional results, and bibliography, we refer the reader to the expositions in Voiculescu (1998c), Voiculescu et al. (1992) and Speicher (1998). To get even more detail, the reader may consult, besides the original papers of the present author, those of P Biane, R Speicher, D Shlyakhtenko, K J Dykema, A Nica, U Haagerup, H Bercovici, L Ge, F Radulescu, A Guionnet, T Cabanal-Duvillard, and M Anshelevich, to name a few of the main contributors. Also, via random matrices, there are connections to physics models (especially large-N 2D Yang–Mills QCD) in work of I M Singer, M Douglas, D Gross–R Gopakumar, and P Zinn-Justin. In a loose sense, one may view the noncrossing partitions combinatorics as related to the work on planar diagrams and the large-N limit of 't Hooft and Brezin–Itzykson–Parisi–Zuber in the 1970s.
Further Reading

Biane P, Capitaine M, and Guionnet A (2003) Large deviation bounds for matrix Brownian motion. Inventiones Mathematicae 152: 433–459.
Haagerup U and Thorbjoernsen S (2005) A new application of random matrices: Ext(C*_r(F_2)) is not a group. Annals of Mathematics 162(2).
Shlyakhtenko D (1996) Random Gaussian band matrices and freeness with amalgamation. International Mathematics Research Notices 20: 1013–1025.
Speicher R (1998) Combinatorial theory of the free product with amalgamation and operator-valued free probability. Memoirs of the American Mathematical Society 627.
Voiculescu D (1985) Symmetries of some reduced free product C*-algebras. In: Araki H, Moore CC, Stratila S-V, and Voiculescu DV (eds.) Operator Algebras and Their Connections with Topology and Ergodic Theory, Lecture Notes in Mathematics, vol. 1132, pp. 556–588. Springer.
Voiculescu D (1986) Addition of certain non-commuting random variables. Journal of Functional Analysis 66: 323–346.
Voiculescu D (1987) Multiplication of certain non-commuting random variables. Journal of Operator Theory 18: 223–235.
Voiculescu D (1991) Limit laws for random matrices and free products. Inventiones Mathematicae 104: 201–220.
Voiculescu D (1994) The analogues of entropy and of Fisher's information measure in free probability theory II. Inventiones Mathematicae 118: 411–440.
Voiculescu D (1995) Operations on certain noncommuting operator-valued random variables. Asterisque 232: 243–275.
Voiculescu D (1998a) The analogues of entropy and of Fisher's information measure in free probability theory V: Noncommutative Hilbert transforms. Inventiones Mathematicae 132: 182–227.
Voiculescu D (1998b) A strengthened asymptotic freeness result for random matrices with applications to free entropy. International Mathematics Research Notices 1: 41–63.
Voiculescu D (1998c) Lectures on free probability theory. In: Lectures on Probability Theory and Statistics, Ecole d'Ete de Probabilites de Saint-Flour XXVIII–1998, Lecture Notes in Mathematics, vol. 1738, pp. 280–349. Berlin: Springer.
Voiculescu D (2002) Free entropy. Bulletin of London Mathematical Society 34: 257–278.
Voiculescu D, Dykema KJ, and Nica A (1992) Free Random Variables, CRM Monograph Series, vol. 1. Providence: American Mathematical Society.
See also: Large Deviations in Equilibrium Statistical Mechanics; Large-N and Topological Strings; Random Matrix Theory in Physics.
Frobenius Manifolds see WDVV Equations and Frobenius Manifolds
Functional Equations and Integrable Systems
H W Braden, University of Edinburgh, Edinburgh, UK
© 2006 Elsevier Ltd. All rights reserved.
Introduction

Functional equations have a long and interesting history in connection with mathematical physics and touch upon many branches of mathematics. They have arisen in the context of both classical and quantum completely integrable systems in several different ways and we shall survey some of these. In the great majority of cases functional equations appear in the integrable system setting as the result of an ansatz: a particular form of a solution is either guessed or postulated, the consistency of which yields a functional equation. What the ansatz is for can vary significantly. As outlined below, one may, among other things, postulate algebraic structures in the form of the existence of a Lax pair or of conserved quantities; in the quantum setting, one may postulate properties of a ground-state wave function or the ring of commuting differential operators. Appearing in this way, functional equations are really just another of the (significant) tools of the trade for constructing and discovering new integrable systems.

However, as one surveys both the functional equations and the functions they describe one sees certain common features. The functions are most frequently associated with an elliptic curve, a genus-1 abelian variety. One can seek to associate these to another fundamental ingredient of modern integrable systems, the Baker–Akhiezer function. Indeed, very few of the ansätze made directly suggest that the systems being constructed will be completely integrable. This very desirable property usually is a bonus of the construction and hints of more fundamental connections. Another fundamental connection we shall mention is that with topology. The phase space of a completely integrable system is rather special, admitting (generically) a foliation by tori. The functional equations we encounter often also characterize the Hirzebruch genera associated with the index theorems of known elliptic operators.
These are typically evaluated by Atiyah–Bott fixed-point theorems for circle actions on the manifold. A general understanding of the various interconnections has yet to be achieved. To bring to focus our discussion we shall concentrate on functional equations arising from studying systems with an arbitrary number of particles (n below). In principle, there could be many different interactions between the particles and symmetry will
be used to limit these. The use of symmetry is a key ingredient, often implicit, in the various ansätze we shall describe. For simplicity, we shall most often focus on the situation where the particles are identical. In algebraic terms, we focus on the symmetric group $S_n$ and root systems of type $a_n$; generalizations frequently exist for other root systems and Weyl groups and we shall simply note this at the outset.
Lax Pairs

The modern approach to integrable systems is to utilize a Lax pair, that is, a pair of matrices $L$, $M$ such that the zero curvature condition $\dot{L} = [L, M]$ is equivalent to the equations of motion. By construction, Lax pairs produce the conserved quantities $\operatorname{tr} L^k$. To establish integrability, one must further show both that there are enough functionally independent conserved quantities and that these are in involution. (R-matrices are the additional ingredient of the modern approach to establishing involutivity.) Lax pairs can fail on both counts, and so the construction of a Lax pair is but the first step in establishing a system to be completely integrable. The great merit of the modern approach is that it provides a unified framework for treating the many disparate completely integrable systems known. Unfortunately the construction of a Lax pair is often far from straightforward and typically hides the "clever tricks" frequently employed in establishing integrability. In the present context, we shall outline how functional equations have been used to construct Lax pairs.

The paradigm for this approach is the Calogero–Moser system. Beginning with the ansatz (for $n \times n$ matrices)

$$L_{jk} = p_j \delta_{jk} + g(1 - \delta_{jk}) A(q_j - q_k)$$

$$M_{jk} = g\Big[\delta_{jk} \sum_{l \neq j} B(q_j - q_l) - (1 - \delta_{jk}) C(q_j - q_k)\Big]$$

one finds $\dot{L} = [L, M]$ yields the equations of motion for the Hamiltonian system ($n \ge 3$)

$$H = \frac{1}{2} \sum_j p_j^2 + g^2 \sum_{j<k} U(q_j - q_k) \qquad [1]$$

Functional Integration in Quantum Physics

$$\begin{cases} dr = \cos\theta\, dz^1 + \sin\theta\, dz^2 =: X^1_{(1)}\, dz^1 + X^1_{(2)}\, dz^2 \\[4pt] d\theta = -\dfrac{\sin\theta}{r}\, dz^1 + \dfrac{\cos\theta}{r}\, dz^2 =: X^2_{(1)}\, dz^1 + X^2_{(2)}\, dz^2 \end{cases} \qquad [71]$$
The dynamical vector fields are, therefore,

$$X_{(1)} = \cos\theta\, \frac{\partial}{\partial r} - \frac{\sin\theta}{r}\, \frac{\partial}{\partial\theta} \qquad [72]$$

$$X_{(2)} = \sin\theta\, \frac{\partial}{\partial r} + \frac{\cos\theta}{r}\, \frac{\partial}{\partial\theta} \qquad [73]$$

Here $h^{\alpha\beta} = \delta^{\alpha\beta}$ and eqn [69] reads

$$\frac{\partial\Psi}{\partial t} = \frac{s}{4\pi}\Big(\frac{\partial^2}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2} + \frac{1}{r}\frac{\partial}{\partial r}\Big)\Psi \qquad [74]$$

This example is trivial because $x(t, z)$ is not a functional of $z$ but a function of $z(t)$ given by [70]. In the following example, $x(t, z)$ is a functional of $z$.

Example (Paths with values on a Riemannian manifold $(M^D, g)$). Consider the frame bundle over $M^D$ and a connection $\sigma$ defining the horizontal lift $\dot\rho(t)$ of a vector $\dot{x}(t)$,

$$\dot\rho(t) = \sigma(\rho(t)) \cdot \dot{x}(t) \qquad [75]$$

In order to bring eqn [75] in the form [64], we think of a frame $u(t)$ as a linear map from $\mathbb{R}^D$ into the tangent space $T_{x(t)} M^D$:

$$u(t) : \mathbb{R}^D \to T_{x(t)} M^D \qquad [76]$$

Let

$$\dot{z}(t) := u(t)^{-1}\, \dot{x}(t) \qquad [77]$$

Choose a basis $\{e_{(A)}\}$ in $\mathbb{R}^D$ and $\{e_{(\alpha)}\}$ in $T_{x(t)} M^D$ such that

$$\dot{z}(t) = \dot{z}^A(t)\, e_{(A)} = u(t)^{-1}\big(\dot{x}^\alpha(t)\, e_{(\alpha)}\big) \qquad [78]$$

Insert $u(t)\, u(t)^{-1}$ into [75], then

$$\dot\rho(t) = X_{(A)}(\rho(t))\, \dot{z}^A(t) \qquad [79]$$

where the dynamical vector fields are

$$X_{(A)}(\rho(t)) = \big(\sigma(\rho(t)) \circ u(t)\big) \cdot e_{(A)} \qquad [80]$$

The construction [64]–[69] gives a parabolic equation on the bundle. If the connection is the metric connection, then the parabolic equation on the bundle gives, by projection on the base space, the parabolic equation with the Laplace–Beltrami operator. Explicitly, the projection on the base space of [67] is

$$\Psi(t_b, x_b) := \int_{P_b \mathbb{R}^D} \mathcal{D}_{s,Q}(z)\, \exp\Big(-\frac{\pi}{s} Q(z)\Big)\, \phi\big((\mathrm{Dev}\, z)(t_a)\big) \qquad [81]$$

where Dev is the Cartan development map, namely the bijection, defined by [82], from the space of pointed paths $z$ on $T_b M^D$ (identified to $\mathbb{R}^D$ via the frame $u_b$) into the space of pointed paths $x$ on $M^D$ (paths such that $x(t_b) = x_b$):

$$(\Pi \circ \rho)(t) =: (\mathrm{Dev}\, z)(t) \qquad [82]$$

where $\Pi$ is the projection on the base space. The path integral [81] is the solution of the equations

$$\frac{\partial\Psi}{\partial t_b}(t_b, x_b) = \frac{s}{4\pi}\, \Delta\Psi(t_b, x_b) \qquad [83]$$

$$\Psi(t_a, x) = \phi(x) \qquad [84]$$

where $\Delta$ is the Laplace–Beltrami operator on $(M^D, g)$,

$$\Delta = g^{ij} D_i D_j \qquad [85]$$

and $D_i$ is the covariant derivative defined by the Riemann connection $\sigma$.

Semiclassical Expansions

Classical mechanics is a limit of quantum mechanics; therefore, it is natural to expand the action functional $S$ of a given system around, or near, its classical value, namely its minimum $S(q)$, where $q$ is a solution of the Euler–Lagrange equation,

$$S'(q) = 0 \qquad [86]$$

Set

$$S(x) = S(q) + S'(q) \cdot \xi + \frac{1}{2!}\, S''(q) \cdot \xi\xi + \frac{1}{3!}\, S'''(q) \cdot \xi\xi\xi + \cdots \qquad [87]$$

where $x \in X$ is a path

$$x : T \to M^D$$

and $\xi \in T_q X$ is a vector field at $q \in X$. The second variation of $S$ is called its Hessian

$$S''(q) \cdot \xi\xi =: \mathrm{Hess}(q; \xi, \xi) \qquad [88]$$
The arena of semiclassical expansions of a functional integral schematically written as

$$I = \int_{X_{a,b}} \mathcal{D}x\, \exp(iS(x)/\hbar)\, \phi(x(t_a)) \qquad [89]$$

consists of the intersection $U_{a,b}$ of two spaces: $X_{a,b} \subset X$, the space of paths satisfying $D$ initial conditions (a) and $D$ final conditions (b), and $U^{2D}(S)$, the space of critical points of $S$:

$$q \in U^{2D}(S) \iff S'(q) = 0 \qquad [90]$$

$$U_{a,b} := X_{a,b} \cap U^{2D}(S) \qquad [91]$$
a, b
D
proof of [94] rests on the following property of quadratic forms Q. Let L : X ! Y linearly and
Ua, b
D
QX ¼ QY L
U 2D(S ) h
Ua, b : U 2D(S )
a, b
According to the notations used in [26], [27], Z DX ðxÞ exp QX ðxÞ ¼ 1 s X
D
Figure 4 Intersection of the space P a, b MD (abbreviated to X a, b ) of paths on MD with fixed points, and the 2D-dimensional space U 2D (S) of critical points of the system S. (Adapted from a Plenum Press publication with permission by Springer-Verlag.)
The nature of the intersection Ua, b determines the behavior of the system S. Figure 4 shows the intersection of the space X a, b of paths on MD with fixed points. It also shows the space U2D (S) of critical points of S. We consider first the case in which Ua, b consists of a single point q, or several isolated points q(i) . The semiclassical expansion consists in dropping the terms beyond the Hessian:
Z 2i 1 IWKB :¼ D exp SðqÞ þ S 00 ðqÞ h 2 Xb ðxðta ÞÞ
½92
where the initial wave function accounts for the D initial conditions of the system, and X b is the space of pointed paths xðtb Þ ¼ xb ; and ðtb Þ ¼ 0
for every x 2 X b
½93
The integral IWKB is the Gaussian defined by the Hessian. Explicit calculations of IWKB exploit the power of Jacobi fields of S at q. Example (Momentum-to-position transitions) (e.g., Cartier and DeWitt-Morette 2006). We have
½94
where S is the action function (a.k.a. Hamilton’s principal function) Sðqðtb Þ; pðta ÞÞ ¼ SðqÞ þ hpa ; xðta Þi
According to [35], [27], Z 1¼ DX ðxÞ exp QX ðxÞ s ZX ¼ DY ðLxÞ exp QY ðLxÞ s Y Z ¼ jdetLj DY ðxÞ exp QX ðxÞ s
½95
where the classical path q is characterized by its initial momentum pa and its final position xb . The
½97
½98 ½99
If s = 1, that is, if QX and QY are positive definite, then Z DY ðxÞ expðQX ðxÞÞ ¼¼ detðQX =QY Þ1=2 ½100 X
If s = i, that is, for Feynman integrals Z DY ðxÞ expðQX ðxÞÞ X
¼ jdetðQX =QY Þj1=2 iIndðQX =QY Þ
½101
where ‘‘Ind(QX =QY )’’ is the ratio of the numbers of negative pffiffiffiffiffiffi eigenvalues of QX and QY respectively, and i = 1 = ei=2 . Equation [100] is a key equation for semiclassical expansions where it is convenient to break up the second variation S 00 (q) into two quadratic forms: S 00 ðqÞ ¼ Q0 ðÞ þ QðÞ
WKB Approximations
$$I_{\mathrm{WKB}}(x_b, t_b; p_a, t_a) = \exp\!\left(\frac{2\pi i}{h}\,S(q(t_b), p(t_a))\right)\left(\det \frac{\partial^2 S}{\partial q^i(t_b)\,\partial p_j(t_a)}\right)^{1/2}$$
½96
½102
where $Q_0$ is the kinetic energy. The quadratic form $Q_0$ defines a convenient Gaussian volume element for computing [92]. Moreover, splitting the Hessian into $Q_0 + Q$ corresponds to splitting the system into a "free" system and a perturbation. In eqns [100] and [101] the determinant of the ratio $Q_X/Q_Y$ of infinite-dimensional quadratic forms has been shown (Cartier and DeWitt-Morette 2006) to be a finite-dimensional determinant, thanks to Jacobi field technology.

Degenerate Hessians; Beyond WKB
When $U_{a,b}$ consists of isolated points, the Hessian is not degenerate, and the semiclassical expansion is usually called the (strict) WKB approximation. When the Hessian is degenerate,
$$S''(q)\cdot\xi\xi = 0 \quad\text{for } \xi \neq 0 \qquad [103]$$
there is at least one nonzero Jacobi field $h$ along $q$,
$$S''(q)\cdot h = 0, \qquad h \in T_q U^{2D}(S) \qquad [104]$$
with $D$ vanishing initial conditions (a) and $D$ vanishing final conditions (b). Equation [104] is the defining equation of Jacobi fields. The vanishing boundary conditions imply that $h$ belongs to $T_q X_{a,b}$ as well as being a Jacobi field. For understanding the intersections $U_{a,b}$ when the Hessian is degenerate, one can construct the following basis for the intersecting tangent spaces $T_q U^{2D}(S)$ and $T_q X_{a,b}$:
Basis for $T_q U^{2D}(S)$: a complete set (if it exists) of linearly independent Jacobi fields. It can be constructed by varying the $2D$ conditions (a), (b) satisfied by $q \in X_{a,b}$.
Basis for $T_q X_{a,b}$: a complete set of orthonormal eigenvectors $\{\Psi_k\}$ of the Jacobi operator $J(q)$ defined by the Hessian
$$S''(q)\cdot\xi\xi =: \langle J(q)\cdot\xi, \xi\rangle \qquad [105]$$
$$J(q)\cdot\Psi_k = \alpha_k \Psi_k, \qquad k \in \{0, 1, \ldots\} \qquad [106]$$
The basis $\{\Psi_k\}$ diagonalizes the Hessian. When the Hessian is degenerate, there is at least one eigenvector of $J(q)$ with zero eigenvalue.
1. The intersection $U_{a,b}$ is of dimension $l > 0$. Let $\{u^k\}$ be the coordinates of $\xi$ in the $\{\Psi_k\}$ basis of $T_q X_{a,b}$. Then the diagonalized Hessian is
$$S''(q)\cdot\xi\xi = \sum_{k=0}^{\infty} \alpha_k\,(u^k)^2 \qquad [107]$$
There are $l$ zero eigenvalues $\{\alpha_k\}$ when the system of Euler–Lagrange equations decouples (possibly after a change of variable in $X_{a,b}$) into two sets: $l$ constraint equations, and $D - l$ equations determining $D - l$ coordinates $\{q^A\}$ of $q$. Say $l = 1$, for simplicity. Then
$$S(x) = S(q) + c_0 u^0 + \frac{1}{2}\sum_{k=1}^{\infty} \alpha_k\,(u^k)^2 + O(|u|^3) \qquad [108]$$
where
$$c_0 = \int_T dt\,\frac{\delta S}{\delta q^j(t)}\,\Psi_0^j(t) \qquad [109]$$
The change of variable $\xi \mapsto \{u^k\}$ is a linear change of variable of type [33]. The integral [92] decomposes into the product of an ordinary integral over $u^0$ and a Gaussian functional integral defined by a nondegenerate quadratic form. The integral over $u^0$ yields a Dirac $\delta$-function, $\delta(c_0/h)$. The propagator vanishes unless the conservation law $c_0 = 0$ is satisfied. Conservation laws appear in the classical limit of quantum physics. The quantum system may have less symmetry than its classical limit.
2. The intersection $U_{a,b}$ is a multiple root of the Euler–Lagrange equation. The flow of classical solutions has an envelope, known as a caustic. Caustics abound in physics: the soap bubble problem, scattering of particles by a repulsive Coulomb potential (see Figure 5), rainbow scattering from a source at infinity, glory scattering etc. (Cartier and DeWitt-Morette 2006).
Figure 5 A flow of particles scattered by a repulsive Coulomb potential. (Reprinted from Physical Review D with permission by the American Physical Society.)
Let us consider a specific example for simplicity, for instance the scattering of particles of given momenta $p_a$ by a repulsive Coulomb potential. Let $q$ and $q^{\Delta}$ be two solutions of the Euler–Lagrange equation with slightly different boundary conditions at $t_b$. Compute $I(x_b^{\Delta}, t_b; p_a, t_a)$ by expanding the action functional not around $q^{\Delta}$ but around $q$. The path $q$ is not in $X_{a,b}^{\Delta}$ and the expansion of the action functional has to be carried up to and including the third variation. As before, let $\{u^k\}$ be the coordinates of $\xi$ in the base $\{\Psi_k\}$, $k \in \{0, 1, \ldots\}$. The integral over $u^0$ is an Airy integral
$$\int_{R} du^0\,\exp\!\left(i\left(c\,u^0 + \frac{\nu}{3}\,(u^0)^3\right)\right) = 2\pi\,\nu^{-1/3}\,\mathrm{Ai}\!\left(\nu^{-1/3} c\right) \qquad [110]$$
where
$$\nu = \frac{\pi}{h}\int_T dr \int_T ds \int_T dt\;\frac{\delta^3 S}{\delta q^{\alpha}(r)\,\delta q^{\beta}(s)\,\delta q^{\gamma}(t)}\;\Psi_0^{\alpha}(r)\,\Psi_0^{\beta}(s)\,\Psi_0^{\gamma}(t) \qquad [111]$$
$$c = \frac{2\pi}{h}\int_T dt\;\frac{\delta S}{\delta q(t)}\cdot\Psi_0(t)\,\left(x_b^{\Delta} - x_b\right) \qquad [112]$$
The leading contribution of the Airy function when h tends to zero can be computed by the stationary phase method. When x b is in the ‘‘illuminated’’ region, the probability amplitude I(x b , tb ; pa , ta ) oscillates rapidly as h tends to zero. When x b is in the ‘‘dark’’ region, the probability amplitude decays exponentially. Quantum mechanics softens up the caustics. The two kinds of degeneracies described in sections (1) and (2) may occur simultaneously. This happens, for instance, in glory scattering for which the cross section, to leading terms in the semiclassical expansions, has been obtained by functional integration in closed form in terms of Bessel functions (Cartier and DeWitt-Morette 2005). 3. The intersection Ua, b is the empty set. There is no classical solution corresponding to the quantum transition. This phenomenon, called ‘‘tunneling’’ or ‘‘barrier penetration,’’ is a rich chapter of quantum physics which can be found in most of the books listed under ‘‘Further reading.’’
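The contrast between the "illuminated" and the "dark" side of a caustic can be seen directly in the Airy function appearing in eqn [110]: Ai(z) oscillates for negative argument and decays rapidly for positive argument. The following is a minimal Python sketch, not taken from the article. It evaluates Ai by its Maclaurin series, which is adequate only for moderate arguments, using the standard values of Ai(0) and Ai'(0); the function name `airy_ai` and the sample points are illustrative choices.

```python
# Standard constants: Ai(0) and -Ai'(0).
AI0 = 0.355028053887817
AIP0 = 0.258819403792807

def airy_ai(z, terms=60):
    """Airy function Ai(z) from its Maclaurin series (moderate |z| only)."""
    f_term, g_term = 1.0, z          # k = 0 terms of the two power series
    f_sum, g_sum = f_term, g_term
    z3 = z ** 3
    for k in range(terms):
        # successive terms of f = 1 + z^3/(2*3) + ... and g = z + z^4/(3*4) + ...
        f_term *= z3 / ((3 * k + 2) * (3 * k + 3))
        g_term *= z3 / ((3 * k + 3) * (3 * k + 4))
        f_sum += f_term
        g_sum += g_term
    return AI0 * f_sum - AIP0 * g_sum

if __name__ == "__main__":
    for z in (-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0):
        print(f"z = {z:+.1f}   Ai(z) = {airy_ai(z):+.6f}")
```

In the printed table the values change sign on the negative axis and fall off monotonically on the positive axis, which is the finite-dimensional counterpart of the oscillating and exponentially decaying amplitudes described above.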
A Multipurpose Tool
Functional integration provides insight and techniques to quantum physics not available from the operator formalism. Just as an example, one can quote the section "Beyond WKB" which has often been dismissed in the operator formalism by stating that "WKB breaks down" in such cases. The power of functional integration stems from the power of infinite-dimensional spaces. For instance, compare the Lagrangian of a system with its action functional
$$S(x) = \int_T dt\, L(x(t), \dot{x}(t)), \quad x \in X_{a,b}; \qquad x: T \to M^D, \quad S: X_{a,b} \to R \qquad [113]$$
A classical solution q of the system can be defined either by a solution of the Euler–Lagrange equation, together with the boundary conditions dictated by q 2 X a, b or by an extremum of the action functional, S 0 (q) = 0. The path q is a significant point in X a, b but it is not isolated and the Hessian S 00 (q) gives much information on q, such as conservation laws, caustics, tunneling. A list of applications is beyond the scope of this article. We treat only two applications, then give in the ‘‘Further reading’’ section a short list of books that develop such applications as polarons, phase transitions, properties of quantum gases, scattering processes, many-body theory of bosons and fermions, knot invariants, quantum crystals, quantum field theory, anomalies, etc.
The Homotopy Theorem for Paths Taking Their Values in a Multiply-Connected Space
The space $X_{a,b}$ of paths $x$,
$$x: T \to M^D, \qquad x \in X_{a,b}$$
probes the global properties of their ranges $M^D$. When $M^D$ is multiply connected, $X_{a,b}$ is the sum of distinct homotopy classes of paths. The integral over $X_{a,b}$ is a linear combination of integrals over each homotopy class of paths. The coefficients of this linear combination are provided by the homotopy theorem. The principle of superposition of quantum states requires the probability amplitude for a given transition to be a linear combination of probability amplitudes. It follows that the absolute value of the probability amplitude for a transition from the state $a$ at $t_a$ to the state $b$ at $t_b$ has the form
$$|K(b, t_b; a, t_a)| = \left|\sum_{\alpha} \chi(\alpha)\,K^{\alpha}(b, t_b; a, t_a)\right| \qquad [114]$$
where $K^{\alpha}$ is the integral over paths in the same homotopy class. The homotopy theorem (Laidlaw and Morette-DeWitt 1971, Schulman 1971; see Cartier and DeWitt-Morette 2006) states that the set $\{\chi(\alpha)\}$ forms a representation of the fundamental group of the multiply connected space $M^D$. One cannot label a homotopy class by an element of the fundamental group unless one has chosen a point $c \in M^D$ and a homotopy class for paths going from $c$ to $a$ and for paths going from $c$ to $b$; in brief, unless one has chosen a homotopy mesh on $M^D$. The fundamental group based at $c$ is isomorphic to the fundamental group based at any other point of $M^D$ but not canonically so. Therefore, eqn [114] is only an equality between absolute values of probability amplitudes. The proof of the homotopy theorem consists in requiring [114] to be independent of the chosen homotopy mesh.

Application: Systems of n-Indistinguishable Particles in $R^D$
In order that there be a one-to-one correspondence between the system and its configuration space,
$$x: T \to R^{Dn}/S_n =: R^{D,n}$$
where $S_n$ is the symmetric group for $n$ permutations; the coincidence points in $R^{D,n}$ are excluded so that $S_n$ acts effectively on $R^{D,n}$. Note that $R^{1,n}$ is not connected, but $R^{2,n}$ is multiply connected. When $D \geq 3$, $R^{D,n}$ is simply connected and the fundamental group on $R^{D,n}$ is isomorphic to $S_n$.
There are only two scalar unitary representations of $S_n$:
$$\chi^{B}: \alpha \in S_n \to 1 \quad\text{for all permutations}$$
$$\chi^{F}: \alpha \in S_n \to \begin{cases} 1 & \text{for even permutations} \\ -1 & \text{for odd permutations} \end{cases}$$
Therefore, in $R^3$ there are two different propagators of indistinguishable particles:
$$K^{\mathrm{bose}} = \sum_{\alpha} \chi^{B}(\alpha)\,K^{\alpha} \qquad [115]$$
is a symmetric propagator;
$$K^{\mathrm{fermi}} = \sum_{\alpha} \chi^{F}(\alpha)\,K^{\alpha} \qquad [116]$$
is an antisymmetric propagator. The arguments leading to the existence of (scalar) bosons and fermions in $R^3$ fail in $R^2$. Statistics cannot be assigned to particles in $R^2$; particles "without" statistics have been called anyons.

Application: a Spinning Top
Schulman's analysis of the Schrödinger equation for a spinning top (Schulman 1968) motivated the formulation of the homotopy theorem. Therefore, Schulman's results can easily be formulated as an application of [114].

Application: Instantons (DeWitt 2004)
The homotopy theorem reformulated for functional integrals applies to the total $\langle \mathrm{out}|\mathrm{in}\rangle$ amplitude of instantons in Minkowski spacetime.

Scaling Properties of Gaussians
We rewrite the definition [26] of Gaussian volume elements as
$$\int_X d\mu_G(x)\,\exp(-2\pi i\langle x', x\rangle) := \exp(-\pi i\,W(x')) \qquad [117]$$
where the covariance $G$ is defined by the variance $W$,
$$W(x') = \langle x', G x'\rangle$$
In quantum field theory the definition [26] reads
$$\int_{\Phi} d\mu_G(\phi)\,\exp(-2\pi i\langle J, \phi\rangle) := \exp(-\pi i\,W(J)) \qquad [118]$$
where $\phi$ is a field on spacetime (Minkowski, or Euclidean) and $J$ is called the source. A Gaussian $\mu_G$ can be decomposed into the convolution of any number of Gaussians. For example, if
$$W = W_1 + W_2 \;\to\; G = G_1 + G_2 \qquad [119]$$
then
$$\mu_G = \mu_{G_1} * \mu_{G_2} \qquad [120]$$
Explicitly, in QFT
$$\int_{\Phi} d\mu_G(\phi)\,\exp(-2\pi i\langle J, \phi\rangle) = \int_{\Phi} d\mu_{G_2}(\phi_2) \int_{\Phi} d\mu_{G_1}(\phi_1)\,\exp(-2\pi i\langle J, \phi_1 + \phi_2\rangle) \qquad [121]$$
where
$$\phi = \phi_1 + \phi_2 \qquad [122]$$
The additive property [119] makes it possible to express a covariance $G$ as an integral over an independent scale variable. Let $\lambda \in [0, \infty]$ be an independent scale variable (some authors use $\lambda \in [1, \infty[$ and $\lambda^{-1} \in [0, 1[$). A scale variable has no physical dimension:
$$[\lambda] = 0 \qquad [123]$$
The scaling operator $S_{\lambda}$ acting on a function $f$ of length dimension $[f]$ is by definition
$$S_{\lambda} f(x) := \lambda^{[f]} f(x/\lambda) \qquad [124]$$
the scaling of an interval $[a, b[$ is given by $S_{\lambda}[a, b[ = \{s/\lambda \mid s \in [a, b[\}$, that is,
$$S_{\lambda}[a, b[ = [a/\lambda, b/\lambda[ \qquad [125]$$
The scaling of a functional $F$ is
$$(S_{\lambda} F)(\phi) = F(S_{\lambda}\phi) \qquad [126]$$
In order to decompose a covariance into an integral of scale-dependent contributions we note that a covariance $G$ is a two-point function [31]. In quantum field theory [118], the engineering length dimension of $G$ is twice the field dimension
$$[G] = 2[\phi] \qquad [127]$$
Let $x, y \in$ spacetime and $G$ be a Laplacian Green function. One can introduce a scaled (truncated) Green function
$$G_{[l_0, l[}(x, y) := \int_{l_0}^{l} d^{\times}s\; S_{s/l_0}\,u(|x - y|) \qquad [128]$$
where
$$[l] = 1, \quad [s] = 1, \quad l_0 \leq l, \quad [u] = [G], \quad d^{\times}s = ds/s \qquad [129]$$
such that
$$\lim_{l_0 = 0,\; l = \infty} G_{[l_0, l[}(x, y) = G(x, y) \qquad [130]$$
Example $G(x, y) = c_D/|x - y|^{D-2}$; then the only requirement on the function $u$ in [128] is
$$\int_0^{\infty} d^{\times}r\; r^{-2[\phi]}\,u(r) = c_D, \qquad [r] = 1 \qquad [131]$$
All objects defined by the scaled covariance [128] are labeled with the interval $[l_0, l[$. For instance, a Gaussian volume element $\mu_{G_{[l_0, l[}}$ is abbreviated to $\mu_{[l_0, l[}$.

A Coarse-Graining Operator
The following coarse-graining operator has been used for constructing a parabolic semigroup equation in the scaling variable (Brydges et al. 1998):
$$P_l F := S_{l/l_0}\cdot\left(\mu_{[l_0, l[} * F\right) \qquad [132]$$
where the convolution product is by definition
$$\left(\mu_{[l_0, l[} * F\right)(\phi) = \int_{\Phi} d\mu_{[l_0, l[}(\psi)\,F(\phi + \psi)$$
The coarse-graining operator $P_l$ rescales the convolution of a Gaussian volume element $\mu_{[l_0, l[}$ so that all volume elements entering the construction of the semigroup renormalization equation are scale independent. Some properties of the coarse-graining operator:
$P_{l_2} P_{l_1} = P_{l_2 l_1/l_0}$.
The scaled eigenfunctions of the coarse-graining operator are Wick-ordered monomials (Wurm and Berg 2002)
$$P_l\, {:}\phi^n(x){:}_{[l_0, \infty[} = \left(\frac{l}{l_0}\right)^{n[\phi]} {:}\phi^n\!\left(\frac{l_0}{l}\,x\right){:}_{[l_0, \infty[} \qquad [133]$$
Note that $P_l$ preserves the scale range.
Let $H$ be the generator of the coarse-graining operator
$$H := \left.\frac{\partial^{\times}}{\partial l} P_l\right|_{l = l_0}, \qquad \frac{\partial^{\times}}{\partial l} = l\,\frac{\partial}{\partial l} \qquad [134]$$
The semigroup renormalization equation (a.k.a. the flow equation) is
$$\frac{\partial^{\times}}{\partial l} P_l F(\phi) = H P_l F(\phi), \qquad P_{l_0} F(\phi) = F(\phi) \qquad [135]$$
Brydges et al. have applied the coarse-graining operator to the quantum field theory known as "$\phi^4$" (more precisely the Wick-ordered Lagrangian of $\phi^4$). The flow equation [135] plays the role of the "$\beta$-function" equation in perturbative quantum field theory.

Functional Integrals in Quantum Field Theory
Functional integrals in quantum field theory have been modeled to some extent on path integrals in quantum mechanics: mutatis mutandis, the definition [23] of Gaussian volume elements, the diagram expansion [30], the property [36] of linear maps, semiclassical expansions [87], the homotopy theorem [114], and the scaling eqns [135] apply to functional integrals in quantum field theory. The time ordering encoded in a path integral becomes a chronological ordering dictated by light cones in functional integrals of fields on Minkowski spacetime. The fundamental difference between quantum mechanics (systems with a finite number of degrees of freedom) and quantum field theory (systems with an infinite number of degrees of freedom) can be said to be "radiative corrections." In quantum field theory, the concept of "particle" is intrinsically associated to the concept of "field." A particle is affected by its field. Its mass and charge are modified by the surrounding fields, namely its own and other fields interacting with it. One speaks of "bare mass" and "renormalized mass" when the bare mass is renormalized by surrounding fields. Computing radiative corrections is a delicate procedure because the Green functions $G$ defined by [25] are singular. Regularization techniques have been developed for handling singular Green functions. Particles in quantum mechanics are simply particles, and bosons and fermions can be treated separately. Not so in quantum field theory. Therefore, the configuration space in quantum field theory is a supermanifold. For functional integrals in this theory, we refer the reader to the "Further reading" section, in particular to the book of A Das for an introduction, to the book of B DeWitt for an in-depth study, and to the book of K Fujikawa and H Suzuki for applications to quantum anomalies.
Concluding Remarks
The key issue in functional integration is the domain of integration, that is, a function space. This infinite-dimensional space, say $X$, cannot be considered as the limit $n = \infty$ of $R^n$. Concepts of $R^D$ stated without reference to $D$ are likely to be meaningful on $X$. Other approaches which have been used for exploring $X$ are
projective system of finite-dimensional spaces coherently defined on $X$ (DeWitt-Morette et al. 1979),
one-parameter curves on $X$ (Figure 1), and
projecting $X$ on finite-dimensional spaces (cylindrical integrals).
Functional integration has advanced our understanding of infinite-dimensional spaces, and like all good mathematical tools, it improves with usage.
See also: BRST Quantization; Euclidean Field Theory; Feynman Path Integrals; Infinite-Dimensional Hamiltonian Systems; Knot Theory and Physics; Malliavin Calculus; Path Integrals in Noncommutative Geometry; Quantum Mechanics: Foundations; Stationary Phase Approximation; Topological Sigma Models.
Bibliography/Further Reading
The book (Cartier and DeWitt-Morette 2006) was nearing completion while this article was being written; many results and brief comments in this article are fully developed in the book. The book (Grosche and Steiner 1998) includes a bibliography with 945 references.
Albeverio SA and Hoegh-Krohn RJ, Mathematical Theory of Feynman Path Integrals, Lecture Notes in Mathematics, vol. 523. Berlin: Springer.
Brillouin ML (1926) La mécanique ondulatoire de Schrödinger; une méthode générale de resolution par approximations successives. Comptes Rendus Acad. Sci. 183: 24–44. Remarques sur la Mécanique Ondulatoire. Journal of Physics Radium 7: 353–368.
Brydges DC, Dimock J, and Hurd TR (1998) Estimates on renormalization group transformations. Can. J. Math. 50: 756–793. A non-gaussian fixed point for $\phi^4$ in 4-dimensions. Communications in Mathematical Physics 198: 111–156.
Cartier P and DeWitt-Morette C (2006) Functional Integration; Action and Symmetries. Cambridge: Cambridge University Press.
Chaichian M and Demichev A (2001) Path Integrals in Physics, vol. I and II. Bristol, UK: Institute of Physics.
Collins D (1997) Functional integration over complex Poisson processes, an appendix to a rigorous mathematical foundation of functional integration by Cartier P and Dewitt-Morette. In: Dewitt-Morette C, Cartier P, and Folacci A (eds.) Functional Integration: Basic and Applications, pp. 42–50. New York: Plenum.
Das A (1993) Field Theory, a Path Integral Approach. Singapore: World Scientific.
DeWitt B (2004) The Global Approach to Quantum Field Theory. Oxford: Oxford University Press.
DeWitt-Morette C, Maheswari A, and Nelson B (1979) Path integration in non-relativistic quantum mechanics. Physics Reports 50: 255–372.
Dirac PAM (1933) The Lagrangian in quantum mechanics. Physics Zeitschrift Sowjetunion 3: 64–72. Reprinted in Quantum Electrodynamics. Edited by J Schwinger. Dover, New York, 1958, 312–320.
Elworthy KD (1982) Stochastic Differential Equations on Manifolds. Cambridge: Cambridge University Press.
Feynman RP (1966) The development of the space-time view of quantum electrodynamics, Nobel Prize Award Address, Stockholm, December 11, 1965. Les Prix Nobel en 1965. Reprinted in Physics Today. 19(8): 31–44. Stockholm: Nobel Foundation.
Feynman RP and Hibbs AR (1961) Quantum Mechanics and Path Integrals. New York: McGraw-Hill.
Fujikawa K and Suzuki H (2004) Path Integrals and Quantum Anomalies (Originally published in Japanese in 2001 by Iwanami Shoten Pub. Tokyo.). Oxford: Oxford University Press.
Glimm J and Jaffe A (1987) Quantum Physics: A Functional Integral Point of View. New York: Springer.
Grosche G and Steiner F (1998) Handbook of Feynman Path Integrals. Springer Tracts in Modern Physics. Berlin: Springer.
Johnson GW and Lapidus ML (2000) The Feynman Integral and Feynman's Operational Calculus. Oxford: Oxford University Press.
Kleinert H (2004) Path Integrals in Quantum Mechanics, Statistics, and Polymer Physics, 3rd edn. Singapore: World Scientific.
Kramers HA (1927) Wellenmechanik und halbzahlige Quantisierung. Zeitschrift Physics 39: 828–840.
LaChapelle J (2004) Path integral solution of linear second order partial differential equations, I. The general construction; II. Elliptic, parabolic, and hyperbolic cases. Annals in Physics 314: 262–424.
Laidlaw MGG and DeWitt CM (1971) Feynman functional integrals for systems of indistinguishable particles. Physical Review D 3: 1375–1378.
Marinov MS (1980) Path integrals in quantum theory: an outlook of basic concepts. Physics Reports 60: 1.
Morette C (1951) On the definition and approximation of Feynman's path integrals. Physical Review 81: 848–852.
Roepstorff G (1994) Path Integral Approach to Quantum Physics: An Introduction (Original German edition: Pfadintegrale in der Quantenphysik, Braunschweig: Vieweg 1991.). Berlin: Springer.
Schulman LS (1968) A path integral for spin. Physical Review 176: 1558–1569.
Schulman LS (1971) Approximate topologies. Journal of Mathematical Physics 12: 304–308.
Schulman LS (1981) Techniques and Applications of Path Integration. New York: Wiley.
Simon B (1979) Functional Integration and Quantum Physics. New York: Academic Press.
t'Hooft G and Veltmann HJG (1973) Diagrammer. CERN Preprint 73–9: 1–114.
Wenzel G (1926) Eine Verallgemeinerung der Quantembedingungen für die Zwecke der Wellenmechanik. Zeitschrift Physics 38: 518–529.
Wurm A and Berg M (2002) Wick calculus. ArXiv:Physics/0212061.
Zinn-Justin J (2003) Intégrale de chemin en mécanique quantique: Introduction (Path Integrals in Quantum Mechanics, Oxford University Press, 2004). Paris: EDP Sciences, Les Ulis et CNRS Editions.
G

Γ-Convergence and Homogenization
G Dal Maso, SISSA, Trieste, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction
Several asymptotic problems in the calculus of variations lead to the following question: given a sequence $F_k$ of functionals, defined on a suitable function space, does there exist a functional $F$ such that the solutions of the minimum problems for $F_k$ converge to the solutions of the corresponding minimum problems for $F$? Γ-convergence, introduced by Ennio De Giorgi and his collaborators in 1975, and developed as a powerful tool to attack a wide range of applied problems, provides a unified answer to this kind of question.
Definition and Main Properties
Let $U$ be a topological space with a countable base and let $F_k$ be a sequence of functions defined on $U$ with values in the extended real line $\overline{R} := R \cup \{-\infty, +\infty\}$. We say that $F_k$ Γ-converges to a function $F: U \to \overline{R}$, or that $F$ is the Γ-limit of $F_k$, if for every $u \in U$ the following conditions are satisfied:
1. For every sequence $u_k$ converging to $u$ in $U$ we have
$$F(u) \leq \liminf_{k \to \infty} F_k(u_k)$$
2. There exists a sequence $u_k$ converging to $u$ in $U$ such that
$$F(u) = \lim_{k \to \infty} F_k(u_k)$$
Property (1) appears to be a variant of the usual definition of lower semicontinuity. Property (2) requires the existence, for every $u \in U$, of a "recovery sequence," which provides an approximation of the value of $F$ at $u$ by means of values attained by $F_k$ near $u$.
It follows immediately from the definition that, if $F_k$ Γ-converges to $F$, then $F_k + G$ Γ-converges to $F + G$ for every continuous function $G: U \to R$. The first general property of Γ-limits is lower semicontinuity: if $F_k$ Γ-converges to $F$, then $F$ is lower semicontinuous on $U$; that is,
$$F(u) \leq \liminf_{k \to \infty} F(u_k)$$
for every $u \in U$ and for every sequence $u_k$ converging to $u$ in $U$. Another important property of Γ-convergence is compactness: every sequence $F_k$ has a Γ-convergent subsequence. For every $k$ assume that the function $F_k$ has a minimum point $u_k$. The following property is the link between Γ-convergence and convergence of minimizers: if $F_k$ Γ-converges to $F$ and $u_k$ converges to $u$, then $u$ is a minimum point of $F$ and $F_k(u_k)$ converges to $F(u)$, hence
$$\min_{v \in U} F(v) = \lim_{k \to \infty} \min_{v \in U} F_k(v) \qquad [1]$$
Under suitable coerciveness assumptions, the convergence of $u_k$ is obtained by a compactness argument. We recall that a sequence of functions $F_k$ is said to be equicoercive if for every $t \in R$ there exists a compact set $K_t$ (independent of $k$) such that
$$\{u \in U: F_k(u) \leq t\} \subset K_t \qquad [2]$$
for every $k$. If $F_k$ is equicoercive and Γ-converges to $F$, the previous result implies that [1] holds. If, in addition, $F$ is not identically $+\infty$, then the sequence $u_k$ of minimizers considered above has a subsequence $u_{k_j}$ which converges to a minimizer $u$ of $F$. The whole sequence $u_k$ converges to $u$ whenever $F$ has a unique minimizer $u$. In many applications to the calculus of variations, $U$ is the Lebesgue space $L^p(\Omega; R^m)$, with $\Omega$ a bounded open subset of $R^n$ and $1 \leq p < +\infty$, but the effective domains of the functionals $F_k$, defined as $\{u \in U: F_k(u) \in R\}$, are often contained in the Sobolev space $W^{1,p}(\Omega; R^m)$, composed of all functions $u \in L^p(\Omega; R^m)$ whose distributional gradient $\nabla u$ belongs to $L^p(\Omega; R^{m \times n})$. When one considers homogeneous Dirichlet boundary conditions, the effective domains of the functionals $F_k$ are often contained in the smaller Sobolev space $W_0^{1,p}(\Omega; R^m)$, composed of all functions of $W^{1,p}(\Omega; R^m)$ which vanish on the boundary $\partial\Omega$, technically defined as the closure of $C_0^{\infty}(\Omega; R^m)$ in $W^{1,p}(\Omega; R^m)$. In this case, the equicoerciveness condition [2] can be obtained by using Rellich's theorem, which asserts that the natural embedding of $W_0^{1,p}(\Omega; R^m)$ into $L^p(\Omega; R^m)$ is compact. Therefore, a sequence of functionals $F_k$ defined on $L^p(\Omega; R^m)$ is equicoercive if there exists a constant $\alpha > 0$ such that
$$F_k(u) \geq \alpha \int_{\Omega} |\nabla u|^p\,dx$$
for every $u \in W_0^{1,p}(\Omega; R^m)$, while $F_k(u) = +\infty$ for every $u \notin W_0^{1,p}(\Omega; R^m)$.
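The convergence of minima in [1] can be illustrated on a standard one-dimensional example that is not taken from this article. On $U = R$ the functions $F_k(u) = u^2 + \sin(ku)$ have no pointwise limit, yet $\sin(ku)$ Γ-converges to the constant $-1$, so by the remark on continuous perturbations $F_k$ Γ-converges to $F(u) = u^2 - 1$; the family is also equicoercive. The Python sketch below estimates $\min F_k$ by brute-force search on a grid and compares it with $\min F = -1$. The function names, the search interval and the grid size are illustrative assumptions.

```python
import math

def F_k(u, k):
    """Oscillating functional F_k(u) = u^2 + sin(k u)."""
    return u * u + math.sin(k * u)

def F_limit(u):
    """Gamma-limit F(u) = u^2 - 1 (lower envelope of the oscillation)."""
    return u * u - 1.0

def grid_min(func, lo=-2.0, hi=2.0, n=400_001):
    """Brute-force minimum of func on a uniform grid; returns (u_min, value)."""
    best_u, best_v = lo, func(lo)
    step = (hi - lo) / (n - 1)
    for i in range(1, n):
        u = lo + i * step
        v = func(u)
        if v < best_v:
            best_u, best_v = u, v
    return best_u, best_v

if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        u_k, m_k = grid_min(lambda u: F_k(u, k))
        print(f"k = {k:5d}  minimizer ~ {u_k:+.5f}  min F_k ~ {m_k:+.6f}")
    print("min F =", grid_min(F_limit)[1])
```

As $k$ grows, the computed minimum values approach $-1$ and the minimizers approach $0$, the unique minimizer of $F$, even though the functions $F_k$ themselves keep oscillating.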
Homogenization Problems
Many problems for composite materials (fibered or stratified materials, porous media, materials with many small holes or fissures, etc.) lead to the study of mathematical models with many interacting scales, which may differ by several orders of magnitude. From a microscopic viewpoint, the systems considered are highly inhomogeneous. Typically, in such composite materials, the physical parameters (such as electric and thermal conductivity, elasticity coefficients, etc.) are discontinuous and oscillate between the different values characterizing each component. When these components are intimately mixed, these parameters oscillate very rapidly and the microscopic structure becomes more and more complex. On the other hand, the material becomes quite simple from a macroscopic point of view, and it tends to behave like an ideal homogeneous material, called "homogenized material." The purpose of the mathematical theory of homogenization is to describe this limit process when the parameters which describe the fineness of the microscopic structure tend to zero. Homogenization problems are often treated by studying the partial differential equations that govern the physical properties under investigation. Due to the small scale of the microscopic structure, these equations contain some small parameters. The mathematical problem consists then in the study of the limit of the solutions of these equations when the parameters tend to zero. Γ-convergence is a very useful tool to obtain homogenization results for systems governed by variational principles, which are the only ones described in this article.
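Let $Q := (-1/2, 1/2)^n$ be the open unit cube in $R^n$ centered at 0. We say that a function $u$ defined on $R^n$ is $Q$-periodic if, for every $z \in R^n$ with integer coordinates, we have $u(x + z) = u(x)$ for every $x \in R^n$. Let $f: R^n \times R^{m \times n} \to [0, +\infty)$ be a function such that $x \mapsto f(x, \xi)$ is measurable and $Q$-periodic on $R^n$ for every $\xi \in R^{m \times n}$ and $\xi \mapsto f(x, \xi)$ is convex on $R^{m \times n}$ for every $x \in R^n$. Given a bounded open set $\Omega \subset R^n$ and a constant $p > 1$, let $F_{\varepsilon}: L^p(\Omega; R^m) \to [0, +\infty]$ be the family of functionals defined by
$$F_{\varepsilon}(u) := \begin{cases} \int_{\Omega} f(x/\varepsilon, \nabla u)\,dx & \text{if } u \in W_0^{1,p}(\Omega; R^m) \\ +\infty & \text{otherwise} \end{cases}$$
In the applications to composite materials, the functional $F_{\varepsilon}$ represents the energy of the portion of the material occupying the domain $\Omega$. The fact that the energy density depends on $x/\varepsilon$ reflects the $\varepsilon$-periodic structure of the material, which implies that the energy density oscillates faster and faster as $\varepsilon \to 0$. Assume that there exist two constants $\beta \geq \alpha > 0$ such that
$$\alpha|\xi|^p \leq f(x, \xi) \leq \beta(1 + |\xi|^p) \qquad [3]$$
for every $x \in \Omega$ and every $\xi \in R^{m \times n}$. Then for every sequence $\varepsilon_k \to 0$ the functionals $F_{\varepsilon_k}$ Γ-converge to the functional $F_{\mathrm{hom}}: L^p(\Omega; R^m) \to [0, +\infty]$ defined by
$$F_{\mathrm{hom}}(u) := \begin{cases} \int_{\Omega} f_{\mathrm{hom}}(\nabla u)\,dx & \text{if } u \in W_0^{1,p}(\Omega; R^m) \\ +\infty & \text{otherwise} \end{cases} \qquad [4]$$
The integrand $f_{\mathrm{hom}}: R^{m \times n} \to [0, +\infty)$ is obtained by solving the cell problem
$$f_{\mathrm{hom}}(\xi) := \min_{w \in W_{\mathrm{per}}^{1,p}(Q; R^m)} \int_Q f(x, \xi + \nabla w)\,dx \qquad [5]$$
where $W_{\mathrm{per}}^{1,p}(Q; R^m)$ denotes the space of functions $w \in W_{\mathrm{loc}}^{1,p}(R^n; R^m)$ which are $Q$-periodic. The function $f_{\mathrm{hom}}$ is always convex and satisfies [3]. If it is strictly convex, the basic properties of Γ-convergence imply that for every $g \in L^q(\Omega; R^m)$, with $1/p + 1/q = 1$, the solutions $u_{\varepsilon}$ of the minimum problems
$$\min_{v \in W_0^{1,p}(\Omega; R^m)} \int_{\Omega} \left[ f\!\left(\frac{x}{\varepsilon}, \nabla v\right) - g(x)\,v \right] dx \qquad [6]$$
converge in $L^p(\Omega; R^m)$, as $\varepsilon \to 0$, to the solution $u$ of the minimum problem
$$\min_{v \in W_0^{1,p}(\Omega; R^m)} \int_{\Omega} \left[ f_{\mathrm{hom}}(\nabla v) - g(x)\,v \right] dx \qquad [7]$$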
Similar results can be proved for nonhomogeneous Dirichlet boundary conditions, as well as for Neumann boundary conditions. In the special case $m = 1$, $p = 2$, and
$$f(x, \xi) = \frac{1}{2}\sum_{i,j=1}^{n} a_{ij}(x)\,\xi_j \xi_i \qquad [8]$$
with $a_{ij}(x)$ $Q$-periodic, the function $f_{\mathrm{hom}}$ takes the form
$$f_{\mathrm{hom}}(\xi) = \frac{1}{2}\sum_{i,j=1}^{n} a_{ij}^{\mathrm{hom}}\,\xi_j \xi_i$$
for suitable constant coefficients $a_{ij}^{\mathrm{hom}}$. By considering the Euler equations of the problems [6] and [7] in this special case, from the previous result we obtain the homogenization theorem for symmetric elliptic operators in divergence form, which asserts that for every $g \in L^2(\Omega)$ the solutions $u_{\varepsilon}$ of the Dirichlet problems
$$-\sum_{i,j=1}^{n} D_i\!\left(a_{ij}\!\left(\frac{x}{\varepsilon}\right) D_j u_{\varepsilon}(x)\right) = g(x) \ \text{ on } \Omega, \qquad u_{\varepsilon}(x) = 0 \ \text{ on } \partial\Omega$$
converge in $L^2(\Omega)$ to the solution $u$ of the Dirichlet problem
$$-\sum_{i,j=1}^{n} a_{ij}^{\mathrm{hom}}\,D_i D_j u(x) = g(x) \ \text{ on } \Omega, \qquad u(x) = 0 \ \text{ on } \partial\Omega$$
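In one space dimension the cell problem [5] for the quadratic integrand [8] can be solved by hand: the minimizer makes $a(x)(\xi + w'(x))$ constant, so that $a^{\mathrm{hom}}$ is the harmonic mean of $a$ over the unit cell (a standard fact, not stated explicitly above). The following Python sketch, with an illustrative two-layer coefficient and hypothetical function names, solves the Dirichlet problem $-(a(x/\varepsilon)u')' = 1$ on $(0, 1)$ by finite differences and compares the value at the midpoint with the homogenized solution $u(x) = x(1 - x)/(2a^{\mathrm{hom}})$.

```python
import math

# Harmonic mean of the two-layer coefficient below (values 1 and 9, equal widths).
A_HOM = 1.0 / (0.5 / 1.0 + 0.5 / 9.0)

def a_layered(y):
    """1-periodic two-layer coefficient: 1 on [0, 1/2), 9 on [1/2, 1)."""
    return 1.0 if (y - math.floor(y)) < 0.5 else 9.0

def solve_dirichlet(eps, n, g=1.0):
    """Finite differences for -(a(x/eps) u')' = g on (0, 1), u(0) = u(1) = 0."""
    h = 1.0 / n
    # coefficient sampled at the midpoint of each grid interval [x_j, x_{j+1}]
    a_mid = [a_layered((j + 0.5) * h / eps) for j in range(n)]
    # tridiagonal system for the interior unknowns u_1 .. u_{n-1}
    lower = [-a_mid[i - 1] for i in range(1, n)]
    diag = [a_mid[i - 1] + a_mid[i] for i in range(1, n)]
    upper = [-a_mid[i] for i in range(1, n)]
    rhs = [g * h * h] * (n - 1)
    m = n - 1
    # Thomas algorithm: forward elimination ...
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for j in range(1, m):
        denom = diag[j] - lower[j] * c_prime[j - 1]
        c_prime[j] = upper[j] / denom
        d_prime[j] = (rhs[j] - lower[j] * d_prime[j - 1]) / denom
    # ... and back substitution; u[0] and u[n] stay at the boundary value 0
    u = [0.0] * (n + 1)
    u[m] = d_prime[m - 1]
    for j in range(m - 2, -1, -1):
        u[j + 1] = d_prime[j] - c_prime[j] * u[j + 2]
    return u

if __name__ == "__main__":
    n = 2000
    u = solve_dirichlet(eps=0.02, n=n)
    print("u_eps(1/2)                  :", u[n // 2])
    print("homogenized, harmonic mean  :", 1.0 / (8.0 * A_HOM))
    print("naive guess, arithmetic mean:", 1.0 / (8.0 * 5.0))
```

The grid is chosen so that no interval midpoint falls on a layer interface. The comparison with the arithmetic mean is included only to show that averaging the coefficient itself gives the wrong limit.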
An extensive literature is devoted to precise estimates of the homogenized coefficients $a_{ij}^{\mathrm{hom}}$, depending on various structure conditions on the periodic coefficients $a_{ij}(x)$. Some of these estimates are based on a clever use of the variational formula [5]. Explicit formulas for $a_{ij}^{\mathrm{hom}}$ are known in the case of layered materials, which correspond to the case where $R^n$ is periodically partitioned into parallel layers on which the coefficients $a_{ij}(x)$ take constant values. Easy examples show that, even if the composite material is isotropic at a microscopic layer (i.e., $a_{ij}(x) = a(x)\delta_{ij}$ for some scalar function $a(x)$), the homogenized material can be anisotropic (i.e., $a_{ij}^{\mathrm{hom}} \neq a\,\delta_{ij}$), due to the anisotropy of the periodic function $a(x)$, which describes the microscopic distribution of the different components of the composite material.
In the vector case $m > 1$, the convexity hypothesis on $\xi \mapsto f(x, \xi)$ is not satisfied by the most interesting functionals related to nonlinear elasticity. If $\xi \mapsto f(x, \xi)$ is not convex, one can still prove that $F_{\varepsilon_k}$ Γ-converges to a functional $F_{\mathrm{hom}}: L^p(\Omega; R^m) \to [0, +\infty]$ of the form [4], but this time $f_{\mathrm{hom}}: R^{m \times n} \to [0, +\infty)$ cannot be obtained by solving a problem in the unit cell. Instead, it is given by the asymptotic formula
$$f_{\mathrm{hom}}(\xi) := \lim_{R \to \infty} \frac{1}{R^n} \min_{w \in W_0^{1,p}(Q_R; R^m)} \int_{Q_R} f(x, \xi + \nabla w)\,dx$$
where $Q_R := (-R/2, R/2)^n$ is the open cube of side $R$ centered at 0. Similar formulas can be obtained for quasiperiodic integrands $f$ and for stochastic homogenization problems.
In the nonperiodic case one can prove that, if $g_{\varepsilon}: R^n \times R^{m \times n} \to [0, +\infty)$ are arbitrary Borel functions satisfying [3], with constants independent of $\varepsilon$, and $G_{\varepsilon}: L^p(\Omega; R^m) \to [0, +\infty]$ are defined by
$$G_{\varepsilon}(u) := \begin{cases} \int_{\Omega} g_{\varepsilon}(x, \nabla u)\,dx & \text{if } u \in W_0^{1,p}(\Omega; R^m) \\ +\infty & \text{otherwise} \end{cases}$$
then there exists a sequence $\varepsilon_k \to 0$ such that the functionals $G_{\varepsilon_k}$ Γ-converge to a functional $G$ of the form
$$G(u) := \begin{cases} \int_{\Omega} g(x, \nabla u)\,dx & \text{if } u \in W_0^{1,p}(\Omega; R^m) \\ +\infty & \text{otherwise} \end{cases}$$
with $g$ satisfying [3]. In this case, no easy formula provides the integrand $g(x, \xi)$ in terms of simple operations on the integrands $g_{\varepsilon_k}(x, \xi)$. The indirect connection between these integrands can be obtained by introducing the functions $M_{\varepsilon}(x, \xi, \rho)$ defined, for $x \in \Omega$, $\xi \in R^{m \times n}$, and $0 < \rho < \mathrm{dist}(x, \partial\Omega)$, by
$$M_{\varepsilon}(x, \xi, \rho) := \min_{w \in W_0^{1,p}(B(x, \rho))} \int_{B(x, \rho)} g_{\varepsilon}(y, \xi + \nabla w)\,dy$$
where $B(x, \rho)$ is the open ball with center $x$ and radius $\rho$. These functions describe the local behavior of the integrands $g_{\varepsilon}$ in some special minimum problems. The sequence $G_{\varepsilon_k}$ Γ-converges to $G$ if and only if
$$g(x, \xi) = \liminf_{\rho \to 0}\,\liminf_{k \to \infty} \frac{M_{\varepsilon_k}(x, \xi, \rho)}{|B(x, \rho)|} = \limsup_{\rho \to 0}\,\limsup_{k \to \infty} \frac{M_{\varepsilon_k}(x, \xi, \rho)}{|B(x, \rho)|}$$
for almost every $x \in \Omega$ and every $\xi \in R^{m \times n}$. Similar results have also been proved for integral functionals of the form
$$G_{\varepsilon}(u) := \begin{cases} \int_{\Omega} g_{\varepsilon}(x, u, \nabla u)\,dx & \text{if } u \in W_0^{1,p}(\Omega; R^m) \\ +\infty & \text{otherwise} \end{cases}$$
under suitable structure conditions for the integrands $g_{\varepsilon}$.
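Perforated Domains

In some homogenization problems, the integrand is fixed, but the domain depends on a small parameter $\varepsilon$ and its boundary becomes more and more fragmented as $\varepsilon \to 0$. A typical example is given by periodically perforated domains with small holes. Given a bounded open set $\Omega \subset \mathbb{R}^n$ and a compact set $K \subset Q$, both with smooth boundaries, for every $\varepsilon > 0$ we consider the perforated sets

$$\Omega_\varepsilon := \Omega \setminus \bigcup_{z \in Z_\varepsilon} (\varepsilon z + \varepsilon K) \qquad [9]$$

where $Z_\varepsilon$ is the set of vectors $z \in \mathbb{R}^n$ with integer coordinates such that $\varepsilon z + \varepsilon Q \subset \Omega$. Given $g \in L^2(\Omega)$, let $\mathcal{F}_\varepsilon : L^2(\Omega) \to [0, +\infty]$ be the functionals defined by

$$\mathcal{F}_\varepsilon(u) := \begin{cases} \int_{\Omega_\varepsilon} \left[ \tfrac{1}{2} |\nabla u|^2 - g u \right] dx & \text{if } u \in W_0^{1,2}(\Omega) \\ +\infty & \text{otherwise} \end{cases} \qquad [10]$$

Minimizing [10] is equivalent to solving the mixed problems

$$\begin{aligned} -\Delta u_\varepsilon &= g && \text{on } \Omega_\varepsilon \\ u_\varepsilon &= 0 && \text{on } \partial\Omega \\ \frac{\partial u_\varepsilon}{\partial \nu} &= 0 && \text{on } \partial\Omega_\varepsilon \setminus \partial\Omega \end{aligned} \qquad [11]$$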
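The homogenization formula [5] is still valid, with minor modifications. It leads to a matrix of coefficients $a^{\hom}_{ij}$ such that

$$\sum_{i,j=1}^n a^{\hom}_{ij}\, \xi_j \xi_i := \min_{w \in W^{1,2}_{\mathrm{per}}(Q)} \int_{Q \setminus K} |\xi + \nabla w|^2\,dx$$

for every $\xi \in \mathbb{R}^n$. For every sequence $\varepsilon_k \to 0$ the $\Gamma$-limit of the functionals $\mathcal{F}_{\varepsilon_k}$ is the functional $\mathcal{F} : L^2(\Omega) \to [0, +\infty]$ defined by

$$\mathcal{F}(u) := \begin{cases} \int_\Omega \left[ \tfrac{1}{2} \sum_{i,j=1}^n a^{\hom}_{ij}\, D_j u\, D_i u - m g u \right] dx & \text{if } u \in W_0^{1,2}(\Omega) \\ +\infty & \text{otherwise} \end{cases}$$

where $m := |Q \setminus K|$ is the volume fraction of the sets $\Omega_\varepsilon$. Since a slight modification of the functionals $\mathcal{F}_\varepsilon$ satisfies an equicoerciveness condition, it follows from the basic properties of $\Gamma$-convergence that the solutions $u_\varepsilon$ of the mixed problems [11] in the perforated domains [9], extended to the holes so that $u_\varepsilon$ are harmonic on $\Omega \setminus \overline{\Omega}_\varepsilon$ and $u_\varepsilon \in W_0^{1,2}(\Omega)$, converge in $L^2(\Omega)$ to the solution $u$ of the Dirichlet problem

$$\begin{aligned} -\sum_{i,j=1}^n a^{\hom}_{ij}\, D_i D_j u &= m g && \text{on } \Omega \\ u &= 0 && \text{on } \partial\Omega \end{aligned}$$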
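Therefore, the asymptotic effect of the small holes with Neumann boundary condition is a change in the coefficients of the elliptic equation.

In the case of Dirichlet boundary conditions, it is interesting to consider perforated domains with holes of a different size, namely

$$\Omega_\varepsilon := \Omega \setminus \bigcup_{z \in Z_\varepsilon} \left( \varepsilon z + \varepsilon^{n/(n-2)} K \right) \qquad [12]$$

with $\varepsilon^{n/(n-2)}$ replaced by $\exp(-1/\varepsilon^2)$ if $n = 2$, while the case $n = 1$ gives only trivial results. Given $g \in L^2(\Omega)$, let $G_\varepsilon : L^2(\Omega) \to [0, +\infty]$ be the functionals defined by

$$G_\varepsilon(u) := \begin{cases} \int_{\Omega_\varepsilon} \left[ \tfrac{1}{2} |\nabla u|^2 - g u \right] dx & \text{if } u \in W_0^{1,2}(\Omega_\varepsilon) \\ +\infty & \text{otherwise} \end{cases} \qquad [13]$$

Minimizing [13] is equivalent to solving the Dirichlet problems

$$\begin{aligned} -\Delta u_\varepsilon &= g && \text{on } \Omega_\varepsilon \\ u_\varepsilon &= 0 && \text{on } \partial\Omega_\varepsilon \end{aligned} \qquad [14]$$

For every sequence $\varepsilon_k \to 0$ the $\Gamma$-limit of the functionals $G_{\varepsilon_k}$ is the functional $G : L^2(\Omega) \to [0, +\infty]$ defined by

$$G(u) := \begin{cases} \int_\Omega \left[ \tfrac{1}{2} |\nabla u|^2 + \tfrac{c}{2} u^2 - g u \right] dx & \text{if } u \in W_0^{1,2}(\Omega) \\ +\infty & \text{otherwise} \end{cases}$$

where, for $n \geq 3$,

$$c := \operatorname{cap}(K) := \inf_{\substack{w \in C_c^1(\mathbb{R}^n) \\ w = 1 \text{ on } K}} \int_{\mathbb{R}^n} |\nabla w|^2\,dx$$

Since a slight modification of the functionals $G_\varepsilon$ satisfies an equicoerciveness condition, it follows from the basic properties of $\Gamma$-convergence that the solutions $u_\varepsilon$ of the Dirichlet problems [14] in the perforated domains [12], extended as zero on $\Omega \setminus \Omega_\varepsilon$, converge in $L^2(\Omega)$ to the solution $u$ of the Dirichlet problem

$$\begin{aligned} -\Delta u + c u &= g && \text{on } \Omega \\ u &= 0 && \text{on } \partial\Omega \end{aligned} \qquad [15]$$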
In the electrostatic interpretation of these problems, the boundary $\partial\Omega_\varepsilon$ is a conductor kept at potential zero. The extra term $cu$ in [15] is due to the electric charges induced on $\partial\Omega_\varepsilon$ by the charge distribution $g$. These results on Dirichlet and Neumann boundary conditions have been extended to more general functionals and also to a wide class of nonperiodic distributions of small holes.
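As a concrete instance of the constant $c = \operatorname{cap}(K)$, the capacity of a ball can be computed by hand. The sketch below assumes $n = 3$ and takes $K$ to be the closed ball of radius $r$ centered at the origin (a choice made here for illustration, not taken from the text). It uses the standard fact that the infimum is approached by smooth, compactly supported approximations of the function $w$ that equals 1 on $K$ and is harmonic outside it:

```latex
\[
w(x) = \min\{1,\ r/|x|\}, \qquad
\int_{\mathbb{R}^3} |\nabla w|^2\,dx
  = \int_r^\infty \frac{r^2}{s^4}\,4\pi s^2\,ds
  = 4\pi r^2 \int_r^\infty \frac{ds}{s^2}
  = 4\pi r .
\]
```

With the normalization of the capacity used above, this gives $c = 4\pi r$, so the zero-order term in [15] grows linearly with the radius of the reference hole.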
Dimension Reduction Problems

In the study of thin elastic structures, like plates, membranes, rods, and strings, it is customary to approximate the mechanical behavior of a thin three-dimensional body by an effective theory for two- or one-dimensional elastic bodies. $\Gamma$-convergence provides a useful tool for a rigorous deduction of the lower-dimensional theory. Let us focus on the derivation of plate theory from three-dimensional finite elasticity. The reference configuration of the thin three-dimensional elastic body is a cylinder of the form

$$\Omega_\varepsilon := S \times \left( -\tfrac{\varepsilon}{2}, \tfrac{\varepsilon}{2} \right)$$

where $\varepsilon > 0$ and $S$ is a bounded open subset of $\mathbb{R}^2$ with smooth boundary. We assume that the body is hyperelastic, with stored elastic energy

$$\int_{\Omega_\varepsilon} W(\nabla u)\,dx$$

where $u : \Omega_\varepsilon \to \mathbb{R}^3$ is the deformation. The energy density $W : \mathbb{R}^{3\times 3} \to [0, +\infty]$, depending on the material, is continuous and frame indifferent; that is, $W(QF) = W(F)$ for every rotation $Q$ and every $F \in \mathbb{R}^{3\times 3}$, where $QF$ denotes the usual product of $3\times 3$ matrices. We assume that $W$ vanishes on the set $SO(3)$ of rotations, is of class $C^2$ in a neighborhood of $SO(3)$, and satisfies the inequality

$$W(F) \geq \gamma\, \operatorname{dist}^2(F, SO(3)) \quad \text{for every } F \in \mathbb{R}^{3\times 3} \qquad [16]$$

with a constant $\gamma > 0$. Plate theory is obtained in the limit as $\varepsilon \to 0$ when the densities of the volume forces applied to the body have the form $\varepsilon^2 f(x_1, x_2)$, with $f \in L^2(S; \mathbb{R}^3)$. We assume that $f$ is balanced; that is,

$$\int_{\Omega_\varepsilon} f\,dx = 0, \qquad \int_{\Omega_\varepsilon} x \wedge f\,dx = 0$$

Stable equilibria are then obtained by minimizing the functionals

$$\int_{\Omega_\varepsilon} \left[ W(\nabla u) - \varepsilon^2 f \cdot u \right] dx \qquad [17]$$

on $W^{1,2}(\Omega_\varepsilon; \mathbb{R}^3)$.

To study the behavior of [17] as $\varepsilon \to 0$, it is convenient to change variables, so that the scaled deformations $v(x_1, x_2, x_3) := u(x_1, x_2, \varepsilon x_3)$ are defined on the same domain

$$\Omega := S \times \left( -\tfrac{1}{2}, \tfrac{1}{2} \right)$$

The scaled energy density $W_\varepsilon : \mathbb{R}^{3\times 3} \to [0, +\infty]$ is then defined as

$$W_\varepsilon(F_1 | F_2 | F_3) := W\!\left( F_1 \,\Big|\, F_2 \,\Big|\, \tfrac{1}{\varepsilon} F_3 \right)$$

where $(F_1 | F_2 | F_3)$ denotes the $3\times 3$ matrix with columns $F_1$, $F_2$, and $F_3$. This implies that

$$\int_{\Omega_\varepsilon} \left[ W(\nabla u) - \varepsilon^2 f \cdot u \right] dx = \varepsilon \int_\Omega \left[ W_\varepsilon(\nabla v) - \varepsilon^2 f \cdot v \right] dx$$

The asymptotic behavior of the minimizers of these functionals can be obtained from the knowledge of the $\Gamma$-limit of the functionals $\mathcal{F}_\varepsilon : L^2(\Omega; \mathbb{R}^3) \to [0, +\infty]$ defined by […]

[…] the energy functional $\mathcal{F}_\varepsilon : L^1(\Omega) \to [0, +\infty]$ has the form

$$\mathcal{F}_\varepsilon(u) := \begin{cases} \int_\Omega \left[ W(u) + \varepsilon^2 |\nabla u|^2 \right] dx & \text{if } u \in A(m) \\ +\infty & \text{otherwise} \end{cases} \qquad [20]$$

where $A(m)$ is the set of all functions $u \in W^{1,2}(\Omega)$ with $\int_\Omega u\,dx = m$. We assume that $W : \mathbb{R} \to [0, +\infty)$ is continuous and that there exist $\alpha, \beta \in \mathbb{R}$, with $\alpha|\Omega| < m < \beta|\Omega|$, such that $W(t) = 0$ if and only if $t = \alpha$ or $t = \beta$. Moreover, we assume that $W(t) \to +\infty$ as $t \to \pm\infty$. In the minimization of $\mathcal{F}_\varepsilon$, the Gibbs free energy $W(u)$ favors the functions whose values are close to
jj m
½21
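From the basic properties of $\Gamma$-convergence, we deduce that

$$\min_{u \in A(m)} \int_\Omega \left[ W(u) + \varepsilon^2 |\nabla u|^2 \right] dx \to 0 \qquad [22]$$

and that there exists a sequence $\varepsilon_k \to 0$ such that the minimizers $u_{\varepsilon_k}$ of $\mathcal{F}_{\varepsilon_k}$ converge in $L^1(\Omega)$ to a function $u$ which takes only the values $\alpha$ and $\beta$ and satisfies [21]. This result can be improved by considering the rescaled functionals

$$G_\varepsilon(u) := \frac{1}{\varepsilon} \mathcal{F}_\varepsilon(u) \qquad [23]$$

where $\mathcal{F}_\varepsilon$ is defined by [20]. Then for every sequence $\varepsilon_k \to 0$ the sequence $G_{\varepsilon_k}$ $\Gamma$-converges to the functional $G : L^1(\Omega) \to [0, +\infty]$ defined by

$$G(u) := \begin{cases} 2c\, P(E_\alpha, \Omega) & \text{if } u \in M(\alpha, \beta, m) \\ +\infty & \text{otherwise} \end{cases}$$

where

$$c := \int_\alpha^\beta \sqrt{W(t)}\,dt$$

and

$$P(E, \Omega) := \sup \left\{ \int_E \operatorname{div} \varphi\,dx \;:\; \varphi \in C_c^1(\Omega; \mathbb{R}^n),\ |\varphi| \leq 1 \right\}$$

is the Caccioppoli–De Giorgi perimeter of $E$ in $\Omega$, which coincides with the $(n-1)$-dimensional measure of $\Omega \cap \partial E$ when $E$ is smooth enough. Note that the effective domain $A(m)$ of the functionals $G_\varepsilon$ is disjoint from the effective domain of the limit functional $G$, which is the set of all functions $u \in M(\alpha, \beta, m)$ with $P(E_\alpha, \Omega) < +\infty$.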
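The constant $2c$ in the limit functional can be checked numerically in one dimension. The following is a minimal sketch under assumptions that are not in the article: the hypothetical double-well potential $W(t) = (1 - t^2)^2$ (so $\alpha = -1$, $\beta = 1$), the interval $\Omega = (-1, 1)$, and the profile $u(x) = \tanh(x/\varepsilon)$, which has a single transition point. The mass constraint is ignored. For this profile the rescaled energy $\int_\Omega [W(u)/\varepsilon + \varepsilon |u'|^2]\,dx$ should be close to $2c$ times the perimeter, and the perimeter of a single interface point in one dimension is 1.

```python
import math

ALPHA, BETA = -1.0, 1.0  # zeros of the hypothetical double-well potential below

def W(t):
    # Illustrative double-well potential; not specified in the article.
    return (1.0 - t * t) ** 2

def well_constant(n=20000):
    """Midpoint rule for c = integral of sqrt(W(t)) from ALPHA to BETA."""
    h = (BETA - ALPHA) / n
    return sum(math.sqrt(W(ALPHA + (i + 0.5) * h)) * h for i in range(n))

def rescaled_energy(eps, n=40000):
    """Midpoint rule for the integral over (-1, 1) of W(u)/eps + eps*u'^2,
    evaluated on the transition profile u(x) = tanh(x/eps)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        u = math.tanh(x / eps)
        du = (1.0 - u * u) / eps  # derivative of tanh(x/eps)
        total += (W(u) / eps + eps * du * du) * h
    return total

if __name__ == "__main__":
    c = well_constant()
    for eps in (0.1, 0.05, 0.02):
        print(eps, rescaled_energy(eps), 2.0 * c)
```

For this particular $W$ the energy of the profile is essentially independent of $\varepsilon$ once the transition layer fits inside the interval, which is why the rescaling by $1/\varepsilon$ in [23] is the natural one.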
As the functionals [20] and [23] have the same minimizers, we deduce that there exists a sequence $\varepsilon_k \to 0$ such that the minimizers $u_{\varepsilon_k}$ of $\mathcal{F}_{\varepsilon_k}$ converge in $L^1(\Omega)$ to a function $u$ which takes only the values $\alpha$ and $\beta$, satisfies [21], and fulfills the minimal interface criterion

$$P(E_\alpha, \Omega) \leq P(E, \Omega)$$

for every measurable set $E$ with $|E| = |E_\alpha|$. Moreover, [22] can be improved, and we obtain

$$\min_{u \in W^{1,2}(\Omega)} \mathcal{F}_\varepsilon(u) = \varepsilon\, 2c\, P(E_\alpha, \Omega) + o(\varepsilon)$$

Similar results have been proved when the term $|\nabla u|^2$ in [20] is replaced by a general quadratic form like [8], which leads to an anisotropic notion of perimeter.
Free-Discontinuity Problems

Free-discontinuity problems are minimum problems for functionals composed of two terms of different nature: a bulk energy, typically given by a volume integral depending on the gradient of an unknown function $u$; and a surface energy, given by an integral on the unknown discontinuity surface of $u$. These problems arise in many different fields of science and technology, such as liquid crystals, fracture mechanics, and computer vision. The prototype of free-discontinuity problems is the minimum problem proposed by David Mumford and Jayant Shah:

$$\min_{(u,K) \in \mathcal{A}} \left\{ \int_{\Omega \setminus K} |\nabla u|^2\,dx + \mathcal{H}^{n-1}(K \cap \Omega) + \int_{\Omega \setminus K} |u - g|^2\,dx \right\} \qquad [24]$$

where $\Omega$ is a bounded open subset of $\mathbb{R}^n$, $\mathcal{H}^{n-1}$ denotes the $(n-1)$-dimensional Hausdorff measure, $g \in L^\infty(\Omega)$, and $\mathcal{A}$ is the set of all pairs $(u, K)$ with $K$ compact, $K \subset \mathbb{R}^n$, and $u \in C^1(\Omega \setminus K)$. In the applications to image segmentation problems the dimension $n$ is 2 and the function $g$ represents the grey level of an image. Given a solution $(u, K)$ of the minimum problem [24], the set $K$ is interpreted as the set of the relevant boundaries of the objects in the image, while $u$ provides a smoothed version of the image. The first term in [24] has a regularizing effect, the purpose of the second term is to avoid oversegmentation, while the last term, called the "fidelity term," forces $u$ to be close to $g$. Of course, in the applications these terms are multiplied by different coefficients, whose relative values are very important for image segmentation problems, since they determine the strength of the effect of each term. However, the mathematical analysis of the problem can be easily reduced to the case where all coefficients are equal to 1.

To solve [24], it is convenient to introduce a weak formulation of the problem based on the space $GSBV(\Omega)$ of generalized special functions with bounded variation (see Ambrosio et al. (2000)). Without entering into details, here it is enough to say that every $u \in GSBV(\Omega)$ has, at almost every point, an approximate gradient $\nabla u$ in the sense of geometric measure theory. This is a measurable map from $\Omega$ into $\mathbb{R}^n$ which coincides with the usual gradient in the sense of distributions on every open subset $U$ of $\Omega$ such that $u \in W^{1,1}(U)$. The functional $\mathcal{F} : L^1(\Omega) \to [0, +\infty]$ used for the weak formulation of [24] is defined by

$$\mathcal{F}(u) := \begin{cases} \int_\Omega |\nabla u|^2\,dx + \mathcal{H}^{n-1}(J_u) & \text{if } u \in GSBV(\Omega) \\ +\infty & \text{otherwise} \end{cases} \qquad [25]$$

where $J_u$ is the jump set of $u$, defined in a measure-theoretical way as the set of points $x \in \Omega$ such that

$$\limsup_{\rho \to 0} \frac{1}{|B(x, \rho)|} \int_{B(x,\rho)} |u(y) - a|\,dy > 0$$

for every $a \in \mathbb{R}$. For every $g \in L^\infty(\Omega)$, the functional

$$\mathcal{F}(u) + \int_\Omega |u - g|^2\,dx$$
is lower semicontinuous and coercive on $L^1(\Omega)$; therefore, the minimum problem

$$\min_{u \in L^1(\Omega)} \left\{ \mathcal{F}(u) + \int_\Omega |u - g|^2\,dx \right\} \qquad [26]$$

has a solution. The connection with the Mumford–Shah problem is given by the following regularity result, proved by Ennio De Giorgi and his collaborators: if $u$ is a solution of [26] and $\overline{J}_u$ is the closure of $J_u$, then $\mathcal{H}^{n-1}(\Omega \cap (\overline{J}_u \setminus J_u)) = 0$, $u \in C^1(\Omega \setminus \overline{J}_u)$, and $(u, \overline{J}_u)$ is a solution of [24].

Since the numerical treatment of [24] and [26] is quite difficult, $\Gamma$-convergence has been used to approximate [26] by means of minimum problems for integral functionals, whose minimizers can be obtained by standard numerical techniques. Let us consider the nonlocal functionals $\mathcal{F}_\varepsilon : L^1(\Omega) \to [0, +\infty]$ defined by […] let $u_\varepsilon$ be a solution of the minimum problem

$$\min_{u \in W^{1,2}(\Omega)} \left\{ \frac{1}{\varepsilon} \int_\Omega f\!\left( \varepsilon\, \mathrm{Av}(|\nabla u|^2; x, \varepsilon) \right) dx + \int_\Omega |u - g|^2\,dx \right\}$$

From the basic properties of $\Gamma$-convergence it follows that there exists a sequence $\varepsilon_k \to 0$ such that $u_{\varepsilon_k}$ converges in $L^1(\Omega)$ to a solution $u$ of [26], so that $(u, \overline{J}_u)$ is a solution of [24]. Other approximations by nonlocal functionals use finite differences instead of averages of gradients. A different approximation can be obtained by using the local functionals $G_\varepsilon : (L^1(\Omega))^2 \to [0, +\infty]$ defined by
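$$G_\varepsilon(u, v) := \begin{cases} \int_\Omega \left[ g_\varepsilon(v) |\nabla u|^2 + \tfrac{\varepsilon}{2} |\nabla v|^2 + \tfrac{1}{2\varepsilon} h(v) \right] dx & \text{if } (u, v) \in (W^{1,2}(\Omega))^2 \\ +\infty & \text{otherwise} \end{cases}$$

where $g_\varepsilon(t) := \eta_\varepsilon + t^2$, $0 < \eta_\varepsilon$ […] let $(u_\varepsilon, v_\varepsilon)$ be a solution of the minimum problem

$$\min_{(u,v) \in (W^{1,2}(\Omega))^2} \int_\Omega \left[ g_\varepsilon(v) |\nabla u|^2 + \tfrac{\varepsilon}{2} |\nabla v|^2 + \tfrac{1}{2\varepsilon} h(v) + |u - g|^2 \right] dx \qquad [27]$$

From the basic properties of $\Gamma$-convergence it follows that there exists a sequence $\varepsilon_k \to 0$ such that $u_{\varepsilon_k}$ converges in $L^1(\Omega)$ to a solution $u$ of [26], so that $(u, \overline{J}_u)$ is a solution of [24]. The approximation of the solutions of [24] based on [27] has been used to construct numerical algorithms for image segmentation.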
Free-discontinuity problems similar to [24] appear in the mathematical treatment of Griffith's model in fracture mechanics. In this case, $u$ is a vector-valued function, which represents the deformation of an elastic body, the first term in [24] is replaced by a more general integral functional which represents the energy stored in the elastic region $\Omega \setminus K$, while the second term is interpreted as the energy dissipated to produce the crack $K$. An approximation based on minimum problems similar to [27] has been used to construct numerical algorithms to study the process of crack growth in brittle materials.

An important research line, connected with these problems, has been developed in recent years to derive the macroscopic theories of fracture mechanics from the microscopic theories of interatomic interactions. Using $\Gamma$-convergence, some theories expressed in the language of continuum mechanics can be obtained as limits of discrete variational models on lattices, as the distance between neighboring points tends to zero.

See also: Convex Analysis and Duality Methods; Elliptic Differential Equations: Linear Theory; Free Interfaces and Free Discontinuities: Variational Problems; Geometric Measure Theory; Image Processing: Mathematics; Variational Techniques for Ginzburg–Landau Energies; Variational Techniques for Microstructures.
Further Reading

Allaire G (2002) Shape Optimization by the Homogenization Method. Berlin: Springer.
Ambrosio L, Fusco N, and Pallara D (2000) Functions of Bounded Variation and Free Discontinuity Problems. Oxford: Oxford University Press.
Bakhvalov N and Panasenko G (1989) Homogenisation: Averaging Processes in Periodic Media. Mathematical Problems in the Mechanics of Composite Materials (translated from the Russian by D. Leĭtes). Dordrecht: Kluwer.
Bensoussan A, Lions JL, and Papanicolaou GC (1978) Asymptotic Analysis for Periodic Structures. Amsterdam: North-Holland.
Braides A (1998) Approximation of Free-Discontinuity Problems, Lecture Notes in Mathematics, vol. 1694. Berlin: Springer.
Braides A (2002) Γ-Convergence for Beginners. Oxford: Oxford University Press.
Braides A and Defranceschi A (1998) Homogenization of Multiple Integrals. Oxford: Oxford University Press.
Cherkaev A and Kohn RV (eds.) (1997) Topics in the Mathematical Modelling of Composite Materials. Boston: Birkhäuser.
Christensen RM (1979) Mechanics of Composite Materials. New York: Wiley.
Cioranescu D and Donato P (1999) An Introduction to Homogenization. New York: The Clarendon Press, Oxford University Press.
Dal Maso G (1993) An Introduction to Γ-Convergence. Boston: Birkhäuser.
Friesecke G, James RD, and Müller S (2002) A theorem on geometric rigidity and the derivation of nonlinear plate theory from three-dimensional elasticity. Communications on Pure and Applied Mathematics 55: 1461–1506.
Jikov VV, Kozlov SM, and Oleĭnik OA (1994) Homogenization of Differential Operators and Integral Functionals (translated from the Russian by G. A. Yosifian). Berlin: Springer.
Milton GW (2002) The Theory of Composites. Cambridge: Cambridge University Press.
Oleĭnik OA, Shamaev AS, and Yosifian GA (1992) Mathematical Problems in Elasticity and Homogenization. Amsterdam: North-Holland.
Panasenko G (2005) Multi-scale Modelling for Structures and Composites. Dordrecht: Springer.
Sanchez-Palencia E (1980) Nonhomogeneous Media and Vibration Theory, Lecture Notes in Physics, vol. 127. Berlin: Springer.
Gauge Theoretic Invariants of 4-Manifolds
S Bauer, Universität Bielefeld, Bielefeld, Germany
© 2006 Elsevier Ltd. All rights reserved.
Introduction

Poincaré duality is fundamental in the study of manifolds. In the case of an orientable closed manifold $X$, this duality appears as an isomorphism

$$\tau : H^k(X; \mathbb{Z}) \to H_{n-k}(X; \mathbb{Z})$$

between integral cohomology and homology. The map $\tau$ is defined by cap product with a chosen orientation class. This article focuses on dimension $n = 4$, where Poincaré duality induces a bilinear form $Q$ on $H_2(X; \mathbb{Z})$ by use of the Kronecker pairing

$$Q(\alpha, \alpha') = \langle \tau^{-1}(\alpha), \alpha' \rangle \in \mathbb{Z}$$
One of the outstanding achievements of modern topology, the classification of simply connected topological 4-manifolds by Freedman (1982), can be phrased in terms of the intersection pairing $Q$. Indeed, two simply connected differentiable 4-manifolds $X$ and $X'$ are homeomorphic by an orientation-preserving homeomorphism if and only if the associated pairings $Q$ and $Q'$ are equivalent. Freedman's classification scheme has been extended to also cover a wide range of fundamental groups, resulting in a fair understanding of topological 4-manifolds (Freedman and Quinn 1990).

When it comes to differentiable 4-manifolds, the situation changes drastically. On the one hand, there is an abundance of topological 4-manifolds which do not admit a differentiable structure at all. On the other hand, there also are topological 4-manifolds supporting infinitely many distinct differentiable structures. A classification of differentiable 4-manifolds up to differentiable equivalence seems out of reach of current technology, even in the simplest cases.

The discrepancy between topological and differentiable 4-manifolds was uncovered by gauge-theoretic methods, applying the concepts of instantons and of monopoles. In order to study these, one has to equip a 4-manifold both with a Riemannian metric and some additional structure: a Hermitian rank-2 bundle in the case of instantons and a spin$^c$-structure in the case of monopoles. Given such data, instantons and monopoles arise as solutions to partial differential equations whose gauge equivalence classes form finite-dimensional moduli spaces. As it turns out, these moduli spaces encode significant information about the differentiable structures of the underlying 4-manifolds. A decoding of such information contained in the instanton moduli and in the monopole moduli is achieved through Donaldson invariants and Seiberg–Witten invariants, respectively. This article outlines these theories from a mathematical point of view.
Instantons and Donaldson Invariants

Let $X$ denote a closed, connected, oriented differentiable Riemannian 4-manifold. We will consider a principal bundle $P$ over $X$ with fiber a compact Lie group $G$ with Lie algebra $\mathfrak{g}$. Connections on $P$ form an infinite-dimensional affine space $\mathcal{A}(P) = A_0 + \Omega^1(X; \mathfrak{g}_P)$ modeled on the vector space of 1-forms with values in the adjoint bundle

$$\mathfrak{g}_P = P \times_{\mathrm{Ad}(G)} \mathfrak{g}$$

The curvature $F_A \in \Omega^2(X; \mathfrak{g}_P)$ of a connection $A$ is a $\mathfrak{g}_P$-valued 2-form satisfying the Bianchi identity $D_A F_A = 0$. The group $\mathcal{G}$ of principal bundle automorphisms of $P$ acts in a natural way on the space of connections with quotient space

$$\mathcal{B}(P) = \mathcal{A}(P)/\mathcal{G}$$

The Yang–Mills functional

$$YM : \mathcal{A}(P) \to \mathbb{R}_{\geq 0}$$

associates to a connection $A$ the norm square

$$\|F_A\|^2 = \int_X \operatorname{tr}(F_A \wedge * F_A)$$

of its curvature. Here $*$ denotes the Hodge star operator defined by the metric on $X$ and the orientation. The metric $\operatorname{tr} : \mathfrak{g} \otimes \mathfrak{g} \to \mathbb{R}$ is $\mathrm{Ad}(G)$-invariant and hence $YM$ is invariant under the action of $\mathcal{G}$. In particular, the Yang–Mills functional descends to a function on the space $\mathcal{B}(P)$ of gauge equivalence classes of connections. The Euler–Lagrange equations for the critical points of $YM$, called Yang–Mills equations, are of the form

$$D_A^*(F_A) = 0$$

and can be derived easily from the formula

$$F_{A+a} = F_A + D_A(a) + [a \wedge a]$$

Satisfying the equations

$$D_A^*(F_A) = 0 \quad \text{and} \quad D_A(F_A) = 0$$

a Yang–Mills connection is characterized by the fact that its curvature is harmonic with respect to its own Laplacian.

The bundle $\Lambda^2 T^*X$ of 2-forms on $X$ decomposes into $(\pm 1)$-eigenbundles of the Hodge operator. This orthogonal splitting leads to a decomposition of curvature forms

$$F_A = F_A^+ + F_A^-$$

into self-dual and anti-self-dual components. The differential form $(1/4\pi^2) \operatorname{tr}(F_A \wedge F_A)$ represents a characteristic class of the principal bundle $P$. In particular, the integral

$$\kappa(P) = \int_X \operatorname{tr}(F_A \wedge F_A) = \|F_A^+\|^2 - \|F_A^-\|^2$$

is independent of the connection $A$. The Yang–Mills functional therefore is bounded

$$YM(A) \geq |\kappa(P)|$$

and attains this minimum at connections $A$ which satisfy the equation

$$F_A = \pm * F_A$$

Such connections are either self-dual, anti-self-dual or both, that is, flat, depending on whether $\kappa(P)$ is negative, positive, or zero. The moduli space of instantons on $P$ is the subset of minima of the Yang–Mills functional

$$M(P) = YM^{-1}(|\kappa(P)|) \subset \mathcal{B}(P)$$

The moduli space thus consists of gauge equivalence classes of connections which are either self-dual or anti-self-dual.

Donaldson theory indeed considers anti-self-dual connections on principal bundles with structure group $PU(2) = SO(3)$. The Hodge operator induces a decomposition of the second cohomology

$$H^2(X) = H^2_+(X) \oplus H^2_-(X)$$

into $(\pm 1)$-eigenspaces of dimension $b^+$ and $b^-$. Unless specified differently, cohomology groups are meant with real coefficients. In order to simplify the exposition, we will assume $X$ to be simply connected. The Donaldson invariants then are defined if $b^+$ is odd and greater than 1.

A "homology orientation" consists of an orientation of $H^2_+(X)$ and an integral homology class $c \in H_2(X; \mathbb{Z})$. The Donaldson invariant $D_{X,c} = D_c$ is defined after fixing such a homology orientation. It is a linear function

$$D_c : \mathbb{A}(X) \to \mathbb{R}$$

where $\mathbb{A}(X)$ is the graded algebra

$$\mathbb{A}(X) = \operatorname{Sym}^*(H_0(X) \oplus H_2(X))$$

in which $H_i(X)$ has degree $(1/2)(4 - i)$. The significance of $D_c$ is its functoriality

$$D_{X', f_*(c)}(f_*(\alpha)) = D_{X,c}(\alpha)$$

under diffeomorphisms $f : X \to X'$ which preserve both orientation and homology orientation. Switching the orientation of $H^2_+(X; \mathbb{R})$ reverses the sign of $D_c$. Similarly,

$$D_{c'} = (-1)^{((c - c')/2)^2} D_c$$

if $c - c' \in 2H_2(X; \mathbb{Z}) \subset H_2(X; \mathbb{Z})$. The construction of this invariant makes use of the following facts:

1. An $SO(3)$ principal bundle $P$ over $X$ is determined by its first Pontrjagin number $p_1(P)$ and its Stiefel–Whitney class $w_2(P) \in H^2(X; \mathbb{Z}/2)$. As $X$ is simply connected, this Stiefel–Whitney class admits integer lifts. Let $c$ be such a lift and let $c^2$ be shorthand for the intersection pairing $Q(c, c)$. A pair $(p_1, w_2)$ is realized by a principal bundle provided it satisfies the relation $p_1 \equiv c^2$ modulo 4.

2. If $b^+$ is nonvanishing, then for generic metrics on $X$, the moduli space $M(P)$ is a manifold of dimension

$$-2p_1(P) - 3(1 + b^+)$$

This follows from a transversality theorem whose main ingredient is the Sard–Smale theorem. The dimension is computed by use of the Atiyah–Singer index theorem: to an anti-self-dual connection $A$ on $P$ there is an associated elliptic complex

$$0 \to \Omega^0(X; \mathfrak{g}_P) \xrightarrow{\ D_A\ } \Omega^1(X; \mathfrak{g}_P) \xrightarrow{\ D_A^+\ } \Omega^2_+(X; \mathfrak{g}_P) \to 0$$

where $\Omega^i(X; \mathfrak{g}_P)$ denotes $\mathfrak{g}_P$-valued $i$-forms on $X$. This complex describes the tangential structure of the moduli space at the equivalence class of $A$. The space $\Omega^1(X; \mathfrak{g}_P)$ is the tangent space of $\mathcal{A}(P)$ at $A$, $\Omega^0(X; \mathfrak{g}_P)$ is the tangent space of the group $\mathcal{G}$ at the identity, and $D_A$ is the differential of the orbit map. The differential operator $D_A^+$ is the linearization of the anti-self-duality map

$$a \mapsto F^+_{A+a} = D_A^+(a) + [a \wedge a]^+$$
3. The moduli space $M(P)$ can be oriented if it is a manifold. The orientation depends on an orientation of $H^2_+(X)$ and on a $U(2)$-principal bundle which has $P$ as its $PU(2)$-quotient bundle. It is determined by an integer lift of $w_2(P)$. The elliptic complex above then can be compared with a corresponding elliptic complex where the differentials are given by a complex Dirac operator. This leads to an almost-complex structure on the tangent space for each point in the moduli space and in particular to an orientation on the moduli space itself.

4. Over the product $M(P) \times X$ there is a universal $PU(2)$-bundle $\mathbb{P}$ with first Pontrjagin class $p_1(\mathbb{P})$. Taking slant product with the class $-(1/4)p_1(\mathbb{P})$ results in a homomorphism

$$\mu : H_i(X) \to H^{4-i}(M(P))$$

5. The moduli space $M(P)$ in general is noncompact. There is an Uhlenbeck compactification $\overline{M}(P)$ describing "ideal instantons." Such an ideal instanton consists of an element $(x_1, \ldots, x_n) \in \operatorname{Sym}^n(X)$ and an anti-self-dual connection $A'$ on the principal bundle $P'$ on $X$ with $w_2(P') = w_2(P)$ for which the equality

$$p_1(P') - p_1(P) = 4n$$

of Pontrjagin numbers holds. Uhlenbeck's compactness theorem describes what happens if a sequence of anti-self-dual connections has no convergent subsequence: after passing to a subsequence, the sequence converges to an anti-self-dual connection on the restriction of $P$ to $X \setminus \{x_1, \ldots, x_n\}$. This limit connection extends to a connection $A'$ on the principal bundle $P'$. The functions $|F_{A_n}|^2$ on $X$ converge to the measure

$$|F_{A'}|^2 + 8\pi^2 \sum_{i=1}^n \delta_{x_i}$$

The compactification $\overline{M}(P)$ is a stratified space and not usually a manifold. If $w_2(P) \neq 0$, then the singular set is of codimension at least 2 and thus the space $\overline{M}(P)$ carries a fundamental class. In the case $w_2(P) = 0$, such a fundamental class in general can only be defined if $-p_1(P) > 4 + 3b^+$. In practice, this problem can be circumvented by blowing up $X$ and considering bundles with $w_2(P) \neq 0$ over the connected sum $X \# \overline{\mathbb{CP}}^2$. Note that the complex projective plane $\mathbb{CP}^2$ as a complex manifold carries a natural orientation. The notation $\overline{\mathbb{CP}}^2$ indicates a reversed orientation.

6. The classes $\mu(\alpha) \in H^2(M(P))$ for $\alpha \in H_2(X)$ extend over the compactification. The same holds for the class $\mu(x)$, where $x \in H_0(X; \mathbb{Z})$ is the generator corresponding to the orientation, as long as $w_2(P) \neq 0$. Otherwise, there are certain dimension restrictions. However, the same blow-up trick as mentioned above allows one to handle the case $w_2(P) = 0$ as well.

Now fix an element $c \in H_2(X; \mathbb{Z})$ and let

$$M_c = \bigsqcup_{d \geq 0} \overline{M}(P_{c,d})$$

denote the disjoint union of all moduli spaces of anti-self-dual connections on principal $PU(2)$-bundles $P_{c,d}$ whose second Stiefel–Whitney class is Poincaré-dual to $c$ modulo 2 and whose Pontrjagin number equals $-d - (3/2)(b^+ + 1)$. Our assumption of $b^+$ being odd corresponds to the fact that the dimension $2d$ of the moduli space $M(P_{c,d})$ is even, with $d$ congruent to $-c^2 + (1/2)(1 + b^+)$ modulo 4. Neglecting the difficulties in the case $w_2(P) = 0$ mentioned above, we may use the cup product on $H^*(M_c)$ to extend $\mu$ to an algebra homomorphism

$$\mu : \mathbb{A}(X) \to H^*(M_c)$$

The Donaldson invariant $D_c$ is nonzero only on elements $z$ of $\mathbb{A}(X)$ whose total degree $d$ is congruent to $-c^2 + (1/2)(1 + b^+)$ modulo 4. For such an element it is defined by

$$D_c(z) = \langle \mu(z), [\overline{M}(P_{c,d})] \rangle = \int_{\overline{M}(P_{c,d})} \mu(z)$$

The Donaldson series $\mathcal{D}_c$ is defined as a formal power series

$$\mathcal{D}_c(\alpha) = D_c(\exp(\hat{\alpha})) = \sum_{d=0}^\infty \frac{D_c(\hat{\alpha}^d)}{d!}$$

for $\alpha \in H_2(X)$ and $\hat{\alpha} = (1 + (x/2))\alpha$.
Computations and Structure Theorems

The first results about these invariants are due to S Donaldson. He proved both a vanishing and a nonvanishing theorem (Donaldson and Kronheimer 1990):

Theorem 1 If both $b^+(X) > 0$ and $b^+(Y) > 0$, then all Donaldson invariants vanish for the connected sum $X \# Y$.

Theorem 2 If $c$ represents a divisor on a complex algebraic surface $X$ and $\alpha$ represents an ample divisor, then

$$D_c(\alpha^r) \neq 0 \quad \text{for } r \gg 0$$
The second theorem is a consequence of the fact that in the case of an algebraic surface the instanton moduli can be described in algebraic geometric terms: the moduli space $M(P_{c,d})$ associated to the metric induced from the Fubini–Study metric on $\mathbb{CP}^n$ by an embedding $X \hookrightarrow \mathbb{CP}^n$ carries the structure of a projective variety. This variety is reduced and of complex dimension $d$, as soon as $d$ is large enough. Furthermore, $\mu(\alpha)$ is the first Chern class of an ample line bundle.

The translation of instanton moduli into algebraic geometry uses two steps: suppose the first Chern class of a $U(r)$-principal bundle $P$ on a Kähler surface is also the first Chern class of a holomorphic line bundle. Then the absolute minima of the Yang–Mills functional are achieved by Hermite–Einstein connections. These are connections for which the Ricci curvature is a constant multiple of the identity. The second step, the translation from differential geometry into algebraic geometry, is called the Kobayashi–Hitchin correspondence, which again was proved by Donaldson.

The Donaldson invariants have been computed for a number of 4-manifolds. A simply connected 4-manifold is said to have simple type if the relation

$$D_c(x^2 z) = 4 D_c(z)$$

is satisfied by its Donaldson invariant for all $z \in \mathbb{A}(X)$ and $c \in H_2(X; \mathbb{Z})$. It is known that this simple type condition holds for many 4-manifolds. Indeed, it is an open question whether there are 4-manifolds which are not of simple type. For manifolds of simple type the Donaldson series $\mathcal{D}_c$ completely determines the Donaldson invariant $D_c$. A main result is due to Kronheimer and Mrowka (1995):

Theorem 3 Let $X$ be a simply connected 4-manifold of simple type. Then, there exist finitely many basic classes $\kappa_1, \ldots, \kappa_n \in H_2(X; \mathbb{Z})$ such that

$$\mathcal{D}_c = \exp(Q/2) \sum_{i=1}^n (-1)^{(c^2 - \kappa_i \cdot c)/2}\, a_i \exp(\kappa_i)$$

as analytic functions on $H_2(X)$. The numbers $a_i$ are rational and each basic class $\kappa_i$ is characteristic, that is, it satisfies $\alpha^2 \equiv Q(\alpha, \kappa_i)$ modulo 2 for all $\alpha \in H_2(X; \mathbb{Z})$.

The homology class $\kappa_i$ in this formula acts on an arbitrary homology class by intersection. The geometric significance of the basic classes is underlined by the following theorem (Kronheimer and Mrowka 1995):
Theorem 4 If $\alpha \in H_2(X; \mathbb{Z})$ is represented by an embedded surface of genus $g$ with self-intersection $\alpha^2 \geq 2$, then for each basic class $\kappa$ the following adjunction inequality is satisfied:

$$2g - 2 \geq \alpha^2 + |Q(\alpha, \kappa)|$$

There are many 4-manifolds for which the Donaldson series have been computed (Friedman and Morgan 1997). The basic classes for complete intersections, for example, are the canonical divisor and its negative. Another example is given by elliptic surfaces. Let $E(n; p, q)$ be a minimal elliptic surface, that is, a holomorphic surface admitting a holomorphic map to $\mathbb{CP}^1$ with generic fiber $f$ an elliptic curve. For any numbers $n$, $p$, and $q$ with $p < q$ coprime, there exists such a simply connected elliptic surface with Euler characteristic $12n$ and two multiple fibers of multiplicity $p$ and $q$, respectively. The Donaldson series of $E(n; p, q)$ for $c = 0$ then is given by

$$\mathcal{D} = \exp\!\left( \frac{Q}{2} \right) \frac{\sinh^n(f)}{\sinh(f/p)\, \sinh(f/q)}$$

Another important formula relates the Donaldson series $\mathcal{D}$ of a manifold $X$ of simple type and the Donaldson series $\hat{\mathcal{D}}$ of the blow-up $X \# \overline{\mathbb{CP}}^2$:

$$\hat{\mathcal{D}}_c = \mathcal{D}_c \exp(-e^2/2) \cosh(e)$$

$$\hat{\mathcal{D}}_{c+e} = -\mathcal{D}_c \exp(-e^2/2) \sinh(e)$$

Here $e \in H_2(\overline{\mathbb{CP}}^2; \mathbb{Z})$ denotes a generator. Indeed, a more general blow-up formula is known which relates the Donaldson invariants for $X$ and its blow-up even in case $X$ is not of simple type. This formula, due to Fintushel and Stern (1996), involves Weierstraß sigma-functions.

The instanton moduli space carries nontrivial information about 4-manifolds even in the case $b^+(X) \leq 1$. However, one has to deal with singularities in the moduli space. Let us first consider the case $b^+(X) = 0$. If the intersection form on $X$ is negative definite, the instanton moduli spaces in general are bound to have singularities. Indeed, Donaldson examined the case with the Pontrjagin number $p_1(P) = -4$ and $w_2(P) = 0$. In this case, the moduli space for a generic metric on $X$ will be an orientable smooth manifold except at isolated singular points.
The singularities are cones over $\mathbb{CP}^2$ and they correspond to reducible connections, that is, reductions of the structure group of $P$ to $U(1)$. These reductions are in bijective correspondence to pairs $\pm\alpha \in H_2(X; \mathbb{Z})$ with $\alpha^2 = -1$. The Uhlenbeck compactification of the moduli space thus leads to an oriented cobordism between $X$ and the disjoint union $\bigsqcup \mathbb{CP}^2$ over all pairs $\pm\alpha$ in $H_2(X; \mathbb{Z})$ of square $-1$. As the signature of a manifold is an invariant of oriented cobordism, there have to be $b^-$ many pairs of square $(-1)$ in $H_2(X; \mathbb{Z})$ and, in particular, the intersection form $Q$ is represented by the negative of the identity matrix (Donaldson 1983):

Theorem 5 The intersection form on a differentiable manifold with negative-definite intersection form is diagonal.

Indeed, from rank 8 on there are lots of definite unimodular forms which are not diagonal. By Freedman's (1982) classification, any unimodular form is realized as the intersection form of a simply connected topological manifold. This theorem shows that most of these manifolds do not support differentiable structures.

The case $b^+(X) = 1$ is also interesting. Here, the moduli space is a smooth manifold for a generic metric, giving rise to Donaldson invariants. However, over a smooth path of metrics, there is in general no smooth cobordism of moduli spaces. So the invariants depend on the chosen metric. The singularities in the cobordisms again correspond to classes in $H_2(X; \mathbb{Z})$ with negative square. An analysis of these singularities leads to wall-crossing formulas describing how different choices of the metric affect Donaldson invariants. The case of $\mathbb{CP}^2$ is special, as there are no elements of negative square in $H_2(\mathbb{CP}^2; \mathbb{Z})$. The Donaldson invariants for $\mathbb{CP}^2$ as well as the wall-crossing formulas turn out to be closely related to modular forms (Göttsche 2000).
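The remark after Theorem 5 that non-diagonal definite unimodular forms exist from rank 8 on can be made concrete with the $E_8$ lattice. The sketch below is an illustration added here, not part of the article: it builds the Cartan matrix of $E_8$ (with one particular, assumed labeling of the Dynkin diagram) and checks with exact rational arithmetic that it is unimodular, positive definite (by Sylvester's criterion on leading principal minors), and even. Its negative is then a negative-definite unimodular form with no vector of square $-1$, so it cannot be equivalent over the integers to the negative of the identity matrix.

```python
from fractions import Fraction

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    d = Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if a[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return d

def e8_cartan():
    """Cartan matrix of E8: a chain of seven nodes 0..6 with node 7
    attached to node 2 (labeling chosen here for illustration)."""
    n = 8
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 2
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7)]:
        m[i][j] = m[j][i] = -1
    return m

def leading_minors(m):
    """Determinants of the upper-left k x k blocks, k = 1..n."""
    return [det([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]

if __name__ == "__main__":
    m = e8_cartan()
    print("det =", det(m))
    print("leading minors =", [int(x) for x in leading_minors(m)])
    print("even =", all(m[i][i] % 2 == 0 for i in range(8)))
```

Evenness follows from the diagonal entries alone, because the off-diagonal terms of $Q(x, x)$ always appear twice.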
Monopoles and Seiberg–Witten Invariants
A spin$^c$-structure on an oriented Riemannian 4-manifold is a $\mathrm{Spin}^c(4)$-principal bundle $\tilde{P}$ projecting to the orthonormal tangent frame bundle $P$ over $X$ through the group homomorphism $\mathrm{Spin}^c(4) \to SO(4)$ with kernel $U(1)$. The group $H^2(X;\mathbb{Z})$ acts freely and transitively on the set of all spin$^c$-structures. A spin$^c$-connection is a lift to $\tilde{P}$ of the Levi-Civita connection on $P$. Fixing a background spin$^c$-connection $A_0$, the monopole map
\[ \mu : (A,\phi) \mapsto \left(D_A\phi,\; F_A^+ - \sigma(\phi),\; d^*a\right) \]
is defined (Witten 1994) for spin$^c$-connections $A \in A_0 + \Omega^1(X; i\mathbb{R})$ and positive spinors $\phi$. Here, $D_A$ denotes the complex Dirac operator associated to $A$, and $d^*a$ for $a \in \Omega^1(X; i\mathbb{R})$ is the adjoint of the de Rham differential on forms. The section $\sigma(\phi)$ of the traceless endomorphism bundle of positive spinors is viewed as a self-dual 2-form on $X$.
In case the first Betti number vanishes, this map, after suitable Sobolev completion, becomes a map between Hilbert spaces $\mu : \mathcal{A} \to \mathcal{C}$ which is a compact deformation of a linear Fredholm map. The Weitzenböck formula can be used to show that preimages under $\mu$ of bounded sets in $\mathcal{C}$ are bounded in $\mathcal{A}$. Furthermore, $\mu$ is $U(1)$-equivariant, where $U(1)$ acts by complex multiplication on spinors and trivially on forms. If $b_1(X) > 0$, the monopole map is a map between Hilbert space bundles over the torus $H^1(X;\mathbb{R})/H^1(X;\mathbb{Z})$. These properties of the monopole map allow for an interpretation in terms of stable homotopy (Bauer 2004):
Theorem 6 If the first Betti number of $X$ vanishes, then $\mu$ defines an element
\[ [\mu] \in \pi^{U(1)}_{i}(S^0) \]
in an equivariant stable homotopy group of spheres. The index $i = \operatorname{ind} D_A - H^2_+(X)$ as an element of the real representation ring $RO(U(1))$ is determined by the analytic index of the linearization of $\mu$.
In the case $b^+(X) > 1$, these equivariant stable homotopy groups can be identified with nonequivariant stable cohomotopy groups $\pi^{b^+-1}_{st}(\mathbb{CP}^{d-1})$. Here, $d$ denotes the index of the complex Dirac operator $\operatorname{ind} D_A$. Fixing an orientation of $H^2_+(X)$ results in a Hurewicz homomorphism
\[ h : \pi^{b^+-1}_{st}(\mathbb{CP}^{d-1}) \to H^{b^+-1}(\mathbb{CP}^{d-1};\mathbb{Z}) \]
If $b^+(X)$ is odd, the image
\[ h([\mu]) = SW(X)\, t^{(b^+-1)/2} \]
is an integer multiple of a power of the generator $t \in H^2(\mathbb{CP}^{d-1};\mathbb{Z})$. This integer $SW(X)$ is known as the Seiberg–Witten invariant (Witten 1994). This invariant alternatively can be defined by considering the moduli space $M(a) = \mu^{-1}(a)$. Assuming $b^+ > 0$, this is a smooth oriented manifold with a free $U(1)$-action for generic $a \in \Omega^1(X; i\mathbb{R})$. The Seiberg–Witten invariant is the characteristic number obtainable from these data. In general, the stable homotopy invariant $[\mu]$ encodes global information about the monopole map, which cannot be recovered by only considering the moduli space. In case the spin$^c$-structure is associated to an almost-complex structure, however, there is a fortunate coincidence: the Hurewicz homomorphism in this case is an isomorphism. So for almost-complex spin$^c$-structures, the invariants $[\mu]$ and $SW$ carry the same information. The Seiberg–Witten invariants turn out to be directly computable for Kähler manifolds and to some degree also for symplectic manifolds (Taubes 1994). Indeed,
the following theorem follows from arguments of Witten and of Taubes:
Theorem 7 Let $X$ be a 4-manifold with $b^+ > 1$ and $b_1 = 0$ which can be equipped with a Kähler or a symplectic structure. If $[\mu]$ is nonvanishing for a spin$^c$-structure on $X$, then the spin$^c$-structure is associated to an almost-complex structure. For the canonical spin$^c$-structure on $X$ the Seiberg–Witten invariant is $\pm 1$.
Seiberg–Witten invariants and Donaldson invariants are closely related: Witten gave physical arguments that an equality of the form
\[ D = 2^k \exp(Q/2) \sum_{\kappa} SW(\kappa)\exp(\kappa) \]
should hold for the Donaldson series for $c = 0$ of a simply connected manifold of simple type. Here, $\kappa \in H^2(X;\mathbb{Z})$ denotes the first Chern class of the complex determinant line bundle. This first Chern class characterizes spin$^c$-structures in the simply connected case. The number $k$ is related to the signature $\sigma$ and the Euler characteristic $\chi$ of the manifold $X$ by the formula
\[ 4k = 11\sigma + 7\chi + 2 \]
A mathematical proof of this formula is known in special cases (Feehan and Leness 2003). As is the case for Donaldson invariants, the Seiberg–Witten invariants vanish for connected sums $X\#Y$ if both $b^+(X) > 0$ and $b^+(Y) > 0$ hold. This is not the case for the stable homotopy refinement, as follows from the following theorem (Bauer 2004).
Theorem 8 For a connected sum $X\#Y$ of 4-manifolds the stable equivariant homotopy invariants are related by smash product
\[ [\mu_{X\#Y}] = [\mu_X] \wedge [\mu_Y] \]
As an example application, consider connected sums of elliptic surfaces of the form $E(2n;p,q)$. Now suppose $X$ and $X'$ are each connected sums of at most four copies of such elliptic surfaces. Then $X$ and $X'$ are diffeomorphic if and only if the summands were already diffeomorphic. This contrasts with the fact that the connected sum $E(2n;p,q)\#\mathbb{CP}^2$ is diffeomorphic to a connected sum of $4n$ copies of $\mathbb{CP}^2$ and $20n-1$ copies of $\overline{\mathbb{CP}}^2$, independently of $p$ and $q$.
As a final application, we consider the case of spin manifolds. If the manifold $X$ is spin, then the intersection form $Q$ is even, that is, $Q(\alpha,\alpha) \equiv 0 \bmod 2$ for $\alpha \in H_2(X;\mathbb{Z})$. According to Rochlin's theorem, the signature of a spin 4-manifold is divisible by 16. The monopole map for the spin structure admits additional symmetry: it is $\mathrm{Pin}(2)$-equivariant. The nonabelian group $\mathrm{Pin}(2)$ appears as the normalizer of the maximal torus in $SU(2)$. Methods from equivariant K-theory lead to Furuta's (2001) theorem:
Theorem 9 Let $X$ be a spin 4-manifold. Then
\[ b_2(X) > \tfrac{5}{4}\,|\sigma(X)| \]
Manifolds with Boundary
Both Donaldson invariants and Seiberg–Witten invariants to some extent satisfy formal properties which fit into a general conceptual framework known as "topological quantum field theories (TQFTs)." Such a TQFT in 3 + 1 dimensions is a functor on the cobordism category of oriented 3-manifolds to the category of, say, vector spaces over a ground field: it assigns to an oriented 3-manifold $Y$ a vector space $h(Y)$. To a disjoint union it assigns
\[ h(Y_1 \sqcup Y_2) = h(Y_1) \otimes h(Y_2) \]
Reversing orientation corresponds to dualizing:
\[ h(\overline{Y}) = h(Y)^* \]
Viewing a four-dimensional manifold $X$ with boundary $\partial X = \overline{Y}_1 \sqcup Y_2$ formally as a morphism from $Y_1$ to $Y_2$, this functor associates to $X$ a homomorphism
\[ H(X) : h(Y_1) \to h(Y_2) \]
that is, an element $H(X) \in h(\overline{Y}_1 \sqcup Y_2)$. The most important feature is the composition law
\[ H(X_1 \cup_Y X_2) = H(X_2) \circ H(X_1) \]
So if a cobordism $X$ from $Y_1$ to $Y_2$ can be decomposed as a cobordism $X_1$ from $Y_1$ to an intermediate submanifold $Y$ and a cobordism $X_2$ from $Y$ to $Y_2$, then the homomorphism $H(X)$ can be computed from $H(X_1)$ and $H(X_2)$ as their composition. Donaldson invariants and Seiberg–Witten invariants fit neatly into the framework of a TQFT if one restricts to 3-manifolds which are disjoint unions of homology 3-spheres. In both the instanton and the monopole case, the vector spaces $h(Y)$ are Floer homology groups. The construction of Floer homology carries the Morse theory description of the homology of a finite-dimensional manifold over to an infinite-dimensional setting. In the instanton case, one considers the Chern–Simons function
\[ CS(a) = \frac{1}{8\pi^2}\int_Y \operatorname{tr}\left(a \wedge da + \frac{2}{3}\, a \wedge a \wedge a\right) \]
This function is defined on the space of gauge equivalence classes of SU(2)-connections on $Y$. Note that for a homology 3-sphere, any SU(2) or PU(2)
principal bundle over $Y$ is trivial. Choosing a trivialization, a connection becomes identified with a Lie-algebra-valued 1-form $a$. Critical points of the Chern–Simons functional lead to generators in a chain complex, the homology of which then gives the Floer groups. Such critical points correspond to flat connections on $Y$. The Floer homology groups $HF_*(Y)$ are $\mathbb{Z}/8$-graded in the SU(2) case and $\mathbb{Z}/4$-graded in the SO(3) case. If $X$ is a 4-manifold with $b_1(X) = 0$ and $b^+(X) > 1$ and such that the boundary $\partial X$ is a disjoint union of homology 3-spheres, then the Donaldson invariants are linear maps
\[ D_c : A(X) \to HF_*(\partial X) \]
These invariants satisfy a composition law on the subring of $A(X)$ generated by two-dimensional homology classes (Donaldson 2002). In the monopole case, one considers a Chern–Simons–Dirac functional
\[ CSD(a,\phi) = \int_Y \langle \phi, D_a\phi \rangle\, d\mathrm{vol} - \frac{1}{2}\int_Y a \wedge da \]
and obtains integer graded Floer homology groups. Details and proofs of the relevant composition laws have been announced.
See also: Floer Homology; Four-Manifold Invariants and Physics; Gauge Theory: Mathematical Applications; Instantons: Topological Aspects; Moduli Spaces: An Introduction; Several Complex Variables: Basic Geometric Theory; Topological Quantum Field Theory: Overview.
Further Reading
Bauer S (2004) Refined Seiberg–Witten invariants. In: Donaldson, et al. (eds.) Different Faces of Geometry. New York, NY: Kluwer Academic/Plenum Publishers.
Donaldson SK (1983) An application of gauge theory to four-dimensional topology. Journal of Differential Geometry 18: 279–315.
Donaldson SK (2002) Floer Homology Groups in Yang–Mills Theory. Cambridge: Cambridge University Press.
Donaldson SK and Kronheimer PB (1990) The Geometry of Four-Manifolds. Oxford: Oxford University Press.
Feehan PMN and Leness TG (2003) On Donaldson and Seiberg–Witten invariants. In: Topology and Geometry of Manifolds, Proc. Sympos. Pure Math. (Athens, GA, 2001), vol. 71, pp. 237–248. Providence, RI: American Mathematical Society.
Fintushel R and Stern R (1996) The blow up formula for Donaldson invariants. Annals of Mathematics (2) 143: 529–546.
Freedman MH (1982) The topology of 4-dimensional manifolds. Journal of Differential Geometry 17: 357–453.
Freedman MH and Quinn F (1990) Topology of 4-Manifolds, Princeton Math. Ser., vol. 39. Princeton, NJ: Princeton University Press.
Friedman R and Morgan JW (eds.) (1997) Gauge Theory and the Topology of Four-Manifolds, IAS/Park City Math. Ser., vol. 4. Providence, RI: American Mathematical Society.
Furuta M (2001) Monopole equation and the 11/8-conjecture. Mathematical Research Letters 8: 363–400.
Göttsche L (2000) Donaldson invariants in algebraic geometry. School on Algebraic Geometry (Trieste 1999), 101–134. ICTP Lecture Notes 1, Trieste, 2000.
Kronheimer PB and Mrowka T (1995) Embedded surfaces and the structure of Donaldson's polynomial invariants. Journal of Differential Geometry 41: 573–735.
Taubes CH (1994) The Seiberg–Witten invariants and symplectic forms. Mathematical Research Letters 1: 809–822.
Witten E (1994) Monopoles and four-manifolds. Mathematical Research Letters 1: 769–796.
Gauge Theories from Strings
P Di Vecchia, Nordita, Copenhagen, Denmark
© 2006 Elsevier Ltd. All rights reserved.
Introduction
One of the most exciting properties of string theory, which led ten years ago to the formulation of M theory as the unique theory unifying all interactions, has been the discovery that type II theories, besides a perturbative spectrum consisting of closed-string excitations, also contain a nonperturbative one consisting of "solitonic" p-dimensional objects called Dp branes. They are characterized by two important properties. They are coupled to closed-string states such as the graviton, the dilaton, and the R–R $(p+1)$-form potential, and are described by a classical solution of the low-energy string effective action. Their dynamics is, on the other hand, described by open strings having their endpoints attached to the brane world volume and therefore satisfying Dirichlet boundary conditions in the directions transverse to it. This is the reason why they are called D (Dirichlet) branes. Since the lightest open-string excitation corresponds to a gauge field, they have a gauge theory living on their world volume. This twofold description of D-branes has opened the way to study both the perturbative and nonperturbative properties of the gauge theory living on their world volume from their dynamics in terms of closed strings. With the addition of the decoupling limit, these two properties have led to the Maldacena (1998) conjecture of the equivalence between the maximally supersymmetric and conformal $\mathcal{N} = 4$ super Yang–Mills and type IIB string theory on $AdS_5 \times S^5$. They have also been successfully applied to less supersymmetric and nonconformal gauge theories
that live on the world volume of fractional and wrapped branes. For general reviews of various approaches see Bertolini et al. (2000), Herzog et al. (2001), Bertolini (2003), Bigazzi et al. (2002), and Di Vecchia and Liccardo (2003). In these cases too, one has constructed a classical solution of the supergravity equations of motion corresponding to these more sophisticated branes. These equations contain not only the supergravity fields present in the bulk ten-dimensional action but also boundary terms corresponding to the location of the branes. It turns out that in general the classical solution develops a naked singularity of the repulson type at short distances from the branes. This means that at short distances, it does not provide a reliable description of the branes. In the case of $\mathcal{N} = 2$ supersymmetry, this can be explicitly seen because of the appearance of an enhançon located at distances slightly larger than that of the naked singularity (Johnson et al. 2000). The enhançon radius corresponds, in supergravity, to the distance where a brane probe becomes tensionless, and, in the gauge theory living on the branes, to the dynamically generated scale $\Lambda_{QCD}$. Then, since short distances in supergravity correspond to large distances in the gauge theory, as implied by holography, the presence of the enhançon and of the naked singularity does not allow one to obtain any information on the nonperturbative large-distance behavior of the gauge theory living on the D-branes. Above the radius of the enhançon, instead, the classical solution provides a good description of the branes and therefore it can be used to get information on the perturbative behavior of the gauge theory. This shows that, if we want to use the D-branes for studying the nonperturbative properties of the gauge theory living on their world volume, we must construct a classical solution that has no naked singularity at short distances in supergravity.
We will see in a specific example that it is possible to deform the classical solution, eliminating the naked singularity, and use it to describe nonperturbative properties such as the gaugino condensate. In this article, we review some of the results obtained by using fractional D3 branes of some orbifold and D5 branes wrapped on 2-cycles of some Calabi–Yau manifold. The analysis of the supersymmetric gauge theories living on the world volume of these D-branes will be based on the gauge/gravity relations that relate the gauge coupling constant and the $\theta$-angle to the supergravity fields (see, e.g., Di Vecchia et al. (2005) for their derivation):
\[ \frac{4\pi}{g_{YM}^2} = \frac{1}{g_s (2\pi\sqrt{\alpha'})^2} \int_{\mathcal{C}_2} d^2\xi\; e^{-\phi} \sqrt{\det(G_{AB} + B_{AB})} \qquad [1] \]
and
\[ \theta_{YM} = \frac{1}{2\pi\alpha' g_s} \int_{\mathcal{C}_2} (C_2 + C_0 B_2) \qquad [2] \]
where $\mathcal{C}_2$ is the 2-cycle where the branes are wrapped. In the next section, we will describe the case of the fractional D3 branes of the orbifold $\mathbb{C}^2/\mathbb{Z}_2$ and show that the classical solution corresponding to a system of N D3 and M D7 branes reproduces the perturbative behavior of $\mathcal{N} = 2$ super-QCD. Then, we will consider D5 branes wrapped on 2-cycles of a Calabi–Yau manifold described by the Maldacena–Núñez classical solution (Maldacena and Núñez 2001, Chamseddine and Volkov 1997) and show that in this case we are able to reproduce the phenomenon of the gaugino condensate and to construct the complete $\beta$-function of $\mathcal{N} = 1$ super Yang–Mills.
Fractional D3 Branes of the Orbifold $\mathbb{C}^2/\mathbb{Z}_2$ and $\mathcal{N} = 2$ Super-QCD
In this section, we consider fractional D3 and D7 branes of the noncompact orbifold $\mathbb{C}^2/\mathbb{Z}_2$ in order to study the properties of $\mathcal{N} = 2$ super-QCD. We group the coordinates of the directions $(x^4, \ldots, x^9)$ transverse to the world volume of the D3 brane, where the gauge theory lives, into three complex quantities: $z_1 = x^4 + i x^5$, $z_2 = x^6 + i x^7$, $z_3 = x^8 + i x^9$. The nontrivial generator $h$ of $\mathbb{Z}_2$ acts as $z_2 \to -z_2$, $z_3 \to -z_3$, leaving $z_1$ invariant. This orbifold has one fixed point, located at $z_2 = z_3 = 0$ and corresponding to a vanishing 2-cycle. Fractional D3 branes are D5 branes wrapped on the vanishing 2-cycle and therefore are, unlike bulk branes, stuck at the orbifold fixed point. By considering N fractional D3 and M fractional D7 branes of the orbifold $\mathbb{C}^2/\mathbb{Z}_2$, we are able to study $\mathcal{N} = 2$ super-QCD with M hypermultiplets. In order to do that, we need to determine the classical solution corresponding to the previous brane configuration. For the case of the orbifold $\mathbb{C}^2/\mathbb{Z}_2$, the complete classical solution is found in Bertolini et al. (2002b); see also references therein and Bertolini et al. (2000) for a review on fractional branes. In the following, we write it explicitly for a system of N fractional D3 branes with their world volume along the directions $x^0$, $x^1$, $x^2$, and $x^3$ and M fractional D7 branes containing the D3 branes in their world volume and having the remaining four world-volume directions along the orbifolded ones. The metric,
the 5-form field strength, the axion, and the dilaton are given by
\[ ds^2 = H^{-1/2}\,\eta_{\alpha\beta}\,dx^\alpha dx^\beta + H^{1/2}\left(\delta_{\ell m}\,dx^\ell dx^m + e^{-\phi}\,\delta_{ij}\,dx^i dx^j\right) \qquad [3] \]
\[ \tilde{F}_{(5)} = d\left(H^{-1}\,dx^0\wedge\cdots\wedge dx^3\right) + {}^*d\left(H^{-1}\,dx^0\wedge\cdots\wedge dx^3\right) \qquad [4] \]
\[ C_0 + i e^{-\phi} = i\left(1 - \frac{M g_s}{2\pi}\log\frac{z}{\epsilon}\right), \qquad z \equiv x^4 + i x^5 = y\,e^{i\theta} \qquad [5] \]
where the self-dual field strength $\tilde{F}_{(5)}$ is given in terms of the NS–NS and R–R 2-forms $B_2$ and $C_2$ and of the 4-form potential $C_4$ by $\tilde{F}_{(5)} = dC_4 + C_2 \wedge dB_2$. The warp factor $H$ is a function of the coordinates $(x^4, \ldots, x^9)$ and $\epsilon$ is an infrared cutoff. We denote by $\alpha$ and $\beta$ the four directions corresponding to the world volume of the fractional D3 brane, by $\ell$ and $m$ those along the four orbifolded directions $x^6$, $x^7$, $x^8$, and $x^9$, and by $i$ and $j$ the directions $x^4$ and $x^5$ that are transverse to both the D3 and the D7 branes. The twisted fields are instead given by $B_2 = \omega_2 b$, $C_2 = \omega_2 c$, where $\omega_2$ is the volume form of the vanishing 2-cycle and
\[ b\,e^{-\phi} = \frac{(2\pi\sqrt{\alpha'})^2}{2}\left[1 + \frac{2N - M}{\pi}\, g_s \log\frac{y}{\epsilon}\right], \qquad c + C_0\, b = -2\pi\alpha'\,\theta\, g_s\,(2N - M) \qquad [6] \]
The expression of $H$ (Kirsch and Vaman 2005) shows that the previous solution has a naked singularity of the repulson type at short distances. On the other hand, if we use a brane probe approaching from infinity the stack of branes described by the previous classical solution, it can also be seen that the tension of the probe vanishes at a distance that is larger than that of the naked singularity. The point where the probe brane becomes tensionless is called the "enhançon" (Johnson et al. 2000), and at this point the classical solution no longer describes the stack of fractional branes. Let us now use the gauge/gravity relations given in the introduction to determine the coupling constants of the world-volume theory from the supergravity solution. In the case of fractional D3 branes of the orbifold $\mathbb{C}^2/\mathbb{Z}_2$, which is characterized by one single vanishing 2-cycle $\mathcal{C}_2$, the gauge coupling constant given in eqn [1] reduces to
\[ \frac{1}{g_{YM}^2} = \frac{1}{4\pi g_s (2\pi\sqrt{\alpha'})^2} \int_{\mathcal{C}_2} e^{-\phi} B_2 \qquad [7] \]
By inserting the classical solution in eqns [7] and [2], we get the following expressions for the gauge coupling constant and the $\theta_{YM}$ angle (Bertolini et al. 2002b):
\[ \frac{1}{g_{YM}^2} = \frac{1}{8\pi g_s} + \frac{2N - M}{16\pi^2}\log\frac{y^2}{\epsilon^2}, \qquad \theta_{YM} = -\theta\,(2N - M) \qquad [8] \]
Notice that the gauge coupling constant appearing in the previous equation is the "bare" gauge coupling constant computed at the scale $\mu \sim y/\alpha'$, while the square of the bare gauge coupling constant computed at the cutoff $\Lambda \sim \epsilon/\alpha'$ is equal to $8\pi g_s$. In the case of an $\mathcal{N} = 2$ supersymmetric gauge theory, the gauge multiplet contains a complex scalar field $\Psi$ that corresponds to the complex coordinate $z$ transverse to both the world volume of the D3 brane and the four orbifolded directions: $\Psi \sim z/(2\pi\alpha')$. This is another example of holographic identification between a quantity, $\Psi$, peculiar to the gauge theory living on the fractional D3 branes and another one, the coordinate $z$, peculiar to supergravity. It allows one to obtain the gauge theory anomalies from the supergravity background. In fact, since we know how the scale and U(1) transformations act on $\Psi$, from the previous gauge/gravity relation we can deduce how they act on $z$, namely
\[ \Psi \to s\,e^{2i\varepsilon}\,\Psi \iff z \to s\,e^{2i\varepsilon}\,z \implies y \to s\,y, \quad \theta \to \theta + 2\varepsilon \qquad [9] \]
Those transformations do not leave invariant the supergravity background in eqn [6] and, when we use them in eqns [7] and [2], they generate the anomalies of the gauge theory living on the fractional D3 branes. In fact, by acting with those transformations in eqns [8], we get
\[ \frac{1}{g_{YM}^2} \to \frac{1}{g_{YM}^2} + \frac{2N - M}{8\pi^2}\log s, \qquad \theta_{YM} \to \theta_{YM} - 2\varepsilon\,(2N - M) \qquad [10] \]
The first equation generates the $\beta$-function of $\mathcal{N} = 2$ super-QCD with M hypermultiplets given by
\[ \beta(g_{YM}) = -\frac{2N - M}{16\pi^2}\, g_{YM}^3 \qquad [11] \]
while the second one reproduces the chiral U(1) anomaly (Klebanov et al. 2002, Bertolini et al. 2002a). In particular, if we choose $\varepsilon = 2\pi/(2(2N - M))$, then $\theta_{YM}$ is shifted by $2\pi$. But since $\theta_{YM}$ is periodic with period $2\pi$, this means that the subgroup $\mathbb{Z}_{2(2N-M)}$ is not anomalous, in perfect agreement with the gauge theory results.
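The step from the running coupling to the $\beta$-function can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes eqn [8] in the form $1/g_{YM}^2 = 1/(8\pi g_s) + \frac{2N-M}{16\pi^2}\log(y^2/\epsilon^2)$, takes the renormalization scale to be proportional to $y$, and compares a finite-difference derivative $dg_{YM}/d\log\mu$ with eqn [11] read as $\beta = -\frac{2N-M}{16\pi^2} g_{YM}^3$. All function names and numerical values are hypothetical.

```python
import math


def inverse_coupling_squared(y, eps, g_s, n_d3, m_d7):
    """1/g_YM^2 at radial position y (assumed form of eqn [8])."""
    running = (2 * n_d3 - m_d7) / (16.0 * math.pi ** 2) * math.log(y ** 2 / eps ** 2)
    return 1.0 / (8.0 * math.pi * g_s) + running


def beta_one_loop(g, n_d3, m_d7):
    """Assumed form of eqn [11]: beta(g) = -(2N - M) g^3 / (16 pi^2)."""
    return -(2 * n_d3 - m_d7) / (16.0 * math.pi ** 2) * g ** 3


def beta_numeric(y, eps, g_s, n_d3, m_d7, h=1e-6):
    """dg/dlog(mu) by central difference, assuming mu is proportional to y."""
    def coupling(log_y):
        return inverse_coupling_squared(math.exp(log_y), eps, g_s, n_d3, m_d7) ** -0.5

    t = math.log(y)
    return (coupling(t + h) - coupling(t - h)) / (2.0 * h)


def theta_shift(rotation, n_d3, m_d7):
    """Shift of theta_YM under theta -> theta + 2*rotation (second relation in eqn [10])."""
    return -2.0 * rotation * (2 * n_d3 - m_d7)


if __name__ == "__main__":
    N, M, g_s, eps, y = 3, 2, 0.1, 1.0, 5.0  # illustrative values only
    g = inverse_coupling_squared(y, eps, g_s, N, M) ** -0.5
    print("numeric beta :", beta_numeric(y, eps, g_s, N, M))
    print("one-loop beta:", beta_one_loop(g, N, M))
    rotation = 2.0 * math.pi / (2 * (2 * N - M))
    print("theta shift / 2 pi:", theta_shift(rotation, N, M) / (2.0 * math.pi))
```

With these assumptions the two values of $\beta$ agree up to discretization error, and the rotation $\varepsilon = 2\pi/(2(2N-M))$ shifts $\theta_{YM}$ by exactly one period, which is the statement that $\mathbb{Z}_{2(2N-M)}$ is anomaly free.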
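Wrapped D5 Branes and $\mathcal{N} = 1$ Super Yang–Mills
In this section, we consider the classical solution corresponding to N D5 branes wrapped on a 2-cycle of a noncompact Calabi–Yau space and use it to study the properties of the gauge theory living on their world volume, which can be shown to be $\mathcal{N} = 1$ super Yang–Mills. We start by writing the classical solution found in Maldacena and Núñez (2001) and Chamseddine and Volkov (1997). It has a nontrivial metric:
\[ ds_{10}^2 = e^{\Phi}\,dx_{1,3}^2 + \frac{e^{\Phi} e^{2h}}{\lambda^2}\left(d\tilde\theta^2 + \sin^2\tilde\theta\, d\tilde\varphi^2\right) + \frac{e^{\Phi}}{\lambda^2}\left[d\rho^2 + \sum_{a=1}^{3}\left(\sigma^a - \lambda A^a\right)^2\right] \qquad [12] \]
a 2-form R–R potential
\[ C^{(2)} = \frac{1}{4\lambda^2}\Big[(\psi + \psi_0)\left(\sin\theta'\, d\theta'\wedge d\varphi' - \sin\tilde\theta\, d\tilde\theta\wedge d\tilde\varphi\right) - \cos\theta'\cos\tilde\theta\, d\varphi'\wedge d\tilde\varphi\Big] + \frac{a}{2\lambda^2}\Big[d\tilde\theta\wedge\sigma^1 - \sin\tilde\theta\, d\tilde\varphi\wedge\sigma^2\Big] \qquad [13] \]
and a dilaton
\[ e^{2\Phi} = \frac{\sinh 2\rho}{2 e^{h}} \qquad [14] \]
where
\[ e^{2h} = \rho\coth 2\rho - \frac{\rho^2}{\sinh^2 2\rho} - \frac{1}{4}, \qquad e^{2k} = e^{h}\,\frac{\sinh 2\rho}{2}, \qquad a = \frac{2\rho}{\sinh 2\rho} \qquad [15] \]
and
\[ A^1 = -\frac{1}{2\lambda}\,a(r)\,d\tilde\theta, \qquad A^2 = \frac{1}{2\lambda}\,a(r)\sin\tilde\theta\, d\tilde\varphi, \qquad A^3 = -\frac{1}{2\lambda}\cos\tilde\theta\, d\tilde\varphi \qquad [16] \]
with $\rho \equiv \lambda r$ and $\lambda^{-2} = N g_s \alpha'$. The left-invariant 1-forms of $S^3$ are
\[ \sigma^1 = \frac{1}{2}\left[\cos\psi\, d\theta' + \sin\theta'\sin\psi\, d\varphi'\right], \quad \sigma^2 = -\frac{1}{2}\left[\sin\psi\, d\theta' - \sin\theta'\cos\psi\, d\varphi'\right], \quad \sigma^3 = \frac{1}{2}\left[d\psi + \cos\theta'\, d\varphi'\right] \qquad [17] \]
with $0 \le \theta' \le \pi$, $0 \le \varphi' \le 2\pi$, and $0 \le \psi \le 4\pi$. The variables $\tilde\theta$ and $\tilde\varphi$ describe a two-dimensional sphere and vary in the range $0 \le \tilde\theta \le \pi$ and $0 \le \tilde\varphi \le 2\pi$. Before proceeding, we want to stress that the presence of the function $a(\rho) \ne 0$ makes the solution regular everywhere. This will allow us to use it later on to describe the nonperturbative gaugino condensate of $\mathcal{N} = 1$ super Yang–Mills. We can now use the previous solution for computing the running coupling constant and the $\theta$ parameter of $\mathcal{N} = 1$ super Yang–Mills (see Di Vecchia et al. (2002), Bertolini and Merlatti (2003), and Mück (2003), reviewed in Bertolini (2003), Di Vecchia and Liccardo (2003), and Imeroni (2003)). In order to do that, we have to fix the cycle on which to perform the integrals in eqns [1] and [2]. It turns out that this 2-cycle is specified by
\[ \tilde\theta = \theta', \qquad \tilde\varphi = -\varphi', \qquad \psi = 0 \qquad [18] \]
keeping $\rho$ fixed. If we now compute the gauge couplings on the previous cycle with $B_2 = C_0 = 0$, we get
\[ \frac{4\pi^2}{N g_{YM}^2} = \rho\coth 2\rho + \frac{1}{2}\,a(\rho)\cos\psi \qquad [19] \]
and
\[ \theta_{YM} = \frac{1}{2\pi g_s \alpha'}\int_{S^2} C_2 = -N\left(\psi + a(\rho)\sin\psi + \psi_0\right) \qquad [20] \]
where we have kept $\psi \ne 0$ for reasons that will become clear in a moment. Equation [19] shows that the coupling constant is running as a function of the distance $\rho$ from the branes. In order to obtain the correct running of the gauge theory, we have to find a relation between $\rho$ and the renormalization group scale $\mu$. This can be obtained with the following considerations. If we look at the previous solution, it is easy to see that the metric in eqn [12] is invariant under the following transformations:
\[ \psi \to \psi + 2\pi \ \text{ if } a \ne 0, \qquad \psi \to \psi + 2\varepsilon \ \text{ if } a = 0 \qquad [21] \]
where $\varepsilon$ is an arbitrary constant. On the other hand, $C_2$ is not invariant under the previous transformations, but its flux, which is exactly equal to $\theta_{YM}$ in eqn [20], changes by an integer multiple of $2\pi$:
\[ \theta_{YM} = \frac{1}{2\pi\alpha' g_s}\int_{\mathcal{C}_2} C_2 \;\to\; \theta_{YM} + \begin{cases} -2\pi N, & \text{if } a \ne 0 \\ -2N\varepsilon, & \text{if } a = 0,\ \varepsilon = \pi k/N \end{cases} \qquad [22] \]
But since the physics does not change when $\theta_{YM} \to \theta_{YM} + 2\pi$, one gets that the transformation in eqn [22] is an invariance. Notice that also eqn [19] for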
the gauge coupling constant is invariant under the transformation in eqn [21]. The previous considerations show that the classical solution and also the gauge couplings are invariant under the $\mathbb{Z}_2$ transformation if $a \ne 0$, while this symmetry becomes $\mathbb{Z}_{2N}$ if $a$ is taken to be zero. As a consequence, since in the ultraviolet $a(\rho)$ is exponentially small, we can neglect it and we have a $\mathbb{Z}_{2N}$ symmetry, while in the infrared we cannot neglect $a(\rho)$ anymore and we have only a $\mathbb{Z}_2$ symmetry left. This fits very well with the fact that $\mathcal{N} = 1$ super Yang–Mills has a nonzero gaugino condensate $\langle\lambda\lambda\rangle$ that is responsible for the breaking of $\mathbb{Z}_{2N}$ into $\mathbb{Z}_2$. Therefore, it is natural to identify the gaugino condensate precisely with the function $a(\rho) \ne 0$ that makes the classical solution regular also at short distances in supergravity (Di Vecchia et al. 2002, Apreda et al. 2002):
\[ \langle\lambda^2\rangle \sim \Lambda^3 = \mu^3\, a(\rho) \qquad [23] \]
This provides the relation between the renormalization group scale $\mu$ and the supergravity spacetime parameter $\rho$. In the ultraviolet (large $\rho$) $a(\rho)$ is exponentially suppressed and in eqns [19] and [20] we can neglect it, obtaining
\[ \frac{4\pi^2}{N g_{YM}^2} = \rho\coth 2\rho, \qquad \theta_{YM} = -N(\psi + \psi_0) \qquad [24] \]
The chiral anomaly can be obtained by performing the transformation $\psi \to \psi + 2\varepsilon$ and getting
\[ \theta_{YM} \to \theta_{YM} - 2N\varepsilon \qquad [25] \]
This implies that the $\mathbb{Z}_{2N}$ transformations corresponding to $\varepsilon = \pi k/N$ are symmetries because they shift $\theta_{YM}$ by multiples of $2\pi$. In general, however, eqns [19] and [20] are only invariant under the $\mathbb{Z}_2$ subgroup of $\mathbb{Z}_{2N}$ corresponding to the transformation
\[ \psi \to \psi + 2\pi \qquad [26] \]
that changes $\theta_{YM}$ in eqn [20] as follows:
\[ \theta_{YM} \to \theta_{YM} - 2\pi N \qquad [27] \]
leaving invariant the gaugino condensate:
\[ \langle\lambda^2\rangle = \mu^3\,\frac{16\pi^2}{3 N g_{YM}^2}\; e^{-8\pi^2/(N g_{YM}^2)}\; e^{i\theta_{YM}/N} \qquad [28] \]
Therefore, the chiral anomaly and the breaking of Z2N to Z2 are encoded in eqns [19] and [20]. Finally, if we put = 0 in eqn [19], we get 42 ¼ coth 2 12 að Þ ¼ tanh Ng2YM
½29
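The relation between this expression and the NSVZ behavior can be checked numerically. The sketch below rests on assumptions that are not spelled out in the text at this point: it reads eqn [15] as $a(\rho) = 2\rho/\sinh 2\rho$, eqn [23] as $\Lambda^3 = \mu^3 a(\rho)$, and eqn [29] as $4\pi^2/(N g_{YM}^2) = \rho\coth 2\rho - \tfrac{1}{2}a(\rho)$. With $x = 4\pi^2/(N g_{YM}^2)$, the perturbative NSVZ $\beta$-function $-\frac{3N g^3}{16\pi^2}\big/\big(1 - \frac{N g^2}{8\pi^2}\big)$ is equivalent to $dx/d\log\mu = \frac{3}{2}\big/\big(1 - \frac{1}{2x}\big)$, which is what the code compares against. Function names are illustrative.

```python
import math


def a_of_rho(rho):
    """a(rho) = 2 rho / sinh(2 rho), the assumed form of eqn [15]."""
    return 2.0 * rho / math.sinh(2.0 * rho)


def x_of_rho(rho):
    """x = 4 pi^2 / (N g^2) = rho coth(2 rho) - a(rho)/2, the assumed form of eqn [29]."""
    return rho / math.tanh(2.0 * rho) - 0.5 * a_of_rho(rho)


def log_mu_over_lambda(rho):
    """log(mu / Lambda) from the assumed relation Lambda^3 = mu^3 a(rho)."""
    return -math.log(a_of_rho(rho)) / 3.0


def dx_dlog_mu(rho, h=1e-5):
    """Finite-difference derivative of x with respect to log(mu) along the flow."""
    dx = x_of_rho(rho + h) - x_of_rho(rho - h)
    dt = log_mu_over_lambda(rho + h) - log_mu_over_lambda(rho - h)
    return dx / dt


def nsvz_dx_dlog_mu(x):
    """Perturbative NSVZ running rewritten for x = 4 pi^2 / (N g^2)."""
    return 1.5 / (1.0 - 1.0 / (2.0 * x))


if __name__ == "__main__":
    for rho in (2.0, 5.0, 10.0):
        x = x_of_rho(rho)
        print(rho, dx_dlog_mu(rho), nsvz_dx_dlog_mu(x))
```

Under these assumptions, $\rho\coth 2\rho - \tfrac12 a(\rho)$ coincides with $\rho\tanh\rho$ for all $\rho$, and the numerically computed running approaches the NSVZ form at large $\rho$, with the difference falling off exponentially as expected for fractional-instanton corrections.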
This equation taken together with eqn [23] allows us to determine the running coupling constant as a function of $\mu$. From it, we get (Di Vecchia et al. 2002, Di Vecchia and Liccardo 2003) the Novikov–Shifman–Vainshtein–Zakharov (NSVZ) $\beta$-function plus nonperturbative corrections due to fractional instantons:
ðgYM Þ ¼
3 Ng3YM 162
2 2 1þ 42 sinh2 42 NgYM NgYM 2 2 NgYM 1 2 4 1 2 þ sinh 2 2 8 NgYM
½30
where in the ultraviolet we have approximated
with 42 =(Ng2YM ) coth 42 =(Ng2YM ). See also: AdS/CFT Correspondence; Anomalies; BF Theories; Brane Construction of Gauge Theories; Gauge Theory: Mathematical Applications; Noncommutative Geometry from Strings; Nonperturbative and Topological Aspects of Gauge Theory; Perturbation Theory and its Techniques; Seiberg–Witten Theory; Superstring Theories.
Further Reading
Apreda R, Bigazzi F, Cotrone AL, Petrini M, and Zaffaroni A (2002) Some comments on N = 1 gauge theories from wrapped branes. Physics Letters B 536: 161–168 (hep-th/0112236).
Bertolini M (2003) Four lectures on the gauge/gravity correspondence. International Journal of Modern Physics A 18: 5647–5712 (hep-th/0303160).
Bertolini M, Di Vecchia P, Frau M, Lerda A, and Marotta R (2002a) More anomalies from fractional branes. Physics Letters B 540: 104–110 (hep-th/0202195).
Bertolini M, Di Vecchia P, Frau M, Lerda A, and Marotta R (2002b) N = 2 gauge theories on systems of fractional D3/D7 branes. Nuclear Physics B 621: 157–178 (hep-th/0107057).
Bertolini M, Di Vecchia P, and Marotta R (2000) N = 2 four-dimensional gauge theories from fractional branes. In: Olshanetsky M and Vainshtein A (eds.) Multiple Facets of Quantization and Supersymmetry, pp. 730–773 (hep-th/0112195). Singapore: World Scientific.
Bertolini M and Merlatti P (2003) A note on the dual of N = 1 super Yang–Mills theory. Physics Letters B 556: 80–86 (hep-th/0211142).
Bigazzi F, Cotrone AL, Petrini M, and Zaffaroni A (2002) Supergravity duals of supersymmetric four dimensional gauge theories. Rivista del Nuovo Cimento 25N12: 1–70 (hep-th/0303191).
Chamseddine AH and Volkov MS (1997) Non-abelian BPS monopoles in N = 4 gauged supergravity. Physical Review Letters 79: 3343–3346 (hep-th/9707176).
Di Vecchia P, Lerda A, and Merlatti P (2002) N = 1 and N = 2 super Yang–Mills theories from wrapped branes. Nuclear Physics B 646: 43–68 (hep-th/0205204).
Di Vecchia P and Liccardo A (2003) Gauge theories from D branes. Lectures given at the School "Frontiers in Number Theory, Physics and Geometry," Les Houches (hep-th/0307104).
Di Vecchia P, Liccardo A, Marotta R, and Pezzella F (2005) On the gauge/gravity correspondence and the open/closed string duality. International Journal of Modern Physics A 20: 4699–4796 (hep-th/0503156).
Herzog CP, Klebanov IR, and Ouyang P (2001) D-branes on the conifold and N = 1 gauge/gravity dualities. Based on I.R.K.'s Lectures at the Les Houches Summer School Session 76, "Gravity, Gauge Theories, and Strings" (hep-th/0205100).
Imeroni E (2003) Studies of the Gauge/String Theory Correspondence. Ph.D. thesis (hep-th/0312070).
Johnson CV, Peet AW, and Polchinski J (2000) Gauge theory and the excision of repulson singularities. Physical Review D 61: 086001 (hep-th/9911161).
Kirsch I and Vaman D (2005) The D3/D7 background and flavour dependence of Regge trajectories (hep-th/0505164).
Klebanov IR, Ouyang P, and Witten E (2002) A gravity dual of the chiral anomaly. Physical Review D 65: 105007 (hep-th/0202056).
Maldacena J (1998) The large N limit of superconformal field theories and supergravity. Advances in Theoretical and Mathematical Physics 2: 231–252 (hep-th/9711200).
Maldacena J and Núñez C (2001) Towards the large N limit of N = 1 super Yang Mills. Physical Review Letters 86: 588–591 (hep-th/0008001).
Mück W (2003) Perturbative and nonperturbative aspects of pure N = 1 super Yang–Mills theory from wrapped branes. Journal of High Energy Physics 0302: 013 (hep-th/0301171).
Gauge Theory: Mathematical Applications
S K Donaldson, Imperial College, London, UK
© 2006 Elsevier Ltd. All rights reserved.
Introduction
This article surveys some developments in pure mathematics which have, to varying degrees, grown out of the ideas of gauge theory in mathematical physics. The realization that the gauge fields of particle physics and the connections of differential geometry are one and the same has had wide-ranging consequences, at different levels. Most directly, it has led mathematicians to work on new kinds of questions, often shedding light later on well-established problems. Less directly, various fundamental ideas and techniques, notably the need to work with the infinite-dimensional gauge symmetry group, have found a place in the general world-view of many mathematicians, influencing developments in other fields. Still less directly, the work in this area – between geometry and mathematical physics – has been a prime example of the interaction between these fields which has been so fruitful since the 1970s. The body of this article is divided into three sections, roughly corresponding to analysis, geometry, and topology. However, the different topics come together in many different ways: indeed the existence of these links between the topics is one of the most attractive features of the area.
Gauge Transformations
For a review of the usual foundational material on connections, curvature, and related differential geometric constructions, the reader is referred to standard texts. We will, however, briefly recall the notions of gauge transformations and gauge fixing. The simplest case is that of abelian gauge theory – connections on a U(1)-bundle, say over $\mathbb{R}^3$. In that case the connection form, representing the connection in a local trivialization, is a pure imaginary 1-form $A$, which can also be identified with a vector field $\mathbf{A}$. The curvature of the connection is the 2-form $dA$. Changing the local trivialization by a U(1)-valued function $g = e^{i\chi}$ changes the connection form to
\[ \tilde{A} = A - dg\,g^{-1} = A - i\,d\chi \]
The forms $A$, $\tilde{A}$ are two representations of the same geometric object, just as the same metric can be represented by different expressions in different coordinate systems. One may want to fix this choice of representation, usually by choosing $A$ to satisfy the Coulomb gauge condition $d^*A = 0$ (equivalently $\operatorname{div}\mathbf{A} = 0$), supplemented by appropriate boundary conditions. Here we are using the standard Euclidean metric on $\mathbb{R}^3$. (Throughout this article we will work with positive-definite metrics, regardless of the fact that – at least at the classical level – the Lorentzian signature may have more obvious bearing on physics.) Arranging this choice of gauge involves solving a linear partial differential equation (PDE) for $\chi$. The case of a general structure group $G$ is not much different. The connection form $A$ now takes values in the Lie algebra of $G$ and the curvature is given by the expression
\[ F = dA + \tfrac{1}{2}[A, A] \]
The change of bundle trivialization is given by a $G$-valued function and the resulting change in the connection form is
\[ \tilde{A} = gAg^{-1} - dg\,g^{-1} \]
(Our notation here assumes that $G$ is a matrix group, but this is not important.) Again, we can seek to impose the Coulomb gauge condition $d^*A = 0$, but now we cannot linearize this equation as before. We can carry the same ideas over to a global problem, working on a $G$-bundle $P$ over a general
Riemannian manifold $M$. The space of connections on $P$ is an affine space $\mathcal{A}$: any two connections differ by a bundle-valued 1-form. Now the gauge group $\mathcal{G}$ of automorphisms of $P$ acts on $\mathcal{A}$ and, again, two connections in the same orbit of this action represent essentially the same geometric object. Thus, in a sense we would really like to work on the quotient space $\mathcal{A}/\mathcal{G}$. Working locally in the space of connections, near to some $A_0$, this is quite straightforward. We represent the nearby connections as $A_0 + a$, where $a$ satisfies the analog of the Coulomb condition
\[ d_A^* a = 0 \]
(with $A = A_0$). Under suitable hypotheses, this condition picks out a unique representative of each nearby orbit. However, this gauge-fixing condition need not single out a unique representative if we are far away from $A_0$: indeed, the space $\mathcal{A}/\mathcal{G}$ typically has, unlike $\mathcal{A}$, a complicated topology which means that it is impossible to find any such global gauge-fixing condition. As noted above, this is one of the distinctive features of gauge theory. The gauge group $\mathcal{G}$ is an infinite-dimensional group, but one of a comparatively straightforward kind – much less complicated than the diffeomorphism groups relevant in Riemannian geometry, for example. One could argue that one of the most important influences of gauge theory has been to accustom mathematicians to working with infinite-dimensional symmetry groups in a comparatively simple setting.
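The statement that two gauge-equivalent connection forms represent the same geometric object can be made concrete by checking how the curvature transforms. The following is a short worked computation, not taken from the article, for a matrix group $G$; the abbreviations $\theta$ and $B$ are introduced here only for this purpose, and the identities $d(g^{-1}) = -g^{-1}\,dg\,g^{-1}$ and $d\theta = \theta\wedge\theta$ are used.

```latex
% Abbreviations (introduced for this sketch only):
%   \theta = dg\, g^{-1}, \qquad B = g A g^{-1}, \qquad \tilde{A} = B - \theta .
\begin{align*}
d\tilde{A}
  &= g\,dA\,g^{-1} + \theta\wedge B + B\wedge\theta - \theta\wedge\theta, \\
\tfrac{1}{2}[\tilde{A},\tilde{A}] = \tilde{A}\wedge\tilde{A}
  &= B\wedge B - B\wedge\theta - \theta\wedge B + \theta\wedge\theta, \\
\tilde{F} = d\tilde{A} + \tfrac{1}{2}[\tilde{A},\tilde{A}]
  &= g\,(dA + A\wedge A)\,g^{-1} = g\,F\,g^{-1}.
\end{align*}
```

So the curvature changes only by conjugation, and the pointwise norm $|F|$ is unchanged. In the abelian case the conjugation is trivial and the computation reduces to $d\tilde{A} = dA - i\,d(d\chi) = dA$.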
Analysis and Variational Methods

The Yang–Mills Functional
A primary object brought to mathematicians' attention by physics is the Yang–Mills functional
\[ \mathrm{YM}(A) = \int_M |F_A|^2 \, d\mu . \]
Clearly, $\mathrm{YM}(A)$ is non-negative and vanishes if and only if the connection is flat: it is broadly analogous to functionals such as the area functional in minimal submanifold theory, or the energy functional for maps. As such, it fits into a general framework associated with such functionals. The Euler–Lagrange equations are the Yang–Mills equations
\[ d_A^* F_A = 0 . \]
For any solution (a Yang–Mills connection), there is a “Jacobi operator” $H_A$ such that the second variation is given by
\[ \mathrm{YM}(A + ta) = \mathrm{YM}(A) + t^2 \langle H_A a, a \rangle + O(t^3) . \]
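For readers who want to see where the Yang–Mills equations come from, here is a sketch of the first variation. It assumes that $M$ is compact (or that $a$ has compact support), so that integrating by parts produces no boundary term, and it writes $\langle\cdot,\cdot\rangle$ for the $L^2$ inner product on bundle-valued forms.

```latex
\begin{align*}
F_{A+ta} &= F_A + t\, d_A a + \tfrac{t^2}{2}[a,a], \\
\mathrm{YM}(A+ta)
  &= \mathrm{YM}(A) + 2t\,\langle F_A, d_A a\rangle + O(t^2) \\
  &= \mathrm{YM}(A) + 2t\,\langle d_A^* F_A, a\rangle + O(t^2).
\end{align*}
```

The linear term vanishes for every variation $a$ exactly when $d_A^* F_A = 0$, which is the equation quoted above; at such a connection the expansion starts at order $t^2$, consistent with the second-variation formula.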
The omnipresent phenomenon of gauge invariance means that Yang–Mills connections are never isolated, since we can always generate an infinite-dimensional family by gauge transformations. Thus, as explained in the last section, one imposes the gauge-fixing condition $d_A^* a = 0$. Then the operator $H_A$ can be written as
\[ H_A a = \Delta_A a + [F_A, a] , \]
where $\Delta_A$ is the bundle-valued “Hodge Laplacian” $d_A d_A^* + d_A^* d_A$ and the expression $[F_A, a]$ combines the bracket in the Lie algebra with the action of $\Lambda^2$ on $\Lambda^1$. This is a self-adjoint elliptic operator and, if $M$ is compact, the span of the negative eigenspaces is finite dimensional, the dimension being defined to be the index of the Yang–Mills connection $A$. In this general setting, a natural aspiration is to construct a “Morse theory” for the functional. Such a theory should relate the topology of the ambient space to the critical points and their indices. In the simplest case, one could hope to show that for any bundle $P$ there is a Yang–Mills connection with index 0, giving a minimum of the functional. More generally, the relevant ambient space here is the quotient $\mathcal{A}/\mathcal{G}$ and one might hope that the rich topology of this is reflected in the solutions to the Yang–Mills equations.

Uhlenbeck's Theorem
The essential foundation needed to underpin such a “direct method” in the calculus of variations is an appropriate compactness theorem. Here the dimension of the base manifold $M$ enters in a crucial way. Very roughly, when a connection is represented locally in a Coulomb gauge, the Yang–Mills action combines the $L^2$-norm of the derivative of the connection form $A$ with the $L^2$-norm of the quadratic term $[A, A]$. The latter can be estimated by the $L^4$-norm of $A$. If $\dim M \le 4$, then the Sobolev inequalities allow the $L^4$-norm of $A$ to be controlled by the $L^2$-norm of its derivative, but this is definitely not true in higher dimensions. Thus, $\dim M = 4$ is the “critical dimension” for this variational problem. This is related to the fact that the Yang–Mills equations (and Yang–Mills functional) are conformally invariant in four dimensions. For any nontrivial Yang–Mills connection over the 4-sphere, one generates a one-parameter family of Yang–Mills connections, on which the functional takes the same value, by applying conformal transformations corresponding to dilations of $\mathbb{R}^4$. In such a family of connections the integrand $|F_A|^2$ – the “curvature density” – converges to a $\delta$-function
at the origin. More generally, one can encounter sequences of connections over 4-manifolds for which YM is bounded but which do not converge, the Yang–Mills density converging to $\delta$-functions. There is a detailed analogy with the theory of the harmonic maps energy functional, where the relevant critical dimension (for the domain of the map) is 2. The result of Uhlenbeck (1982), which makes these ideas precise, considers connections over a ball $B^n \subset \mathbb{R}^n$. If the exponent $p$ satisfies $2p \ge n$, then there are positive constants $\epsilon(p, n)$, $C(p, n) > 0$ such that any connection with $\|F\|_{L^p(B^n)} \le \epsilon$ can be represented in Coulomb gauge over the ball, by a connection form which satisfies the condition $d^*A = 0$, together with certain boundary conditions, and
\[ \|A\|_{L^p_1} \le C\, \|F\|_{L^p} . \]
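The concentration of curvature density under dilation, described before Uhlenbeck's theorem above, can be illustrated numerically. The sketch below is not from the article: it assumes the radial model density $48\lambda^4/(r^2+\lambda^2)^4$ for a charge-one instanton of scale $\lambda$ on $\mathbb{R}^4$, with the normalization chosen so that the total is $8\pi^2$, and the function names are hypothetical.

```python
import math


def curvature_density(r, lam):
    """Model curvature density |F|^2 at radius r for an instanton of scale lam.

    Assumed normalization: 48 lam^4 / (r^2 + lam^2)^4, so that the integral
    over R^4 is 8 pi^2.
    """
    return 48.0 * lam**4 / (r * r + lam * lam) ** 4


def energy_in_ball(radius, lam, steps=20000):
    """Integrate the density over the 4-ball |x| <= radius with Simpson's rule.

    In polar coordinates on R^4 the volume element is 2 pi^2 r^3 dr, where
    2 pi^2 is the volume of the unit 3-sphere.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = radius / steps

    def integrand(r):
        return 2.0 * math.pi**2 * r**3 * curvature_density(r, lam)

    total = integrand(0.0) + integrand(radius)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for lam in (1.0, 0.1, 0.01):
        total = energy_in_ball(200.0 * lam, lam)  # ball large enough to hold almost everything
        inside = energy_in_ball(1.0, lam)         # fixed unit ball around the origin
        print(f"lam={lam}: total/(8 pi^2)={total / (8 * math.pi**2):.6f}, "
              f"fraction in unit ball={inside / total:.6f}")
```

The total is the same for every scale, which reflects the conformal invariance of the functional in dimension 4, while the share of it inside a fixed ball tends to 1 as the scale shrinks (it is one half when the scale equals the radius of the ball). That is the sense in which the density approaches a multiple of a $\delta$-function.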
In this Coulomb gauge, the Yang–Mills equations are elliptic and it follows readily that, in this setting, if the connection $A$ is Yang–Mills one can obtain estimates on all derivatives of $A$.

Instantons in Four Dimensions
This result of Uhlenbeck gives the analytical basis for the direct method of the calculus of variations for the Yang–Mills functional over base manifolds $M$ of dimension $\le 3$. For example, any bundle over such a manifold must admit a Yang–Mills connection, minimizing the functional. Such a statement is definitely false in dimensions $\ge 5$. For example, an early result of Bourguignon and Lawson (1981) and Simons asserts that there is no minimizing connection on any bundle over $S^n$ for $n \ge 5$. The proof exploits the action of the conformal transformations of the sphere. In the critical dimension 4, the situation is much more complicated.

In four dimensions, there are the renowned “instanton” solutions of the Yang–Mills equation. Recall that if $M$ is an oriented 4-manifold the Hodge $*$-operation is an involution of $\Lambda^2 T^*M$ which decomposes the 2-forms into self-dual and anti-self-dual parts, $\Lambda^2 T^*M = \Lambda^+ \oplus \Lambda^-$. The curvature of a connection can then be written as
\[ F_A = F_A^+ + F_A^- , \]
and a connection is a self-dual (respectively anti-self-dual) instanton if $F_A^-$ (respectively $F_A^+$) is 0. The Yang–Mills functional is
\[ \mathrm{YM}(A) = \|F_A^+\|^2 + \|F_A^-\|^2 , \]
while the difference $\|F_A^+\|^2 - \|F_A^-\|^2$ is a topological invariant $\kappa(P)$ of the bundle $P$, obtained by evaluating a four-dimensional characteristic class
on $[M]$. Depending on the sign of $\kappa(P)$, the self-dual or anti-self-dual connections (if any exist) minimize the Yang–Mills functional among all connections on $P$. These instanton solutions of the Yang–Mills equations are analogous to the holomorphic maps from a Riemann surface to a Kähler manifold, which minimize the harmonic maps energy functional in their homotopy class.

Moduli Spaces
The instanton solutions typically occur in “moduli spaces.” To fix ideas, let us consider bundles with structure group SU(2), in which case $\kappa(P) = -8\pi^2 c_2(P)$. For each $k > 0$, we have a moduli space $M_k$ of anti-self-dual instantons on a bundle $P_k \to M^4$, with $c_2(P_k) = k$. It is a manifold of dimension $8k - 3$. The general goal of the calculus of variations in this setting is to relate three things:
1. the topology of the space $\mathcal{A}/\mathcal{G}$ of equivalence classes of connections on $P_k$;
2. the topology of the moduli space $M_k$ of instantons; and
3. the existence and indices of other, nonminimal, solutions to the Yang–Mills equations on $P_k$.
In this direction, a very influential conjecture was made by Atiyah and Jones (1978). They considered the case when $M = S^4$ and, to avoid certain technicalities, worked with spaces of “framed” connections, dividing by the restricted group $\mathcal{G}_0$ of gauge transformations equal to the identity at infinity. Then, for any $k$, the quotient $\mathcal{A}/\mathcal{G}_0$ is homotopy equivalent to the third loop space $\Omega^3 S^3$ of based maps from the 3-sphere to itself. The corresponding “framed” moduli space $\tilde{M}_k$ is a manifold of dimension $8k$ (a bundle over $M_k$ with fiber SO(3)). Atiyah and Jones conjectured that the inclusion $\tilde{M}_k \to \mathcal{A}/\mathcal{G}_0$ induces an isomorphism of homotopy groups $\pi_l$ in a range of dimensions $l \le l(k)$, where $l(k)$ increases with $k$. This would be consistent with what one might hope to prove by the calculus of variations if there were no other Yang–Mills solutions, or if the indices of such solutions increased with $k$. The first result along these lines was due to Bourguignon and Lawson (1981), who showed that the instanton solutions are the only local minima of the Yang–Mills functional over the 4-sphere. Subsequently, Taubes (1983) showed that the index of a non-instanton Yang–Mills connection on $P_k$ is at least $k + 1$. Taubes' proof used ideas related to the action of the quaternions and the hyper-Kähler structure on the $\tilde{M}_k$ (see the section on hyper-Kähler quotients). Contrary to some expectations, it was shown by
Sibner et al. (1989) that nonminimal solutions do exist; some later constructions were very explicit (Sadun and Segert 1992). Taubes' index bound gave ground for hope that an analytical proof of the Atiyah–Jones conjecture might be possible, but this is not at all straightforward. The problem is that in the critical dimension 4 a minimax sequence for the Yang–Mills functional in a given homotopy class may diverge, with curvature densities converging to sums of $\delta$-functions as outlined above. This is related to the fact that the $\tilde{M}_k$ are not compact. In a series of papers culminating in a framework for Morse theory for the Yang–Mills functional, Taubes (1998) succeeded in proving a partial version of the Atiyah–Jones conjecture, together with similar results for general base manifolds $M^4$. Taubes showed that, if the homotopy groups of the moduli spaces stabilize as $k \to \infty$, then the limit must be that predicted by Atiyah and Jones. Related analytical techniques were developed for other variational problems at the critical dimension involving “critical points at infinity.” The full Atiyah–Jones conjecture was established by Boyer et al. but using geometrical techniques: the “explicit” description of the moduli spaces obtained from the Atiyah–Drinfeld–Hitchin–Manin (ADHM) construction (see below). A different geometrical proof was given by Kirwan (1994), together with generalizations to other gauge groups.

There was a parallel story for the solutions of the Bogomolny equation over $\mathbb{R}^3$, which we will not recount in detail. Here the base dimension is below the critical case but the analytical difficulty arises from the noncompactness of $\mathbb{R}^3$. Taubes succeeded in overcoming this difficulty and obtained relations between the topology of the moduli space, the appropriate configuration space and the higher critical points. Again, these higher critical points exist but their index grows with the numerical parameter corresponding to $k$.
At about the same time, Donaldson (1984) showed that the moduli spaces could be identified with spaces of rational maps (subsequently extended to other gauge groups). The analog of the Atiyah–Jones conjecture is a result on the topology of spaces of rational maps proved earlier by Segal, which had been one of the motivations for Atiyah and Jones.

Higher Dimensions
While the scope for variational methods in Yang–Mills theory in higher dimensions is very limited, there are useful analytical results about solutions of the Yang–Mills equations. An important monotonicity result was obtained by Price (1983). For simplicity, consider a Yang–Mills connection over
the unit ball $B^n \subset \mathbb{R}^n$. Then Price showed that the normalized energy
\[ E(A, B(r)) = \frac{1}{r^{n-4}} \int_{|x| \le r} |F|^2 \, d\mu \]
is a nondecreasing function of $r$ (that is, it decreases as $r$ decreases). Nakajima (1988) and Uhlenbeck used this monotonicity to show that for each $n$ there is an $\epsilon$ such that if $A$ is a Yang–Mills connection over a ball with $E(A, B(r)) \le \epsilon$ then all derivatives of $A$, in a suitable gauge, can be controlled by $E(A, B(r))$. Tian (2000) showed that if $A_i$ is a sequence of Yang–Mills connections over a compact manifold $M$ with bounded Yang–Mills functional, then there is a subsequence which converges away from a set $Z$ of Hausdorff codimension at least 4 (extending the case of points in a 4-manifold). Moreover, the singular set $Z$ is a minimal subvariety, in a suitably generalized sense.

In higher dimensions, important examples of Yang–Mills connections arise within the framework of “calibrated geometry.” Here, we consider a Riemannian $n$-manifold $M$ with a covariant constant calibrating form $\Omega \in \Omega^{n-4}_M$. There is then an analog of the instanton equation
\[ F_A = \pm * (\Omega \wedge F_A) , \]
whose solutions minimize the Yang–Mills functional. This includes the Hermitian Yang–Mills equation over a Kähler manifold (see the section on moment maps) and also certain equations over manifolds with special holonomy groups (Donaldson and Thomas 1998). For these “higher-dimensional instantons,” Tian shows that the singular sets $Z$ that arise are calibrated varieties.

Gluing Techniques
Another set of ideas from PDEs and analysis which has had great impact in gauge theory involves the construction of solutions to appropriate equations by the following general scheme:
1. constructing an “approximate solution,” formed from some standard models using cutoff functions;
2. showing that the approximate solution can be deformed to a true solution by means of an implicit function theorem.
The heart of the second step usually consists of estimates for the relevant linear differential operator. Of course, the success of this strategy depends on the particular features of the problem. This approach, due largely to Taubes, has been particularly effective in finding solutions to the first-order instanton equations and their relatives. (The applicability of the approach is connected to the fact that
such solutions typically occur in moduli spaces and one can often “see” local coordinates in the moduli space by varying the parameters in the approximate solution.) Taubes applied this approach to the Bogomolny monopole equation over $\mathbb{R}^3$ (Jaffe and Taubes 1980) and to the construction of instantons over general 4-manifolds (Taubes 1982). In the latter case, the approximate solutions are obtained by transplanting standard solutions over $\mathbb{R}^4$ – with curvature density concentrated in a small ball – to small balls on the 4-manifold, glued to the trivial flat connection over the remainder of the manifold. These types of techniques have now become a fairly standard part of the armory of many differential geometers, working both within gauge theory and in other fields. An example of a problem where similar ideas have been used is Joyce's construction of compact manifolds with exceptional holonomy groups (Joyce 1996). (Of course, it is likely that similar techniques have been developed over the years in many other areas, but Taubes' work in gauge theory has done a great deal to bring them into prominence.)
Geometry: Integrability and Moduli Spaces

The Ward Correspondence
Suppose that $S$ is a complex surface and $\omega$ is the 2-form corresponding to a Hermitian metric on $S$. Then $S$ is an oriented Riemannian 4-manifold and $\omega$ is a self-dual form. The orthogonal complement of $\omega$ in $\Lambda^+$ can be identified with the real parts of forms of type (0, 2). Hence, if $A$ is an anti-self-dual instanton connection on a principal U($r$)-bundle over $S$ the (0, 2) part of the curvature of $A$ vanishes. This is the integrability condition for the $\bar{\partial}$-operator defined by the connection, acting on sections of the associated vector bundle $E \to S$. Thus, in the presence of the connection, the bundle $E$ is naturally a holomorphic bundle over $S$. The Ward correspondence (Ward 1977) builds on this idea to give a complete translation of the instanton equations over certain Riemannian 4-manifolds into holomorphic geometry. In the simplest case, let $A$ be an instanton on a bundle over $\mathbb{R}^4$. Then, for any choice of a linear complex structure on $\mathbb{R}^4$, compatible with the metric, $A$ defines a holomorphic structure. The choices of such a complex structure are parametrized by a 2-sphere; in fact, the unit sphere in $\Lambda^+(\mathbb{R}^4)$. So, for any $\lambda \in S^2$ we have a complex surface $S_\lambda$ and a holomorphic bundle over $S_\lambda$. These data can be viewed in the following way. We consider the
projection $\pi : \mathbb{R}^4 \times S^2 \to \mathbb{R}^4$ and the pull-back $\pi^*(E)$ to $\mathbb{R}^4 \times S^2$. This pull-back bundle has a connection which defines a holomorphic structure along each fiber $S_\lambda \subset \mathbb{R}^4 \times S^2$ of the other projection. The product $\mathbb{R}^4 \times S^2$ is the twistor space of $\mathbb{R}^4$ and it is in a natural way a three-dimensional complex manifold. It can be identified with the complement of a line $L_\infty$ in $\mathbb{CP}^3$, where the projection $\mathbb{R}^4 \times S^2 \to S^2$ becomes the fibration of $\mathbb{CP}^3 \setminus L_\infty$ by the complex planes through $L_\infty$. One can see then that $\pi^*(E)$ is naturally a holomorphic bundle over $\mathbb{CP}^3 \setminus L_\infty$. The construction extends to the conformal compactification $S^4$ of $\mathbb{R}^4$. If $S^4$ is viewed as the quaternionic projective line $\mathbb{HP}^1$ and we identify $\mathbb{H}^2$ with $\mathbb{C}^4$ in the standard way, we get a natural map $\pi : \mathbb{CP}^3 \to \mathbb{HP}^1$. Then $\mathbb{CP}^3$ is the twistor space of $S^4$ and an anti-self-dual instanton on a bundle $E$ over $S^4$ induces a holomorphic structure on the bundle $\pi^*(E)$ over $\mathbb{CP}^3$.

In general, the twistor space $Z$ of an oriented Riemannian 4-manifold $M$ is defined to be the unit sphere bundle in $\Lambda^+_M$. This has a natural almost-complex structure which is integrable if and only if the self-dual part of the Weyl curvature of $M$ vanishes (Atiyah et al. 1978). The antipodal map on the 2-sphere induces an antiholomorphic involution of $Z$. In such a case, an anti-self-dual instanton over $M$ lifts to a holomorphic bundle over $Z$. Conversely, a holomorphic bundle over $Z$ which is holomorphically trivial over the fibers of the fibration $Z \to M$ (projective lines in $Z$), and which satisfies a certain reality condition with respect to the antipodal map, arises from a unitary instanton over $M$. This is the Ward correspondence, part of Penrose's twistor theory.

The ADHM Construction
The problem of describing all solutions to the Yang–Mills instanton equation over $S^4$ is thus reduced to a problem in algebraic geometry, of classifying certain holomorphic vector bundles. This was solved by Atiyah et al. (1978). The resulting ADHM construction reduces the problem to certain matrix equations. The equations can be reduced to the following form. For a bundle of Chern class $k$ and rank $r$, we require a pair of $k \times k$ matrices $\alpha_1$, $\alpha_2$, a $k \times r$ matrix $a$, and an $r \times k$ matrix $b$. Then the equations are
\[ [\alpha_1, \alpha_2] = ab , \qquad [\alpha_1, \alpha_1^*] + [\alpha_2, \alpha_2^*] = aa^* - b^*b . \qquad [1] \]
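As an illustration of the matrix equations above, the following sketch checks them numerically for given data. It is not part of the ADHM papers: the helper names are hypothetical, matrices are stored as plain lists of rows, and the sample data ($k = 1$, $r = 2$, with $a = (\lambda, 0)$ and $b = (0, \lambda)^T$) is a simple choice made here for which both sides can be verified by hand. The open nondegeneracy conditions are not tested.

```python
def matmul(x, y):
    """Product of matrices stored as lists of rows."""
    return [[sum(x[i][m] * y[m][j] for m in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def dagger(x):
    """Conjugate transpose."""
    return [[complex(x[i][j]).conjugate() for i in range(len(x))]
            for j in range(len(x[0]))]


def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(len(x[0]))] for i in range(len(x))]


def sub(x, y):
    return [[x[i][j] - y[i][j] for j in range(len(x[0]))] for i in range(len(x))]


def commutator(x, y):
    return sub(matmul(x, y), matmul(y, x))


def max_abs(x):
    return max(abs(entry) for row in x for entry in row)


def adhm_residuals(alpha1, alpha2, a, b):
    """Largest entry of (lhs - rhs) for each of the two matrix equations.

    alpha1, alpha2 are k x k, a is k x r and b is r x k, as in the text.
    """
    first = sub(commutator(alpha1, alpha2), matmul(a, b))
    second = sub(add(commutator(alpha1, dagger(alpha1)),
                     commutator(alpha2, dagger(alpha2))),
                 sub(matmul(a, dagger(a)), matmul(dagger(b), b)))
    return max_abs(first), max_abs(second)


if __name__ == "__main__":
    lam = 0.7                      # scale parameter (illustrative value)
    alpha1 = [[0.3 + 0.2j]]        # 1 x 1 matrices: a point of C^2 = R^4
    alpha2 = [[-1.0 + 0.5j]]
    a = [[lam, 0.0]]               # k x r = 1 x 2
    b = [[0.0], [lam]]             # r x k = 2 x 1
    print(adhm_residuals(alpha1, alpha2, a, b))
```

For $k = 1$ the commutators vanish automatically, so the equations reduce to $ab = 0$ and $|a|^2 = |b|^2$; the sample data satisfies both, and changing an entry of $b$ makes the first residual nonzero.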
We also require certain open, nondegeneracy conditions. Given such matrix data, a holomorphic
bundle over $\mathbb{CP}^3$ is constructed via a “monad”: a pair of bundle maps over $\mathbb{CP}^3$
\[ \mathbb{C}^k \otimes \mathcal{O}(-1) \xrightarrow{\;D_1\;} \mathbb{C}^k \oplus \mathbb{C}^k \oplus \mathbb{C}^r \xrightarrow{\;D_2\;} \mathbb{C}^k \otimes \mathcal{O}(1) \]
with $D_2 D_1 = 0$. That is, the rank-$r$ holomorphic bundle we construct is $\operatorname{Ker} D_2 / \operatorname{Im} D_1$. The bundle maps $D_1$, $D_2$ are obtained from the matrix data in a straightforward way, in suitable coordinates. It is this matrix description which was used by Boyer et al. to prove the Atiyah–Jones conjecture on the topology of the moduli spaces of instantons. The only other case when the twistor space of a compact 4-manifold is an algebraic variety is the complex projective plane, with the nonstandard orientation. An analog of the ADHM description in this case was given by Buchdahl (1986).

Integrable Systems
The Ward correspondence can be viewed in the general framework of integrable systems. Working with the standard complex structure on $\mathbb{R}^4$, the integrability condition for the $\bar{\partial}$-operator takes the shape
\[ [\nabla_1 + i\nabla_2, \nabla_3 + i\nabla_4] = 0 , \]
where $\nabla_i$ are the components of the covariant derivative in the coordinate directions. So, the instanton equation can be viewed as a family of such commutator equations parametrized by $\lambda \in S^2$. One obtains many reductions of the instanton equation by imposing suitable symmetries. Solutions invariant under translation in one variable correspond to the Bogomolny “monopole equation” (Jaffe and Taubes 1980). Solutions invariant under three translations correspond to solutions of Nahm's equations,
\[ \frac{dT_i}{dt} = \epsilon_{ijk} [T_j, T_k] , \]
for matrix-valued functions $T_1$, $T_2$, $T_3$ of one variable $t$. Nahm (1982) and Hitchin (1983) developed an analog of the ADHM construction relating these two equations. This is now seen as a part of a general “Fourier–Mukai–Nahm transform” (Donaldson and Kronheimer 1990). The instanton equations for connections invariant under two translations, Hitchin's equations (Hitchin 1983), are locally equivalent to the harmonic map equation for a surface into the symmetric space dual to the structure group. Changing the signature of the metric on $\mathbb{R}^4$ to (2, 2), one gets the harmonic mapping equations into Lie groups (Hitchin 1990). More complicated reductions yield almost all the known examples of
integrable PDEs as special forms of the instanton equations (Mason and Woodhouse 1996).

Moment Maps: the Kobayashi–Hitchin Conjecture
Let $\Sigma$ be a compact Riemann surface. The Jacobian of $\Sigma$ is the complex torus $H^1(\Sigma, \mathcal{O})/H^1(\Sigma, \mathbb{Z})$: it parametrizes holomorphic line bundles of degree 0 over $\Sigma$. The Hodge theory (which was, of course, developed long before Hodge in this case) shows that the Jacobian can also be identified with the torus $H^1(\Sigma, \mathbb{R})/H^1(\Sigma, \mathbb{Z})$ which parametrizes flat U(1)-connections. That is, any holomorphic line bundle of degree 0 admits a unique compatible flat unitary connection. The generalization of these ideas to bundles of higher rank began with Weil. He observed that any holomorphic vector bundle of degree 0 admits a flat connection, not necessarily unitary. Narasimhan and Seshadri (1965) showed that (in the case of degree 0) the existence of a flat, irreducible, unitary connection was equivalent to an algebro-geometric condition of stability which had been introduced shortly before by Mumford, for quite different purposes. Mumford introduced the stability condition in order to construct separated moduli spaces of holomorphic bundles – generalizing the Jacobian – as part of his general geometric invariant theory. For bundles of nonzero degree, the discussion is slightly modified by the use of projectively flat unitary connections.

The result of Narasimhan and Seshadri asserts that there are two different descriptions of the same moduli space $M_{d,r}(\Sigma)$: either as parametrizing certain irreducible projectively flat unitary connections (representations of $\pi_1(\Sigma)$), or as parametrizing stable holomorphic bundles of degree $d$ and rank $r$. While Narasimhan and Seshadri probably did not view the ideas in these terms, another formulation of their result is that a certain nonlinear PDE for a Hermitian metric on a holomorphic bundle – analogous to the Laplace equation in the abelian case – has a solution when the bundle is stable. Atiyah and Bott (1982) cast these results in the framework of gauge theory.
(The Yang–Mills equations in two dimensions essentially reduce to the condition that the connection be flat, so they are rather trivial locally but have interesting global structure.) They made the important observation that the curvature of a connection furnishes a map
\[ F : \mathcal{A} \to \mathrm{Lie}(\mathcal{G})^* \]
which is an equivariant moment map for the action of the gauge group on $\mathcal{A}$. Here the symplectic form on the affine space $\mathcal{A}$ and the map from the adjoint
bundle-valued 2-forms to the dual of the Lie algebra of $\mathcal{G}$ are both given by integration of products of forms. From this point of view, the Narasimhan–Seshadri result is an infinite-dimensional example of a general principle relating symplectic and complex quotients.

At about the same time, Hitchin and Kobayashi independently proposed an extension of these ideas to higher dimensions. Let $E$ be a holomorphic bundle over a complex manifold $V$. Any compatible unitary connection on $E$ has curvature $F$ of type (1,1). Let $\omega$ be the (1,1)-form corresponding to a fixed Hermitian metric on $V$. The Hermitian Yang–Mills equation is the equation
\[ F \cdot \omega = \mu\, 1_E , \]
where $\mu$ is a constant (determined by the topological invariant $c_1(E)$). The Kobayashi–Hitchin conjecture is that, when $\omega$ is Kähler, this equation has an irreducible solution if and only if $E$ is a stable bundle in the sense of Mumford. Just as in the Riemann surface case, this equation can be viewed as a nonlinear second-order PDE of Laplace type for a metric on $E$. The moment map picture of Atiyah and Bott also extends to this higher-dimensional version. In the case when $V$ has complex dimension 2 (and $\mu$ is zero), the Hermitian Yang–Mills connections are exactly the anti-self-dual instantons, so the conjecture asserts that the moduli spaces of instantons can be identified with certain moduli spaces of stable holomorphic bundles.

The Kobayashi–Hitchin conjecture was proved in the most general form by Uhlenbeck and Yau (1986), and in the case of algebraic manifolds in Donaldson (1987). The proofs in Donaldson (1985, 1987) developed some extra structure surrounding these equations, connected with the moment map point of view. The equations can be obtained as the Euler–Lagrange equations for a nonlocal functional, related to the renormalized determinants of Quillen and Bismut. The results have been extended to non-Kähler manifolds and certain noncompact manifolds.
There are also many extensions to equations for systems of data comprising a bundle with additional structure such as a holomorphic section or Higgs field (Bradlow et al. 1995), or a parabolic structure along a divisor. Hitchin's equations (Hitchin 1987) are a particularly rich example.

Topology of Moduli Spaces
The moduli spaces $M_{d,r}(\Sigma)$ of stable holomorphic bundles/projectively flat unitary connections over Riemann surfaces have been studied intensively from many points of view. They have natural Kähler structures: the complex structure being visible in the holomorphic bundles guise and the symplectic form as the “Marsden–Weinstein quotient” in the unitary connections guise. In the case when $r$ and $d$ are coprime, they are compact manifolds with complicated topologies. There is an important basic construction for producing cohomology classes over these (and other) moduli spaces. One takes a universal bundle $U$ over the product $M \times \Sigma$ with Chern classes
\[ c_i(U) \in H^{2i}(M \times \Sigma) . \]
Then, for any class $\alpha \in H_p(\Sigma)$, we get a cohomology class $c_i(U)/\alpha \in H^{2i-p}(M)$. Thus, if $R$ is the graded ring freely generated by such classes, we have a homomorphism $\nu : R \to H^*(M)$. The questions about the topology of the moduli spaces which have been studied include:
1. finding the Betti numbers of the moduli space $M$;
2. identifying the kernel of $\nu$;
3. giving an explicit system of generators and relations for the ring $H^*(M)$;
4. identifying the Pontryagin and Chern classes of $M$ within $H^*(M)$; and
5. evaluating the pairings
\[ \int_M \nu(W) \]
for elements $W$ of the appropriate degree in $R$.
All of these questions have now been solved quite satisfactorily. In early work, Newstead (1967) found the Betti numbers in the rank-2 case. The main aim of Atiyah and Bott was to apply the ideas of Morse theory to the Yang–Mills functional over a Riemann surface and they were able to reproduce Newstead's results in this way and extend them to higher rank. They also showed that the map $\nu$ is a surjection, so the universal bundle construction gives a system of generators for the cohomology. Newstead made conjectures on the vanishing of the Pontryagin and Chern classes above a certain range which were established by Kirwan and extended to higher rank by Earl and Kirwan (1999). Knowing that $R$ maps onto $H^*(M)$, a full set of relations can (by Poincaré duality) be deduced in principle from a knowledge of the integral pairings in (5) above, but this is not very explicit. A solution to (5) in the case of rank 2 was found by Thaddeus (1992). He used results from the Verlinde theory (see section on 3-manifolds below) and the Riemann–Roch formula. Another point of view was developed by Witten (1991), who showed that the volume of the moduli space was related to the theory of torsion in algebraic topology and satisfied simple gluing axioms. These different
points of view are compared in Donaldson (1993). Using a nonrigorous localization principle in infinite dimensions, Witten (1992) wrote down a general formula for the pairings (5) in any rank, and this was established rigorously by Jeffrey and Kirwan, using a finite-dimensional version of the same localization method. A very simple and explicit set of generators and relations for the cohomology (in the rank-2 case) was given by King and Newstead (1998). Finally, the quantum cohomology of the moduli space, in the rank-2 case, was identified explicitly by Munoz (1999).

Hyper-Kähler Quotients
Much of this story about the structure of moduli spaces extends to higher dimensions and to the moduli spaces of connections and Higgs fields. A particularly notable extension of the ideas involves hyper-Kähler structures. Let $M$ be a hyper-Kähler 4-manifold, so there are three covariant-constant self-dual forms $\omega_1$, $\omega_2$, $\omega_3$ on $M$. These correspond to three complex structures $I_1$, $I_2$, $I_3$ obeying the algebra of the quaternions. If we single out one structure, say $I_1$, the instantons on $M$ can be viewed as holomorphic bundles with respect to $I_1$ satisfying the moment map condition (Hermitian Yang–Mills equation) defined by the form $\omega_1$. Taking a different complex structure interchanges the role of the moment map and integrability conditions. This can be put in a general framework of hyper-Kähler quotients due to Hitchin et al. (1987). Suppose initially that $M$ is compact (so either a K3 surface or a torus). Then the $\omega_i$ components of the curvature define three maps
\[ F_i : \mathcal{A} \to \mathrm{Lie}(\mathcal{G})^* . \]
The structures on $M$ make $\mathcal{A}$ into a flat hyper-Kähler manifold and the three maps $F_i$ are the moment maps for the gauge group action with respect to the three symplectic forms on $\mathcal{A}$. In this situation, it is a general fact that the hyper-Kähler quotient – the quotient by $\mathcal{G}$ of the common zero set of the three moment maps – has a natural hyper-Kähler structure. This hyper-Kähler quotient is just the moduli space of instantons over $M$. In the case when $M$ is the noncompact manifold $\mathbb{R}^4$, the same ideas apply except that one has to work with the based gauge group $\mathcal{G}_0$. The conclusion is that the framed moduli spaces $\tilde{M}$ of instantons over $\mathbb{R}^4$ are naturally hyper-Kähler manifolds. One can also see this hyper-Kähler structure through the ADHM matrix description. A variant of these matrix equations was used by Kronheimer to construct “gravitational instantons.” The same ideas also
apply to the moduli spaces of monopoles, where the hyper-Kähler metric, in the simplest case, was studied by Atiyah and Hitchin (1989).
Low-Dimensional Topology

Instantons and 4-Manifolds
Gauge theory has had unexpected applications in low-dimensional topology, particularly the topology of smooth 4-manifolds. The first work in this direction, in the early 1980s, involved the Yang–Mills instantons. The main issue in 4-manifold theory at that time was the correspondence between the diffeomorphism classification of simply connected 4-manifolds and the classification up to homotopy. The latter is determined by the intersection form, a unimodular quadratic form on the second integral homology group (i.e., a symmetric matrix with integral entries and determinant $\pm 1$, determined up to integral change of basis). The only known restriction was Rohlin's theorem, which asserts that if the form is even the signature must be divisible by 16. The achievement of the first phase of the theory was to show that
1. There are unimodular forms which satisfy the hypotheses of Rohlin's theorem but which do not appear as the intersection forms of smooth 4-manifolds. In fact, no nonstandard definite form, such as a sum of copies of the $E_8$ matrix, can arise in this way.
2. There are simply connected smooth 4-manifolds which have isomorphic intersection forms, and hence are homotopy equivalent, but which are not diffeomorphic.
These results stand in contrast to the homeomorphism classification which was obtained by Freedman shortly before and which is almost the same as the homotopy classification.

The original proof of item (1) above argued with the moduli space $\mathcal{M}$ of anti-self-dual SU(2) instantons on a bundle with $c_2 = 1$ over a simply connected Riemannian 4-manifold $M$ with a negative-definite intersection form (Donaldson 1983). In the model case when $M$ is the 4-sphere the moduli space $\mathcal{M}$ can be identified explicitly with the open 5-ball. Thus the 4-sphere arises as the natural boundary of the moduli space.
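The algebraic properties of the $E_8$ form mentioned in item (1) above can be checked by direct computation. The sketch below is an illustration added here, with hypothetical function names; it builds the matrix from one labeling of the $E_8$ Dynkin diagram (a chain of seven nodes with an eighth attached to the third) in its positive-definite version, which differs from the negative-definite convention used in the proof only by a change of orientation.

```python
from fractions import Fraction


def e8_matrix():
    """Gram matrix of the E8 form: 2 on the diagonal, -1 for each diagram edge.

    Nodes 0..6 form a chain and node 7 is attached to node 2 (a labeling
    chosen for this sketch).
    """
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7)]
    m = [[2 if i == j else 0 for j in range(8)] for i in range(8)]
    for i, j in edges:
        m[i][j] = m[j][i] = -1
    return m


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def is_positive_definite(matrix):
    """Sylvester's criterion: all leading principal minors are positive."""
    n = len(matrix)
    return all(determinant([row[:k] for row in matrix[:k]]) > 0
               for k in range(1, n + 1))


def is_even(matrix):
    """An integral symmetric form is even exactly when its diagonal is even."""
    return all(matrix[i][i] % 2 == 0 for i in range(len(matrix)))


def direct_sum(x, y):
    """Block-diagonal sum of two square matrices."""
    n, m = len(x), len(y)
    return [row + [0] * m for row in x] + [[0] * n + row for row in y]


if __name__ == "__main__":
    e8 = e8_matrix()
    double = direct_sum(e8, e8)
    for name, form in (("E8", e8), ("E8 + E8", double)):
        print(name, "rank", len(form), "det", determinant(form),
              "even", is_even(form), "definite", is_positive_definite(form))
```

Both forms are even, definite and of determinant 1. A single copy has signature 8 and is already excluded by Rohlin's theorem, whereas the sum of two copies has signature 16 and passes that test, so it is an example of the kind described in item (1).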
A sequence of points in the moduli space converging to a boundary point corresponds to a sequence of connections with curvature densities converging to a $\delta$-function, as described earlier. One shows that in the general case (under our hypotheses on the 4-manifold $M$) the moduli space $\mathcal{M}$ has similar behavior: it contains a collar $M \times (0, \delta)$ formed by instantons made using Taubes’ gluing construction, described previously. The complement of this collar is compact. In the interior of the moduli space, there are a finite number of special points corresponding to U(1)-reductions of the bundle $P$. This is the way in which the moduli space “sees” the integral structure of the intersection form, since such reductions correspond to integral homology classes with self-intersection $-1$. Neighborhoods of these special points are modeled on quotients $\mathbb{C}^3/U(1)$; that is, cones on copies of $\mathbb{CP}^2$. The upshot is that (for generic Riemannian metrics on $M$) the moduli space gives a cobordism from the manifold $M$ to a set of copies of $\mathbb{CP}^2$ which can be counted in terms of the intersection form, and the result follows easily from standard topology. More sophisticated versions of the argument extended the results to rule out some indefinite intersection forms.

On the other hand, the original proofs of item (2) used “invariants” defined by instanton moduli spaces (Donaldson 1990). The general scheme exploits the same construction outlined in the previous section. We suppose that $M$ is a simply connected 4-manifold with $b^+(M) = 1 + 2p$, where $p > 0$ is an integer. (Here $b^+(M)$ is, as usual, the number of positive eigenvalues of the intersection matrix.) Ignoring some technical restrictions, there is a map
$$\mu : R_M \to H^*(\mathcal{M}_k)$$

where $R_M$ is a graded ring freely generated by the homology (below the top dimension) of the 4-manifold $M$ and $\mathcal{M}_k$ is the moduli space of anti-self-dual SU(2)-instantons on a bundle with $c_2 = k > 0$. For an element $W$ in $R_M$ of the appropriate degree, one obtains a number by evaluating, or integrating, $\mu(W)$ on $\mathcal{M}_k$. The main technical difficulty here is that the moduli space $\mathcal{M}_k$ is rarely compact, so one needs to make sense of this “evaluation.” With all the appropriate technicalities in place, these invariants could be shown to distinguish various homotopy-equivalent, homeomorphic 4-manifolds. All these early developments are described in detail in the book by Donaldson and Kronheimer (1990).

Basic Classes
Until the early 1990s, these instanton invariants could only be calculated in isolated favorable cases (although the calculations which were made, through the work of many mathematicians, led to a large number of further results about 4-manifold topology). Deeper understanding of their structure came with the work of Kronheimer and Mrowka. This work was, in large part,
motivated by a natural question in geometric topology. Any homology class $\alpha \in H_2(M; \mathbb{Z})$ can be represented by an embedded, connected, smooth surface. One can define an integer $g(\alpha)$ to be the minimal genus of such a representative. The problem is to find $g(\alpha)$, or at least bounds on it. A well-known conjecture, ascribed to Thom, was that when $M$ is the complex projective plane the minimal genus is realized by a complex curve; that is,

$$g(\pm dH) = \tfrac{1}{2}(d - 1)(d - 2)$$

where $H$ is the standard generator of $H_2(\mathbb{CP}^2)$ and $d \geq 1$. The new geometrical idea introduced by Kronheimer and Mrowka was to study instantons over a 4-manifold $M$ with singularities along a surface $\Sigma \subset M$. For such connections, there is a real parameter: the limit of the trace of the holonomy around small circles linking the surface. By varying this parameter, they were able to interpolate between moduli spaces of nonsingular instantons on different bundles over $M$ and obtain relations between the different invariants. They also found that if the genus of $\Sigma$ is suitably small then some of the invariants are forced to vanish, thus, conversely, getting information about $g$ for 4-manifolds with nontrivial invariants. For example, they showed that if $M$ is a K3 surface then $g(\alpha) = \tfrac{1}{2}(\alpha \cdot \alpha + 2)$.

The structural results of Kronheimer and Mrowka (1995) introduced the notion of a 4-manifold of “simple type.” Write the invariant defined above by the moduli space $\mathcal{M}_k$ as $I_k : R_M \to \mathbb{Q}$. Then $I_k$ vanishes except on terms of degree $2d(k)$, where $d(k) = 4k - 3(1 + p)$. We can put all these together to define $I = \sum_k I_k : R_M \to \mathbb{Q}$. The ring $R_M$ is a polynomial ring generated by classes $\alpha \in H_2(M)$, which have degree 2 in $R_M$, and a class $X$ of degree 4 in $R_M$, corresponding to the generator of $H_0(M)$. The 4-manifold is of simple type if

$$I(X^2 W) = 4 I(W)$$

for all $W \in R_M$. Under this condition, Kronheimer and Mrowka showed that all the invariants are determined by a finite set of “basic” classes $K_1, \ldots, K_s \in H_2(M)$ and rational numbers $\beta_1, \ldots, \beta_s$.
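As a small numerical illustration of the formulas in this section, the sketch below tabulates the conjectured minimal genus in the complex projective plane, the K3 genus formula, and the degree $d(k)$. The function names are hypothetical and chosen only for this example; the K3 helper assumes an even, nonnegative self-intersection number, a restriction added here so that the quoted formula returns a nonnegative integer.

```python
def thom_genus(d):
    """Genus (d - 1)(d - 2)/2 of a smooth complex curve of degree d >= 1
    in the complex projective plane, as in the formula quoted above."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    # (d - 1)(d - 2) is a product of consecutive integers, hence even.
    return (d - 1) * (d - 2) // 2


def k3_genus(self_intersection):
    """Minimal genus (a.a + 2)/2 on a K3 surface, as quoted above.

    Assumption for this sketch: a.a is even and nonnegative.
    """
    if self_intersection < 0 or self_intersection % 2 != 0:
        raise ValueError("expected an even, nonnegative self-intersection")
    return (self_intersection + 2) // 2


def invariant_degree(k, p):
    """d(k) = 4k - 3(1 + p) for a 4-manifold with b^+ = 1 + 2p;
    the invariant I_k is supported in degree 2 * d(k)."""
    return 4 * k - 3 * (1 + p)


genera = [thom_genus(d) for d in range(1, 6)]
print(genera)
print(k3_genus(0), k3_genus(2))
# A K3 surface has b^+ = 3, so p = 1.
print(invariant_degree(2, 1))
```

For $d = 3$ the first function returns 1, matching the fact that a smooth plane cubic is a torus.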
To express the relation between the invariants and the basic classes, they form a generating function

$$D_M(\alpha) = I(e^{\alpha}) + I\left(\tfrac{X}{2}\, e^{\alpha}\right).$$

This is a priori a formal power series in $H_2(M)$ but a posteriori the series converges and can be regarded as a function on $H_2(M)$. Kronheimer and Mrowka’s result is that

$$D_M(\alpha) = \exp\left(\frac{\alpha \cdot \alpha}{2}\right) \sum_{r=1}^{s} \beta_r\, e^{K_r \cdot \alpha}.$$

It is not known whether all simply connected 4-manifolds are of simple type, but Kronheimer and Mrowka were able to show that this is the case for a multitude of examples. They also introduced a weaker notion of “finite type,” and this condition was shown to hold in general by Muñoz and Frøyshov. The overall result of this work of Kronheimer and Mrowka was to make the calculation of the instanton invariants for many familiar 4-manifolds a comparatively straightforward matter.

3-Manifolds: Casson’s Invariant
Gauge theory has also entered into 3-manifold topology. In 1985, Casson introduced a new integer-valued invariant of oriented homology 3-spheres which “counts” the set $Z$ of equivalence classes of irreducible flat SU(2)-connections, or equivalently irreducible representations $\pi_1(Y) \to SU(2)$. Casson’s approach (Akbulut and McCarthy 1990) was to use a Heegaard splitting of a 3-manifold $Y$ into two handlebodies $Y^+$, $Y^-$ with a surface $\Sigma$ as common boundary. Then $\pi_1(\Sigma)$ maps onto $\pi_1(Y)$ and a flat SU(2) connection on $Y$ is determined by its restriction to $\Sigma$. Let $\mathcal{M}(\Sigma)$ be the moduli space of irreducible flat connections over $\Sigma$ (as discussed in the last section) and let $L^{\pm} \subset \mathcal{M}(\Sigma)$ be the subsets which extend over $Y^{\pm}$. Then $L^{\pm}$ are submanifolds of half the dimension of $\mathcal{M}(\Sigma)$ and the set $Z$ can be identified with the intersection $L^+ \cap L^-$. The Casson invariant is one-half the algebraic intersection number of $L^+$ and $L^-$. Casson showed that this is independent of the Heegaard splitting (and is also, in fact, an integer, although this is not obvious). He showed that when $Y$ is changed by Dehn surgery along a knot, the invariant changes by a term computed from the Alexander polynomial of the knot. This makes the Casson invariant computable in examples. (For a discussion of Casson’s formula see Donaldson (1999).) Taubes showed that the Casson invariant could also be obtained in a more differential-geometric fashion, analogous to the instanton invariants of 4-manifolds (Taubes 1990).

3-Manifolds: Floer Theory
Independently, at about the same time, Floer (1989) introduced more sophisticated invariants – the Floer homology groups – of homology 3-spheres, using gauge theory. This development ran parallel to his introduction of similar ideas in symplectic geometry. Suppose, for simplicity, that the set $Z$ of equivalence classes of irreducible flat connections is finite. For pairs $\rho_-, \rho_+$ in $Z$, Floer considered the instantons on the tube $Y \times \mathbb{R}$ asymptotic to $\rho_{\pm}$ at $\pm\infty$. There is an infinite set of moduli spaces of such instantons, labeled by a relative Chern class, but the dimensions of these moduli spaces agree modulo 8. This gives a relative index $\delta(\rho_-, \rho_+) \in \mathbb{Z}/8$. If $\delta(\rho_-, \rho_+) = 1$ there is a moduli space of dimension 1 (possibly empty), but the translations of the tube act on this moduli space and, dividing by translations, we get a finite set. The number of points in this set, counted with suitable signs, gives an integer $n(\rho_-, \rho_+)$. Then, Floer considers the free abelian group

$$C = \bigoplus_{\rho \in Z} \mathbb{Z}\langle \rho \rangle$$
generated by the set $Z$ and a map $\partial : C \to C$ defined by

$$\partial \langle \rho_- \rangle = \sum_{\rho_+} n(\rho_-, \rho_+)\, \langle \rho_+ \rangle.$$

Here the sum runs over the $\rho_+$ with $\delta(\rho_-, \rho_+) = 1$. Floer showed that $\partial^2 = 0$ and the homology $HF_*(Y) = \ker \partial / \operatorname{Im} \partial$ is independent of the metric on $Y$ (and various other choices made in implementing the construction in detail). The chain complex $C$ and hence the Floer homology can be graded by $\mathbb{Z}/8$, using the relative index, so the upshot is to define 8 abelian groups $HF_i(Y)$: invariants of the 3-manifold $Y$. The Casson invariant appears now as the Euler characteristic of the Floer homology. There has been extensive work on extending these ideas to other 3-manifolds (not homology spheres) and gauge groups, but this line of research does not yet seem to have reached a clear-cut conclusion.

Part of the motivation for Floer’s work came from Morse theory, and particularly the approach to this theory expounded by Witten (1982). The Chern–Simons functional is a map

$$CS : \mathcal{A}/\mathcal{G} \to \mathbb{R}/\mathbb{Z}$$

from the space of SU(2)-connections over $Y$, modulo gauge equivalence. Explicitly, in a trivialization of the bundle, and up to an overall normalizing constant,

$$CS(A) = \int_Y \operatorname{Tr}\left( A \wedge dA + \tfrac{2}{3}\, A \wedge A \wedge A \right).$$
It appears as a boundary term in the Chern–Weil theory for the second Chern class, in much the same way that holonomy appears as a boundary term in the Gauss–Bonnet theorem. The set $Z$ can be identified with the critical points of $CS$ and the instantons on the tube as integral curves of the gradient vector field of $CS$. Floer’s definition mimics the definition of homology in ordinary Morse theory, taking Witten’s point of view. It can be regarded formally as the “middle-dimensional” homology of the infinite-dimensional space $\mathcal{A}/\mathcal{G}$. See Atiyah (1988) and Cohen et al. (1995) for discussions of these ideas.

The Floer theory interacts with 4-manifold invariants, making up a structure approximating to a (3 + 1)-dimensional topological field theory (Atiyah 1988). Roughly, the numerical invariants of closed 4-manifolds generalize to invariants for a 4-manifold $M$ with boundary $Y$ taking values in the Floer homology of $Y$. If two such manifolds are glued along a common boundary, the invariants of the result are obtained by a pairing in the Floer groups. There are, however, at the moment, some substantial technical restrictions on this picture. This theory, as well as Floer’s original construction, is developed in detail by Donaldson (2002). At the time of writing, the Floer homology groups are still difficult to compute in examples. One important tool is a surgery exact sequence found by Floer (Braam and Donaldson 1995), related to Casson’s surgery formula.

3-Manifolds: Jones–Witten Theory
There is another, quite different, way in which ideas from gauge theory have entered 3-manifold topology. This is the Jones–Witten theory of knot and 3-manifold invariants. This theory falls outside the main line of this article, but we will say a little about it since it draws on many of the ideas we have discussed. The goal of the theory is to construct a family of (2 + 1)-dimensional topological field theories indexed by an integer $k$, assigning a complex vector space $H_k(\Sigma)$ to a surface $\Sigma$ and an invariant in $H_k(\partial Y)$ to a 3-manifold-with-boundary $Y$. If $\partial Y$ is empty, the vector space $H_k(\partial Y)$ is taken to be $\mathbb{C}$, so one seeks numerical invariants of closed 3-manifolds. Witten’s (1989) idea is that these invariants of closed 3-manifolds are Feynman integrals

$$\int_{\mathcal{A}/\mathcal{G}} e^{i 2\pi k\, CS(A)}\, \mathcal{D}A$$
This functional integral is probably best regarded as a schematic rather than a rigorous notion. The data associated with surfaces can, however, be defined rigorously. If we fix a complex structure $I$ on $\Sigma$, we can define a vector space $H_k(\Sigma, I)$ to be

$$H_k(\Sigma, I) = H^0(\mathcal{M}(\Sigma); L^k)$$

where $\mathcal{M}(\Sigma)$ is the moduli space of stable holomorphic bundles/flat unitary connections over $\Sigma$ and $L$ is a certain holomorphic line bundle over $\mathcal{M}(\Sigma)$. These are the spaces of “conformal blocks” whose dimension is given by the Verlinde formulas. Recall that $\mathcal{M}(\Sigma)$, as a symplectic manifold, is canonically associated with the surface $\Sigma$, without any choice of complex structure. The Hilbert spaces $H_k(\Sigma, I)$ can be regarded as the quantization of this symplectic manifold, in the general framework of geometric quantization: the inverse of $k$ plays the role of Planck’s constant. What is not obvious is that this quantization is independent of the complex structure chosen on the Riemann surface: that is, that there is a natural identification of the vector spaces (or at least the associated projective spaces) formed by using different complex structures. This was established rigorously by Hitchin (1990) and Axelrod et al. (1991), who constructed a projectively flat connection on the bundle of spaces $H_k(\Sigma, I)$ over the space of complex structures $I$ on $\Sigma$. At a formal level, these constructions are derived from the construction of the metaplectic representation of a linear symplectic group, since the $\mathcal{M}(\Sigma)$ are symplectic quotients of an affine symplectic space. The Jones–Witten invariants have been rigorously established by indirect means, but it seems that there is still work to be done in developing Witten’s point of view. If $Y^+$ is a 3-manifold with boundary, one would like to have a geometric definition of a vector in $H_k(\partial Y^+)$. This should be the quantized version of the submanifold $L^+$ (which is Lagrangian in $\mathcal{M}(\Sigma)$) entering into the Casson theory.

Seiberg–Witten Invariants
The instanton invariants of a 4-manifold can be regarded as the integrals of certain natural differential forms over the moduli spaces of instantons. Witten (1988) showed that these invariants could be obtained as functional integrals, involving a variant of the Feynman integral, over the space of connections and certain auxiliary fields (insofar as this latter integral is defined at all). A geometric explanation of Witten’s construction was given by Atiyah and Jeffrey (1990). Developing this point of view, Witten made a series of predictions about the instanton invariants, many of which were subsequently verified by other means. This line of work culminated in 1994 when, applying developments in supersymmetric Yang–Mills QFT, Seiberg and Witten introduced a new system of invariants and a precise prediction as to how these should be related to the earlier ones.

The Seiberg–Witten invariants (Witten 1994) are associated with a $\mathrm{Spin}^c$ structure on a 4-manifold $M$. If $M$ is simply connected this is specified by a class $K \in H^2(M; \mathbb{Z})$ lifting $w_2(M)$. One has spin bundles
$S^+, S^- \to M$ with $c_1(S^{\pm}) = K$. The Seiberg–Witten equation is for a spinor field $\psi$ – a section of $S^+$ – and a connection $A$ on the complex line bundle $\Lambda^2 S^+$. This gives a connection on $S^+$ and hence a Dirac operator

$$D_A : \Gamma(S^+) \to \Gamma(S^-).$$

The Seiberg–Witten equations are

$$D_A \psi = 0, \qquad F_A^+ = \sigma(\psi)$$

where $\sigma : S^+ \to \Lambda^+$ is a certain natural quadratic map. The crucial differential-geometric feature of these equations arises from the Weitzenböck formula

$$D_A^* D_A \psi = \nabla_A^* \nabla_A \psi + \frac{R}{4}\, \psi + \rho(F^+)\, \psi$$
where $R$ is the scalar curvature and $\rho$ is a natural map from $\Lambda^+$ to the endomorphisms of $S^+$. Then $\rho$ is adjoint to $\sigma$ and

$$\langle \rho(\sigma(\psi))\, \psi, \psi \rangle = |\psi|^4.$$

It follows easily from this that the moduli space of solutions to the Seiberg–Witten equation is compact. The most important invariants arise when $K$ is chosen so that

$$K \cdot K = 2\chi(M) + 3\, \mathrm{sign}(M)$$

where $\chi(M)$ is the Euler characteristic and $\mathrm{sign}(M)$ is the signature. (This is just the condition for $K$ to correspond to an almost-complex structure on $M$.) In this case, the moduli space of solutions is zero dimensional (after generic perturbation) and the Seiberg–Witten invariant $SW(K)$ is the number of points in the moduli space, counted with suitable signs.

Witten’s conjecture relating the invariants, in its simplest form, is that when $M$ has simple type the classes $K$ for which $SW(K)$ is nonzero are exactly the basic classes $K_r$ of Kronheimer and Mrowka and that

$$\beta_r = 2^{C(M)}\, SW(K_r)$$

where $C(M) = 2 + \tfrac{1}{4}(7\chi(M) + 11\, \mathrm{sign}(M))$. This asserts that the two sets of invariants contain exactly the same information about the 4-manifold. The evidence for this conjecture, via calculations of examples, is very strong. A somewhat weaker statement has been proved rigorously by Feehan and Leness (2003). They use an approach suggested by Pidstrigach and Tyurin, studying moduli spaces of solutions to a nonabelian version of the Seiberg–Witten equations. These contain both the instanton and abelian Seiberg–Witten moduli spaces, and the strategy is to relate the topology of these two sets by standard localization arguments. (This approach is related to ideas introduced by Thaddeus (1994) in the case of bundles over Riemann surfaces.) The serious technical difficulty in this approach stems from the lack of compactness of the nonabelian moduli spaces. The more general versions of Witten’s conjecture (Moore and Witten 1997) (e.g., when $b^+(M) = 1$) contain very complicated formulas, involving modular forms, which presumably arise as contributions from the compactification of the moduli spaces.

Applications
Regardless of the connection with the instanton theory, one can go ahead directly to apply the Seiberg–Witten invariants to 4-manifold topology, and this has been the main direction of research since the 1990s. The features of the Seiberg–Witten theory which have led to the most prominent developments are the following.

1. The reduction of the equations to two dimensions is very easy to understand. This has led to proofs of the Thom conjecture and wide-ranging generalizations (Ozsváth and Szabó 2000).
2. The Weitzenböck formula implies that, if $M$ has positive scalar curvature, then solutions to the Seiberg–Witten equations must have $\psi = 0$. This has led to important interactions with four-dimensional Riemannian geometry (LeBrun 1996).
3. In the case when $M$ is a symplectic manifold, there is a natural deformation of the Seiberg–Witten equations, discovered by Taubes (1996), who used it to show that the Seiberg–Witten invariants of $M$ are nontrivial. More generally, Taubes showed that for large values of the deformation parameter the solutions of the deformed equation localize around surfaces in the 4-manifold and used this to relate the Seiberg–Witten invariants to the Gromov theory of pseudoholomorphic curves. These results of Taubes have completely transformed the subject of four-dimensional symplectic geometry.

Bauer and Furuta (2004) have combined the Seiberg–Witten theory with more sophisticated algebraic topology to obtain further results about 4-manifolds. They consider the map from the space of connections and spinor fields defined by the formulas on the left-hand side of the equations. The general idea is to obtain invariants from the homotopy class of this map, under a suitable notion of homotopy. A technical complication arises from the gauge group action, but this can be reduced to the action of a single U(1). Ignoring this issue, Bauer and Furuta have obtained invariants in the stable homotopy groups $\lim_{N \to \infty} \pi_{N+r}(S^N)$, which reduce to the ordinary numerical invariants when $r = 1$. Using these invariants, they obtain results about connected sums of 4-manifolds, for which the ordinary invariants are trivial.

Using ideas of refined cobordism invariants, Furuta made great progress towards resolving the question of which intersection forms arise from smooth, simply connected 4-manifolds. A well-known conjecture is that, if such a manifold is spin, then the second Betti number satisfies

$$b_2(M) \geq \tfrac{11}{8}\, |\mathrm{sign}(M)|.$$

Furuta (2001) proved that $b_2(M) \geq \tfrac{10}{8}\, |\mathrm{sign}(M)| + 2$.

An important and very recent achievement, bringing together many different lines of work, is the proof of “Property P” in 3-manifold topology by Kronheimer and Mrowka (2004). This asserts that one cannot obtain a homotopy sphere (a counterexample to the Poincaré conjecture) by +1-surgery along a nontrivial knot in $S^3$. The proof uses work of Gabai and Eliashberg to show that the manifold obtained by 0-framed surgery is embedded in a symplectic 4-manifold; Taubes’ results to show that the Seiberg–Witten invariants of this 4-manifold are nontrivial; Feehan and Leness’ partial proof of Witten’s conjecture to show that the same is true for the instanton invariants; and the gluing rule and Floer’s exact sequence to show that the Floer homology of the +1-surgered manifold is nontrivial. It follows then from the definition of Floer homology that the fundamental group of this manifold is not trivial; in fact, it must have an irreducible representation in SU(2).

See also: Cotangent Bundle Reduction; Floer Homology; Gauge Theories from Strings; Gauge Theoretic Invariants of 4-Manifolds; Instantons: Topological Aspects; Knot Homologies; Moduli Spaces: An Introduction; Nonperturbative and Topological Aspects of Gauge Theory; Seiberg–Witten Theory; Topological Quantum Field Theory: Overview; Variational Techniques for Ginzburg–Landau Energies.
Further Reading

Akbulut S and McCarthy J (1990) Casson’s Invariant for Homology 3-Spheres. Princeton, NJ: Princeton University Press.
Atiyah MF (1988) New invariants for 3 and 4 dimensional manifolds. In: The Mathematical Heritage of Hermann Weyl, Proceedings of Symposia in Pure Mathematics, vol. 48, pp. 285–299. American Mathematical Society.
Atiyah MF (1988) Topological Quantum Field Theories, vol. 68, pp. 135–186. Math. Publ. IHES.
Atiyah MF and Bott R (1982) The Yang–Mills equations over Riemann surfaces. Philosophical Transactions of the Royal Society of London, Series A 308: 523–615.
Atiyah MF, Drinfeld V, Hitchin NJ, and Manin YuI (1978) Construction of instantons. Physics Letters A 65: 185–187.
Atiyah MF and Hitchin NJ (1989) The Geometry and Dynamics of Magnetic Monopoles. Princeton, NJ: Princeton University Press.
Atiyah MF, Hitchin NJ, and Singer IM (1978) Self-duality in four-dimensional Riemannian geometry. Proceedings of the Royal Society of London, Series A 362: 425–461.
Atiyah MF and Jeffrey L (1990) Topological Lagrangians and cohomology. Journal of Geometry and Physics 7: 119–136.
Atiyah MF and Jones JDS (1978) Topological aspects of Yang–Mills theory. Communications in Mathematical Physics 61: 97–118.
Axelrod S, Della Pietra S, and Witten E (1991) Geometric quantisation of Chern–Simons gauge theories. Journal of Differential Geometry 33: 787–902.
Bauer S and Furuta M (2004) A stable cohomotopy refinement of the Seiberg–Witten invariants, I. Inventiones Mathematicae 155: 1–19.
Bourguignon J-P and Lawson HB (1981) Stability and isolation phenomena for Yang–Mills fields. Communications in Mathematical Physics 79: 189–230.
Boyer C, Mann B, Hurtubise J, and Milgram R (1993) The topology of instanton moduli spaces. I: the Atiyah–Jones conjecture. Annals of Mathematics 137: 561–609.
Braam PJ and Donaldson SK (1995) Floer’s work on instanton homology, knots and surgery. In: Hofer et al. (eds.) The Floer Memorial Volume, vol. 133, Progress in Mathematics. Basle: Birkhäuser.
Bradlow S, Daskalopoulos G, Garcia-Prada O, and Wentworth R (1995) Stable augmented bundles over Riemann surfaces. In: Hitchin et al. (eds.) Vector Bundles in Algebraic Geometry, pp. 15–67. Cambridge: Cambridge University Press.
Buchdahl NP (1986) Instantons on CP2. Journal of Differential Geometry 24: 19–52.
Cohen RL, Jones JDS, and Segal GB (1995) Floer’s infinite-dimensional Morse theory and homotopy theory. In: Hofer et al. (eds.) The Floer Memorial Volume, pp. 297–325. Basle: Birkhäuser.
Donaldson SK (1983) An application of gauge theory to four-dimensional topology. Journal of Differential Geometry 18: 279–315.
Donaldson SK (1984) Nahm’s equations and the classification of monopoles. Communications in Mathematical Physics 96: 387–407.
Donaldson SK (1985) Anti-self-dual Yang–Mills connections on complex algebraic surfaces and stable vector bundles. Proceedings of the London Mathematical Society 3: 1–26.
Donaldson SK (1987) Infinite determinants, stable bundles and curvature. Duke Mathematical Journal 54: 231–247.
Donaldson SK (1990) Polynomial invariants of smooth four-manifolds. Topology 29: 257–315.
Donaldson SK (1993) Gluing techniques in the cohomology of moduli spaces. In: Goldberg and Phillips (eds.) Topological Methods in Modern Mathematics, pp. 137–170. Houston: Publish or Perish.
Donaldson SK (1999) Topological Field Theories and Formulae of Casson and Meng-Taubes, Geometry and Topology Monographs, vol. 2, pp. 87–102.
Donaldson SK (2002) Floer Homology Groups in Yang–Mills Theory. Cambridge: Cambridge University Press.
Donaldson SK and Kronheimer PB (1990) The Geometry of Four-Manifolds. Oxford: Oxford University Press.
Donaldson SK and Thomas R (1998) Gauge theory in higher dimensions. In: Huggett et al. (eds.) The Geometric Universe, pp. 31–47. Oxford: Oxford University Press.
Earl R and Kirwan FC (1999) Pontryagin rings of moduli spaces of arbitrary rank bundles over Riemann surfaces. Journal of London Mathematical Society 60: 835–846.
Feehan PMN and Leness TG (2003) A general SO(3)-monopole cobordism formula relating Donaldson and Seiberg–Witten invariants, arXiv:math.DG/0203047.
Floer A (1989) An instanton invariant for 3-manifolds. Communications in Mathematical Physics 118: 215–240.
Furuta M (2001) Monopole equation and the 11/8 conjecture. Mathematical Research Letters 8: 279–291.
Hitchin NJ (1983) On the construction of monopoles. Communications in Mathematical Physics 89: 145–190.
Hitchin NJ (1987) The self-duality equations over Riemann surfaces. Proceedings of the London Mathematical Society 55: 59–126.
Hitchin NJ (1990) Flat connections and geometric quantisation. Communications in Mathematical Physics 131: 347–380.
Hitchin NJ (1990) Harmonic maps from the 2-torus to the 3-sphere. Journal of Differential Geometry 31: 627–710.
Hitchin NJ, Karlhede A, Lindström U, and Roček M (1987) Hyper-Kähler metrics and supersymmetry. Communications in Mathematical Physics 108: 535–589.
Jaffe A and Taubes CH (1980) Vortices and Monopoles, Progress in Physics, vol. 2. Basle: Birkhäuser.
Jeffrey L and Kirwan FC (1998) Intersection theory on moduli spaces of holomorphic bundles of arbitrary rank on a Riemann surface. Annals of Mathematics 148: 109–196.
Joyce D (1996) Compact Riemannian 7-manifolds with holonomy G2, I. Journal of Differential Geometry 43: 291–328.
King AD and Newstead PE (1998) On the cohomology ring of the moduli space of stable rank 2 bundles over a curve. Topology 37: 407–418.
Kirwan FC (1994) Geometric invariant theory and the Atiyah–Jones conjecture. In: Laudal and Jahren (eds.) The Sophus Lie Memorial Volume, pp. 161–186. Oslo: Scandinavian University Press.
Kronheimer PB and Mrowka TS (1995) Embedded surfaces and the structure of Donaldson’s polynomial invariants. Journal of Differential Geometry 41: 573–734.
Kronheimer PB and Mrowka TS (2004) Witten’s conjecture and property P. Geometry and Topology 8: 295–300.
LeBrun C (1996) Four-manifolds without Einstein metrics. Mathematical Research Letters 3: 133–147.
Mason L and Woodhouse N (1996) Integrability, Self-duality and Twistor Theory. Advances in Theoretical and Mathematical Physics. Oxford: Oxford University Press.
Moore G and Witten E (1997) Integration over the u-plane in Donaldson theory. Advances in Theoretical and Mathematical Physics 1: 298–387.
Muñoz V (1999) Quantum cohomology of the moduli space of stable bundles over a Riemann surface. Duke Mathematical Journal 98: 525–540.
Nahm W (1982) The construction of all self-dual monopoles by the ADHM method. In: Monopoles in Quantum Field Theory, pp. 87–94. Singapore: World Scientific.
Nakajima H (1988) Compactness of the moduli space of Yang–Mills connections in higher dimensions. Journal of the Mathematical Society of Japan 40: 383–392.
Narasimhan MS and Seshadri CS (1965) Stable and unitary vector bundles on compact Riemann surfaces. Annals of Mathematics 82: 540–567.
Newstead PE (1967) Topology of some spaces of stable bundles. Topology 6: 241–262.
Ozsváth P and Szabó Z (2000) The symplectic Thom conjecture. Annals of Mathematics 151: 93–124.
Price P (1983) A monotonicity formula for Yang–Mills fields. Manuscripta Mathematica 43: 131–166.
Quillen D (1985) Determinants of Cauchy–Riemann operators over a Riemann surface. Functional Analysis and its Applications 14: 31–34.
Sadun L and Segert J (1992) Non self-dual Yang–Mills connections with quadrupole symmetry. Communications in Mathematical Physics 145: 363–391.
Sibner L, Sibner R, and Uhlenbeck K (1989) Solutions to the Yang–Mills equations that are not self-dual. Proceedings of the National Academy of Sciences of the USA 86: 8610–8613.
Taubes CH (1982) Self-dual Yang–Mills connections over non-self-dual 4-manifolds. Journal of Differential Geometry 17: 139–170.
Taubes CH (1983) Stability in Yang–Mills theories. Communications in Mathematical Physics 91: 235–263.
Taubes CH (1990) Casson’s invariant and gauge theory. Journal of Differential Geometry 31: 363–430.
Taubes CH (1988) A framework for Morse theory for the Yang–Mills functional. Inventiones Mathematicae 94: 327–402.
Taubes CH (1996) SW ⇒ Gr: from the Seiberg–Witten equations to pseudo-holomorphic curves. Journal of the American Mathematical Society 9: 819–918.
Thaddeus M (1992) Conformal field theory and the cohomology of moduli spaces of stable bundles. Journal of Differential Geometry 35: 131–149.
Thaddeus M (1994) Stable pairs, linear systems and the Verlinde formulae. Inventiones Mathematicae 117: 317–353.
Tian G (2000) Gauge theory and calibrated geometry, I. Annals of Mathematics 151: 193–268.
Uhlenbeck KK (1982) Connections with $L^p$ bounds on curvature. Communications in Mathematical Physics 83: 11–29.
Uhlenbeck KK and Yau S-T (1986) On the existence of hermitian Yang–Mills connections on stable bundles over compact hyper-Kähler manifolds. Communications on Pure and Applied Mathematics 39: 257–293.
Ward RS (1977) On self-dual gauge fields. Physics Letters A 61: 81–82.
Witten E (1982) Supersymmetry and Morse theory. Journal of Differential Geometry 16: 661–692.
Witten E (1988) Topological quantum field theory. Communications in Mathematical Physics 117: 353–386.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 121: 351–399.
Witten E (1991) On quantum gauge theory in two dimensions. Communications in Mathematical Physics 141: 153–209.
Witten E (1992) Two dimensional quantum gauge theory revisited. Journal of Geometry and Physics 9: 303–368.
Witten E (1994) Monopoles and four-manifolds. Mathematical Research Letters 1: 769–796.
General Relativity: Experimental Tests
C M Will, Washington University, St. Louis, MO, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Einstein’s general theory of relativity has become the foundation for our understanding of the gravitational interaction. Four decades of high-precision
experiments have verified the theory with ever-increasing precision, with no confirmed evidence of a deviation from its predictions. The theory is now the standard framework for much of astronomy, with its searches for black holes, neutron stars, gravitational waves, and the origin and fate of the universe. Yet modern developments in particle theory suggest that it may not be the entire story, and that
modification of the basic theory may be required at some level. String theory generally predicts a proliferation of gravity-like fields that could result in alterations of general relativity (GR) reminiscent of the Brans–Dicke theory of the 1960s. In the presence of extra dimensions, the gravity of the four-dimensional “brane” of a higher-dimensional world could be somewhat different from a pure four-dimensional GR. However, any theoretical speculation along these lines must still abide by the best current empirical bounds. This article will review experimental tests of GR and the theoretical implications of the results.
The Einstein Equivalence Principle
The Einstein equivalence principle (EEP) is a modern generalization of Einstein’s 1907 idea of an equivalence between gravity and acceleration, or between free fall and an absence of gravity. It states that: (1) test bodies fall with the same acceleration independently of their internal structure or composition (weak equivalence principle, or WEP); (2) the outcome of any local nongravitational experiment is independent of the velocity of the freely falling reference frame in which it is performed (local Lorentz invariance, or LLI); and (3) the outcome of any local nongravitational experiment is independent of where and when in the universe it is performed (local position invariance, or LPI). This principle is fundamental to gravitational theory, for it is possible to argue that, if EEP is valid, then gravitation and geometry are synonymous. In other words, gravity must be described by a “metric theory of gravity,” in which (1) spacetime is endowed with a symmetric metric, (2) the trajectories of freely falling bodies are geodesics of that metric, and (3) in local freely falling reference frames, the nongravitational laws of physics are those written in the language of special relativity (see Will (1993) for further details). GR is a metric theory of gravity, but so are many others, including the scalar–tensor theory of Brans and Dicke and many of its modern descendants, some of which are inspired by string theory.

Tests of the Weak Equivalence Principle
To test the WEP, one compares the acceleration of two laboratory-sized bodies of different composition in an external gravitational field. Although legend suggests that Galileo may have demonstrated this principle to his students at the Leaning Tower of Pisa, and Newton tested it by means of pendulum experiments, the first true high-precision experiments were done at the end of the nineteenth century by the Hungarian physicist Baron Roland von Eötvös and colleagues. Eötvös employed a torsion balance, in which (schematically) two bodies of different composition are suspended at the ends of a rod that is supported horizontally by a fine wire or fiber. One then looks for a difference in the horizontal accelerations of the two bodies as revealed by a slight rotation of the rod. The source of the horizontal gravitational force could be the Sun, a large mass in or near the laboratory, or, as Eötvös recognized, the Earth itself. A measurement or limit on the fractional difference in acceleration between two bodies yields a quantity $\eta \equiv 2|a_1 - a_2|/|a_1 + a_2|$, called the “Eötvös ratio.” Eötvös’ experiments showed that $\eta$ was smaller than a few parts in $10^9$, and later classic experiments in the 1960s and 1970s by Dicke and Braginsky improved the bounds by several orders of magnitude. Additional experiments were carried out during the 1980s as part of a search for a putative “fifth force,” which was motivated in part by a re-analysis of Eötvös’ original data. The best limit on $\eta$ currently comes from experiments carried out during the 1985–2000 period at the University of Washington (called the “Eöt-Wash” experiments), which used a sophisticated torsion balance tray to compare the accelerations of bodies of different composition toward the Earth, the Sun, and the galaxy. Another strong bound comes from ongoing laser ranging to reflectors deposited on the Moon during the Apollo program in the 1970s (lunar laser ranging, LLR), which routinely determines the Earth–Moon distance to millimeter accuracies. The data may be used to check the equality of acceleration of the Earth and Moon toward the Sun. The results from laboratory and LLR experiments are (Will 2001):
$$\eta_{\text{E\"ot-Wash}} < 4 \times 10^{-13}, \qquad \eta_{\rm LLR} < 5 \times 10^{-13} \tag{1}$$
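As a minimal illustration of the definition above, the Eötvös ratio can be evaluated directly from two measured accelerations. The function name and the sample accelerations below are hypothetical and are not taken from any of the experiments cited.

```python
def eotvos_ratio(a1: float, a2: float) -> float:
    """Return eta = 2|a1 - a2| / |a1 + a2| for two measured accelerations."""
    total = abs(a1 + a2)
    if total == 0.0:
        raise ValueError("accelerations sum to zero; the ratio is undefined")
    return 2.0 * abs(a1 - a2) / total


# Hypothetical accelerations (m/s^2) that differ by one part in 10^12.
a_first = 9.8
a_second = 9.8 * (1.0 + 1.0e-12)
eta = eotvos_ratio(a_first, a_second)
print(f"eta = {eta:.3e}")
```

A fractional difference in acceleration at this level would still be larger than the bounds quoted in equation [1].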
LLR also shows that gravitational binding energy falls with the same acceleration as ordinary matter to $1.3 \times 10^{-3}$ (test of the Nordtvedt effect – see the section “Bounds on the PPN Parameters” and Table 1). Many of the high-precision, low-noise methods that were developed for tests of WEP have been adapted to laboratory tests of the inverse-square law of Newtonian gravitation at millimeter scales and below. The goal of these experiments is to search for additional gravitational interactions involving massive particles or for the presence of large extra dimensions. The challenge of these experiments is to distinguish gravitation-like interactions from electromagnetic and quantum-mechanical effects. No deviations from Newton’s inverse-square law have been found to date at distances between 10 μm and 10 mm.
Table 1 Current limits on the PPN parameters
Parameter | Effect | Limit | Remarks
$\gamma - 1$ | (i) Shapiro delay | $2.3 \times 10^{-5}$ | Cassini tracking
 | (ii) Light deflection | $4 \times 10^{-4}$ | VLBI
$\beta - 1$ | (i) Perihelion shift | $3 \times 10^{-3}$ | $J_2 = 10^{-7}$ from helioseismology
 | (ii) Nordtvedt effect | $2.3 \times 10^{-4}$ | LLR plus bounds on other parameters
$\xi$ | Anisotropy in Newton’s G | $10^{-3}$ | Gravimeter bounds on anomalous Earth tides
$\alpha_1$ | Orbit polarization for moving systems | $10^{-4}$ | Lunar laser ranging
$\alpha_2$ | Anomalous spin precession for moving bodies | $4 \times 10^{-7}$ | Alignment of solar axis relative to ecliptic
$\alpha_3$ | Anomalous self-acceleration for spinning moving bodies | $2 \times 10^{-20}$ | Pulsar spindown timing data
$\eta^{a}$ | Nordtvedt effect | $9 \times 10^{-4}$ | Lunar laser ranging
$\zeta_1$ | | $2 \times 10^{-2}$ | Combined PPN bounds
$\zeta_2$ | Anomalous self-acceleration for binary systems | $4 \times 10^{-5}$ | Timing data for PSR 1913+16
$\zeta_3$ | Violation of Newton’s third law | $10^{-8}$ | Lunar laser ranging
$\zeta_4$ | | | Not independent
a Here $\eta = 4\beta - \gamma - 3 - 10\xi/3 - \alpha_1 + 2\alpha_2/3 - 2\zeta_1/3 - \zeta_2/3$.
Tests of Local Lorentz Invariance
Although special relativity itself never benefited from the kind of “crucial” experiments, such as the perihelion advance of Mercury and the deflection of light, that contributed so much to the initial acceptance of GR and to the fame of Einstein, the steady accumulation of experimental support, together with the successful integration of special relativity into quantum mechanics, led to its being accepted by mainstream physicists by the late 1920s, ultimately to become part of the standard toolkit of every working physicist. But in recent years new experiments have placed very tight bounds on any violations of Lorentz invariance, which underlies special relativity. A simple way of interpreting this new class of experiments is to suppose that a coupling of some external gravitation-like field (not the metric) to the electromagnetic interactions results in an effective change in the speed of electromagnetic radiation, $c$, relative to the limiting speed of material test particles, $c_0$; in other words, $c \neq c_0$. It can be shown that such a Lorentz-noninvariant electromagnetic interaction would cause shifts in the energy levels of atoms and nuclei that depend on the orientation of the quantization axis of the state relative to our velocity with respect to the rest of the universe, and on the quantum numbers of the state, resulting in orientation dependences of the fundamental frequencies of such atomic clocks. The magnitude of these “clock anisotropies” would be proportional to $\delta \equiv |(c_0/c)^2 - 1|$, which vanishes if Lorentz invariance holds (see Will (1993) and Haugan and Will (1987) for details).
The earliest clock anisotropy experiments were carried out around 1960 independently by Hughes and Drever, although their original motivation was somewhat different. Dramatic improvements were made in the 1980s using laser-cooled trapped atoms and ions. This technique made it possible to reduce the broadening of resonance lines caused by collisions, leading to the impressive bound $|\delta| < 10^{-21}$ (Will 2001). Other recent tests of Lorentz invariance violation include comparisons of resonant cavities with atomic clocks, tests of dispersion and birefringence in the propagation of high-energy photons from astrophysical sources, threshold effects in elementary particle collisions, and anomalies in neutrino oscillations. Mattingly (2005) gives a thorough and up-to-date review of both the theoretical frameworks for studying these effects and the experimental results.
Tests of Local Position Invariance
LPI requires, among other things, that the internal binding energies of atoms and nuclei be independent of location in space and time, when measured against some standard atom. This means that a comparison of the rates of two different kinds of atomic clocks should be independent of location or epoch, and that the frequency shift between two identical clocks at different locations is simply a consequence of the apparent Doppler shift between a pair of inertial frames momentarily comoving with the clocks at the moments of emission and reception, respectively. The relevant parameter $\alpha$ appears in the formula for the frequency shift,
$$\Delta f/f = (1 + \alpha)\,\Delta\Phi/c^2 \tag{2}$$
where $\Phi$ is the Newtonian gravitational potential. If LPI holds, $\alpha = 0$. An early test of this was the Pound–Rebka experiment of 1960, which measured the frequency shift of gamma rays from radioactive iron nuclei in a tower at Harvard University. The best bounds come from a 1976 experiment in which a hydrogen maser atomic clock was launched to 10 000 km altitude on a Scout rocket and its frequency compared via telemetry with an identical clock on the ground, and a 1993 experiment in which two different kinds of atomic clocks were intercompared as a function of the varying solar gravitational field as seen on Earth (a “null” redshift experiment). The results are (Will 2001):
$$\alpha_{\rm Maser} < 2 \times 10^{-4}, \qquad \alpha_{\rm Null} < 10^{-3} \tag{3}$$
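To give a feel for the size of the effect in equation [2], the following sketch evaluates the gravitational term $(1 + \alpha)\,\Delta\Phi/c^2$ for a clock at 10 000 km altitude compared with one on the ground. It is an illustration only: the Earth constants are assumed round values rather than figures from the article, only the potential term is evaluated, and any motion-related shifts are ignored.

```python
C = 299_792_458.0            # speed of light, m/s
GM_EARTH = 3.986004418e14    # Earth's gravitational parameter, m^3/s^2 (assumed)
R_EARTH = 6.371e6            # mean Earth radius, m (assumed round value)


def potential_difference(altitude_m: float) -> float:
    """Newtonian potential difference (m^2/s^2) between the ground and an altitude."""
    return GM_EARTH * (1.0 / R_EARTH - 1.0 / (R_EARTH + altitude_m))


def fractional_shift(delta_phi: float, alpha: float = 0.0) -> float:
    """Fractional frequency shift (1 + alpha) * delta_phi / c^2."""
    return (1.0 + alpha) * delta_phi / C**2


# Gravitational shift between the ground and 10 000 km altitude, with alpha = 0.
shift_maser = fractional_shift(potential_difference(1.0e7))
print(f"fractional frequency shift = {shift_maser:.3e}")
```

With these assumed constants the shift comes out at a few parts in $10^{10}$, so a bound on $\alpha$ at the $10^{-4}$ level corresponds to comparing clock frequencies at roughly the $10^{-14}$ level.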
Recent “clock comparison” tests of LPI include experiments done at the National Institute of Standards and Technology (NIST) in Boulder and at the Observatory of Paris, to look for cosmological variations in clock rates. The NIST experiment compared laser-cooled mercury ions with neutral cesium atoms over a two-year period, while the Paris experiment compared laser-cooled cesium and rubidium atomic fountains over five years; the results showed that the fine-structure constant is constant in time to a part in $10^{15}$ per year. A better bound of $6 \times 10^{-17}\ {\rm yr}^{-1}$ comes from analysis of fission yields of the Oklo natural reactor, which operated in Africa two billion years ago.
Solar-System Tests
The Parametrized Post-Newtonian Framework
It was once customary to discuss experimental tests of GR in terms of the “three classical tests”: the gravitational redshift (which is really a test of the EEP, not of GR itself; see the section on tests of LPI), the perihelion advance of Mercury (the first success of the theory), and the deflection of light (whose measurement in 1919 made Einstein a celebrity). However, the proliferation of additional experimental tests and of well-motivated alternative metric theories of gravity made it desirable to develop a more general theoretical framework for analyzing both experiments and theories. This “parametrized post-Newtonian (PPN) framework” dates back to Eddington in 1922, but was fully developed by Nordtvedt and Will in the period 1968–72 (see Will (1993) for details). When attention is confined to metric theories of gravity and, further, the focus is on the slow-motion, weak-field limit appropriate to the solar system and similar systems, it turns out that, in a broad class of metric theories, only the numerical values of a set of coefficients in the spacetime metric vary from theory to theory. The resulting PPN framework contains ten parameters: $\gamma$, related to the amount of spatial curvature generated by mass; $\beta$, related to the degree of nonlinearity in the gravitational field; $\xi$, $\alpha_1$, $\alpha_2$, and $\alpha_3$, which determine whether the theory violates LPI or LLI in gravitational experiments; and $\zeta_1$, $\zeta_2$, $\zeta_3$, and $\zeta_4$, which describe whether the theory has appropriate momentum conservation laws. In GR, $\gamma = 1$, $\beta = 1$, and the remaining parameters all vanish. In the scalar–tensor theory of Brans–Dicke, $\gamma = (1 + \omega_{\rm BD})/(2 + \omega_{\rm BD})$, where $\omega_{\rm BD}$ is an adjustable parameter. A number of well-known relativistic effects can be expressed in terms of these PPN parameters:
Deflection of light
$$\begin{aligned}
\Delta\theta &= \frac{1+\gamma}{2}\,\frac{4GM}{dc^2} \\
&= \frac{1+\gamma}{2} \times 1.7505\,\frac{R_\odot}{d}\ \text{arcsec}
\end{aligned} \tag{4}$$
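where $d$ is the distance of closest approach of a ray of light to a body of mass $M$, and where the second line is the deflection by the Sun, with radius $R_\odot$.
Shapiro time delay
$$\Delta t = \frac{1+\gamma}{2}\,\frac{4GM}{c^3}\,\ln\!\left[\frac{(r_1 + \mathbf{x}_1\cdot\mathbf{n})(r_2 - \mathbf{x}_2\cdot\mathbf{n})}{d^2}\right] \tag{5}$$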
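where $\Delta t$ is the excess travel time of a round-trip electromagnetic tracking signal, $\mathbf{x}_1$ and $\mathbf{x}_2$ are the locations relative to the body of mass $M$ of the emitter and receiver of the round-trip signal ($r_1$ and $r_2$ are the respective distances), and $\mathbf{n}$ is the direction of the outgoing tracking signal.
Perihelion advance
$$\begin{aligned}
\frac{d\omega}{dt} &= \frac{2 + 2\gamma - \beta}{3}\,\frac{6\pi GM}{Pa(1 - e^2)c^2} \\
&= \frac{2 + 2\gamma - \beta}{3} \times 42.98\ \text{arcsec}/100\ \text{yr}
\end{aligned} \tag{6}$$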
where $P$, $a$, and $e$ are the period, semimajor axis, and eccentricity of the planet’s orbit, respectively; the second line is the value for Mercury.
Nordtvedt effect
$$\frac{m_G - m_I}{m_I} = \left(4\beta - \gamma - 3 - \frac{10}{3}\xi - \alpha_1 + \frac{2}{3}\alpha_2 - \frac{2}{3}\zeta_1 - \frac{1}{3}\zeta_2\right)\frac{|E_g|}{m_Ic^2} \tag{7}$$
where $m_G$ and $m_I$ are, respectively, the gravitational and inertial masses of a body such as the Earth or Moon, and $E_g$ is its gravitational binding energy. A nonzero Nordtvedt effect would cause the Earth and Moon to fall with a different acceleration toward the Sun. In GR, this effect vanishes.
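The numerical values quoted in equations [4], [6], and [7] can be cross-checked with a short script. This is a sketch under stated assumptions rather than the calculation used in the article: the solar constants and the orbital elements of Mercury and Mars are assumed textbook values, the perihelion rate is taken as $6\pi GM/[a(1-e^2)c^2]$ per orbit times the PPN factor, and the Shapiro delay is evaluated in the usual superior-conjunction approximation in which the logarithm in equation [5] is replaced by $\ln(4r_1r_2/d^2)$.

```python
import math

C = 299_792_458.0              # speed of light, m/s
GM_SUN = 1.32712440018e20      # solar gravitational parameter, m^3/s^2 (assumed)
R_SUN = 6.957e8                # solar radius, m (assumed nominal value)
AU = 1.495978707e11            # astronomical unit, m
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def light_deflection_arcsec(d, gamma=1.0):
    """Deflection of a ray passing the Sun at closest approach d (m), in arcsec."""
    return 0.5 * (1.0 + gamma) * 4.0 * GM_SUN / (d * C**2) * ARCSEC_PER_RAD


def shapiro_delay_conjunction(r1, r2, d, gamma=1.0):
    """Round-trip excess delay (s), assuming the log argument is 4*r1*r2/d^2."""
    return 0.5 * (1.0 + gamma) * 4.0 * GM_SUN / C**3 * math.log(4.0 * r1 * r2 / d**2)


def perihelion_advance_arcsec_per_century(period_days, a, e, gamma=1.0, beta=1.0):
    """Perihelion advance over 100 Julian years for an orbit around the Sun."""
    ppn_factor = (2.0 + 2.0 * gamma - beta) / 3.0
    per_orbit = ppn_factor * 6.0 * math.pi * GM_SUN / (a * (1.0 - e**2) * C**2)
    orbits_per_century = 36525.0 / period_days
    return per_orbit * orbits_per_century * ARCSEC_PER_RAD


def nordtvedt_eta(gamma=1.0, beta=1.0, xi=0.0, alpha1=0.0, alpha2=0.0,
                  zeta1=0.0, zeta2=0.0):
    """PPN combination multiplying |E_g| / (m_I c^2) in the Nordtvedt effect."""
    return (4.0 * beta - gamma - 3.0 - 10.0 * xi / 3.0 - alpha1
            + 2.0 * alpha2 / 3.0 - 2.0 * zeta1 / 3.0 - zeta2 / 3.0)


# GR values (gamma = beta = 1, all other parameters zero).
limb_deflection = light_deflection_arcsec(R_SUN)
mercury_advance = perihelion_advance_arcsec_per_century(87.969, 5.7909e10, 0.2056)
# Illustrative Earth-Mars geometry with the signal grazing the solar limb.
mars_delay = shapiro_delay_conjunction(AU, 1.524 * AU, R_SUN)
print(limb_deflection, mercury_advance, mars_delay, nordtvedt_eta())
```

Changing `gamma` or `beta` away from 1 in these calls shows how strongly each observable responds to a given PPN parameter, which is the logic behind the bounds collected in Table 1.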
Precession of a gyroscope
$$\begin{aligned}
\frac{d\mathbf{S}}{dt} &= (\boldsymbol{\Omega}_{\rm FD} + \boldsymbol{\Omega}_{\rm Geo}) \times \mathbf{S} \\
\boldsymbol{\Omega}_{\rm FD} &= -\frac{1}{2}\left(1 + \gamma + \frac{1}{4}\alpha_1\right)\frac{G}{r^3c^2}\,(\mathbf{J} - 3\mathbf{n}\,\mathbf{n}\cdot\mathbf{J}) \\
&= \frac{1}{2}\left(1 + \gamma + \frac{1}{4}\alpha_1\right) \times 0.041\ \text{arcsec yr}^{-1} \\
\boldsymbol{\Omega}_{\rm Geo} &= -\frac{1}{2}(1 + 2\gamma)\,\mathbf{v} \times \frac{Gm\mathbf{n}}{r^2c^2} \\
&= \frac{1}{3}(1 + 2\gamma) \times 6.6\ \text{arcsec yr}^{-1}
\end{aligned} \tag{8}$$
where $\mathbf{S}$ is the spin of the gyroscope, and $\boldsymbol{\Omega}_{\rm FD}$ and $\boldsymbol{\Omega}_{\rm Geo}$ are, respectively, the precession angular velocities caused by the dragging of inertial frames (Lense–Thirring effect) and by the geodetic effect, a combination of Thomas precession and precession induced by spatial curvature; $\mathbf{J}$ is the angular momentum of the Earth, and $\mathbf{v}$, $\mathbf{n}$, and $r$ are, respectively, the velocity, direction, and distance of the gyroscope. The second line in each case is the corresponding value for a gyroscope in polar Earth orbit at about 650 km altitude (Gravity Probe B).
Bounds on the PPN Parameters
Four decades of high-precision experiments, ranging from the standard light-deflection and perihelion-shift tests, to LLR, planetary and satellite tracking tests of the Shapiro time delay, and geophysical and astronomical observations, have placed bounds on the PPN parameters that are consistent with GR. The current bounds are summarized in Table 1 (Will 2001). To illustrate the dramatic progress of experimental gravity since the dawn of Einstein’s theory, Figure 1 shows a history of results for $(1 + \gamma)/2$, from the 1919 solar eclipse measurements of Eddington and his colleagues (which made Einstein a celebrity), to modern-day measurements using very long baseline radio interferometry (VLBI), advanced radar tracking of spacecraft, and the astrometry satellite Hipparcos. The most recent results include a 2003 measurement of the Shapiro delay, performed by tracking the “Cassini” spacecraft on its way to Saturn, and a 2004 measurement of the bending of light via analysis of VLBI data on 541 quasars and compact radio galaxies distributed over the entire sky.
[Figure 1: plot of $(1 + \gamma)/2$ (vertical axis, 0.95 to 1.10) against year of experiment (1920 to 2000), in two panels. Panel “Deflection of light”: Optical, Radio, VLBI ($2 \times 10^{-4}$), Hipparcos. Panel “Shapiro time delay”: PSR 1937+21, Voyager, Viking, Cassini ($1 \times 10^{-5}$).]
Figure 1 Measurements of the coefficient $(1 + \gamma)/2$ from observations of the deflection of light and of the Shapiro delay in propagation of radio signals near the Sun. The GR prediction is unity. “Optical” denotes measurements of stellar deflection made during solar eclipse, and “Radio” denotes interferometric measurements of radio-wave deflection. “Hipparcos” denotes the European optical astrometry satellite. Arrows denote values well off the chart from one of the 1919 eclipse expeditions and from others through 1947. Shapiro delay measurements using the Cassini spacecraft on its way to Saturn yielded tests at the 0.001% level, and light deflection measurements using VLBI have reached 0.02%.
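As a small worked example of how such bounds constrain alternative theories, the Brans–Dicke relation $\gamma = (1 + \omega_{\rm BD})/(2 + \omega_{\rm BD})$ gives $|\gamma - 1| = 1/(2 + \omega_{\rm BD})$ for positive $\omega_{\rm BD}$, so an upper limit on $|\gamma - 1|$ becomes a lower limit on $\omega_{\rm BD}$. The sketch below applies this to the Cassini limit of $2.3 \times 10^{-5}$ from Table 1; the function names are hypothetical and the inversion assumes $\omega_{\rm BD} > 0$.

```python
def brans_dicke_gamma(omega_bd: float) -> float:
    """PPN gamma in Brans-Dicke theory for coupling parameter omega_bd."""
    return (1.0 + omega_bd) / (2.0 + omega_bd)


def omega_lower_bound(gamma_minus_one_limit: float) -> float:
    """Smallest positive omega_bd satisfying |gamma - 1| <= the given limit."""
    return 1.0 / gamma_minus_one_limit - 2.0


# Cassini bound on |gamma - 1| from Table 1.
omega_min = omega_lower_bound(2.3e-5)
print(f"omega_BD must exceed roughly {omega_min:.0f}")
```

The result is a lower limit of order $4 \times 10^4$, which illustrates how closely a viable Brans–Dicke theory must mimic GR in the solar system.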
The perihelion advance of Mercury, the first of Einstein’s successes, is now known to agree with observation to a few parts in $10^3$. During the 1960s there was controversy about this test when reports of an excess solar oblateness implied an unacceptably large Newtonian contribution to the perihelion advance. However, it is now known from helioseismology, the study of short-period vibrations of the Sun, that the oblateness is of the order of a part in $10^7$, as expected from standard solar models, and much too small to affect Mercury’s orbit within the observational errors.
Gravity Probe B
The NASA Relativity Mission called Gravity Probe B (GPB) recently completed its mission to measure the Lense–Thirring and geodetic precessions of gyroscopes in Earth orbit. Launched on 20 April 2004 for a 16-month mission, it consisted of four spherical rotors coated with a thin layer of superconducting niobium, spinning at 70–100 Hz, in a spacecraft filled with liquid helium, containing a telescope continuously pointed toward a distant guide star (IM Pegasi). Superconducting current loops encircling each rotor were designed to measure the change in direction of the rotors by detecting the change in magnetic flux through the loop generated by the London magnetic moment of the spinning superconducting film. The spacecraft was in a polar orbit at 650 km altitude. The primary science goal of GPB was a 1% measurement of the 41 marcsec yr$^{-1}$ frame dragging or Lense–Thirring effect caused by the rotation of the Earth; its secondary goal was to measure to six parts in $10^5$ the larger 6.6 arcsec yr$^{-1}$ geodetic precession caused by space curvature.
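The two precession rates quoted for Gravity Probe B can be estimated from the PPN expressions for a circular polar orbit. The sketch below is a rough check, not the mission analysis: it assumes a spherical Earth with round-number constants, takes $|\mathbf{v} \times \mathbf{n}| = v$ for a circular orbit, uses the fact that the orbit average of $3(\mathbf{J}\cdot\mathbf{n})\mathbf{n} - \mathbf{J}$ over a polar orbit is $\mathbf{J}/2$, and adopts an assumed value for the Earth’s angular momentum.

```python
import math

C = 299_792_458.0            # speed of light, m/s
G = 6.67430e-11              # Newton's constant, m^3 kg^-1 s^-2
GM_EARTH = 3.986004418e14    # Earth's gravitational parameter, m^3/s^2 (assumed)
R_EARTH = 6.371e6            # mean Earth radius, m (assumed round value)
J_EARTH = 5.86e33            # Earth's spin angular momentum, kg m^2/s (assumed)
SECONDS_PER_YEAR = 365.25 * 86400.0
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def geodetic_rate_arcsec_per_year(altitude_m, gamma=1.0):
    """Geodetic precession rate for a circular orbit at the given altitude."""
    r = R_EARTH + altitude_m
    v = math.sqrt(GM_EARTH / r)                       # circular orbital speed
    omega = 0.5 * (1.0 + 2.0 * gamma) * v * GM_EARTH / (r**2 * C**2)
    return omega * SECONDS_PER_YEAR * ARCSEC_PER_RAD


def frame_dragging_rate_arcsec_per_year(altitude_m, gamma=1.0, alpha1=0.0):
    """Orbit-averaged Lense-Thirring rate for a circular polar orbit."""
    r = R_EARTH + altitude_m
    ppn_factor = 0.5 * (1.0 + gamma + 0.25 * alpha1)
    omega = ppn_factor * G * J_EARTH / (2.0 * r**3 * C**2)   # average of J/2
    return omega * SECONDS_PER_YEAR * ARCSEC_PER_RAD


geodetic = geodetic_rate_arcsec_per_year(650.0e3)
frame_dragging = frame_dragging_rate_arcsec_per_year(650.0e3)
print(f"geodetic: {geodetic:.2f} arcsec/yr, frame dragging: {frame_dragging:.4f} arcsec/yr")
```

Under these assumptions both rates land close to the 6.6 arcsec yr$^{-1}$ and 41 marcsec yr$^{-1}$ figures quoted above, which shows why the frame-dragging measurement is by far the more demanding of the two.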
The Binary Pulsar
The binary pulsar PSR 1913+16, discovered in 1974, provided important new tests of GR. The pulsar, with a pulse period of 59 ms, was observed to be in orbit about an unseen companion (now generally thought to be a dead pulsar), with a period of 8 h. Through precise timing of apparent variations in the pulsar “clock” caused by the Doppler effect, the important orbital parameters of the system could be measured with exquisite precision. These included nonrelativistic “Keplerian” parameters, such as the eccentricity $e$, and the orbital period (at a chosen epoch) $P_b$, as well as a set of relativistic “post-Keplerian” (PK) parameters. The first PK parameter, $\langle\dot\omega\rangle$, is the mean rate of advance of periastron, the analog of Mercury’s perihelion shift. The second, denoted $\gamma'$, is the effect of special relativistic time dilation and the gravitational redshift on the observed phase or arrival time of pulses, resulting from the pulsar’s orbital motion and the gravitational potential of its companion. The third, $\dot P_b$, is the rate of decrease of the orbital period; this is taken to be the result of gravitational radiation damping (apart from a small correction due to the acceleration of the system in our rotating galaxy). Two other parameters, $s$ and $r$, are related to the Shapiro time delay of the pulsar signal if the orbital inclination is such that the signal passes in the vicinity of the companion; $s$ is a direct measure of the orbital inclination $\sin i$. According to GR, the first three PK effects depend only on $e$ and $P_b$, which are known, and on the two stellar masses, which are unknown. By combining the observations of PSR 1913+16 (see Table 2) with the GR predictions, one obtains both a measurement of the two masses and a test of GR, since the system is overdetermined. The results are
$$m_1 = 1.4414 \pm 0.0002\,M_\odot, \qquad m_2 = 1.3867 \pm 0.0002\,M_\odot, \qquad \dot P_b^{\rm GR}/\dot P_b^{\rm OBS} = 1.0013 \pm 0.0021 \tag{9}$$
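The internal consistency of these numbers can be illustrated by inserting the fitted masses into the standard GR expression for the periastron advance, $\langle\dot\omega\rangle = 3\,(2\pi/P_b)^{5/3}\,[G(m_1+m_2)/c^3]^{2/3}/(1-e^2)$. That formula is not written out in the article and is quoted here as an assumption of the sketch, as is the value used for $GM_\odot/c^3$.

```python
import math

T_SUN = 4.925490947e-6       # G * M_sun / c^3 in seconds (assumed value)
SECONDS_PER_DAY = 86400.0
DAYS_PER_YEAR = 365.25


def periastron_advance_deg_per_year(m1_msun, m2_msun, pb_days, e):
    """GR periastron advance for a binary with masses in solar units."""
    pb_seconds = pb_days * SECONDS_PER_DAY
    orbital_frequency = 2.0 * math.pi / pb_seconds            # rad/s
    total_mass_time = (m1_msun + m2_msun) * T_SUN             # G M / c^3, s
    omega_dot = (3.0 * orbital_frequency ** (5.0 / 3.0)
                 * total_mass_time ** (2.0 / 3.0) / (1.0 - e**2))
    return math.degrees(omega_dot) * DAYS_PER_YEAR * SECONDS_PER_DAY


# Masses from equation [9]; eccentricity and orbital period from Table 2.
omega_dot_1913 = periastron_advance_deg_per_year(
    1.4414, 1.3867, 0.322997448930, 0.6171338)
print(f"periastron advance = {omega_dot_1913:.4f} deg/yr")
```

The result should reproduce the measured periastron advance of about 4.23 degrees per year listed in Table 2, which is expected because the masses were themselves obtained by fitting the PK parameters.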
Other relativistic binary pulsars may provide even more stringent tests. These include the relativistic neutron star/white dwarf binary pulsar J1141-6545, with a 0.19 day orbital period, which may ultimately lead to a very strong bound on the phenomenon of dipole gravitational radiation, predicted by many alternative theories of gravity, but not by GR; and the remarkable “double pulsar” J0737-3039, a binary system with two detected pulsars, in a 0.10 day orbit seen almost edge on and a periastron advance of 17° per year. For further discussion of binary pulsar tests, see Stairs (2003).
Gravitational-Wave Tests
The detection of gravitational radiation by either laser interferometers or resonant cryogenic bars will usher in a new era of gravitational-wave astronomy (Barish and Weiss 1999). Furthermore, it will yield new and interesting tests of GR in its radiative regime (Will 1999). GR predicts that gravitational waves possess only two polarization modes independently of the source; they are transverse to the direction of propagation and quadrupolar in their effect on a detector. Other theories of gravity may predict up to four additional modes of polarization. A suitable array of gravitational antennas could delineate or limit the number of modes present in a given wave. If distinct evidence were found of any mode other than the two transverse quadrupolar modes of GR, the result would be disastrous for the theory.
Table 2 Parameters of the binary pulsars PSR 1913+16 and J0737-3039
Parameter | Symbol | Value(a) in PSR 1913+16 | Value(a) in J0737-3039
Keplerian parameters
Eccentricity | $e$ | 0.6171338(4) | 0.087779(5)
Orbital period | $P_b$ (day) | 0.322997448930(4) | 0.102251563(1)
Post-Keplerian parameters
Periastron advance | $\langle\dot\omega\rangle$ (° yr$^{-1}$) | 4.226595(5) | 16.90(1)
Redshift/time dilation | $\gamma'$ (ms) | 4.2919(8) | 0.382(5)
Orbital period derivative | $\dot P_b$ ($10^{-12}$) | −2.4184(9) |
Shapiro delay ($\sin i$) | $s$ | | 0.9995(4)
a Numbers in parentheses denote errors in last digit.
According to GR, gravitational waves propagate with the same speed, $c$, as light. In other theories, the speed could differ from $c$ because of coupling of gravitation to “background” gravitational fields, or propagation of the waves into additional spatial dimensions. Another way in which the speed of gravitational waves could differ from $c$ is if gravitation were propagated by a massive field (a massive graviton), in which case, in a local inertial frame, $v_g$ would be given by
$$\frac{v_g^2}{c^2} = 1 - \frac{m_g^2c^4}{E^2} = 1 - \frac{c^2}{f^2\lambda_g^2} \tag{10}$$
where $m_g$, $E$, and $f$ are the graviton rest mass, energy, and frequency, respectively, and $\lambda_g = h/m_gc$ is the graviton Compton wavelength (it is assumed that $\lambda_g \gg c/f$). The most obvious way to measure the speed of gravitational waves is to compare the arrival times of a gravitational wave and an electromagnetic wave from the same event (e.g., a supernova). For a source at a distance of 600 million light years (a typical distance for the currently operational detectors), and a difference in times on the order of seconds, the bound on the difference $|1 - v_g/c|$ could be as small as a part in $10^{17}$. It is worth noting that a 2002 report that the speed of gravity had been measured by studying light from a quasar as it propagated past Jupiter was fundamentally flawed. That particular measurement was not sensitive to the speed of gravity.
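The order of magnitude of the arrival-time bound, and the equivalence of the two forms of equation [10], can be checked numerically. This is a minimal sketch: it assumes that the arrival-time difference satisfies $\Delta t \approx (D/c)\,|1 - v_g/c|$ for a source at distance $D$, and the sample Compton wavelength and frequency are arbitrary illustrative values, not bounds from the article.

```python
import math

C = 299_792_458.0                    # speed of light, m/s
H = 6.62607015e-34                   # Planck constant, J s
SECONDS_PER_YEAR = 365.25 * 86400.0


def speed_bound(distance_light_years: float, delta_t_seconds: float) -> float:
    """Bound on |1 - v_g/c| from an arrival-time difference over a distance."""
    light_travel_time = distance_light_years * SECONDS_PER_YEAR
    return delta_t_seconds / light_travel_time


def vg_over_c_from_mass(m_g: float, f: float) -> float:
    """v_g/c from the first form of equation [10], with E = h f."""
    energy = H * f
    return math.sqrt(1.0 - (m_g * C**2 / energy) ** 2)


def vg_over_c_from_wavelength(lambda_g: float, f: float) -> float:
    """v_g/c from the second form of equation [10]."""
    return math.sqrt(1.0 - (C / (f * lambda_g)) ** 2)


# 600 million light years and a one-second timing difference.
bound = speed_bound(6.0e8, 1.0)

# Hypothetical graviton Compton wavelength (m) and wave frequency (Hz).
lambda_sample, f_sample = 1.0e9, 100.0
m_sample = H / (lambda_sample * C)   # rest mass implied by lambda_g = h / (m_g c)
ratio_mass = vg_over_c_from_mass(m_sample, f_sample)
ratio_wavelength = vg_over_c_from_wavelength(lambda_sample, f_sample)
print(bound, ratio_mass, ratio_wavelength)
```

The bound comes out at a few parts in $10^{17}$, and the two evaluations of $v_g/c$ agree, as they must since $E = hf$ and $\lambda_g = h/m_gc$.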
Conclusions
The past four decades have witnessed a systematic, high-precision experimental verification of Einstein’s theories. Relativity has passed every test with flying colors. A central theme of future work will be to test strong-field gravity in the vicinity of black holes and neutron stars, and to see how well GR works on cosmological scales. Gamma-ray, X-ray, microwave, infrared, neutrino, and gravitational-wave astronomy will all play a critical role in probing these largely unexplored aspects of GR. GR is now the “standard model” of gravity. But, as in particle physics, there may be a world beyond the standard model. Quantum gravity, strings, and branes may lead to testable effects beyond Einstein’s GR. Searches for such effects using laboratory experiments, particle accelerators, space instrumentation, and cosmological observations are likely to continue for some time to come.
See also: Cosmology: Mathematical Aspects; Einstein Equations: Exact Solutions; General Relativity: Overview; Geometric Flows and the Penrose Inequality; Gravitational Lensing; Gravitational Waves; Standard Model of Particle Physics.
Further Reading
Barish BC and Weiss R (1999) LIGO and the detection of gravitational waves. Physics Today 52: 44.
Haugan MP and Will CM (1987) Modern tests of special relativity. Physics Today 40: 69.
Mattingly D (2005) Modern tests of Lorentz invariance. Living Reviews in Relativity 8: 5 (online article cited on 1 April 2005, http://www.livingreviews.org/lrr-2005-5).
Stairs IH (2003) Testing general relativity with pulsar timing. Living Reviews in Relativity 6: 5 (online article cited on 1 April 2005, http://www.livingreviews.org/lrr-2003-5).
Will CM (1993) Theory and Experiment in Gravitational Physics. Cambridge: Cambridge University Press.
Will CM (1999) Gravitational radiation and the validity of general relativity. Physics Today 52: 38.
Will CM (2001) The confrontation between general relativity and experiment. Living Reviews in Relativity 4: 4 (online article cited on 1 April 2005, http://www.livingreviews.org/lrr-2001-4).
General Relativity: Overview
R Penrose, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.
The Principle of Equivalence
The special theory of relativity is founded on two basic principles: that the laws of physics should be independent of the uniform motion of an inertial frame of reference, and that the speed of light should have the same constant value in any such frame. In the years between 1905 and 1915, Einstein pondered deeply on what was, to him, a profound enigma, which was the issue of why these laws retain their proper form only in the case of an inertial frame. In special relativity, as had been the case in the earlier dynamics of Galilei–Newton, the laws indeed retain their basic form only when the reference frame is unaccelerated (which includes it being nonrotating). It demonstrated a particular prescience on the part of Einstein that he should have demanded the seemingly impossible requirement that the very same dynamical laws should hold also in an accelerating (or even rotating) reference frame.
The key realization came to him late in 1907, when sitting in his chair in the Bern patent office he had the “happiest thought” in his life, namely that if a person were to fall freely in a gravitational field, then he would not notice that field at all while falling. The physical point at issue is Galileo’s early insight (itself having roots even earlier from Simon Stevin in 1586 or Ioannes Philiponos in the fifth or sixth century) that the acceleration induced by gravity is independent of the body upon which it acts. Accordingly, if two neighboring bodies are accelerated together in the same gravitational field, then the motion of one body, in the (nonrotating) reference frame of the other, will be as though there were no gravitational field at all. To put this another way, the effect of a gravitational force is just like that of an accelerating reference system, and can be eliminated by free fall. This is now known as the “principle of equivalence.”
It should be made clear that this is a particular feature of only the gravitational field. From the perspective of Newtonian dynamics, it is a consequence of the seemingly accidental fact that the concept of (passive) “mass” $m$ that features in Newton’s law of gravitational attraction, where the attractive force due to the gravitational field of another body, of mass $M$, has the form
$$\frac{GmM}{r^2}$$
is the same as – or, at least, proportional to – the inertial mass $m$ of the body which is being acted upon. Thus, the impedance to acceleration of a body and the strength of the attractive force on that body are, in the case of gravity (and only in the case of gravity), in proportion to one another, so that the acceleration of a body in a gravitational field is independent of its mass (or, indeed, of any other localized magnitude possessed by it). (The fact that the active gravitational mass, here given by the quantity $M$, is also in proportion to its own passive gravitational mass – from Newton’s third law – may be regarded as a feature of the general Lagrangian/Hamiltonian framework of physics.
But see Bondi (1957).) Other forces of nature do not have this property. For example, the electrostatic force on a charged body, by an electric field, acts in proportion to the electric charge on that body, whereas the impedance to acceleration is still the inertial mass of that body, so the acceleration induced depends on the charge-to-mass ratio. Accordingly, it is the gravitational field alone which is equivalent to an acceleration.
Einstein’s fundamental idea, therefore, was to take the view that the “relativity principle” could as well be applied to accelerating reference frames as to inertial ones, where the same physical laws would apply in each, but where now the perceived gravitational field would be different in the two frames. In accordance with this perspective, Einstein found it necessary to adopt a different viewpoint from the Newtonian one, both with regard to the notion of “gravitational force” and to the very notion of an “inertial frame.” According to the Newtonian perspective, it would be appropriate to describe the action of the Earth’s gravitational field, near some specific place on the Earth’s surface, in terms of a “Newtonian inertial frame” in which the Earth is “fixed” (here we ignore the Earth’s rotation and the Earth’s motion about the Sun), and we consider that there is a constant gravitational field of force (directed towards the Earth’s center). But the Einsteinian perspective is to regard that frame as noninertial where, instead, it would be a frame which falls freely in the Earth’s (Newtonian) gravitational field that would be regarded as a suitable “Einsteinian inertial frame.” Generally, to be inertial in Einstein’s sense, the frame would refer to free fall under gravity, so that the Newtonian field of gravitational force would appear to have disappeared, in accordance with the “happiest thought” that Einstein had had in the Bern patent office.
We see that the concept of a gravitational field must also be changed in the passage from Newton’s to Einstein’s viewpoint. For in Newton’s picture we indeed have a “gravitational force” directed towards the ground with a magnitude of $gm$, where $m$ is the mass of the body being acted upon and $g$ is the “acceleration due to gravity” at the Earth’s surface, whereas in Einstein’s picture we have specifically eliminated this “gravitational force” by the choice of “Einsteinian inertial frame.” It might at first seem puzzling that the gravitational field has appeared to have been removed altogether by this device, and it is natural to wonder how gravitational effects can have any physical role to play at all from this point of view!
However, this would be to go too far, as the Newtonian gravitational field may vary from place to place – as it does, indeed, in the case of the Earth’s field, since it is directed towards the Earth’s center, which is a different spatial direction at different places on the Earth’s surface. Our considerations up to this point really refer only to a small neighborhood of a point. One might well take the view that a ‘‘frame’’ ought really to describe things also at widely separated places at once, and the considerations of the paragraphs above do not really take this into consideration.
The Tidal Effect
To proceed further, it will be helpful to consider an astronaut A in free fall, high above the Earth’s surface. Let us first adopt a Newtonian perspective.
We shall be concerned only with the instantaneous accelerations due to gravity in the neighborhood of A, so it will be immaterial whether we regard the astronaut as falling to the ground or – more comfortably! – in orbit about the Earth. Let us imagine that the astronaut is initially surrounded, nearby, by a sphere of particles, with A at the center, which are taken to be initially at rest with respect to A (see Figure 1). To a first approximation, all the particles will share the same acceleration as the astronaut, so they will seem to the astronaut to hover motionless all around. But now let us be a little more precise about the accelerations. Those particles which are initially located in a vertical line from A, that is, either directly below A, at B, or directly above A, at T, will have, like A, an acceleration which is in the direction AO, where O is the Earth’s center. But for the bottom point B, the acceleration will be slightly greater than that at A, and for the top point T, the acceleration will be slightly less than the acceleration at A, because of the slightly differing distances from O. Thus, relative to A, both will initially accelerate away from A. With regard to particles in the sphere which are initially in a circle in the horizontal plane through A, the direction to O will now be somewhat inwards, so that the particles at these points $H_i$ will accelerate, relative to A, slightly inwards. Accordingly, the entire sphere of particles will begin to get distorted into a prolate spheroid (elongated ellipsoid of revolution). This is referred to as the tidal distortion, for the good reason that it is precisely the same physical effect which is responsible for the tides in the Earth’s oceans, where for this illustration we are to think of the Earth’s center as being at A, the Moon (or Sun) to be situated at O, and the sphere of particles to represent the surface of the water of the Earth’s oceans. It is not hard to calculate (reverting, now, to our original picture) that, as a reflection of Newton’s inverse-square law of gravitational attraction, the amount of (small) outward vertical displacement from A (at B and T) will be twice the inward horizontal displacement (over the circle of points $H_i$); accordingly, the sphere will initially be distorted into an ellipsoid of the same volume. This depends upon there being no gravitating matter inside the sphere. The presence of such matter would contribute a volume-reducing effect in proportion to the total mass surrounded. (An extreme case illustrating this would occur if we take our sphere of particles to surround the entire Earth, where the volume-reducing effect would be manifest in the accelerations towards the ground at all points of the surrounding sphere.)
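The factor of two between the vertical stretching and the horizontal squeezing can be verified numerically from Newton’s inverse-square law alone. The following sketch subtracts the astronaut’s acceleration from that of a neighboring particle; the orbital radius and the particle offset are arbitrary illustrative values, and the Earth is treated as a point mass at O.

```python
import math

GM_EARTH = 3.986004418e14    # Earth's gravitational parameter, m^3/s^2 (assumed)


def newtonian_acceleration(x, y, z):
    """Inverse-square acceleration toward the origin O at position (x, y, z)."""
    r = math.sqrt(x * x + y * y + z * z)
    k = -GM_EARTH / r**3
    return (k * x, k * y, k * z)


def relative_acceleration(center, offset):
    """Acceleration of a particle at center + offset, relative to the center."""
    particle = tuple(c + o for c, o in zip(center, offset))
    a_particle = newtonian_acceleration(*particle)
    a_center = newtonian_acceleration(*center)
    return tuple(p - c for p, c in zip(a_particle, a_center))


# Astronaut A on the z-axis, 7000 km from O; particles 10 m away from A.
astronaut = (0.0, 0.0, 7.0e6)
delta = 10.0
vertical = relative_acceleration(astronaut, (0.0, 0.0, delta))[2]    # particle T
horizontal = relative_acceleration(astronaut, (delta, 0.0, 0.0))[0]  # a particle H_i

print(f"vertical: {vertical:.4e} m/s^2 (away from A)")
print(f"horizontal: {horizontal:.4e} m/s^2 (toward A)")
print(f"ratio: {vertical / horizontal:.6f}")
```

The vertical relative acceleration is positive (away from A), the horizontal one is negative (toward A), and their ratio is very nearly −2. Since there are two horizontal directions, the outward and inward contributions cancel to first order, which is the volume-preserving property described above.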
Gravity as Curved Spacetime
It is appropriate to take a spacetime view of these phenomena (Figure 2). The distortions that we have been considering are, in fact, direct manifestations
Figure 1 (a) Tidal effect. The astronaut A surrounded by a sphere of nearby particles initially at rest with respect to A. In Newtonian terms, they have an acceleration towards the Earth’s center E, varying slightly in direction and magnitude (single-shafted arrows). By subtracting A’s acceleration from each, we obtain the accelerations relative to A (double-shafted arrows); this relative acceleration is slightly inward for those particles displaced horizontally from A, but slightly outward for those displaced vertically from A. Accordingly, the sphere becomes distorted into a (prolate) ellipsoid of revolution, with symmetry axis in the direction AE. The initial distortion preserves volume. (b) Now move A to the Earth’s center E and the sphere of particles to surround E just above the atmosphere. The acceleration (relative to A = E) is inward all around the sphere, with an initial volume reduction acceleration $4\pi GM$, where M is the total mass surrounded. Reproduced with permission from Penrose R (2004) The Road to Reality: A Complete Guide to the Laws of the Universe. London: Jonathan Cape.
Figure 2 Spacetime versions of Figure 1 in terms of the relative distortion of neighboring geodesics. (a) Geodesic deviation in empty space (basically Weyl curvature) as seen in the world lines of A and surrounding particles (one spatial dimension suppressed), as might be induced from the gravitational field of a nearby body E. (b) The corresponding inward acceleration (basically Ricci curvature) due to the mass density within the bundle of geodesics. Reproduced with permission from Penrose R (2004) The Road to Reality: A Complete Guide to the Laws of the Universe. London: Jonathan Cape.
of spacetime curvature, according to Einstein’s viewpoint. We are to think of the world line of a particle, falling freely under gravity (Einsteinian inertial motion), as described as some kind of geodesic in spacetime. We shall be coming to this more completely shortly, but for the moment it will be helpful to picture the behavior of geodesics within an ordinary curved 2-surface S (Figure 3). If S has positive (Gaussian) curvature, then there will be a tendency for geodesics on S to bend towards each other, so that a pair of infinitesimally separated geodesics which are initially parallel will begin to get closer together as we move along them; if S has negative (Gaussian) curvature, then there will be a corresponding tendency for geodesics on S to bend away from each other. This is what happens in two dimensions, where the intrinsic curvature at a point is given by a single number. However, we are now concerned with a four-dimensional space, where the notion of curvature requires many more components. We see in Figure 2 that we are indeed to expect mixtures of convergence and divergence of geodesics, which suggests that there are both positive and negative curvature components involved, the positive curvature being in the horizontally displaced directions from A and the negative curvature in the vertically displaced directions. In a curved space of dimension 4, as is the case for a curved spacetime, we can expect 20 independent components of curvature at each point altogether. In the present situation, the others would be called into play when differing velocities of A are considered. Let us see how we are to accommodate the above considerations within the standard framework of differential geometry. So far, we have not really deviated from Newtonian theory, even though we have been considering ‘‘geodesics’’ in a four-dimensional spacetime. In fact, it is perfectly legitimate to view Newtonian theory in this way (see Newtonian
Figure 3 Geodesic deviation when M is a 2-surface (a) of positive (Gaussian) curvature, when the geodesics γ, γ′ bend towards each other, and (b) of negative curvature, when they bend apart. Reproduced with permission from Penrose R (2004) The Road to Reality: A Complete Guide to the Laws of the Universe. London: Jonathan Cape.
Limit of General Relativity), although the 4-geometry description is somewhat more complicated than one might wish. This is due to the fact that the infinite speed at which gravitation is taken to act in Newtonian theory demands that the “metric” of Newtonian spacetime is degenerate. (In effect, one would have a degenerate “dual metric” $G^{ab}$, of matrix rank 3, which plays a role in defining spatial displacements and a very degenerate “metric” $G_{ab}$, of matrix rank 1, which defines temporal differences, where $G^{ab} G_{bc} = 0$; see Newtonian Limit of General Relativity.) Accordingly, there is no unique notion of “geodesic” defined by the metric in Newtonian theory. It is striking that although the insights provided by the principle of equivalence are to some considerable extent independent of special relativity (since we see from the paragraphs prior to the preceding one that a curved-spacetime-geometry view of gravity is natural in the light of the equivalence principle alone), it is the nondegenerate metric $g_{ab}$ (and its inverse $g^{ab}$) that special relativity gives us locally, which leads to an elegant spacetime theory of gravity. Although the metric $g_{ab}$ is Lorentzian (with preferred choice of signature $+ - - -$ here) rather than positive definite, so that the spacetime is not strictly a Riemannian one, the change of signature makes little difference to the local formalism. In particular, the fact that the metric defines a unique (torsion-free) connection preserving it is unaffected by the signature.
This connection is the one defined by Christoffel’s symbols
$$\Gamma_{ac}{}^{b} = \tfrac{1}{2}\, g^{db} \left( \partial_c g_{da} + \partial_a g_{cd} - \partial_d g_{ca} \right)$$
where $\partial_a$ stands for coordinate derivative $\partial/\partial x^a$, so that the covariant derivative of a vector $V^a$ is given by
$$\nabla_a V^b = \partial_a V^b + V^c\, \Gamma_{ac}{}^{b}$$
(Here the standard “physicist’s conventions” are being used, whereby notation such as “$g_{ab}$” and “$V^a$” can be used interchangeably either for the sets of components of the metric tensor g and the vector V, respectively, or alternatively for the entire geometrical metric tensor g or vector V, in each case; moreover, the summation convention is being assumed, or this can alternatively be understood in terms of abstract indices. (For the abstract-index notation for tensors, see Penrose and Rindler (1984), especially Chapters 2 and 4. Sign and index-ordering conventions used here follow those given in that book. Many other authors use conventions which differ from these in various, usually minor respects.))
Physical Interpretation of the Metric
Some words of clarification are needed, as to the meaning of the metric tensor $g_{ab}$ in relativity theory. In the early discussions by Einstein and others, the spacetime metric tended to be interpreted in terms of little “rulers” placed on a curved manifold. Although this is natural in the Riemannian (positive-definite) case, it is not quite so appropriate for the Lorentzian geometry of spacetime manifolds. An ordinary physical ruler has a spacetime description as a timelike strip, and it does not naturally express the spatial separation between two spacelike-separated events. In order for a ruler to measure such a spacelike separation, it would be necessary for the two events to be simultaneous in the ruler’s rest frame, and for this to be assured, some further mechanism would be needed, such as Einstein’s procedure for ensuring simultaneity by the use of light signals from the two events to be received simultaneously at their midpoint on the ruler. Clearly this complicates the issue, and it turns out to be much preferable to concentrate on temporal displacements rather than spatial ones. The idea that spacetime geometry should really be regarded as “chronometry,” in this way, has been stressed by a number of distinguished expositors of relativity theory, most notably John L Synge (1956, 1960) and Hermann Bondi (1961, 1964, 1967). Where needed, spatial displacements can then be defined by the use of temporal ones together with light signals. This has the additional advantage that in modern technology, the measurement of (proper) time far surpasses that of distance in accuracy, to the extent that the meter is now defined simply by the requirement that there are exactly 299792458 of them in a light-second!
The proper time interval between two nearby events is, indeed, measured by a clock which encounters both events, moving inertially between the two, and very precise atomic and nuclear clocks are now a common feature of current technology. The physical role of the metric $g_{ab}$ is most clearly seen in the formula
$$\tau = \int_p^q \sqrt{g_{ab}\, dx^a\, dx^b}$$
which measures the (proper) time interval between an event p and a later event q on its world line, the integral being taken along this curve, and where now that curve need not be a geodesic, so that accelerating (noninertial) motion of the clock is allowed. The metric (with choice of signature $+ - - -$ so that it is the timelike displacements that are directly provided as real numbers) is very precisely specified by this physical requirement, and this tells us that the pseudo-Riemannian (Lorentzian) structure of spacetime is far from being an arbitrary construction, but is given to us by Nature with enormous precision. (Some theorists prefer to use the alternative spacetime signature $- + + +$, because this more directly relates to familiar Newtonian concepts, these being normally described in spatial terms. The difference is essentially just a notational one, however. It may be remarked that the 2-spinor formalism (see Spinors and Spin Coefficients) fits in much more readily with the $+ - - -$ signature being used here.) It may be noted, also, that this time measure is ultimately fixed by quantum principles and the masses of the elementary ingredients involved (e.g., particle masses) via the Einstein and Planck relations $E = mc^2$ and $E = h\nu$, so that there is a natural frequency $\nu$ associated with a given mass m, via $\nu = mc^2/h$ (c being the speed of light and h being Planck’s constant).
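The proper-time formula can be illustrated in the simplest setting. The sketch below assumes flat (Minkowski) spacetime with signature $+ - - -$ and units where c = 1, and it replaces the integral by a sum over small straight steps; the world lines and all names are hypothetical examples, not part of the article.

```python
import math

# Diagonal Minkowski metric with signature (+, -, -, -), units with c = 1.
METRIC_DIAGONAL = (1.0, -1.0, -1.0, -1.0)

def proper_time(events):
    """Sum sqrt(g_ab dx^a dx^b) over consecutive events (t, x, y, z)."""
    total = 0.0
    for start, end in zip(events, events[1:]):
        interval_squared = sum(
            g * (b - a) ** 2 for g, a, b in zip(METRIC_DIAGONAL, start, end)
        )
        total += math.sqrt(interval_squared)  # valid for timelike steps only
    return total

STEPS = 1000
times = [10.0 * i / STEPS for i in range(STEPS + 1)]

# Clock at rest from t = 0 to t = 10 (an inertial world line).
resting = [(t, 0.0, 0.0, 0.0) for t in times]

# Clock that moves out at speed 0.6 and returns at speed 0.6; it is
# noninertial at the turnaround event t = 5.
round_trip = [(t, 0.6 * (5.0 - abs(t - 5.0)), 0.0, 0.0) for t in times]

tau_resting = proper_time(resting)
tau_round_trip = proper_time(round_trip)
print(tau_resting, tau_round_trip)
```

Both clocks encounter the same pair of events p and q, but the noninertial clock records less proper time (a factor $\sqrt{1 - 0.6^2} = 0.8$ in this example). This reflects the fact that the formula applies along any timelike curve, geodesic or not.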
Riemann Curvature and Geodesic Deviation
The unique torsion-free (Christoffel–Levi-Civita) connection $\nabla_a$ is, via this physically determined metric, also fixed accordingly by these physical considerations, as is the notion of a geodesic, and therefore so also is the curvature. The 20-independent-component Riemann curvature tensor $R_{abcd}$ may be defined by
$$(\nabla_a \nabla_b - \nabla_b \nabla_a) V^d = R_{abc}{}^{d}\, V^c$$
with normal index-raising/lowering conventions, so that $R_{abcd} = R_{abc}{}^{e} g_{ed}$, etc., and we have the standard classical formula
$$R_{abc}{}^{d} = \partial_a \Gamma_{cb}{}^{d} - \partial_b \Gamma_{ca}{}^{d} + \Gamma_{cb}{}^{e} \Gamma_{ea}{}^{d} - \Gamma_{ca}{}^{e} \Gamma_{eb}{}^{d}$$
The symmetries
$$R_{abcd} = R_{cdab} = -R_{bacd}, \qquad R_{abcd} + R_{bcad} + R_{cabd} = 0$$
reduce the number of independent components of $R_{abcd}$ to 20 (from a potential $4^4 = 256$). Of these, 10 are locally fixed by the kind of physical requirement indicated above, that in order to express something that agrees closely with Newton’s inverse-square law we require that there should be a net inward curving of free world lines (the timelike geodesics that represent local inertial motions, or “free fall” under gravity). Let us see how this requirement is satisfied in Einstein’s general relativity. What we find, from Newton’s theory, is that a system of test particles which, at some initial time, constitutes a closed 2-surface at rest surrounding some gravitating matter, will begin to accelerate in such a way that the volume surrounded is initially reduced in proportion to the total mass surrounded.
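The count of 20 independent components quoted above can be checked by brute force. The sketch below treats the $n^4$ components as unknowns, imposes the pair symmetry, the antisymmetry in the first index pair, and the cyclic identity as linear constraints, and counts the unknowns left free after exact elimination. The function name and the sparse-row representation are illustrative choices of this sketch.

```python
from fractions import Fraction
from itertools import product

def count_independent_riemann_components(n):
    """Return n**4 minus the rank of the linear symmetry constraints."""
    pivots = {}  # leading index tuple -> normalized sparse row

    def add_constraint(terms):
        # Build a sparse row {index tuple: coefficient}, merging repeated indices.
        row = {}
        for key, coeff in terms:
            row[key] = row.get(key, Fraction(0)) + coeff
        row = {k: v for k, v in row.items() if v != 0}
        # Eliminate against existing pivot rows (exact rational arithmetic).
        while row:
            lead = min(row)
            if lead not in pivots:
                scale = row[lead]
                pivots[lead] = {k: v / scale for k, v in row.items()}
                return
            factor = row[lead]
            for k, v in pivots[lead].items():
                new_value = row.get(k, Fraction(0)) - factor * v
                if new_value == 0:
                    row.pop(k, None)
                else:
                    row[k] = new_value

    for a, b, c, d in product(range(n), repeat=4):
        add_constraint([((a, b, c, d), 1), ((b, a, c, d), 1)])    # R_abcd = -R_bacd
        add_constraint([((a, b, c, d), 1), ((c, d, a, b), -1)])   # R_abcd = R_cdab
        add_constraint([((a, b, c, d), 1), ((b, c, a, d), 1),
                        ((c, a, b, d), 1)])                       # cyclic identity
    return n**4 - len(pivots)

components_in_4d = count_independent_riemann_components(4)
print(components_in_4d)
```

The same routine gives a single independent component in two dimensions, matching the remark that the curvature of a 2-surface is described by one number (the Gaussian curvature).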
This volume reduction is a direct consequence of Poisson’s equation $\nabla^2 \Phi = 4\pi G \rho$ ($\Phi$ being the gravitational potential and $\rho$ the mass density) and of Newton’s second law, which tell us that the second time derivative of the free-fall volume of our initially stationary closed surface of test particles is indeed $-4\pi GM$, where M is the total gravitating mass surrounded (and G is Newton’s constant, as above). In Einstein’s theory, we can basically carry this over to our four-dimensional Lorentzian spacetime. We do, however, find that such a general statement as this does not exactly hold. Instead of referring to 3-volumes of any size, we must restrict attention to infinitesimal volumes. The basic mathematical tool is the equation of “geodesic deviation,” namely the “Jacobi equation”:
$$D^2 u^d = R_{abc}{}^{d}\, t^a u^b t^c$$
where D describes “propagation derivative”
$$D = t^a \nabla_a$$
along a timelike geodesic $\gamma$, where $t^a$ is a unit timelike tangent vector to $\gamma$ (so $t^a t_a = g_{ab} t^a t^b = 1$) which is (consequently) parallel-propagated along $\gamma$,
$$D t^a = 0$$
(When acting on a scalar quantity defined along $\gamma$, we can read “D” as “$d/d\tau$,” where $\tau$ measures proper time along $\gamma$.) The vector $u^a$ is what is called a connecting vector between the geodesic $\gamma$ and some “neighboring geodesic” $\gamma'$. We think of the vector $u^a$ as “connecting” a point p on $\gamma$ to some neighboring point $p'$ on $\gamma'$, where it is usual to take $u^a$ to be orthogonal to $t^a$ (i.e., $u^a t_a = 0$). The derivative $D u^a$ measures the rate of change of $u^a$, as p and $p'$ move together into the future along $\gamma$. Mathematically, we express this as the vanishing of the Lie derivative of $u^a$ with respect to $t^a$ (with $t^a$ extended to a unit vector field which is tangent both to $\gamma$ and to $\gamma'$). By taking three independent vectors $u^a$ at p, we can form a spatial 3-volume element W and investigate how this propagates along $\gamma$. We find
$$D^2 W = W R_{ab}\, t^a t^b$$
where the Ricci tensor $R_{ab}$ ($= R_{ba}$) is here defined by
$$R_{ab} = R_{acb}{}^{c}$$

The Einstein Field Equations
In view of what has been said above, with regard to the way that the acceleration of volume behaves in Newtonian theory, it would be natural to “identify” $R_{ab} t^a t^b$ with ($-4\pi G\,\times$) the (active gravitational) mass density, with respect to the time direction $t^a$. In (special) relativity theory we expect to identify mass density with $c^{-2} \times$ energy density (by $E = mc^2$) and to take energy density as just one component (the time–time component) of a symmetric tensor $T_{ab}$, called the “energy tensor,” and for simplicity we now take c = 1. The tensor quantity $T_{ab}$ is to incorporate the contributions to the local mass/energy density of all particles and fields other than gravity itself. Since we would require this to work for all choices of time direction $t^a$, it would be natural, accordingly, to make the identification
$$R_{ab} = -4\pi G\, T_{ab}$$
Indeed, this was Einstein’s initial choice for a gravitational field equation. However, this will actually not do, as Einstein later realized. The trouble comes from the Bianchi identity
$$\nabla_a R_{bcde} + \nabla_b R_{cade} + \nabla_c R_{abde} = 0$$
from which we deduce
$$\nabla^a \left( R_{ab} - \tfrac{1}{2} R\, g_{ab} \right) = 0$$
where
$$R = R_a{}^{a}$$
This causes trouble in connection with the standard requirement on the energy tensor, that it satisfy the local “conservation law”
$$\nabla^a T_{ab} = 0$$
The latter equation is an essential requirement in special relativity, since it expresses the conservation of energy and momentum for fields in flat spacetime. In standard Minkowski coordinates, each of $T_{a0}$, $T_{a1}$, $T_{a2}$, $T_{a3}$ satisfies an equation just like the $\nabla^a J_a = 0$ of the charge–current vector $J_a$ of Maxwell’s theory of electromagnetism, with now $\nabla_a = \partial_a = \partial/\partial x^a$, which expresses global conservation of charge. Similarly to the way that $J_a$ encapsulates density and flux of electric charge, $T_{a0}$ encapsulates density and flux of energy, and $T_{a1}$, $T_{a2}$, $T_{a3}$ encapsulate the same for the three components of momentum. So the equation $\nabla^a T_{ab} = 0$ is essential in special relativity, for similarly expressing global conservation of energy and momentum. We find (referring to a local inertial frame) that, when we pass to general relativity, this equation should still hold, with $\nabla_a$ now standing for covariant derivative. But the initially proposed field equation $R_{ab} = -4\pi G\, T_{ab}$ would now give us $\nabla^a R_{ab} = 0$, which combined with the geometrically necessary $\nabla^a (R_{ab} - \tfrac{1}{2} R\, g_{ab}) = 0$, tells us that R is constant. In turn this implies the physically unacceptable requirement that $T = T_a{}^{a}$ is constant (since we have $R = -4\pi G\, T$).
Einstein eventually became convinced (by 1915) of the modified field equations
$$R_{ab} - \tfrac{1}{2} R\, g_{ab} = -8\pi G\, T_{ab}$$
(the “$8\pi$” rather than “$4\pi$” being now needed to fit in with the Newtonian limit) and it is these that are now commonly referred to as “Einstein’s field equations.” (Some authors prefer to use the singular form “field equation,” especially if the formula is to be read as an abstract-index expression rather than a family of component equations, since the tensors involved are really single entities.) It may be noted that the formula can be rewritten as
$$R_{ab} = -8\pi G \left( T_{ab} - \tfrac{1}{2} T\, g_{ab} \right)$$
from which we deduce that in Einstein’s theory the source of gravity is not simply the mass (or equivalently energy) density, but there is an additional contribution from the pressure (momentum flux, i.e., space–space components of $T_{ab}$). This can have significant implications for the instability of very large and massive stars in highly relativistic regimes, where increases in pressure can, paradoxically, actually increase the tendency for a star to collapse, owing to its contribution to the attractive effect of its gravity. In 1917, Einstein put forward a slight modification of his field equations (basically the only modification that can be made without fundamentally changing the foundations of his theory) by introducing the very tiny cosmological constant $\Lambda$. The modified equations are
$$R_{ab} - \tfrac{1}{2} R\, g_{ab} + \Lambda g_{ab} = -8\pi G\, T_{ab}$$
and the source of gravity, or active gravitational mass density, is now
$$\rho + P_1 + P_2 + P_3 - \frac{\Lambda}{4\pi G}$$
where (with respect to a local Lorentzian orthonormal frame, units being chosen so that c = 1) $\rho = T_{00}$ is the mass/energy density and $P_1 = T_{11}$, $P_2 = T_{22}$, $P_3 = T_{33}$ are the principal pressures. The $\Lambda$-term, for positive $\Lambda$, provides a repulsive contribution to the gravitational effect, but it is extremely tiny (and totally ignorable) on all ordinary scales, beginning to show itself only at the most vast of observed cosmological distances (since the effect of $\Lambda$ adds up relentlessly at larger and larger distances). Einstein originally introduced the term in order to have the possibility of a static universe, where the attractive gravitational effect of the totality of ordinary matter would be balanced, overall, by $\Lambda$. But the discovery of the expansion of the universe (by Hubble and others) led Einstein to abandon the cosmological term. However, since 1998 (initially from the supernova observations of Brian Schmidt and Robert Kirshner, and Saul Perlmutter, see Perlmutter et al. (1998)), cosmological evidence has mounted in favor of the presence of a very small positive $\Lambda$-term, which has resulted in the expansion of the universe beginning to accelerate. While the presence of Einstein’s constant $\Lambda$-term is consistent with observations, and remains the simplest explanation of this observed acceleration, many cosmologists prefer to allow for what would amount to a “varying $\Lambda$,” and refer to it as “dark energy.”
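The expression for the active gravitational mass density quoted above can be recovered in a few lines from the field equations with cosmological term. The following is a sketch under stated assumptions: signature $+ - - -$, c = 1, the convention $R_{ab} = R_{acb}{}^{c}$, a local orthonormal frame in which $T_{00} = \rho$ and $T_{ii} = P_i$, a unit timelike $t^a$ along the time axis, and the symbol $\Lambda$ for the cosmological constant.

```latex
\begin{align*}
R_{ab} - \tfrac{1}{2} R\, g_{ab} + \Lambda g_{ab} &= -8\pi G\, T_{ab}
  && \text{(field equations)} \\
-R + 4\Lambda &= -8\pi G\, T
  && \text{(trace, using } g^{ab} g_{ab} = 4\text{)} \\
R &= 8\pi G\, T + 4\Lambda \\
R_{ab} &= -8\pi G \left( T_{ab} - \tfrac{1}{2} T\, g_{ab} \right) + \Lambda g_{ab}
  && \text{(substituting } R \text{ back)} \\
T &= \rho - P_1 - P_2 - P_3
  && \text{(signature } + - - - \text{)} \\
R_{ab}\, t^a t^b &= -4\pi G \left( \rho + P_1 + P_2 + P_3 - \frac{\Lambda}{4\pi G} \right)
\end{align*}
```

Comparing the last line with the Newtonian value $-4\pi G \rho$ for the volume acceleration per unit volume shows where the pressure terms enter, and why a positive $\Lambda$ acts repulsively.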
Energy Conservation and Related Matters
One of the features of Einstein’s general relativity theory that had been deeply puzzling to a good many of Einstein’s contemporaries, and which may be said to be still not fully resolved, even today, is “energy conservation,” in the presence of a dynamical gravitational field. We have noted that the energy tensor $T_{ab}$ is to incorporate the contributions of all particles and fields other than gravity. But what about gravity itself? There are many physical situations in which energy can be transferred back and forth from gravitational systems to nongravitational ones (most strikingly in the example of the emission of gravitational waves; see Gravitational Waves). The conservation of energy would make no sense without an understanding of how energy can be stored in a gravitational field. At first sight we seem to see no role for a gravitational contribution to energy in Einstein’s theory, since the conservation law $\nabla^a T_{ab} = 0$ seems to be a self-contained expression of energy conservation with no direct contribution from the gravitational field in the tensor $T_{ab}$. However, this is illusory, since the formulation of a global conservation law from the local covariant expression $\nabla^a T_{ab} = 0$ does not work in curved spacetime (basically because, unlike the charge–current quantity $J_a$ of Maxwell’s electrodynamical theory, the extra index on $T_{ab}$ prevents it from being regarded as a 1-form). We may take the view that the energy of gravitation enters nonlocally into the equation, so that the failure of $T_{ab}$ to provide a global conservation law on its own is an expression of the gravitational contributions of energy not being taken into account. This is no doubt a correct attitude to take, but it is a difficult one to express comprehensively in a mathematical form.
Einstein himself provided a partial understanding, but at the expense of introducing concepts known as “pseudotensors” whose meaning was too tied up with arbitrary choices of coordinate systems to provide an overall picture. In modern approaches, the most clear-cut results come from the study of asymptotically flat or asymptotically de Sitter spacetimes (de Sitter space being the empty universe which takes over the role of Minkowski space when there is a positive cosmological constant $\Lambda$; see Cosmology: Mathematical Aspects). The important role of the “Weyl conformal tensor”
$$C_{abcd} = R_{abcd} - \tfrac{1}{2} \left( R_{ac} g_{bd} - R_{bc} g_{ad} + R_{bd} g_{ac} - R_{ad} g_{bc} \right) + \tfrac{1}{6} R \left( g_{ac} g_{bd} - g_{bc} g_{ad} \right)$$
should also be pointed out. This tensor retains all the symmetries of the full Riemann tensor, but has the Ricci tensor contribution removed, so that all its contractions vanish, as is exemplified by
$$C_{abc}{}^{a} = 0$$
It describes the conformal part of the curvature, that is, that part that survives under conformal rescalings of the metric,
$$g_{ab} \mapsto \Omega^2 g_{ab}$$
where $\Omega$ is a smooth (positive) function of position. The tensor $C_{abc}{}^{d}$ is itself invariant under these conformal rescalings. This has importance in the asymptotic analysis of gravitational fields (see Asymptotic Structure and Conformal Infinity). We may take the view that $C_{abcd}$ describes the degrees of freedom in the free gravitational field, whereas $R_{ab}$ contains the information of the sources of gravity. This is analogous to the Maxwell tensor $F_{ab}$ describing the degrees of freedom in the free electromagnetic field, whereas $J_a$ contains the information of the sources of electromagnetism. From the observational point of view, general relativity stands in excellent shape, with full agreement with all known relevant data, starting with the anomalous perihelion advance of the planet Mercury observed by LeVerrier in the mid-nineteenth century, through clock-slowing, light-bending (lensing) and time-delay effects, and the necessary corrections to GPS positioning systems, to the precise orbiting of double neutron-star systems, with energy loss due to the emission of gravitational waves. The effects of gravitational lensing now play vital roles in modern cosmology.
To get some idea of the precision in Einstein’s theory, we may take note of the fact that the double neutron-star system PSR 1913+16 has been observed for some 30 years, and the agreement between observation and theory overall is to about one part in $10^{14}$.
See also: Asymptotic Structure and Conformal Infinity; Canonical General Relativity; Computational Methods in General Relativity: the Theory; Cosmology: Mathematical Aspects; Einstein Equations: Exact Solutions; Einstein Equations: Initial Value Formulation; Einstein–Cartan Theory; Einstein’s Equations with Matter; General Relativity: Experimental Tests; Geometric Flows and the Penrose Inequality; Gravitational Lensing; Gravitational Waves; Hamiltonian Reduction of Einstein’s Equations; Lorentzian Geometry; Newtonian Limit of General Relativity; Noncommutative geometry and the Standard Model; Spacetime Topology, Causal Structure and Singularities; Spinors and Spin Coefficients; Symmetries and Conservation Laws; Twistor Theory: Some Applications [in Integrable Systems, Complex Geometry and String Theory].
Further Reading
Bondi H (1957) Negative mass in general relativity. Reviews of Modern Physics 29: 423 (Mathematical Reviews 19: 814).
Bondi H (1961) Cosmology. Cambridge: Cambridge University Press.
Bondi H (1964) Relativity and Common Sense. London: Heinemann.
Hartle JB (2003) Gravity: An Introduction to Einstein’s General Relativity. San Francisco: Addison Wesley.
Penrose R (2004) The Road to Reality: A Complete Guide to the Laws of the Universe. London: Jonathan Cape.
Penrose R and Rindler W (1984) Spinors and Space-Time, Vol. 1: Two-Spinor Calculus and Relativistic Fields. Cambridge: Cambridge University Press.
Perlmutter S et al. (1998) Cosmology from type Ia supernovae. Bulletin of the American Astronomical Society 29 (astro-ph/9812473).
Rindler W (2001) Relativity: Special, General, and Cosmological. Oxford: Oxford University Press.
Synge JL (1960) Relativity: The General Theory. Amsterdam: North-Holland.
Wald RM (1984) General Relativity. Chicago: University of Chicago Press.
Generic Properties of Dynamical Systems
C Bonatti, Université de Bourgogne, Dijon, France
© 2006 Elsevier Ltd. All rights reserved.
Introduction
The state of a concrete system (from physics, chemistry, ecology, or other sciences) is described using (finitely many, say n) observable quantities (e.g., positions and velocities for mechanical systems, population densities for ecological systems, etc.). Hence, the state of a system may be represented as a point x in a geometrical space $\mathbb{R}^n$. In many cases, the quantities describing the state are related, so that the phase space (space of all possible states) is a submanifold $M \subset \mathbb{R}^n$. The time evolution of the system is represented by a curve $x_t$, $t \in \mathbb{R}$, drawn on the phase space M, or by a sequence $x_n \in M$, $n \in \mathbb{Z}$, if we consider discrete time (e.g., every day at the same time, or every January 1st). Believing in determinism, and if the system is isolated from external influences, the state $x_0$ of the system at the present time determines its evolution. For continuous-time systems, the infinitesimal evolution is given by a differential equation or vector field $dx/dt = X(x)$; the vector X(x) represents velocity and direction of the evolution. For a discrete-time system, the evolution rule is a function $F : M \to M$; if x is the state at time t, then F(x) is the state at the time t + 1. The evolution of the system, starting at the initial data $x_0$, is described by the orbit of $x_0$, that is, the sequence $\{(x_n)_{n \in \mathbb{Z}} \mid x_{n+1} = F(x_n)\}$ (discrete time) or the maximal solution $x_t$ of the differential equation $dx/dt = X(x)$ (continuous time).
General problem: Knowing the initial data and the infinitesimal evolution rule, what can we tell about the long-time evolution of the system?
The dynamics of a dynamical system (differential equation or function) is the behavior of the orbits, when the time tends to infinity. The aim of “dynamical systems” is to produce a general procedure for describing the dynamics of any system. For example, Conley’s theory presented in the next section organizes the global dynamics of a general system using regions concentrating the orbit accumulation and recurrence and splits these regions in elementary pieces: the chain recurrence classes. We focus our study on $C^r$-diffeomorphisms F (i.e., F and $F^{-1}$ are r times continuously differentiable) on a compact smooth manifold M (most of the notions and results presented here also hold for vector fields). Even for very regular systems (F algebraic) of a low-dimensional space ($\dim(M) = 2$), the dynamics may be chaotic and very unstable: one cannot hope for a precise description of all systems. Furthermore, neither the initial data of a concrete system nor the infinitesimal-evolution rule are known exactly: fragile properties describe the evolution of the theoretical model, and not of the real system. For these reasons, we are mostly interested in properties that are persistent, in some sense, under small perturbations of the dynamical system.
The notion of small perturbations of the system requires a topology on the space $\mathrm{Diff}^r(M)$ of $C^r$-diffeomorphisms: two diffeomorphisms are close for the $C^r$-topology if all their partial derivatives of order at most r are close at each point of M. Endowed with this topology, $\mathrm{Diff}^r(M)$ is a complete metric space. The open and dense subsets of $\mathrm{Diff}^r(M)$ provide the natural topological notion of “almost all” F. Genericity is a weaker notion: by Baire’s theorem, if $O_i$, $i \in \mathbb{N}$, are dense and open subsets, the intersection $\bigcap_{i \in \mathbb{N}} O_i$ is a dense subset. A subset is called residual if it contains such a countable intersection of dense open subsets. A property P is generic if it holds on a residual subset. By a practical abuse of language, one says: “$C^r$-generic diffeomorphisms verify P.” A countable intersection of residual sets is a residual set. Hence, if $\{P_i\}$, $i \in \mathbb{N}$, is a countable family of generic properties, generic diffeomorphisms verify simultaneously all the properties $P_i$. A property P is $C^r$-robust if the set of diffeomorphisms verifying P is open in $\mathrm{Diff}^r(M)$. A property P is locally generic if there is a (nonempty) open set O on which it is generic, that is, there is a residual set R such that P is verified on $R \cap O$. The properties of generic dynamical systems depend mostly on the dimension of the manifold M and on the $C^r$-topology considered, $r \in \mathbb{N} \cup \{+\infty\}$ (an important problem is that $C^r$-generic diffeomorphisms are not $C^{r+1}$):
On very low dimensional spaces (diffeomorphisms of the circle and vector fields on compact surfaces) the dynamics of generic systems (indeed in an open and dense subset of systems) is very simple (called Morse–Smale) and well understood; see the subsection “Generic properties of the low-dimensional systems.”
In higher dimensions, for the $C^r$-topology, r > 1, one has generic and locally generic properties related to the periodic orbits, like the Kupka–Smale property (see the subsection “Kupka–Smale theorem”) and the Newhouse phenomenon (see the subsection “Local $C^2$-genericity of wild behavior for surface diffeomorphisms”). However, we still do not know if the dynamics of $C^r$-generic diffeomorphisms is well approximated by their periodic orbits, so that one is still far from a global understanding of $C^r$-generic dynamics.
For the $C^1$-topology, perturbation lemmas show that the global dynamics is very well approximated by periodic orbits (see the section “$C^1$-generic systems: global dynamics and periodic orbits”). One then divides generic systems into “tame” systems, with a global dynamics analogous to hyperbolic dynamics, and “wild” systems, which present infinitely many dynamically independent regions. The notion of dominated splitting (see the section “Hyperbolic properties of $C^1$-generic diffeomorphisms”) seems to play an important role in this division.
Results on General Systems
Notions of Recurrence
Some regions of M are considered as the heart of the dynamics:
Per(F) denotes the set of periodic points $x \in M$ of F, that is, $F^n(x) = x$ for some n > 0.
A point x is recurrent if its orbit comes back arbitrarily close to x, infinitely many times. Rec(F) denotes the set of recurrent points. The limit set Lim(F) is the union of all the accumulation points of all the orbits of F.
A point x is “wandering” if it admits a neighborhood $U_x \subset M$ disjoint from all its iterates $F^n(U_x)$, n > 0. The nonwandering set $\Omega(F)$ is the set of the nonwandering points.
R(F) is the set of chain recurrent points, that is, points $x \in M$ which look like periodic points if we allow small mistakes at each iteration: for any $\varepsilon > 0$, there is a sequence $x = x_0, x_1, \dots, x_k = x$ where $d(F(x_i), x_{i+1}) < \varepsilon$ (such a sequence is an $\varepsilon$-pseudo-orbit).
A periodic point is recurrent, a recurrent point is a limit point, a limit point is nonwandering, and a nonwandering point is chain recurrent:
$$\mathrm{Per}(F) \subset \mathrm{Rec}(F) \subset \mathrm{Lim}(F) \subset \Omega(F) \subset R(F)$$
All these sets are invariant under F, and $\Omega(F)$ and R(F) are compact subsets of M. There are diffeomorphisms F for which the closures of these sets are distinct:
A rotation $x \mapsto x + \alpha$ with irrational angle $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ on the circle $S^1 = \mathbb{R}/\mathbb{Z}$ has no periodic points, but every point is recurrent.
The map $x \mapsto x + \frac{1}{4\pi}(1 + \cos(2\pi x))$ induces on the circle $S^1$ a diffeomorphism $F$ having a unique fixed point at $x = 1/2$; one verifies that $\Omega(F) = \{1/2\}$ and $R(F)$ is the whole circle $S^1$.
An invariant compact set $K \subset M$ is transitive if there is $x \in K$ whose forward orbit is dense in $K$. Generic points $x \in K$ have their forward and backward orbits dense in $K$: in this sense, transitive sets are dynamically indecomposable.
Conley’s Theory: Pairs Attractor/Repeller and Chain Recurrence Classes
A trapping region $U \subset M$ is a compact set whose image $F(U)$ is contained in the interior of $U$. By definition, the intersection $A = \bigcap_{n \ge 0} F^n(U)$ is an attractor of $F$: any orbit in $U$ ‘‘goes to $A$.’’ Denote by $V$ the complement of the interior of $U$: it is a trapping region for $F^{-1}$ and the intersection $R = \bigcap_{n \ge 0} F^{-n}(V)$ is a repeller. Each orbit either is contained in $A \cup R$, or ‘‘goes from the repeller to the attractor.’’ More precisely, there is a smooth function $\psi\colon M \to [0, 1]$ (called Lyapunov function) equal to 1 on $R$ and 0 on $A$, and strictly decreasing on the other orbits:
$$\psi(F(x)) < \psi(x) \quad \text{for } x \notin A \cup R$$
So, the chain recurrent set is contained in $A \cup R$. Any compact set contained in $U$ and containing the interior of $F(U)$ is a trapping region inducing the same attractor/repeller pair $(A, R)$; hence, the set of attractor/repeller pairs is countable. We denote by $(A_i, R_i, \psi_i)$, $i \in \mathbb{N}$, the family of these pairs, each endowed with an associated Lyapunov function. Conley (1978) proved that
$$R(F) = \bigcap_{i \in \mathbb{N}} (A_i \cup R_i)$$
This induces a natural partition of $R(F)$ into equivalence classes: $x \sim y$ if, for every $i$, $x \in A_i \Leftrightarrow y \in A_i$. Conley proved that $x \sim y$ iff, for any $\varepsilon > 0$, there are $\varepsilon$-pseudo-orbits from $x$ to $y$ and vice versa. The equivalence classes for $\sim$ are called chain recurrence classes. Now, considering an average of the Lyapunov functions $\psi_i$, one gets the following result: there is a continuous function $\varphi\colon M \to \mathbb{R}$ with the following properties:
$\varphi(F(x)) \le \varphi(x)$ for every $x \in M$ (i.e., $\varphi$ is a Lyapunov function);
$\varphi(F(x)) = \varphi(x) \Leftrightarrow x \in R(F)$;
for $x, y \in R(F)$, $\varphi(x) = \varphi(y) \Leftrightarrow x \sim y$; and
the image $\varphi(R(F))$ is a compact subset of $\mathbb{R}$ with empty interior.
This result is called the ‘‘fundamental theorem of dynamical systems’’ by several authors (see Robinson (1999)). Any orbit is $\varphi$-decreasing from a chain recurrence class to another chain recurrence class (the global dynamics of $F$ looks like the dynamics of the gradient flow of a function $\psi$, the chain recurrence classes supplying the singularities of $\psi$). However, this description of the dynamics may be very rough: if $F$ preserves the volume, Poincaré’s recurrence theorem implies that $\Omega(F) = R(F) = M$; the whole of $M$ is the unique chain recurrence class and the function $\varphi$ of Conley’s theorem is constant.
Conley’s theory provides a general procedure for describing the global topological dynamics of a system: one has to characterize the chain recurrence classes, the dynamics in restriction to each class, the stable set of each class (i.e., the set of points whose positive orbits go to the class), and the relative positions of these stable sets.
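The difference between nonwandering and chain recurrent points can be made concrete on the circle example given earlier. The following is a minimal sketch, assuming the example map is $x \mapsto x + \frac{1}{4\pi}(1 + \cos(2\pi x))$ on $\mathbb{R}/\mathbb{Z}$; the function names, the choice of starting point, and the thresholds `eps / 4` are illustrative assumptions, not part of the article. Every true orbit drifts toward the fixed point 1/2 and never returns, yet a single jump of size less than $\varepsilon$ across the fixed point closes an $\varepsilon$-pseudo-orbit.

```python
import math

def F(x):
    """Circle map x -> x + (1/(4*pi)) * (1 + cos(2*pi*x)) on R/Z (assumed form)."""
    return (x + (1.0 + math.cos(2.0 * math.pi * x)) / (4.0 * math.pi)) % 1.0

def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def closed_pseudo_orbit(eps, m=5):
    """Return a closed eps-pseudo-orbit x0, ..., x0 containing exactly one jump.

    y0 sits just past the fixed point 1/2, and x0 is its m-th true iterate.
    We follow the true orbit of x0 around the circle until its image is
    within eps/4 of 1/2 from below, jump to y0, then follow the true orbit
    of y0 back to x0.
    """
    tail = [0.5 + eps / 4.0]          # y0, just past the fixed point
    for _ in range(m):
        tail.append(F(tail[-1]))      # true orbit y0 -> ... -> x0
    x0 = tail[-1]
    chain = [x0]
    # Follow the true orbit until F(current point) lies in (1/2 - eps/4, 1/2].
    while not (0.0 <= 0.5 - F(chain[-1]) < eps / 4.0):
        chain.append(F(chain[-1]))
    # The only "mistake": replace F(chain[-1]) by y0, at distance < eps/2.
    chain.extend(tail)
    return chain

def max_jump(chain):
    """Largest error d(F(x_i), x_{i+1}) along the chain."""
    return max(circle_dist(F(a), b) for a, b in zip(chain, chain[1:]))

if __name__ == "__main__":
    eps = 0.01
    chain = closed_pseudo_orbit(eps)
    print(len(chain), chain[0] == chain[-1], max_jump(chain) < eps)
```

The true orbit of the same starting point only accumulates on 1/2, which is consistent with $\Omega(F) = \{1/2\}$ while $R(F)$ is the whole circle.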
Hyperbolicity
Smale’s hyperbolic theory is the first attempt to give a global vision of almost all dynamical systems. In this section we give a very quick overview of this theory. For further details, see Hyperbolic Dynamical Systems.
Hyperbolic Periodic Orbits
A fixed point $x$ of $F$ is hyperbolic if the derivative $DF(x)$ has no eigenvalue (real or complex) with modulus equal to 1. The tangent space at $x$ splits as $T_x M = E^s \oplus E^u$, where $E^s$ and $E^u$ are the $DF(x)$-invariant spaces corresponding to the eigenvalues of moduli $< 1$ and $> 1$, respectively. There are $C^r$ injectively immersed $F$-invariant submanifolds $W^s(x)$ and $W^u(x)$ tangent at $x$ to $E^s$ and $E^u$; the stable manifold $W^s(x)$ is the set of points $y$ whose forward orbit goes to $x$. The implicit-function theorem implies that a hyperbolic fixed point $x$ varies (locally) continuously with $F$; (compact parts of) the stable and unstable manifolds vary continuously for the $C^r$-topology when $F$ varies with the $C^r$-topology.
A periodic point $x$ of period $n$ is hyperbolic if it is a hyperbolic fixed point of $F^n$, and its invariant manifolds are the corresponding invariant manifolds for $F^n$. The stable and unstable manifolds of the orbit of $x$, $W^s_{\mathrm{orb}}(x)$ and $W^u_{\mathrm{orb}}(x)$, are the unions of the invariant manifolds of the points in the orbit.
Homoclinic Classes
Distinct stable manifolds are always disjoint; however, stable and unstable manifolds may intersect. At the end of the nineteenth century, Poincaré noted that the existence of transverse homoclinic orbits, that is, transverse intersections of $W^s_{\mathrm{orb}}(x)$ with $W^u_{\mathrm{orb}}(x)$ (other than the orbit of $x$), implies a very rich dynamical behavior: indeed, Birkhoff proved that any transverse homoclinic point is accumulated by a sequence of periodic orbits (see Figure 1). The homoclinic class $H(x)$ of a periodic orbit is the closure of the set of transverse homoclinic points associated to $x$:
$$H(x) = \overline{W^s_{\mathrm{orb}}(x) \pitchfork W^u_{\mathrm{orb}}(x)}$$
There is an equivalent definition of the homoclinic class of $x$: we say that two hyperbolic periodic points $x$ and $y$ are homoclinically related if $W^s_{\mathrm{orb}}(x)$ and $W^u_{\mathrm{orb}}(x)$ intersect transversally $W^u_{\mathrm{orb}}(y)$ and $W^s_{\mathrm{orb}}(y)$, respectively; this defines an equivalence relation on $\mathrm{Per}_{\mathrm{hyp}}(F)$, and the homoclinic classes are the closures of the equivalence classes. The homoclinic classes are transitive invariant compact sets canonically associated to the periodic orbits. However, for general systems, homoclinic classes are not necessarily disjoint. For more details, see Homoclinic Phenomena.
Figure 1 A transverse homoclinic orbit.
Smale’s Hyperbolic Theory
A diffeomorphism $F$ is Morse–Smale if $\Omega(F) = \mathrm{Per}(F)$ is finite and hyperbolic, and if $W^s(x)$ is transverse to $W^u(y)$ for any $x, y \in \mathrm{Per}(F)$. Morse–Smale diffeomorphisms have very simple dynamics, similar to that of the gradient flow of a Morse function; apart from periodic points and invariant manifolds of periodic saddles, each orbit goes from a source to a sink (hyperbolic periodic repellers and attractors). Furthermore, Morse–Smale diffeomorphisms are $C^1$-structurally stable, that is, any diffeomorphism $C^1$-close to $F$ is conjugate to $F$ by a homeomorphism: the topological dynamics of $F$ remains unchanged by small $C^1$-perturbations. Morse–Smale vector fields were known (Andronov and Pontryagin, 1937) to characterize the structural stability of vector fields on the sphere $S^2$. However, a diffeomorphism having transverse homoclinic intersections is robustly not Morse–Smale, so that Morse–Smale diffeomorphisms are not $C^r$-dense on any compact manifold of dimension $\ge 2$.
In the early 1960s, Smale generalized the notion of hyperbolicity to nonperiodic sets in order to get a model for homoclinic orbits. The goal of the theory was to cover a whole dense open set of all dynamical systems. An invariant compact set $K$ is hyperbolic if the tangent space $TM|_K$ of $M$ over $K$ splits as the direct sum $TM|_K = E^s \oplus E^u$ of two $DF$-invariant vector bundles, where the vectors in $E^s$ and $E^u$ are uniformly contracted and expanded, respectively, by $F^n$, for some $n > 0$. Hyperbolic sets persist under small $C^1$-perturbations of the dynamics: any diffeomorphism $G$ which is $C^1$-close enough to $F$ admits a hyperbolic compact set $K_G$ close to $K$, and the restrictions of $F$ and $G$ to $K$ and $K_G$ are conjugate by a homeomorphism close to the identity. Hyperbolic compact sets have well-defined invariant (stable and unstable) manifolds, tangent (at the points of $K$) to $E^s$ and $E^u$, and the (local) invariant manifolds of $K_G$ vary locally continuously with $G$.
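The hyperbolicity condition for a fixed point (no eigenvalue of the derivative on the unit circle) is easy to test numerically in dimension 2. The sketch below is illustrative only: the function names, the tolerance `tol`, and the sample matrices are assumptions introduced here, and the derivative is supplied directly as a 2×2 matrix rather than computed from a diffeomorphism.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

def classify_fixed_point(jacobian, tol=1e-9):
    """Classify a fixed point from its derivative DF(x), given as a 2x2 matrix.

    The point is hyperbolic when no eigenvalue has modulus 1; `tol` is an
    illustrative numerical tolerance for that test (an assumption).
    dim E^s is the number of eigenvalues of modulus < 1.
    """
    (a, b), (c, d) = jacobian
    moduli = [abs(ev) for ev in eigenvalues_2x2(a, b, c, d)]
    if any(abs(m - 1.0) <= tol for m in moduli):
        return "non-hyperbolic"
    dim_stable = sum(1 for m in moduli if m < 1.0)
    return {2: "sink", 1: "saddle", 0: "source"}[dim_stable]

if __name__ == "__main__":
    print(classify_fixed_point([[2.0, 1.0], [1.0, 1.0]]))   # hyperbolic linear automorphism
    print(classify_fixed_point([[0.0, -1.0], [1.0, 0.0]]))  # rotation by a quarter turn
```

For a periodic point of period $n$, the same test would be applied to the derivative of $F^n$, that is, to the product of the derivatives along the orbit.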
The existence of hyperbolic sets is very common: if $y$ is a transverse homoclinic point associated to a hyperbolic periodic point $x$, then there is a transitive hyperbolic set containing $x$ and $y$. Diffeomorphisms for which $R(F)$ is hyperbolic are now well understood: the chain recurrence classes are homoclinic classes, finite in number and transitive, and they admit a combinatorial model (subshift of finite type). Some of them are attractors or repellers, and the basins of the attractors cover a dense open subset of $M$. If, furthermore, all the stable and unstable manifolds of points in $R(F)$ are transverse, the diffeomorphism is $C^1$-structurally stable (Robbin 1971, Robinson 1976); indeed, this condition, called ‘‘axiom A + strong transversality,’’ is equivalent to $C^1$-structural stability (Mañé 1988). In 1970, Abraham and Smale built examples of robustly non-axiom A diffeomorphisms, when $\dim M \ge 3$: the dream of a global understanding of dynamical systems was postponed. However, hyperbolicity remains a key tool in the study of dynamical systems, even for nonhyperbolic systems.
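The combinatorial model mentioned above, a subshift of finite type, can be explored directly: for a 0/1 transition matrix $A$, the points fixed by the $n$-th power of the shift correspond to admissible cyclic words of length $n$, and their number is the trace of $A^n$ (a standard fact). The sketch below checks this by brute force; the ‘‘golden mean’’ matrix and all function names are illustrative choices made here, not taken from the article.

```python
from itertools import product

def mat_mul(A, B):
    """Product of two square integer matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_of_power(A, n):
    """trace(A^n) for n >= 1."""
    P = A
    for _ in range(n - 1):
        P = mat_mul(P, A)
    return sum(P[i][i] for i in range(len(A)))

def count_periodic_words(A, n):
    """Count words w_0 ... w_{n-1} with every cyclic transition allowed by A."""
    k = len(A)
    return sum(
        1
        for w in product(range(k), repeat=n)
        if all(A[w[i]][w[(i + 1) % n]] for i in range(n))
    )

# Illustrative transition matrix: the symbol 1 may not follow itself.
GOLDEN_MEAN = [[1, 1], [1, 0]]

if __name__ == "__main__":
    for n in range(1, 9):
        print(n, trace_of_power(GOLDEN_MEAN, n), count_periodic_words(GOLDEN_MEAN, n))
```

The two columns agree, so the periodic orbits of such a model can be counted by linear algebra alone.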
$C^r$-Generic Systems
Kupka–Smale Theorem
Thom’s transversality theorem asserts that two submanifolds can always be put in transverse position by a $C^r$-small perturbation. Hence, for $F$ in an open and dense subset of $\mathrm{Diff}^r(M)$, $r \ge 1$, the graph of $F$ in $M \times M$ is transverse to the diagonal $\Delta = \{(x, x), x \in M\}$: $F$ has finitely many fixed points $x_i$, depending locally continuously on $F$, and 1 is not an eigenvalue of the differential $DF(x_i)$. Small local perturbations in the neighborhood of the $x_i$ avoid eigenvalues of modulus equal to 1: one gets a dense and open subset $O^r_1$ of $\mathrm{Diff}^r(M)$ such that every fixed point is hyperbolic. This argument, adapted for periodic points, provides a dense and open set $O^r_n \subset \mathrm{Diff}^r(M)$ such that every periodic point of period $n$ is hyperbolic. Now $\bigcap_{n \in \mathbb{N}} O^r_n$ is a residual subset of $\mathrm{Diff}^r(M)$, for which every periodic point is hyperbolic. Similarly, the set of diffeomorphisms $F \in \bigcap_{i=0}^{n} O^r_i$ such that all the disks of size $n$ of invariant manifolds of periodic points of period less than $n$ are pairwise transverse is open and dense. One gets the Kupka–Smale theorem (see Palis and de Melo (1982) for a detailed exposition): for $C^r$-generic diffeomorphisms $F \in \mathrm{Diff}^r(M)$, every periodic orbit is hyperbolic and $W^s(x)$ is transverse to $W^u(y)$ for $x, y \in \mathrm{Per}(F)$.
Generic Properties of Low-Dimensional Systems
Poincaré–Denjoy theory describes the topological dynamics of all diffeomorphisms of the circle $S^1$ (see Homeomorphisms and Diffeomorphisms of the Circle). Diffeomorphisms in an open and dense subset of $\mathrm{Diff}^r_+(S^1)$ have a nonempty finite set of periodic orbits, all hyperbolic, and alternately attracting (sink) or repelling (source). The orbit of a nonperiodic point comes from a source and goes to a sink. Two $C^r$-generic diffeomorphisms of $S^1$ are conjugate iff they have the same rotation number and the same number of periodic points.
This simple behavior was generalized in 1962 by Peixoto to vector fields on compact orientable surfaces $S$. Vector fields $X$ in a $C^r$-dense and open subset are Morse–Smale, hence structurally stable (see Palis and de Melo (1982) for a detailed proof). Peixoto gives a complete classification of these vector fields, up to topological equivalence. Peixoto’s argument uses the fact that the return maps of the vector field on transverse sections are increasing functions: this helped control the effect on the dynamics of small ‘‘monotone’’ perturbations, and allowed him to destroy any nontrivial recurrences. Peixoto’s result remains true on nonorientable surfaces for the $C^1$-topology, but the case $r > 1$ remains an open question: is the set of Morse–Smale vector fields $C^2$-dense when $S$ is a nonorientable closed surface?
Local $C^2$-Genericity of Wild Behavior for Surface Diffeomorphisms
The generic systems we have seen above have very simple dynamics, simpler than that of general systems. This is not always the case. In the 1970s, Newhouse exhibited a $C^2$-open set $O \subset \mathrm{Diff}^2(S^2)$ (where $S^2$ denotes the two-dimensional sphere), such that $C^2$-generic diffeomorphisms $F \in O$ have infinitely many hyperbolic periodic sinks. In fact, $C^2$-generic diffeomorphisms in $O$ present many other pathological properties: for instance, it has been recently noted that they have uncountably many chain recurrence classes without periodic orbits. Densely (but not generically) in $O$, they present many other phenomena, such as strange (Hénon-like) attractors (see Lyapunov Exponents and Strange Attractors).
This phenomenon appears each time that a diffeomorphism $F_0$ admits a hyperbolic periodic point $x$ whose invariant manifolds $W^s(x)$ and $W^u(x)$ are tangent at some point $p \in W^s(x) \cap W^u(x)$ ($p$ is a homoclinic tangency associated to $x$). Homoclinic tangencies appear locally as a codimension-1 submanifold of $\mathrm{Diff}^2(S^2)$; they are such a simple phenomenon that they appear in very natural contexts. When a small perturbation transforms the tangency into transverse intersections, a new hyperbolic set $K$ with very large fractal dimensions is created. The local stable and unstable manifolds of $K$, each homeomorphic to the product of a Cantor set by a segment, present tangencies in a $C^2$-robust way, that is, for $F$ in some $C^2$-open set $O$ (see Figure 2). As a consequence, for a $C^2$-dense subset of $O$, the invariant manifolds of the point $x$ present some tangency (this is not generic, by the Kupka–Smale theorem). If the Jacobian of $F$ at $x$ is $< 1$, each tangency allows one to create one more sink by an arbitrarily small perturbation. Hence, the sets of diffeomorphisms having more than $n$ hyperbolic sinks are dense open subsets of $O$, and the intersection of all these dense open subsets is the announced residual set. See Palis and Takens (1993) for details on this deep argument.
Figure 2 Robust tangencies.
$C^1$-Generic Systems: Global Dynamics and Periodic Orbits
See Bonatti et al. (2004), Chapter 10 and Appendix A, for a more detailed exposition and precise references.
Perturbations of Orbits: Closing and Connecting Lemmas
In 1968, Pugh proved the following lemma.
Closing lemma If $x$ is a nonwandering point of a diffeomorphism $F$, then there are diffeomorphisms $G$ arbitrarily $C^1$-close to $F$, such that $x$ is periodic for $G$.
Consider a segment $x_0, \ldots, x_n = F^n(x_0)$ of orbit such that $x_n$ is very close to $x_0 = x$; one would like to take $G$ close to $F$ such that $G(x_n) = x_0$, and $G(x_i) = F(x_i) = x_{i+1}$ for $i \ne n$. This idea works for the $C^0$-topology (so that the $C^0$-closing lemma is easy). However, if one wants $G$ to be $\varepsilon$-$C^1$-close to $F$, one needs that the points $x_i$, $i \in \{1, \ldots, n-1\}$, remain at distance $d(x_i, x_0)$ greater than $C\,d(x_n, x_0)/\varepsilon$, where $C$ bounds $\|DF\|$ on $M$. If $C/\varepsilon$ is very large, such a segment of orbit does not exist. Pugh solved this difficulty in two steps: the perturbation is first spread along a segment of orbit of $x$ in order to decrease this constant; then a subsegment $y_0, \ldots, y_k$ of $x_0, \ldots, x_n$ is selected, verifying the geometrical condition.
For the $C^2$-topology, the distances $d(x_i, x_0)$ need to remain greater than $\sqrt{d(x_n, x_0)/\varepsilon} \gg d(x_n, x_0)$. This new difficulty is why the $C^2$-closing lemma remains an open question.
Pugh’s argument does not suffice to create a homoclinic point for a periodic orbit whose unstable manifold accumulates on the stable one. In 1998, Hayashi solved this problem by proving the following lemma.
Connecting lemma (Hayashi 1997) Let $y$ and $z$ be two points such that the forward orbit of $y$ and the backward orbit of $z$ accumulate on the same nonperiodic point $x$. Fix some $\varepsilon > 0$. There is $N > 0$ and an $\varepsilon$-$C^1$-perturbation $G$ of $F$ such that $G^n(y) = z$ for some $n > 0$, and $G = F$ out of an arbitrarily small neighborhood of $\{x, F(x), \ldots, F^N(x)\}$.
Using Hayashi’s arguments, we (with Crovisier) proved the following lemma:
Connecting lemma for pseudo-orbits (Bonatti and Crovisier 2004) Assume that all periodic orbits of $F$ are hyperbolic; consider $x, y \in M$ such that, for any $\varepsilon > 0$, there are $\varepsilon$-pseudo-orbits joining $x$ to $y$; then there are arbitrarily small $C^1$-perturbations of $F$ for which the positive orbit of $x$ passes through $y$.
Densities of Periodic Orbits
As a consequence of the perturbation lemmas above, we (Bonatti and Crovisier 2004) proved that for $F$ $C^1$-generic,
$$R(F) = \Omega(F) = \overline{\mathrm{Per}_{\mathrm{hyp}}(F)}$$
where $\overline{\mathrm{Per}_{\mathrm{hyp}}(F)}$ denotes the closure of the set of hyperbolic periodic points. For this, consider the map $\Phi\colon F \mapsto \Phi(F) = \overline{\mathrm{Per}_{\mathrm{hyp}}(F)}$ defined on $\mathrm{Diff}^1(M)$ and with values in $\mathcal{K}(M)$, the space of all compact subsets of $M$, endowed with the Hausdorff topology. $\overline{\mathrm{Per}_{\mathrm{hyp}}(F)}$ may be approximated by a finite set of hyperbolic periodic points, and this set varies continuously with $F$; so $\overline{\mathrm{Per}_{\mathrm{hyp}}(F)}$ varies lower-semicontinuously with $F$: for $G$ very close to $F$, $\overline{\mathrm{Per}_{\mathrm{hyp}}(G)}$ cannot be very much smaller than $\overline{\mathrm{Per}_{\mathrm{hyp}}(F)}$. As a consequence, a result from general topology asserts that, for $C^1$-generic $F$, the map $\Phi$ is continuous at $F$. On the other hand, $C^1$-generic diffeomorphisms are Kupka–Smale, so that the connecting lemma for pseudo-orbits may apply: if $x \in R(F)$, $x$ can be turned into a hyperbolic periodic point by a $C^1$-small perturbation of $F$. So, if $x \notin \overline{\mathrm{Per}_{\mathrm{hyp}}(F)}$, $F$ is not a continuity point of $\Phi$, leading to a contradiction.
Furthermore, Crovisier proved the following result: ‘‘for $C^1$-generic diffeomorphisms, each chain recurrence class is the limit, for the Hausdorff distance, of a sequence of periodic orbits.’’
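The geometric condition in the discussion of Pugh’s closing lemma (intermediate points $x_i$ must stay at distance greater than $C\,d(x_n, x_0)/\varepsilon$ from $x_0$) can be probed numerically. The following sketch is only an illustration, not Pugh’s construction: it takes the rotation of the circle by the golden-mean angle, an example chosen here, and for every closest-return time $n$ computes the ratio $\min_{0<i<n} d(x_i, x_0) / d(x_n, x_0)$. All names are hypothetical.

```python
import math

# Illustrative rotation angle (an assumption): the golden mean, (sqrt(5) - 1) / 2.
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0

def dist_to_zero(t):
    """Distance on R/Z between the point t and the base point x0 = 0."""
    t = t % 1.0
    return min(t, 1.0 - t)

def record_return_ratios(alpha, n_max):
    """For each closest-return time n <= n_max of the orbit x_n = n * alpha,
    return (n, ratio) with ratio = min_{0<i<n} d(x_i, x0) / d(x_n, x0)."""
    ratios = []
    best = None  # smallest distance d(x_i, x0) seen so far, over 0 < i < n
    for n in range(1, n_max + 1):
        d_n = dist_to_zero(n * alpha)
        if best is None:
            best = d_n
        elif d_n < best:
            ratios.append((n, best / d_n))
            best = d_n
    return ratios

if __name__ == "__main__":
    for n, ratio in record_return_ratios(ALPHA, 1000):
        print(n, round(ratio, 6))
```

In this example the ratio stays near 1.618 at every closest return (and is below 1 at all other times), so with $C = 1$ the naive condition fails as soon as $\varepsilon$ is small. This is the kind of obstruction that the two-step argument (spreading the perturbation, then selecting a subsegment) is designed to overcome.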
This good approximation of the global dynamics by the periodic orbits will now allow us to better understand the chain recurrence classes of $C^1$-generic diffeomorphisms.
Chain Recurrence Classes/Homoclinic Classes of $C^1$-Generic Systems
Transverse intersections of invariant manifolds of hyperbolic orbits are robust and vary locally continuously with the diffeomorphism $F$. So, the homoclinic class $H(x)$ of a periodic point $x$ varies lower-semicontinuously with $F$ (on the open set where the continuation of $x$ is defined). As a consequence, for $C^r$-generic diffeomorphisms ($r \ge 1$), each homoclinic class varies continuously with $F$.
Using the connecting lemma, Arnaud (2001) proved the following result: ‘‘for Kupka–Smale diffeomorphisms, if the closures $\overline{W^u_{\mathrm{orb}}(x)}$ and $\overline{W^s_{\mathrm{orb}}(x)}$ have some intersection point $z$, then a $C^1$-perturbation of $F$ creates a transverse intersection of $W^u_{\mathrm{orb}}(x)$ and $W^s_{\mathrm{orb}}(x)$ at $z$.’’ So, if $z \notin H(x)$, then $F$ is not a continuity point of the function $F \mapsto H(x, F)$. Hence, for $C^1$-generic diffeomorphisms $F$ and for every periodic point $x$,
$$H(x) = \overline{W^u_{\mathrm{orb}}(x)} \cap \overline{W^s_{\mathrm{orb}}(x)}$$
In the same way, $\overline{W^u_{\mathrm{orb}}(x)}$ and $\overline{W^s_{\mathrm{orb}}(x)}$ vary locally lower-semicontinuously with $F$ so that, for $F$ $C^r$-generic, the closures of the invariant manifolds of each periodic point vary locally continuously. For Kupka–Smale diffeomorphisms, the connecting lemma for pseudo-orbits implies: ‘‘if $z$ is a point in the chain recurrence class of a periodic point $x$, then a $C^1$-small perturbation of $F$ puts $z$ on the unstable manifold of $x$’’; so, if $z \notin \overline{W^u_{\mathrm{orb}}(x)}$, then $F$ is not a continuity point of the function $F \mapsto \overline{W^u_{\mathrm{orb}}(x, F)}$. Hence, for $C^1$-generic diffeomorphisms $F$ and for every periodic point $x$, the chain recurrence class of $x$ is contained in $\overline{W^u_{\mathrm{orb}}(x)} \cap \overline{W^s_{\mathrm{orb}}(x)}$, and, therefore, coincides with the homoclinic class of $x$. This argument proves:
For a $C^1$-generic diffeomorphism $F$, each homoclinic class $H(x)$ is a chain recurrence class of $F$ (in the sense of Conley’s theory): a chain recurrence class containing a periodic point $x$ coincides with the homoclinic class $H(x)$. In particular, two homoclinic classes are either disjoint or equal.
Tame and Wild Systems
For generic diffeomorphisms, the number $N(F) \in \mathbb{N} \cup \{\infty\}$ of homoclinic classes varies lower-semicontinuously with $F$. One deduces that $N(F)$ is locally constant on a residual subset of $\mathrm{Diff}^1(M)$ (Abdenur 2003).
A local version (in the neighborhood of a chain recurrence class) of this argument shows that, for $C^1$-generic diffeomorphisms, any isolated chain recurrence class $C$ is robustly isolated: for any diffeomorphism $G$ that is $C^1$-close enough to $F$, the intersection of $R(G)$ with a small neighborhood of $C$ is a unique chain recurrence class $C_G$ close to $C$. One says that a diffeomorphism is ‘‘tame’’ if each chain recurrence class is robustly isolated. We denote by $\mathcal{T}(M) \subset \mathrm{Diff}^1(M)$ the ($C^1$-open) set of tame diffeomorphisms and by $\mathcal{W}(M)$ the complement of the closure of $\mathcal{T}(M)$. $C^1$-generic diffeomorphisms in $\mathcal{W}(M)$ have infinitely many disjoint homoclinic classes, and are called ‘‘wild’’ diffeomorphisms.
Generic tame diffeomorphisms have a global dynamics analogous to hyperbolic systems: the chain recurrence set admits a partition into finitely many homoclinic classes varying continuously with the dynamics. Every point belongs to the stable set of one of these classes. Some of the homoclinic classes are (transitive) topological attractors, the union of their basins covers a dense open subset of $M$, and the basins vary continuously with $F$ (Carballo and Morales 2003). It remains to get a good description of the dynamics in the homoclinic classes, and particularly in the attractors. As we shall see in the next section, tame behavior requires some kind of weak hyperbolicity. Indeed, in dimension 2, tame diffeomorphisms satisfy axiom A and the noncycle condition.
As of now, very little is known about wild systems. One knows some semilocal mechanisms generating locally $C^1$-generic wild dynamics, therefore proving their existence on any manifold with dimension $\dim(M) \ge 3$ (the existence of wild diffeomorphisms in dimension 2, for the $C^1$-topology, remains an open problem). Some of the known examples exhibit a universal dynamics: they admit infinitely many disjoint periodic disks such that, up to renormalization, the return maps on these disks induce a dense subset of diffeomorphisms of the disk.
Hence, these locally generic diffeomorphisms present infinitely many times any robust property of diffeomorphisms of the disk.
Ergodic Properties
A point $x$ is well closable if, for any $\varepsilon > 0$, there is $G$ $\varepsilon$-$C^1$-close to $F$ such that $x$ is periodic for $G$ and $d(F^i(x), G^i(x)) < \varepsilon$ for $i \in \{0, \ldots, p\}$, $p$ being the period of $x$. As an important refinement of Pugh’s closing lemma, Mañé proved the following lemma:
Ergodic closing lemma For any $F$-invariant probability, almost every point is well closable.
As a consequence, ‘‘for $C^1$-generic diffeomorphisms, any ergodic measure $\mu$ is the weak limit of a sequence of Dirac measures on periodic orbits, which converges also in the Hausdorff distance to the support of $\mu$.’’ It remains an open problem to know whether, for $C^1$-generic diffeomorphisms, the ergodic measures supported in a homoclinic class are approached by periodic orbits in this homoclinic class.
Conservative Systems
The connecting lemma for pseudo-orbits has been adapted for volume-preserving and symplectic diffeomorphisms, replacing the condition on the periodic orbits by another generic condition on the eigenvalues. As a consequence, one gets: ‘‘$C^1$-generic volume-preserving or symplectic diffeomorphisms are transitive, and $M$ is a unique homoclinic class.’’ Notice that KAM theory implies that this result is wrong for $C^4$-generic diffeomorphisms, the persistence of invariant tori allowing one to break the transitivity robustly. The Oxtoby–Ulam (1941) theorem asserts that $C^0$-generic volume-preserving homeomorphisms are ergodic. The ergodicity of $C^1$-generic volume-preserving diffeomorphisms remains an open question.
Hyperbolic Properties of $C^1$-Generic Diffeomorphisms
For a more detailed exposition of hyperbolic properties of $C^1$-generic diffeomorphisms, the reader is referred to Bonatti et al. (2004, chapter 7 and appendix B).
Perturbations of Products of Matrices
The $C^1$-topology enables us to do small perturbations of the differential $DF$ at a point $x$ without perturbing either $F(x)$ or $F$ out of an arbitrarily small neighborhood of $x$. Hence, one can perturb the differential of $F$ along a periodic orbit without changing this periodic orbit (Franks’ lemma). When $x$ is a periodic point of period $n$, the differential of $F^n$ at $x$ is fundamental for knowing the local behavior of the dynamics. This differential is (up to a choice of local coordinates) a product of the matrices $DF(x_i)$, where $x_i = F^i(x)$. So, the control of the dynamical effect of local perturbations along a periodic orbit comes from a problem of linear algebra: ‘‘consider a product $A = A_n \circ A_{n-1} \circ \cdots \circ A_1$ of $n \gg 0$ bounded linear isomorphisms of $\mathbb{R}^d$; how do the eigenvalues and the eigenspaces of $A$ vary under small perturbations of the $A_i$?’’
A partial answer to this general problem uses the notion of dominated splitting. Let $X \subset M$ be an $F$-invariant set such that the tangent space of $M$ at the points $x \in X$ admits a $DF$-invariant splitting $T_x M = E_1(x) \oplus \cdots \oplus E_k(x)$, the dimensions $\dim(E_i(x))$ being independent of $x$. This splitting is dominated if the vectors in $E_{i+1}$ are uniformly more expanded than the vectors in $E_i$: there exists $\ell > 0$ such that, for any $x \in X$, any $i \in \{1, \ldots, k-1\}$ and any unit vectors $u \in E_i(x)$ and $v \in E_{i+1}(x)$, one has
$$\|DF^\ell(u)\| < \tfrac{1}{2}\,\|DF^\ell(v)\|$$
Dominated splittings are always continuous, extend to the closure of $X$, and persist and vary continuously under $C^1$-perturbation of $F$.
Dominated Splittings versus Wild Behavior
Let $\{\gamma_i\}$ be a set of hyperbolic periodic orbits. On $X = \bigcup_i \gamma_i$ one considers the natural splitting $TM|_X = E^s \oplus E^u$ induced by the hyperbolicity of the $\gamma_i$. Mañé (1982) proved: ‘‘if there is a $C^1$-neighborhood of $F$ on which each $\gamma_i$ remains hyperbolic, then the splitting $TM|_X = E^s \oplus E^u$ is dominated.’’
A generalization of Mañé’s result shows: ‘‘if a homoclinic class $H(x)$ has no dominated splitting, then for any $\varepsilon > 0$ there is a periodic orbit $\gamma$ in $H(x)$ whose derivative at the period can be turned into a homothety by an $\varepsilon$-small perturbation of the derivative of $F$ along the points of $\gamma$’’; in particular, this periodic orbit can be turned into a sink or a source. As a consequence, one gets: ‘‘for $C^1$-generic diffeomorphisms $F$, any homoclinic class either has a dominated splitting or is contained in the closure of the (infinite) set of sinks and sources.’’
This argument has been used in two directions:
Tame systems must satisfy some hyperbolicity. In fact, using the ergodic closing lemma, one proves that the homoclinic classes $H(x)$ of tame diffeomorphisms are volume hyperbolic, that is, there is a dominated splitting $TM = E_1 \oplus \cdots \oplus E_k$ over $H(x)$ such that $DF$ contracts uniformly the volume in $E_1$ and expands uniformly the volume in $E_k$.
If $F$ admits a homoclinic class $H(x)$ which is robustly without dominated splittings, then generic diffeomorphisms in the neighborhood of $F$ are wild: at this time, this is the only known way to get wild systems.
See also: Cellular Automata; Chaos and Attractors; Fractal Dimensions in Dynamics; Homeomorphisms and Diffeomorphisms of the Circle; Homoclinic Phenomena; Hyperbolic Dynamical Systems; Lyapunov Exponents and Strange Attractors; Polygonal Billiards; Singularity and Bifurcation Theory; Synchronization of Chaos.
Further Reading
Abdenur F (2003) Generic robustness of spectral decompositions. Annales Scientifiques de l’École Normale Supérieure IV 36(2): 213–224.
Andronov A and Pontryagin L (1937) Systèmes grossiers. Dokl. Akad. Nauk. USSR 14: 247–251.
Arnaud M-C (2001) Création de connexions en topologie $C^1$. Ergodic Theory and Dynamical Systems 21(2): 339–381.
Bonatti C and Crovisier S (2004) Récurrence et généricité. Inventiones Mathematicae 158(1): 33–104 (French) (see also the short English version: (2003) Recurrence and genericity. Comptes Rendus Mathématique de l’Académie des Sciences Paris 336(10): 839–844).
Bonatti C, Díaz LJ, and Viana M (2004) Dynamics Beyond Uniform Hyperbolicity. Encyclopedia of Mathematical Sciences, vol. 102. Berlin: Springer.
Carballo CM and Morales CA (2003) Homoclinic classes and finitude of attractors for vector fields on n-manifolds. Bulletin of the London Mathematical Society 35(1): 85–91.
Hayashi S (1997) Connecting invariant manifolds and the solution of the $C^1$ stability and $\Omega$-stability conjectures for flows. Annals of Mathematics 145: 81–137.
Mañé R (1988) A proof of the $C^1$ stability conjecture. Inst. Hautes Études Sci. Publ. Math. 66: 161–210.
Palis J and de Melo W (1982) Geometric Theory of Dynamical Systems, An Introduction. New York–Berlin: Springer.
Palis J and Takens F (1993) Hyperbolicity and Sensitive-Chaotic Dynamics at Homoclinic Bifurcations. Cambridge: Cambridge University Press.
Robbin JW (1971) A structural stability theorem. Annals of Mathematics 94(2): 447–493.
Robinson C (1974) Structural stability of vector fields. Annals of Mathematics 99(2): 154–175 (errata. Annals of Mathematics 101(2): 368 (1975)).
Robinson C (1976) Structural stability of $C^1$ diffeomorphisms. Journal of Differential Equations 22(1): 28–73.
Robinson C (1999) Dynamical Systems. Studies in Advanced Mathematics. Boca Raton, FL: CRC Press.
Geometric Analysis and General Relativity
L Andersson, University of Miami, Coral Gables, FL, USA and Albert Einstein Institute, Potsdam, Germany
© 2006 Elsevier Ltd. All rights reserved.
Geometric analysis can be said to originate in the nineteenth century work of Weierstrass, Riemann, Schwarz, and others on minimal surfaces, a problem whose history can be traced at least as far back as the work of Meusnier and Lagrange in the eighteenth century. The experiments performed by Plateau in the mid-19th century, on soap films spanning wire contours, served as an important inspiration for this work, and led to the formulation of the Plateau problem, which concerns the existence and regularity of area-minimizing surfaces in R3 spanning a given boundary contour. The Plateau problem for area-minimizing disks spanning a curve in R3 was solved by J Douglas (who shared the first Fields medal with Lars V Ahlfors) and T Rado in the 1930s. Generalizations of Plateau’s problem have been an important driving force behind the development of modern geometric analysis. Geometric analysis can be viewed broadly as the study of partial differential equations arising in geometry, and includes many areas of the calculus of variations, as well as the theory of geometric evolution equations. The Einstein equation, which is the central object of general relativity, is one of the most widely studied geometric partial differential equations, and plays an important role in its Riemannian as well as in its Lorentzian form, the Lorentzian being most relevant for general relativity.
The Einstein equation is the Euler–Lagrange equation of a Lagrangian with gauge symmetry and thus in the Lorentzian case it, like the Yang–Mills equation, can be viewed as a system of evolution equations with constraints. After imposing suitable gauge conditions, the Einstein equation becomes a hyperbolic system; in particular, using spacetime harmonic coordinates (also known as wave coordinates), the Einstein equation becomes a quasilinear system of wave equations. The constraint equations implied by the Einstein equations can be viewed as a system of elliptic equations in terms of suitably chosen variables. Thus, the Einstein equation leads to both elliptic and hyperbolic problems, arising from the constraint equations and the Cauchy problem, respectively. The groundwork for the mathematical study of the Einstein equation and the global nature of spacetimes was laid by, among others, Choquet-Bruhat, who proved local well-posedness for the Cauchy problem, Lichnerowicz, and later York, who provided the basic ideas for the analysis of the constraint equations, and Leray, who formalized the notion of global hyperbolicity, which is essential for the global study of spacetimes. An important framework for the mathematical study of the Einstein equations has been provided by the singularity theorems of Penrose and Hawking, as well as the cosmic censorship conjectures of Penrose. Techniques and ideas from geometric analysis have played, and continue to play, a central role in recent mathematical progress on the problems posed by general relativity. Among the main results are the
proof of the positive mass theorem using the minimal surface technique of Schoen and Yau, and the spinor-based approach of Witten, as well as the proofs of the (Riemannian) Penrose inequality by Huisken and Ilmanen, and Bray. The proof of the Yamabe theorem by Schoen has played an important role as a basis for constructing Cauchy data using the conformal method. The results just mentioned are all essentially Riemannian in nature, and do not involve study of the Cauchy problem for the Einstein equations. There has been great progress recently concerning global results on the Cauchy problem for the Einstein equations, and the cosmic censorship conjectures of Penrose. The results available so far are either small data results (among these the nonlinear stability of Minkowski space proved by Christodoulou and Klainerman) or assume additional symmetries, such as the recent proof by Ringström of strong cosmic censorship for the class of Gowdy spacetimes. However, recent progress concerning quasilinear wave equations and the geometry of spacetimes with low regularity due to, among others, Klainerman and Rodnianski, and Tataru and Smith, appears to show the way towards an improved understanding of the Cauchy problem for the Einstein equations. Since the constraint equations, the Penrose inequality and the Cauchy problem are discussed in separate articles, the focus of this article will be on the role in general relativity of “critical” and other geometrically defined submanifolds and foliations, such as minimal surfaces, marginally trapped surfaces, constant mean curvature hypersurfaces and null hypersurfaces. In this context it would be natural also to discuss geometrically defined flows such as mean curvature flows, inverse mean curvature flow, and Ricci flow.
However, this article restricts the discussion to mean curvature flows, since the inverse mean curvature flow appears naturally in the context of the Penrose inequality and the Ricci flow has so far mainly served as a source of inspiration for research on the Einstein equations rather than an important tool. Other topics which would fit well under the heading “General relativity and geometric analysis” are spin geometry (the Witten proof of the positive mass theorem), the Yamabe theorem and related results concerning the Einstein constraint equations, gluing and other techniques of “spacetime engineering.” These are all discussed in other articles. Some techniques which have only recently come into use and for which applications in general relativity have not been much explored, such as Cheeger–Gromov compactness, are not discussed.
Minimal and Related Surfaces
Consider a hypersurface $N$ in Euclidean space $\mathbb{R}^n$ which is a graph $x^n = u(x^1, \ldots, x^{n-1})$ with respect to the function $u$. The area of $N$ is given by
$$A(N) = \int \sqrt{1 + |Du|^2}\, dx^1 \cdots dx^{n-1}.$$
$N$ is stationary with respect to $A$ if $u$ satisfies the equation
$$\sum_i D_i\left(\frac{D_i u}{\sqrt{1 + |Du|^2}}\right) = 0 \qquad [1]$$
A hypersurface $N$ defined as a graph of $u$ solving [1] minimizes area with respect to compactly supported deformations, and hence is called a minimal surface. For $n \le 7$, a solution to eqn [1] defined on all of $\mathbb{R}^{n-1}$ must be an affine function. This fact is known as a Bernstein principle. Equation [1], and more generally, the prescribed mean curvature equation which will be discussed below, is a quasilinear, uniformly elliptic second-order equation. The book by Gilbarg and Trudinger (1983) is an excellent general reference for such equations.
The theory of rectifiable currents, developed by Federer and Fleming, is a basic tool in the modern approach to the Plateau problem and related variational problems. A rectifiable current is a countable union of Lipschitz submanifolds, counted with integer multiplicity, and satisfying certain regularity conditions. Hausdorff measure gives a notion of area for these objects. One may therefore approach the study of minimal surfaces via rectifiable currents which are stationary with respect to variations of area. Suitable generalizations of familiar notions from smooth differential geometry such as tangent plane, normal vector, and extrinsic curvature can be introduced. The book by Federer (1969) is a classic treatise on the subject. Further information concerning minimal surfaces and related variational problems can be found in Lawson, Jr. (1980) and Simon (1997). Note, however, that unless otherwise stated, all fields and manifolds considered in this article are assumed to be smooth.
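The passage from the area functional to eqn [1] can be sketched as a standard first-variation computation. The following is an illustration rather than part of the article; the compactly supported test function $\eta$ and the parameter $\varepsilon$ are notation introduced here.

```latex
% First variation of the graph area functional.
% \eta: smooth, compactly supported test function (assumed notation).
\begin{align*}
A(u+\varepsilon\eta)
  &= \int \sqrt{1+|D(u+\varepsilon\eta)|^2}\;dx^1\cdots dx^{n-1},\\
\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} A(u+\varepsilon\eta)
  &= \int \sum_i \frac{D_i u\,D_i\eta}{\sqrt{1+|Du|^2}}\;dx^1\cdots dx^{n-1}\\
  &= -\int \eta\,\sum_i D_i\!\left(\frac{D_i u}{\sqrt{1+|Du|^2}}\right)dx^1\cdots dx^{n-1}.
\end{align*}
% The last step is integration by parts; the boundary term vanishes
% because \eta has compact support. Requiring the variation to vanish
% for every \eta gives eqn [1].
```

An affine function $u$ has constant $Du$, so the quantity inside the divergence is constant and eqn [1] holds trivially; the Bernstein principle says that in low dimensions these are the only entire solutions.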
For the Plateau problem in a Riemannian ambient space, we have the following existence and regularity result.
Theorem 1 (Existence of embedded solutions for Plateau problem). Let $M$ be a complete Riemannian manifold of dimension $n \le 7$ and let $\Gamma$ be a compact $(n-2)$-dimensional submanifold in $M$ which bounds. Then there is an $(n-1)$-dimensional area-minimizing hypersurface $N$ with $\Gamma$ as its boundary. $N$ is a smooth, embedded manifold in its interior.
If the dimension of the ambient space is $> 7$, solutions to the Plateau problem will in general have a singular set of dimension $n - 8$.
Let $N$ be an oriented hypersurface of a Riemannian manifold $M$ with covariant derivative $D$. Let $\nu$ be the unit normal of $N$ and define the second fundamental form and mean curvature of $N$ by $A_{ij} = \langle D_{e_i}\nu, e_j\rangle$ and $H = \operatorname{tr} A$. Define the action functional
$$E(N) = A(N) - \int_{M;N} H_0,$$
where $H_0$ is a function defined on $M$, and $\int_{M;N}$ denotes the integral over the volume bounded by $N$ in $M$. The problem of minimizing $E$ is a useful generalization of the minimization problem for $A$.
Theorem 2 (Existence of minimizers in homology). Let $M$ be a compact Riemannian manifold of dimension $\le 7$, and let $\alpha$ be an integral homology class on $M$ of codimension 1. Then there is a smooth minimizer for $E$ representing $[\alpha]$.
Again, in higher dimensions, the minimizers will in general have singularities. The general form of this result deals with elliptic functionals. For surfaces in 3-manifolds, the problem of minimizing area within homotopy classes has been studied. Results in this direction played a central role in the approach of Schoen and Yau to manifolds with non-negative scalar curvature.
If $M$ is not compact, it is in general necessary to use barriers to control the minimizers, or consider some version of the Plateau problem. Barriers can be used due to the strong maximum principle, which holds for the mean curvature operator since it is quasilinear elliptic. Consider two hypersurfaces $N_1$, $N_2$ which intersect at a point $p$ and assume that $N_1$ lies on one side of $N_2$ with the normal pointing towards $N_1$. If the mean curvatures $H_1$, $H_2$ of the hypersurfaces, defined with respect to consistently oriented normals, satisfy $H_1 \le \lambda \le H_2$ for some constant $\lambda$, then $N_1$ and $N_2$ coincide near $p$ and have mean curvatures equal to $\lambda$. This result requires only mild regularity conditions on the hypersurfaces. Generalizations hold also for the case of spacelike or null hypersurfaces in a Lorentzian ambient space; see Andersson et al. (1998) and Galloway (2000).
Let $\phi$ be a smooth compactly supported function on $N$. The variation $E' = \delta_{\phi\nu} E$ of $E$ under a deformation $\phi\nu$ is
$$E' = \int_N (H - H_0)\,\phi$$
Thus, $N$ is stationary with respect to $E$ if and only if $N$ solves the prescribed mean curvature equation $H(x) = H_0(x)$ for $x \in N$. Supposing that $N$ is stationary and $H_0$ is constant, the second variation $E'' = \delta_{\phi\nu} E'$ of $E$ is of the form
$$E'' = \int_N \phi\, J\phi$$
where $J$ is the second-variation operator, a second-order elliptic operator. A calculation using the Gauss equation and the second-variation equation shows
$$J = -\Delta_N - \tfrac{1}{2}\left[(\mathrm{Scal}_M - \mathrm{Scal}_N) + H^2 + |A|^2\right] \qquad [2]$$
where $\Delta_N$, $\mathrm{Scal}_M$, $\mathrm{Scal}_N$ denote the Laplace–Beltrami operator of $N$, and the scalar curvatures of $M$ and $N$, respectively. If $J$ is positive semidefinite, $N$ is called stable.
To set the context where we will apply the above, let $(M, g_{ij})$ be a connected, asymptotically Euclidean three-dimensional Riemannian manifold with covariant derivative $\nabla$, and let $K_{ij}$ be a symmetric tensor on $M$. Suppose $(M, g_{ij}, K_{ij})$ is imbedded isometrically as a spacelike hypersurface in a spacetime $(V, \gamma_{\alpha\beta})$ with $g_{ij}$, $K_{ij}$ the first and second fundamental forms induced on $M$ from $V$; in particular $K_{ij} = \langle D_{e_i} T, e_j\rangle$, where $T$ is the timelike normal of $M$ in the ambient spacetime $V$, and $D$ is the ambient covariant derivative. We will refer to $(M, g_{ij}, K_{ij})$ as a Cauchy data set for the Einstein equations. Although many of the results which will be discussed below generalize to the case of a nonzero cosmological constant $\Lambda$, we will discuss only the case $\Lambda = 0$ in this article. Let $G_{\alpha\beta} = \mathrm{Ric}_{V\,\alpha\beta} - \tfrac{1}{2}\mathrm{Scal}_V\,\gamma_{\alpha\beta}$ be the Einstein tensor of $V$, and let $\rho = G_{\alpha\beta} T^\alpha T^\beta$, $\mu_j = G_{j\alpha} T^\alpha$. Then the fields $(g_{ij}, K_{ij})$ satisfy the Einstein constraint equations
$$R + (\operatorname{tr} K)^2 - |K|^2 = 2\rho \qquad [3]$$
$$\nabla_j \operatorname{tr} K - \nabla^i K_{ij} = \mu_j \qquad [4]$$
where $R$ is the scalar curvature of $g_{ij}$. We assume that the dominant energy condition (DEC)
$$\rho \ge \Big(\sum_i \mu_i \mu^i\Big)^{1/2} \qquad [5]$$
holds. We will sometimes make use of the null energy condition (NEC), $G_{\alpha\beta} L^\alpha L^\beta \ge 0$ for null vectors $L$, and the strong energy condition (SEC), $\mathrm{Ric}_{V\,\alpha\beta} v^\alpha v^\beta \ge 0$ for causal vectors $v$. $M$ will be assumed to satisfy the fall-off conditions
$$g_{ij} = \left(1 + \frac{2m}{r}\right)\delta_{ij} + O(1/r^2) \qquad [6a]$$
$$K_{ij} = O(1/r^2) \qquad [6b]$$
as well as suitable conditions for the fall-off of derivatives of $g_{ij}$, $K_{ij}$. Here $m$ is the ADM (Arnowitt, Deser, Misner) mass of $(M, g_{ij}, K_{ij})$.
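The expression [2] for the second-variation operator can be checked against the more familiar form involving the Ricci curvature in the normal direction. The following is a sketch under two assumptions that the article does not write out: the standard second-variation formula $J = -\Delta_N - (\mathrm{Ric}_M(\nu,\nu) + |A|^2)$ for constant $H_0$, and the twice-traced Gauss equation for a hypersurface.

```latex
% Assumed inputs (standard, not spelled out in the article):
%   J = -\Delta_N - (Ric_M(\nu,\nu) + |A|^2)   for constant H_0,
%   twice-traced Gauss equation for N in M.
\begin{align*}
\mathrm{Scal}_N
  &= \mathrm{Scal}_M - 2\,\mathrm{Ric}_M(\nu,\nu) + H^2 - |A|^2
  && \text{(Gauss equation)}\\
\mathrm{Ric}_M(\nu,\nu)
  &= \tfrac12\left[(\mathrm{Scal}_M - \mathrm{Scal}_N) + H^2 - |A|^2\right]\\
\mathrm{Ric}_M(\nu,\nu) + |A|^2
  &= \tfrac12\left[(\mathrm{Scal}_M - \mathrm{Scal}_N) + H^2 + |A|^2\right]\\
J = -\Delta_N - \left(\mathrm{Ric}_M(\nu,\nu) + |A|^2\right)
  &= -\Delta_N - \tfrac12\left[(\mathrm{Scal}_M - \mathrm{Scal}_N) + H^2 + |A|^2\right]
\end{align*}
```

As a quick sanity check, for a round sphere of radius $r$ in $\mathbb{R}^3$ one has $\mathrm{Scal}_M = 0$, $\mathrm{Scal}_N = 2/r^2$, $H = 2/r$ and $|A|^2 = 2/r^2$, so the bracket in [2] equals $4/r^2$ and both forms give the zeroth-order term $-2/r^2$.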
Minimal Surfaces and Positive Mass
Perhaps the most important application of the theory of minimal surfaces in general relativity is in the Schoen–Yau proof of the positive-mass theorem, which states that $m \ge 0$, and $m = 0$ only if $(M, g, K)$ can be embedded as a hypersurface in Minkowski space. Consider an asymptotically Euclidean manifold $(M, g)$ with $g$ satisfying [6a] and with non-negative scalar curvature. By using Jang’s equation (see below), the general situation is reduced to the case of a time-symmetric data set, with $K = 0$. In this case, the DEC implies that $(M, g)$ has non-negative scalar curvature. Assuming $m < 0$ one may, after applying a conformal deformation, assume that $\mathrm{Scal}_M > 0$ in the complement of a compact set. Due to the asymptotic conditions, level sets for sufficiently large values of one of the coordinate functions, say $x^3$, can be used as barriers for minimal surfaces in $M$. By solving a sequence of Plateau problems with boundaries tending to infinity, a stable entire minimal surface $N$ homeomorphic to the plane is constructed. Stability implies, using [2],
$$\int_N \left(\tfrac{1}{2}\mathrm{Scal}_M - \kappa + \tfrac{1}{2}|A|^2\right) \le 0$$
where $\kappa = \tfrac{1}{2}\mathrm{Scal}_N$ is the Gauss curvature of $N$. Since by construction $\mathrm{Scal}_M \ge 0$, with $\mathrm{Scal}_M > 0$ outside a compact set, this gives $\int_N \kappa > 0$. Next, one uses the identity, related to the Cohn–Vossen inequality,
$$\int_N \kappa = 2\pi - \lim_{i\to\infty} \frac{L_i^2}{2A_i}$$
where $A_i$, $L_i$ are the area and circumference of a sequence of large discs. Estimates using the fact that $M$ is asymptotically Euclidean show that $\lim_i (L_i^2/2A_i) \ge 2\pi$, which gives a contradiction and shows that the minimal surface constructed cannot exist. It follows that $m \ge 0$.
It remains to show that the case $m = 0$ is rigid. To do this, one proves that for an asymptotically Euclidean metric with non-negative scalar curvature, which is positive near infinity, there is a conformally related metric with vanishing scalar curvature and strictly smaller mass. Applying this argument in case $m = 0$ gives a contradiction to the fact that $m \ge 0$. Therefore, $m = 0$ only if the scalar curvature vanishes identically. Suppose now that $(M, g)$ has vanishing scalar curvature but nonvanishing Ricci curvature $\mathrm{Ric}_M$. Then using a deformation of $g$ in the direction of $\mathrm{Ric}_M$, one constructs a metric close to $g$ with negative mass, which leads to a contradiction.
This technique generalizes to Cauchy surfaces of dimension $n \le 7$. The proof involves induction on dimension. For $n > 7$ minimal hypersurfaces are singular in general and this approach runs into problems. The Witten proof using spinor techniques does not suffer from this limitation but instead requires that $M$ be spin.
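The step from stability to the integral inequality used in the Schoen–Yau argument can be sketched as follows. This is an illustration, not the article's own derivation: it assumes the form [2] of $J$ with $H = 0$, and it assumes the existence of compactly supported cut-off functions $\phi_k \to 1$ on $N$ whose Dirichlet energies tend to zero, which is where the asymptotic structure of $N$ enters.

```latex
% Stability of a minimal surface N (H = 0), with \kappa = Scal_N / 2.
% \phi, \phi_k: compactly supported test functions (assumed notation).
\begin{align*}
0 \le \int_N \phi\,J\phi
  &= \int_N |\nabla\phi|^2
     - \int_N \left(\tfrac12\mathrm{Scal}_M - \kappa + \tfrac12|A|^2\right)\phi^2,\\
\int_N \left(\tfrac12\mathrm{Scal}_M - \kappa + \tfrac12|A|^2\right)\phi_k^2
  &\le \int_N |\nabla\phi_k|^2 \;\longrightarrow\; 0
  \qquad (\phi_k \to 1),\\
\int_N \left(\tfrac12\mathrm{Scal}_M - \kappa + \tfrac12|A|^2\right)
  &\le 0 .
\end{align*}
% Since Scal_M >= 0 and |A|^2 >= 0, the last line forces the total
% Gauss curvature \int_N \kappa to be non-negative, and positive if
% Scal_M > 0 somewhere on N.
```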
Marginally Trapped Surfaces
Consider a Cauchy data set $(M, g_{ij}, K_{ij})$ as above and let $N$ be a compact surface in $M$ with normal $\nu$, second fundamental form $A$ and mean curvature $H$. Then considering $N$ as a surface in an ambient Lorentzian space $V$ containing $M$, $N$ has two null normal fields which after a rescaling can be taken to be $L_\pm = T \pm \nu$. Here, $T$ is the future-directed timelike unit normal of $M$ in $V$. The null mean curvatures (or null expansions) $\theta_\pm$ corresponding to $L_\pm$ can be defined in terms of the variation $\delta_{L_\pm}$ of the area element $\mu_N$ of $N$ as $\delta_{L_\pm}\mu_N = \theta_\pm \mu_N$, or
$$\theta_\pm = \operatorname{tr}_N K \pm H$$
where $\operatorname{tr}_N K$ denotes the trace of the projection of $K_{ij}$ to $N$. Suppose $L_+$ is the outgoing null normal. $N$ is called outer trapped (marginally trapped, untrapped) if $\theta_+ < 0$ ($\theta_+ = 0$, $\theta_+ > 0$). An asymptotically flat spacetime which contains a trapped surface with $\theta_- < 0$, $\theta_+ < 0$ is causally incomplete. In the following we will for simplicity drop the word outer from our terminology.
Consider a Cauchy surface $M$. The boundary of the region in $M$ containing trapped surfaces is, if it is sufficiently smooth, a marginally trapped surface. The equation $\theta_+ = 0$ is analogous to the prescribed mean curvature equation; in particular it is a quasilinear elliptic equation of second order. Marginally trapped surfaces are not variational in the same sense as minimal surfaces. Nevertheless, they are stationary with respect to variations of area within the outgoing light cone. The second variation of area along the outgoing null cone is given, in view of the Raychaudhuri equation, by
$$\delta_{\phi L_+}\theta_+ = -\left(G_{++} + |\sigma_+|^2\right)\phi \qquad [7]$$
for a function $\phi$ on $N$. Here $G_{++} = G_{\alpha\beta} L_+^\alpha L_+^\beta$, and $\sigma_+$ denotes the shear of $N$ with respect to $L_+$, that is, the tracefree part of the null second fundamental form with respect to $L_+$. Equation [7] shows that the stability operator in the direction $L_+$ is not elliptic.
In the case of time-symmetric data, $K_{ij} = 0$, the DEC implies $\mathrm{Scal}_M \ge 0$ and marginally trapped surfaces are simply minimal surfaces. A stable compact minimal 2-surface $N$ in a 3-manifold $M$ with non-negative scalar curvature must satisfy
$$2\pi\chi(N) = \int_N \kappa \ge \int_N \tfrac{1}{2}\left(\mathrm{Scal}_M + |A|^2\right) \ge 0$$
and hence by the Gauss–Bonnet theorem, $N$ is diffeomorphic to a sphere or a torus. In case $N$ is a stable minimal torus, the induced geometry is flat and the ambient curvature vanishes at $N$. If, in addition, $N$ minimizes, then $M$ is flat.
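As an illustration of the time-symmetric case, consider the standard time-symmetric slice of the Schwarzschild spacetime in isotropic coordinates. This example is not taken from the article; the conformally flat metric below is the usual textbook form, and it agrees with the fall-off condition [6a] to the stated order. The conformal factor $\psi$ is notation introduced here.

```latex
% Time-symmetric Schwarzschild data in isotropic coordinates.
% \psi := 1 + m/(2r) (assumed notation); S_r is the coordinate sphere.
\begin{align*}
g_{ij} &= \psi^4\,\delta_{ij}
        = \left(1+\frac{2m}{r}\right)\delta_{ij} + O(1/r^2),
        \qquad K_{ij} = 0,\\
A(r) &= 4\pi r^2\,\psi^4
  && \text{(area of } S_r),\\
H\big|_{S_r} &= \psi^{-2}\,\frac{d}{dr}\log A(r)
  = \psi^{-2}\,\frac{2(2r-m)}{r(2r+m)},\\
\theta_\pm\big|_{S_r} &= \pm H,
  \qquad \theta_+ = 0 \iff r = \frac{m}{2},
  \qquad A(m/2) = 16\pi m^2 .
\end{align*}
% The unit normal is \nu = \psi^{-2}\partial_r, and by spherical symmetry
% H is the logarithmic derivative of the area along \nu.
```

The sphere $r = m/2$ is therefore a minimal surface and, since $K_{ij} = 0$, a marginally trapped surface, while the spheres with $r > m/2$ have $\theta_+ > 0$ and are untrapped.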
For a compact marginally trapped surface $N$ in $M$, analogous results can be proved by studying the stability operator defined with respect to the direction $\nu$. Let $J$ be the operator defined in terms of a variation of $\theta_+$ by $J\phi = \delta_{\phi\nu}\theta_+$. Then
$$J\phi = -\Delta_N\phi + 2 s^A D_A\phi + \left(\tfrac{1}{2}\mathrm{Scal}_N - s_A s^A + D_A s^A - \tfrac{1}{2}|\sigma_+|^2 - G_{+-}\right)\phi$$
Here, $s_A = -\tfrac{1}{2}\langle L_-, D_A L_+\rangle$ and $G_{+-}$ is the Einstein tensor evaluated on $L_+$, $L_-$. We may call $N$ stable if the real part of the spectrum of $J$ is non-negative. A sufficient condition for $N$ to be stable is that $N$ is locally outermost. This can be formulated, for example, by requiring that a neighborhood of $N$ in $M$ contains no trapped surfaces exterior to $N$. In this case, assuming that the DEC holds, $N$ is a sphere or a torus, and if the real part of the spectrum of $J$ is positive then $N$ is a sphere. If $N$ is a torus, then the ambient curvature and shear vanish at $N$, $s_A$ is a gradient, and $N$ is flat. One expects that in addition, global rigidity should hold, in analogy with the minimal surface case. This is an open problem.
If $N$ satisfies the stronger condition of strict stability, which corresponds to the spectrum of $J$ having positive real part, then $N$ is in the interior of a hypersurface $\mathcal{H}$ of the ambient spacetime, with the property that it is foliated by marginally trapped surfaces (Andersson et al. 2005). If the NEC holds and $N$ has nonvanishing shear, then $\mathcal{H}$ is spacelike at $N$. A hypersurface $\mathcal{H}$ with these properties is known as a dynamical horizon.
Jang’s Equation
Consider a Cauchy data set $(M, g_{ij}, K_{ij})$. Extend $K_{ij}$ to a tensor field on $M \times \mathbb{R}$, constant in the vertical direction. Then the equation for a graph
$$N = \{(x, t) \in M \times \mathbb{R} :\ t = f(x)\}$$
such that $N$ has mean curvature equal to the trace of the projection of $K_{ij}$ to $N$ with respect to the induced metric on $N$, is given by
$$\sum_{i,j}\left(g^{ij} - \frac{\nabla^i f\,\nabla^j f}{1 + |\nabla f|^2}\right)\left(\frac{\nabla_i\nabla_j f}{(1 + |\nabla f|^2)^{1/2}} - K_{ij}\right) = 0 \qquad [8]$$
an equation closely related to the equation $\theta_+ = 0$. Equation [8] was introduced by P S Jang (Jang 1978) as part of an attempt to generalize the inverse mean curvature flow method of Geroch from time-symmetric to general Cauchy data. Existence and regularity for Jang’s equation were proved by Schoen and Yau (1981) and used to generalize their proof of the positive-mass theorem from the case of maximal slices to the general case.
The solution to Jang’s equation is constructed as the limit of the solutions to a sequence of regularized problems. The limit consists of a collection $N$ of submanifolds of $M \times \mathbb{R}$. In particular, the component near infinity is a graph and has the same mass as $M$. $N$ may contain vertical components which project onto marginally trapped surfaces in $M$, and in fact these constitute the only possibilities for blow-up of the sequence of graphs used to construct $N$. If the DEC is valid, the metric on $N$ has non-negative scalar curvature in the weak sense that
$$\int_N \left(\mathrm{Scal}_N\,\phi^2 + 2|\nabla\phi|^2\right) \ge 0$$
for smooth compactly supported functions $\phi$. If the DEC holds strictly, the strict inequality holds and in this case the metric on $N$ is conformal to a metric with vanishing scalar curvature.
Jang’s equation can be applied to prove existence of marginally trapped surfaces, given barriers. Let $(M, g_{ij}, K_{ij})$ be a Cauchy data set containing two compact surfaces $N_1$, $N_2$ which together bound a compact region $M_0$ in $M$. Suppose that $\theta_+ < 0$ on $N_1$ and $\theta_+ > 0$ on $N_2$. Schoen recently proved the following result.
Theorem 3 (Existence of marginally trapped surfaces). Let $M_0$, $N_1$, $N_2$ be as above. Then there is a finite collection of compact, marginally trapped surfaces $\{\Sigma_a\}$ contained in the interior of $M_0$, such that $\cup_a \Sigma_a$ is homologous to $N_1$. If the DEC holds, then $\{\Sigma_a\}$ is a collection of spheres and tori.
The proof proceeds by solving a sequence of Dirichlet boundary-value problems for Jang’s equation, with boundary values on $N_1$ and $N_2$ tending to infinity with opposite signs. The assumption on $\theta_+$ is used to show the existence of barriers for Jang’s equation. Let $f_k$ be the sequence of solutions to the Dirichlet problems. Jang’s equation is invariant under renormalization $f_k \to f_k + c_k$ for some sequence $c_k$ of real numbers. A Harnack inequality for the gradient of the solutions to Jang’s equation is used to show that the sequence of solutions $f_k$, possibly after a renormalization, has a subsequence converging to a vertical submanifold of $M_0 \times \mathbb{R}$, which projects to a collection $\{\Sigma_a\}$ of marginally trapped surfaces. By construction, the zero sets of the $f_k$ are homologous to $N_1$ and $N_2$. The estimates on the sequence $\{f_k\}$ show that this holds also in the limit $k \to \infty$. The statement about the topology of the $\Sigma_a$ follows by showing, using the above-mentioned inequality for $\mathrm{Scal}_N$, that if the DEC holds, the total Gauss curvature of each surface $\Sigma_a$ is non-negative.
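As a consistency check, Jang's equation reduces to the minimal surface equation for a graph over $(M, g)$ when $K_{ij} = 0$. The sketch below assumes the standard form of Jang's equation, $\big(g^{ij} - \nabla^i f\,\nabla^j f/(1+|\nabla f|^2)\big)\big(\nabla_i\nabla_j f/(1+|\nabla f|^2)^{1/2} - K_{ij}\big) = 0$; the abbreviation $W$ is notation introduced here and does not appear in the article.

```latex
% W := \sqrt{1 + |\nabla f|^2} (assumed notation).
\begin{align*}
\nabla_i W &= \frac{\nabla^j f\,\nabla_i\nabla_j f}{W},\\
\nabla_i\!\left(\frac{\nabla^i f}{W}\right)
  &= \frac{\Delta f}{W}
     - \frac{\nabla^i f\,\nabla^j f\,\nabla_i\nabla_j f}{W^3}
   = \left(g^{ij} - \frac{\nabla^i f\,\nabla^j f}{W^2}\right)
     \frac{\nabla_i\nabla_j f}{W}.
\end{align*}
% Hence, for K_{ij} = 0, Jang's equation is equivalent to
%   \nabla_i ( \nabla^i f / W ) = 0,
% the Riemannian analogue of the minimal surface equation [1].
```

In particular, constant functions $f$ solve the equation for time-symmetric data, which is consistent with the graph $N$ then being a copy of $M$ with the same mass.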
Center of Mass
Since by the positive-mass theorem $m > 0$ unless the ambient spacetime is flat, it makes sense to consider the problem of finding an appropriate notion of center of mass. This problem was solved by Huisken and Yau, who showed that under the asymptotic conditions [6] the isoperimetric problem has a unique solution if one considers sufficiently large spheres.
Theorem 4 (Huisken and Yau 1996). There is an $H_0 > 0$ and a compact region $B_{H_0}$ such that for each $H \in (0, H_0)$ there is a unique constant mean curvature sphere $S_H$ with mean curvature $H$ contained in $M \setminus B_{H_0}$. The spheres form a foliation.
The proof involves a study of the evolution equation
$$\frac{dx}{ds} = (\bar{H} - H)\,\nu \qquad [9]$$
where $\bar{H}$ is the average mean curvature. This is the gradient flow for the isoperimetric problem of minimizing area keeping the enclosed volume constant. The solutions in Euclidean space are standard spheres. Equation [9] defines a parabolic system; in particular we have
$$\frac{d}{ds}H = \Delta H + \left(\mathrm{Ric}(\nu, \nu) + |A|^2\right)(H - \bar{H})$$
It follows from the fall-off conditions [6] that the spheres of the foliation constructed in Theorem 4 are untrapped surfaces. They can therefore be used as outer barriers in the existence result for marginally trapped surfaces (Theorem 3).
The mean curvature flow for a spatial hypersurface in a Lorentz manifold is also parabolic. This flow has been applied to construct constant mean curvature Cauchy hypersurfaces in spacetimes.

Maximal and Related Surfaces
Let $N$ be the hypersurface $x_0 = u(x_1, \ldots, x_n)$ in Minkowski space $\mathbb{R}^{1+n}$ with line element $-dx_0^2 + dx_1^2 + \cdots + dx_n^2$. Assume $|\nabla u| < 1$ so that $N$ is spacelike. Then $N$ is stationary with respect to variations of area if $u$ solves the equation
$$\sum_i \nabla_i\left(\frac{\nabla_i u}{\sqrt{1 - |\nabla u|^2}}\right) = 0 \qquad [10]$$
$N$ maximizes area with respect to compactly supported variations, and hence is called a maximal surface. As in the case of the minimal surface equation, eqn [10], and more generally the Lorentzian prescribed mean curvature equation, is quasilinear elliptic, but it is not uniformly elliptic, which makes the regularity theory more subtle.
A Bernstein principle analogous to the one for the minimal surface equation holds for the maximal surface equation [10]. Suppose that $u$ is a solution to [10] which is defined on all of $\mathbb{R}^n$. Then $u$ is an affine function (Cheng and Yau 1976). An important tool used in the proof is a Bochner-type identity, originally due to Calabi, for the norm of the second fundamental form. For a hypersurface in a flat ambient space, the Codazzi equation states $\nabla_i A_{jk} - \nabla_j A_{ik} = 0$. This gives the identity
$$\Delta A_{ij} = \nabla_i\nabla_j H + A_{km} R^{m}{}_{i}{}^{k}{}_{j} + A_{mi}\,\mathrm{Ric}^{m}{}_{j} \qquad [11]$$
The curvature terms can be rewritten in terms of $A_{ij}$ if the ambient space is flat. Using [11] to compute $\Delta|A|^2$ gives an expression which is quadratic in $\nabla A$, and fourth order in $|A|$, and which allows one to perform maximum principle estimates on $|A|$. Generalizations of this technique for hypersurfaces in general ambient spaces play an important role in the proof of regularity of minimal surfaces, and in the proof of existence for Jang’s equation as well as in the analysis of the mean curvature flow used to prove existence of round spheres. The generalization of eqn [11] is known as a Simons identity. For the case of maximal hypersurfaces of Minkowski space, it follows from further maximum principle estimates that a maximal hypersurface of Minkowski space is convex; in particular, it has nonpositive Ricci curvature. Generalizations of this technique allow one to analyze entire constant mean curvature hypersurfaces of Minkowski space.
Consider a globally hyperbolic Lorentzian manifold $(V, \gamma)$. A $C^0$ hypersurface is said to be weakly spacelike if timelike curves intersect it in at most one point. Call a codimension-2 submanifold $\Gamma \subset V$ a weakly spacelike boundary if it bounds a weakly spacelike hypersurface $N_0$.
Theorem 5 (Existence for Plateau problem for maximal surfaces (Bartnik 1988)). Let $V$ be a globally hyperbolic spacetime and assume that the causal structure of $V$ is such that the domain of dependence of any compact domain in $V$ is compact. Given a weakly spacelike boundary $\Gamma$ in $V$, there is a weakly spacelike maximal hypersurface $N$ with $\Gamma$ as its boundary. $N$ is smooth except possibly on null geodesics connecting points of $\Gamma$.
Here, maximal hypersurface is understood in a weak sense, referring to stationarity with respect to variations. Due to the nonuniform ellipticity of the maximal surface equation, the interior regularity which holds for minimal surfaces fails to hold in general for the maximal surface equation.
A time-oriented spacetime is said to have a crushing singularity to the past (future) if there is a sequence $\Sigma_n$ of Cauchy surfaces so that the mean curvature function $H_n$ of $\Sigma_n$ diverges uniformly to $-\infty$ ($\infty$).
Theorem 6 (Gerhardt 1983). Suppose that $(V, \gamma)$ is globally hyperbolic with compact Cauchy surfaces and satisfies the SEC. Then if $(V, \gamma)$ has crushing singularities to the past and future it is globally foliated by constant mean curvature hypersurfaces. The mean curvature of these Cauchy surfaces is a global time function.
The proof involves an application of results from geometric measure theory to an action $E$ of the form discussed earlier. A barrier argument is used to control the maximizers. Bartnik (1984, theorem 4.1) gave a direct proof of existence of a constant mean curvature (CMC) hypersurface, given barriers. If the spacetime $(V, \gamma)$ is symmetric, so that a compact Lie group acts on $V$ by isometries, then CMC hypersurfaces in $V$ inherit the symmetry. Theorem 6 gives a condition under which a spacetime is globally foliated by CMC hypersurfaces. In general, if the SEC holds in a spatially compact spacetime, then for each $\tau \ne 0$, there is at most one constant mean curvature Cauchy surface with mean curvature $\tau$. In case $V$ is vacuum, $\mathrm{Ric}_V = 0$, and $(3+1)$-dimensional, then each point $x \in V$ is on at most one hypersurface of constant mean curvature unless $V$ is flat and splits as a metric product.
There are vacuum spacetimes with compact Cauchy surface which contain no CMC hypersurface (Chruściel et al. 2004). The proof is carried out by constructing Cauchy data, using a gluing argument, on the connected sum of two tori, such that the resulting Cauchy data set $(M, g_{ij}, K_{ij})$ has an involution which reverses the sign of $K_{ij}$. The involution extends to the maximal vacuum development $V$ of the Cauchy data set.
Existence of a CMC surface in V gives, in view of the involution, barriers which allow one to construct a maximal Cauchy surface homeomorphic to M. This leads to a contradiction, since the connected sum of two tori does not carry a metric of positive scalar curvature, and therefore, in view of the constraint equations, cannot be imbedded as a maximal Cauchy surface in a vacuum spacetime. The maximal vacuum development V is causally geodesically incomplete. However, in view of the existence proof for CMC Cauchy surfaces (cf. Theorem 6), these spacetimes cannot have a crushing singularity. It would be interesting to settle the open question whether there are stable examples of this type. In the case of a spacetime V which has an expanding end, one does not expect in general that
the spacetime is globally foliated by CMC hypersurfaces even if $V$ is vacuum and contains a CMC Cauchy surface. This expectation is based on the phenomenon known as the collapse of the lapse; for example, the Schwarzschild spacetime does not contain a global foliation by maximal Cauchy surfaces (Beig and Ó Murchadha 1998). However, no counterexample is known in the spatially compact case. In spite of these caveats, many examples of spacetimes with global CMC foliations are known, and the CMC condition, or more generally prescribed mean curvature, is an important gauge condition for general relativity. Some examples of situations where global constant or prescribed mean curvature foliations are known to exist in vacuum or with some types of matter are spatially homogeneous spacetimes, and spacetimes with two commuting Killing fields. Small data global existence for the Einstein equations with CMC time gauge has been proved for spacetimes with one Killing field, with Cauchy surface a circle bundle over a surface of genus $> 1$, by Choquet-Bruhat and Moncrief. Further, for $(3+1)$-dimensional spacetimes with Cauchy surface admitting a hyperbolic metric, small data global existence in the expanding direction has been proved by Andersson and Moncrief. See Andersson (2004) and Rendall (2002) for surveys on the Cauchy problem in general relativity.
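A simple explicit family of constant mean curvature hypersurfaces is given by the hyperboloids in Minkowski space. The computation below is an illustration that is not part of the article; it assumes that the Lorentzian prescribed mean curvature equation is obtained from the maximal surface equation for a graph $x_0 = u(x_1,\ldots,x_n)$ with $|\nabla u| < 1$ by replacing the zero on the right-hand side with the mean curvature, up to a sign convention. The parameter $a$ is notation introduced here.

```latex
% Hyperboloid x_0 = u(x) in Minkowski space R^{1+n}; a > 0 is a
% hypothetical parameter labelling the leaves.
\begin{align*}
u(x) &= \sqrt{a^2 + |x|^2}, \qquad
\nabla u = \frac{x}{\sqrt{a^2 + |x|^2}}, \qquad |\nabla u| < 1,\\
\sqrt{1 - |\nabla u|^2} &= \frac{a}{\sqrt{a^2 + |x|^2}}, \qquad
\frac{\nabla u}{\sqrt{1 - |\nabla u|^2}} = \frac{x}{a},\\
\sum_i \nabla_i\!\left(\frac{\nabla_i u}{\sqrt{1 - |\nabla u|^2}}\right)
  &= \frac{n}{a}.
\end{align*}
% The left-hand side is constant but nonzero, so each hyperboloid has
% constant mean curvature and is not maximal.
```

As $a$ ranges over $(0, \infty)$ the hyperboloids foliate the interior of the future light cone of the origin, with mean curvature $n/a$ taking every positive value exactly once, which gives a concrete picture of mean curvature serving as a time function. Affine functions $u$ with $|\nabla u| < 1$ give the right-hand side zero, in line with the Bernstein principle for entire maximal graphs.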
Null Hypersurfaces
Consider an asymptotically flat spacetime containing a black hole, that is, a region $B$ such that future causal curves starting in $B$ cannot reach observers at infinity. The boundary of the trapped region is called the event horizon $\mathcal{H}$. This is a null hypersurface, which under reasonable conditions on causality has null generators which are complete to the future. Due to the completeness, assuming that $\mathcal{H}$ is smooth, one can use the Raychaudhuri equation [7] to show that the null expansion $\theta_+$ of a spatial cross section of $\mathcal{H}$ must satisfy $\theta_+ \ge 0$, and hence that the area of cross sections of $\mathcal{H}$ grows monotonically to the future. A related statement is that null generators can enter $\mathcal{H}$ but may not leave it. This was first proved by Hawking for the case of smooth horizons, using essentially the Raychaudhuri equation. In general $\mathcal{H}$ can fail to be smooth. However, from the definition of $\mathcal{H}$ as the boundary of the trapped region it follows that it has support hypersurfaces, which are past light cones. This property allows one to prove that $\mathcal{H}$ is Lipschitz and hence smooth almost everywhere. At smooth points of $\mathcal{H}$, the calculations in the proof of Hawking apply, and the monotonicity of the area of cross sections follows.
Theorem 7 (Area theorem (Chruściel et al. 2001)). Let $\mathcal{H}$ be a black hole event horizon in a smooth spacetime $(M, g)$. Suppose that the generators are future complete and the NEC holds on $\mathcal{H}$. Let $S_a$, $a = 1, 2$, be two spacelike cross sections of $\mathcal{H}$ and suppose that $S_2$ is to the future of $S_1$. Then $A(S_2) \ge A(S_1)$.
The eikonal equation $\nabla^\alpha u\,\nabla_\alpha u = 0$ plays a central role in geometric optics. Level sets of a solution $u$ are null hypersurfaces which correspond to wave fronts. Much of the recent progress on rough solutions to the Cauchy problem for quasilinear wave equations is based on understanding the influence of the geometry of these wave fronts on the evolution of high-frequency modes in the background spacetime. In this analysis many objects familiar from general relativity, such as the structure equations for null hypersurfaces, the Raychaudhuri equation, and the Bianchi identities play an important role, together with novel techniques of geometric analysis used to control the geometry of cross sections of the wave fronts and to estimate the connection coefficients in a rough spacetime geometry. These techniques show great promise and can be expected to have a significant impact on our understanding of the Einstein equations and general relativity.
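The simplest solution of the eikonal equation illustrates how wave fronts, null expansions and the Raychaudhuri equation fit together. The example below is not from the article; it uses $(3+1)$-dimensional Minkowski space with coordinates $(t, x)$, $r = |x|$, the null normals $L_\pm = T \pm \nu$ and the null expansions $\theta_\pm = \operatorname{tr}_N K \pm H$ for a surface in a slice $t = \text{const}$, and it assumes the standard vacuum Raychaudhuri equation along an affinely parametrized null generator.

```latex
% Outgoing light cones in Minkowski space, metric -dt^2 + |dx|^2.
\begin{align*}
u &= t - r: \qquad
\nabla^\alpha u\,\nabla_\alpha u
  = -(\partial_t u)^2 + |\partial_x u|^2 = -1 + 1 = 0
  \qquad (r > 0),\\
S_r &= \{t = r + c\} \cap \{t = \text{const}\}: \qquad
K_{ij} = 0, \quad H = \frac{2}{r}, \quad
\theta_\pm = \pm\frac{2}{r}, \quad A(S_r) = 4\pi r^2,\\
\frac{d\theta_+}{ds} &= \frac{d}{dr}\,\frac{2}{r} = -\frac{2}{r^2}
  = -\tfrac12\theta_+^2
  \qquad (L_+ = \partial_t + \partial_r,\ \sigma_+ = 0,\ G_{++} = 0).
\end{align*}
% The level sets u = c are outgoing light cones; their cross sections
% are untrapped (\theta_+ > 0) and their area grows to the future.
```

These cones are wave fronts rather than event horizons, but they show the mechanism behind the area theorem: a positive outgoing expansion means increasing cross-sectional area, and the Raychaudhuri equation controls how $\theta_+$ can change along the generators.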
Acknowledgments
The author is grateful to, among others, Greg Galloway, Gerhard Huisken, Jim Isenberg, Piotr Chruściel, and Dan Pollack for helpful discussions concerning the topics covered in this article. The work of the author has been supported in part by the NSF under contract number DMS 0407732 with the University of Miami.
See also: Computational Methods in General Relativity: The Theory; Einstein Equations: Initial Value Formulation; Einstein’s Equations with Matter; Geometric Flows and the Penrose Inequality; Hamiltonian Reduction of Einstein’s Equations; Holomorphic Dynamics; Lorentzian Geometry; Minimal Submanifolds; Mirror Symmetry: A Geometric Survey; Spacetime Topology, Causal Structure and Singularities; Stability of Minkowski Space.
Further Reading
Andersson L (2004) The global existence problem in general relativity. In: Chruściel P and Friedrich H (eds.) The Einstein Equations and the Large Scale Behavior of Gravitational Fields, pp. 71–120. Basel: Birkhäuser.
Andersson L, Galloway GJ, and Howard R (1998) A strong maximum principle for weak solutions of quasi-linear elliptic equations with applications to Lorentzian and Riemannian geometry. Communications on Pure and Applied Mathematics 51(6): 581–624.
Andersson L, Mars M, and Simon W (2005) Local existence of dynamical and trapping horizons. Physical Review Letters 95: 111102.
Bartnik R (1984) Existence of maximal surfaces in asymptotically flat spacetimes. Communications in Mathematical Physics 94(2): 155–175.
Bartnik R (1988) Regularity of variational maximal surfaces. Acta Mathematica 161(3–4): 145–181.
Beig R and Ó Murchadha N (1998) Late time behavior of the maximal slicing of the Schwarzschild black hole. Physical Review D 57(8): 4728–4737.
Cai M and Galloway GJ (2000) Rigidity of area minimizing tori in 3-manifolds of nonnegative scalar curvature. Communications in Analysis and Geometry 8(3): 565–573.
Cheng SY and Yau ST (1976) Maximal space-like hypersurfaces in the Lorentz–Minkowski spaces. Annals of Mathematics (2) 104(3): 407–419.
Chruściel PT, Delay E, Galloway GJ, and Howard R (2001) Regularity of horizons and the area theorem. Annales Henri Poincaré 2(1): 109–178.
Chruściel PT, Isenberg J, and Pollack D (2004) Gluing initial data sets for general relativity. Physical Review Letters 93: 081101.
Federer H (1969) Geometric Measure Theory. Die Grundlehren der mathematischen Wissenschaften, Band 153. New York: Springer.
Galloway GJ (2000) Maximum principles for null hypersurfaces and null splitting theorems. Annales Henri Poincaré 1(3): 543–567.
Gerhardt C (1983) H-surfaces in Lorentzian manifolds. Communications in Mathematical Physics 89(4): 523–553.
Gilbarg D and Trudinger NS (1983) Elliptic Partial Differential Equations of Second Order. Grundlehren der Mathematischen Wissenschaften (Fundamental Principles of Mathematical Sciences), 2nd edn., vol. 224. Berlin: Springer.
Huisken G and Yau ST (1996) Definition of center of mass for isolated physical systems and unique foliations by stable spheres with constant mean curvature. Inventiones Mathematicae 124(1–3): 281–311.
Jang PS (1978) On the positivity of energy in general relativity. Journal of Mathematical Physics 19(5): 1152–1155.
Lawson HB Jr. (1980) Lectures on Minimal Submanifolds, vol. 1. Mathematics Lecture Series 9 (2nd edn). Wilmington, DE: Publish or Perish Inc.
Rendall AD (2002) Theorems on existence and global dynamics for the Einstein equations. Living Reviews in Relativity 5: 62 pp. (electronic).
Schoen R and Yau ST (1981) Proof of the positive mass theorem II. Communications in Mathematical Physics 79(2): 231–260.
Simon L (1997) The minimal surface equation. In: Geometry, V, Encyclopaedia of Mathematical Sciences, vol. 90, pp. 239–272. Berlin: Springer.
Geometric Flows and the Penrose Inequality
H Bray, Duke University, Durham, NC, USA
© 2006 H Bray. Published by Elsevier Ltd. All rights reserved. This article was originally published as “Black holes, geometric flows, and the Penrose inequality in general relativity,” Notices of the American Mathematical Society 49 (2002), 1372–1381.
Introduction

In a paper, R Penrose (1973) made a physical argument that the total mass of a spacetime which contains black holes with event horizons of total area $A$ should be at least $\sqrt{A/16\pi}$. An important special case of this physical statement translates into a very beautiful mathematical inequality in Riemannian geometry known as the Riemannian Penrose inequality. The Riemannian Penrose inequality was first proved by Huisken and Ilmanen (1997) for a single black hole and then by the author in 1999 for any number of black holes. The two approaches use two different geometric flow techniques. The most general version of the Penrose inequality is still open.

A natural interpretation of the Penrose inequality is that the mass contributed by a collection of black holes is (at least) $\sqrt{A/16\pi}$. More generally, the question “How much matter is in a given region of a spacetime?” is still very much an open problem (Christodoulou and Yau 1988). In this paper, we will discuss some of the qualitative aspects of mass in general relativity, look at examples which are informative, and describe the two very geometric proofs of the Riemannian Penrose inequality.

Total Mass in General Relativity
Two notions of mass which are well understood in general relativity are local energy density at a point and the total mass of an asymptotically flat spacetime. However, defining the mass of a region larger than a point but smaller than the entire universe is not very well understood at all.

Suppose $(M^3, g)$ is a Riemannian 3-manifold isometrically embedded in a (3+1)-dimensional Lorentzian spacetime $N^4$. Suppose that $M^3$ has zero second fundamental form in the spacetime. This is a simplifying assumption which allows us to think of $(M^3, g)$ as a “t = 0” slice of the spacetime. (Recall that the second fundamental form is a measure of how much $M^3$ curves inside $N^4$. $M^3$ is also sometimes called “totally geodesic” since geodesics of $N^4$ which are tangent to $M^3$ at a point stay inside $M^3$ forever.) The Penrose inequality (which allows for $M^3$ to have general second fundamental form) is known as the Riemannian Penrose inequality when the second fundamental form is set to zero.

We also want to only consider $(M^3, g)$ that are asymptotically flat at infinity, which means that for some compact set $K$, the “end” $M^3 \setminus K$ is diffeomorphic to $\mathbb{R}^3 \setminus B_1(0)$, where the metric $g$ is asymptotically approaching (with certain decay conditions) the standard flat metric $\delta_{ij}$ on $\mathbb{R}^3$ at infinity. The simplest example of an asymptotically flat manifold is $(\mathbb{R}^3, \delta_{ij})$ itself. Other good examples are the conformal metrics $(\mathbb{R}^3, u(x)^4 \delta_{ij})$, where $u(x)$ approaches a constant sufficiently rapidly at infinity. (Also, sometimes it is convenient to allow $(M^3, g)$ to have multiple asymptotically flat ends, in which case each connected component of $M^3 \setminus K$ must have the property described above.) A qualitative picture of an asymptotically flat 3-manifold is shown in Figure 1.

Figure 1 A qualitative picture of an asymptotically flat 3-manifold.

The purpose of these assumptions on the asymptotic behavior of $(M^3, g)$ at infinity is that they imply the existence of the limit

$$m = \frac{1}{16\pi} \lim_{\sigma \to \infty} \int_{S_\sigma} \sum_{i,j} \left( g_{ij,i}\,\nu_j - g_{ii,j}\,\nu_j \right) d\mu$$

where $S_\sigma$ is the coordinate sphere of radius $\sigma$, $\nu$ is the unit normal to $S_\sigma$, and $d\mu$ is the area element of $S_\sigma$ in the coordinate chart. The quantity $m$ is called the “total mass” (or ADM mass) of $(M^3, g)$ and does not depend on the choice of asymptotically flat coordinate chart.

The above equation is where many people would stop reading an article like this. But before you do, we will promise not to use this definition of the total mass in this paper. In fact, it turns out that total mass can be quite well understood with an example. Going back to the example $(\mathbb{R}^3, u(x)^4 \delta_{ij})$, if we suppose that $u(x) > 0$ has the asymptotics at infinity

$$u(x) = a + b/|x| + O(1/|x|^2) \qquad [1]$$

(and derivatives of the $O(1/|x|^2)$ term are $O(1/|x|^3)$), then the total mass of $(M^3, g)$ is

$$m = 2ab \qquad [2]$$

Furthermore, suppose $(M^3, g)$ is any metric whose “end” is isometric to $(\mathbb{R}^3 \setminus K, u(x)^4 \delta_{ij})$, where $u(x)$ is harmonic in the coordinate chart of the end $(\mathbb{R}^3 \setminus K, \delta_{ij})$ and goes to a constant at infinity. Then expanding $u(x)$ in terms of spherical harmonics demonstrates that $u(x)$ satisfies condition [1]. We will call these Riemannian manifolds $(M^3, g)$ “harmonically flat at infinity,” and we note that the total mass of these manifolds is also given by eqn [2].

A very nice lemma by Schoen and Yau is that, given any $\epsilon > 0$, it is always possible to perturb an asymptotically flat manifold to become harmonically flat at infinity such that the total mass changes by less than $\epsilon$ and the metric changes by less than $\epsilon$ pointwise, all while maintaining non-negative scalar curvature. Hence, it happens that to prove the theorems in this paper, we only need to consider harmonically flat manifolds! Thus, we can use eqn [2] as our definition of total mass. As an example, note that $(\mathbb{R}^3, \delta_{ij})$ has zero total mass. Also, note that, qualitatively, the total mass of an asymptotically flat or harmonically flat manifold is the $1/r$ rate at which the metric becomes flat at infinity.
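As a small numerical illustration of eqns [1] and [2], the sketch below estimates the coefficients $a$ and $b$ of a radial conformal factor from its values at two large radii and then evaluates $m = 2ab$. The function names and the sample factor `u_example` are hypothetical and chosen only for illustration; the two-radius fit ignores the $O(1/|x|^2)$ tail, so the result is approximate.

```python
def asymptotic_coefficients(u, r1=1.0e4, r2=2.0e4):
    """Estimate a and b in u(r) = a + b/r + O(1/r^2) from two large radii."""
    # Ignoring the O(1/r^2) tail, solve the 2x2 linear system
    #   a + b/r1 = u(r1),   a + b/r2 = u(r2).
    b = (u(r1) - u(r2)) / (1.0 / r1 - 1.0 / r2)
    a = u(r1) - b / r1
    return a, b


def total_mass(a, b):
    """Total mass from eqn [2]: m = 2ab."""
    return 2.0 * a * b


def u_example(r):
    # Hypothetical radial conformal factor with a = 1 and b = 0.5, so m = 1.
    return 1.0 + 0.5 / r + 3.0 / r**2


a_est, b_est = asymptotic_coefficients(u_example)
m_est = total_mass(a_est, b_est)
print(a_est, b_est, m_est)
```

The estimate improves as the two radii grow, which reflects the statement that the total mass is read off from the $1/r$ rate at which the metric becomes flat.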
The Phenomenon of Gravitational Attraction

What do the above definitions of total mass have to do with anything physical? That is, if the total mass is the $1/r$ rate at which the metric becomes flat at infinity, what does this have to do with our real-world intuitive idea of mass?

The answer to this question is very nice. Given a Schwarzschild spacetime metric

$$\left( \mathbb{R}^4,\ \left(1 + \frac{m}{2|x|}\right)^4 \left(dx_1^2 + dx_2^2 + dx_3^2\right) - \left(\frac{1 - m/2|x|}{1 + m/2|x|}\right)^2 dt^2 \right), \qquad |x| > m/2,$$

for example, note that the t = 0 slice (which has zero second fundamental form) is the spacelike Schwarzschild metric

$$\left( \mathbb{R}^3 \setminus B_{m/2}(0),\ \left(1 + \frac{m}{2|x|}\right)^4 \delta_{ij} \right)$$

(discussed more later). Note that according to eqn [2], the parameter $m$ is in fact the total mass of this 3-manifold.

On the other hand, suppose we were to release a small test particle, initially at rest, a large distance $r$ from the center of the Schwarzschild spacetime. If this particle is not acted upon by external forces, then it should follow a geodesic in the spacetime. It turns out that with respect to the asymptotically flat coordinate chart, these geodesics “accelerate” towards the middle of the Schwarzschild metric proportional to $m/r^2$ (in the limit as $r$ goes to infinity). Thus, our Newtonian notion of mass also suggests that the total mass of the spacetime is $m$.

Local Energy Density

Another quantification of mass which is well understood is local energy density. In fact, in this setting, the local energy density at each point is

$$\mu = \frac{1}{16\pi} R$$

where $R$ is the scalar curvature of the 3-manifold (which has zero second fundamental form in the spacetime) at each point. Note that $(\mathbb{R}^3, \delta_{ij})$ has zero energy density at each point as well as zero total mass. This is appropriate since $(\mathbb{R}^3, \delta_{ij})$ is in fact a “t = 0” slice of Minkowski spacetime, which represents a vacuum. Classically, physicists consider $\mu \ge 0$ to be a physical assumption. Hence, from this point on, we will not only assume that $(M^3, g)$ is asymptotically flat, but also that it has non-negative scalar curvature,

$$R \ge 0$$

This notion of energy density also helps us understand total mass better. After all, we can take any asymptotically flat manifold and then change the metric to be perfectly flat outside a large compact set, thereby giving the new metric zero total mass. However, if we introduce the physical condition that both metrics have non-negative scalar curvature, then it is a beautiful theorem that this is in fact not possible, unless the original metric was already $(\mathbb{R}^3, \delta_{ij})$! (This theorem is actually a corollary to the positive mass theorem discussed below.) Thus, the curvature obstruction of having non-negative scalar curvature at each point is a very interesting condition.

Also, notice the indirect connection between the total mass and local energy density. At this point, there does not seem to be much of a connection at all. The total mass is the $1/r$ rate at which the metric becomes flat at infinity, and local energy density is the scalar curvature at each point. Furthermore, if a metric is changed in a compact set, local energy density is changed, but the total mass is unaffected.

The reason for this is that the total mass is not the integral of the local energy density over the manifold. In fact, this integral fails to take potential energy into account (which would be expected to contribute a negative energy) as well as gravitational energy. Hence, it is not initially clear what we should expect the relationship between total mass and local energy density to be, so let us begin with an example.

Example Using Superharmonic Functions in $\mathbb{R}^3$
Once again, let us return to the $(\mathbb{R}^3, u(x)^4 \delta_{ij})$ example. The formula for the scalar curvature is

$$R = -8\, u(x)^{-5}\, \Delta u(x)$$

Hence, since the physical assumption of non-negative energy density implies non-negative scalar curvature, we see that $u(x) > 0$ must be superharmonic ($\Delta u \le 0$). For simplicity, let us also assume that $u(x)$ is harmonic outside a bounded set so that we can expand $u(x)$ at infinity using spherical harmonics. Hence, $u(x)$ has the asymptotics of eqn [1]. By the maximum principle, it follows that the minimum value for $u(x)$ must be $a$, referring to eqn [1]. Hence, $b \ge 0$, which implies that $m \ge 0$! Thus, we see that the assumption of non-negative energy density at each point of $(\mathbb{R}^3, u(x)^4 \delta_{ij})$ implies that the total mass is also non-negative, which is what one would hope.

The Positive Mass Theorem
Why would one hope this? What would be the difference if the total mass were negative? This would mean that a gravitational system of positive energy density could collectively act as a net negative total mass. This phenomenon has not been observed experimentally, and so it is not a property that we would hope to find in general relativity.

More generally, suppose we have any asymptotically flat manifold with non-negative scalar curvature. Is it true that the total mass is also non-negative? The answer is yes, and this fact is known as the positive mass theorem, first proved by Schoen and Yau (1979) using minimal surface techniques and then by Witten (1981) using spinors. In the zero second fundamental form case, the positive mass theorem is known as the Riemannian positive mass theorem and is stated below.

Theorem 1 (Schoen, Yau). Let $(M^3, g)$ be any asymptotically flat, complete Riemannian manifold with non-negative scalar curvature. Then the total mass $m \ge 0$, with equality if and only if $(M^3, g)$ is isometric to $(\mathbb{R}^3, \delta)$.

Gravitational Energy
The previous example neglects to illustrate some of the subtleties of the positive mass theorem. For example, it is easy to construct asymptotically flat manifolds $(M^3, g)$ (not conformal to $\mathbb{R}^3$) which have zero scalar curvature everywhere and yet have nonzero total mass. By the positive mass theorem, the mass of these manifolds is positive. Physically, this corresponds to a spacetime with zero energy density everywhere which still has positive total mass. From where did this mass come? How can a vacuum have positive total mass?

Physicists refer to this extra energy as gravitational energy. There is no known local definition of the energy density of a gravitational field, and presumably such a definition does not exist. The curious phenomenon, then, is that for some reason, gravitational energy always makes a non-negative contribution to the total mass of the system.

Black Holes
Another very interesting and natural phenomenon in general relativity is the existence of black holes. Instead of thinking of black holes as singularities in a spacetime, we will think of black holes in terms of their horizons. For example, suppose we are exploring the universe in a spacecraft capable of traveling at any speed less than the speed of light. If we are investigating a black hole, we would want to make sure that we don’t get too close and get trapped by the “gravitational forces” of the black hole. In fact, we could imagine a “sphere of no return” beyond which it is impossible to escape from the black hole. This is called the event horizon of a black hole.

However, one limitation of the notion of an event horizon is that it is very hard to determine its location. One way is to let daredevil spacecraft see how close they can get to the black hole and still escape from it eventually. The only problem with this approach (besides the cost in spacecraft) is that it is hard to know when to stop waiting for a daredevil spacecraft to return. Even if it has been 50 years, it could be that this particular daredevil was not trapped by the black hole but got so close that it will take it 1000 or more years to return. Thus, to define the location of an event horizon even mathematically, we need to know the entire evolution of the spacetime. Hence, event horizons cannot be computed based only on the local geometry of the spacetime.

This problem is solved (at least for the mathematician) with the notion of apparent horizons of black holes. Given a surface in a spacetime, suppose that it emits an outward shell of light. If the surface area of this shell of light is decreasing everywhere on the surface, then this is called a trapped surface. The outermost boundary of these trapped surfaces is called the apparent horizon of the black hole. Apparent horizons can be computed based on their local geometry, and an apparent horizon always implies the existence of an event horizon outside of it (Hawking and Ellis 1973).

Now let us return to the case we are considering in this paper where $(M^3, g)$ is a “t = 0” slice of a spacetime with zero second fundamental form. Then it is a very nice geometric fact that apparent horizons of black holes intersected with $M^3$ correspond to the connected components of the outermost minimal surface $\Sigma_0$ of $(M^3, g)$.

All of the surfaces we are considering in this paper will be required to be smooth boundaries of open bounded regions, so that outermost is well defined with respect to a chosen end of the manifold. A minimal surface in $(M^3, g)$ is a surface which is a critical point of the area function with respect to any smooth variation of the surface. The first variational calculation implies that minimal surfaces have zero mean curvature. The surface $\Sigma_0$ of $(M^3, g)$ is defined as the boundary of the union of the open regions bounded by all of the minimal surfaces in $(M^3, g)$. It turns out that $\Sigma_0$ also has to be a minimal surface, so we call $\Sigma_0$ the “outermost minimal surface.” A qualitative sketch of an outermost minimal surface of a 3-manifold is shown in Figure 2.

Figure 2 A qualitative sketch of an outermost minimal surface of a 3-manifold.

We will also define a surface to be “(strictly) outer minimizing” if every surface which encloses it has (strictly) greater area. Note that outermost minimal surfaces are strictly outer minimizing. Also, we define a “horizon” in our context to be any minimal surface which is the boundary of a bounded open region.

It also follows from a stability argument (using the Gauss–Bonnet theorem, interestingly) that each component of an outermost minimal surface (in a 3-manifold with non-negative scalar curvature) must have the topology of a sphere. Furthermore, there is a physical argument, based on Penrose (1973), which suggests that the mass contributed by the black holes (thought of as the connected components of $\Sigma_0$) should be defined to be $\sqrt{A_0/16\pi}$, where $A_0$ is the area of $\Sigma_0$. Hence, the physical argument that the total mass should be greater than or equal to the mass contributed by the black holes yields the following geometric statement.

The Riemannian Penrose Inequality Let $(M^3, g)$ be a complete, smooth 3-manifold with non-negative scalar curvature which is harmonically flat at infinity with total mass $m$ and which has an outermost minimal surface $\Sigma_0$ of area $A_0$. Then,

$$m \ge \sqrt{\frac{A_0}{16\pi}} \qquad [3]$$

with equality if and only if $(M^3, g)$ is isometric to the Schwarzschild metric

$$\left( \mathbb{R}^3 \setminus \{0\},\ \left(1 + \frac{m}{2|x|}\right)^4 \delta_{ij} \right)$$

outside their respective outermost minimal surfaces.

The above statement has been proved by the present author, and Huisken and Ilmanen proved it when $A_0$ is defined instead to be the area of the largest connected component of $\Sigma_0$. We will discuss both approaches in this paper, which are very different, although they both involve flowing surfaces and/or metrics.

We also clarify that the above statement is with respect to a chosen end of $(M^3, g)$, since both the total mass and the definition of outermost refer to a particular end. In fact, nothing very important is gained by considering manifolds with more than one end, since extra ends can always be compactified by connect summing them (around a neighborhood of infinity) with large spheres while still preserving non-negative scalar curvature, for example. Hence, we will typically consider manifolds with just one end. In the case that the manifold has multiple ends, we will require every surface (which could have multiple connected components) in this paper to enclose all of the ends of the manifold except the chosen end.

The Schwarzschild Metric

The Schwarzschild metric

$$\left( \mathbb{R}^3 \setminus \{0\},\ \left(1 + \frac{m}{2|x|}\right)^4 \delta_{ij} \right)$$

referred to in the above statement of the Riemannian Penrose inequality, is a particularly important example to consider, and corresponds to a zero second fundamental form, spacelike slice of the usual (3+1)-dimensional Schwarzschild metric (which represents a spherically symmetric static black hole in vacuum). The three-dimensional Schwarzschild metrics have total mass $m > 0$ and are characterized by being the only spherically symmetric, geodesically complete, zero scalar curvature 3-metrics, other than $(\mathbb{R}^3, \delta_{ij})$. They can also be embedded in four-dimensional Euclidean space $(x, y, z, w)$ as the set of points satisfying

$$|(x, y, z)| = \frac{w^2}{8m} + 2m$$

which is a parabola rotated around an $S^2$. This last picture allows us to see that the Schwarzschild metric, which has two ends, has a $\mathbb{Z}_2$ symmetry which fixes the sphere with $w = 0$ and $|(x, y, z)| = 2m$, which is clearly minimal. Furthermore, the area of this sphere is $4\pi(2m)^2$, giving equality in the Riemannian Penrose inequality.
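The equality case can be checked numerically in the conformally flat coordinates. The sketch below is illustrative only and uses hypothetical helper names: it computes the area of the coordinate sphere $|x| = r$ in the metric $(1 + m/2|x|)^4 \delta_{ij}$, locates the sphere of least area on a grid of radii, and evaluates the right-hand side of inequality [3] there. In these coordinates the minimal sphere sits at $|x| = m/2$; the radius $2m$ quoted above refers to the embedded picture in four-dimensional Euclidean space.

```python
import math


def sphere_area(r, m):
    """Area of the coordinate sphere |x| = r in (1 + m/(2|x|))^4 delta_ij."""
    return 4.0 * math.pi * r**2 * (1.0 + m / (2.0 * r)) ** 4


def penrose_bound(area):
    """Right-hand side of inequality [3]: sqrt(A / (16 pi))."""
    return math.sqrt(area / (16.0 * math.pi))


m = 1.5
# Grid of radii from 0.6 * (m/2) to 1.4 * (m/2) in steps of 1 percent.
radii = [m / 2.0 * (1.0 + 0.01 * k) for k in range(-40, 41)]
areas = [sphere_area(r, m) for r in radii]
r_min = radii[areas.index(min(areas))]

print(r_min, m / 2.0)                               # least-area sphere vs m/2
print(penrose_bound(sphere_area(m / 2.0, m)), m)    # sqrt(A0 / 16 pi) vs m
```

The area at $|x| = m/2$ is $4\pi (m/2)^2 \cdot 2^4 = 16\pi m^2 = 4\pi(2m)^2$, in agreement with the value stated in the text.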
A Brief History of the Problem

The Riemannian Penrose inequality has a rich history spanning nearly three decades and has motivated much interesting mathematics and physics. In 1973, R Penrose in effect conjectured an even more general version of inequality [3] using a very clever physical argument, which we will not have room to repeat here (Penrose 1973). His observation was that a counterexample to inequality [3] would yield Cauchy data for solving the Einstein equations, the solution to which would likely violate the cosmic censor conjecture (which says that singularities generically do not form in a spacetime unless they are inside a black hole).

Jang and Wald (1977), extending ideas of Geroch, gave a heuristic proof of inequality [3] by defining a flow of 2-surfaces in $(M^3, g)$ in which the surfaces flow in the outward normal direction at a rate equal to the inverse of their mean curvatures at each point. The Hawking mass of a surface (which is supposed to estimate the total amount of energy inside the surface) is defined to be

$$m_{\mathrm{Hawking}}(\Sigma) = \sqrt{\frac{|\Sigma|}{16\pi}} \left( 1 - \frac{1}{16\pi} \int_\Sigma H^2 \right)$$

(where $|\Sigma|$ is the area of $\Sigma$ and $H$ is the mean curvature of $\Sigma$ in $(M^3, g)$) and, amazingly, is nondecreasing under this “inverse mean curvature flow.” This is seen by the fact that under inverse mean curvature flow, it follows from the Gauss equation and the second variation formula that

$$\frac{d}{dt}\, m_{\mathrm{Hawking}}(\Sigma) = \sqrt{\frac{|\Sigma|}{16\pi}} \left[ \frac{1}{2} + \frac{1}{16\pi} \int_\Sigma \left( 2\,\frac{|\nabla_\Sigma H|^2}{H^2} + R - 2K + \frac{1}{2}(\lambda_1 - \lambda_2)^2 \right) \right]$$

when the flow is smooth, where $R$ is the scalar curvature of $(M^3, g)$, $K$ is the Gauss curvature of the surface $\Sigma$, and $\lambda_1$ and $\lambda_2$ are the eigenvalues of the second fundamental form of $\Sigma$, or principal curvatures. Hence, $R \ge 0$ and

$$\int_\Sigma K \le 4\pi \qquad [4]$$

(which is true for any connected surface by the Gauss–Bonnet theorem) imply

$$\frac{d}{dt}\, m_{\mathrm{Hawking}}(\Sigma) \ge 0 \qquad [5]$$

Furthermore,

$$m_{\mathrm{Hawking}}(\Sigma_0) = \sqrt{\frac{|\Sigma_0|}{16\pi}}$$

since $\Sigma_0$ is a minimal surface and has zero mean curvature. In addition, the Hawking mass of sufficiently round spheres at infinity in the asymptotically flat end of $(M^3, g)$ approaches the total mass $m$. Hence, if inverse mean curvature flow beginning with $\Sigma_0$ eventually flows to sufficiently round spheres at infinity, inequality [3] follows from inequality [5].

As noted by Jang and Wald, this argument only works when inverse mean curvature flow exists and is smooth, which is generally not expected to be the case. In fact, it is not hard to construct manifolds which do not admit a smooth inverse mean curvature flow. The problem is that if the mean curvature of the evolving surface becomes zero or is negative, it is not clear how to define the flow.

For 20 years, this heuristic argument lay dormant until the work of Huisken and Ilmanen in 1997. With a very clever new approach, Huisken and Ilmanen discovered how to reformulate inverse mean curvature flow using an energy minimization principle in such a way that the new generalized inverse mean curvature flow always exists. The added twist is that the surface sometimes jumps outward. However, when the flow is smooth, it equals the original inverse mean curvature flow, and the Hawking mass is still monotone. Hence, as will be described in the next section, their new flow produced the first complete proof of inequality [3] for a single black hole.

Coincidentally, the author found another proof of inequality [3], submitted in 1999, which works for any number of black holes. The approach involves flowing the original metric to a Schwarzschild metric (outside the horizon) in such a way that the area of the outermost minimal surface does not change and the total mass is nonincreasing. Then, since the Schwarzschild metric gives equality in inequality [3], the inequality follows for the original metric. Fortunately, the flow of metrics which is defined is relatively simple, and in fact stays inside the conformal class of the original metric. The outermost minimal surface flows outwards in this conformal flow of metrics, and encloses any compact set (and hence all of the topology of the original metric) in a finite amount of time. Furthermore, this conformal flow of metrics preserves non-negative scalar curvature. We will describe this approach later in the paper.

Other contributions on the Penrose conjecture have also been made by Herzlich using the Dirac operator which Witten used to prove the positive mass theorem, by Gibbons in the special case of collapsing shells, by Tod, by Bartnik for quasispherical metrics, and by the present author using isoperimetric surfaces. There is also some interesting work of Ludvigsen and Vickers using spinors and Bergqvist, both concerning the Penrose inequality for null slices of a spacetime.
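As a concrete check of the Hawking mass defined above, the following sketch evaluates it on coordinate spheres of the spatial Schwarzschild metric $(1 + m/2|x|)^4 \delta_{ij}$. It assumes the standard formula for the mean curvature of a round sphere under a conformal change $u^4 \delta_{ij}$, namely $H = u^{-2}(2/r + 4u'/u)$; the function name is hypothetical. For this metric the Hawking mass of every coordinate sphere equals $m$, which is consistent both with its value $\sqrt{|\Sigma_0|/16\pi}$ on the horizon and with its limit at infinity.

```python
import math


def hawking_mass_schwarzschild_sphere(r, m):
    """Hawking mass of the coordinate sphere |x| = r in (1 + m/(2|x|))^4 delta_ij."""
    u = 1.0 + m / (2.0 * r)        # conformal factor on the sphere
    du = -m / (2.0 * r**2)         # radial derivative of the conformal factor
    area = 4.0 * math.pi * r**2 * u**4
    # Mean curvature of a round sphere after the conformal change u^4 delta_ij.
    H = (2.0 / r + 4.0 * du / u) / u**2
    # H is constant on the sphere, so the integral of H^2 is H^2 times the area.
    return math.sqrt(area / (16.0 * math.pi)) * (1.0 - H**2 * area / (16.0 * math.pi))


m = 2.0
masses = [hawking_mass_schwarzschild_sphere(r, m) for r in (m / 2.0, 2.0, 10.0, 1.0e3)]
print(masses)
```

At $r = m/2$ the mean curvature vanishes, so the first entry reduces to $\sqrt{|\Sigma_0|/16\pi}$.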
Inverse Mean Curvature Flow

Geometrically, Huisken and Ilmanen’s idea can be described as follows. Let $\Sigma(t)$ be the surface resulting from inverse mean curvature flow for time $t$ beginning with the minimal surface $\Sigma_0$. Define $\tilde\Sigma(t)$ to be the outermost minimal area enclosure of $\Sigma(t)$. Typically, $\Sigma(t) = \tilde\Sigma(t)$ in the flow, but in the case that the two surfaces are not equal, immediately replace $\Sigma(t)$ with $\tilde\Sigma(t)$ and then continue flowing by inverse mean curvature.

An immediate consequence of this modified flow is that the mean curvature of $\tilde\Sigma(t)$ is always non-negative by the first variation formula, since otherwise $\tilde\Sigma(t)$ would be enclosed by a surface with less area. This is because if we flow a surface $\Sigma$ in the outward direction with speed $\eta$, the first variation of the area is $\int_\Sigma H\eta$, where $H$ is the mean curvature of $\Sigma$.

Furthermore, by stability, it follows that in the regions where $\tilde\Sigma(t)$ has zero mean curvature, it is always possible to flow the surface out slightly to have positive mean curvature, allowing inverse mean curvature flow to be defined, at least heuristically at this point.

Furthermore, the Hawking mass is still monotone under this new modified flow. Notice that when $\Sigma(t)$ jumps outwards to $\tilde\Sigma(t)$,

$$\int_{\tilde\Sigma(t)} H^2 \le \int_{\Sigma(t)} H^2$$

since $\tilde\Sigma(t)$ has zero mean curvature where the two surfaces do not touch. Furthermore,

$$|\tilde\Sigma(t)| = |\Sigma(t)|$$

since (this is a neat argument) $|\tilde\Sigma(t)| \le |\Sigma(t)|$ (since $\tilde\Sigma(t)$ is a minimal area enclosure of $\Sigma(t)$) and we cannot have $|\tilde\Sigma(t)| < |\Sigma(t)|$ since otherwise $\Sigma(t)$ would have jumped outwards at some earlier time. This is only a heuristic argument, but we can then see that the Hawking mass is nondecreasing during a jump by the above two equations.

This new flow can be rigorously defined, always exists, and the Hawking mass is monotone. Huisken and Ilmanen define $\Sigma(t)$ to be the level sets of a scalar valued function $u(x)$ defined on $(M^3, g)$ such that $u(x) = 0$ on the original surface $\Sigma_0$ and satisfies

$$\mathrm{div}\left( \frac{\nabla u}{|\nabla u|} \right) = |\nabla u| \qquad [6]$$

in an appropriate weak sense. Since the left-hand side of the above equation is the mean curvature of the level sets of $u(x)$ and the right-hand side is the reciprocal of the flow rate, the above equation implies inverse mean curvature flow for the level sets of $u(x)$ when $|\nabla u(x)| \ne 0$.

Huisken and Ilmanen use an energy minimization principle to define weak solutions to eqn [6]. Equation [6] is said to be weakly satisfied in $\Omega$ by the locally Lipschitz function $u$ if for all locally Lipschitz $v$ with $\{v \ne u\} \subset\subset \Omega$,

$$J_u(u) \le J_u(v)$$

where

$$J_u(v) := \int_\Omega |\nabla v| + v\,|\nabla u|$$

It can then be seen that the Euler–Lagrange equation of the above energy functional yields eqn [6].

In order to prove that a solution $u$ exists to the above two equations, Huisken and Ilmanen regularize the degenerate elliptic equation [6] to the elliptic equation

$$\mathrm{div}\left( \frac{\nabla u}{\sqrt{|\nabla u|^2 + \epsilon^2}} \right) = \sqrt{|\nabla u|^2 + \epsilon^2}$$

Solutions to the above equation are then shown to exist using the existence of a subsolution, and then taking the limit as $\epsilon$ goes to zero yields a weak solution to eqn [6]. There are many details which we are skipping here, but these are the main ideas.

As it turns out, weak solutions $u(x)$ to eqn [6] often have flat regions where $u(x)$ equals a constant. Hence, the level sets $\Sigma(t)$ of $u(x)$ will be discontinuous in $t$ in this case, which corresponds to the “jumping out” phenomenon referred to at the beginning of this section.

We also note that since the Hawking mass of the level sets of $u(x)$ is monotone, this inverse mean curvature flow technique not only proves the Riemannian Penrose inequality, but also gives a new proof of the positive mass theorem in dimension 3. This is seen by letting the initial surface be a very small, round sphere (which will have approximately zero Hawking mass) and then flowing by inverse mean curvature, thereby proving $m \ge 0$.

The Huisken and Ilmanen inverse mean curvature flow also seems ideally suited for proving Penrose inequalities for 3-manifolds which have $R \ge -6$ and which are asymptotically hyperbolic. This situation occurs if $(M^3, g)$ is chosen to be a constant mean curvature slice of the spacetime or if the spacetime is defined to solve the Einstein equation with nonzero cosmological constant. In these cases, there exists a modified Hawking mass which is monotone under inverse mean curvature flow, namely the usual Hawking mass plus $4(|\Sigma|/16\pi)^{3/2}$. However, because the monotonicity of the Hawking mass relies on the Gauss–Bonnet theorem, these arguments do not work in higher dimensions, at least so far. Also, because of the need for eqn [4], inverse mean curvature flow only proves the Riemannian Penrose inequality for a single black hole. In the next section, we present a technique which proves the Riemannian Penrose inequality for any number of black holes, and which can likely be generalized to higher dimensions.
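The simplest smooth instance of inverse mean curvature flow is a round sphere in flat $\mathbb{R}^3$, which is an assumption made here purely for illustration and is not the general setting $(M^3, g)$ of the text. A sphere of radius $r$ has $H = 2/r$, so the flow reduces to $dr/dt = r/2$, with solution $r(t) = r_0 e^{t/2}$ and area growing like $e^t$; the corresponding level set function $u(x) = 2\ln(|x|/r_0)$ satisfies eqn [6]. The sketch below integrates the reduced equation numerically with a hypothetical helper `imcf_radius` and compares against the closed form.

```python
import math


def imcf_radius(r0, t, steps=1000):
    """Integrate dr/dt = 1/H with H = 2/r (round sphere in flat R^3) by RK4."""
    def speed(r):
        return r / 2.0   # outward speed 1/H for a sphere of radius r

    r, h = r0, t / steps
    for _ in range(steps):
        k1 = speed(r)
        k2 = speed(r + 0.5 * h * k1)
        k3 = speed(r + 0.5 * h * k2)
        k4 = speed(r + h * k3)
        r += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r


r0, t = 1.0, 3.0
r_t = imcf_radius(r0, t)
area_ratio = (r_t / r0) ** 2          # |Sigma(t)| / |Sigma(0)|
print(r_t, r0 * math.exp(t / 2.0))    # numerical radius vs r0 * exp(t/2)
print(area_ratio, math.exp(t))        # area growth vs exp(t)
```

In this flat example the Hawking mass of every sphere is zero, since $\int_\Sigma H^2 = 16\pi$, so monotonicity holds trivially.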
The Conformal Flow of Metrics

Given any initial Riemannian manifold $(M^3, g_0)$ which has non-negative scalar curvature and which is harmonically flat at infinity, we will define a continuous, one-parameter family of metrics $(M^3, g_t)$, $0 \le t < \infty$. This family of metrics will converge to a three-dimensional Schwarzschild metric and will have other special properties which will allow us to prove the Riemannian Penrose inequality for the original metric $(M^3, g_0)$.

In particular, let $\Sigma_0$ be the outermost minimal surface of $(M^3, g_0)$ with area $A_0$. Then, we will also define a family of surfaces $\Sigma(t)$ with $\Sigma(0) = \Sigma_0$ such that $\Sigma(t)$ is minimal in $(M^3, g_t)$. This is natural since as the metric $g_t$ changes, we expect that the location of the horizon $\Sigma(t)$ will also change. Then, the interesting quantities to keep track of in this flow are $A(t)$, the total area of the horizon $\Sigma(t)$ in $(M^3, g_t)$, and $m(t)$, the total mass of $(M^3, g_t)$ in the chosen end.

In addition to all of the metrics $g_t$ having non-negative scalar curvature, we will also have the very nice properties that

$$A'(t) = 0, \qquad m'(t) \le 0$$

for all $t \ge 0$. Then, since $(M^3, g_t)$ converges to a Schwarzschild metric (in an appropriate sense) which gives equality in the Riemannian Penrose inequality as described in the introduction,

$$m(0) \ge m(\infty) = \sqrt{\frac{A(\infty)}{16\pi}} = \sqrt{\frac{A(0)}{16\pi}} \qquad [7]$$

which proves the Riemannian Penrose inequality for the original metric $(M^3, g_0)$. The hard part, then, is to find a flow of metrics which preserves non-negative scalar curvature and the area of the horizon, decreases total mass, and converges to a Schwarzschild metric as $t$ goes to infinity.

The Definition of the Flow
In fact, the metrics $g_t$ will all be conformal to $g_0$. This conformal flow of metrics can be thought of as the solution to a first-order ODE in $t$ defined by eqns [8]–[11]. Let

\[ g_t = u_t(x)^4 g_0 \tag{8} \]

and $u_0(x) \equiv 1$. Given the metric $g_t$, define

\[ \Sigma(t) = \text{the outermost minimal area enclosure of } \Sigma_0 \text{ in } (M^3, g_t) \tag{9} \]

where $\Sigma_0$ is the original outer minimizing horizon in $(M^3, g_0)$. In the cases in which we are interested, $\Sigma(t)$ will not touch $\Sigma_0$, from which it follows that $\Sigma(t)$ is actually a strictly outer minimizing horizon of $(M^3, g_t)$. Then given the horizon $\Sigma(t)$, define $v_t(x)$ such that

\[ \begin{cases} \Delta_{g_0} v_t(x) \equiv 0 & \text{outside } \Sigma(t) \\ v_t(x) = 0 & \text{on } \Sigma(t) \\ \lim_{x \to \infty} v_t(x) = -e^{-t} \end{cases} \tag{10} \]

and $v_t(x) \equiv 0$ inside $\Sigma(t)$. Finally, given $v_t(x)$, define

\[ u_t(x) = 1 + \int_0^t v_s(x)\, ds \tag{11} \]

so that $u_t(x)$ is continuous in $t$ and has $u_0(x) \equiv 1$. Note that eqn [11] implies that the first-order rate of change of $u_t(x)$ is given by $v_t(x)$. Hence, the first-order rate of change of $g_t$ is a function of itself, $g_0$, and $v_t(x)$, which is a function of $g_0$, $t$, and $\Sigma(t)$, which is in turn a function of $g_t$ and $\Sigma_0$. Thus, the first-order rate of change of $g_t$ is a function of $t$, $g_t$, $g_0$, and $\Sigma_0$.
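As a small consistency check on eqns [10] and [11], the following sketch integrates the boundary value of $v_s$ at infinity and compares the result with $e^{-t}$, the asymptotic value of $u_t$ quoted after Theorem 2. The function names and the trapezoid rule are illustrative choices, not part of the original construction; only the behavior at infinity is modeled, not the full solution on $M^3$.

```python
import math

def v_at_infinity(s):
    # Boundary condition from eqn [10]: lim_{x -> infinity} v_s(x) = -exp(-s).
    return -math.exp(-s)

def u_at_infinity(t, steps=10_000):
    # Eqn [11] evaluated at infinity: u_t = 1 + integral_0^t v_s ds,
    # approximated with the composite trapezoid rule (an illustrative choice).
    if t == 0:
        return 1.0
    h = t / steps
    total = 0.5 * (v_at_infinity(0.0) + v_at_infinity(t))
    for i in range(1, steps):
        total += v_at_infinity(i * h)
    return 1.0 + h * total

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, u_at_infinity(t), math.exp(-t))
```

The two printed columns should agree up to the quadrature error, reflecting that $1 + \int_0^t (-e^{-s})\,ds = e^{-t}$.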
Theorem 2 Taken together, eqns [8]–[11] define a first-order ODE in $t$ for $u_t(x)$ which has a solution which is Lipschitz in the $t$ variable, $C^1$ in the $x$ variable everywhere, and smooth in the $x$ variable outside $\Sigma(t)$. Furthermore, $\Sigma(t)$ is a smooth, strictly outer minimizing horizon in $(M^3, g_t)$ for all $t \ge 0$, and $\Sigma(t_2)$ encloses but does not touch $\Sigma(t_1)$ for all $t_2 > t_1 \ge 0$.

Since $v_t(x)$ is a superharmonic function in $(M^3, g_0)$ (harmonic everywhere except on $\Sigma(t)$, where it is weakly superharmonic), it follows that $u_t(x)$ is superharmonic as well. Thus, from eqn [11] we see that $\lim_{x \to \infty} u_t(x) = e^{-t}$ and consequently that $u_t(x) > 0$ for all $t$ by the maximum principle. Then, since

\[ R(g_t) = u_t(x)^{-5} \left( -8 \Delta_{g_0} + R(g_0) \right) u_t(x) \tag{12} \]
it follows that $(M^3, g_t)$ is an asymptotically flat manifold with non-negative scalar curvature.

Even so, it still may not seem like $g_t$ is particularly naturally defined, since the rate of change of $g_t$ appears to depend on $t$ and the original metric $g_0$ in eqn [10]. We would prefer a flow where the rate of change of $g_t$ can be defined purely as a function of $g_t$ (and $\Sigma_0$ perhaps), and interestingly enough this actually does turn out to be the case. The present author has proved this very important fact and defined a new equivalence class of metrics called the harmonic conformal class. Once we decide to find a flow of metrics which stays inside the harmonic conformal class of the original metric (outside the horizon) and keeps the area of the horizon $\Sigma(t)$ constant, we are basically forced to choose the particular conformal flow of metrics defined above.

Theorem 3 The function $A(t)$ is constant in $t$ and $m(t)$ is nonincreasing in $t$, for all $t \ge 0$.

The fact that $A'(t) = 0$ follows from the fact that to first order the metric is not changing on $\Sigma(t)$ (since $v_t(x) = 0$ there) and from the fact that to first order the area of $\Sigma(t)$ does not change as it moves outward, since $\Sigma(t)$ is a critical point for area in $(M^3, g_t)$. Hence, the interesting part of Theorem 3 is proving that $m'(t) \le 0$. Curiously, this follows from a nice trick using the Riemannian positive mass theorem, which we describe later.

Another important aspect of this conformal flow of the metric is that outside the horizon $\Sigma(t)$, the manifold $(M^3, g_t)$ becomes more and more spherically symmetric and ''approaches'' a Schwarzschild manifold $(\mathbb{R}^3 \setminus \{0\}, s)$ in the limit as $t$ goes to $\infty$. More precisely,

Theorem 4 For sufficiently large $t$, there exists a diffeomorphism $\phi_t$ between $(M^3, g_t)$ outside the horizon $\Sigma(t)$ and a fixed Schwarzschild manifold $(\mathbb{R}^3 \setminus \{0\}, s)$ outside its horizon. Furthermore, for all $\epsilon > 0$, there exists a $T$ such that for all $t > T$, the metrics $g_t$ and $\phi_t^*(s)$ (when determining the lengths of unit vectors of $(M^3, g_t)$) are within $\epsilon$ of each other and the total masses of the two manifolds are within $\epsilon$ of each other. Hence,

\[ \lim_{t \to \infty} \frac{m(t)}{\sqrt{A(t)}} = \sqrt{\frac{1}{16\pi}} \]

Theorem 4 is not that surprising, although a careful proof is reasonably long. However, if one is willing to believe that the flow of metrics converges to a spherically symmetric metric outside the horizon, then Theorem 4 follows from two facts. The first fact is that the scalar curvature of $(M^3, g_t)$ eventually becomes identically zero outside the horizon $\Sigma(t)$ (assuming $(M^3, g_0)$ is harmonically flat). This follows from the facts that $\Sigma(t)$ encloses any compact set in a finite amount of time, that harmonically flat manifolds have zero scalar curvature outside a compact set, that $u_t(x)$ is harmonic outside $\Sigma(t)$, and eqn [12]. The second fact is that the Schwarzschild metrics are the only complete, spherically symmetric 3-manifolds with zero scalar curvature (except for the flat metric on $\mathbb{R}^3$).

The Riemannian Penrose inequality, inequality [3], then follows from eqn [7] using Theorems 2–4, for harmonically flat manifolds. Since asymptotically flat manifolds can be approximated arbitrarily well by harmonically flat manifolds while changing the relevant quantities arbitrarily little, the asymptotically flat case also follows. Finally, the case of equality of the Penrose inequality follows from a more careful analysis of these same arguments.

Qualitative Discussion
Figures 3 and 4 are meant to help illustrate some of the properties of the conformal flow of the metric. Figure 3 is the original metric, which has a strictly outer minimizing horizon $\Sigma_0$. As $t$ increases, $\Sigma(t)$ moves outwards, but never inwards. In Figure 4, we can observe one of the consequences of the fact that $A(t) = A_0$ is constant in $t$. Since the metric is not changing inside $\Sigma(t)$, all of the horizons $\Sigma(s)$, $0 \le s \le t$, have area $A_0$ in $(M^3, g_t)$. Hence, inside $\Sigma(t)$, the manifold $(M^3, g_t)$ becomes cylinder-like in the sense that it is laminated (i.e., foliated but with some gaps allowed) by all of the previous horizons, which all have the same area $A_0$ with respect to the metric $g_t$.

Figure 3 Original metric having a strictly outer minimizing horizon $\Sigma_0$.

Figure 4 Metric after time $t$.

Now let us suppose that the original horizon $\Sigma_0$ of $(M^3, g_0)$ had two components, for example. Then each of the components of the horizon will move outwards as $t$ increases, and at some point before they touch they will suddenly jump outwards to form a horizon with a single component enclosing the previous horizon with two components. Even horizons with only one component will sometimes jump outwards, but no more than a countable number of times. It is interesting that this phenomenon of surfaces jumping is also found in the Huisken–Ilmanen approach to the Penrose conjecture using their generalized $1/H$ flow.

Proof that $m'(t) \le 0$

The most surprising aspect of the flow defined earlier is that $m'(t) \le 0$. As mentioned earlier, this important fact follows from a nice trick using the Riemannian positive mass theorem. The first step is to realize that while the rate of change of $g_t$ appears to depend on $t$ and $g_0$, this is in fact an illusion. As described in detail by Bray, the rate of change of $g_t$ can be described purely in terms of $g_t$ (and $\Sigma_0$). It is also true that the rate of change of $g_t$ depends only on $g_t$ and $\Sigma(t)$. Hence, there is no special value of $t$, so proving $m'(t) \le 0$ is equivalent to proving $m'(0) \le 0$. Thus, without loss of generality, we take $t = 0$ for convenience.

Now expand the harmonic function $v_0(x)$, defined in eqn [10], using spherical harmonics at infinity, to get

\[ v_0(x) = -1 + \frac{c}{|x|} + O\!\left( \frac{1}{|x|^2} \right) \tag{13} \]

for some constant $c$. Since the rate of change of the metric $g_t$ at $t = 0$ is given by $v_0(x)$ and since the total mass $m(t)$ depends on the $1/r$ rate at which the metric $g_t$ becomes flat at infinity (see eqn [2]), it is not surprising that direct calculation gives us that

\[ m'(0) = 2(c - m(0)) \]

Hence, to show that $m'(0) \le 0$, we need to show that

\[ c \le m(0) \tag{14} \]
In fact, counterexamples to eqn [14] can be found if we remove either of the requirements that $\Sigma(0)$ (which is used in the definition of $v_0(x)$) be a minimal surface or that $(M^3, g_0)$ have non-negative scalar curvature. Hence, we quickly see that eqn [14] is a fairly deep claim which says something quite interesting about manifolds with non-negative scalar curvature. The Riemannian positive mass theorem is also a deep result which says something quite interesting about manifolds with non-negative scalar curvature. Hence, it is natural to try to use the Riemannian positive mass theorem to prove eqn [14].

Thus, we want to create a manifold whose total mass depends on $c$ from eqn [13]. The idea is to use a reflection trick similar to one used by Bunting and Masood-ul-Alam (1987) for another purpose. First, remove the region of $M^3$ inside $\Sigma(0)$ and then reflect the remainder of $(M^3, g_0)$ through $\Sigma(0)$. Define the resulting Riemannian manifold to be $(\bar{M}^3, \bar{g}_0)$, which has two asymptotically flat ends since $(M^3, g_0)$ has exactly one asymptotically flat end not contained by $\Sigma(0)$. Note that $(\bar{M}^3, \bar{g}_0)$ has non-negative scalar curvature everywhere except on $\Sigma(0)$, where the metric has corners. In fact, the fact that $\Sigma(0)$ has zero mean curvature (since it is a minimal surface) implies that $(\bar{M}^3, \bar{g}_0)$ has ''distributional'' non-negative scalar curvature everywhere, even on $\Sigma(0)$. This notion is made rigorous by Bray. Thus, we have used the fact that $\Sigma(0)$ is minimal in a critical way.

Recall from eqn [10] that $v_0(x)$ was defined to be the harmonic function equal to zero on $\Sigma(0)$ which goes to $-1$ at infinity. We want to reflect $v_0(x)$ to be defined on all of $(\bar{M}^3, \bar{g}_0)$. The trick here is to define $v_0(x)$ on $(\bar{M}^3, \bar{g}_0)$ to be the harmonic function which goes to $-1$ at infinity in the original end and goes to $1$ at infinity in the reflected end. By symmetry, $v_0(x)$ equals 0 on $\Sigma(0)$ and so agrees with its original definition on $(M^3, g_0)$.
The next step is to compactify one end of $(\bar{M}^3, \bar{g}_0)$. By the maximum principle, we know that $v_0(x) > -1$ and $c > 0$, so the new Riemannian manifold $(\bar{M}^3, (v_0(x) + 1)^4 \bar{g}_0)$ does the job quite nicely and compactifies the original end to a point. In fact, the compactified point at infinity and the metric there can be filled in smoothly (using the fact that $(M^3, g_0)$ is harmonically flat). It then follows from eqn [12] that this new compactified manifold has non-negative scalar curvature since $v_0(x) + 1$ is harmonic.

The last step is simply to apply the Riemannian positive mass theorem to $(\bar{M}^3, (v_0(x) + 1)^4 \bar{g}_0)$. It is not surprising that the total mass $\tilde{m}(0)$ of this manifold involves $c$, but it is quite lucky that direct calculation yields

\[ \tilde{m}(0) = -4(c - m(0)) \]

which must be non-negative by the Riemannian positive mass theorem. Thus, we have that

\[ m'(0) = 2(c - m(0)) = -\tfrac{1}{2} \tilde{m}(0) \le 0 \]
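The identity $m'(0) = 2(c - m(0))$ can be checked numerically under stated assumptions. The sketch below assumes (this is not spelled out in the present section) that in a harmonically flat end the metric is $U^4 \delta$ with $U = a + b/|x|$ and total mass $2ab$, and that $g_0$ is normalized so that $U_0 = 1 + m(0)/(2|x|)$. To first order in $t$, $u_t \approx 1 + t v_0$ with $v_0 = -1 + c/|x|$ as in eqn [13]. All function names are hypothetical.

```python
def mass_first_order(t, c, m0):
    # Conformal factor of g_t relative to the flat metric, to first order in t:
    #   (1 + t*(-1 + c/r)) * (1 + m0/(2r)) = a + b/r + O(1/r^2)
    a = 1.0 - t
    b = t * c + (1.0 - t) * m0 / 2.0
    # Assumed mass formula for a metric (a + b/r)^4 * delta: m = 2*a*b.
    return 2.0 * a * b

def mass_rate_at_zero(c, m0, h=1e-6):
    # Central finite difference for m'(0).
    return (mass_first_order(h, c, m0) - mass_first_order(-h, c, m0)) / (2.0 * h)

if __name__ == "__main__":
    c, m0 = 0.3, 1.0
    reflected_mass = -4.0 * (c - m0)   # the value of m~(0) quoted in the text
    print(mass_rate_at_zero(c, m0), 2.0 * (c - m0), -0.5 * reflected_mass)
```

With these assumptions the three printed numbers coincide, and the sign of $m'(0)$ is controlled by the sign of $\tilde{m}(0)$, as the argument requires.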
Open Questions and Applications

Now that the Riemannian Penrose conjecture has been proved, what are the next interesting directions? What applications can be found? Is this subject only of physical interest, or are there possibly broader applications to other problems in mathematics?

Clearly, the most natural open problem is to find a way to prove the general Penrose inequality, in which $M^3$ is allowed to have any second fundamental form in the spacetime. There is good reason to think that this may follow from the Riemannian Penrose inequality, although this is a bit delicate. On the other hand, the general positive mass theorem followed from the Riemannian positive mass theorem, as was originally shown by Schoen and Yau using an idea due to Jang. For physicists, this problem is definitely a top priority since most spacetimes do not even admit spacelike slices with zero second fundamental form.

Another interesting question is to ask these same questions in higher dimensions. The author is currently working on a paper to prove the Riemannian Penrose inequality in dimensions

$\delta > 0$ we set
\[ H^d_\delta(E) := \frac{\omega_d}{2^d} \inf \left\{ \sum_j (\operatorname{diam}(E_j))^d \right\} \tag{1} \]

where $\omega_d$ is the $d$-dimensional volume of the unit ball in $\mathbb{R}^d$ whenever $d$ is an integer (there is no canonical choice for $\omega_d$ when $d$ is not an integer; a convenient one is $\omega_d = 2^d$), and the infimum is taken over all countable families of sets $\{E_j\}$ that cover $E$ and whose diameters satisfy $\operatorname{diam}(E_j) \le \delta$. The $d$-dimensional Hausdorff measure of $E$ is

\[ H^d(E) := \lim_{\delta \to 0} H^d_\delta(E) \tag{2} \]

(the limit exists because $H^d_\delta(E)$ is decreasing in $\delta$).

Remarks
(i) $H^d$ is called $d$-dimensional because of its scaling behavior: if $E_\lambda$ is a copy of $E$ scaled homothetically by a factor $\lambda$, then $H^d(E_\lambda) = \lambda^d H^d(E)$. Thus, $H^1$ scales like the length, $H^2$ scales like the area, and so on.
(ii) The measure $H^d$ is clearly invariant under rigid motions (translations and rotations). This implies that $H^d$ agrees on $\mathbb{R}^d$ with the Lebesgue measure up to some constant factor; the renormalization constant $\omega_d / 2^d$ in [1] makes this factor equal to 1. Thus, $H^d(E)$ agrees with the usual $d$-dimensional volume for every set $E$ in $\mathbb{R}^d$, and the area formula shows that the same is true if $E$ is (a subset of) a $d$-dimensional surface of class $C^1$ in $\mathbb{R}^n$.
(iii) Besides the Hausdorff measure, there are several other, less popular notions of $d$-dimensional measure: all of them are invariant under rigid motion, scale in the expected way, and agree with $H^d$ for sets contained in $\mathbb{R}^d$ or in a $d$-dimensional surface of class $C^1$, and yet they differ for other sets (for further details, see Federer (1996, section 2.10)).
(iv) The definition of $H^d(E)$ uses only the notion of diameter, and therefore makes sense when $E$ is a subset of an arbitrary metric space. Note that $H^d(E)$ depends only on the restriction of the metric to $E$, and not on the ambient space.
(v) The measure $H^d$ is countably additive on the $\sigma$-algebra of Borel sets in $\mathbb{R}^n$, but not on all sets; to avoid pathological situations, we shall always assume that sets and maps are Borel measurable.

According to intuition, the length of a surface should be infinite, while the area of a curve should be null. These are indeed particular cases of the following implications:

\[ H^d(E) > 0 \;\Rightarrow\; H^{d'}(E) = \infty \quad \text{for } d' < d \]
\[ H^d(E) < \infty \;\Rightarrow\; H^{d'}(E) = 0 \quad \text{for } d' > d \]
Hence, the infimum of all $d$ such that $H^d(E) = 0$ and the supremum of all $d$ such that $H^d(E) = \infty$ coincide. This number is called the Hausdorff dimension of $E$, and is denoted by $\dim_H(E)$. For surfaces of class $C^1$, the notion of Hausdorff dimension agrees with the usual one. Examples of sets with nonintegral dimension are described in the next subsection.

Remarks
(i) Note that $H^d(E)$ may be 0 or $\infty$ even for $d = \dim_H(E)$.
(ii) The Hausdorff dimension of a set $E$ is strictly related to the metric on $E$, and not just to the topology. Indeed, it is preserved under diffeomorphisms but not under homeomorphisms, and it does not always agree with the topological dimension. For instance, the Hausdorff dimension of the graph of a continuous function $f : \mathbb{R} \to \mathbb{R}$ can be any number between 1 and 2 (included).
(iii) For nonsmooth sets, the Hausdorff dimension does not always conform to intuition: for example, the dimension of a Cartesian product $E \times F$ of compact sets does not agree in general with the sum of the dimensions of $E$ and $F$.
(iv) There are many other notions of dimension besides the Hausdorff and topological ones. Among these, packing dimension and box-counting dimension have interesting applications (see Falconer (2003, chapters 3 and 4)).

Self-Similar Fractals
Interesting examples of sets with nonintegral dimension are self-similar fractals. We present here a simplified version of a construction due to Hutchinson (Falconer 2003, chapter 9). Let $\{\Psi_i\}$ be a finite set of similitudes of $\mathbb{R}^n$ with scaling factors $\lambda_i < 1$, and assume that there exists a bounded open set $V$ such that the sets $V_i := \Psi_i(V)$ are pairwise disjoint and contained in $V$. The self-similar fractal associated with the system $\{\Psi_i\}$ is the compact set $C$ that satisfies

\[ C = \bigcup_i \Psi_i(C) \tag{3} \]

The term ''self-similar'' comes from the fact that $C$ can be written as a union of scaled copies of itself.
522 Geometric Measure Theory
The existence (and uniqueness) of such a $C$ follows from a standard fixed-point argument applied to the map $C \mapsto \bigcup_i \Psi_i(C)$. The dimension $d$ of $C$ is the unique solution of the equation

\[ \sum_i \lambda_i^d = 1 \tag{4} \]

Formula [4] can be easily justified: if the sets $\Psi_i(C)$ are disjoint (and the assumption on $V$ implies that this is almost the case), then [3] implies $H^d(C) = \sum_i H^d(\Psi_i(C)) = \sum_i \lambda_i^d H^d(C)$, and therefore $H^d(C)$ can be positive and finite if and only if $d$ satisfies [4].

An example of this construction is the usual Cantor set in $\mathbb{R}$, which is given by the similitudes

\[ \Psi_1(x) := \tfrac{1}{3} x \quad \text{and} \quad \Psi_2(x) := \tfrac{2}{3} + \tfrac{1}{3} x \]

By [4], its dimension is $d = \log 2 / \log 3$. Other examples are described in Figures 1–3.

Rectifiable Sets

Given an integer $k = 1, \ldots, n$, we say that a set $E$ in $\mathbb{R}^n$ is $k$-rectifiable if it can be covered by a countable family of sets $\{S_j\}$ such that $S_0$ is $H^k$-negligible (i.e., $H^k(S_0) = 0$) and $S_j$ is a $k$-dimensional surface of class $C^1$ for $j = 1, 2, \ldots$ Note that $\dim_H(E) \le k$ because each $S_j$ has dimension $k$. A $k$-rectifiable set $E$ bears little resemblance to smooth surfaces (it can be everywhere dense!), but it still admits a suitably weak notion of tangent bundle. More precisely, it is possible to associate with every $x \in E$ a $k$-dimensional subspace of $\mathbb{R}^n$, denoted by $\operatorname{Tan}(E, x)$, so that for every $k$-dimensional surface $S$ of class $C^1$ in $\mathbb{R}^n$ there holds

\[ \operatorname{Tan}(E, x) = \operatorname{Tan}(S, x) \quad \text{for } H^k\text{-a.e. } x \in E \cap S \tag{5} \]

where $\operatorname{Tan}(S, x)$ is the tangent space to $S$ at $x$ according to the usual definition. It is not difficult to see that $\operatorname{Tan}(E, x)$ is uniquely determined by [5] up to an $H^k$-negligible set of points $x \in E$, and if $E$ is a surface of class $C^1$, then it agrees with the usual tangent space for $H^k$-almost all points of $E$.

Remarks
(i) In the original definition of rectifiability, the sets $S_j$ with $j > 0$ are Lipschitz images of $\mathbb{R}^k$, that is, $S_j := f_j(\mathbb{R}^k)$, where $f_j : \mathbb{R}^k \to \mathbb{R}^n$ is a Lipschitz map. It can be shown that this definition is equivalent to the one above.
(ii) The construction of the tangent bundle is straightforward: let $\{S_j\}$ be a covering of $E$ as earlier, and set $\operatorname{Tan}(E, x) := \operatorname{Tan}(S_j, x)$, where $j$ is the smallest positive integer such that $x \in S_j$. Then [5] is an immediate corollary of the following lemma: if $S$ and $S'$ are $k$-dimensional surfaces of class $C^1$ in $\mathbb{R}^n$, then $\operatorname{Tan}(S, x) = \operatorname{Tan}(S', x)$ for $H^k$-almost every $x \in S \cap S'$.
(iii) A set $E$ in $\mathbb{R}^n$ is called purely $k$-unrectifiable if it contains no $k$-rectifiable subset with positive $k$-dimensional measure, or, equivalently, if $H^k(E \cap S) = 0$ for every $k$-dimensional surface $S$ of class $C^1$. For instance, every product $E := E_1 \times E_2$, where $E_1$ and $E_2$ are $H^1$-negligible sets in $\mathbb{R}$, is a purely 1-unrectifiable set in $\mathbb{R}^2$ (it suffices to show that $H^1(E \cap S) = 0$ whenever $S$ is the graph of a function $f : \mathbb{R} \to \mathbb{R}$ of class $C^1$, and this follows by the usual formula for the length of the graph). Note that the Hausdorff dimension of such product sets can be any number between 0 and 2, hence rectifiability is not related to dimension. The self-similar fractals described in Figures 1 and 3 are both purely 1-unrectifiable.

Figure 1 The maps $\Psi_i$, $i = 1, \ldots, 4$, take the square $V$ into the squares $V_i$ at the corners of $V$. The scaling factor is $\lambda$ for all $i$, hence $\dim_H(C) = \log 4 / (-\log \lambda)$. Note that $\dim_H(C)$ can be any number between 0 and 2, including 1.
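Returning to the dimension equation [4] for self-similar fractals, the following sketch solves $\sum_i \lambda_i^d = 1$ numerically by bisection and compares the result with the closed forms quoted for the Cantor set and in the captions of Figures 2 and 3. The function name `similarity_dimension` and the choice of bisection are illustrative assumptions, not part of the original text.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    # Solve sum(r**d for r in ratios) == 1 for d >= 0 (eqn [4]).
    # For ratios in (0, 1) the left-hand side is strictly decreasing in d,
    # so bisection on a bracketing interval converges to the unique root.
    if not ratios or any(r <= 0.0 or r >= 1.0 for r in ratios):
        raise ValueError("scaling factors must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while sum(r ** hi for r in ratios) > 1.0:
        hi *= 2.0  # enlarge the bracket until the sum drops to 1 or below
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(r ** mid for r in ratios) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(similarity_dimension([1/3, 1/3]), math.log(2) / math.log(3))    # Cantor set
    print(similarity_dimension([1/4] * 12), math.log(12) / math.log(4))   # Figure 2
    print(similarity_dimension([1/3] * 4), math.log(4) / math.log(3))     # von Koch curve
```

When all scaling factors are equal to $\lambda$ and there are $N$ maps, the root is $\log N / (-\log \lambda)$, which is the formula used in the figure captions; the numerical solver also handles unequal factors.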
Figure 2 A self-similar fractal with more complicated topology. The scaling factor is 1/4 for all twelve similitudes, hence $\dim_H(C) = \log 12 / \log 4$.

Figure 3 The von Koch curve (or snowflake). The scaling factor is 1/3 for all four similitudes, hence $\dim_H(C) = \log 4 / \log 3$.

Rectifiable Sets with Finite Measure
If $E$ is a $k$-rectifiable set with finite (or locally finite) $k$-dimensional measure, then $\operatorname{Tan}(E, x)$ can be related to the behavior of $E$ close to the point $x$. Let $B(x, r)$ be the open ball in $\mathbb{R}^n$ with center $x$ and radius $r$, and let $C(x, T, a)$ be the cone with center $x$, axis $T$ (a $k$-dimensional subspace of $\mathbb{R}^n$), and amplitude $\alpha = \arcsin a$, that is,

\[ C(x, T, a) := \{ x' \in \mathbb{R}^n : \operatorname{dist}(x' - x, T) \le a |x' - x| \} \]

For $H^k$-almost every $x \in E$, the measure of $E \cap B(x, r)$ is asymptotically equivalent, as $r \to 0$, to the measure of a flat disk of radius $r$, that is,

\[ H^k(E \cap B(x, r)) \sim \omega_k r^k \]

Moreover, the part of $E$ contained in $B(x, r)$ is mostly located close to the tangent plane $\operatorname{Tan}(E, x)$, that is,

\[ H^k(E \cap B(x, r) \cap C(x, \operatorname{Tan}(E, x), a)) \sim \omega_k r^k \quad \text{for every } a > 0 \]

When this condition holds, $\operatorname{Tan}(E, x)$ is called the approximate tangent space to $E$ at $x$ (see Figure 4).

Figure 4 A rectifiable set $E$ close to a point $x$ of approximate tangency. The part of $E$ contained in the ball $B(x, r)$ but not in the cone $C(x, T, a)$ is not empty, but only small in measure.

The Area Formula

The area formula allows one to compute the measure $H^k(\Phi(E))$ of the image of a set $E$ in $\mathbb{R}^k$ as the integral over $E$ of a suitably defined Jacobian determinant of $\Phi$. When $\Phi$ is injective and takes values in $\mathbb{R}^k$, we recover the usual change of variable formula for multiple integrals.

We consider first the linear case. If $L$ is a linear map from $\mathbb{R}^k$ to $\mathbb{R}^m$ with $m \ge k$, the volume ratio $\rho := H^k(L(E)) / H^k(E)$ does not depend on $E$, and agrees with $|\det(PL)|$, where $P$ is any linear isometry from the image of $L$ into $\mathbb{R}^k$, and $\det(PL)$ is the determinant of the $k \times k$ matrix associated with $PL$. The volume ratio can be computed using one of the following identities:

\[ \rho = \sqrt{\det(L^* L)} = \sqrt{\sum (\det M)^2} \tag{6} \]

where $L^*$ is the adjoint of $L$ (thus, $L^* L$ is a linear map from $\mathbb{R}^k$ into $\mathbb{R}^k$), and the sum in the last term is taken over all $k \times k$ minors $M$ of the matrix associated with $L$.

Let $\Phi : \mathbb{R}^k \to \mathbb{R}^m$ be a map of class $C^1$ with $m \ge k$, and $E$ a set in $\mathbb{R}^k$. Then

\[ \int_{\Phi(E)} \#(\Phi^{-1}(y) \cap E) \, dH^k(y) = \int_E J(x) \, dH^k(x) \tag{7} \]

where $\#A$ stands for the number of elements of $A$, and the Jacobian $J$ is

\[ J(x) := \sqrt{\det(\nabla\Phi(x)^* \, \nabla\Phi(x))} \tag{8} \]

Note that the left-hand side of [7] is $H^k(\Phi(E))$ when $\Phi$ is injective.

Remark Formula [7] holds even if $E$ is a $k$-rectifiable set in $\mathbb{R}^n$. In this case, the gradient $\nabla\Phi(x)$ in [8] should be replaced by the tangential derivative of $\Phi$ at $x$ (viewed as a linear map from $\operatorname{Tan}(E, x)$ into $\mathbb{R}^m$). No version of formula [7] is available when $E$ is not rectifiable.

Vectors, Covectors, and Differential Forms

In this section, we review some basic notions of multilinear algebra. We have chosen a definition of $k$-vectors and $k$-covectors in $\mathbb{R}^n$, and of the corresponding exterior products, which is quite convenient for computations, even though not as satisfactory from the formal viewpoint. The main drawback is that it depends on the choice of a standard basis of $\mathbb{R}^n$, and therefore cannot be used to define forms (and currents) when the ambient space is a general manifold.

k-Vectors and Exterior Product
Let $\{e_1, \ldots, e_n\}$ be the standard basis of $\mathbb{R}^n$. Given an integer $k \le n$, $I(n, k)$ is the set of all multi-indices $i = (i_1, \ldots, i_k)$ with $1 \le i_1 < i_2 < \cdots < i_k \le n$, and for every $i \in I(n, k)$ we introduce the expression

\[ e_i = e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_k} \]

A $k$-vector in $\mathbb{R}^n$ is any formal linear combination $\sum \alpha_i e_i$ with $\alpha_i \in \mathbb{R}$ for every $i \in I(n, k)$. The space of $k$-vectors is denoted by $\wedge_k(\mathbb{R}^n)$; in particular, $\wedge_1(\mathbb{R}^n) = \mathbb{R}^n$. For reasons of formal convenience, we set $\wedge_0(\mathbb{R}^n) := \mathbb{R}$ and $\wedge_k(\mathbb{R}^n) := \{0\}$ for $k > n$. We denote by $|\cdot|$ the Euclidean norm on $\wedge_k(\mathbb{R}^n)$.

The exterior product $v \wedge w \in \wedge_{k+h}(\mathbb{R}^n)$ is defined for every $v \in \wedge_k(\mathbb{R}^n)$ and $w \in \wedge_h(\mathbb{R}^n)$, and is completely determined by the following properties: (1) associativity, (2) linearity in both arguments, and (3) $e_i \wedge e_j = -e_j \wedge e_i$ for every $i \ne j$ and $e_i \wedge e_i = 0$ for every $i$.

Simple Vectors and Orientation
A simple $k$-vector is any $v$ in $\wedge_k(\mathbb{R}^n)$ that can be written as a product of 1-vectors, that is,

\[ v = v_1 \wedge v_2 \wedge \cdots \wedge v_k \]

It can be shown that $v$ is null if and only if the vectors $\{v_i\}$ are linearly dependent. If $v$ is not null, then it is uniquely determined by the following objects: (1) the $k$-dimensional space $M$ spanned by $\{v_i\}$; (2) the orientation of $M$ associated with the basis $\{v_i\}$; (3) the Euclidean norm $|v|$. In particular, $M$ does not depend on the choice of the vectors $v_i$. Note that $|v|$ is equal to the $k$-dimensional volume of the parallelogram spanned by $\{v_i\}$. Hence, the map $v \mapsto M$ is a one-to-one correspondence between the class of simple $k$-vectors with norm $|v| = 1$ and the Grassmann manifold of oriented $k$-dimensional subspaces of $\mathbb{R}^n$.

This remark paves the way to the following definition: if $S$ is a $k$-dimensional surface of class $C^1$ in $\mathbb{R}^n$, possibly with boundary, an orientation of $S$ is a continuous map $\tau_S : S \to \wedge_k(\mathbb{R}^n)$ such that $\tau_S(x)$ is a simple $k$-vector with norm 1 that spans $\operatorname{Tan}(S, x)$ for every $x$. With every orientation of $S$ (if any exists) is canonically associated the orientation of the boundary $\partial S$ that satisfies

\[ \tau_S(x) = \eta(x) \wedge \tau_{\partial S}(x) \quad \text{for every } x \in \partial S \tag{9} \]

where $\eta(x)$ is the inner normal to $\partial S$ at $x$.

k-Covectors
The standard basis of the dual of $\mathbb{R}^n$ is $\{dx_1, \ldots, dx_n\}$, where $dx_i : \mathbb{R}^n \to \mathbb{R}$ is the linear functional that takes every $x = (x_1, \ldots, x_n)$ into the $i$th component $x_i$. For every $i \in I(n, k)$ we set

\[ dx_i = dx_{i_1} \wedge dx_{i_2} \wedge \cdots \wedge dx_{i_k} \]

and the space $\wedge^k(\mathbb{R}^n)$ of $k$-covectors consists of all formal linear combinations $\sum \alpha_i \, dx_i$. The exterior product of covectors is defined as that for vectors. The space $\wedge^k(\mathbb{R}^n)$ is dual to $\wedge_k(\mathbb{R}^n)$ via the duality pairing $\langle \,\cdot\, ; \cdot\, \rangle$ defined by the relations $\langle dx_i ; e_j \rangle := \delta_{ij}$ (that is, 1 if $i = j$ and 0 otherwise).

Differential Forms and Stokes Theorem
A differential form of order $k$ on $\mathbb{R}^n$ is a map $\omega : \mathbb{R}^n \to \wedge^k(\mathbb{R}^n)$. Using the canonical basis of $\wedge^k(\mathbb{R}^n)$, we can write $\omega$ as

\[ \omega(x) = \sum_{i \in I(n, k)} \omega_i(x) \, dx_i \]

where the coordinates $\omega_i$ are real functions on $\mathbb{R}^n$. The exterior derivative of a $k$-form $\omega$ of class $C^1$ is the $(k + 1)$-form

\[ d\omega(x) := \sum_{i \in I(n, k)} d\omega_i(x) \wedge dx_i \]

where, for every scalar function $f$, $df$ is the 1-form

\[ df(x) := \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x) \, dx_i \]

If $S$ is a $k$-dimensional oriented surface, the integral of a $k$-form $\omega$ on $S$ is naturally defined by

\[ \int_S \omega := \int_S \langle \omega(x) ; \tau_S(x) \rangle \, dH^k(x) \]

Stokes theorem states that for every $(k - 1)$-form $\omega$ of class $C^1$ there holds

\[ \int_{\partial S} \omega = \int_S d\omega \tag{10} \]

provided that $\partial S$ is endowed with the orientation $\tau_{\partial S}$ that satisfies [9].

Currents

The definition of $k$-dimensional currents closely resembles that of distributions: they are the dual of smooth $k$-forms with compact support. Since every oriented $k$-dimensional surface defines by integration a linear functional on forms, currents can be regarded as generalized oriented surfaces. As every distribution admits a derivative, so every current admits a boundary. Indeed, many other basic notions of homology theory can be naturally extended to currents; this was actually one of the motivations behind the introduction of currents, due to de Rham. For the applications to variational problems, smaller classes of currents are usually considered; the most relevant to the Plateau problem is that of integral currents. Note that the definitions of the spaces of normal, rectifiable, and integral currents and the symbols used to denote them vary, sometimes more than slightly, depending on the author.

Currents, Boundary, and Mass

Let $n, k$ be integers with $n \ge k$. The space of $k$-dimensional currents on $\mathbb{R}^n$, denoted by $\mathcal{D}_k(\mathbb{R}^n)$, is the dual of the space $\mathcal{D}^k(\mathbb{R}^n)$ of smooth $k$-forms with compact support in $\mathbb{R}^n$. For $k \ge 1$, the boundary of a $k$-current $T$ is the $(k - 1)$-current $\partial T$ defined by

\[ \langle \partial T ; \omega \rangle := \langle T ; d\omega \rangle \quad \text{for every } \omega \in \mathcal{D}^{k-1}(\mathbb{R}^n) \tag{11} \]

while the boundary of a 0-current is set equal to 0. The mass of $T$ is the number

\[ M(T) := \sup \left\{ \langle T ; \omega \rangle : \omega \in \mathcal{D}^k(\mathbb{R}^n), \ |\omega| \le 1 \right\} \tag{12} \]
Fundamental examples of $k$-currents are oriented $k$-dimensional surfaces: with each oriented surface $S$ of class $C^1$ is canonically associated the current $\langle T ; \omega \rangle := \int_S \omega$ (in fact, $S$ is completely determined by the action on forms, i.e., by the associated current). By Stokes theorem, the boundary of $T$ is the current associated with the boundary of $S$; thus, the notion of boundary for currents is compatible with the classical one for oriented surfaces. A simple computation shows that $M(T) = H^k(S)$; therefore, the mass provides a natural extension of the notion of $k$-dimensional volume to $k$-currents.

Remarks
(i) Not all $k$-currents look like $k$-dimensional surfaces. For example, every $k$-vectorfield $v : \mathbb{R}^n \to \wedge_k(\mathbb{R}^n)$ defines by duality the $k$-current

\[ \langle T ; \omega \rangle := \int_{\mathbb{R}^n} \langle \omega(x) ; v(x) \rangle \, dH^n(x) \]

The mass of $T$ is $\int_{\mathbb{R}^n} |v| \, dH^n$, and the boundary is represented by a similar integral formula involving the partial derivatives of $v$ (in particular, for 1-vectorfields, the boundary is the 0-current associated with the divergence of $v$). Note that the dimension of such $T$ is $k$ because $k$-vectorfields act on $k$-forms, and there is no relation with the dimension of the support of $T$, which is $n$.
(ii) To be precise, $\mathcal{D}^k(\mathbb{R}^n)$ is a locally convex topological vector space, and $\mathcal{D}_k(\mathbb{R}^n)$ is its topological dual. As such, $\mathcal{D}_k(\mathbb{R}^n)$ is endowed with a dual (or weak*) topology. We say that a sequence of $k$-currents $(T_j)$ converges to $T$ if it converges in the dual topology, that is,

\[ \langle T_j ; \omega \rangle \to \langle T ; \omega \rangle \quad \text{for every } \omega \in \mathcal{D}^k(\mathbb{R}^n) \tag{13} \]

Recalling the definition of mass, it is easy to show that it is lower-semicontinuous with respect to the dual topology, and in particular

\[ \liminf M(T_j) \ge M(T) \tag{14} \]
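The boundary definition [11] can be illustrated in the simplest case $k = 1$. For the current $T$ associated with an oriented curve $\gamma : [0, 1] \to \mathbb{R}^2$ and a smooth function $f$ (a 0-form), $\langle T ; df \rangle = \int_0^1 \nabla f(\gamma(s)) \cdot \gamma'(s) \, ds$, and by [11] this should equal $\langle \partial T ; f \rangle = f(\gamma(1)) - f(\gamma(0))$. The sketch below checks this numerically; the particular curve (a quarter circle), the test function, and all names are illustrative assumptions.

```python
import math

def curve(s):
    # Quarter of the unit circle, oriented from (1, 0) to (0, 1).
    return (math.cos(0.5 * math.pi * s), math.sin(0.5 * math.pi * s))

def curve_velocity(s):
    # Derivative of curve(s) with respect to s.
    w = 0.5 * math.pi
    return (-w * math.sin(w * s), w * math.cos(w * s))

def f(x, y):
    # Hypothetical smooth test function (a 0-form).
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Components of df = (df/dx) dx + (df/dy) dy.
    return (2.0 * x * y, x * x + math.cos(y))

def current_on_df(steps=20_000):
    # <T; df>: integral of df along the curve, by the midpoint rule.
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        x, y = curve(s)
        vx, vy = curve_velocity(s)
        gx, gy = grad_f(x, y)
        total += (gx * vx + gy * vy) * h
    return total

def boundary_on_f():
    # <dT; f> for dT = (point mass at end point) - (point mass at start point).
    return f(*curve(1.0)) - f(*curve(0.0))

if __name__ == "__main__":
    print(current_on_df(), boundary_on_f())
```

The agreement of the two values reflects that the boundary of this 1-current is the 0-current given by the end point with multiplicity $+1$ and the start point with multiplicity $-1$.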
Currents with Finite Mass
By definition, a $k$-current $T$ with finite mass is a linear functional on $k$-forms which is bounded with respect to the supremum norm, and by the Riesz theorem it can be represented as a bounded measure with values in $\wedge_k(\mathbb{R}^n)$. In other words, there exist a finite positive measure $\mu$ on $\mathbb{R}^n$ and a density function $\tau : \mathbb{R}^n \to \wedge_k(\mathbb{R}^n)$ such that $|\tau(x)| = 1$ for every $x$ and

\[ \langle T ; \omega \rangle = \int \langle \omega(x) ; \tau(x) \rangle \, d\mu(x) \]

The fact that currents are the dual of a separable space yields the following compactness result: a sequence of $k$-currents $(T_j)$ with uniformly bounded masses $M(T_j)$ admits a subsequence that converges to a current with finite mass.

Normal Currents
A $k$-current $T$ is called normal if both $T$ and $\partial T$ have finite mass. The compactness result stated in the previous paragraph implies the following compactness theorem for normal currents: a sequence of normal currents $(T_j)$ with $M(T_j)$ and $M(\partial T_j)$ uniformly bounded admits a subsequence that converges to a normal current.

Rectifiable Currents
A $k$-current $T$ is called rectifiable if it can be represented as

\[ \langle T ; \omega \rangle = \int_E \langle \omega(x) ; \tau(x) \rangle \, \theta(x) \, dH^k(x) \]

where $E$ is a $k$-rectifiable set, $\tau$ is an orientation of $E$ (that is, $\tau(x)$ is a simple unit $k$-vector that spans $\operatorname{Tan}(E, x)$ for $H^k$-almost every $x \in E$), and $\theta$ is a real function such that $\int_E |\theta| \, dH^k$ is finite, called multiplicity. Such $T$ is denoted by $T = [E, \tau, \theta]$. In particular, a rectifiable 0-current can be written as $\langle T ; \omega \rangle = \sum_i \theta_i \, \omega(x_i)$, where $E = \{x_i\}$ is a countable set in $\mathbb{R}^n$ and $\{\theta_i\}$ is a sequence of real numbers with $\sum |\theta_i| < +\infty$.

Integral Currents
If $T$ is a rectifiable current and the multiplicity $\theta$ takes integral values, $T$ is called an integer multiplicity rectifiable current. If both $T$ and $\partial T$ are integer multiplicity rectifiable currents, then $T$ is an integral current.

The first nontrivial result is the boundary rectifiability theorem: if $T$ is an integer multiplicity rectifiable current and $\partial T$ has finite mass, then $\partial T$ is an integer multiplicity rectifiable current, too, and therefore $T$ is an integral current.

The second fundamental result is the compactness theorem for integral currents: a sequence of integral currents $(T_j)$ with $M(T_j)$ and $M(\partial T_j)$ uniformly bounded admits a subsequence that converges to an integral current.

Remarks
(i) The point of the compactness theorem for integral currents is not the existence of a converging subsequence, which is already established by the compactness theorem for normal currents, but the fact that the limit is an integral current. In fact, this result is often referred to as a ''closure theorem'' rather than a ''compactness theorem.''
Figure 5 $T$ is the normal 1-current on $\mathbb{R}^2$ associated with the vectorfield equal to the unit vector $e$ on the unit square $\Omega$, and equal to 0 outside ($T := e \cdot 1_\Omega$). $T_j := [E_j, e, 1/j]$ are the rectifiable currents associated with the sets $E_j$ (middle) and the constant multiplicity $1/j$, and then $M(T_j) = 1$, $M(\partial T_j) = 2$. $T_j' := [E_j', e, 1]$ are the integral currents associated with the sets $E_j'$ (left) and the constant multiplicity 1, and then $M(T_j') = 1$, $M(\partial T_j') = 2j^2$. Both $(T_j)$ and $(T_j')$ converge to $T$.
(ii) The following observations may clarify the role of the assumptions in the compactness theorem: (1) a sequence of integral currents $(T_j)$ with $M(T_j)$ uniformly bounded, but not $M(\partial T_j)$, may converge to any current with finite mass, not necessarily a rectifiable one; (2) a sequence of rectifiable currents $(T_j)$ with rectifiable boundaries and $M(T_j)$, $M(\partial T_j)$ uniformly bounded may converge to any normal current, not necessarily a rectifiable one. Examples of both situations are described in Figure 5.

Application to the Plateau Problem
The compactness result for integral currents implies the existence of currents with minimal mass: if $\Gamma$ is the boundary of an integral $k$-current in $\mathbb{R}^n$, $1 \le k \le n$, then there exists a current $T$ of minimal mass among those that satisfy $\partial T = \Gamma$. The proof of this existence result is a typical example of the direct method: let $m$ be the infimum of $M(T)$ among all integral currents with boundary $\Gamma$, and let $(T_j)$ be a minimizing sequence (i.e., a sequence of integral currents with boundary $\Gamma$ such that $M(T_j)$ converges to $m$). Since $M(T_j)$ is bounded and $M(\partial T_j) = M(\Gamma)$ is constant, we can apply the compactness theorem for integral currents and extract a subsequence of $(T_j)$ that converges to an integral current $T$. By the continuity of the boundary operator, $\partial T = \lim \partial T_j = \Gamma$, and by the semicontinuity of the mass, $M(T) \le \lim M(T_j) = m$ (cf. [14]). Thus, $T$ is the desired minimal current.
Remarks
(i) Every integral $(k-1)$-current $\Gamma$ with null boundary and compact support in $\mathbb{R}^n$ is the boundary of an integral current, and therefore is an admissible datum for the previous existence result.
(ii) A mass-minimizing integral current $T$ is more regular than a general integral current. For $k = n - 1$, there exists a closed singular set $S$ with $\dim_H(S) \le k - 7$ such that $T$ agrees with a smooth surface in the complement of $S$ and of the support of the boundary. In particular, $T$ is smooth away from the boundary for $n \le 7$. For general $k$, it can only be proved that $\dim_H(S) \le k - 2$. Both results are optimal: in $\mathbb{R}^4 \times \mathbb{R}^4$, the minimal 7-current with boundary $\Gamma := \{|x| = |y| = 1\}$ (a product of two 3-spheres) is the cone $T := \{|x| = |y| \le 1\}$, and is singular at the origin. In $\mathbb{R}^2 \times \mathbb{R}^2$, the minimal 2-current with boundary $\Gamma := \{x = 0, |y| = 1\} \cup \{y = 0, |x| = 1\}$ (a union of two disjoint circles) is the union of the disks $\{x = 0, |y| \le 1\} \cup \{y = 0, |x| \le 1\}$, and is singular at the origin.
(iii) In certain cases, the mass-minimizing current $T$ may not agree with the solution of the Plateau problem suggested by intuition. The first reason is that currents do not include nonorientable surfaces, which sometimes may be more convenient (Figure 6). Another reason is that the mass of an integral current $T$ associated with a $k$-rectifiable set $E$ does not agree with the measure $\mathcal{H}^k(E)$, called the size of $T$, because multiplicity must be taken into account, and for certain $\Gamma$ the mass-minimizing current may not be size-minimizing (Figure 7). Unfortunately, proving the existence of size-minimizing currents is much more complicated, due to the lack of suitable compactness theorems.
Figure 6 The surface with minimal area spanning the (oriented) curve $\Gamma$ is the Möbius strip $\Sigma$. However, $\Sigma$ is not orientable, and cannot be viewed as a current. The mass-minimizing current with boundary $\Gamma$ is $\Sigma'$.
Figure 7 The boundary $\Gamma$ is a 0-current associated with four oriented points (with signs $-1$ and $+1$). The size (length) of $T$ is smaller than that of $T'$. However, $\partial T = \Gamma$ implies that the multiplicity $\theta$ of $T$ must be 2 on the central segment and 1 on the others; thus the mass of $T$ is larger than its size. The size-minimizing current with boundary $\Gamma$ is $T$, while the mass-minimizing one is $T'$.
(iv) For $k = 2$, the classical approach to the Plateau problem consists in parametrizing surfaces in $\mathbb{R}^n$ by maps $f$ from a given two-dimensional domain $D$ into $\mathbb{R}^n$, and looking for minimizers of the area functional
$$ \int_D \sqrt{\det\!\left(\nabla f^{T}\,\nabla f\right)} $$
(recall the area formula, discussed earlier) under the constraint $f(\partial D) = \Gamma$. In this framework, the choice of the domain $D$ prescribes the topological type of admissible surfaces, and therefore the minimizer may differ substantially from the mass-minimizing current with boundary $\Gamma$ (Figure 8).
Figure 8 The surface $\Sigma$ minimizes the area among surfaces parametrized by the disk with boundary $\Gamma$. The mass-minimizing current $\Sigma'$ can only be parametrized by a disk with a handle. Note that $\Sigma$ is a singular surface, while $\Sigma'$ is not.
(v) For some modeling problems, for instance, those related to soap films and soap bubbles, currents do not provide the right framework (Figure 9). A possible alternative is provided by integral varifolds (cf. Almgren 2001). However, it should be pointed out that this framework does not allow for an "easy" application of the direct method, and the existence of minimal varifolds is in general quite difficult to prove.
Figure 9 Two possible soap films spanning the wire $\Gamma$: unlike $\Sigma$, $\Sigma'$ cannot be viewed as a current with multiplicity 1 and boundary $\Gamma$.
Miscellaneous Results and Useful Tools
(i) An important issue, related to the use of currents for solving variational problems, concerns the extent to which integral currents can be approximated by regular objects. For many reasons, the "right" regular class to consider is not that of smooth surfaces, but that of integral polyhedral currents, that is, linear combinations with integral coefficients of oriented simplexes. The following approximation theorem holds: for every integral current $T$ in $\mathbb{R}^n$ there exists a sequence of integral polyhedral currents $(T_j)$ such that
$$ T_j \to T, \quad \partial T_j \to \partial T, \quad M(T_j) \to M(T), \quad M(\partial T_j) \to M(\partial T) $$
The proof is based on a quite useful tool, called polyhedral deformation.
(ii) Many geometric operations for surfaces have an equivalent for currents. For instance, it is possible to define the image of a current in $\mathbb{R}^n$ via a smooth proper map $f : \mathbb{R}^n \to \mathbb{R}^m$. Indeed, with every $k$-form $\omega$ on $\mathbb{R}^m$ is canonically associated a $k$-form $f^\#\omega$ on $\mathbb{R}^n$, called the pullback of $\omega$ according to $f$. The adjoint of the pullback is an operator, called push-forward, that takes every $k$-current $T$ in $\mathbb{R}^n$ into a $k$-current $f_\# T$ in $\mathbb{R}^m$. If $T$ is the rectifiable current associated with a rectifiable set $E$ and a multiplicity $\theta$, the push-forward $f_\# T$ is the rectifiable current associated with $f(E)$ and a multiplicity $\theta'(y)$ which is computed by adding up with the right sign all $\theta(x)$ with $x \in f^{-1}(y)$. As one might expect, the boundary of the push-forward is the push-forward of the boundary.
(iii) In general, it is not possible to give a meaning to the intersection of two currents, and not even of a current and a smooth surface. However, it is possible to define the intersection of a normal $k$-current $T$ and a level surface $f^{-1}(y)$ of a smooth map $f : \mathbb{R}^n \to \mathbb{R}^h$ (with $h \le k \le n$) for almost every $y$, resulting in a current $T_y$ with the expected dimension $k - h$. This operation is called slicing.
(iv) When working with currents, a quite useful notion is that of flat norm:
$$ F(T) := \inf\{ M(R) + M(S) : T = R + \partial S \} $$
where $T$ and $R$ are $k$-currents, and $S$ is a $(k+1)$-current. The relevance of this notion lies in the fact that a sequence $(T_j)$ that converges with respect to the flat norm converges also in the dual topology, and the converse holds if the masses $M(T_j)$ and $M(\partial T_j)$ are uniformly bounded. Hence, the flat norm metrizes the dual topology of currents (at least on sets of currents where the mass and the mass of the boundary are bounded). Since $F(T)$ can be explicitly estimated from above, it can be quite useful in proving that a sequence of currents converges to a certain limit. Finally, the flat norm gives a (geometrically significant) measure of how far apart two currents are: for instance, given the 0-currents $\delta_x$ and $\delta_y$ (the Dirac masses at $x$ and $y$, respectively), $F(\delta_x - \delta_y)$ is exactly the distance between $x$ and $y$.
See also: Free Interfaces and Free Discontinuities: Variational Problems; Γ-Convergence and Homogenization; Geometric Phases; Image Processing: Mathematics; Minimal Submanifolds; Mirror Symmetry: A Geometric Survey; Moduli Spaces: An Introduction.
Further Reading
Almgren FJ Jr. (2001) Plateau's Problem: An Invitation to Varifold Geometry, Revised Edition, Student Mathematical Library, vol. 13. Providence: American Mathematical Society.
Falconer KJ (2003) Fractal Geometry. Mathematical Foundations and Applications, 2nd edn. Hoboken, NJ: Wiley.
Federer H (1969) Geometric Measure Theory, Grundlehren der mathematischen Wissenschaften, vol. 153. Berlin: Springer. (Reprinted in the series Classics in Mathematics. Springer, Berlin, 1996).
Federer H and Fleming WH (1960) Normal and integral currents. Annals of Mathematics 72: 458–520.
Mattila P (1995) Geometry of Sets and Measures in Euclidean Spaces, Fractals and Rectifiability, Cambridge Studies in Advanced Mathematics, vol. 44. Cambridge: Cambridge University Press.
Morgan F (2000) Geometric Measure Theory. A Beginner's Guide, 3rd edn. San Diego: Academic Press.
Simon L (1983) Lectures on Geometric Measure Theory. Proceedings of the Centre for Mathematical Analysis, 3. Canberra: Australian National University, Centre for Mathematical Analysis.
Geometric Phases
P Lévay, Budapest University of Technology and Economics, Budapest, Hungary
© 2006 Elsevier Ltd. All rights reserved.
Introduction
We invite the reader to perform the following simple experiment. Put your arm out in front of you, keeping your thumb pointing up perpendicular to your arm. Move your arm up over your head, then bring it down to your side, and at last bring the arm back in front of you again. In this experiment an object (your thumb) was taken along a closed path traced by another object (your arm) in such a way that a simple local law of transport was applied. In this case the local law consisted of two ingredients: (1) preserve the orthogonality of your thumb with respect to your arm and (2) do not rotate the thumb about its instantaneous axis (i.e., your arm). Performing the experiment in this way, you will manage to avoid rotations of your thumb locally; however, in the end you will experience a rotation of 90° globally. The experiment above can be regarded as the archetypical example of the phenomenon called anholonomy by physicists and holonomy by mathematicians. In this article, we consider the manifestation of this phenomenon in the realm of quantum theory. The objects to be transported along closed paths in suitable manifolds will be wave functions representing quantum systems. After applying local laws dictated by inputs coming from physics, one ends up with a new wave function that has picked up a complex phase factor. Phases of this kind are called geometric phases, with the famous Berry phase being a special case.
The Space of Rays
Let us consider a quantum system with physical states represented by elements $|\psi\rangle$ of some Hilbert space $\mathcal{H}$ with scalar product $\langle\cdot|\cdot\rangle : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$. For simplicity, we assume that $\mathcal{H}$ is finite dimensional, $\mathcal{H} \simeq \mathbb{C}^{n+1}$ with $n \ge 1$. The infinite-dimensional case can be studied by taking the inductive limit $n \to \infty$. Let us denote the complex amplitudes characterizing the state $|\psi\rangle$ by $Z^\alpha$, $\alpha = 0, 1, \ldots, n$. For a normalized state,
$$ \|\psi\|^2 = \langle\psi|\psi\rangle \equiv \delta_{\alpha\beta}\,\bar{Z}^\alpha Z^\beta \equiv \bar{Z}_\alpha Z^\alpha = 1 \qquad [1] $$
where summation over repeated indices is understood, indices are raised and lowered by $\delta^{\alpha\beta}$ and $\delta_{\alpha\beta}$, respectively, and the overbar refers to complex conjugation. A normalized state lies on the unit sphere $S \simeq S^{2n+1}$ in $\mathbb{C}^{n+1}$. Two nonzero states $|\psi\rangle$ and $|\varphi\rangle$ are equivalent, $|\psi\rangle \sim |\varphi\rangle$, iff they are related as $|\psi\rangle = \lambda|\varphi\rangle$ for some nonzero complex number $\lambda$. For equivalent states, physically meaningful quantities such as
$$ \frac{\langle\psi|A|\psi\rangle}{\langle\psi|\psi\rangle}, \qquad \frac{|\langle\psi|\varphi\rangle|^2}{\|\psi\|^2\,\|\varphi\|^2} \qquad [2] $$
(mean value of a physical quantity represented by a Hermitian operator $A$, transition probability from a physical state represented by $|\psi\rangle$ to one represented by $|\varphi\rangle$) are invariant. Hence, the real space of states representing the physical states of a quantum system unambiguously is the set of equivalence classes $\mathcal{P} \equiv \mathcal{H}/\!\sim$. $\mathcal{P}$ is called the "space of rays." For $\mathcal{H} \simeq \mathbb{C}^{n+1}$, we have $\mathcal{P} \simeq CP^n$, where $CP^n$ is the $n$-dimensional complex projective space. For normalized states, $|\psi\rangle$ and $|\varphi\rangle$ are equivalent iff $|\psi\rangle = \lambda|\varphi\rangle$, where $|\lambda| = 1$, that is, $\lambda \in U(1)$. Thus, two normalized states are equivalent iff they differ merely in a complex phase. It is well known that $S$ can be regarded as the total space of a principal bundle over $\mathcal{P}$ with structure group $U(1)$. This means that we have the projection
$$ \pi : |\psi\rangle \in S \subset \mathcal{H} \mapsto |\psi\rangle\langle\psi| \in \mathcal{P} \qquad [3] $$
where the rank-1 projector $|\psi\rangle\langle\psi|$ represents the equivalence class of $|\psi\rangle$. Since we will use this bundle frequently in this article, we call it $\eta_1$ (the meaning of the subscript 1 will be clarified later). Then, we have
$$ \eta_1 : U(1) \hookrightarrow S \xrightarrow{\ \pi\ } \mathcal{P} \qquad [4] $$
For $Z^0 \neq 0$ the space of rays $\mathcal{P}$ can be given local coordinates
$$ w^j \equiv Z^j / Z^0, \qquad j = 1, \ldots, n \qquad [5] $$
The $w^j$ are inhomogeneous coordinates for $CP^n$ on the coordinate patch $U_0$ defined by the condition $Z^0 \neq 0$. $\mathcal{P}$ is a compact complex manifold with a natural Riemannian metric $g$. This metric $g$ is induced from the scalar product on $\mathcal{H}$. Let us consider the construction of $g$ by using the physical input provided by the invariance of the transition probability of [2]. For this we define a distance $\delta(\psi, \varphi)$ between $|\psi\rangle\langle\psi|$ and $|\varphi\rangle\langle\varphi|$ in $\mathcal{P}$ as follows:
$$ \cos^2\!\big(\delta(\psi, \varphi)/2\big) \equiv \frac{|\langle\psi|\varphi\rangle|^2}{\|\psi\|^2\,\|\varphi\|^2} \qquad [6] $$
This definition makes sense since, due to the Cauchy–Schwarz inequality, the right-hand side of [6] is non-negative and $\le 1$. It is equal to 1 iff $|\psi\rangle$ is a nonzero complex multiple of $|\varphi\rangle$, that is, iff they define the same point in $\mathcal{P}$. Hence in this case, $\delta(\psi, \varphi) = 0$ as expected. Suppose now that $|\psi\rangle$ and $|\varphi\rangle$ are separated by an infinitesimal distance $ds \equiv \delta(\psi, \varphi)$. Putting this into the definition [6], using the local coordinates $w^j$ of [5] for $|\psi\rangle$ and $w^j + dw^j$ for $|\varphi\rangle$, and expanding both sides in Taylor series, one gets
$$ ds^2 = 4\, g_{j\bar{k}}\, dw^j\, d\bar{w}^k, \qquad j, k = 1, 2, \ldots, n \qquad [7] $$
where
$$ g_{j\bar{k}} \equiv \frac{(1 + \bar{w}_l w^l)\,\delta_{jk} - \bar{w}_j w_k}{(1 + \bar{w}_m w^m)^2} \qquad [8] $$
with $d\bar{w}^k \equiv \overline{dw^k}$. The line element [7] defines the Fubini–Study metric for $\mathcal{P}$.
The Pancharatnam Connection
Having defined the basic entity, the space of rays $\mathcal{P}$, and the principal $U(1)$ bundle $\eta_1$, we now define a connection giving rise to a local law of parallel transport. This approach gives rise to a very general definition of the geometric phase. In the mathematical literature, the connection defined below is called the "canonical connection" on the principal bundle. However, since the motivation comes from physics, we are going to rediscover this construction using only physical information provided by quantum theory. The information needed is an adaptation of Pancharatnam's study of polarized light to quantum mechanics. Let us consider two normalized states $|\psi\rangle$ and $|\varphi\rangle$. When these states belong to the same ray, then we have $|\psi\rangle = e^{i\chi}|\varphi\rangle$ for some phase factor $e^{i\chi}$; hence, the phase difference between them can be defined to be just $\chi$. How do we define the phase difference between $|\psi\rangle$ and $|\varphi\rangle$ (not orthogonal) when these states belong to different rays? To compare the phases of nonorthogonal states belonging to different rays, Pancharatnam employed the following simple rule: two states are "in phase" iff their interference is maximal. In order to find the state $|\varphi\rangle \equiv e^{i\chi}|\varphi'\rangle$ from the ray spanned by the representative $|\varphi'\rangle$ which is "in phase" with $|\psi\rangle$, we have to find a $\chi$ modulo $2\pi$ for which the interference term in
$$ \|\psi + e^{i\chi}\varphi'\|^2 = 2\big(1 + \mathrm{Re}(e^{i\chi}\langle\psi|\varphi'\rangle)\big) \qquad [9] $$
is maximal. Obviously the interference is maximal iff $e^{i\chi}\langle\psi|\varphi'\rangle$ is a real positive number, that is,
$$ e^{i\chi} = \frac{\langle\varphi'|\psi\rangle}{|\langle\varphi'|\psi\rangle|}, \qquad |\varphi\rangle = |\varphi'\rangle\,\frac{\langle\varphi'|\psi\rangle}{|\langle\varphi'|\psi\rangle|} \qquad [10] $$
Hence for the state $|\varphi\rangle$ "in phase" with $|\psi\rangle$, one has
$$ \langle\psi|\varphi\rangle = |\langle\psi|\varphi'\rangle| \in \mathbb{R}_+ \qquad [11] $$
When such $|\psi\rangle$ and $|\varphi\rangle \equiv |\psi + d\psi\rangle$ are infinitesimally separated, from [11] it follows that
$$ \mathrm{Im}\langle\psi|d\psi\rangle = \frac{1}{2i}\big(\bar{Z}_\alpha\, dZ^\alpha - Z^\alpha\, d\bar{Z}_\alpha\big) = 0 \qquad [12] $$
where $\bar{Z}_\alpha Z^\alpha = \bar{Z}_0 Z^0 (1 + \bar{w}_j w^j) = 1$ due to normalization. Writing $Z^0 \equiv |Z^0| e^{i\Phi}$ and using [5], one obtains
$$ \mathrm{Im}\langle\psi|d\psi\rangle = d\Phi + A = 0, \qquad A \equiv \mathrm{Im}\,\frac{\bar{w}_j\, dw^j}{1 + \bar{w}_k w^k} \qquad [13] $$
In order to clarify the meaning of the 1-form $A$, notice that the choice
$$ |\psi_0\rangle \equiv \frac{1}{\sqrt{1 + \bar{w}_k w^k}} \begin{pmatrix} 1 \\ w^j \end{pmatrix} \qquad [14] $$
defines a local section of the bundle $\eta_1$. In terms of this section, the state $|\psi\rangle$ can be expressed as
$$ |\psi\rangle = \begin{pmatrix} Z^0 \\ Z^j \end{pmatrix} = |Z^0|\, e^{i\Phi} \begin{pmatrix} 1 \\ w^j \end{pmatrix} = e^{i\Phi}|\psi_0\rangle \qquad [15] $$
For a path $w^j(t)$ lying entirely in $U_0 \subset \mathcal{P}$, $|\psi(t)\rangle = e^{i\Phi(t)}|\psi_0(t)\rangle$ defines a path in $S$ with a $\Phi(t)$ satisfying the equation $\dot{\Phi} + A = 0$. For a closed path $C$, the equation above defines a (generically) open path $\Gamma$ projecting onto $C$ by the projection $\pi$. It must be clear by now that the process described is that of parallel transport with respect to a connection with a connection 1-form $\omega$. The pullback of $\omega$ with respect to the local section in [14] is the 1-form ($U(1)$ gauge field) $A$ in [13]. The curve $\Gamma$ corresponding to $|\psi(t)\rangle$ is the horizontal lift of $C$ in $\mathcal{P}$. The $U(1)$ phase
$$ e^{i\Phi[C]} \equiv e^{-i\oint_C A} \qquad [16] $$
is the holonomy of the connection. We call this connection the "Pancharatnam connection," and its holonomy for a closed path in the space of rays is the geometric phase acquired by the wave function. Now the question of fundamental importance is: how can closed paths in $\mathcal{P}$ be realized physically? This question is addressed in the following sections.
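The distance [6] and Pancharatnam's "in phase" rule [10], [11] are easy to check numerically. The following is a minimal sketch in plain Python, assuming states are given as short lists of complex amplitudes; the helper names `inner`, `fubini_study_distance`, and `in_phase_representative` are illustrative choices, not notation from the article.

```python
import math

def inner(a, b):
    """Scalar product <a|b>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def fubini_study_distance(psi, phi):
    """Distance delta from cos^2(delta/2) = |<psi|phi>|^2 / (|psi|^2 |phi|^2), eq. [6]."""
    c2 = abs(inner(psi, phi)) ** 2 / (inner(psi, psi).real * inner(phi, phi).real)
    # Guard against rounding slightly above 1 before taking the arccosine.
    return 2.0 * math.acos(math.sqrt(min(1.0, c2)))

def in_phase_representative(psi, phi_prime):
    """Rephase phi_prime by <phi'|psi>/|<phi'|psi>| (eq. [10]) so that <psi|phi> > 0."""
    overlap = inner(phi_prime, psi)
    if overlap == 0:
        raise ValueError("orthogonal states have no well-defined relative phase")
    unit = overlap / abs(overlap)
    return [unit * z for z in phi_prime]

# Example: two normalized states of a two-level system (n = 1).
psi = [1 / math.sqrt(2), 1j / math.sqrt(2)]
phi_prime = [complex(math.cos(0.7), math.sin(0.7)), 0j]
phi = in_phase_representative(psi, phi_prime)
print(inner(psi, phi))                     # real and positive, as in eq. [11]
print(fubini_study_distance(psi, phi_prime))
```

Because the distance depends only on the normalized overlap, rescaling either state by a nonzero complex number leaves it unchanged, which is the invariance used to build the metric on the space of rays.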
Quantum Jumps
We have seen that physical states of a quantum system are represented by the space of rays $\mathcal{P}$, and that normalized states used as representatives for such states form the total space $S$ of a principal $U(1)$ bundle $\eta_1$ over $\mathcal{P}$. Moreover, in the previous section we have realized that the physical notions of transition probability and quantum interference naturally lead to the introduction of a Riemannian metric $g$ and an abelian $U(1)$ gauge field $A$ living on $\mathcal{P}$. An interesting result based on the connection between $g$ and $A$ concerns a nice geometric description of a special type of quantum evolution consisting of a sequence of "quantum jumps." Consider two nonorthogonal rays $|A\rangle\langle A|$ and $|B\rangle\langle B|$ in $\mathcal{P}$. Let us suppose that the system's normalized wave function initially is $|A\rangle \in S$, and measure by the "polarizer" $|B\rangle\langle B|$. Then the result of this filtering measurement is $|B\rangle\langle B|A\rangle$, or after projecting back to the set of normalized states we have the "quantum jump"
$$ |A\rangle \to |B\rangle\,\frac{\langle B|A\rangle}{|\langle B|A\rangle|} \qquad [17] $$
Now we have the following theorem:
Theorem The jump [17] can be recovered by parallel transporting the normalized state $|A\rangle$ according to the Pancharatnam connection along the shortest geodesic (with respect to the metric [8]) connecting $|A\rangle\langle A|$ and $|B\rangle\langle B|$ in $\mathcal{P}$.
Let us now consider a cyclic series of filtering measurements with projectors $|A_a\rangle\langle A_a|$, $a = 1, 2, \ldots, N+1$, where $|A_1\rangle\langle A_1| = |A_{N+1}\rangle\langle A_{N+1}|$. Prepare the system in the state $|A_1\rangle \in S$, and then subject it to the sequence of filtering measurements. Then according to the theorem, the phase
$$ e^{i\Phi} = \frac{\langle A_1|A_N\rangle\langle A_N|A_{N-1}\rangle \cdots \langle A_2|A_1\rangle}{|\langle A_1|A_N\rangle\langle A_N|A_{N-1}\rangle \cdots \langle A_2|A_1\rangle|} \qquad [18] $$
picked up by the state is equal to the one obtained by parallel transporting $|A_1\rangle$ along a geodesic polygon consisting of the shorter arcs connecting the projectors $|A_a\rangle\langle A_a|$ and $|A_{a+1}\rangle\langle A_{a+1}|$ with $a = 1, 2, \ldots, N$. It is important to realize that this filtering measurement process is not a unitary one; hence, unitarity is not essential for the geometric phase to appear. In this section we have managed to obtain closed paths in the form of geodesic polygons in $\mathcal{P}$ via the physical process of subjecting the initial state $|A_1\rangle$ to a sequence of filtering measurements. It is clear that for any type of evolution, the geodesics of the Fubini–Study metric play a fundamental role, since any smooth closed curve in $\mathcal{P}$ can be approximated by geodesic polygons. Nonunitary evolution provided by the quantum measuring process is only half of the story. In the next section, we start describing closed paths in $\mathcal{P}$ arising also from unitary evolutions generated by parameter-dependent Hamiltonians, the original context where geometric phases were discovered.
Unitary Evolutions
Adiabatic Evolution
Suppose that the evolution of our quantum system with $\mathcal{H} \simeq \mathbb{C}^{n+1}$ is generated by a Hermitian Hamiltonian matrix depending on a set of external parameters $x^\mu$, $\mu = 1, 2, \ldots, M$. Here we assume that the $x^\mu$ are local coordinates on some coordinate patch $V$ of a smooth $M$-dimensional manifold $\mathcal{M}$. We label the eigenvalues of $H(x)$ by the numbers $r = 0, 1, 2, \ldots, n$, and assume that the $r$th eigenvalue $E_r(x)$ is nondegenerate:
$$ H(x)|r, x\rangle = E_r(x)|r, x\rangle, \qquad r = 0, 1, 2, \ldots, n \qquad [19] $$
We assume that $H(x)$, $E_r(x)$, $|r, x\rangle$ are smooth functions of $x$. The rank-1 spectral projectors
$$ P_r(x) \equiv |r, x\rangle\langle r, x|, \qquad r = 0, 1, 2, \ldots, n \qquad [20] $$
for each $r$ define a map $f_r : \mathcal{M} \to \mathcal{P}$:
$$ f_r : x \in V \subset \mathcal{M} \mapsto P_r(x) \in \mathcal{P} \qquad [21] $$
Recall now that we have the bundle $\eta_1$ over $\mathcal{P}$ at our disposal, and we can pull back $\eta_1$ using the map $f_r$ to construct a new bundle $\xi_1^r$ over the parameter space $\mathcal{M}$. Moreover, we can define a connection on $\xi_1^r$ by pulling back the canonical (Pancharatnam) connection of $\eta_1$. The resulting bundle $\xi_1^r$ is called the Berry–Simon bundle over the parameter space $\mathcal{M}$. Explicitly,
$$ \xi_1^r : U(1) \hookrightarrow \xi_1^r \to \mathcal{M} \qquad [22] $$
The states $|r, x\rangle$ of [19] define a local section of $\xi_1^r$. Suppressing the index $r$, the relationship between $\xi_1$ and $\eta_1$ can be summarized by the following diagram:
$$ \begin{array}{ccc} \xi_1 & \xrightarrow{\ f^*\ } & \eta_1 \\ \downarrow & & \downarrow \\ \mathcal{M} & \xrightarrow{\ f\ } & \mathcal{P} \end{array} \qquad [23] $$
Here $f^*$ denotes the pullback map, and we have $\xi_1 \equiv f^*(\eta_1)$. (We have denoted the total space $S$ as $\eta_1$.) The local section of $\xi_1$ arising as the pullback of the section [14] of $\eta_1$ is given by
$$ |r, x\rangle = \frac{1}{\sqrt{1 + \bar{w}_k(x) w^k(x)}} \begin{pmatrix} 1 \\ w^j(x) \end{pmatrix}, \qquad x \in V \subset \mathcal{M} \qquad [24] $$
with $j = 1, 2, \ldots, n$. The pullback of the Pancharatnam connection $\omega$ on $\eta_1$ is $f^*(\omega)$. We can further pull back $f^*(\omega)$ to $V \subset \mathcal{M}$ with respect to the local section of [24] to obtain a gauge field living on the parameter space. This gauge field is called the "Berry gauge field" and the corresponding connection is the Berry connection. Thus,
$$ \mathcal{A} = f^*(A) = \mathcal{A}_\mu(x)\, dx^\mu = \big(A_j\, \partial_\mu w^j + A_{\bar{j}}\, \partial_\mu \bar{w}^j\big)\, dx^\mu \qquad [25] $$
where $\partial_\mu \equiv \partial/\partial x^\mu$ and $A$ is given by [13]. When we have a closed curve $C$ in $\mathcal{M}$, then $f \circ C$ defines a closed curve in $\mathcal{P}$. We already know that the holonomy for a closed curve in $\mathcal{P}$ can be written in the form [16]; hence,
$$ \Phi_B = -\oint_{f \circ C} A = -\oint_C f^*(A) = -\oint_C \mathcal{A} \qquad [26] $$
This formula states that there is a geometric phase picked up by the eigenstates of a parameter-dependent Hermitian Hamiltonian when we change the parameters along a closed curve. Our formula shows that the geometric phase can be calculated using either the canonical connection on $\eta_1$ or the Berry connection on $\xi_1$. Let us then change the parameters $x^\mu$ adiabatically. The closed path in parameter space then defines Hamiltonians satisfying $H(x(T)) = H(x(0))$ for some $T \in \mathbb{R}_+$. Moreover, there is also the associated closed curve $P_r(x(T)) = P_r(x(0))$ in $\mathcal{P}$. The quantum adiabatic theorem states that if we prepare a state $|\psi(0)\rangle \equiv |r, x(0)\rangle$ at $t = 0$, which is an eigenstate of the instantaneous Hamiltonian $H(x(0))$, then after changing the parameters infinitely slowly, the time evolution generated by the time-dependent Schrödinger equation
$$ i\hbar\,\frac{d}{dt}|\psi(t)\rangle = H(t)|\psi(t)\rangle \qquad [27] $$
takes the form
$$ |\psi(t)\rangle = |r, x(t)\rangle\, e^{i\Phi_r(t)} \qquad [28] $$
after time $t$, which belongs to the same eigensubspace. The point is that the theorem holds only for cases when the kinetic energy associated with the slow change in the external parameters is much smaller than the energy separation between $E_r(x)$ and $E_{r'}(x)$ for all $x \in \mathcal{M}$. Under this assumption, transitions between adjacent levels are prohibited during evolution. Notice that the adiabatic theorem clearly breaks down in the vicinity of level crossings, where the gap is comparable with the magnitude of the kinetic energy of the external parameters. However, if one takes it for granted that the projector $P_r(t) \equiv P_r(x(t))$ for some $r$ satisfies the Schrödinger–von Neumann equation
$$ i\hbar\,\frac{d}{dt}P_r(t) = [H(t), P_r(t)] \qquad [29] $$
then, by virtue of [19], we get zero for the right-hand side. This means that $P_r(t)$ is constant; hence, the curve in $\mathcal{P}$ degenerates to a point. The upshot of this is that exact adiabatic cyclic evolutions do not exist. It can be shown, however, that under certain conditions one can find an initial state $|\psi(0)\rangle \neq |r, x(0)\rangle$ whose evolution stays "close enough" to $P_r(x(t)) = |r, x(t)\rangle\langle r, x(t)|$. Then, we can say that the projector analog of [28] holds only approximately:
$$ |\psi(t)\rangle\langle\psi(t)| \simeq |r, x(t)\rangle\langle r, x(t)| \qquad [30] $$
This means that the use of the bundle picture for the generation of closed curves in $\mathcal{P}$ via adiabatic evolution can only be used as an approximation.
Berry's Phase
A straightforward calculation after substituting [28] into [27] shows that
$$ \exp\big(i\Phi_r(T)\big) = \exp\!\left(-\frac{i}{\hbar}\int_0^T E_r(t)\, dt\right) \exp\!\left(-i\oint_C \mathcal{A}^{(r)}\right) \qquad [31] $$
where $C$ is a closed curve lying entirely in $V \subset \mathcal{M}$. The first phase factor is the dynamical phase and the second is the celebrated Berry phase. Notice that the index $r$ labeling the eigensubspace in question should now be included in the definition of $\mathcal{A}$ (see eqn [25]). As an explicit example, let us take the Hamiltonian
$$ H(X(t)) = \omega_0\, \mathbf{J} \cdot \mathbf{X}(t), \qquad \mathbf{X} \in \mathbb{R}^3, \quad |\mathbf{X}| = 1, \qquad \omega_0 \equiv \frac{Bge}{2mc} \qquad [32] $$
where $e$, $m$, and $g$ are the charge, mass, and Landé factor of a particle, $c$ is the speed of light, and $B$ is the (constant) magnitude of an applied magnetic field. The three components of $\mathbf{J}$ are $(2J+1) \times (2J+1)$-dimensional spin matrices satisfying $\mathbf{J} \times \mathbf{J} = i\hbar\,\mathbf{J}$. The Hamiltonian (eqn [32]) describes a spin $J$ particle moving in a magnetic field with slowly varying direction. It is obvious that the parameter space is a 2-sphere. Introducing polar coordinates $0 \le \theta < \pi$, $0 \le \varphi < 2\pi$ for the patch $V$ of $S^2$ excluding the south pole, we have $x^1 \equiv \theta$, $x^2 \equiv \varphi$. As an illustration, let us consider the spin 1/2 case. Then $H$ can be expressed in terms of the $2 \times 2$ Pauli matrices. The eigenvalues are $E_0 = \omega_0\hbar/2$ and $E_1 = -\omega_0\hbar/2$ ($r = 0, 1$). For the ground state, the mapping $f_0$ of [21] from $V \subset \mathcal{M} \simeq S^2$ to $\mathcal{P} \simeq CP^1$ is given by
$$ w(\theta, \varphi) \equiv \tan\frac{\theta}{2}\; e^{i\varphi} \qquad [33] $$
which is stereographic projection of $S^2$ from the south pole onto the complex plane corresponding to the coordinate patch $U_0 \subset CP^1$. Using [13] and [25], one can calculate the pullback gauge field and its curvature $F^{(0)} \equiv d\mathcal{A}^{(0)}$, where
$$ \mathcal{A}^{(0)} = \frac{1}{2}(1 - \cos\theta)\, d\varphi, \qquad F^{(0)} = \frac{1}{2}\sin\theta\; d\theta \wedge d\varphi \qquad [34] $$
Notice that $F^{(0)}$ is the field strength of a magnetic monopole of strength 1/2 living on $\mathcal{M}$. Using Stokes' theorem, from [26] one can calculate Berry's phase
$$ \Phi^{(0)}[C] = -\oint_C \mathcal{A}^{(0)} = -\int_S F^{(0)} = -\frac{1}{2}\Omega[C] \qquad [35] $$
where $S$ is the surface bounded by the loop $C$ and $\Omega[C]$ is the solid angle subtended by the curve $C$ at $\mathbf{X} = 0$. The above result can be generalized for arbitrary spin $J$. Then, we have the eigenvalues $E_r = \omega_0\hbar(J - r)$, where $0 \le r \le 2J$. The final result in this case is
$$ \Phi^{(r)}[C] = -(J - r)\,\Omega[C], \qquad 0 \le r \le 2J \qquad [36] $$
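The spin 1/2 result can be checked numerically by combining the section [24] with the map [33] and the discrete phase formula [18]: a circle of fixed latitude is replaced by a fine geodesic polygon, and the phase of the cyclic product of overlaps is compared with minus half the enclosed solid angle. This is only a sketch in plain Python; the function names and the choice of 2000 polygon vertices are assumptions made for illustration, and the comparison holds modulo $2\pi$.

```python
import cmath
import math

def inner(a, b):
    """Scalar product <a|b>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def section(theta, phi):
    """Normalized state (1, w)/sqrt(1 + |w|^2) with w = tan(theta/2) e^{i phi}."""
    return (complex(math.cos(theta / 2.0)),
            math.sin(theta / 2.0) * cmath.exp(1j * phi))

def discrete_phase(states):
    """Phase of <A_1|A_N><A_N|A_{N-1}>...<A_2|A_1> for a cyclic list of states."""
    product = 1.0 + 0.0j
    n = len(states)
    for a in range(n):
        # Factor <A_{a+1}|A_a>, with the last state followed by the first.
        product *= inner(states[(a + 1) % n], states[a])
    return cmath.phase(product)

def latitude_loop(theta, n_points):
    """Vertices of a polygon approximating the circle of polar angle theta."""
    return [section(theta, 2.0 * math.pi * k / n_points) for k in range(n_points)]

theta = math.pi / 3.0
numeric = discrete_phase(latitude_loop(theta, 2000))
solid_angle = 2.0 * math.pi * (1.0 - math.cos(theta))
print(numeric, -0.5 * solid_angle)  # the two values should agree closely
```

The product of overlaps is insensitive to the phase chosen for each vertex state, which is why no explicit parallel transport has to be carried out in the sketch.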
The Aharonov–Anandan Phase
We have seen that the quantum adiabatic theorem can only be used approximately for generating closed curves in $\mathcal{P}$. This section describes how such curves can be generated exactly. Let us consider the Schrödinger equation with a time-dependent Hamiltonian (eqn [27]). We call its solution $|\psi(t)\rangle$ cyclic if the state of the system returns, after a period $T$, to its original state. This means that the projector $|\psi(t)\rangle\langle\psi(t)|$ traverses a closed path $C$ in $\mathcal{P}$. In order to realize this situation, we have to find solutions of [27] for which $|\psi(T)\rangle = e^{i\chi}|\psi(0)\rangle$ for some $\chi$. Taking for granted the existence of such a solution, let us first explore its consequences. First, we remove the dynamical phase from the cyclic solution $|\psi(t)\rangle$:
$$ |\tilde{\psi}(t)\rangle \equiv \exp\!\left(\frac{i}{\hbar}\int_0^t \langle\psi(t')|H(t')|\psi(t')\rangle\, dt'\right)|\psi(t)\rangle \qquad [37] $$
Then, $|\tilde{\psi}(t)\rangle$ satisfies [12], that is, it defines a unique horizontal lift of the closed curve $C$ in $\mathcal{P}$. Following the same steps as in the section describing the Pancharatnam connection, we see that the phase
$$ \Phi_{AA}[C] = -\oint_C A = \chi + \frac{1}{\hbar}\int_0^T \langle\psi(t)|H(t)|\psi(t)\rangle\, dt \qquad [38] $$
is purely geometric in origin. It is called the Aharonov–Anandan (AA) phase. Let us now turn back to the question of finding cyclic states satisfying $|\psi(T)\rangle = e^{i\chi}|\psi(0)\rangle$. One possible solution is as follows. Suppose that $H$ depends on time through some not necessarily slowly changing parameters $x$. Let us find a partner Hamiltonian $h$ for our $H$ by defining a smooth mapping $\sigma : \mathcal{M} \to \mathcal{M}$, such that
$$ h(x) \equiv H(\sigma(x)), \qquad x \in V \subset \mathcal{M} \qquad [39] $$
For the special class we study here, the cyclic vectors are eigenvectors of $h(x)$. Hence, the projectors $p_r$ and $P_r$ of $h$ and $H$ are related as $p_r(x) = P_r(\sigma(x))$; this means that we have a map $g_r : \mathcal{M} \to \mathcal{P}$,
$$ g_r \equiv f_r \circ \sigma : x \in V \subset \mathcal{M} \mapsto p_r(x) \in \mathcal{P} \qquad [40] $$
which associates with every $x$ an eigenstate of $h(x)$. Moreover, $g_r$ associates with a closed curve $C$ in $\mathcal{M}$ a closed curve $g_r \circ C$ in $\mathcal{P}$. Notice that generically $[h(x), H(x)] \neq 0$; hence, cyclic states are not eigenstates of the instantaneous Hamiltonian. It should be clear by now that we can repeat the construction discussed in the adiabatic case with $g_r$ replacing $f_r$. In particular, we can construct a new bundle $\zeta_1$ over the parameter space via the usual pullback procedure. More precisely, we have the corresponding diagram
$$ \begin{array}{ccc} \zeta_1 & \xrightarrow{\ g^*\ } & \eta_1 \\ \downarrow & & \downarrow \\ \mathcal{M} & \xrightarrow{\ g\ } & \mathcal{P} \end{array} \qquad [41] $$
The AA connection can be obtained by pulling back the Pancharatnam connection:
$$ a \equiv g^*(A) = \sigma^*\big(f^*(A)\big) = \sigma^*(\mathcal{A}) \qquad [42] $$
where the last equality relates the AA connection with the Berry connection. Now the AA phase is
$$ \Phi_{AA} = -\oint_{g \circ C} A = -\oint_C g^*(A) = -\oint_C a \qquad [43] $$
As an example, let us take the Hamiltonian [32] with the curve $C$ on $\mathcal{M} \simeq S^2$:
$$ \mathbf{X}(t) = \big(\sin\theta\cos(\varphi + \omega t),\ \sin\theta\sin(\varphi + \omega t),\ \cos\theta\big) \qquad [44] $$
Here $\theta$ and $\varphi$ are the polar coordinates of a fixed point in $S^2$ where the motion starts. The curve $C$ is a circle of fixed latitude and is traversed with an arbitrary speed. This model can be solved exactly and it can be shown that the mapping $\sigma_s : S^2 \to S^2$ is given by
$$ \sigma_s : (u, \varphi) \mapsto \left(\frac{u - s}{\sqrt{s^2 - 2us + 1}},\ \varphi\right), \qquad u \equiv \cos\theta, \quad s \equiv \frac{\omega}{\omega_0} \qquad [45] $$
One can prove that for $0 \le s < 1$, $\sigma_s$ is a diffeomorphism. In the $s \to 0$ (adiabatic) limit, the mapping $g_{r,s} \equiv f_r \circ \sigma_s$ is continuously deformed to $f_r$. Moreover, $h(x)$ as defined above commutes with the time evolution operator; hence, cyclic states are indeed eigenstates of $h(x)$. Using [42], [43], and [45], the explicit form of $\sigma_s$, we get for the AA phase
$$ \Phi_{AA}^{(r,s)}[C] = -2\pi(J - r)\left(1 - \frac{u - s}{\sqrt{s^2 - 2us + 1}}\right) \qquad [46] $$
In the adiabatic limit, the result goes to $-2\pi(J - r)(1 - u)$, which is just $-(J - r)$ times the solid angle of the path of fixed latitude, as it has to be.
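The adiabatic limit quoted above can be verified directly from the closed form. The sketch below, in plain Python, evaluates the map [45] and the phase [46] and compares the $s = 0$ value with $-(J - r)$ times the solid angle $2\pi(1 - u)$ of the latitude circle; the function names are illustrative, and the overall minus sign follows the sign convention $d\Phi + A = 0$ of [13], which is an assumption about the convention rather than something the code establishes.

```python
import math

def sigma(u, s):
    """First component of the map sigma_s in eq. [45]; u = cos(theta), s = omega/omega_0."""
    return (u - s) / math.sqrt(s * s - 2.0 * u * s + 1.0)

def aa_phase(J, r, u, s):
    """Aharonov-Anandan phase of eq. [46] for the latitude circle u = cos(theta)."""
    return -2.0 * math.pi * (J - r) * (1.0 - sigma(u, s))

def berry_phase(J, r, u):
    """Berry phase -(J - r) * Omega with Omega = 2 pi (1 - u), cf. eq. [36]."""
    return -(J - r) * 2.0 * math.pi * (1.0 - u)

J, r, u = 0.5, 0, 0.5
for s in (0.5, 0.1, 0.01, 0.0):
    print(s, aa_phase(J, r, u, s), berry_phase(J, r, u))
```

For $0 \le s < 1$ the quantity under the square root is bounded below by $(1 - s)^2 > 0$, so the map is well defined on the whole sphere in that range.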
Generalization In the sequence of examples, we have shown that geometric phases are related to the geometric structures on the bundle 1 . The Berry and AA phases are special cases arising from Pancharatnam’s phase via a pullback procedure with respect to suitable maps
defined by the physical situation in question. Hence, the Pancharatnam connection in this sense is universal. The root of this universality rests in a deep theorem of mathematics concerning the existence of universal bundles and their universal connections.
In order to elaborate the insight provided by this theorem into the geometry of quantum evolution, let us first make a further generalization. In our study of time-dependent Hamiltonians we have assumed that the eigenvalues of [19] were nondegenerate. Let us now relax this assumption. Fix an integer $N \ge 1$, the degeneracy of the eigensubspace corresponding to the eigenvalue $E_r$. One can then form a $U(N)$ principal bundle over $\mathcal{M}$, furnished with a connection, that is a natural generalization of the Berry connection. The pullback of this connection to a patch of $\mathcal{M}$ is a $U(N)$-valued gauge field, and its holonomy along a loop in $\mathcal{M}$ gives rise to a $U(N)$ matrix generalization of the $U(1)$ Berry phase.
The natural description of this connection and its AA analog is as follows. Take the complex Grassmannian $\mathrm{Gr}(n+1, N)$ of $N$-planes in $\mathbb{C}^{n+1}$. Obviously, $\mathrm{Gr}(n+1, 1)$ is just $\mathcal{P}$. Each point of $\mathrm{Gr}(n+1, N)$ corresponds to an $N$-plane through the origin represented by a rank-$N$ projector. This projector can be written in terms of $N$ orthonormal basis vectors in an infinite number of ways. This ambiguity of choosing orthonormal frames is captured by the $U(N)$ gauge symmetry, the analog of the $U(1)$ (phase) ambiguity in defining a normalized state as the representative of the rank-1 projector. This bundle of frames is the Stiefel bundle $V(n+1, N)$, a principal $U(N)$ bundle over $\mathrm{Gr}(n+1, N)$ equipped with a canonical connection $\omega_N$ which is the $U(N)$ analog of Pancharatnam's connection.
Now, according to the powerful theorem of Narasimhan and Ramanan, if we have a $U(N)$ bundle over the $M$-dimensional parameter space $\mathcal{M}$, then there exists an integer $n_0(N, M)$ such that for $n \ge n_0$ there exists a map $f : \mathcal{M} \to \mathrm{Gr}(n+1, N)$ such that the bundle is the pullback $f^*(V(n+1, N))$. Moreover, given any two such maps $f$ and $g$, the corresponding pullback bundles are isomorphic if and only if $f$ is homotopic to $g$. For the examples of the sections "Berry's phase" and "The Aharonov–Anandan phase," we have $N = 1$, $n = 1$, and $M = 2$. Since the maps $f_r$ and $g_{r,s}$ defined by the rank-1 spectral projectors of $H(x)$ and $h(x)$ for $0 \le s < 1$ are homotopic, the corresponding pullback bundles are isomorphic. Moreover, the Berry and AA connections are the pullbacks of the universal connection on $V(n+1, 1)$, which is just Pancharatnam's connection.
For the infinite-dimensional case, one can define $\mathrm{Gr}(\infty, N)$ by taking the union of the natural inclusion maps of $\mathrm{Gr}(n, N)$ into $\mathrm{Gr}(n+1, N)$. We denote the corresponding universal classifying bundle by $V(\infty, N)$. Then, we see that given an $N$-dimensional eigensubspace bundle over $\mathcal{M}$ and a map $f_r : x \in \mathcal{M} \mapsto P_r(x) \in \mathrm{Gr}(\infty, N)$ defined by the physical situation, the geometry of evolving eigensubspaces can be understood in terms of the holonomy of the pullback of the universal connection on $V(\infty, N)$.
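For the simplest case $N = 1$, $n = 1$, the holonomy described above can be checked numerically. The following is a minimal sketch, not taken from the article: it assumes a spin-1/2 state carried once around a circle of latitude on the Bloch sphere, and it approximates the holonomy by the phase of the product of successive overlaps (the discrete form of Pancharatnam's parallel transport). The function names and the choice of loop are illustrative assumptions.

```python
import cmath
import math

def spin_half_state(theta, phi):
    """Normalized spin-1/2 state pointing along the polar angles (theta, phi)."""
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))

def overlap(a, b):
    """Inner product <a|b>, antilinear in the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

def discrete_geometric_phase(states):
    """Minus the argument of the cyclic product of overlaps <psi_k|psi_{k+1}>."""
    product = 1.0 + 0.0j
    n = len(states)
    for k in range(n):
        product *= overlap(states[k], states[(k + 1) % n])
    return -cmath.phase(product)

def latitude_loop_phase(theta, steps=2000):
    """Geometric phase for a closed loop at fixed polar angle theta."""
    states = [spin_half_state(theta, 2 * math.pi * k / steps) for k in range(steps)]
    return discrete_geometric_phase(states)
```

With this sign convention the result approaches minus one half of the solid angle $2\pi(1 - \cos\theta)$ enclosed by the loop, and it is unchanged when each state is multiplied by an arbitrary phase, which is the $U(1)$ gauge freedom discussed in the text.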
Conclusions
In this article, we have elucidated the mathematical origin of geometric phases. We have seen that the key observation is the fact that the space of rays $\mathcal{P}$ represents unambiguously the physical states of a quantum system. The particular representatives of a class in $\mathcal{P}$ belonging to the usual Hilbert space $\mathcal{H}$ form (local) sections of a $U(1)$ bundle over $\mathcal{P}$. Based on the physical notions of transition probability and interference, this bundle can be furnished with extra structures: the metric and the connection, the latter giving rise to a natural definition of parallel transport. We have seen that the geodesics of $\mathcal{P}$ with respect to the metric play a fundamental role in approximating evolutions of any kind, giving rise to a curve in $\mathcal{P}$. The geometric structures of this bundle induce similar structures for pullback bundles. These bundles encapsulate the geometric details of time evolutions generated by Hamiltonians that depend on a set of parameters $x$ belonging to a manifold $\mathcal{M}$. It was shown that the famous examples of Berry and AA phases arise as an important special case in this formalism. A generalization to evolving $N$-dimensional subspaces based on the theory of universal connections can also be given. This shows that the basic structure responsible for the occurrence of anholonomy effects in evolving quantum systems is the universal bundle, which is the bundle of subspaces of arbitrary dimension $N$ in a Hilbert space.
The important issue of applying the idea of anholonomy to physical problems has not been dealt with in this article. There are spectacular applications such as holonomic quantum computation, the gauge kinematics of deformable bodies, the quantum Hall effect, and fractional spin and statistics. The interested reader should consult the vast literature on the subject or, as a first glance, the book of Shapere and Wilczek (1989).
See also: Fractional Quantum Hall Effect; Geometric Measure Theory; Holomorphic Dynamics; Moduli Spaces: An Introduction.
Further Reading
Aharonov Y and Anandan J (1987) Phase change during a cyclic evolution. Physical Review Letters 58: 1593–1596.
Benedict MG and Fehér LGy (1989) Quantum jumps, geodesics, and the topological phase. Physical Review D 39: 3194–3196.
Berry MV (1984) Quantal phase factors accompanying adiabatic changes. Proceedings of the Royal Society A 392: 45–57.
Berry MV (1987) The adiabatic phase and Pancharatnam's phase for polarized light. Journal of Modern Optics 34: 1401–1407.
Bohm A and Mostafazadeh A (1994) The relation between the Berry and the Anandan–Aharonov connections for U(N) bundles. Journal of Mathematical Physics 35: 1463–1470.
Kobayashi S and Nomizu K (1969) Foundations of Differential Geometry, vol. 2. Berlin: Springer.
Lévay P (1990) Geometrical description of SU(2) geometric phases. Physical Review A 41: 2837–2840.
Narasimhan MS and Ramanan S (1961) Existence of universal connections. American Journal of Mathematics 83: 563–572.
Pancharatnam S (1956) Generalized theory of interference, and its applications. The Proceedings of the Indian Academy of Sciences XLIV, No. 5, Sec. A: 247–262.
Samuel J and Bhandari R (1988) General setting for Berry's phase. Physical Review Letters 60: 2339–2342.
Shapere A and Wilczek F (eds.) (1989) Geometric Phases in Physics. Wiley.
Simon B (1983) Holonomy, the quantum adiabatic theorem, and Berry's phase. Physical Review Letters 51: 2167–2170.
Wilczek F and Zee A (1984) Appearance of gauge structure in simple dynamical systems. Physical Review Letters 52: 2111–2114.
Geophysical Dynamics
M B Ziane, University of Southern California, Los Angeles, CA, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
The equations of geophysical fluid dynamics are the equations governing the motion of the atmosphere and the ocean. They are derived from the conservation equations of physics, namely conservation of mass, momentum, energy, and some other components such as salt for the ocean and humidity (or chemical pollutants) for the atmosphere. The first assumption used in any circulation model is the well-accepted Boussinesq approximation, that is, the density differences are neglected in the system except in the buoyancy term and in the equation of state. The resulting system is the so-called Boussinesq equations (Pedlosky 1987). Due to the extremely high accuracy of this approximation, these equations are considered as the basic equations in geophysical dynamics. From the computational point of view, however, the Boussinesq equations are still not accessible. Owing to the difference of sizes of the vertical and horizontal dimensions, both in the atmosphere and in the ocean (10–20 km versus several thousands of kilometers), the second approximation is based on the smallness of the vertical length scales with respect to the horizontal length scales, that is, oceans (and the atmosphere) compose very thin layers. The scale analysis ensures that the dominant forces in the vertical-momentum equation come from the pressure gradient and the gravity. This leads to the so-called hydrostatic approximation, which amounts to replacing the vertical component of the momentum equation by the hydrostatic balance equation, and hence to the well-accepted primitive equations (PEs) (Washington and Parkinson 1986).
As far as we know, the primitive equations were first considered by L F Richardson (1922); when it appeared that they were still too complicated they were left out and, instead, attention was focused on even simpler models, the geostrophic and quasigeostrophic models, considered in the late 1940s by J von Neumann and his collaborators, in particular J G Charney. With the increase of computing power, interest eventually returned to the PEs, which are now the core of many global circulation models (GCMs) or ocean global circulation models (OGCMs), available at the National Center for Atmospheric Research (NCAR) and elsewhere. GCMs and OGCMs are very complex models which contain many components, but still, the PEs are the central component for the dynamics of the air or the water. Further approximations based on the fast rotation of the Earth, implying the smallness of the Rossby number, lead to the quasigeostrophic and geostrophic equations (Pedlosky 1987).
The mathematical study of the PEs was initiated by Lions, Temam, and Wang in the early 1990s. They produced a mathematical formulation of the PEs which resembles that of the Navier–Stokes equations due to Leray, and obtained the existence, for all time, of weak solutions (see Lions et al. 1992a, b, 1993, 1995). Further works conducted during the 1990s have improved and supplemented these early results, bringing the mathematical theory of the PEs to the same level as that of the three-dimensional incompressible Navier–Stokes equations (Constantin and Foias 1998, Temam 2001). In summary, the following results are now available and will be presented in this article:
1. existence of weak solutions for all time;
2. existence of strong solutions in space dimension three, local in time;
3. existence and uniqueness of a strong solution in space dimension two, for all time; and
4. uniqueness of weak solutions in space dimension two.
The PEs of the Ocean
The ocean is made up of a slightly compressible fluid subject to a Coriolis force. The full set of equations of the large-scale ocean is the following: the conservation of momentum equation, the continuity equation (conservation of mass), the thermodynamics equation, the equation of state, and the equation of diffusion for the salinity $S$:
$$\rho \frac{dV_3}{dt} + 2\rho\, \Omega \times V_3 + \nabla_3 p + \rho g = D \qquad [1]$$
$$\frac{d\rho}{dt} + \rho\, \mathrm{div}_3 V_3 = 0 \qquad [2]$$
$$\frac{dT}{dt} = Q_T \qquad [3]$$
$$\frac{dS}{dt} = Q_S \qquad [4]$$
$$\rho = f(T, S, p) \qquad [5]$$
Here $V_3$ is the three-dimensional velocity vector, $V_3 = (u, v, w)$; $\rho$, $p$, $T$ are, respectively, the density, pressure, and temperature, and $S$ is the concentration of salinity; $g = (0, 0, g)$ is the gravity vector, $D$ the molecular dissipation, and $Q_T$ and $Q_S$ are the heat and salinity diffusions, respectively.
Remark 1 The equation of state for the oceans is derived on a phenomenological basis. Only empirical forms of the function $f(T, S, p)$ are known (see Washington and Parkinson (1986)). It is natural, however, to expect that $\rho$ decreases if $T$ increases and that $\rho$ increases if $S$ increases. The simplest law is
$$\rho = \rho_0 \left(1 - \beta_T (T - T_r) + \beta_S (S - S_r)\right) \qquad [6]$$
corresponding to a linearization around reference values $\rho_0$, $T_r$, $S_r$ of, respectively, the density, the temperature, and the salinity; $\beta_T$ and $\beta_S$ are positive expansion coefficients.
The Mach number for the flow in the ocean is not large and, therefore, as a starting point, we can make the so-called Boussinesq approximation in which the density is assumed constant, $\rho = \rho_0$, except in the buoyancy term and in the equation of state. This amounts to replacing [1], [2] by
$$\rho_0 \frac{dV_3}{dt} + 2\rho_0\, \Omega \times V_3 + \nabla_3 p + \rho g = D \qquad [7]$$
$$\mathrm{div}_3 V_3 = 0 \qquad [8]$$
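The linear law [6], which the Boussinesq system [7]–[8] retains in its buoyancy term, can be evaluated directly. The sketch below is only an illustration: the default reference values and expansion coefficients are placeholder numbers of a plausible order of magnitude for sea water, chosen here as assumptions and not taken from the article.

```python
def linear_density(T, S, rho0=1025.0, T_ref=10.0, S_ref=35.0,
                   beta_T=2.0e-4, beta_S=7.6e-4):
    """Linearized equation of state [6].

    T, T_ref in degrees Celsius, S, S_ref in practical salinity units,
    rho0 in kg/m^3. All default values are illustrative assumptions.
    """
    return rho0 * (1.0 - beta_T * (T - T_ref) + beta_S * (S - S_ref))


if __name__ == "__main__":
    # Warmer water is lighter, saltier water is denser.
    print(linear_density(10.0, 35.0))
    print(linear_density(20.0, 35.0))
    print(linear_density(10.0, 36.0))
```

At the reference state the function returns $\rho_0$ exactly; the signs of the two correction terms reproduce the monotonicity expected in Remark 1.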
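Furthermore, since for the large-scale ocean the horizontal scale is much larger than the vertical one, a scale analysis (Pedlosky 1987) shows that $\partial p/\partial z$ and $\rho g$ are the dominant terms in the vertical-momentum equation, leading to the hydrostatic approximation
$$\frac{\partial p}{\partial z} = -\rho g \qquad [9]$$
For mid-latitude regional studies, it is usual to consider the beta-plane approximation of the equations. Thus, we assume that the ocean fills a domain $M_\varepsilon$ of $\mathbb{R}^3$. The top of the ocean is a domain $\Gamma_i$ included in the surface of the earth $S_a$ (sphere of radius $a$ centered at 0). The bottom $\Gamma_b$ of the ocean is defined by ($z = x_3 = r - a$), $z = -\varepsilon h(\theta, \varphi)$, where $\varepsilon > 0$ is a positive parameter. It is introduced to take into consideration the smallness of the vertical scales compared to the horizontal scales. $h$ is a function of class $C^2$ at least on $\bar\Gamma_i$; it is assumed also that $h$ is bounded from below, that is, $0 < \underline{h} \le h(\theta, \varphi) \le \bar h$, $(\theta, \varphi) \in \Gamma_i$. The lateral surface $\Gamma_l$ consists of the part of cylinder $\{(\theta, \varphi) \in \partial\Gamma_i,\ -\varepsilon h(\theta, \varphi) \le z \le 0\}$. The PEs of the ocean are given by
$$\frac{\partial v}{\partial t} + \nabla_v v + w\frac{\partial v}{\partial z} + \frac{1}{\rho_0}\nabla p + 2\Omega \sin\theta\, k \times v - \mu_v \Delta v - \nu_v \frac{\partial^2 v}{\partial z^2} = F_v \qquad [10]$$
$$\frac{\partial p}{\partial z} = -\rho g \qquad [11]$$
$$\mathrm{div}\, v + \frac{\partial w}{\partial z} = 0 \qquad [12]$$
$$\frac{\partial T}{\partial t} + \nabla_v T + w\frac{\partial T}{\partial z} - \mu_T \Delta T - \nu_T \frac{\partial^2 T}{\partial z^2} = F_T \qquad [13]$$
$$\frac{\partial S}{\partial t} + \nabla_v S + w\frac{\partial S}{\partial z} - \mu_S \Delta S - \nu_S \frac{\partial^2 S}{\partial z^2} = F_S \qquad [14]$$
$$\mathrm{div}\int_{-h}^{0} v\, dz = 0, \qquad p = p_s + P, \qquad P = P(T, S) = g\int_z^0 \rho\, dz' \qquad [15]$$
$$\rho = \rho_0 \left(1 - \beta_T (T - T_r) + \beta_S (S - S_r)\right) \qquad [16]$$
$$\int_{M_\varepsilon} S\, dM_\varepsilon = 0 \qquad [17]$$
where $v$ is the horizontal velocity of the water, $w$ is the vertical velocity, and $T_r$, $S_r$ are averaged (or reference) values of $T$ and $S$. The diffusion coefficients $\mu_v$, $\mu_T$, $\mu_S$ and $\nu_v$, $\nu_T$, $\nu_S$ are different in the horizontal and vertical directions, accounting for some eddy diffusions in the sense of Smagorinsky (1963). Note that $F_v$, $F_T$, and $F_S$ correspond to volume sources of horizontal momentum, heat, and salt, respectively.
Boundary conditions
There are several sets of natural boundary conditions that one can associate to the PEs; for instance, the following:
On the top of the ocean $\Gamma_i$ ($z = 0$)
$$\nu_v \frac{\partial v}{\partial z} + \alpha_v (v - v_a) = \tau_v, \quad w = 0, \quad \nu_T \frac{\partial T}{\partial z} + \alpha_T (T - T_a) = 0, \quad \frac{\partial S}{\partial z} = 0 \qquad [18]$$
At the bottom of the ocean $\Gamma_b$ ($z = -h(\theta, \varphi)$)
$$v = 0, \quad w = 0, \quad \frac{\partial T}{\partial n_T} = 0, \quad \frac{\partial S}{\partial n_S} = 0 \qquad [19]$$
On the lateral boundary $\Gamma_l = \{-h(\theta, \varphi) < z < 0,\ (\theta, \varphi) \in \partial\Gamma_i\}$
$$v = 0, \quad w = 0, \quad \frac{\partial T}{\partial n_T} = 0, \quad \frac{\partial S}{\partial n_S} = 0 \qquad [20]$$
Here $n = (n_H, n_z)$ is the unit outward normal on $\partial M_\varepsilon$ decomposed into its horizontal and vertical components; the conormal derivatives $\partial/\partial n_T$ and $\partial/\partial n_S$ are those associated with the linear (temperature and salinity) operators,
$$\frac{\partial}{\partial n_T} = \mu_T\, n_H \cdot \nabla + \nu_T\, n_z \frac{\partial}{\partial z}, \qquad \frac{\partial}{\partial n_S} = \mu_S\, n_H \cdot \nabla + \nu_S\, n_z \frac{\partial}{\partial z} \qquad [21]$$
Equations [10]–[17] with boundary conditions [18]–[20] are supplemented with the initial conditions
$$v|_{t=0} = v_0, \qquad T|_{t=0} = T_0, \qquad S|_{t=0} = S_0 \qquad [22]$$
where $v_0$, $T_0$, $S_0$ are given initial data. Following the work of Lions et al. (1992a, b, 1993, 1995) (see also Temam and Ziane (2004)), we introduce the following function spaces $V = V_1 \times V_2 \times V_3$, $H = H_1 \times H_2 \times H_3$, where
$$V_1 = \left\{ v \in H^1(M)^2 ;\ \mathrm{div}\int_{-h}^0 v\, dz = 0,\ v = 0 \text{ on } \Gamma_b \cup \Gamma_l \right\}$$
$$V_2 = H^1(M)$$
$$V_3 = \dot H^1(M) = \left\{ S \in H^1(M),\ \int_M S\, dM = 0 \right\}$$
$$H_1 = \left\{ v \in L^2(M)^2 ;\ \mathrm{div}\int_{-h}^0 v\, dz = 0,\ n_H \cdot \int_{-h}^0 v\, dz = 0 \text{ on } \partial\Gamma_i \ (\text{i.e., on } \Gamma_l) \right\}$$
$$H_2 = L^2(M)$$
$$H_3 = \dot L^2(M) = \left\{ S \in L^2(M),\ \int_M S\, dM = 0 \right\}$$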
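As a numerical aside on the diagnostic pressure relation $P = g\int_z^0 \rho\, dz'$ that accompanies the hydrostatic balance $\partial p/\partial z = -\rho g$, the following sketch integrates a density profile from each depth level up to the surface with the trapezoidal rule. The grid layout (levels increasing toward $z = 0$), the function name, and the value of $g$ are assumptions made for illustration.

```python
def hydrostatic_pressure(z, rho, g=9.81):
    """Return P(z_i) = g * integral from z_i to 0 of rho dz'.

    z   : depth levels in metres, increasing, with z[-1] == 0 (the surface)
    rho : density at each level, in kg/m^3
    The integral is accumulated downward from the surface, one
    trapezoidal layer at a time.
    """
    n = len(z)
    pressure = [0.0] * n  # P vanishes at the surface level z = 0
    for i in range(n - 2, -1, -1):
        layer = 0.5 * (rho[i] + rho[i + 1]) * (z[i + 1] - z[i])
        pressure[i] = pressure[i + 1] + g * layer
    return pressure


if __name__ == "__main__":
    depths = [-100.0 + 10.0 * k for k in range(11)]
    density = [1000.0] * len(depths)
    print(hydrostatic_pressure(depths, density)[0])
```

For a constant density the finite-difference slope of the returned profile equals $-\rho g$ on every layer, so the discrete profile satisfies the hydrostatic balance exactly in that case.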
The global existence of weak solutions is established in Lions et al. (1992b), using the Galerkin method and assuming the $H^2$-regularity of the GFD–Stokes problem, which was established in Ziane (1995). A more general global existence result, based on the method of finite differences in time and independent of the $H^2$-regularity, is established in Temam and Ziane (2004), which we state here.
Theorem 2 Given $t_1 > 0$, $U_0$ in $H$, and $F = (F_v, F_T, F_S)$ in $L^2(0, t_1; H)$; $g = (g_v, g_T)$ is given in $L^2(0, t_1; (L^2(\Gamma_i))^3)$. Then there exists
$$U \in L^\infty(0, t_1; H) \cap L^2(0, t_1; V) \qquad [23]$$
which is a weak solution of [10]–[17] and [18]–[20], [22]; furthermore, $U$ is weakly continuous from $[0, t_1]$ into $H$.
Strong Solutions
The local existence and uniqueness of strong solutions of the primitive equations of the ocean rely on the $H^2$-regularity of the stationary linear primitive equations associated to [10]–[17]:
$$\frac{1}{\rho_0}\nabla p + 2\Omega \sin\theta\, k \times v - \mu_v \Delta v - \nu_v \frac{\partial^2 v}{\partial z^2} = F_v, \qquad \mathrm{div}\int_{-h}^0 v\, dz = 0 \qquad [24]$$
$$-\mu_T \Delta T - \nu_T \frac{\partial^2 T}{\partial z^2} = F_T, \qquad -\mu_S \Delta S - \nu_S \frac{\partial^2 S}{\partial z^2} = F_S \qquad [25]$$
$$p = p_s + P, \qquad P = P(T, S) = g\int_z^0 \rho\, dz' \qquad [26]$$
with boundary conditions [18]–[20]. Here $F_v$, $F_T$, $F_S$ are independent of time. We have the following $H^2$-regularity of solutions (Ziane 1995, Hu et al. 2002, Temam and Ziane 2004).
Theorem 3 Assume that $h$ is in $C^4(\bar\Gamma_i)$, $h \ge \underline{h} > 0$, $F_v, F_T, F_S \in (L^2(M_\varepsilon))^4$ and $g_v = \tau_v + \alpha_v v_a$, $g_T = \alpha_T T_a \in (H^1_0(\Gamma_i))^4$. Let $(v, T, S; p) \in (H^1(M_\varepsilon))^4 \times L^2(\Gamma_i)$ be a weak solution of [24]–[26]. Then
$$(v, p) \in (H^2(M_\varepsilon))^2 \times H^1(M_\varepsilon), \qquad (T, S) \in (H^2(M_\varepsilon))^2 \qquad [27]$$
Moreover, the following inequalities hold:
$$|v|^2_{H^2(M_\varepsilon)} + \varepsilon |p|^2_{H^1(\Gamma_i)} \le C\left[ |F_v|^2_\varepsilon + |g_v|^2_{L^2(\Gamma_i)} + \varepsilon |\nabla g_v|^2_{L^2(\Gamma_i)} \right]$$
$$|T|^2_{H^2(M_\varepsilon)} \le C\left[ |F_T|^2 + |g_T|^2_{L^2(\Gamma_i)} + \varepsilon |\nabla g_T|^2_{L^2(\Gamma_i)} \right]$$
$$|S|^2_{H^2(M_\varepsilon)} \le C |F_S|^2$$
where $C$ is a positive constant independent of $\varepsilon$.
We now turn our attention to the nonlinear time-dependent PEs. The local-in-time existence and uniqueness of strong solutions is obtained in Temam and Ziane (2004); see also Hu et al. (2003) and Guillén-González et al. (2001). The proof is more involved than that of the three-dimensional Navier–Stokes equations. It consists of several steps. In the first step, one proves the global existence of strong solutions to the linearized time-dependent problem. In the second step, one uses the solution of the linearized equation in order to reduce the PEs to a nonlinear evolution equation with zero initial data and homogeneous boundary conditions. Finally, in the last step, one uses nonisotropic Sobolev inequalities together with Theorem 3. The local existence result is given by the following:
Theorem 4 Let $\varepsilon > 0$ be given. We assume that $\Gamma_i$ is of class $C^3$ and that $h : \bar\Gamma_i \to \mathbb{R}_+$ is of class $C^3$. We are given $U_0$ in $V$, $F = (F_v, F_T, F_S)$ in $L^2(0, t_1; H)$ with $\partial F/\partial t$ in $L^2(0, t_1; L^2(M_\varepsilon)^4)$, and $g = (g_v, g_T)$ in $L^2(0, t_1; H^1_0(\Gamma_i)^3)$ with $\partial g/\partial t$ in $L^2(0, t_1; H^1_0(\Gamma_i)^3)$. Then there exists $t_* > 0$, $t_* = t_*(\|U_0\|)$, and there exists a unique solution $U = U(t) = (v(t), T(t), S(t))$ of the PEs [10]–[17], [18]–[20], and [22] such that
$$U \in C([0, t_*]; V) \cap L^2(0, t_*; H^2(M_\varepsilon)^4) \qquad [28]$$
The PEs of the Atmosphere
In this section we briefly describe the PEs of the atmosphere, for which all the mathematical results obtained for the PEs of the ocean are valid. We start from conservation equations similar to [1]–[5]; in fact [1] and [2] are the same; the equation of energy conservation (temperature) is slightly different from [3] because of the compressibility of air; the state equation is that of a perfect gas instead of [5]; finally, instead of the concentration of salt in the water, we consider the amount of water in air, $q$. Hence, we have
$$\rho \frac{dV_3}{dt} + 2\rho\, \Omega \times V_3 + \nabla_3 p + \rho g = D, \qquad \frac{d\rho}{dt} + \rho\, \mathrm{div}_3 V_3 = 0$$
$$c_p \frac{dT}{dt} - \frac{RT}{p}\frac{dp}{dt} = Q_T \qquad [29]$$
$$\frac{dq}{dt} = 0 \qquad [30]$$
$$p = R\rho T \qquad [31]$$
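Here $c_p > 0$ is the specific heat of air at constant pressure, and $R$ is the specific gas constant for the air. Proceeding as in the PEs of the ocean, we decompose $V_3$ into its horizontal and vertical components, $V_3 = v + w$; then we use the hydrostatic approximation, replacing the equation of conservation of vertical momentum by the hydrostatic equation [9]. We find
$$\frac{\partial v}{\partial t} + \nabla_v v + w\frac{\partial v}{\partial z} + \frac{1}{\rho_0}\nabla p + 2\Omega \sin\theta\, k \times v - \mu_v \Delta v - \nu_v \frac{\partial^2 v}{\partial z^2} = 0 \qquad [32]$$
$$\frac{\partial p}{\partial z} = -\rho g \qquad [33]$$
$$\frac{\partial T}{\partial t} + \nabla_v T + w\frac{\partial T}{\partial z} - \mu_T \Delta T - \nu_T \frac{\partial^2 T}{\partial z^2} - \frac{RT}{p}\frac{dp}{dt} = Q_T \qquad [34]$$
$$\frac{\partial q}{\partial t} + \nabla_v q + w\frac{\partial q}{\partial z} - \mu_q \Delta q - \nu_q \frac{\partial^2 q}{\partial z^2} = 0 \qquad [35]$$
$$p = R\rho T \qquad [36]$$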
The right-hand side of [34] represents the solar heating.
Change of Vertical Coordinate
Since $\rho$ does not vanish, the hydrostatic equation [33] implies that $p$ is a strictly decreasing function of $z$, and we are thus allowed to use $p$ as the vertical coordinate; hence in spherical geometry the independent variables are now $\varphi$, $\theta$, $p$, and $t$. By an abuse of notation, we still denote by $v$, $\rho$, $T$, $q$ these functions expressed in the $\varphi$, $\theta$, $p$, $t$ variables. We denote by $\omega$ the vertical component of the wind in the new variables, and one can show that the PEs of the atmosphere become
$$\frac{\partial v}{\partial t} + \nabla_v v + \omega\frac{\partial v}{\partial p} + 2\Omega \sin\theta\, k \times v + \nabla\Phi - L_v v = F_v \qquad [37]$$
$$\frac{\partial \Phi}{\partial p} + \frac{R}{p} T = 0 \qquad [38]$$
$$\mathrm{div}\, v + \frac{\partial \omega}{\partial p} = 0 \qquad [39]$$
$$\frac{\partial T}{\partial t} + \nabla_v T + \omega\frac{\partial T}{\partial p} - \frac{RT}{p}\omega - L_T T = F_T \qquad [40]$$
$$\frac{\partial q}{\partial t} + \nabla_v q + \omega\frac{\partial q}{\partial p} - L_q q = F_q \qquad [41]$$
$$p = R\rho T \qquad [42]$$
We have denoted by $\Phi = gz$ the geopotential ($z$ is now a function of $\varphi$, $\theta$, $p$, $t$); $L_v$, $L_T$, $L_q$ are the Laplace operators, with suitable eddy viscosity coefficients, expressed in the $\varphi$, $\theta$, $p$ variables. Hence, for example,
$$L_v v = \mu_v \Delta v + \nu_v \frac{\partial}{\partial p}\left[\left(\frac{gp}{R\bar T}\right)^2 \frac{\partial v}{\partial p}\right] \qquad [43]$$
with similar expressions for $L_T$ and $L_q$. Note that $F_T$ corresponds to the heating of the Sun, whereas $F_v$ and $F_q$ (which vanish in reality) are added here for mathematical generality. The change of variable gives, for $\partial^2 v/\partial z^2$, a term different from the coefficient of $\nu_v$. The expression above is a simplified form of this coefficient; the simplification is legitimate because $\nu_v$ is a very small coefficient (in particular, $T$ has been replaced by $\bar T$, the (known) average value of the temperature).
Pseudogeometrical Domain
For physical and mathematical reasons, we do not allow the pressure to go to zero, and assume that $p \ge p_0$, with $p_0 > 0$ "small." Physically, in the very high atmosphere ($p$ very small), the air is ionized and the equations above are not valid anymore. The pressure is then restricted to an interval $p_0 < p < p_1$, where $p_1$ is a value of the pressure smaller on average than the pressure on Earth, so that the isobar $p = p_1$ is slightly above the Earth and the isobar $p = p_0$ is an isobar high in the sky. We study the motion of the air between these two isobars. For the whole atmosphere, the boundary of this domain
$$M = \{(\varphi, \theta, p),\ p_0 < p < p_1\}$$
consists first of an upper part $\Gamma_u$, $p = p_0$; the lower part $p = p_1$ is divided into two parts: $\Gamma_i$, the part of $p = p_1$ at the interface with the ocean, and $\Gamma_e$, the part of $p = p_1$ above the earth.
Boundary Conditions
Typically, the boundary conditions are as follows:
On the top of the atmosphere $\Gamma_u$ ($p = p_0$)
$$\frac{\partial v}{\partial p} = 0, \quad \omega = 0, \quad \frac{\partial T}{\partial p} = 0, \quad \frac{\partial q}{\partial p} = 0 \qquad [44]$$
Acknowledgments
This work was partially supported by the National Science Foundation under the grant NSF-DMS-0204863.
See also: Boundary Control Method and Inverse Problems of Wave Propagation; Compressible Flows: Mathematical Theory; Fluid Mechanics: Numerical Methods; Turbulence Theories.
Further Reading
Bresch D, Kazhikov A, and Lemoine J (2004/05) On the two-dimensional hydrostatic Navier–Stokes equations. SIAM Journal of Mathematical Analysis 36(3): 796–814.
Constantin P and Foias C (1998) Navier–Stokes Equations. Chicago: University of Chicago Press.
Guillén-González F, Masmoudi N, and Rodríguez-Bellido MA (2001) Anisotropic estimates and strong solutions of the primitive equations. Differential and Integral Equations 14(11): 1381–1408.
Hu C, Temam R, and Ziane M (2002) Regularity results for linear elliptic problems related to the primitive equations. Chinese Annals of Mathematics Series B 23(2): 277–292.
Hu C, Temam R, and Ziane M (2003) The primitive equations of the large scale ocean under the small depth hypothesis. Discrete and Continuous Dynamical Systems Series A 9(1): 97–131.
Lions JL, Temam R, and Wang S (1992a) New formulations of the primitive equations of the atmosphere and applications. Nonlinearity 5: 237–288.
Lions JL, Temam R, and Wang S (1992b) On the equations of the large-scale ocean. Nonlinearity 5: 1007–1053.
Lions JL, Temam R, and Wang S (1993) Models of the coupled atmosphere and ocean (CAO I). In: Oden JT (ed.) Computational Mechanics Advances, vol. 1: 5–54. North-Holland: Elsevier Science.
Lions JL, Temam R, and Wang S (xxxx) Numerical analysis of the coupled models of atmosphere and ocean (CAO II). In: Oden JT (ed.) Computational Mechanics Advances, vol. 1: 55–119. North-Holland: Elsevier Science.
Lions JL, Temam R, and Wang S (1995) Mathematical study of the coupled models of atmosphere and ocean (CAO III). Journal de Mathématiques Pures et Appliquées, Neuvième Série 74: 105–163.
Pedlosky J (1987) Geophysical Fluid Dynamics, 2nd edn. New York: Springer.
Schlichting H (1979) Boundary Layer Theory, 7th edn. New York: McGraw-Hill.
Smagorinsky J (1963) General circulation experiments with the primitive equations, I. The basic experiment. Monthly Weather Review 91: 98–164.
Temam R (2001) Navier–Stokes Equations, Studies in Mathematics and Its Applications, AMS-Chelsea Series. Providence: American Mathematical Society.
Temam R and Ziane M (2004) Some mathematical problems in geophysical fluid dynamics. In: Friedlander S and Serre D (eds.) Handbook of Mathematical Fluid Dynamics, vol. III, pp. 535–657. Amsterdam: North-Holland.
Washington WM and Parkinson CL (1986) An Introduction to Three Dimensional Climate Modeling. Oxford: Oxford University Press.
Ziane M (1995) Regularity results for Stokes type systems. Applicable Analysis 58: 263–292.
Gerbes in Quantum Field Theory
J Mickelsson, KTH Physics, Stockholm, Sweden
© 2006 Elsevier Ltd. All rights reserved.
Definitions and an Example
A gerbe can be viewed as a next step in a ladder of geometric and topological objects on a manifold which starts from ordinary complex-valued functions and continues, in the second step, with sections of complex line bundles. It is useful to recall the construction of complex line bundles and their connections. Let $M$ be a smooth manifold and $\{U_\alpha\}$ an open cover of $M$ which trivializes a line bundle $L$ over $M$. Topologically, up to equivalence, the line bundle is completely determined by its Chern class, which is a cohomology class $[c] \in H^2(M, \mathbb{Z})$. On each open set $U_\alpha$ we may write $2\pi i c = dA_\alpha$, where $A_\alpha$ is a 1-form. On the overlaps $U_{\alpha\beta} = U_\alpha \cap U_\beta$ we can write
$$A_\alpha - A_\beta = f_{\alpha\beta}^{-1}\, df_{\alpha\beta} \qquad [1]$$
at least when $U_{\alpha\beta}$ is contractible, where $f_{\alpha\beta}$ is a circle-valued complex function on the overlap. The data $\{c, A_\alpha, f_{\alpha\beta}\}$ define what is known as a (representative of a) Deligne cohomology class on the open cover $\{U_\alpha\}$. The 1-forms $A_\alpha$ are the local potentials of the curvature form $2\pi i c$, and the $f_{\alpha\beta}$'s are the transition functions of the line bundle $L$. Each of these three different data defines separately the equivalence class of the line bundle, but together they define the line bundle with a connection. The essential thing here is that there is a bijection between the second integral cohomology of $M$ and the set of equivalence classes of complex line bundles over $M$. It is natural to ask whether there is a geometric realization of integral third (or higher) cohomology. In fact, gerbes provide such a realization. Here, we shall restrict to a smooth differential geometric approach which by no means is the most general possible, but it is sufficient for most applications to quantum field theory. However, there are examples of gerbes over orbifolds that do not need to come from finite group action on a manifold, which are not covered by the following definition. For the examples in this article, it is sufficient to adopt the following definition. A gerbe over a manifold $M$ (without geometry) is simply a principal bundle $\pi : P \to M$ with fiber equal to $PU(H)$, the projective unitary group of a Hilbert space $H$. The Hilbert space may be either finite or infinite dimensional. The quantum field theory applications discussed in this article are related to the chiral anomaly for
fermions in external fields. The link comes from the fact that the chiral symmetry breaking leads in the generic case to projective representations of the symmetry groups. For this reason, when modding out by the gauge or diffeomorphism symmetries, one is led to study bundles of projective Hilbert spaces. The anomaly is reflected as a nontrivial characteristic class of the projective bundle, known in mathematics literature as the Dixmier– Douady class. In a suitable open cover, the bundle P has a family of local trivializations with transition functions g : U ! PU(H), with the usual cocycle property
Let M be an oriented Riemannian manifold and FM its bundle of oriented orthonormal frames. The structure group of FM is the rotation group SO(n) with n = dimM. The spin bundle (when it exists) is a double covering Spin(M) of FM, with structure group Spin(n), a double cover of SO(n). Even when the spin bundle does not exist there is always the bundle Cl(M) of Clifford algebras over M. The fiber at x 2 M is the Clifford algebra defined by the metric gx , that is, it is the complex 2n -dimensional algebra generated by the tangent vectors v 2 Tx (m) with the defining relations
g g g ¼ 1
ðuÞðvÞ þ ðvÞðuÞ ¼ 2gx ðu; vÞ
½2
First example
on triple overlaps. Assuming that the overlaps are contractible, we can choose lifts ^ g : U ! U(H), to the unitary group of the Hilbert space. However,
The Clifford algebra has a faithful representation in N = 2[n=2] dimensions ([x] is the integral part of x) such that
^ g ^ g ¼ f g ^
ða uÞ ¼ SðaÞðuÞSðaÞ1
½3
where the f ’s are circle-valued functions on triple overlaps. They satisfy automatically the cocycle property 1 1 f f ¼1 f f
½4
on quadruple overlaps. There is an important difference between the finite- and infinite-dimensional cases. In the finite-dimensional case, the circle bundle U(H) ! U(H)=S1 = PU(H) reduces to a bundle with fiber Z=NZ = ZN , where N = dimH. This follows from U(N)=S1 = SU(N)=ZN and the fact that SU(N) is a subgroup of U(N). For this reason one can choose the lifts ^ g such that the functions f take values in the finite subgroup ZN S1 . The functions f define an element a = {a } in the Cˇech cohomology H3 (U, Z) by a choice of logarithms, 2ia ¼ log f log f þ log f log f
½5
ˇ ech cocycle is In the finite-dimensional case, the C necessarily torsion, Na = 0, but not so if H is infinite dimensional. In the finite-dimensional case (by passing to a good cover and using the Cˇech – de Rham equivalence over real or complex numbers), the class is third de Rham cohomology constructed from the transition functions is necessarily zero. Thus, in general one has to work with Cˇech cohomology to preserve torsion information. One can prove: Theorem The construction above is a one-to-one map between the set of equivalence classes of PU(H) bundles over M and elements of H 3 (M, Z). The characteristic class in H3 (M, Z) of a PU(H) bundle is called the Dixmier–Douady class.
where S is an unitary representation of Spin(n) in CN . Since Spin(n) is a double cover of SO(n), the representation S may be viewed as a projective representation of SO(n). Thus again, if the overlaps U are contractible, we may choose a lift of the frame bundle transition functions g to unitaries ^g in H = CN . In this case, the functions f reduce to Z2 -valued functions, and the obstruction to the lifting problem, which is the same as the obstruction to the existence of spin structure, is an element of H 2 (M, Z2 ), known as the second Stiefel–Whitney class w2 . The image of w2 with respect to the Bockstein map (in this case, given by the formula [5]) gives a 2-torsion element in H 3 (M, Z), the Dixmier–Douady class. Another way to think of a gerbe is the following (we shall see that this arises in a natural way in quantum field theory). There is a canonical complex line bundle L over PU(H), the associated line bundle to the circle bundle S1 ! U(H) ! PU(H). Pulling back L by the local transition functions g ! PU(H), we obtain a family of line bundles L over the open sets U . By the cocycle property [2] we have natural isomorphisms L L ¼ L
½6
We can take this as a definition of a gerbe over M: a collection of line bundles over intersections of open sets in an open cover of M, satisfying the cocycle condition [6]. By [6] we have a trivialization L L L ¼ f 1
½7
where the f ’s are circle-valued functions on the triple overlaps. By the theorem above, we conclude
Gerbes in Quantum Field Theory
541
that indeed the data in [6] define (an equivalence class of) a principal PU(H) bundle. If L and L0 are two systems of local line bundles over the same cover, then the gerbes are equivalent if there is a system of line bundles L over open sets U such that
space P consists of smooth paths f (t) in G starting from the neutral element such that f 1 df is smooth and periodic. The projection P ! G is the evaluation at the end point f(1). The fiber is clearly G. By Bott periodicity, the homotopy groups of G are shifted from those of G by one dimension, that is,
L0 ¼ L L L
n G ¼ nþ1 G
½8
on each U . A gerbe may come equipped with geometry, encoded in a Deligne cohomology class with respect to a given open covering of M. The Deligne class is given by functions f , 1-forms A , 2-forms F , and a global 3-form (the Dixmier–Douady class of the gerbe) , subject to the conditions dF ¼ 2i F F ¼ dA A A þ A ¼
½9
1 f df
Gerbes from Canonical Quantization Let Dx be a family of self-adjoint Fredholm operators in a complex Hilbert space H parametrized by x 2 M. This situation arises in quantum field theory, for example, when M is some space of external fields, coupled to Dirac operator D on a compact manifold. The space M might consist of gauge potentials (modulo gauge transformations) or M might be the moduli space of Riemann metrics. In these examples, the essential spectrum of Dx is both positive and negative and the family Dx defines an element of K1 (M). In fact, one of the definitions of K1 (M) is that its elements are homotopy classes of maps from M to the space F of self-adjoint Fredholm operators with both positive and negative essential spectrum. In physics applications, one deals most often with unbounded Hamiltonians, and the operator norm topology must be replaced by something else; popular choices are the Riesz topology defined by the map F 7! F=(jFj þ 1) to bounded operators or the gap topology defined by graph metric. The space F is homotopy equivalent to the group G = U1 (H) of unitary operators g in H such that g 1 is a trace-class operator. This space is a classifying space for principal Ures bundles, where Ures is the group of unitary operators g in a polarized complex Hilbert space H = Hþ H such that the off-diagonal blocks of g are Hilbert–Schmidt operators. This is related to Bott periodicity. There is a natural principal bundle P over G = U1 (H) with fiber equal to the group G of based loops in G. The total
The latter are zero in even dimensions and equal to $\mathbb{Z}$ in odd dimensions. On the other hand, it is known that the even homotopy groups of $U_{res}(H)$ are equal to $\mathbb{Z}$ and the odd ones vanish. In fact, with a little more effort, one can show that the embedding of $\Omega G$ in $U_{res}(\mathcal{H})$ is a homotopy equivalence when $\mathcal{H} = L^2(S^1, H)$, the polarization being the splitting into nonnegative and negative Fourier modes and the action of $\Omega G$ being pointwise multiplication on $H$-valued functions on the circle $S^1$. Since $P$ is contractible, it is indeed the classifying bundle for $U_{res}$ bundles. Thus, we conclude that "$K^1(M)$ = the set of homotopy classes of maps $M \to G$ = the set of equivalence classes of $U_{res}$ bundles over $M$."

The relevance of this fact in quantum field theory follows from the properties of representations of the algebra of canonical anticommutation relations (CAR). For any complex Hilbert space $H$, this algebra is generated by elements $a(v)$ and $a^*(v)$, with $v \in H$, subject to the relations

$$a^*(u)a(v) + a(v)a^*(u) = 2\langle v, u\rangle$$

where the Hilbert space inner product on the right-hand side is antilinear in the first argument, and all other anticommutators vanish. In addition, $a^*(u)$ is linear and $a(v)$ antilinear in its argument. An irreducible Dirac representation of the CAR algebra is given by a polarization $H = H_+ \oplus H_-$. The representation is characterized by the existence of a vacuum vector $\psi$ in the fermionic Fock space $\mathcal{F}$ such that

$$a^*(u)\psi = 0 = a(v)\psi \quad \text{for } u \in H_-,\ v \in H_+ \qquad [10]$$
A theorem of D Shale and W F Stinespring says that two Dirac representations defined by a pair of polarizations $H_+$, $H_+'$ are equivalent if and only if there is $g \in U_{res}(H_+ \oplus H_-)$ such that $H_+' = g \cdot H_+$. In addition, in order that a unitary transformation $g$ be implementable in the Fock space, that is, that there be a unitary operator $\hat{g}$ in $\mathcal{F}$ such that

$$\hat{g}\, a^*(v)\, \hat{g}^{-1} = a^*(gv), \quad \forall v \in H \qquad [11]$$

and similarly for the $a(v)$'s, one must have $g \in U_{res}$ with respect to the polarization defining the vacuum vector. This condition is both necessary and sufficient.
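The anticommutation relations stated above can be checked numerically in a finite-dimensional toy model. The following is a minimal sketch, assuming a one-particle space $\mathbb{C}^n$ and the Jordan–Wigner matrix realization of fermionic modes; the function names and the factor $\sqrt{2}$ (chosen to reproduce the normalization $2\langle v, u\rangle$) are illustrative assumptions, not part of the text.

```python
import numpy as np
from functools import reduce

def jordan_wigner(n):
    """Annihilation matrices c_0..c_{n-1} on the 2^n-dimensional Fock space."""
    I2 = np.eye(2)
    Z = np.diag([1.0, -1.0])
    s = np.array([[0.0, 1.0], [0.0, 0.0]])  # lowers the occupation 1 -> 0
    ops = []
    for k in range(n):
        factors = [Z] * k + [s] + [I2] * (n - k - 1)
        ops.append(reduce(np.kron, factors))
    return ops

def a_star(u, c):
    # creation operator, linear in u; sqrt(2) matches the normalization 2<v,u>
    return np.sqrt(2) * sum(uk * ck.conj().T for uk, ck in zip(u, c))

def a(v, c):
    # annihilation operator, antilinear in v
    return np.sqrt(2) * sum(np.conj(vk) * ck for vk, ck in zip(v, c))

def anticommutator(X, Y):
    return X @ Y + Y @ X

n = 3
c = jordan_wigner(n)
rng = np.random.default_rng(0)
u = rng.normal(size=n) + 1j * rng.normal(size=n)
v = rng.normal(size=n) + 1j * rng.normal(size=n)

inner_vu = np.vdot(v, u)  # <v,u>, antilinear in the first argument
lhs = anticommutator(a_star(u, c), a(v, c))
car_residual = np.linalg.norm(lhs - 2 * inner_vu * np.eye(2 ** n))
aa_residual = np.linalg.norm(anticommutator(a(u, c), a(v, c)))
```

Both residuals should vanish up to rounding. In finite dimensions every unitary is implementable, so the restriction to $U_{res}$ only becomes visible in the infinite-dimensional setting discussed in the text.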
542 Gerbes in Quantum Field Theory
The polarization of the one-particle Hilbert space normally comes from a spectral projection onto the positive-energy subspace of a Hamilton operator. In background field problems one studies families of Hamilton operators $D_x$, and one would like to construct a family of fermionic Fock spaces parametrized by $x \in M$. If none of the Hamilton operators has zero modes, this is unproblematic. However, the presence of zero modes makes it impossible to define the positive-energy subspace $H_+(x)$ as a continuous function of $x$. One way out is to weaken the condition for the polarization: each $x \in M$ defines a Grassmann manifold $Gr_{res}(x)$ consisting of all subspaces $W \subset H$ such that the projections onto $W$ and $H_+(x)$ differ by Hilbert–Schmidt operators. The definition of $Gr_{res}(x)$ is stable with respect to finite-rank perturbations of $D_x/|D_x|$. For example, when $D_x$ is a Dirac operator on a compact manifold, $(D_x - \lambda)/|D_x - \lambda|$ defines the same Grassmannian for all real numbers $\lambda$, because in each finite interval there are only a finite number of eigenvalues (with multiplicities) of $D_x$. It follows that the Grassmannians form a locally trivial fiber bundle $Gr$ over families of Dirac operators. If the bundle $Gr$ has a global section $x \mapsto W_x$, then we can define a bundle of Fock space representations for the CAR algebra over the parameter space $M$. However, there are important situations when no global sections exist. It is easier to explain the potential obstruction in terms of a principal $U_{res}$ bundle $P$ such that $Gr$ is an associated bundle to $P$. The fiber of $P$ at $x \in M$ is the set of all unitaries $g$ in $H$ such that $g \cdot H_+ \in Gr_x$, where $H = H_+ \oplus H_-$ is a fixed reference polarization. Then we have

$$Gr = P \times_{U_{res}} Gr_{res},$$

where the right action of $U_{res} = U_{res}(H_+ \oplus H_-)$ in the fibers of $P$ is right multiplication on unitary operators, and the left action on $Gr_{res}$ comes from the observation that $Gr_{res} = U_{res}/(U_+ \times U_-)$, where $U_\pm$ are the diagonal block matrices in $U_{res}$.
By a result of N Kuiper, the subgroup $U_+ \times U_-$ is contractible, and so $Gr$ has a global section if and only if $P$ is trivial. Thus, when $P$ is trivial we can define the family of Dirac representations of the CAR algebra parametrized by $M$ such that in each of the Fock spaces we have a Dirac vacuum which, in a precise sense, is close to the vacuum defined by the energy polarization. However, the triviality of $P$ is not a necessary condition. Actually, what is needed is that $P$ has a prolongation to a bundle $\hat{P}$ with fiber $\hat{U}_{res}$. The group $\hat{U}_{res}$ is a central extension of $U_{res}$ by the group $S^1$.

The Lie algebra $\hat{u}_{res}$ is, as a vector space, the direct sum $u_{res} \oplus i\mathbb{R}$, with commutators

$$[X + \lambda, Y + \mu] = [X, Y] + c(X, Y) \qquad [12]$$

where $c$ is the Lie algebra cocycle

$$c(X, Y) = \tfrac{1}{4}\, \mathrm{tr}\, \epsilon[\epsilon, X][\epsilon, Y] \qquad [13]$$

Here $\epsilon$ is the grading operator with eigenvalues $\pm 1$ on $H_\pm$. The trace exists since the off-diagonal blocks of $X$, $Y$ are Hilbert–Schmidt.

The group $\hat{U}_{res}$ is a circle bundle over $U_{res}$. The Chern class of the associated complex line bundle is the generator of $H^2(U_{res}, \mathbb{Z})$ and is given explicitly at the identity element as the antisymmetric bilinear form $c/2\pi i$, and at other points on the group manifold through left-translation of $c/2\pi i$. If $P$ is trivial, then it has an obvious prolongation to the trivial bundle $M \times \hat{U}_{res}$. In any case, if the prolongation exists we can define the bundle of Fock spaces carrying CAR representations as the associated bundle

$$\mathcal{F} = \hat{P} \times_{\hat{U}_{res}} \mathcal{F}_0$$

where $\mathcal{F}_0$ is the fixed Fock space defined by the same polarization $H = H_+ \oplus H_-$ used to define $U_{res}$. By the Shale–Stinespring theorem, any $g \in U_{res}$ has an implementation $\hat{g}$ in $\mathcal{F}_0$, but $\hat{g}$ is only defined up to phase; hence the central $S^1$ extension. The action of the CAR algebra in the fibers is given as follows. For $x \in M$ choose any $\hat{g} \in \hat{P}_x$. Define

$$a^*(v) \cdot (\hat{g}, \psi) = (\hat{g},\ a^*(g^{-1}v)\psi)$$

where $\psi \in \mathcal{F}_0$ and $v \in H$; similarly for the operators $a(v)$. It is easy to check that this definition passes to the equivalence classes in $\mathcal{F}$. Note that the representations in different fibers are in general inequivalent because the transformation $g$ is not implementable in the Fock space $\mathcal{F}_0$.

The potential obstruction to the existence of the prolongation of $P$ is again a 3-cohomology class on the base. Choose a good cover of $M$. On the intersections $U_{\alpha\beta}$ of the open cover the transition functions $g_{\alpha\beta}$ of $P$ can be prolonged to functions $\hat{g}_{\alpha\beta} : U_{\alpha\beta} \to \hat{U}_{res}$. We have

$$\hat{g}_{\alpha\beta}\, \hat{g}_{\beta\gamma}\, \hat{g}_{\gamma\alpha} = f_{\alpha\beta\gamma} \cdot 1 \qquad [14]$$

for functions $f_{\alpha\beta\gamma} : U_{\alpha\beta\gamma} \to S^1$, which by construction satisfy the cocycle property [4]. Since the cocycle is defined on a good cover, it defines an integral Čech cohomology class $\omega \in H^3(M, \mathbb{Z})$.

Let us return to the universal $U_{res}$ bundle $P$ over $G = U_1(H)$. In this case the prolongation obstruction can be computed relatively easily. It turns out that the 3-cohomology class is represented by the de Rham class which is the generator of $H^3(G, \mathbb{Z})$. Explicitly,

$$\omega = \frac{1}{24\pi^2}\, \mathrm{tr}\, (g^{-1}dg)^3 \qquad [15]$$
Any principal $U_{res}$ bundle over $M$ comes from a pullback of $P$ with respect to a map $f : M \to G$, so the Dixmier–Douady class in the general case is the pullback $f^*\omega$.

The line bundle construction of the gerbe over the parameter space $M$ for Dirac operators is given by the observation that the spectral subspaces $E_{\lambda\lambda'}(x)$ of $D_x$, corresponding to the open interval $]\lambda, \lambda'[$ in the real line, form finite-rank vector bundles over the open sets $U_{\lambda\lambda'} = U_\lambda \cap U_{\lambda'}$. Here $U_\lambda$ is the set of points $x \in M$ such that $\lambda$ does not belong to the spectrum of $D_x$. Then we can define, as top exterior power,

$$L_{\lambda\lambda'} = \bigwedge\nolimits^{top}(E_{\lambda\lambda'})$$

as the complex line bundle over $U_{\lambda\lambda'}$. It follows immediately from the definition that the cocycle property [6] is satisfied.

Example 1 (Fermions on an interval). Let $K$ be a compact group and $\rho$ its unitary representation in a finite-dimensional vector space $V$. Let $H$ be the Hilbert space of square-integrable $V$-valued functions on the interval $[0, 2\pi]$ of the real axis. For each $g \in K$ let $\mathrm{Dom}_g \subset H$ be the dense subspace of smooth functions $\psi$ with the boundary condition $\psi(2\pi) = \rho(g)\psi(0)$. Denote by $D_g$ the operator $-i\,d/dx$ on this domain. The spectrum of $D_g$ is a function of the eigenvalues $\lambda_k$ of $\rho(g)$, consisting of the real numbers $n + \log(\lambda_k)/2\pi i$ with $n \in \mathbb{Z}$. For this reason the splitting of the one-particle space $H$ into positive and negative modes of the operator $D_g$ is in general not continuous as a function of the parameter $g$. This leads to the problems described above. However, the principal $U_{res}$ bundle can be explicitly constructed. It is the pullback of the universal bundle $P$ with respect to the map $f : K \to G$ defined by the embedding $\rho(K) \subset G$ as $N \times N$ block matrices, $N = \dim V$. Thus, the Dixmier–Douady class in this example is

$$\omega = \frac{1}{24\pi^2}\, \mathrm{tr}\, (\rho(g)^{-1} d\rho(g))^3 \qquad [16]$$
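The dependence of the spectrum in Example 1 on the boundary twist can be made concrete. The sketch below assumes $K = SU(2)$ in its defining representation and uses the principal branch of the logarithm; the function names and the cutoff `n_max` are illustrative choices, not part of the text.

```python
import numpy as np

def dirac_spectrum(rho_g, n_max=2):
    """Numbers n + log(lambda_k)/(2*pi*i), |n| <= n_max, for a unitary matrix rho_g."""
    lam = np.linalg.eigvals(rho_g).astype(complex)
    # principal branch: each shift lies in (-1/2, 1/2]
    shifts = (np.log(lam) / (2j * np.pi)).real
    levels = [n + s for n in range(-n_max, n_max + 1) for s in shifts]
    return np.sort(np.array(levels))

def su2_diag(theta):
    """Diagonal SU(2) element with eigenvalues exp(+i theta), exp(-i theta)."""
    return np.diag([np.exp(1j * theta), np.exp(-1j * theta)])

spec_identity = dirac_spectrum(np.eye(2))
spec_twisted = dirac_spectrum(su2_diag(0.2 * np.pi))
```

At $g = 1$ the spectrum consists of the integers, each with multiplicity two, so there are zero modes. For the twisted element the two zero modes move to $\pm 0.1$, which illustrates why the positive-energy subspace cannot be chosen continuously near $g = 1$.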
Example 2 (Fermions on a circle). Let $H = L^2(S^1, V)$ and $D_A = -i(d/dx + A)$, where $A$ is a smooth vector potential on the circle taking values in the Lie algebra $\mathbf{k}$ of $K$. In this case, the domain is fixed, consisting of smooth $V$-valued functions on the circle. The $\mathbf{k}$-valued function $A$ is represented as a multiplication operator through the representation $\rho$ of $K$. The parameter space $\mathcal{A}$ of smooth vector potentials is flat; thus, there cannot be any obstruction to the prolongation problem. However, in quantum field theory, one wants to pass to the moduli space $\mathcal{A}/\mathcal{G}$ of gauge potentials. Here $\mathcal{G}$ is the group of smooth based gauge transformations, that is, $\mathcal{G} = \Omega K$. Now the moduli space is the group of holonomies around the circle, $\mathcal{A}/\mathcal{G} = K$. Thus, we are in a similar situation as in Example 1. In fact, these examples are really two different realizations of the same family of self-adjoint Fredholm operators. The operator $D_A$ with $k = \mathrm{holonomy}(A)$ has exactly the same spectrum as $D_k$ in Example 1. For this reason, the Dixmier–Douady class on $K$ is the same as before.

The case of Dirac operators on the circle is simple because all the energy polarizations for different vector potentials are elements in a single Hilbert–Schmidt Grassmannian $Gr(H_+ \oplus H_-)$, where we can take as the reference polarization the splitting into positive and negative Fourier modes. Using this polarization, the bundle of fermionic Fock spaces over $\mathcal{A}$ can be trivialized as $\mathcal{F} = \mathcal{A} \times \mathcal{F}_0$. However, the action of the gauge group $\mathcal{G}$ on $\mathcal{F}$ acquires a central extension $\hat{\mathcal{G}} \subset \widehat{LK}$, where $LK$ is the free loop group of $K$. The Lie algebra cocycle determining the central extension is

$$c(X, Y) = \frac{1}{2\pi i} \int_{S^1} \mathrm{tr}_\rho\, X\, dY \qquad [17]$$

where $\mathrm{tr}_\rho$ is the trace in the representation $\rho$ of $K$. Because of the central extension, the quotient $\mathcal{F}/\hat{\mathcal{G}}$ defines only a projective vector bundle over $\mathcal{A}/\mathcal{G}$, the Dixmier–Douady class being given by [16].

In Example 1 (and Example 2) above, the complex line bundles can be constructed quite explicitly. Let us study the case $K = SU(n)$. Define $U_\lambda \subset K$ as the set of matrices $g$ such that $\lambda$ is not an eigenvalue of $g$. Select $n$ different points $\lambda_j$ on the unit circle such that their product is not equal to 1. We assume that the points are ordered counterclockwise on the circle. Then the sets $U_j = U_{\lambda_j}$ form an open cover of $SU(n)$, since the eigenvalues of an element of $SU(n)$ have product 1. On each $U_j$ we can choose a continuous branch of the logarithmic function $\log : U_j \to su(n)$. The spectrum of the Dirac operator $D_g$ with the holonomy $g$ consists of the infinite set of numbers $\mathbb{Z} + \mathrm{Spec}(-i \log(g))$. In particular, the numbers $\mathbb{Z} - i \log \lambda_j$ do not belong to the spectrum of $D_g$. Choosing $\mu_k = -i \log \lambda_k$ as an increasing sequence in the interval $[0, 2\pi]$, we can as well define $U_j = \{x \in M \mid \mu_j \notin \mathrm{Spec}(D_x)\}$. In any case, the top exterior power of the spectral subspace $E_{\mu_j, \mu_k}(x)$ is given by zero Fourier modes consisting of the spectral subspace of the holonomy $g$ in the segment $[\lambda_j, \lambda_k]$ of the unit circle.
Index Theory and Gerbes

Gauge and gravitational anomalies in quantum field theory can be computed by Atiyah–Singer index theory. The basic setup is as follows. On a compact even-dimensional spin manifold $S$ (without boundary) the Dirac operators coupled to vector potentials and metrics form a family of Fredholm operators. The parameter space is the set $\mathcal{A}$ of smooth vector potentials (gauge connections) in a vector bundle over $S$ and the set $\mathcal{M}$ of smooth Riemann metrics on $S$. The family of Dirac operators is covariant with respect to gauge transformations and diffeomorphisms of $S$; thus, we may view the Dirac operators as parametrized by the moduli space $\mathcal{A}/\mathcal{G}$ of gauge connections and the moduli space $\mathcal{M}/\mathrm{Diff}_0(S)$ of Riemann metrics. Again, in order that the moduli spaces be smooth manifolds, one has to restrict to the based gauge transformations, that is, those which are equal to the neutral element at a fixed base point in each connected component of $S$. Similarly, the Jacobian of a diffeomorphism is required to be equal to the identity matrix at the base points. Passing to the quotient modulo gauge transformations and diffeomorphisms, we obtain a vector bundle over the space

$$S \times \mathcal{A}/\mathcal{G} \times \mathcal{M}/\mathrm{Diff}_0(S) \qquad [18]$$

Actually, we could as well consider a generalization in which the base space is a fibering over the moduli space with model fiber equal to $S$, but for simplicity we stick to [18]. According to the Atiyah–Singer index formula for families, the K-theory class of the family of Dirac operators acting on the smooth sections of the tensor product of the spin bundle and the vector bundle $V$ over [18] is given through the differential forms

$$\hat{A}(R) \wedge \mathrm{ch}(V)$$

where $\hat{A}(R)$ is the A-roof genus, a function of the Riemann curvature tensor $R$ associated with the Riemann metric,

$$\hat{A}(R) = \det\nolimits^{1/2} \left( \frac{R/4\pi i}{\sinh(R/4\pi i)} \right)$$

and $\mathrm{ch}(V)$ is the Chern character

$$\mathrm{ch}(V) = \mathrm{tr}\, e^{F/2\pi i}$$

where $F$ is the curvature tensor of a gauge connection. Here both $R$ and $F$ are forms on the infinite-dimensional base space [18]. After integrating over the fiber $S$,

$$\mathrm{Ind} = \int_S \hat{A}(R) \wedge \mathrm{ch}(V) \qquad [19]$$

we obtain a family of differential forms $\Omega_{2k}$, one in each even dimension, on the moduli space.

The (cohomology classes of) forms $\Omega_{2k}$ contain important topological information for the quantized Yang–Mills theory and for quantum gravity. The form $\Omega_2$ describes potential chiral anomalies. The chiral anomaly is a manifestation of gauge or reparametrization symmetry breaking. If the class $[\Omega_2]$ is nonzero, the quantum effective action cannot be viewed as a function on the moduli space. Instead, it becomes a section of a complex line bundle DET over the moduli space. Since the Dirac operators are Fredholm (on compact manifolds), at a given point in the moduli space we can define the complex line

$$\mathrm{DET}_x = \bigwedge\nolimits^{top}(\ker D_x^+) \otimes \bigwedge\nolimits^{top}(\mathrm{coker}\, D_x^+) \qquad [20]$$

for the chiral Dirac operators $D_x^+$. In the even-dimensional case, the spin bundle is $\mathbb{Z}_2$ graded such that the grading operator $\Gamma$ anticommutes with $D_x$. Then $D_x^+ = P_- D_x P_+$, where $P_\pm = (1/2)(1 \pm \Gamma)$ are the chiral projections. Here $\bigwedge^{top}$ denotes the operation on finite-dimensional vector spaces $W$ taking the exterior power of $W$ of degree $\dim W$. When the dimensions of the kernel and cokernel of $D_x$ are constant, eqn [20] defines a smooth complex line bundle over the moduli space. In the case of varying dimensions, a little extra work is needed to define the smooth structure. The form $\Omega_2$ is the Chern class of DET. So if DET is nontrivial, gauge covariant quantization of the family of Dirac operators is not possible.

One can also give a geometric and topological meaning to the chiral symmetry breaking in Hamiltonian quantization, and this leads us back to gerbes on the moduli space. Here we have to use an odd version of the index formula [19]. Assuming that the physical spacetime is even dimensional, at a fixed time the space is an odd-dimensional manifold $S$.
We still assume that $S$ is compact. In this case, the integration in [19] is over odd-dimensional fibers and, therefore, the formula produces a sequence of odd forms on the moduli space. The first of the odd forms, $\Omega_1$, gives the spectral flow of a one-parameter family of operators $D_{x(t)}$. Its integral along the path $x(t)$ in the moduli space, after a correction by the difference of the eta invariant at the end points of the path, gives twice the net spectral flow: the number of positive eigenvalues crossing over to the negative side of the spectrum minus the flow of eigenvalues in the opposite direction. The second term, $\Omega_3$, is the Dixmier–Douady class of the projective bundle of Fock spaces over the moduli space. In Examples 1 and 2, the index theory calculation gives exactly the form [16] on $K$.

Example. Consider Dirac operators on the three-dimensional sphere $S^3$ coupled to vector potentials. Any vector bundle on $S^3$ is trivial, so let $V = S^3 \times \mathbb{C}^N$. Take $SU(N)$ as the gauge group and let $\mathcal{A}$ be the space of 1-forms on $S^3$ taking values in the Lie algebra $su(N)$ of $SU(N)$. Fix a point $x_s$ on $S^3$, the "south pole," and let $\mathcal{G}$ be the group of gauge transformations based at $x_s$. That is, $\mathcal{G}$ consists of smooth functions $g : S^3 \to SU(N)$ with $g(x_s) = 1$. In this case $\mathcal{A}/\mathcal{G}$ can be identified as $\mathrm{Map}(S^2, SU(N))$ times a contractible space. This is because any point $x$ on the equator of $S^3$ determines a unique semicircle from the south pole to the north pole through $x$. The parallel transport along this path with respect to a vector potential $A \in \mathcal{A}$ defines an element $g'_A(x) \in SU(N)$, using the fixed trivialization of $V$. Set $g_A(x) = g'_A(x)\, g'_A(x_0)^{-1}$, where $x_0$ is a fixed point on the equator. The element $g_A(x)$ then depends only on the gauge equivalence class $[A] \in \mathcal{A}/\mathcal{G}$. It is not difficult to show that the map $A \mapsto g_A$ is a homotopy equivalence from the moduli space of gauge potentials to the group $G_2 = \mathrm{Map}_{x_0}(S^2, SU(N))$ of maps based at $x_0$.

When $N > 2$, the cohomology $H^5(SU(N), \mathbb{Z}) = \mathbb{Z}$ transgresses to the cohomology $H^3(G_2, \mathbb{Z}) = \mathbb{Z}$. In particular, the generator

$$\omega_5 = \left( \frac{i}{2\pi} \right)^3 \frac{2}{5!}\, \mathrm{tr}\, (g^{-1}dg)^5$$

of $H^5(SU(N), \mathbb{Z})$ gives the generator of $H^3(G_2, \mathbb{Z})$ by contraction and integration,

$$\Omega = \int_{S^2} \omega_5$$
Gauge Group Extensions

The new feature for gerbes associated with Dirac operators in higher than one dimension is that the gauge group, acting on the bundle of Fock spaces parametrized by vector potentials, is represented through an abelian extension. On the Lie algebra level this means that the Lie algebra extension is not given by a scalar cocycle $c$ as in the one-dimensional case but by a cocycle taking values in an abelian Lie algebra. In the case of Dirac operators coupled to vector potentials, the abelian Lie algebra consists of a certain class of complex functions on $\mathcal{A}$. The extension is then defined by the commutators

$$[(X, \alpha), (Y, \beta)] = ([X, Y],\ L_X \beta - L_Y \alpha + c(X, Y)) \qquad [21]$$

where $\alpha$, $\beta$ are functions on $\mathcal{A}$ and $L_X$ denotes the Lie derivative in the direction of the infinitesimal gauge transformation $X$. The 2-cocycle property of $c$ is expressed as

$$c([X, Y], Z) + L_X c(Y, Z) + \text{cyclic permutations of } X, Y, Z = 0$$

In the case of Dirac operators on a 3-manifold $S$ the form $c$ is the Mickelsson–Faddeev cocycle

$$c(X, Y) = \frac{i}{12\pi^2} \int_S \mathrm{tr}\, A \wedge (dX \wedge dY - dY \wedge dX) \qquad [22]$$

The corresponding gauge group extension is an extension of $\mathrm{Map}(S, G)$ by the normal subgroup $\mathrm{Map}(\mathcal{A}, S^1)$. As a topological space, the extension is the product

$$\mathrm{Map}(\mathcal{A}, S^1) \times_{S^1} P$$

where $P$ is a principal $S^1$ bundle over $\mathrm{Map}(S, G)$. The Chern class $c_1$ of the bundle $P$ is again computed by transgression from $\omega_5$; this time

$$c_1 = \int_S \omega_5$$

In fact, we can think of the cocycle $c$ as a 2-form on the space of flat vector potentials $A = g^{-1}dg$ with $g \in \mathrm{Map}(S^3, G)$. Then one can show that the cohomology classes $[c]$ and $[c_1]$ are equal.

As we have seen, the central extension of a loop group is the key to understanding the quantum field theory gerbe. Here is a brief description of it starting from the 3-form [16] on a compact Lie group $G$. First define a central extension $\mathrm{Map}(D, G) \times S^1$ of the group of smooth maps from the unit disk $D$ to $G$, with pointwise multiplication. The group multiplication is given as

$$(g, \lambda) \cdot (g', \lambda') = (gg',\ \lambda\lambda' e^{2\pi i \gamma(g, g')})$$

where

$$\gamma(g, g') = \frac{1}{8\pi^2} \int_D \mathrm{tr}\, g^{-1}dg \wedge dg'\, g'^{-1} \qquad [23]$$

where the trace is computed in a fixed unitary representation of $G$. This group contains as a normal subgroup the group $N$ consisting of pairs $(g, e^{2\pi i C(g)})$ with

$$C(g) = \frac{1}{24\pi^2} \int_B \mathrm{tr}\, (g^{-1}dg)^3 \qquad [24]$$
Here $g(x) = 1$ on the boundary circle $S^1 = \partial D$, and thus $g$ can be viewed as a function $S^2 \to G$. The three-dimensional unit ball $B$ has $S^2$ as a boundary, and $g$ is extended in an arbitrary way from the boundary to the ball $B$. The extension is possible since $\pi_2(G) = 0$ for any finite-dimensional Lie group. The value of $C(g)$ depends on the extension only modulo an integer, and therefore $e^{2\pi i C(g)}$ is well defined. The central extension is then defined as

$$\widehat{LG} = (\mathrm{Map}(D, G) \times S^1)/N$$

One can show easily that the Lie algebra of $\widehat{LG}$ is indeed given through the cocycle [17]. When $G = SU(n)$ in the defining representation, this central extension is the basic extension: the cohomology class is the generator of $H^2(LG, \mathbb{Z})$. In general, to obtain the basic extension one has to correct [23] and [24] by a normalization factor.

This construction generalizes to the higher loop groups $\mathrm{Map}(S, G)$ for compact odd-dimensional manifolds $S$. For example, in the case of a 3-manifold, one starts from an extension of $\mathrm{Map}(D, G)$, where $D$ is a 4-manifold with boundary $S$. The extension is defined by a 2-cocycle $\gamma$, but now for given $g$, $g'$ the cocycle is a real-valued function of a point $g_0 \in \mathrm{Map}(S, G)$, which is a certain differential polynomial in the Maurer–Cartan 1-forms $g_0^{-1}dg_0$, $g^{-1}dg$, $g'^{-1}dg'$. The normal subgroup $N$ is defined in a similar way; now $C(g)$ is the integral of the 5-form $\omega_5$ over a 5-manifold $B$ with boundary $\partial B$ identified as $D/\!\sim$, the equivalence shrinking the boundary of $D$ to one point. This gives the extension only over the connected component of the identity in $\mathrm{Map}(S, G)$, but it can be generalized to the whole group. For example, when $S = S^3$ and $G$ is simple, the connected components are labeled by elements of the third homotopy group $\pi_3 G = \mathbb{Z}$.

In some cases, the de Rham cohomology class of the extension vanishes but the extension still contains interesting torsion information. In quantum field theory this comes from the Hamiltonian formulation of global anomalies. A typical example of this phenomenon is the Witten $SU(2)$ anomaly in four spacetime dimensions. In the Hamiltonian formulation, we take $S^3$ as the physical space and $G = SU(2)$ as the gauge group. In this case, the second cohomology of $\mathrm{Map}(S^3, G)$ becomes pure torsion, related to the fact that the 5-form $\omega_5$ on $SU(2)$ vanishes for dimensional reasons. Here the homotopy group $\pi_4(G) = \mathbb{Z}_2$ leads to the nontrivial fundamental group $\mathbb{Z}_2$ in each connected component of $\mathrm{Map}(S^3, G)$. Using this fact, one can show that there is a nontrivial $\mathbb{Z}_2$ extension of the group $\mathrm{Map}(S^3, G)$.

See also: Anomalies; Bosons and Fermions in External Fields; Characteristic Classes; Dirac Operator and Dirac Field; Index Theorems; K-Theory.
Further Reading

Araki H (1987) Bogoliubov automorphisms and Fock representations of canonical anticommutation relations. Contemporary Mathematics, vol. 62. Providence: American Mathematical Society.
Berline N, Getzler E, and Vergne M (1992) Heat Kernels and Dirac Operators, Die Grundlehren der mathematischen Wissenschaften, vol. 298. Berlin: Springer.
Brylinski J-L Loop Spaces, Characteristic Classes, and Geometric Quantization, Progress in Mathematics, vol. 107. Boston: Birkhäuser.
Carey AL, Mickelsson J, and Murray M (2000) Bundle gerbes applied to quantum field theory. Reviews in Mathematical Physics 12: 65–90.
Gawedzki K and Reis N (2002) WZW branes and gerbes. Reviews in Mathematical Physics 14: 1281–1334.
Giraud J (1971) Cohomologie Non Abelienne (French), Die Grundlehren der mathematischen Wissenschaften, vol. 179. Berlin: Springer.
Hitchin N (2001) Lectures on special Lagrangian submanifolds. In: Winter School on Mirror Symmetry, Vector Bundles and Lagrangian Submanifolds (Cambridge, MA, 1999), AMS/IP Stud. Adv. Math., vol. 23, pp. 151–182. Providence, RI: American Mathematical Society.
Mickelsson J (1989) Current Algebras and Groups. London: Plenum.
Treiman SB, Jackiw R, Zumino B, and Witten E (1985) Current Algebra and Anomalies, Princeton Series in Physics. Princeton, NJ: Princeton University Press.
Ginzburg–Landau Equation

Y Morita, Ryukoku University, Otsu, Japan
© 2006 Elsevier Ltd. All rights reserved.
Introduction

In the Ginzburg–Landau theory of superconductivity, a complex order parameter $\psi$ characterizes a macroscopic/mesoscopic superconducting state in a bulk superconductor. The square of the magnitude, $|\psi|^2$, expresses the density of superconducting electrons, and $\psi$ is regarded as a macroscopic wave function. With a magnetic vector potential $A$ and the order parameter $\psi$, the Helmholtz free energy density in a superconducting material near the critical temperature is given by

$$F = F_n + \alpha|\psi|^2 + \frac{\beta}{2}|\psi|^4 + \frac{1}{2m_s}\left| \left( \frac{\hbar}{i}\nabla - \frac{e_s}{c}A \right)\psi \right|^2 + \frac{|H|^2}{8\pi}$$

where $F_n$ denotes the energy density of the normal state, $c$ is the speed of light, $H = \operatorname{curl} A$, and $m_s$ and $e_s$ are the mass and charge of a superconducting electron, respectively. The parameters $\alpha$ and $\beta$ depend on temperature and are determined by the material. Moreover, below the critical temperature $T_c$, $\alpha = \alpha(T)$ and $\beta = \beta(T)$ take negative and positive values, respectively. In the presence of an applied magnetic field $H_{ap}$, we have to consider the Gibbs free energy density, $G = F - H \cdot H_{ap}/4\pi$. Introduce the following physical parameters:

$$\psi_0 = \sqrt{|\alpha|/\beta}, \quad H_c = \sqrt{4\pi\alpha^2/\beta}, \quad \lambda = \sqrt{m_s c^2/4\pi e_s^2 \psi_0^2}, \quad \xi = \sqrt{\hbar^2/2m_s|\alpha|}, \quad \kappa = \lambda/\xi \qquad [1]$$

The value $\psi_0^2$ gives the equilibrium density, and $H_c$ is the thermodynamic critical field, which is obtained by equating $G = F_n - |H_{ap}|^2/8\pi$ (for the normal state $\psi = 0$, $H = H_{ap}$) with $G = F_n - \alpha^2/2\beta$ (for perfect superconductivity $|\psi|^2 = \psi_0^2$, $A = 0$). The parameters $\lambda$ and $\xi$ stand for the penetration depth and the coherence length, respectively. The ratio $\kappa$ of these characteristic lengths is called the Ginzburg–Landau parameter, which determines the type of superconducting material: type I for $\kappa < 1/\sqrt{2}$ and type II for $\kappa > 1/\sqrt{2}$.
We use the nondimensional variables $x'$, $\psi'$, $A'$, $H'_{ap}$, and $\tilde{G}$:

$$x = \lambda x', \quad \psi = \psi_0\psi', \quad A = \sqrt{2}H_c\xi A' \ \ (H' = \operatorname{curl}' A'), \quad H_{ap} = \sqrt{2}H_c H'_{ap}/\kappa,$$
$$F = F_n + \left( \tilde{G}/\kappa^2 - 1/2 + 2H' \cdot H'_{ap}/\kappa^2 - |H'_{ap}|^2/\kappa^2 \right) H_c^2/4\pi \qquad [2]$$

Dropping the primes after the change of variables and integrating $\tilde{G}$ over a domain $\Omega \subset \mathbb{R}^n$ ($n = 2, 3$), which is occupied by a superconducting sample, yields a functional of $\psi$ and $A$, called the Ginzburg–Landau energy in a nondimensional form,

$$E(\psi, A) = \int_\Omega \left( |(\nabla - iA)\psi|^2 + \frac{\kappa^2}{2}(1 - |\psi|^2)^2 + |\operatorname{curl} A - H_{ap}|^2 \right) dx \qquad [3]$$

The Ginzburg–Landau equations are the Euler–Lagrange equations of this energy, which are given by

$$(\nabla - iA)^2\psi = \kappa^2(|\psi|^2 - 1)\psi \quad \text{in } \Omega \qquad [4]$$

$$\operatorname{curl}^2 A = J + \operatorname{curl} H_{ap} \quad \text{in } \Omega \qquad [5]$$

where

$$J := \frac{1}{2i}(\psi^*\nabla\psi - \psi\nabla\psi^*) - |\psi|^2 A \qquad [6]$$

and $\psi^*$ stands for the complex conjugate of $\psi$. In a two-dimensional domain $\Omega$, the differential operator "curl" acts on $A = (A_1, A_2) : \mathbb{R}^2 \to \mathbb{R}^2$ such that

$$\operatorname{curl} A = \partial_{x_1}A_2 - \partial_{x_2}A_1, \qquad \operatorname{curl} H = (\partial_{x_2}H, -\partial_{x_1}H), \qquad H := \operatorname{curl} A$$

and $H_{ap}$ is replaced by a scalar-valued function. Note that $J$ represents a supercurrent in the material.

Every critical point of the energy is obtained by solving the Ginzburg–Landau equations with appropriate boundary conditions and, thus, a physical state in the superconducting sample is realized by a solution of the equations. A minimizer of [3] is a solution of [4]–[5] that minimizes the energy [3] in an appropriate function space, whereas a local minimizer is a solution minimizing the energy locally in the space. A solution is called a stable solution if it is a local minimizer of the energy. A physically stable phenomenon could be realized by a minimizer or at least a local minimizer.
The Ginzburg–Landau energy and the equations are gauge invariant under the transformation

$$(\psi, A) \mapsto (e^{i\chi}\psi,\ A + \nabla\chi) \qquad [7]$$

for a smooth scalar function $\chi(x)$. Therefore, we can identify two solutions that correspond to each other through the transformation [7]. The following London (Coulomb) gauge is often chosen:

$$\operatorname{div} A = 0 \quad \text{in } \Omega \qquad [8]$$

(with a boundary condition if necessary). Let $(\psi, A)$ be a smooth solution of [4]–[5]. In a region where $|\psi(x)| > 0$, the expression $\psi = w(x)\exp(i\theta(x))$ ($w = |\psi(x)|$) leads to

$$\nabla^2 w = |\nabla\theta - A|^2 w + \kappa^2(w^2 - 1)w \qquad [9]$$

$$\operatorname{div}(w^2(\nabla\theta - A)) = 0 \qquad [10]$$

$$\operatorname{curl}^2 A = J = w^2(\nabla\theta - A) \qquad [11]$$

where the gauge [8] is fixed and $\operatorname{curl} H_{ap} = 0$ is assumed. Let $S$ be a surface in $\Omega$ bounded by a closed curve $\partial S$. Suppose $w(x) > 0$ on $\partial S$. Then from [11],

$$\Phi := \oint_{\partial S} (J/w^2 + A) \cdot ds = \oint_{\partial S} \frac{1}{w^2} J \cdot ds + \int_S \operatorname{curl} A \cdot dS = \oint_{\partial S} \nabla\theta \cdot ds = 2\pi d \qquad [12]$$

where $d$ is an integer; in fact, $d = \deg(\psi, \partial S)$ is the winding number of $\psi(\partial S)$ in the complex plane. Thus, the identity [12] relates the magnetic field to a topological degree of the order parameter. The quantity $\Phi$, multiplied by an appropriate constant, is called the fluxoid. A connected component of vanishing points of $\psi$ generally has codimension 2 in the domain, and it is called a vortex.

From the expression [9], the asymptotic behavior $w \to 1$ as $\kappa \to \infty$ is expected under a suitable condition. Then, by [11], $H = \operatorname{curl} A$ satisfies $\operatorname{curl}^2 H + H = 0$, which is known as the London equation. However, this is valid only for $|\psi| > 0$. Otherwise, a singularity appears around zeros of $\psi$.

There are several characteristic phenomena observed in a bulk superconductor. Typical phenomena are: perfect conductivity (persistent current), perfect diamagnetism (Meissner effect), nucleation of superconductivity, and vortices (quantization of a penetrating magnetic field). These phenomena can be expressed by solutions of the Ginzburg–Landau equations in various settings.

Ginzburg–Landau Equations in $\mathbb{R}^2$

A standard model of the Ginzburg–Landau energy is considered in the whole space $\mathbb{R}^2$. Let $A = (A_1, A_2)$ and assume $H_{ap} = 0$ in [3]. Consider then the energy functional

$$E(\psi, A) = \int_{\mathbb{R}^2} \left( |D_A\psi|^2 + |\operatorname{curl} A|^2 + \frac{\kappa^2}{2}(1 - |\psi|^2)^2 \right) dx \qquad [13]$$

where $D_A := \nabla - iA$. Then the Ginzburg–Landau equations are

$$D_A^2\psi = \kappa^2(|\psi|^2 - 1)\psi \quad \text{in } \mathbb{R}^2 \qquad [14]$$

$$\operatorname{curl}^2 A = \mathrm{Im}(\psi^* D_A\psi) \quad \text{in } \mathbb{R}^2 \qquad [15]$$
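The gauge invariance [7] of the energy [13] can be illustrated on a grid. The sketch below is not taken from the text: it assumes a periodic square lattice with spacing `h` and a gauge-covariant forward difference (link phases $e^{-ihA_\mu}$), a standard lattice device chosen here because it makes the discrete energy invariant up to rounding. All function names are hypothetical.

```python
import numpy as np

def gl_energy(psi, A1, A2, kappa, h):
    """Lattice sum of |D_A psi|^2 + |curl A|^2 + kappa^2/2 (1-|psi|^2)^2 on a periodic grid."""
    # covariant forward differences approximating (grad - iA) psi
    D1 = (np.exp(-1j * h * A1) * np.roll(psi, -1, axis=0) - psi) / h
    D2 = (np.exp(-1j * h * A2) * np.roll(psi, -1, axis=1) - psi) / h
    # discrete curl: d_1 A_2 - d_2 A_1
    curl = (np.roll(A2, -1, axis=0) - A2) / h - (np.roll(A1, -1, axis=1) - A1) / h
    density = (np.abs(D1) ** 2 + np.abs(D2) ** 2 + curl ** 2
               + 0.5 * kappa ** 2 * (1 - np.abs(psi) ** 2) ** 2)
    return h * h * density.sum()

def gauge_transform(psi, A1, A2, chi, h):
    """(psi, A) -> (exp(i chi) psi, A + grad chi), grad chi as a forward difference."""
    dchi1 = (np.roll(chi, -1, axis=0) - chi) / h
    dchi2 = (np.roll(chi, -1, axis=1) - chi) / h
    return np.exp(1j * chi) * psi, A1 + dchi1, A2 + dchi2

rng = np.random.default_rng(1)
n, h, kappa = 16, 0.1, 1.5
psi = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A1, A2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
chi = rng.normal(size=(n, n))

E0 = gl_energy(psi, A1, A2, kappa, h)
E1 = gl_energy(*gauge_transform(psi, A1, A2, chi, h), kappa, h)
```

The two energies agree up to floating-point error for arbitrary fields, since the phase of each covariant difference changes by $e^{i\chi(x)}$ and the discrete curl of a discrete gradient vanishes identically.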
In gauge theory, this model can be regarded as a two-dimensional abelian ($U(1)$) Higgs model. In that context, $\psi$ is a scalar (Higgs) field, $A$ is a connection on the $U(1)$ bundle $\mathbb{R}^2 \times U(1)$, and $D_A$ is the covariant derivative. Equations [14]–[15] are useful in observing quantization of the magnetic field, although this is an idealized model for superconductivity. By the natural condition that the right-hand side of [13] is finite, we may assume that $|D_A\psi|, |\operatorname{curl} A| \to 0$ and $|\psi| \to 1$ as $|x| \to \infty$. From [12], the flux quantization follows:

$$\int_{\mathbb{R}^2} \operatorname{curl} A\, dx = 2\pi d \qquad [16]$$
If $\psi$ has a finite number of zeros $\{a_j\}_{j=1}^N$, [16] implies

$$\int_{\mathbb{R}^2} \operatorname{curl} A\, dx = 2\pi \sum_{j=1}^N \deg(\psi, \partial B(a_j, \epsilon))$$

for a small positive number $\epsilon$, where $B(a_j, \epsilon)$ stands for the disk with center $a_j$ and radius $\epsilon$. A zero of $\psi$ represents a vortex, at which the magnetic field is quantized, and a supercurrent moves around the field. To characterize the configuration analytically, we look for a solution $(\psi, A)$ expressed in polar coordinates in the form

$$\psi = f(r)\exp(id\theta), \qquad A(r) = \alpha(r)(-\sin\theta, \cos\theta)$$

Substituting these into [14]–[15], one obtains

$$\frac{1}{r}(rf')' - \left( \frac{d}{r} - \alpha \right)^2 f = \kappa^2(f^2 - 1)f$$

$$-\left( \frac{1}{r}(r\alpha)' \right)' = \left( \frac{d}{r} - \alpha \right) f^2$$

($' = d/dr$) with the boundary conditions

$$f(0) = 0, \qquad f(\infty) = 1, \qquad \alpha(\infty) = 0$$

This system of equations has a solution for $\kappa > 0$. In addition to these types of solutions, when $\kappa = 1/\sqrt{2}$, a special transformation reduces the system [14]–[15] to a scalar nonlinear equation with a singular term. It is then proved that for an arbitrary $d \in \mathbb{Z}$, under the constraint [16], there exists a minimizer of [13] whose zeros are $|d|$ prescribed points $\{a_j\}_{j=1}^{|d|}$ (Jaffe and Taubes 1980).
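The degree appearing in the flux quantization can be evaluated numerically for a given order parameter. The following is a minimal sketch; the profile $f(r) = \tanh(\kappa r)$ is a hypothetical stand-in that only shares the boundary behavior $f(0) = 0$, $f(\infty) = 1$ with the actual solution, and the function names are assumptions made for illustration.

```python
import numpy as np

def winding_number(psi, center=(0.0, 0.0), radius=1.0, n_points=400):
    """Degree of psi/|psi| along the circle |x - center| = radius (psi must not vanish there)."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    x = center[0] + radius * np.cos(t)
    y = center[1] + radius * np.sin(t)
    values = psi(x, y)
    # phase increment between consecutive samples, each taken in (-pi, pi]
    steps = np.angle(np.roll(values, -1) / values)
    return int(np.rint(steps.sum() / (2.0 * np.pi)))

def vortex(d, kappa=1.0):
    """psi = f(r) exp(i d theta) with the hypothetical profile f(r) = tanh(kappa r)."""
    def psi(x, y):
        r = np.hypot(x, y)
        theta = np.arctan2(y, x)
        return np.tanh(kappa * r) * np.exp(1j * d * theta)
    return psi

deg_two = winding_number(vortex(2))
deg_minus_one = winding_number(vortex(-1))
deg_away = winding_number(vortex(2), center=(3.0, 0.0), radius=1.0)
```

A circle around the zero of $\psi$ returns the degree $d$, while a circle that does not enclose the zero returns 0, in line with the sum over zeros in the formula above.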
Solutions for Persistent Current

A current flowing in a superconducting ring with no decay even in the absence of an applied magnetic field is called a persistent current. Assume that a superconducting sample $\Omega \subset \mathbb{R}^3$ is surrounded by vacuum and adopt the energy functional

$$E(\psi, A) = \int_\Omega \left( |D_A\psi|^2 + \frac{\kappa^2}{2}(1 - |\psi|^2)^2 \right) dx + \int_{\mathbb{R}^3} |\operatorname{curl} A|^2\, dx \qquad [17]$$

Although the functional [17] is minimized by a trivial solution $(\psi, A) = (\exp(ic), 0)$ ($c \in \mathbb{R}$), which is the case for perfect diamagnetism, this is not the solution describing a persistent current since $J = 0$ everywhere. We have to look for a nontrivial solution that locally minimizes the energy, that is, a local minimizer of [17].

To characterize a solution representing the persistent current, we define a mapping from $\overline{\Omega}$ to $S^1 \subset \mathbb{C}$ by $x \in \overline{\Omega} \mapsto \psi(x)/|\psi(x)|$ for a solution $(\psi, A)$ of the Ginzburg–Landau equations corresponding to [17]. Consider a domain $\Omega$ having infinitely many homotopy classes in the space of continuous functions $C^0(\overline{\Omega}, S^1)$ (e.g., a solid torus). If $(\psi, A)$ is a local minimizer and $\psi/|\psi|$ is not homotopic to a constant map in $C^0(\overline{\Omega}, S^1)$, then it is a solution describing a persistent current. The existence of such a solution has been established mathematically for large $\kappa$ (Jimbo and Morita 1996, Rubinstein and Sternberg 1996).
Configuration of Solutions under an Applied Magnetic Field

In the presence of an applied magnetic field, a sample exhibits, according to the magnitude of the field, a transition from the superconducting state to the normal state and vice versa. This transition can be considered mathematically as a bifurcation of solutions to the Ginzburg–Landau equations with a parameter measuring the magnitude of the applied magnetic field. In fact, let $H_{ap}$ be an applied magnetic field perpendicular to the horizontal plane and assume that it is constant along the vertical axis, that is, $H_{ap} = (0, 0, H_a)$. Then a rich bifurcation structure is suggested by numerical and analytical studies in the parameter space of $(H_a, \kappa)$. Mathematical developments in variational methods and nonlinear analysis reveal the configuration of the solutions and provide rigorous estimates for critical fields, predicted by physicists, in a parameter regime for a two-dimensional model.

Throughout this section, we consider the Ginzburg–Landau model in an infinite cylinder $\Omega = D \times \mathbb{R}$ ($D \subset \mathbb{R}^2$) with a constant applied magnetic field $H_{ap} = H_a e_3 = (0, 0, H_a)$, $H_a > 0$. Assuming uniformity along the vertical axis, we may write $A = (A_1, A_2)$ and $H = \operatorname{curl} A = \partial_{x_1}A_2 - \partial_{x_2}A_1$ as in the previous section. Then the Ginzburg–Landau energy on $D$ is

$$E(\psi, A) = \int_D \left( |D_A\psi|^2 + \frac{\kappa^2}{2}(1 - |\psi|^2)^2 + |\operatorname{curl} A - H_a|^2 \right) dx \qquad [18]$$

With the London gauge

$$\operatorname{div} A = 0 \quad \text{in } D, \qquad A \cdot n = 0 \quad \text{on } \partial D$$

the Ginzburg–Landau equations in the present setting are written as

$$D_A^2\psi = \kappa^2(|\psi|^2 - 1)\psi \quad \text{in } D \qquad [19]$$

$$-\nabla^2 A = \mathrm{Im}(\psi^* D_A\psi) \quad \text{in } D \qquad [20]$$

$$n \cdot \nabla\psi = 0 \quad \text{on } \partial D \qquad [21]$$

$$\operatorname{curl} A = H_a \quad \text{on } \partial D \qquad [22]$$

where $n$ denotes the outer unit normal.

Meissner Solutions
As seen in the case of no applied magnetic field, the trivial solution $(\Psi, A) = (\exp(ic), 0)$ is a minimizer of [18] when $H_a = 0$. This solution has no magnetic field in the sample. In a superconducting sample, diamagnetism persists even in the presence of an applied magnetic field, provided the field is weak. Namely, the sample is shielded so that the field penetrates only near the surface of the sample. This phenomenon is called the Meissner effect, and a solution expressing it is called a Meissner solution. Mathematically, as $H_a$ increases, such a Meissner solution continues from the trivial solution, and the solution preserves the configuration $0 < |\Psi(x)| < 1$. A study of the asymptotic behavior of the Meissner solution as $\kappa$ tends to $\infty$ shows that the Meissner solution is a minimizer up to $H_a = O(\log \kappa)$ for sufficiently large $\kappa$ (Serfaty 1999).
Nucleation of Superconductivity
In an experiment, the Meissner state breaks down under a stronger applied magnetic field. The sample then either turns to the normal state (in a type I superconductor) or allows a mixed state of superconductivity and normal state (in a type II superconductor). In the former case, the critical magnitude of the field is denoted by $H_c$, which corresponds to the one of [1], while it is denoted by $H_{c1}$ in the latter case. Moreover, the mixed state eventually breaks down to the normal state when the applied field is increased further, up to another critical field $H_{c2}$. To characterize these two types mathematically, we consider a transition from the normal state to the superconducting state by reducing the magnitude of the field. Let $A_{ap}$ satisfy $\operatorname{curl} A_{ap} = H_a$ $(x \in D)$ and $A_{ap} \cdot n = 0$ $(x \in \partial D)$. Then eqns [19]–[22] have a trivial solution $(\Psi, A) = (0, A_{ap})$, which stands for the normal state. Consider the second variation of the energy functional [18] at this trivial solution:
$$\frac{1}{2}\,\frac{d^2}{ds^2} E(s\phi, A_{ap} + sB)\Big|_{s=0} = \int_D \Big\{ |(\nabla - iA_{ap})\phi|^2 - \kappa^2 |\phi|^2 + |\operatorname{curl} B|^2 \Big\}\, dx$$
If the minimum of this second variation over normalized nonzero $(\phi, B)$ is positive (or negative), then the trivial solution is stable (or unstable). The minimum gives the least eigenvalue of the linearized problem of [19]–[20] around the trivial solution. Seeking such a least eigenvalue reduces to studying an eigenvalue problem for the Schrödinger operator $L[\phi] := -(\nabla - iA_{ap})^2 \phi$. If the domain $D$ is the whole space $\mathbb{R}^2$, the least eigenvalue is proved to be $H_a$. Going back to the original variable of [2], we can define a critical field $H_{c2} = \sqrt{2}\,\kappa H_c$; the value $\kappa = 1/\sqrt{2}$ separates superconductors into type I, with $\kappa < 1/\sqrt{2}$ $(H_{c2} < H_c)$, and type II, with $\kappa > 1/\sqrt{2}$ $(H_{c2} > H_c)$. In a bounded domain $D$, however, the critical field at which superconductivity nucleates in the sample is larger than $H_{c2}$ (it is denoted by $H_{c3}$), since the eigenvalue problem for $L$ is considered in the domain with the Neumann boundary condition.
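The classification by the Ginzburg–Landau parameter can be illustrated with a few lines of code. This is a minimal sketch under the relation $H_{c2} = \sqrt{2}\,\kappa H_c$ quoted above; the function names are hypothetical and the fields are treated as dimensionless numbers.

```python
import math

# Threshold value of the Ginzburg-Landau parameter separating the two types.
KAPPA_CRITICAL = 1.0 / math.sqrt(2.0)


def upper_critical_field(kappa, hc):
    """Return H_c2 = sqrt(2) * kappa * H_c."""
    return math.sqrt(2.0) * kappa * hc


def superconductor_type(kappa):
    """Classify by kappa: type I below 1/sqrt(2), type II above it."""
    if kappa < KAPPA_CRITICAL:
        return "type I"
    if kappa > KAPPA_CRITICAL:
        return "type II"
    return "borderline"
```

For example, `superconductor_type(2.0)` gives `"type II"`, and in that case `upper_critical_field(2.0, hc)` exceeds `hc`, in line with $H_{c2} > H_c$.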
A study of the least eigenvalue shows that the critical field has the asymptotics
$$\frac{H_{c3}}{\sqrt{2}\,H_c} = \frac{\kappa}{\beta} + O(1), \qquad \kappa \to \infty$$
where $0 < \beta < 1$. If the applied field is very close to $H_{c3}$ and $\kappa$ is sufficiently large, the amplitude of the eigenfunction associated with the least eigenvalue of $L$ (with the Neumann boundary condition) is very small except in a $1/\kappa$ neighborhood of the boundary. This implies that the nucleation of superconductivity takes place at the boundary. This phenomenon is called surface nucleation (Del Pino et al. 2000, Lu and Pan 1999).
Solutions of Vortices
In a type II superconductor, it is well known that there exists a mixed state of superconductivity and normal state in the parameter regime $H_{c1} < H_a < H_{c2}$. In the mixed state, the magnetic field penetrating the sample is quantized and is carried by a finite number of lines or curves in the sample. This configuration (called a vortex) is characterized by the zero set of the order parameter of the Ginzburg–Landau equations. In a two-dimensional domain, isolated zeros of the order parameter are called vortices. It is thus an interesting problem how such a vortex configuration can be described mathematically by a minimizer of the energy functional. In the section "Ginzburg–Landau equations in $\mathbb{R}^2$," a specific configuration for vortex solutions is stated under very special conditions: $\kappa = 1/\sqrt{2}$, the whole space, and no applied magnetic field. However, this result does not generalize to the present setting.
A standard approach to a solution with the vortex configuration is a bifurcation analysis near the critical field $H_{c2}$ (or $H_{c3}$), expanding a solution and the difference $H_a - H_{c2}$ in a small parameter. The leading term is then given by an eigenfunction for the least eigenvalue of the Schrödinger operator coming from the linearization. Under doubly periodic conditions in the whole space $\mathbb{R}^2$, the spatial pattern of vortices, called Abrikosov's vortex lattice, is studied by a local bifurcation theory. However, this kind of bifurcation analysis only works near the critical field and the trivial solution $(\Psi, A) = (0, A_{ap})$, which implies that only a small-amplitude solution can be found. To realize a sharp configuration of vortices, we need to consider a parameter regime far from the bifurcation point. As a matter of fact, mathematical and numerical studies for sufficiently large $\kappa$ exhibit clear configurations of vortex solutions.
In this case, a sharp layer arises in a neighborhood of radius $O(1/\kappa)$ of each vortex, and there exists a solution with multiple vortices in an appropriate parameter region for $H_a$. In addition, as $H_a$ increases (up to $H_{c2}$), the number of vortices also increases. This implies that the minimizer of the energy functional [18] admits a larger number of zeros for a higher magnitude of applied magnetic field. This is puzzling, since a solution with a smaller number of vortices seems to have less energy. Thus, there is some balance mechanism between the contributions of the vortices and of the applied magnetic field to the total energy. Mathematically, it is possible to estimate $E(\Psi, A)$ for the vortex solution to [19]–[22] as follows. Consider a family of square tiles $K_j$ with side-length $\delta$ which are periodically arranged over the whole space. Assume each square in the domain $D$ has a single vortex. For an appropriate test function, the energy over $K_j$ is estimated as $O(\log(\delta\kappa))$. Since the number of vortices in the domain is $O(|D|/\delta^2)$ ($|D|$: the measure of $D$), we obtain an upper bound $O((|D|/\delta^2)\log(\delta\kappa))$. This bound is less than $E(0, A_{ap}) = |D|\kappa^2/2$ for $H_a/\kappa^2 = o(1)$ and $\delta = 1/\sqrt{H_a}$. Although in general it is difficult to estimate the energy of the minimizer from below, the leading order can be precisely determined in some range of the interval $(H_{c1}, H_{c2})$ if $\kappa$ is sufficiently large (Sandier and Serfaty 2000).
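The comparison between the tiling upper bound and the normal-state energy can be checked numerically. The sketch below is illustrative only: the constants hidden in the $O(\cdot)$ estimates are set to 1, which is an assumption, and the function names are hypothetical.

```python
import math


def vortex_tiling_bound(area, kappa, ha):
    """Order-of-magnitude upper bound (area / delta^2) * log(delta * kappa).

    Uses tiles of side delta = 1 / sqrt(ha), one vortex per tile.
    All O(.) constants are set to 1 for illustration.
    """
    delta = 1.0 / math.sqrt(ha)
    n_vortices = area / delta ** 2
    return n_vortices * math.log(delta * kappa)


def normal_state_energy(area, kappa):
    """Energy of the normal state, area * kappa^2 / 2."""
    return area * kappa ** 2 / 2.0
```

With `area = 1.0`, `kappa = 100.0` and `ha = 10.0`, so that the ratio of `ha` to `kappa ** 2` is small, the tiling bound is far below the normal-state energy, which is the regime in which the mixed state is energetically favored.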
A Simplified Model
Since the Ginzburg–Landau equations [4]–[5] are coupled equations for $\Psi$ and $A$, we often encounter mathematical difficulty in constructing a solution with the configuration observed in a numerical experiment. To study a specific configuration, we may use a simpler model equation. A typical simplification is to neglect the magnetic field, which leads to the following equation for the order parameter $\Phi$:
$$\nabla^2 \Phi + \lambda^2 (1 - |\Phi|^2)\Phi = 0 \quad \text{in } \Omega \qquad [23]$$
This equation is also called the Ginzburg–Landau equation, and it is the Euler–Lagrange equation of the energy
$$G_\lambda(\Phi) = \int_\Omega \Big\{ |\nabla \Phi|^2 + \frac{\lambda^2}{2}\,(1 - |\Phi|^2)^2 \Big\}\, dx \qquad [24]$$
in an appropriate function space. Under no constraint, a constant solution with $|\Phi| = 1$ is a minimizer. If the domain $\Omega$ is topologically nontrivial, eqn [23] also admits local minimizers of [24] for large $\lambda$, as seen in the section "Solutions for Persistent Current." On the other hand, [23] in a simply connected domain $D \subset \mathbb{R}^2$ with a boundary condition $\Phi = g(x)$ $(x \in \partial D)$ is used for the study of vortex solutions for large $\lambda$. Let $\epsilon = 1/\lambda$. Under the constraint $\deg(g, \partial D) = d$, a minimizer must have at least $|d|$ zeros. The leading order of the energy around each vortex is estimated as $2\pi \log(1/\epsilon)$. The result of Bethuel et al. (1994) describes the energy for a minimizer $\Phi_\epsilon$:
$$G_\lambda(\Phi_\epsilon) = 2\pi |d| \log(1/\epsilon) + \gamma + W(a_1, \ldots, a_{|d|}) + o(1)$$
where $\{a_j\}$ are the zeros of $\Phi_\epsilon$ and $\gamma$ is a universal constant. The function $W$ is explicitly given as
$$W(a_1, \ldots, a_{|d|}) = -2\pi \sum_{1 \le j, k \le |d|,\ j \ne k} \log |a_j - a_k| + R$$
where $R$ is derived from a Green function satisfying some boundary condition depending on $g$. Moreover, as $\epsilon \to 0$, the zeros converge to a minimizer of $W$, which implies that the asymptotic position of every zero (vortex) is determined by the explicit function $W$. The first term of $W$ shows that vortices with the same sign of the degree repel one another and that the optimal arrangement of vortices never allows the superposition of several vortices. Although the boundary condition is rather artificial, this mathematical formulation promoted the development of variational methods applied to the Ginzburg–Landau equation.
Time-Dependent Ginzburg–Landau Equations
The Ginzburg–Landau equations in the preceding sections are static models. We now consider time evolution models, called the time-dependent Ginzburg–Landau equations. These evolution equations are used in numerical simulations of the dynamical properties of solutions. They also give rise to mathematical problems on the global-in-time behavior of solutions, the stability of stationary solutions, dynamical laws of vortices, etc. The Ginzburg–Landau energy is denoted by $E(u)$, $u = (\Psi, A)$. The simplest model for the time-dependent problem is the gradient flow for $E(u)$,
$$\partial_t u = -\frac{\delta E}{\delta u}$$
where $\delta E/\delta u$ is the first variation of the energy. A more standard evolution equation in a nondimensional form is given by
$$(\partial_t + i\phi)\Psi - D_A^2 \Psi = \kappa^2 (1 - |\Psi|^2)\Psi \qquad [25]$$
$$\sigma(\partial_t A + \nabla\phi) + \operatorname{curl}^2 A = \operatorname{Im}(\bar\Psi\, D_A \Psi) + \operatorname{curl} H_{ap} \qquad [26]$$
where $\phi(x, t)$ is the electric (scalar) potential and $\sigma$ is a positive physical parameter. In fact, this equation was derived by Gor'kov and Eliashberg from the Bardeen, Cooper, and Schrieffer (BCS) theory. The system of equations [25]–[26] is invariant under the following time-dependent gauge transformation:
$$(\Psi, \phi, A) \mapsto (\Psi \exp(i\chi),\ \phi - \partial_t \chi,\ A + \nabla\chi)$$
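The gradient-flow structure can be illustrated on a toy problem. The sketch below is not the full system [25]–[26]: it assumes a real scalar field on a periodic one-dimensional grid, no magnetic field, and the energy $\int \{|u'|^2 + (\lambda^2/2)(1 - u^2)^2\}\,dx$, and it integrates the gradient flow with explicit Euler steps. The function names, grid sizes, and time step are all illustrative assumptions; the step is chosen small enough that the discrete energy does not increase.

```python
import math


def energy(u, dx, lam):
    """Discrete energy: sum of (|u'|^2 + (lam^2 / 2) * (1 - u^2)^2) * dx, periodic grid."""
    n = len(u)
    total = 0.0
    for i in range(n):
        du = (u[(i + 1) % n] - u[i]) / dx  # forward difference
        total += (du * du + 0.5 * lam ** 2 * (1.0 - u[i] ** 2) ** 2) * dx
    return total


def gradient_flow_step(u, dx, dt, lam):
    """One explicit Euler step of u_t = 2 u_xx + 2 lam^2 (1 - u^2) u."""
    n = len(u)
    new_u = []
    for i in range(n):
        # u[i - 1] wraps around for i = 0, which gives the periodic boundary.
        lap = (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / dx ** 2
        rhs = 2.0 * lap + 2.0 * lam ** 2 * (1.0 - u[i] ** 2) * u[i]
        new_u.append(u[i] + dt * rhs)
    return new_u


def run(n=64, dx=0.1, dt=0.001, lam=1.0, steps=200):
    """Evolve a small sinusoidal initial state and record the energy at every step."""
    u = [0.5 * math.sin(2.0 * math.pi * i / n) for i in range(n)]
    energies = [energy(u, dx, lam)]
    for _ in range(steps):
        u = gradient_flow_step(u, dx, dt, lam)
        energies.append(energy(u, dx, lam))
    return energies
```

Calling `run()` returns a list of energies that is nonincreasing in time, which is the discrete counterpart of the energy dissipation expected of a gradient flow.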
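The equations in the bounded domain $D \subset \mathbb{R}^2$ are considered subject to the boundary and initial conditions
$$\begin{aligned} D_A \Psi \cdot n &= 0 && \text{on } \partial D \times (0, T) \\ \operatorname{curl} A &= H_a && \text{on } \partial D \times (0, T) \\ \Psi(x, 0) &= \Psi_0(x) && \text{in } D \\ A(x, 0) &= A_0(x) && \text{in } D \end{aligned} \qquad [27]$$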
Then, besides the Coulomb gauge [8], we can choose the Lorentz gauge as follows: Z div A þ ¼ 0 in D; dx ¼ 0 D
An¼0
on @D
For a smooth solution u(x, t) to [25]–[26] with [27], Z d EðuÞ ¼ 2 jð@t þ iÞj2 þ j@t A þ rj2 dx 0 dt holds if Hap is time independent. This is also true in the case of the whole space R2 with a condition for the asymptotic behavior as jxj ! 1. Suppose that a domain R3 is occupied by a superconducting sample and it is surrounded by a medium (or vacuum). Then the electromagnetic behavior in the outside domain, caused by the induced magnetic field of a supercurrent in and an applied magnetic field, should be expressed by the Maxwell equations. With the electric field E = (@t A þ r), we obtain @t E E þ curl2 A ¼ curl Hap
in R3 n
where , , and are physical parameters (e.g., = 0 in the vacuum). To match the inside and the outside of , appropriate boundary conditions are required. From a point of the gauge theory as in the section ‘‘Ginzburg–Landau equations in R 2 ,’’ the following time-dependent equations in the whole space are also considered: ð@t þ iÞ2 D2A ¼ 2 ð1 jj2 Þ @t E þ curl2 A ¼ Imð DA Þ r E ¼ Imð ð@t þ iÞÞ
Other Topics
In realistic problems, a superconducting sample contains impurities. This inhomogeneity is usually expressed by putting a variable coefficient into the Ginzburg–Landau energy and the equations. Such a model with a variable coefficient is useful in studies of the pinning of vortices, the Josephson effect through an inhomogeneous medium, etc. A model of a thin film with variable thickness is also described by the Ginzburg–Landau equations with a variable coefficient. Since the Ginzburg–Landau equations (or a modified model) can be considered in various settings, further applications to realistic problems can be expected as nonlinear analysis develops.
See also: Abelian Higgs Vortices; Bifurcation Theory; Evolution Equations: Linear and Nonlinear; High Tc Superconductor Theory; Image Processing: Mathematics; Integrable Systems: Overview; Interacting Stochastic Particle Systems; Ljusternik–Schnirelman Theory; Nonlinear Schrödinger Equations; Quantum Phase Transitions; Variational Techniques for Ginzburg–Landau Energies.
Further Reading
Bethuel F, Brezis H, and Hélein F (1994) Ginzburg–Landau Vortices. Boston: Birkhäuser.
Chapman SJ, Howison SD, and Ockendon JR (1992) Macroscopic models for superconductivity. SIAM Review 34: 529–560.
Del Pino M, Felmer PL, and Sternberg P (2000) Boundary concentration for the eigenvalue problems related to the onset of superconductivity. Communications in Mathematical Physics 210: 413–446.
Du Q, Gunzburger MD, and Peterson JS (1992) Analysis and approximation of the Ginzburg–Landau model of superconductivity. SIAM Review 34: 54–81.
Helffer B and Morame A (2001) Magnetic bottles in connection with superconductivity. Journal of Functional Analysis 185: 604–680.
Hoffmann K-H and Tang Q (2001) Ginzburg–Landau Phase Transition Theory and Superconductivity. Basel: Birkhäuser.
Jaffe A and Taubes C (1980) Vortices and Monopoles. Boston: Birkhäuser.
Jimbo S and Morita Y (1996) Ginzburg–Landau equations and stable solutions in a rotational domain. SIAM Journal on Mathematical Analysis 27: 1360–1385.
Lin FH and Du Q (1997) Ginzburg–Landau vortices: dynamics, pinning and hysteresis. SIAM Journal on Mathematical Analysis 28: 1265–1293.
Lu K and Pan X-B (1999) Estimates of the upper critical field for the Ginzburg–Landau equations of superconductivity. Physica D 127: 73–104.
Rubinstein J and Sternberg P (1996) Homotopy classification of minimizers of the Ginzburg–Landau energy and the existence of permanent currents. Communications in Mathematical Physics 179: 257–263.
Sandier E and Serfaty S (2000) On the energy of type-II superconductors in the mixed phase. Reviews in Mathematical Physics 12: 1219–1257.
Serfaty S (1999) Stable configurations in superconductivity: uniqueness, multiplicity, and vortex-nucleation. Archive for Rational Mechanics and Analysis 149: 329–365.
Tinkham M (1996) Introduction to Superconductivity, 2nd edn. New York: McGraw-Hill.
Glassy Disordered Systems: Dynamical Evolution
S Franz, The Abdus Salam ICTP, Trieste, Italy
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Many macroscopic systems, if left to evolve in isolation or in contact with a bath, are able to relax, after a finite time, to history-independent equilibrium states characterized by time-independent values of the state variables and time-translation-invariant correlations. In glassy systems, the relaxation time becomes so large that equilibrium behavior is never observed. On short timescales, the microscopic degrees of freedom appear to be frozen in far-from-equilibrium disordered states. On longer timescales, slow, history-dependent, off-equilibrium relaxation phenomena become detectable. The list of physical systems falling into disordered glassy states at low temperature is long. To mention just a few examples, one can cite the canonical case of simple and complex liquid systems undergoing a glass transition, polymeric glasses, dipolar glasses, spin glasses, charge density wave systems, vortex systems in type II superconductors, and many other systems.
Experimental and theoretical research has pointed out the existence of dynamical scaling laws characterizing the off-equilibrium evolution of glassy systems. These laws, in turn, reflect the statistical properties of the regions of configuration space explored during relaxation. The goal of a theory of glassy systems is to understand the mechanisms that lead to the growth of the relaxation time and the nature of the scaling laws in off-equilibrium relaxation. A well-developed description of glassy phenomena is provided by mean-field theory based on spin glass models, which gives a coherent framework that is able to describe the dynamics of glassy systems and provides a statistical interpretation of glassy relaxation. Despite important limitations of the mean-field description for finite-dimensional systems, it allows precise discussions of general concepts such as effective temperatures and configurational entropies that have been successfully applied to the description of glassy systems.
A Glimpse of Freezing Phenomenology
In the following, examples of two different ways of freezing will be discussed: spin glasses, where disorder is built into the random nature of the coupling between the dynamical variables, and structural glasses, where the disordered nature of the frozen state has a self-induced character.
Spin Glasses
The archetypical example of systems undergoing the complex dynamical phenomena described in this article is the case of spin glasses (Fischer and Hertz 1991, Young 1997). Spin glass materials are magnetic systems where the magnetic atoms occupy random positions in lattices formed by nonmagnetic matrices, fixed at the moment of the preparation of the material. The exchange interaction between the spins of the magnetic impurities in these materials is an oscillating function, taking positive and negative values according to the distance between the atoms. Spin glass models (see Spin Glasses, Mean Field Spin Glasses and Neural Networks, and Short-Range Spin Glasses: The Metastate Approach) are defined by giving the form of the exchange Hamiltonian, describing the interaction between the spins $S_i$ of the magnetic atoms. In the presence of an external magnetic field $h$, the exchange Hamiltonian can be written as
$$H = -\sum_{i, j \in \Lambda} J_{ij} S_i S_j - h \sum_{i \in \Lambda} S_i \qquad [1]$$
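To make the Hamiltonian [1] and the notion of frustration concrete, the following minimal sketch evaluates the energy of an Ising configuration and finds the ground-state energy of a very small system by exhaustive enumeration. It assumes that each interacting pair is counted once and that the couplings are stored in a dictionary keyed by pairs of site indices; the function names are hypothetical.

```python
from itertools import product


def spin_glass_energy(couplings, spins, h=0.0):
    """Energy -sum J_ab S_a S_b - h sum S_a, each pair (a, b) counted once."""
    interaction = sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
    return -interaction - h * sum(spins)


def ground_state_energy(couplings, n_spins, h=0.0):
    """Brute-force minimum over all 2^n Ising configurations (small n only)."""
    return min(
        spin_glass_energy(couplings, s, h)
        for s in product((-1, 1), repeat=n_spins)
    )
```

For a triangle with all couplings equal to $-1$, the ground-state energy is $-1$ rather than $-3$: no configuration satisfies all three bonds at once, which is the simplest instance of frustration.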
The spin variable can have a classical or a quantum nature. This article will be limited to the physics of classical systems. The most common choice in models is to use Ising variables $S_i = \pm 1$. The couplings $J_{ij}$, which in real materials depend on the distance, are most commonly chosen to be independent random variables with a distribution supported on both positive and negative values. Most commonly, one considers either a symmetric bimodal distribution on $\{-1, 1\}$ or a symmetric Gaussian. The sums are restricted to lattices of various types. The most common choices are $\Lambda = \mathbb{Z}^d$ for the Edwards–Anderson model, the complete graph $\Lambda = \{(i, j) \mid i < j;\ i, j = 1, \ldots, N\}$ for the Sherrington–Kirkpatrick (SK) model, and the Erdős–Rényi random graph for the Viana–Bray (VB) model. The presence of interactions of both signs induces frustration in the system: the impossibility of minimizing all the terms of the Hamiltonian at the same time. One then has a complex energy landscape, where relaxation to equilibrium is hampered by barriers of energetic and entropic nature.
Spin glass materials, which have a paramagnetic behavior at high temperature, show glassy behavior at low temperature, where magnetic degrees of freedom appear to be frozen for long times in apparently random directions. There is quite a general consensus, based on the analysis of experimental data and numerical simulations, that in three dimensions and in the absence of a magnetic field, the two regimes are separated by a thermodynamic phase transition at a temperature $T_c$ where the magnetic susceptibility $\chi$ exhibits a cusp (see Figure 1). By linear response, $\chi$ is related to the equilibrium spin correlation function
$$\chi = \frac{1}{k_B T N} \sum_i \big( \langle S_i^2 \rangle - \langle S_i \rangle^2 \big)$$
where $\langle \cdot \rangle$ denotes the Boltzmann–Gibbs average. A cusp in $\chi$ indicates a second-order transition where the so-called Edwards–Anderson parameter $q = (1/N) \sum_i \langle S_i \rangle^2$ becomes different from zero, indicating freezing of the spins in random directions. In the presence of a magnetic field, although the low-temperature phenomenology is similar to the one at zero field, the thermodynamic nature of the freezing transition is more controversial. Theoretically, mean-field theory, based on the SK model, predicts a phase transition with a cusp in the susceptibility both in the absence and in the presence of a magnetic field. Unfortunately, no firm theoretical result is available on the existence and the nature of phase transitions in finite-dimensional spin glass models, which remains a completely open problem.
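The susceptibility and the Edwards–Anderson parameter can both be estimated from a set of sampled spin configurations. The sketch below assumes that the Boltzmann–Gibbs average is approximated by a plain average over equally weighted samples (for instance from a Monte Carlo run, which is not shown), and it sets $k_B = 1$ by default; the function names are hypothetical.

```python
def site_averages(samples):
    """Per-site sample averages of S_i and S_i^2 over equally weighted configurations."""
    n_samples = len(samples)
    n_spins = len(samples[0])
    mean = [sum(s[i] for s in samples) / n_samples for i in range(n_spins)]
    mean_sq = [sum(s[i] ** 2 for s in samples) / n_samples for i in range(n_spins)]
    return mean, mean_sq


def edwards_anderson_q(samples):
    """q = (1/N) * sum_i <S_i>^2."""
    mean, _ = site_averages(samples)
    return sum(m * m for m in mean) / len(mean)


def susceptibility(samples, temperature, k_b=1.0):
    """chi = (1 / (k_B T N)) * sum_i (<S_i^2> - <S_i>^2)."""
    mean, mean_sq = site_averages(samples)
    n_spins = len(mean)
    fluctuation = sum(sq - m * m for m, sq in zip(mean, mean_sq))
    return fluctuation / (k_b * temperature * n_spins)
```

For Ising spins $S_i^2 = 1$, so under these assumptions the two quantities are tied together by $\chi = (1 - q)/(k_B T)$: a site that is frozen in every sample contributes to $q$ and not to $\chi$.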
Figure 1 Magnetic susceptibility χ (10⁻⁵ emu/g) as a function of temperature T (K) in spin glass materials (curves shown for Ag – 1.0% Mn, Au – 0.2% Cr (×10), Au – 0.5% Mn, Ag – 0.5% Mn, and Cu – 0.1% Mn). Reproduced from Fischer KH and Hertz JA (1991) Spin Glasses. Cambridge, UK: Cambridge University Press.
Structural Glasses
Analogous freezing of dynamical variables is observed in a variety of systems. Some of them share with the spin glasses the presence of quenched disorder; in many others, this feature is absent. This is the case of structural glasses (Debenedetti 1996). Many liquids under fast enough cooling, instead of crystallizing, as dictated by equilibrium thermodynamics, form glasses. Simple liquids can be modeled as classical systems of particles with pairwise interactions. In the simplest example of a monoatomic liquid, the potential energy of a configuration is then written as
$$V(r_1, \ldots, r_N) = \sum_{i<j} \phi(r_i - r_j) \qquad [2]$$
[...] $N > 0$ is referred to as the lapse function, and where the time-dependent spatial vector field $X = X(x, t)$ with components $X^i = {}^{(n+1)}g_{0j}\, g^{ij}$, where $g^{ij}$ denotes the inverse of the spatial metric $g_{ij}$, is referred to as the shift vector field.
Let $\ell$ denote the dimension of length. In this article we use the convention that the spatial coordinates $(x^1, \ldots, x^n)$ are always dimensionless, but the time coordinate $t$ may have a dimension (see [19] and [36]). Since the line element $ds^2$ [5] has dimension $\ell^2$ and the spatial coordinates are dimensionless, the physical spatial metric coefficients $g_{ij}$ also have dimension $\ell^2$. If the time coordinate $t$ has a dimension, then the dimension of the lapse function $N$ is such that the quantity $N\,dt$ has dimension $\ell$ and the dimension of the shift vector field $X$ is such that the quantity $X\,dt$ is dimensionless.
We now briefly consider the canonical formulation of Einstein's equations. For more information regarding this formulation, see Arnowitt, Deser, and Misner (1962) (ADM) or Fischer and Marsden (1972) for a global perspective. We remark that the canonical formulation of gravity itself is local and is valid for any spatial topology of $M$. However, as we shall see, Hamiltonian reduction of gravity along the lines described in this article requires the topological restriction that $M$ be of negative Yamabe type.
The standard definition of the second fundamental form $k$, or extrinsic curvature, induced on a $t = \text{constant}$ hypersurface leads to the coordinate formula
$$k_{ij} = -\frac{1}{2N}\Big( \frac{\partial g_{ij}}{\partial t} - X_{i|j} - X_{j|i} \Big) \qquad [6]$$
where the vertical bar signifies covariant differentiation with respect to the spatial metric $g$ and spatial indices are raised and lowered using this metric. The natural momentum variable conjugate to $g$ turns out to be the 2-contravariant symmetric tensor density $\pi$ (that is, $\pi$ is a relative tensor of weight 1) whose components in a positively oriented local coordinate chart $(x^1, \ldots, x^n)$, that is, in a chart in the orientation atlas of $M$, are given by
$$\pi^{ij} = -\sqrt{\det g_{kl}}\,\big( k^{ij} - (\operatorname{tr}_g k)\, g^{ij} \big) \qquad [7]$$
where $k^{ij} = g^{ik} g^{jl} k_{kl}$ is the contravariant form of $k$, and where
$$\tau = \tau(g, k) = \operatorname{tr}_g k = g^{ij} k_{ij} \qquad [8]$$
is the trace of the second fundamental form, or the mean (extrinsic) curvature. From the coordinate formula [6] for the extrinsic curvature, we see that the components $k_{ij}$ have dimension $\ell^{-1}\ell^2 = \ell$ and thus the mean curvature $\tau = \operatorname{tr}_g k = g^{ij} k_{ij}$ has dimension $\ell^{-2}\ell = \ell^{-1}$.
Let $\sqrt{\det g}$ denote the (global) scalar density and $d\mu_g$ denote the (global) Riemannian measure on $M$ determined by the Riemannian metric $g$ (note that here $d$ is not the exterior derivative). Similarly, let $\mu_g$ denote the volume element, a nonvanishing $n$-form on $M$, determined by $g$ and the orientation on $M$. In a positively oriented local coordinate chart $(x^i) = (x^1, \ldots, x^n)$ on $M$,
$$(\sqrt{\det g})_{(x^i)} = \sqrt{\det g_{ij}}, \qquad (d\mu_g)_{(x^i)} = \sqrt{\det g_{ij}}\; dx^1 dx^2 \cdots dx^n = \sqrt{\det g_{ij}}\; d^n x$$
where $d^n x = dx^1 dx^2 \cdots dx^n$ is the Lebesgue measure in $\mathbb{R}^n$, and
$$(\mu_g)_{(x^i)} = \sqrt{\det g_{ij}}\; dx^1 \wedge dx^2 \wedge \cdots \wedge dx^n$$
We adopt the convention of suppressing the coordinate-chart designation $(x^i)$ so that one can, for example, write with some ambiguity $\sqrt{\det g} = (\sqrt{\det g})_{(x^i)} = \sqrt{\det g_{ij}}$. We let
$$\operatorname{vol}(M, g) = \int_M \mu_g = \int_M d\mu_g = \int_M \sqrt{\det g}\; d^n x \qquad [9]$$
denote the volume of the Riemannian manifold $(M, g)$, given by either the integral of the volume $n$-form $\mu_g$ or of the Riemannian measure $d\mu_g$ over $M$, which is given in the last integral in its coordinate form using the suppressed coordinate-chart convention adopted above. As expected, the spatial physical volume has dimension $(\ell^2)^{n/2} = \ell^n$.
We shall refer to the canonical variables $(g_{ij}, \pi^{ij})$ as the physical variables, in contrast to the reduced or conformal variables $(\gamma_{ij}, (p^{TT})^{ij})$ to be introduced later. Note that the mean curvature $\tau = \operatorname{tr}_g k$ is a scalar function on $M$ whereas $\operatorname{tr}_g \pi$ is a scalar density on $M$. Taking the trace of [7] expresses the mean curvature in terms of the canonical variables $(g, \pi)$,
$$\tau = \tau(g, \pi) = \operatorname{tr}_g k = \frac{1}{(n - 1)\sqrt{\det g}}\, \operatorname{tr}_g \pi \qquad [10]$$
Using [10], eqn [7] can be inverted to give $k$ in terms of $g$ and $\pi$,
$$k_{ij} = -\frac{1}{\sqrt{\det g}}\Big( \pi_{ij} - \frac{1}{n - 1}\,(\operatorname{tr}_g \pi)\, g_{ij} \Big) \qquad [11]$$
and then combined with [6] to give the kinematical equation
$$\frac{\partial g_{ij}}{\partial t} = \frac{2N}{\sqrt{\det g}}\Big( \pi_{ij} - \frac{1}{n - 1}\,(\operatorname{tr}_g \pi)\, g_{ij} \Big) + X_{i|j} + X_{j|i} \qquad [12]$$
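The passage from [7] to [10]–[12] can be spelled out in a few lines. The following worked steps are a sketch that assumes the sign conventions $k_{ij} = -\frac{1}{2N}(\partial_t g_{ij} - X_{i|j} - X_{j|i})$ and $\pi^{ij} = -\sqrt{\det g}\,(k^{ij} - \tau g^{ij})$; with the opposite sign convention for $\pi$, the signs below flip accordingly.

```latex
% Trace of the momentum, using g_{ij} g^{ij} = n and g_{ij} k^{ij} = \tau:
\operatorname{tr}_g \pi = g_{ij}\pi^{ij}
  = -\sqrt{\det g}\,(\tau - n\tau)
  = (n-1)\sqrt{\det g}\;\tau
\quad\Longrightarrow\quad
\tau = \frac{\operatorname{tr}_g \pi}{(n-1)\sqrt{\det g}} .

% Solving the definition of \pi for k and substituting \tau:
k_{ij} = -\frac{\pi_{ij}}{\sqrt{\det g}} + \tau\, g_{ij}
  = -\frac{1}{\sqrt{\det g}}
    \Big( \pi_{ij} - \frac{1}{n-1}(\operatorname{tr}_g \pi)\, g_{ij} \Big) .

% Inserting this into \partial_t g_{ij} = -2N k_{ij} + X_{i|j} + X_{j|i}:
\frac{\partial g_{ij}}{\partial t}
  = \frac{2N}{\sqrt{\det g}}
    \Big( \pi_{ij} - \frac{1}{n-1}(\operatorname{tr}_g \pi)\, g_{ij} \Big)
    + X_{i|j} + X_{j|i} .
```

The factor $n - 1$ comes entirely from the trace $g_{ij} g^{ij} = n$, which is why the inversion requires $n \geq 2$.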
In terms of the canonical variables $(g, \pi)$, a Hamiltonian form for the action for Einstein's vacuum field equations can be expressed as
$$I_{ADM}(g, \pi) = \int_I dt \int_M \Big( \pi^{ij}\,\frac{\partial g_{ij}}{\partial t} - N\,\mathcal{H}(g, \pi) - X^i \mathcal{J}_i(g, \pi) \Big)\, d^n x \qquad [13]$$
where $I = [t_0, t_1] \subset \mathbb{R}$ is a closed interval and where the Hamiltonian (scalar) density $\mathcal{H}(g, \pi)$ and the momentum (1-form) density $\mathcal{J}(g, \pi)$ are given by
$$\mathcal{H}(g, \pi) = \frac{1}{\sqrt{\det g}}\Big( \pi \cdot \pi - \frac{1}{n - 1}\,(\operatorname{tr}_g \pi)^2 \Big) - \sqrt{\det g}\; R(g) \qquad [14]$$
$$\phantom{\mathcal{H}(g, \pi)} = \frac{1}{\sqrt{\det g}}\Big( g_{ij} g_{kl}\, \pi^{ik} \pi^{jl} - \frac{1}{n - 1}\,(g_{ij} \pi^{ij})^2 \Big) - \sqrt{\det g}\; R(g) \qquad [15]$$
$$\mathcal{J}_i(g, \pi) = 2(\delta_g \pi)_i = -2\, g_{ij}\, \pi^{jk}{}_{|k} \qquad [16]$$
where $\pi \cdot \pi$ is the $g$-metric contraction of $\pi$ with itself, and where, as above, $R(g)$ is the scalar curvature of the spatial metric. We also note that each of the three terms in the integrand of [13] is a global scalar density and thus can be integrated over $M$ without any further involvement of the metric $g$. Variation of $I_{ADM}$ with respect to the lapse function and shift vector field yields the constraint equations
$$\mathcal{H}(g, \pi) = 0 \qquad [17]$$
$$\mathcal{J}_i(g, \pi) = 0 \qquad [18]$$
which comprise that subset of the empty space $(n + 1)$-Einstein field equations corresponding to the normal–normal and normal–tangential projections of the Einstein tensor relative to a $t = \text{constant}$ initial hypersurface. Variation of $I_{ADM}$ with respect to $\pi^{ij}$ reproduces the kinematical equation [12], whereas
variation of $I_{ADM}$ with respect to $g_{ij}$ generates the complementary tangential–tangential projections of Einstein's equations.
There are no evolution or constraint equations for either the lapse function $N$ or the shift vector field $X$ and therefore these quantities must be fixed by either externally imposed or implicitly defined gauge conditions. A convenient choice, for which a local existence and well-posedness theorem for the corresponding field equations can be established in any dimension $n \geq 2$, is given indirectly by imposing constancy of the mean curvature and a spatial harmonic gauge condition on each $t = \text{constant}$ slice (see Andersson and Moncrief (2003, 2004)). These constant mean curvature spatial harmonic (CMCSH) gauge conditions are given, respectively, by the equations
$$t = \tau \qquad [19]$$
$$g^{ij}\big( \Gamma^k_{ij}(g) - \Gamma^k_{ij}(\hat g) \big) = 0 \qquad [20]$$
where from [10], $\tau$ is a function of the canonical variables $(g, \pi)$ and where $\hat g$ is some convenient fixed spatial reference metric (or background metric) on $M$. The latter condition corresponds to the requirement that the identity map between the Riemannian manifolds $(M, g)$ and $(M, \hat g)$ be harmonic. Neither of these conditions involves the lapse function or shift vector field directly, but their preservation in time, implemented by the demand that the time derivatives of the given conditions be enforced, leads immediately to a linear elliptic system for $(N, X^i)$ which determines these variables.
The foregoing formalism is easily extended to the nonvacuum field equations in the presence of suitable material sources whose field equations are amenable to a constrained Hamiltonian treatment. To simplify the analysis, such sources will be ignored in the present discussion.
For the special case of Einstein gravity in $(2 + 1)$ dimensions, there is an elegant, alternative, triad-based formulation of the action functional as an $\operatorname{Isom}(\mathbb{R}^3_1)$-invariant gauge-theoretic Chern–Simons action, where $\operatorname{Isom}(\mathbb{R}^3_1)$ denotes the full isometry group, or the Poincaré group (= the inhomogeneous Lorentz group), of $(2 + 1)$-Minkowski space $\mathbb{R}^3_1$. For nondegenerate triads the resulting field equations for this alternative formulation can easily be shown to be equivalent to those of the conventional formalism when the latter is re-expressed in terms of triads, but the new formulation allows for meaningful field equations in the case of degenerate triads as well and thus suggests a potentially interesting generalization of the theory (see Carlip (1998) for details).
In any dimension $n \geq 2$, there is a well-known technique, pioneered by Lichnerowicz (1955), for solving the constraint equations on a constant mean curvature (CMC) hypersurface (see Choquet-Bruhat and York (1980) and Isenberg (1995)). Of major importance for the treatment of Hamiltonian reduction is that if $n = 2$ and $M = S^2_p$, $p \geq 2$, or if $n \geq 3$ and $M$ is of negative Yamabe type, then every Riemannian metric $g$ on $M$ is uniquely globally pointwise conformal to a metric $\gamma$ which satisfies $R(\gamma) = -1$ (see remark above [2]). Thus, from now on, we assume this topological condition on $M$. In this case, every Riemannian metric $g$ on $M$ can be uniquely expressed as
$$g_{ij} = \begin{cases} e^{2\varphi}\, \gamma_{ij} & \text{if } n = 2 \text{ and } M = S^2_p,\ p \geq 2 \\ \varphi^{4/(n-2)}\, \gamma_{ij} & \text{if } n \geq 3 \text{ and } M \text{ is of negative Yamabe type} \end{cases} \qquad [21]$$
with the conformal metric $\gamma$ normalized so that $R(\gamma) = -1$ and with the specific form of the conformal factor being chosen to simplify calculations involving the curvature tensors. In the case $n \geq 3$, $\varphi$ is positive and thus the space of all Riemannian metrics on $M$ is parametrized by $\mathcal{M}_{-1}$ and the space of scalar functions $\varphi > 0$ on $M$. The function $\varphi$ is then determined by solving the Hamiltonian constraint [17] (see also the remark before [33]). In the given CMC slicing and imposing the vacuum field equations, since by the momentum constraint $\pi$ must have zero divergence (see [16] and [18]), one finds that $\pi^{ij}$ must be expressible in the form
ij 1 ij ¼ TT þ ðtrg Þgij n
½22
where $\pi^{TT}$ is transverse (i.e., divergence-free) and traceless with respect to $g$. In the nonvacuum case, $\pi^{ij}$ picks up an additional summand determined by the sources in the modified momentum constraint [18]. Substitution of the foregoing decompositions of $(g_{ij}, \pi^{ij})$ into the Hamiltonian constraint leads to a nonlinear elliptic equation for $\varphi$ which, under the conditions assumed here, determines this function uniquely, provided $\tau \neq 0$. No solutions exist for $\tau = 0$ (equivalently, $\operatorname{tr}_g \pi = 0$) since from [14], [17], and [22], the Hamiltonian constraint would then immediately imply that
$$R(g) = \frac{1}{\det g}\,(\pi \cdot \pi) = \frac{1}{\det g}\,\pi^{TT} \cdot \pi^{TT} \ge 0 \qquad [23]$$
everywhere on M, which is not possible for a manifold M of negative Yamabe type. Instantaneous vanishing of the mean curvature, the defining property of a maximal hypersurface, would correspond to a moment at which an expanding universe
614 Hamiltonian Reduction of Einstein’s Equations
ceases to expand or a collapsing universe ceases to collapse. From [23], such behavior is topologically excluded here by the requirement that $M$ be of negative Yamabe type (see also the discussion after [36]).
In the unreduced formalism of $I_{ADM}$, the role of a super-Hamiltonian is played by the functional
$$H_{super}(g, \pi) = \int_M \bigl( N\,\mathcal{H}(g, \pi) + X_i\,\mathcal{J}^i(g, \pi) \bigr)\, d^n x \qquad [24]$$
which evidently vanishes whenever the constraints are satisfied. To achieve a fully reduced formulation wherein again the effective Hamiltonian would vanish, one could endeavor to solve the associated Hamilton–Jacobi equations
$$\mathcal{H}(g_{ij}, \delta S/\delta g_{ij}) = 0 \qquad [25]$$
$$\mathcal{J}^k(g_{ij}, \delta S/\delta g_{ij}) = 0 \qquad [26]$$
for a real-valued functional $S = S(g, \alpha^A)$ of the metric $g$ and a set of additional independent parameters $\alpha^A$. A complete solution $S(g_{ij}, \alpha^A)$ would be one for which an arbitrary solution $(g_{ij}, \pi^{ij})$ of the constraints could be realized as $(g_{ij}, \delta S/\delta g_{ij})$ for a suitable (unique) choice of the $\alpha^A$. A complementary set of reduced canonical variables $\beta_A$ (the momenta conjugate to the $\alpha^A$'s) could then be defined by $\beta_A = -\delta S/\delta \alpha^A$ and one could in principle solve the equations
$$\pi^{ij} = \frac{\delta S}{\delta g_{ij}} \qquad [27]$$
$$\beta_A = -\frac{\delta S}{\delta \alpha^A} \qquad [28]$$
for $(\alpha^A, \beta_A)$ as functionals of the canonical variables $(g_{ij}, \pi^{ij})$. This procedure, if it could be carried out, would ensure that these functionals $(\alpha^A(g, \pi), \beta_A(g, \pi))$ Poisson-commute with all of the constraints and hence are conserved for an arbitrary slicing of spacetime. Conversely, if a suitable set of gauge conditions such as the CMCSH conditions were imposed, one could in principle solve for the remaining independent canonical variables as functionals of the $(\alpha^A, \beta_A)$ and an internal variable, such as the mean curvature $\tau$, which plays the role of time, and hence solve the field equations for $(g_{ij}, \pi^{ij})$ in the chosen gauge.
This proposal is purely heuristic in (3+1) and higher dimensions in that there is no known procedure for finding the needed complete solution of the Hamilton–Jacobi equations in these cases. However, by exploiting the Chern–Simons analogy discussed earlier in this section, a complete solution can be found in (2+1) dimensions and the corresponding complete set of "observables" $(\alpha^A, \beta_A)$ identified. The latter are equivalent, up to a diffeomorphism of the associated reduced phase space, to a complete set of traces of holonomies of the flat Isom($\mathbb{R}^3_1$)-connections defined in this Chern–Simons formulation (see Carlip (1998) for more details).
The Reduced Hamiltonian
We continue with the assumption that $M$ is a connected closed oriented $n$-manifold, with either $n = 2$ and $M = \Sigma_p^2$, $p \ge 2$, or $n \ge 3$ and $M$ of negative Yamabe type. We now define the reduced phase space as the set of conformal variables given by
$$P_{reduced} = \{ (\gamma, p^{TT}) \mid \gamma \in \mathcal{M}_{-1} \text{ and } p^{TT} \text{ is a 2-contravariant symmetric tensor density that is transverse and traceless with respect to } \gamma \} \qquad [29]$$
We remark that the fully reduced phase space is given by $P_{reduced}/\mathcal{D}_0$, where $\mathcal{D}_0$ is the group of diffeomorphisms of $M$ isotopic to the identity. However, here, for clarity of exposition, we work on $P_{reduced}$ rather than the fully reduced phase space.
Given a scalar function $\varphi$, with $\varphi > 0$ if $n \ge 3$, the physical variables $(g, \pi^{TT})$ are related to the conformal variables $(\gamma, p^{TT})$ by
$$(g, \pi^{TT}) = \begin{cases} (e^{2\varphi}\gamma,\; e^{-2\varphi} p^{TT}) & \text{if } n = 2 \\ (\varphi^{4/(n-2)}\gamma,\; \varphi^{-4/(n-2)} p^{TT}) & \text{if } n \ge 3 \end{cases} \qquad [30]$$
We adopt the convention that raising and lowering of indices on either momentum variable $\pi^{TT}$ or $p^{TT}$ will be with respect to its own conjugate metric, either $g$ or $\gamma$, respectively. With this convention, the mixed forms of $\pi^{TT}$ and $p^{TT}$ are equal, since for $n \ge 3$,
$$(\pi^{TT})^i{}_j = g_{jl}\,\pi^{TT\,il} = \varphi^{4/(n-2)}\gamma_{jl}\,\varphi^{-4/(n-2)} p^{TT\,il} = \gamma_{jl}\, p^{TT\,il} = (p^{TT})^i{}_j \qquad [31]$$
(and similarly for the $n = 2$ case). Thus the squared norms of $p^{TT}$ and $\pi^{TT}$ are equal,
$$p^{TT} \cdot p^{TT} = \gamma_{ik}\gamma_{jl}\, p^{TT\,ij} p^{TT\,kl} = g_{ik} g_{jl}\, \pi^{TT\,ij} \pi^{TT\,kl} = \pi^{TT} \cdot \pi^{TT} \qquad [32]$$
where in the first term the center dot is $\gamma$-metric contraction and in the last term the center dot is $g$-metric contraction. The uniquely determined scalar factor $\varphi$ relating the physical metric $g$ to the conformal metric $\gamma$ is obtained by solving the Hamiltonian constraint
equation [17]. In the special case that $p^{TT} = 0$ (or equivalently, from [30], that $\pi^{TT} = 0$), $\varphi$ is constant and is given in the $n \ge 3$ case by
$$\varphi = \left( \frac{n}{(n-1)\tau^2} \right)^{(n-2)/4} \qquad [33]$$
Thus in this case
$$\gamma = \varphi^{-4/(n-2)} g = \frac{(n-1)\tau^2}{n}\, g \qquad [34]$$
In particular, since $\tau$ has the dimension $\ell^{-1}$ (see the remark after [8]) and the components $g_{ij}$ have the dimension $\ell^2$, we see from this formula that the conformal metric $\gamma_{ij}$ is dimensionless. Although $\varphi$ is not constant in the general case when $p^{TT} \neq 0$, its dimension, as in [33], is still $\ell^{(n-2)/2}$ and thus the components $\gamma_{ij}$ are still dimensionless in the general case. Since in the conventions used in this article, the spatial coordinates are dimensionless, the volume $\operatorname{vol}(M, \gamma)$ of the Riemannian manifold $(M, \gamma)$, as well as all curvature tensors of $\gamma$, are also dimensionless. Having a dimensionless conformal metric with a dimensionless volume has its advantages over the physical metric $g$ with dimension $\ell^2$ inasmuch as, as we shall see below, an infimum of the volume of the conformal metric is related to a dimensionless topological invariant of $M$ (see [48] and the remark thereafter).
If one now uses the conformal variables given by [30] and the decomposition [22] in the ADM action given by [13], one finds the reduced action to be
$$I_{reduced} = \int_I dt \int_M \left( p^{TT\,ij}\, \frac{\partial \gamma_{ij}}{\partial t} - \frac{2(n-1)}{n}\, \frac{\partial \tau}{\partial t}\, \sqrt{\det g} + \frac{2}{n}\, \frac{\partial\, \operatorname{tr}_g \pi}{\partial t} \right) d^n x \qquad [35]$$
In this expression one can discard the final time derivative which contributes only a boundary integral and so does not contribute to the equations of motion. Moreover, the conformal metric $\gamma_{ij}$ is constrained to lie in the intersection of $\mathcal{M}_{-1}$ and a slice for the action of $\mathcal{D}_0$ on $\mathcal{M}_{-1}$. This space can be regarded as a local chart for the reduced configuration space $\mathcal{T} = \mathcal{M}_{-1}/\mathcal{D}_0$, under the technical assumption that $\mathcal{T}$ is a manifold. Thus, taken together, the conformal variables $(\gamma_{ij}, p^{TT\,ij})$ can be viewed as local canonical coordinates for the cotangent bundle $T^*\mathcal{T}$ of Teichmüller space $\mathcal{T}$, where $T^*\mathcal{T}$ now plays the role of the reduced phase space.
For $n = 2$, these constructions can be carried out globally for the Teichmüller space $\mathcal{T}_p$ of an arbitrary closed oriented surface $\Sigma_p^2$, $p \ge 2$ (see the remarks after [4]). Using these global constructions, the
reduced phase space $T^*\mathcal{T}_p$ for the (2+1)-reduced Einstein equations can be modeled explicitly.
Having restricted the slices to be CMC, one need only choose the relationship between the time coordinate and the CMC in order to fix a corresponding reduced Hamiltonian. The most natural choice of time coordinate from the present point of view is to take
$$t = t(\tau) = \frac{2}{n(-\tau)^{n-1}} \qquad [36]$$
Note that this choice of time coordinate, although also denoted $t$, is no longer dimensionless but has dimension $\ell^{n-1}$. This choice of time coordinate is motivated by three considerations.
Firstly, we remark that since $\tau = 0$ is excluded in the setting used in this article (see [23] and the discussion after), $\tau$ can range in either the domain $\mathbb{R}^- = (-\infty, 0)$ or $\mathbb{R}^+ = (0, \infty)$. The usual convention on the sign of $k$, as adopted here, is that the sign of $k$ is negative when the tips of the normals on a spacelike hypersurface are further apart than their bases, as for example in the expansion of a model universe, in which case $\tau = \operatorname{tr}_g k < 0$. Thus, with this convention, $\tau$ in the range $\mathbb{R}^-$ corresponds to an expanding universe and $\tau$ in the range $\mathbb{R}^+$ corresponds to a collapsing one in the future direction of increasing $t$. Thus for manifolds of negative Yamabe type that we consider here, the expected maximal range of the CMC is $\mathbb{R}^-$, for which $\tau \to -\infty$ corresponds to a "crushing singular" big bang of vanishing spatial volume and $\tau \to 0^-$ corresponds to the limit of infinite volume expansion. Then, with the time function given by [36], the coordinate time $t$ ranges in the interval $\mathbb{R}^+$, vanishes at the big bang, and tends to positive infinity in the limit of infinite cosmological expansion. We remark that to prove that a solution determined by Cauchy data prescribed at some initial coordinate time $t_0 \in \mathbb{R}^+$ actually exhausts the range $\mathbb{R}^+$ is a difficult global existence problem that is not dealt with here. Nevertheless, one of the main motivations for this work is the hope that Hamiltonian reduction will lead to advances in the study of the global existence question for Einstein's equations.
We also remark that with the choice of temporal gauge function given by [36] and with $\tau$ in its natural range $\mathbb{R}^-$,
$$\frac{d\tau}{dt} = \frac{n}{2(n-1)}\,(-\tau)^n > 0 \qquad [37]$$
so that this temporal coordinate choice preserves the time orientation of the flow for all $n \ge 2$.
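The relation between [36] and [37] can be checked numerically. The following is a minimal sketch, not part of the original derivation; the function names `cmc_time`, `dtau_dt`, and `dtau_dt_numeric` are hypothetical and chosen only for illustration. It compares the closed form [37] with a finite-difference estimate obtained from [36].

```python
import math


def cmc_time(tau: float, n: int) -> float:
    """Time function of [36]: t = 2 / (n * (-tau)**(n - 1)), for tau < 0."""
    if tau >= 0:
        raise ValueError("tau must be negative (expanding branch)")
    return 2.0 / (n * (-tau) ** (n - 1))


def dtau_dt(tau: float, n: int) -> float:
    """Closed form of [37]: d(tau)/dt = n / (2 (n - 1)) * (-tau)**n."""
    return n / (2.0 * (n - 1)) * (-tau) ** n


def dtau_dt_numeric(tau: float, n: int, h: float = 1e-6) -> float:
    """Central-difference estimate of d(tau)/dt as 1 / (dt/d(tau))."""
    dt_dtau = (cmc_time(tau + h, n) - cmc_time(tau - h, n)) / (2.0 * h)
    return 1.0 / dt_dtau
```

Since `cmc_time` increases as $\tau$ increases toward $0^-$, the sketch also illustrates that $t$ grows in the direction of cosmological expansion.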
Secondly, with this choice of temporal gauge, the reduced action given by [35] simplifies to
$$I_{reduced} = \int_I dt \int_M \left( p^{TT\,ij}\, \frac{\partial \gamma_{ij}}{\partial t} - (-\tau)^n \sqrt{\det g} \right) d^n x \qquad [38]$$
from which one can read off an effective reduced Hamiltonian density,
$$\mathcal{H}_{reduced}(\tau, \gamma, p^{TT}) = (-\tau)^n \sqrt{\det g} \qquad [39]$$
and an effective reduced Hamiltonian,
$$H_{reduced}(\tau, \gamma, p^{TT}) = \int_M (-\tau)^n \sqrt{\det g}\, d^n x = (-\tau)^n \int_M d\mu_g = (-\tau)^n \operatorname{vol}(M, g) \qquad [40]$$
where $\operatorname{vol}(M, g) = \int_M d\mu_g$ is the volume of the Riemannian manifold $(M, g)$. Thus in terms of the physical variables $(g_{ij}, \pi^{ij})$, the reduced Hamiltonian $H_{reduced}$ at "time" $\tau$ is simply the volume of the CMC slice with mean curvature $\tau$ rescaled by the factor $(-\tau)^n$. With this reduced Hamiltonian density, the reduced action [38] takes the canonical form
$$I_{reduced} = \int_I dt \int_M \left( p^{TT\,ij}\, \frac{\partial \gamma_{ij}}{\partial t} - \mathcal{H}_{reduced} \right) d^n x \qquad [41]$$
As the third consideration for the given choice of the time function, we note that rescaling the physical volume $\operatorname{vol}(M, g)$ by the factor $(-\tau)^n$ yields a dimensionless quantity. Indeed, as we have seen, the spatial physical volume has the dimension $\ell^n$ and the constant mean curvature $\tau$ has the dimension $\ell^{-1}$, so that the reduced Hamiltonian $(-\tau)^n \operatorname{vol}(M, g)$ is dimensionless. The main advantage of having a dimensionless reduced Hamiltonian is that only such a reduced Hamiltonian can have a topological significance, and indeed, the infimum of $H_{reduced}$ is closely related to a dimensionless topological invariant of $M$ (see the remarks after [48]).
In terms of the conformal variables $(\gamma, p^{TT})$, the reduced Hamiltonian is found from [21] and [40] to be given for $n \ge 3$ by
$$\begin{aligned} H_{reduced}(\tau, \gamma, p^{TT}) &= (-\tau)^n \int_M \sqrt{\det g}\, d^n x \\ &= (-\tau)^n \int_M \sqrt{\det(\varphi^{4/(n-2)}\gamma)}\, d^n x \\ &= (-\tau)^n \int_M (\varphi^{4/(n-2)})^{n/2} \sqrt{\det \gamma}\, d^n x \\ &= (-\tau)^n \int_M \varphi^{2n/(n-2)}\, d\mu_\gamma \end{aligned} \qquad [42]$$
where $d\mu_\gamma$ is the Riemannian measure on $M$ determined by $\gamma$ (locally, $d\mu_\gamma = \sqrt{\det \gamma}\, d^n x$) and $\varphi = \varphi(\tau, \gamma, p^{TT})$ is the conformal factor which, through the solution of the Hamiltonian constraint [17], is expressed as a function of the "time" $t$ and the independent conformal (or canonical) variables $(\gamma, p^{TT})$.
In the special case $n = 2$, $M = \Sigma_p^2$, $p \ge 2$, a simple formula for $H_{reduced}$ can be derived. In terms of the conformal variables $(\gamma, p^{TT})$, we find from [40], [10], [14], [17], [21], [22], and [32] that
$$\begin{aligned} H_{reduced}(\tau, \gamma, p^{TT}) &= \int_{\Sigma_p^2} (-\tau)^2\, d\mu_g = 2 \int_{\Sigma_p^2} \bigl( (\det g)^{-1}\, \pi^{TT} \cdot \pi^{TT} - R(g) \bigr)\, d\mu_g \\ &= 2 \int_{\Sigma_p^2} (\det(e^{2\varphi}\gamma))^{-1}\, p^{TT} \cdot p^{TT}\, d\mu_{e^{2\varphi}\gamma} - 2 \int_{\Sigma_p^2} R(g)\, d\mu_g \\ &= 2 \int_{\Sigma_p^2} (e^{2\varphi})^{-2} (\det \gamma)^{-1}\, p^{TT} \cdot p^{TT}\, e^{2\varphi}\, d\mu_\gamma - 8\pi \chi(\Sigma_p^2) \\ &= 2 \int_{\Sigma_p^2} e^{-2\varphi} (\det \gamma)^{-1}\, p^{TT} \cdot p^{TT}\, d\mu_\gamma + 16\pi(p-1) \end{aligned} \qquad [43]$$
where $\varphi = \varphi(\tau, \gamma, p^{TT})$, $\chi(\Sigma_p^2) = 2(1-p)$ is the Euler characteristic of the genus $p$ surface $\Sigma_p^2$, and where we have used the Gauss–Bonnet theorem
$$\int_{\Sigma_p^2} R(g)\, d\mu_g = 4\pi \chi(\Sigma_p^2) = 8\pi(1-p) \qquad [44]$$
Since
$$H_{reduced}(\tau, \gamma, p^{TT}) = 2 \int_{\Sigma_p^2} e^{-2\varphi} (\det \gamma)^{-1} (p^{TT} \cdot p^{TT})\, d\mu_\gamma + 16\pi(p-1) \ge 16\pi(p-1) \qquad [45]$$
the infimum of $H_{reduced}$ is attained precisely when $p^{TT} = 0$ and this infimum coincides with the topological invariant $-8\pi\chi(\Sigma_p^2) = 16\pi(p-1)$, which characterizes the surface $\Sigma_p^2$ (see also [51] below). As we shall see shortly, an analogous result holds for $n \ge 3$.
A straightforward but lengthy calculation, which is valid in arbitrary dimensions, shows that the reduced Hamiltonian is strictly monotonically decreasing in the direction of cosmological expansion except for a family of continuously self-similar spacetimes for which this Hamiltonian is constant (Fischer and Moncrief 2002b). The latter solutions exist if and only if $M$ admits a Riemannian metric $\gamma \in \mathcal{M}_{-1}$ which is an Einstein metric, that is, for which the Ricci tensor satisfies $\operatorname{Ric}(\gamma) = -(1/n)\gamma$. Using the mean curvature as a convenient time coordinate, that is, temporarily taking $t = \tau$, the
corresponding self-similar vacuum spacetime metrics then have the line element
$$ds^2 = -\left( \frac{n}{\tau^2} \right)^2 d\tau^2 + \frac{n}{(n-1)\tau^2}\, \gamma_{ij}\, dx^i dx^j \qquad [46]$$
In the case that $n = 3$, the Einstein metric $\gamma$ is actually hyperbolic with constant sectional curvature $K(\gamma) = -1/6$ and Ricci curvature $\operatorname{Ric}(\gamma) = -(1/3)\gamma$. Although the conformal variables $(\gamma, p^{TT}) = (\gamma, 0)$ are static in this model, the physical variables $(g, \pi)$ are not. In this case, the resulting spacetimes (which depend on the underlying topology of $M$) have expanding closed hyperbolic spacelike hypersurfaces where the physical volume $\operatorname{vol}(M, g)$ "starts" at zero at the big bang and expands to infinity in the forward time direction, as befits a universe endlessly expanding from the big bang. Such a universe is depicted in Figure 1, where the genus-2 surface is used to represent a generic closed hyperbolic 3-manifold. The Bianchi and Thurston types of this model are discussed in the next section.
The line element [46] is locally isometric to the vacuum Friedmann–Lemaître–Robertson–Walker (FLRW) $k = -1$ spacetime, which is well known to be flat. Although these spatially compactified models are technically not classical FLRW spacetimes since the expanding compact hypersurfaces are not homogeneous (and thus not isotropic), they are Lorentz-covered by the FLRW $k = -1$ spacetime and thus are locally isometric to this classical spacetime.
The same result leading to [46] holds even if matter sources are allowed, provided they satisfy a suitable energy condition, in which case the corresponding reduced Hamiltonian will only be stationary in the vacuum limit and then only when the metric is of the above type; otherwise it monotonically decays. This result even has a quasilocal
generalization expressible in terms of the corresponding quasilocal reduced Hamiltonian defined for an arbitrary domain $D_\tau$ within the CMC slice $\tau$ = constant by restricting $H_{reduced}$ in [42] to the domain $D_\tau$, so that for $n \ge 3$,
$$H_{D_\tau}(\tau, \gamma, p^{TT}) = (-\tau)^n \int_{D_\tau} d\mu_g = (-\tau)^n \int_{D_\tau} \varphi^{2n/(n-2)}\, d\mu_\gamma \qquad [47]$$
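Returning briefly to the self-similar line element [46], its relation to the flat $k = -1$ FLRW (Milne) form can be made concrete with a change of time variable. The sketch below assumes the substitution $T = -n/\tau$, which is not stated in the text and is introduced here only for illustration; the helper names are likewise hypothetical. Under this substitution [46] takes the form $-dT^2 + T^2 h$ with $h = \gamma/(n(n-1))$, and for $n = 3$ the stated value $K(\gamma) = -1/6$ gives $K(h) = -1$.

```python
import math


def lapse_squared(tau: float, n: int) -> float:
    """Coefficient of -d(tau)^2 in [46]: (n / tau^2)^2."""
    return (n / tau**2) ** 2


def spatial_scale(tau: float, n: int) -> float:
    """Coefficient of gamma_ij dx^i dx^j in [46]: n / ((n - 1) tau^2)."""
    return n / ((n - 1) * tau**2)


def proper_time(tau: float, n: int) -> float:
    """Assumed substitution T = -n / tau (positive for tau < 0)."""
    return -n / tau


def check_milne_form(tau: float, n: int, h: float = 1e-6) -> bool:
    """Check that [46] equals -dT^2 + T^2 * gamma / (n (n - 1))."""
    T = proper_time(tau, n)
    # dT/d(tau) by central differences; its square should match the lapse term.
    dT_dtau = (proper_time(tau + h, n) - proper_time(tau - h, n)) / (2.0 * h)
    lapse_ok = math.isclose(dT_dtau**2, lapse_squared(tau, n), rel_tol=1e-6)
    scale_ok = math.isclose(
        spatial_scale(tau, n), T**2 / (n * (n - 1)), rel_tol=1e-12
    )
    return lapse_ok and scale_ok
```

For example, `check_milne_form(-1.0, 3)` compares both coefficients at $\tau = -1$ in the $n = 3$ case.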
If $D_\tau$ is determined from its specification on some initial slice $\tau = \tau_0$, by letting the domain flow along the normal trajectories of the CMC foliation, one can then verify that $H_{D_\tau}$ is monotonically decreasing except for the vacuum solutions of self-similar type described above, in which case $H_{D_\tau}$ is constant. This result is independent of the initial domain chosen. We remark that one cannot use the quasilocal Hamiltonian to get equations of motion (even quasilocally) since the full true Hamiltonian is nonlocal and so one gets contributions from the whole manifold.
Since the reduced Hamiltonian $H_{reduced}$ as well as its quasilocal variant $H_{D_\tau}$ is monotonically decreasing for generic solutions of Einstein's equations, it is natural to ask what its infimum is and whether this infimum is ever attained, at least asymptotically, by solutions of the field equations. The infimum of the reduced Hamiltonian for $n \ge 3$ and for a spatial manifold $M$ of negative Yamabe type can be characterized in terms of a certain topological invariant of $M$ called the sigma constant $\sigma(M)$ of $M$. For manifolds of negative Yamabe type, this quantity can be defined in terms of the infimum of the volume of all metrics which range over the space of conformal metrics $\mathcal{M}_{-1}$. The precise definition leads to the formula
$$\sigma(M) = -\Bigl( \inf_{\gamma \in \mathcal{M}_{-1}} \operatorname{vol}(M, \gamma) \Bigr)^{2/n} \qquad [48]$$
Figure 1 Expansion of the physical universe in the Bianchi V, Thurston type $H^3$, spatially compactified FLRW flat spacetime cosmology (axes: $t$, $x$, $y$).
Interestingly, this equation defines the topological invariant $\sigma(M)$ by a purely geometrical equation involving the volume functional restricted to $\mathcal{M}_{-1}$. We also remark that [48] is a dimensionless equation, the left-hand side being dimensionless since it is a topological invariant of $M$ and the right-hand side being dimensionless since the conformal metric $\gamma$ and its volume are dimensionless (see the remarks after [34]). Although the $\sigma$-constant can be defined for all Yamabe types, [48] holds only for manifolds of negative Yamabe type. From this equation, one can conclude that for such manifolds
$$\sigma(M) \le 0 \qquad [49]$$
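One can relate the foregoing to the reduced Hamiltonian by showing that the infimum of $H_{reduced}$ defined for arbitrary $\tau < 0$ as a functional on the reduced phase space
$$T^*\mathcal{T} = T^*\!\left( \frac{\mathcal{M}_{-1}}{\mathcal{D}_0} \right) \qquad [50]$$
is given by
$$\inf_{(\gamma, p^{TT}) \in T^*\mathcal{T}} H_{reduced}(\tau, \gamma, p^{TT}) = \begin{cases} -8\pi\chi(\Sigma_p^2) = 16\pi(p-1) & \text{if } n = 2 \text{ and } p \ge 2 \\ \left( \dfrac{n}{n-1}\, (-\sigma(M)) \right)^{n/2} & \text{if } n \ge 3 \end{cases} \qquad [51]$$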
where for $n \ge 3$, $M$ is of negative Yamabe type and thus $\sigma(M) \le 0$ (see [49]). One proves this result by first showing that within an arbitrary fiber of the cotangent bundle $T^*(\mathcal{M}_{-1}/\mathcal{D}_0)$, one minimizes $H_{reduced}$ by setting the fiber variable $p^{TT}$ to zero. In this case, the solution for the conformal factor $\varphi$ reduces to a spatial constant which is a function of $\tau$ alone (see [33]), and thus the formula for $H_{reduced}$ given in [42] reduces to
$$H_{reduced}(\tau, \gamma, 0) = \left( \frac{n}{n-1} \right)^{n/2} \operatorname{vol}(M, \gamma) \qquad [52]$$
The infimum over all conformal metrics $\gamma \in \mathcal{M}_{-1}$ of this latter functional yields the $\sigma$-constant as outlined above. If matter sources obeying a suitable energy condition are allowed, the argument goes through in much the same way with the additional implication that the infimum is achieved only for a vacuum solution so that in fact the matter must be "turned off." Thus, as a consequence of the above analysis, one has
$$\begin{aligned} H_{reduced}(\tau, \gamma, p^{TT}) &\ge H_{reduced}(\tau, \gamma, 0) = \left( \frac{n}{n-1} \right)^{n/2} \operatorname{vol}(M, \gamma) \\ &\ge \left( \frac{n}{n-1} \right)^{n/2} \inf_{\gamma' \in \mathcal{M}_{-1}} \operatorname{vol}(M, \gamma') = \left( \frac{n}{n-1} \right)^{n/2} (-\sigma(M))^{n/2} \end{aligned} \qquad [53]$$
where the last equality follows by inverting [48] to give
$$\inf_{\gamma \in \mathcal{M}_{-1}} \operatorname{vol}(M, \gamma) = (-\sigma(M))^{n/2} \qquad [54]$$
Moreover, if $\gamma \in \mathcal{M}_{-1}$ actually achieves the $\sigma$-constant, that is, if $\operatorname{vol}(M, \gamma) = (-\sigma(M))^{n/2}$ (and not just asymptotically approaches it as a curve or sequence), then $\gamma$ must be an Einstein metric with
$$\operatorname{Ric}(\gamma) = -\frac{1}{n}\, \gamma \qquad [55]$$
If, additionally, $n = 3$, then $\gamma$ must be hyperbolic (with constant sectional curvature $K(\gamma) = -1/6$).
Although Thurston's conjectures do not refer to the $\sigma$-constant, Anderson (1997) has been able to reformulate and somewhat refine the Thurston geometrization conjectures for 3-manifolds of arbitrary Yamabe type in terms of conjectured properties of the $\sigma$-constant. Additionally, if Perelman's results are technically complete, they would provide a proof of Anderson's conjectures as well as those of Thurston (see Anderson (2003)).
The conjectured behavior for a sequence of conformal metrics $\{\gamma_i\}$, $\gamma_i \in \mathcal{M}_{-1}$, $i = 1, 2, \ldots$, which seeks to minimize the volume of a stand-alone $K(\pi, 1)$ 3-manifold $M$ of negative Yamabe type can be described as follows:
1. If $M$ is hyperbolizable, then $\sigma(M) < 0$ is attained by a hyperbolic metric $\gamma_h \in \mathcal{M}_{-1}$, unique up to diffeomorphism, and the sequence of conformal metrics $\{\gamma_i\}$ converges to this metric in a suitable function space topology.
2. If $M$ is a pure graph manifold, then $\sigma(M) = 0$ and the sequence $\{\gamma_i\}$ of conformal metrics "volume collapses" $M$ with bounded curvature. Typically this occurs through collapse of circular or toroidal fibers in the associated circle or 2-torus bundle structure (see examples 3, 4, and 5 in the section "Topological Background" and see also the penultimate section). The six manifolds of flat type are not included here as they are of zero Yamabe type.
3. If $M$ is a generic $K(\pi, 1)$-manifold (not of type 1 or 2 above), then $M$ can be decomposed along incompressible tori into its final finite-volume-type hyperbolizable and (possibly empty set of) graph-manifold pieces. In this case, $\sigma(M) < 0$ and the sequence $\{\gamma_i\}$ of conformal metrics collapses the graph-manifold components and converges to finite-volume complete hyperbolic metrics on the hyperbolizable components (normalized to have $R(\gamma) = -1$), yielding a $\sigma$-constant that is entirely determined by the volumes of these final hyperbolic components (see the final section).
We shall return to this conjectured characterization of sequences of conformal metrics in the next two sections.
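The reduction of [42] to [52] when $p^{TT} = 0$ can be verified by direct substitution of the constant conformal factor [33]. The following is a small numerical sketch of that substitution for $n \ge 3$; the function names and the sample value used for the conformal volume are hypothetical and serve only as an illustration.

```python
import math


def conformal_factor(tau: float, n: int) -> float:
    """Constant conformal factor of [33], valid for n >= 3 and p^TT = 0."""
    return (n / ((n - 1) * tau**2)) ** ((n - 2) / 4)


def reduced_hamiltonian_zero_momentum(tau: float, n: int, conformal_volume: float) -> float:
    """[42] with constant phi: (-tau)^n * phi^(2n/(n-2)) * vol(M, gamma)."""
    phi = conformal_factor(tau, n)
    return (-tau) ** n * phi ** (2 * n / (n - 2)) * conformal_volume


def closed_form(n: int, conformal_volume: float) -> float:
    """[52]: (n / (n - 1))^(n/2) * vol(M, gamma), independent of tau."""
    return (n / (n - 1)) ** (n / 2) * conformal_volume
```

The two expressions agree for every negative $\tau$, which reflects the statement that at $p^{TT} = 0$ the reduced Hamiltonian depends on $\gamma$ only through its conformal volume.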
Reduction of Bianchi Models and Conformal Volume Collapse
For manifolds of negative Yamabe type, the strict monotonic decay of $H_{reduced}$ in the direction of cosmological expansion along nonconstant integral curves of the reduced Einstein equations suggests that the reduced Hamiltonian is seeking to achieve its infimum $\inf H_{reduced} = ((n/(n-1))(-\sigma(M)))^{n/2}$. But does this ever happen? Does the reduced Einstein flow of the conformal geometry asymptotically approach $\inf H_{reduced}$ in the limit of infinite cosmological expansion?
To answer this question, one can consider for $n = 3$ known locally homogeneous vacuum solutions of Einstein's equations which spatially compactify to manifolds of negative Yamabe type. Applying the theory of Hamiltonian reduction to these classical models, one can show that the reduced Hamiltonian behaves as expected under the reduced Einstein flow defined by these models. Since these models existed long before this theory, it is somewhat satisfying to see that they can be interpreted in terms of Hamiltonian reduction and how, with this interpretation, new properties of these classical solutions can be found.
Since $H_{reduced}$ is a strictly monotonically decreasing function along nonconstant integral curves of the reduced Einstein flow, it is expected that under certain conditions, the reduced Hamiltonian is monotonically seeking to decay to its infimum. Thus, it is of interest to look at Hamiltonian reduction under the consequence of the following two assumptions:
1. The reduced Einstein field equations give rise to the existence of a positive semiglobal nonconstant solution $(\gamma(t), p^{TT}(t))$ defined for all $t \in (0, \infty)$ (or equivalently, for all $\tau \in (-\infty, 0)$);
2. The reduced Hamiltonian strictly monotonically decays to its infimum along nonconstant integral curves,
$$H_{reduced}(\tau(t), \gamma(t), p^{TT}(t)) \to \inf H_{reduced} \quad \text{as } t \to \infty \qquad [56]$$
As a consequence of these assumptions, it follows from [53] that the conformal volume $\operatorname{vol}(M, \gamma)$ must also decay to its infimum [54] (although not necessarily monotonically),
$$\operatorname{vol}(M, \gamma(t)) \to \inf_{\gamma \in \mathcal{M}_{-1}} \operatorname{vol}(M, \gamma) = (-\sigma(M))^{3/2} \quad \text{as } t \to \infty \qquad [57]$$
From [40] and [51], in terms of the physical variables $(g, \pi)$ (or $(g, k)$), [56] can be written equivalently as
$$(-\tau)^3 \operatorname{vol}(M, g) = (-\operatorname{tr}_g k)^3 \operatorname{vol}(M, g) \to \left( \tfrac{3}{2}\, (-\sigma(M)) \right)^{3/2} \quad \text{as } t \to \infty \qquad [58]$$
Now suppose that $\sigma(M) = 0$. A large class of manifolds for which this is true are the graph manifolds (and thus also the Seifert manifolds) of negative Yamabe type, since $\sigma(M) \ge 0$ for graph manifolds in general and since $\sigma(M) \le 0$ for manifolds of negative Yamabe type. In this case the curve $\gamma(t) \in \mathcal{M}_{-1}$ of conformal metrics must necessarily (conformally) volume collapse $M$ in the direction of cosmological expansion,
$$\operatorname{vol}(M, \gamma(t)) \to (-\sigma(M))^{3/2} = 0 \quad \text{as } t \to \infty \qquad [59]$$
Consequently, the curve of conformal metrics $\gamma(t)$ must undergo some form of degeneration as its volume collapses. The details of this metric degeneration are of importance and are discussed below.
Not all locally homogeneous vacuum Bianchi models admit spatially compact quotients. Fortunately, the general theory of which Bianchi models admit spatially compact quotients has been worked out in detail by Tanimoto, Koike, and Hosoya (see Tanimoto et al. (1997) and the references therein). These Bianchi models together with their corresponding Thurston classification and typical examples of their closed quotient manifolds are listed in Table 1, where "K–S" indicates "Kantowski–Sachs," "P," "Z," and "N" denote manifolds of Yamabe type positive, zero, and negative, respectively (see the section "Topological Background"), "Seifert" means Seifert fibered, "Hyper" means hyperbolizable, "?" indicates "unknown, but conjectured to be so," and "manifold collapse" denotes the type of collapse that the conformal manifold $(M, \gamma(t))$ goes through as the conformal volume $\operatorname{vol}(M, \gamma(t))$ collapses. We also remark that all of the manifolds
Table 1 Bianchi, Thurston, and Yamabe type of a connected closed oriented irreducible 3-manifold
Bianchi type | Thurston type | Typical examples | Yamabe type | $\sigma$-constant | Manifold structure
K–S | $S^2 \times \mathbb{R}$ | $S^2 \times S^1$ | P | $> 0$ | Seifert
IX | $S^3$ | Nontrivial $S^1$-bundles over $S^2$ | P | $> 0$ | Seifert
I | $\mathbb{R}^3$ | $T^3$ | Z | $0$ | Seifert
II | Nil | Nontrivial $S^1$-bundles over $T^2$ | N | $0$ |
III | $H^2 \times \mathbb{R}$ | $\Sigma_p^2 \times S^1$, $p \ge 2$ | N | $0$ |
VIII | $\widetilde{SL}(2, \mathbb{R})$ | Nontrivial $S^1$-bundles over $\Sigma_p^2$ | N | $0$ |
VI$_0$ | Sol | Nontrivial $T^2$-bundles over $S^1$ | N | $0$ |
V, VII$_h$ | $H^3$ | Closed hyperbolizable manifolds | N | $< 0$ |
and of a neighborhood of $x_0$, and so should violate the maximality (resp. minimality) condition for $u$. The previous argument can be easily adapted to show something more general: if $u$ is a maximal (resp. minimal) subsolution then no subtangents (resp. supertangents) to $u$ at any $y \in \mathbb{T}^N$ can be local strict subsolutions at $y$, that is, strict subsolutions in some neighborhood of $y$. The subtangency (resp. supertangency) condition of a function $\psi$ to $u$ at a point $x_0$ means that $x_0$ is a local minimizer (resp. maximizer) of $u - \psi$. We denote by $D^- u(x_0)$ (resp. $D^+ u(x_0)$) the sets made up by the differentials of the $C^1$-subtangent (resp. supertangent) to $u$ at $x_0$. They are (possibly empty) closed convex subsets of $\partial u(x_0)$. It is apparent that if $D^+ u(x_0) \neq \emptyset \neq D^- u(x_0)$ then $u$ is differentiable at $x_0$ and $D^+ u(x_0) = D^- u(x_0) = \{Du(x_0)\}$.
It is an immediate consequence of the previous fact that no extremal subsolutions can exist in $\mathcal{S}_a$, whenever [1] admits a strict subsolution, say $\psi$, since there are global minimizers and maximizers of $u - \psi$, for any $u \in \mathcal{S}_a$, because of the compactness of $\mathbb{T}^N$. The function $\psi$ is then subtangent and supertangent, respectively, to $u$ at such points. The unique value we can look at for finding extremal subsolutions is therefore
$$c = \inf\{ a \in \mathbb{R} : \mathcal{S}_a \neq \emptyset \} \qquad [6]$$
The set on the right-hand side of [6] is nonempty since the null function belongs to $\mathcal{S}_a$ when $a > \max_{\mathbb{T}^N} H(x, 0)$, and bounded from below by $\min_{\mathbb{T}^N} H(x, 0)$. The value $c$ is consequently well defined by [6]. Moreover, any sequence $u_n \in \mathcal{S}_{a_n}$, with $a_n$ decreasing and convergent to $c$, is equi-Lipschitz-continuous because of the coercivity of $H$, and equibounded, up to addition of suitable constants. It is therefore uniformly convergent, up to a subsequence, to some $u$, which belongs to $\mathcal{S}_{a_n}$, for any $n$, since these classes are stable for the uniform convergence. This implies that $u$ is a subsolution to [4], so that $\mathcal{S}_c \neq \emptyset$. The critical value $c$ is then characterized by the property that the corresponding eqn [4] admits subsolutions but not strict subsolutions. Our aim is to show that extremal subsolutions do exist for the critical eqn [4].
Hamilton–Jacobi Equations and Dynamical Systems: Variational Aspects 639
For any supercritical value $a$, that is, $a \ge c$, we can define the functional nonsymmetric semidistance:
$$S_a(y, x) = \sup\{ u(x) - u(y) : u \in \mathcal{S}_a \} = \sup\{ u(x) : u \in \mathcal{S}_a,\ u(y) = 0 \}$$
for any $x$, $y$ in $\mathbb{T}^N$. It is immediate that $S_a$ satisfies the triangle inequality and $S_a(y, y) = 0$ for any $y$. But it fails, in general, to be symmetric and positive if $x \neq y$. We will nevertheless call it a distance, in the sequel, to ease terminology. The function $x \mapsto S_a(y, x)$ is itself a subsolution to [1], for any $y$, being the pointwise supremum of a family of equibounded subsolutions. Taking into account the inequality
$$u(x) - u(y) \ge -S_a(x, y)$$
which holds for any $u \in \mathcal{S}_a$, and the fact that it becomes an equality by setting $u = S_a(x, \cdot)$, we also get
$$-S_a(x, y) = \inf\{ u(x) - u(y) : u \in \mathcal{S}_a \} = \inf\{ u(x) : u \in \mathcal{S}_a,\ u(y) = 0 \}$$
and $-S_a(\cdot, y)$ is, as well, a subsolution to [1]. Note that
$$S_a(x, y) + S_a(y, x) \ge 0 \quad \text{for any } y, x \qquad [7]$$
The interest of introducing the distance $S_a$ in the present context is that, for any $a \ge c$ and $y \in \mathbb{T}^N$, the function $x \mapsto S_a(y, x)$ (resp. $x \mapsto -S_a(x, y)$) satisfies the maximality (resp. minimality) condition for subsolutions of [1] in any open set not containing $y$. If, by contradiction, the maximality property of $S_a(y, \cdot)$ were violated in some open set $\Omega$ with $y \notin \Omega$ by a $\psi$ satisfying [5] then one could make the set $\{x : \psi(x) > S_a(y, x)\}$ nonempty and compactly contained in $\Omega$, by adding a suitable constant. Hence, the formula
$$u = \begin{cases} \max\{\psi, S_a(y, \cdot)\} & \text{in } \Omega \\ S_a(y, \cdot) & \text{otherwise} \end{cases} \qquad [8]$$
could provide a subsolution to [1] with $u(y) = S_a(y, y) = 0$ and $u > S_a(y, \cdot)$ at some point of $\Omega$, which is in contrast with the very definition of $S_a$. One can similarly prove the minimality condition for $-S_a(\cdot, y)$.
We now focus our attention on the critical case. We derive from the previous considerations that if a maximal subsolution to [4] does not exist then, for any $y$, we can find a neighborhood $\Omega'_y$ of $y$ where $S_c(y, \cdot)$ fails to be maximal. We can thus construct, through a formula like [8], a $u_y \in \mathcal{S}_c$ with
$$\operatorname{ess\,sup}_{\Omega_y} H(\cdot, Du_y(\cdot)) < c \qquad [9]$$
in some neighborhood $\Omega_y$ of $y$ contained in $\Omega'_y$. Thanks to the compactness of $\mathbb{T}^N$, we can extract from $\{\Omega_y\}$ a finite subcover $\{\Omega_{y_i}\}$, $i = 1, \ldots, m$, for some $m \in \mathbb{N}$, and define
$$u = \sum_i \lambda_i u_{y_i}$$
where $\lambda_i$ are positive constants with $\sum_{1}^{m} \lambda_i = 1$. The convex character of the Hamiltonian and [9] imply that $u$ is a strict critical subsolution, which cannot be. We therefore conclude that there is a nonempty set of points $y$, denoted henceforth by $\mathcal{A}$, for which $S_c(y, \cdot)$ is indeed a maximal critical subsolution. It can also be proved, by exploiting some stability properties of the maximal subsolutions, that $\mathcal{A}$ is closed. Similarly, $-S_c(\cdot, y)$ must be a minimal critical subsolution for some $y$. We denote by $\widetilde{\mathcal{A}}$ the closed set made up by such points.
The previous covering argument shows that if $y \notin \mathcal{A}$ (resp. $y \notin \widetilde{\mathcal{A}}$) then there is a local strict critical subsolution at $y$. The converse is also true: let in fact $\psi$ be such a strict subsolution satisfying $\psi(y) = S_c(y, y) = 0$; then $\psi$ is subtangent to $S_c(y, \cdot)$ (resp. supertangent to $-S_c(\cdot, y)$) at $y$, by the very definition of the distance $S_c$. This shows that $S_c(y, \cdot)$ (resp. $-S_c(\cdot, y)$) is not a maximal (resp. minimal) critical subsolution, and so $y \notin \mathcal{A}$ (resp. $y \notin \widetilde{\mathcal{A}}$). Since the previous characterization holds for both $\mathcal{A}$ and $\widetilde{\mathcal{A}}$, it follows that $\mathcal{A} = \widetilde{\mathcal{A}}$. This set is a generalization of the (projected) Aubry set. We will come back on this point later on.
We also see from the covering argument that there is a critical subsolution $\psi$, which is strict outside $\mathcal{A}$, that is, such that $\operatorname{ess\,sup}_\Omega H(x, D\psi(x)) < c$ for any open set $\Omega$ compactly contained in $\mathbb{T}^N \setminus \mathcal{A}$. This implies that any $y$ such that $\{p : H(y, p) \le c\}$ has empty interior belongs to $\mathcal{A}$. The empty interior condition in fact implies, thanks to the strict quasiconvexity of $H$, that the sublevel set reduces to a singleton, say $\{p_0\}$. We know that $\partial u(y) \subset \{p : H(y, p) \le c\}$, for any $u \in \mathcal{S}_c$; therefore, $\partial u(y)$ is a singleton and so any critical subsolution $u$ is strictly differentiable at $y$ with $H(y, Du(y)) = H(y, p_0) = c$. Hence, there cannot be critical subsolutions which are strict around $y$.
The previously described points will be called, in the sequel, equilibria, and the (possibly empty) closed set made up by them will be denoted by $\mathcal{E}$. The reason for this terminology will be explained later. The differentiability property of the critical subsolutions at equilibria can be extended, quite surprisingly, to any point of $\mathcal{A}$, under more stringent assumptions on $H$. We will discuss this issue in the next section.
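To make the critical value and the equilibria concrete, consider a one-dimensional mechanical Hamiltonian $H(x, p) = p^2/2 + V(x)$ on the circle. This Hamiltonian, the potential $V(x) = \cos(2\pi x)$, and all function names below are assumptions introduced only for illustration; none of them comes from the text. In this case the null function is a subsolution exactly when $a \ge \max V$, and no subsolution exists for smaller $a$, so $c = \max V$; the sublevel set $\{p : H(y, p) \le c\}$ is the interval $[-r, r]$ with $r = \sqrt{2(c - V(y))}$, which reduces to a singleton precisely at the maximizers of $V$.

```python
import math


def potential(x: float) -> float:
    """Hypothetical 1-periodic potential, chosen only for illustration."""
    return math.cos(2.0 * math.pi * x)


def critical_value(V, samples: int = 2000) -> float:
    """For H = p^2/2 + V(x), approximate c = max V on a uniform grid."""
    return max(V(k / samples) for k in range(samples))


def sublevel_radius(V, x: float, a: float):
    """Z_a(x) = {p : p^2/2 + V(x) <= a} = [-r, r]; return r, or None if empty."""
    gap = a - V(x)
    return math.sqrt(2.0 * gap) if gap >= 0 else None


def equilibria(V, samples: int = 2000, tol: float = 1e-9):
    """Grid points where the critical sublevel set collapses to {0}."""
    c = critical_value(V, samples)
    return [k / samples for k in range(samples) if c - V(k / samples) <= tol]
```

With this potential the only grid point returned by `equilibria` is the maximizer of $V$, in line with the description of equilibria as points where the critical sublevel set is a singleton.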
Qualitative Properties of Generalized Aubry Set
We introduce some dynamical aspects in the picture by showing that the distances $S_a$, defined in the previous section for any $a \ge c$, are actually of length type, in the sense that $S_a(y, x)$ equals, for any pair $y$, $x$, the infimum of the intrinsic length of absolutely continuous, or equivalently Lipschitz-continuous, curves joining $y$ to $x$. By intrinsic length, we mean the total variation of $S_a$ on the curve. It will be denoted by $\ell_a$, while $\ell$ will indicate the natural (i.e., Euclidean) length. For this purpose, we proceed to give a line-integral representation formula of $S_a$.
To start with, we consider a $C^1$ subsolution $u$ to [1], some $x, y \in \mathbb{T}^N$ and a (Lipschitz-continuous) curve $\xi$, defined in some compact interval $I$, joining $y$ to $x$. We have
$$u(x) - u(y) = \int_I Du(\xi) \cdot \dot\xi\, dt \le \int_I \sigma_a(\xi, \dot\xi)\, dt \qquad [10]$$
where, for any $(x, v) \in \mathbb{T}^N \times \mathbb{R}^N$, $\sigma_a(x, v) := \max_{p \in Z_a(x)} p \cdot v$ and
$$Z_a(x) := \{ p : H(x, p) \le a \}$$
Inequality [10] also holds for a Lipschitz-continuous subsolution to [1] through suitable replacement of the differential by the generalized gradient. The set-valued map $Z_a$ is compact convex valued, by the coercivity and quasiconvexity assumptions on $H$, and continuous with respect to the Hausdorff metric. The function $\sigma_a$ is accordingly continuous in the first variable, and convex and positively homogeneous in the second, being a support function. This implies in particular that the integral on the right-hand side of [10] is invariant under change of parameter preserving the orientation. We derive, from [10],
$$S_a(y, x) \le \inf\left\{ \int_0^1 \sigma_a(\xi, \dot\xi)\, dt : \xi \text{ defined in } [0, 1] \text{ and joining } y \text{ to } x \right\} \qquad [11]$$
for any $y$, $x$. We denote by $\tilde S_a(y, x)$ the quantity on the right-hand side of [11]. It is immediate that the triangle inequality holds for $\tilde S_a$. The function $u := \tilde S_a(y, \cdot)$ is, moreover, Lipschitz-continuous since $\sigma_a(x, v)/|v|$ is bounded from above in $\mathbb{T}^N \times (\mathbb{R}^N \setminus \{0\})$ because of the coercivity of $H$. Given $v \in \mathbb{R}^N$, we exploit the definition of $\tilde S_a$, the continuity of $\sigma_a$, and the triangle inequality for $\tilde S_a$, to get at any differentiability point $x_0$ of $u$,
$$\begin{aligned} Du(x_0) \cdot v &= \lim_{h \to 0^+} \frac{u(x_0 - hv) - u(x_0)}{-h} \le \limsup_{h \to 0^+} \frac{\tilde S_a(x_0 - hv, x_0)}{h} \\ &\le \lim_{h \to 0^+} \frac{1}{h} \int_0^1 \sigma_a(x_0 - hvt, hv)\, dt = \lim_{h \to 0^+} \int_0^1 \sigma_a(x_0 - hvt, v)\, dt = \sigma_a(x_0, v) \end{aligned}$$
This implies by Hahn–Banach theorem that $Du(x_0) \in Z_a(x_0)$ or, in other terms, that $u = \tilde S_a(y, \cdot) \in \mathcal{S}_a$. We then derive, from [11] and the very definition of $S_a$,
$$S_a(y, x) = \inf\left\{ \int_0^1 \sigma_a(\xi, \dot\xi)\, dt : \xi \text{ defined in } [0, 1] \text{ with } \xi(0) = y,\ \xi(1) = x \right\}$$
Taking into account that the integral functional appearing in the previous formula is lower semicontinuous for the uniform convergence of equi-Lipschitz-continuous sequences of curves, by standard variational results, we in turn infer that it equals the intrinsic length $\ell_a$. Mathematically,

$$\ell_a(\xi) = \int_I \sigma_a(\xi, \dot\xi)\, dt$$

for any compact interval $I$ and any curve $\xi$ defined in $I$. Since $S_a$ is just a semidistance, we do not have any a priori information on the sign of $\ell_a$; however, by [10], the intrinsic length of any cycle must be nonnegative. Furthermore, while $|\ell_a(\xi)|$ must be small for any curve $\xi$ with small natural length, by the coercivity condition on $H$, no converse estimates hold, in general. If $a > c$, some information in this direction can be gathered by taking a strict subsolution $\varphi$ to [1], which can be assumed smooth, up to regularization by mollification. Then $D\varphi(x)\,v \le \sigma_a(x,v) - \delta |v|$ for any $(x,v) \in \mathbb{T}^N \times \mathbb{R}^N$ and some $\delta > 0$, and consequently

$$\ell_a(\xi) = \int_I \big(\sigma_a(\xi, \dot\xi) - D\varphi(\xi)\,\dot\xi\big)\, dt + \varphi(x) - \varphi(y) \ge \delta\, \ell(\xi) - S_a(x,y) \qquad [12]$$
for any pair $y$, $x$ and any curve $\xi$, defined in some interval $I$, joining $y$ to $x$. The previous formula says, in particular, that when $|x - y|$ is small then any curve whose intrinsic length approximates $S_a(y,x)$ must have small natural length. The previous argument cannot be extended to the critical case. This gap suggests the next definition. The main purpose for introducing it is to get a metric characterization of the Aubry set $\mathcal{A}$. We say that $S_c$ is localizable at some $y$ if for every $\varepsilon > 0$ there is $0 < \delta_\varepsilon < \varepsilon$ such that

$$S_c(y,x) = \inf\{\ell_c(\xi) : \xi \text{ joins } y \text{ to } x \text{ and } \ell(\xi) < \varepsilon\} \qquad [13]$$

whenever $|x - y| < \delta_\varepsilon$.

If $y \notin \mathcal{A}$, we adapt the argument previously used in the strict supercritical case to get that $S_c$ is indeed localizable at $y$. In this case we have, in fact, at our disposal a critical subsolution, say $\varphi$, which is strict in some neighborhood $\Omega$ of $y$, thanks to the characterization of the Aubry set given in the previous section. We assume, to simplify, $\varphi$ to be $C^1$; under the natural condition of Lipschitz-continuity, generalized gradients should be used in place of differentials. We have $D\varphi(x)\,v \le \sigma_c(x,v) - \delta |v|$ for any $x \in \Omega$, any $v \in \mathbb{R}^N$, and some $\delta > 0$, and $D\varphi(x)\,v \le \sigma_c(x,v)$ for any $x$, $v$. Exploiting these inequalities, we obtain an estimate analogous to [12] for curves starting from $y$, which allows us to prove [13].

Conversely, let $y \notin \mathcal{E}$ be a point where $S_c$ is localizable. We claim that $Z_c(y) \subset D^- u(y)$, where $u := S_c(y,\cdot)$. It is enough to show that any $p_0$ in the interior of $Z_c(y)$ belongs to $D^- u(y)$, since $D^- u(y)$ is closed. Note that the interior of $Z_c(y)$ is nonempty since we are assuming that $y$ is not an equilibrium. Such a $p_0$ belongs to the interior of $Z_c(x)$ for $x$ sufficiently close to $y$, thanks to the continuity of $Z_c$; consequently, $p_0 (x - y) < \ell_c(\xi)$ for any $x$ close to $y$ and any curve $\xi$ joining $y$ to $x$ with $\ell(\xi)$ sufficiently small. Taking into account [13], we then deduce

$$p_0 (x - y) \le S_c(y,x)$$

for $x$ close to $y$, and so the linear function $\psi(x) := p_0 (x - y)$ is subtangent to $u$ at $y$. This in turn implies that $y$ is out of $\mathcal{A}$, since $\psi$ is a local strict critical subsolution at $y$, and so $S_c(y,\cdot)$ cannot be a maximal subsolution by the characterization given in the previous section.

The fact that $S_c$ is not localizable at any point $y \in \mathcal{A} \setminus \mathcal{E}$ leads to the announced metric characterization of $\mathcal{A}$. If $y$ is such a point, there is an $\varepsilon > 0$, a point $x$, with $|x - y|$, and so $|S_c(y,x)|$, as small as desired, and a curve $\xi$ joining $y$ to $x$ with $\ell_c(\xi) \approx S_c(y,x)$ and $\ell(\xi) > \varepsilon$. We construct a cycle $\eta$, passing through $y$, by juxtaposition of $\xi$ and the Euclidean segment joining $x$ to $y$.
We obtain, in this way, a sequence of cycles $\eta_n$, passing through $y$, with $\ell_c(\eta_n) \to 0$ and $\ell(\eta_n) \ge \varepsilon$, for any $n$. The same result can also be obtained for $y \in \mathcal{E}$. In this case we select $\varepsilon > 0$ and $v_0 \in \mathbb{R}^N$ with $\sigma_c(y, v_0) = 0$, and denote by $B_n$ a sequence of Euclidean balls, centered at $y$, satisfying $\sigma_c(\cdot, v_0) < 1/n$ in $B_n$. We construct a sequence of cycles, passing through $y$, by going up and down on the line $\{y + s v_0\}$ in such a way that $\eta_n(t) \in B_n$, for every $t$, and $\varepsilon < \ell(\eta_n) < 2\varepsilon$; therefore $0 \le \ell_c(\eta_n) < 2\varepsilon/n$. Conversely, such a sequence of cycles cannot exist at any $y \notin \mathcal{A}$ because $S_c$ is localizable at $y$.

We emphasize that the previous definition of $\mathcal{A}$ through cycles and the fact that $S_c$ is not localizable at any point $y \in \mathcal{A}$ with $\operatorname{int} Z_c(y) \ne \emptyset$ show that, apart from the special case of equilibria, the property of being a point of $\mathcal{A}$ is definitely not of a local nature.

As pointed out already, if $y \notin \mathcal{A}$, and so $S_c$ is localizable at $y$, then $Z_c(y) \subset D^- u(y)$, where $u := S_c(y,\cdot)$; on the other hand, we know that $D^- u(y) \subset \partial u(y)$ and $\partial u(y) \subset Z_c(y)$, where the latter inclusion holds since $u$ is a critical subsolution. We then derive

$$D^- u(y) = \partial u(y) = Z_c(y)$$

We interpret these equalities as a convexity-type property, or, to use a more appropriate terminology, a semiconvexity property of the distance function $S_c(y,\cdot)$ at $y$. The same property holds for the Euclidean distance function $|x|$ at $0$. A contrasting phenomenon takes place if $y \in \mathcal{A}$, namely $S_c(y,\cdot)$ is semiconcave at $y$, which means that $D^+ u(y) = \partial u(y)$. This is more complicated to prove (see Fathi 2005b), and requires, in addition, $H$ to be strictly convex in $p$ and locally Lipschitz-continuous in $(x,p)$. Under these assumptions one can, more generally, show that $S_c(y,\cdot)$ is semiconcave in $\mathbb{T}^N$, if $y \in \mathcal{A}$, while it is semiconcave in $\mathbb{T}^N \setminus \{y\}$ and semiconvex at $y$, if $y \notin \mathcal{A}$.

Some important consequences can be deduced. First, thanks to the semiconcavity property there are $C^1$ supertangents to $u := S_c(y,\cdot)$ at $y$, whenever $y \in \mathcal{A}$. Such a function, say $\psi$, is also supertangent to $-S_c(\cdot,y)$, which is a minimal critical subsolution, at the same point. We know from the previous section that no supertangent to $-S_c(\cdot,y)$ at $y$ can be a strict critical subsolution locally at $y$, and so $H(y, D\psi(y)) = c$.
This implies that $D^+ u(y)$ is contained in the boundary of $Z_c(y)$. We then see, taking into account that $D^+ u(y)$ is convex and $Z_c(y)$ strictly convex, that $D^+ u(y)$ reduces to a singleton, and so, by the semiconcavity property, $\partial u(y)$ reduces to a singleton. Therefore, $S_c(y,\cdot)$ is strictly differentiable at $y$, for any $y \in \mathcal{A}$. One can similarly show that $-S_c(\cdot,y)$ is strictly differentiable at $y$.

Second, given $y \in \mathcal{A}$ and a critical subsolution $w$, which can be assumed, up to addition of a constant, to vanish at $y$, we see that $S_c(y,\cdot)$ (resp. $-S_c(\cdot,y)$) is supertangent (resp. subtangent) to $w$ at $y$ because of its extremality properties. Since both these super(sub)-tangents are differentiable, by the previous point, we deduce that $w$ itself is differentiable at $y$. Moreover, the differentials at $y$ of all three functions under consideration, namely $S_c(y,\cdot)$, $-S_c(\cdot,y)$, and $w$, coincide. In particular, $H(y, Dw(y)) = c$, and $y \mapsto Dw(y)$ is continuous on $\mathcal{A}$, since $S_c(y,\cdot)$ has been proved to be strictly differentiable at $y$, whenever $y \in \mathcal{A}$. Any critical subsolution, restricted to $\mathcal{A}$, is consequently a continuously differentiable solution to [4].

Summing up, we have discovered (under the assumption of strict convexity and Lipschitz-continuity for $H$) that every critical subsolution is differentiable on $\mathcal{A}$, and the differential on $\mathcal{A}$ is the same for every critical subsolution. A continuous map $G : \mathcal{A} \to \mathbb{R}^N$ is then defined by taking $G(y)$ equal to the common differential of any critical subsolution at $y$. We denote by $\tilde{\mathcal{A}}$ the graph of $G$, which is a subset of the cotangent bundle of $\mathbb{T}^N$, identified with $\mathbb{T}^N \times \mathbb{R}^N$.

As we have already pointed out, the existence of a $C^1$ subsolution to [1] is obvious when $a > c$, and such a subsolution can be obtained through a suitable regularization by mollification of any strict subsolution. The same construction cannot be performed at the critical level, since no strict critical subsolutions are available to start the regularization procedure. We can nevertheless show the existence of $C^1$ critical subsolutions by exploiting the information gathered on the Aubry set. We start by considering a countable locally finite open cover $\{\Omega_i\}$ of $\mathbb{T}^N \setminus \mathcal{A}$; we know from the previous section that there is a critical subsolution, say $w_i$, which is strict on $\Omega_i$, for any $i$. Loosely speaking, we have some space, also in this case, for regularizing $w_i$ in such a way that the regularized function is still a critical subsolution, at least on $\Omega_i$. We can glue together, with some precautions, these regularized local critical subsolutions through a $C^1$ partition of unity, to produce a critical subsolution which is $C^1$ outside $\mathcal{A}$.
Using the fact that any critical subsolution is differentiable on $\mathcal{A}$, we can further adjust the previous construction so that the critical subsolution is $C^1$ on the whole of $\mathbb{T}^N$. We state this result in the following way: if equation [1] has a subsolution then it also has a $C^1$ subsolution. It is worth noticing that this holds even if the underlying manifold is noncompact (see Fathi (2004, 2005b)).
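The line-integral representation of $S_a$ used throughout this section can be explored numerically in a simple case. The sketch below again assumes the mechanical Hamiltonian $H(x,p) = p^2/2 + V(x)$ on the one-dimensional torus, for which $Z_a(x)$ is the interval of radius $\sqrt{2(a - V(x))}$ and therefore $\sigma_a(x,v) = \sqrt{2(a - V(x))}\,|v|$. The potential and the helper names `sigma`, `intrinsic_length` and `distance` are hypothetical choices made for illustration only.

```python
import numpy as np

def potential(x):
    # Hypothetical 1-periodic potential, with max V = 1 attained at x = 0.
    return np.cos(2.0 * np.pi * x)

def sigma(a, x, v):
    # Support function of Z_a(x) = {p : p^2/2 + V(x) <= a}, an interval of
    # radius sqrt(2 (a - V(x))); the clipping guards against rounding at a = c.
    radius = np.sqrt(np.maximum(2.0 * (a - potential(x)), 0.0))
    return radius * np.abs(v)

def intrinsic_length(a, curve):
    # Midpoint-rule approximation of the integral of sigma_a along a polyline,
    # given by its vertices lifted to the real line.
    midpoints = 0.5 * (curve[1:] + curve[:-1])
    increments = np.diff(curve)
    return float(np.sum(sigma(a, midpoints, increments)))

def distance(a, y, x, n=2000):
    # On the circle only two simple arcs join y to x. Since sigma_a >= 0 here,
    # backtracking or extra windings cannot shorten the intrinsic length.
    x_right = y + ((x - y) % 1.0)     # lift of x reached by moving to the right
    x_left = x_right - 1.0            # lift of x reached by moving to the left
    right = intrinsic_length(a, np.linspace(y, x_right, n + 1))
    left = intrinsic_length(a, np.linspace(y, x_left, n + 1))
    return min(right, left)

c = 1.0  # critical value max V for this potential
print("S_c(0.25, 0.75) ~", distance(c, 0.25, 0.75))
print("S_a(0.25, 0.75) ~", distance(1.5, 0.25, 0.75))
```

At the critical level the shorter arc is the one passing through the maximum point of $V$, where $\sigma_c$ vanishes; this is the elementary mechanism behind the cycles of arbitrarily small intrinsic length through an equilibrium.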
The Intrinsic Lengths and the Action Functional
Here we assume $H$ to satisfy all the usual assumptions needed to define Hamilton's equations [2] and to have the completeness of the associated Hamiltonian flow. Namely, we require $H$ to be $C^2$ in both variables, $C^2$-strictly convex, that is, $H_{pp} > 0$ in $\mathbb{T}^N \times \mathbb{R}^N$, and superlinear, in the sense that

$$\lim_{|p| \to +\infty} \frac{H(x,p)}{|p|} = +\infty \quad \text{uniformly in } x$$

We define the Lagrangian $L$ as the Fenchel transform of $H$. It takes finite values thanks to the superlinearity condition, and, in addition, inherits, from $H$, $C^2$ regularity, $C^2$-strict convexity and superlinearity. In our setting, the Fenchel transform is involutive. We call a vector $v_0$ and a covector $p_0$ conjugate at a point $x$ if $v_0 = H_p(x, p_0)$, and so $L(x, v_0) = p_0 v_0 - H(x, p_0)$. This also implies the relations $p_0 = L_v(x, v_0)$ and $H(x, p_0) = p_0 v_0 - L(x, v_0)$. If $H(x, p_0) = a$, for some $a$, then $p_0 v_0 = \sigma_a(x, v_0)$, and $p_0$ is the unique element of $Z_a(x)$ for which such a relation holds. Since the function $y \mapsto p_0 v_0 - H(y, p_0)$ is subtangent to $L(\cdot, v_0)$ at $x$, we see that $L_x(x, v_0) = -H_x(x, p_0)$. We introduce, for any (Lipschitz-continuous) curve $\xi$ defined in $I = [a,b]$, for some $a < b$, the action functional $A(\xi)$ through

$$A(\xi) = \int_I L(\xi, \dot\xi)\, dt$$
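The conjugacy relations above can be checked numerically in a simple case. The following sketch assumes, purely for illustration, the mechanical Hamiltonian $H(x,p) = p^2/2 + V(x)$ in one dimension, whose Fenchel transform is known in closed form, $L(x,v) = v^2/2 - V(x)$. The potential and the function names are hypothetical, and the supremum is approximated by a brute-force search on a grid in $p$.

```python
import numpy as np

def potential(x):
    # Hypothetical 1-periodic potential chosen for illustration.
    return np.cos(2.0 * np.pi * x)

def hamiltonian(x, p):
    # Mechanical Hamiltonian H(x, p) = p^2 / 2 + V(x) (an assumption of this sketch).
    return 0.5 * p**2 + potential(x)

def lagrangian_numeric(x, v, p_max=20.0, n_p=400001):
    # Fenchel transform L(x, v) = sup_p (p v - H(x, p)), approximated on a grid in p.
    ps = np.linspace(-p_max, p_max, n_p)
    values = ps * v - hamiltonian(x, ps)
    k = int(np.argmax(values))
    # Return the approximate value of L and the maximizing covector p0.
    return float(values[k]), float(ps[k])

def lagrangian_exact(x, v):
    # Closed form valid for this particular H only: L(x, v) = v^2 / 2 - V(x).
    return 0.5 * v**2 - potential(x)

x0, v0 = 0.3, 1.7
L_num, p0 = lagrangian_numeric(x0, v0)
print("numerical L:", L_num, " exact L:", lagrangian_exact(x0, v0))
# For this H one has H_p(x, p) = p, so the conjugate covector should satisfy p0 = v0.
print("maximizing covector p0:", p0)
```

The maximizer returned by the grid search plays the role of the covector $p_0$ conjugate to $v_0$ at $x$, and the identity $H(x,p_0) = p_0 v_0 - L(x,v_0)$ holds by construction.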
We say that the curve $\xi$ is a minimizer of the action if $A(\xi) \le A(\eta)$ for any $\eta$ defined in $[a,b]$ and with the same end points as $\xi$. It is a classical result in the calculus of variations that any such minimizer is of class $C^2$ and satisfies the Euler–Lagrange equation

$$\frac{d}{dt} L_v(\xi, \dot\xi) = L_x(\xi, \dot\xi) \quad \text{in } ]a,b[$$
Consequently, $\xi$ and the conjugate curve $\eta = L_v(\xi, \dot\xi)$ satisfy Hamilton's equations [2]. Note that all the integral curves of [2] lie in a fixed level of the Hamiltonian, which is compact by the superlinearity condition. The corresponding Hamiltonian flow is consequently complete.

We show that if $x_0 \in \mathcal{E}$, and $Z_c(x_0) = \{G(x_0)\}$, then $(x_0, G(x_0))$ is a steady state of the Hamiltonian flow. In this case, in fact, $c = \min_p H(x_0, p)$ and so $L(x_0, 0) = -c$ and $H_p(x_0, G(x_0)) = 0$, or equivalently $G(x_0)$ and $0$ are conjugate at $x_0$. Taking into account that $c$ is the critical value, we have that

$$L(x, 0) = -\min_p H(x,p) \ge -c \quad \text{for any } x \in \mathbb{T}^N$$

so that $x_0$ is a minimizer of $x \mapsto L(x, 0)$ and $L_x(x_0, 0) = -H_x(x_0, G(x_0)) = 0$. It is easy to see that, conversely, if $(x_0, p_0)$ is a steady state of the Hamiltonian flow and $H(x_0, p_0) = c$ then $x_0 \in \mathcal{E}$ and $p_0 = G(x_0)$.

We want to establish a relation between $A(\cdot)$ and the length functionals $\ell_a$ defined in the previous section for $a \ge c$. This will allow us, among other things, to show that the Aubry set $\tilde{\mathcal{A}}$ is invariant for the Hamiltonian flow and to analyze the properties of the integral curves lying on it. To this aim, we consider the minimal geodesics for $S_a$, $a \ge c$, that is, the curves, defined on compact intervals, whose intrinsic lengths $\ell_a$ equal the distance $S_a$ between their end points. If $a > c$, we claim that, given any pair of points in $\mathbb{T}^N$, there is a minimal geodesic joining them. Recalling formula [12], whose validity depends on the fact that in the strict supercritical case there is a smooth strict subsolution to [1], we have

$$\ell_a(\xi) \to +\infty \quad \text{whenever} \quad \ell(\xi) \to +\infty$$
The claim is then proved by using the Ascoli theorem and the lower-semicontinuity property of $\ell_a$. In the critical case, given $y \notin \mathcal{A}$, we can use the same argument to deduce the existence of minimizing geodesics for $S_c$ between $y$ and any point $x$ sufficiently close to $y$ (in the Euclidean sense). This comes from the fact that $S_c$ is localizable at $y$, and so any sequence of curves $\xi_n$ with $\ell_c(\xi_n) \to S_c(y,x)$ has bounded natural length. For a general pair of points, we will show, on the contrary, that existence of a minimal geodesic is not guaranteed in the critical case.

We consider a minimizing geodesic $\xi$ for $S_a$ between a pair of points $y$ and $x$. We assume $a > c$ or $a = c$ and $\xi \cap \mathcal{E} = \emptyset$. We want to show that $\xi$ is a minimizing curve for the action, up to a change of parameter. We choose the new parameter in such a way that

$$L(\tilde\xi, \dot{\tilde\xi}) + a = \sigma_a(\tilde\xi, \dot{\tilde\xi}) \qquad [14]$$

where we have denoted by $\tilde\xi$ the reparametrized curve. Since $\tilde\xi$ stays away from $\mathcal{E}$, the velocities $|\dot{\tilde\xi}|$ are bounded from below by a positive constant and so the domain of definition of $\tilde\xi$, denoted by $[0,T]$, is a compact interval. Note that $\ell_a(\xi) = \ell_a(\tilde\xi)$, since the intrinsic length is invariant under change of parameter. We take into account that $\xi$ is a minimal geodesic and the inequality $L(x,v) + a \ge \sigma_a(x,v)$, which holds for any $x$, $v$, to get

$$A(\tilde\xi) = \ell_a(\tilde\xi) - aT \le \ell_a(\eta) - aT \le A(\eta) \qquad [15]$$

for any $\eta$ defined in $[0,T]$ with $\eta(0) = y$, $\eta(T) = x$. This proves the announced minimality property of $\tilde\xi$.

Furthermore, we show that the function $u := S_a(y,\cdot)$ is strictly differentiable at $\tilde\xi(s)$, for $s \in\, ]0,T[$, and

$$Du(\tilde\xi) = L_v(\tilde\xi, \dot{\tilde\xi}) =: \tilde\eta$$

in $[0,T]$. Hence, $(\tilde\xi, Du(\tilde\xi))$ is a solution of Hamilton's equations in $]0,T[$. To see this, we start from the relations

$$\int_0^t \frac{d}{ds} u(\tilde\xi(s))\, ds = u(\tilde\xi(t)) = \int_0^t \sigma_c(\tilde\xi, \dot{\tilde\xi})\, ds = \int_0^t \tilde\eta\, \dot{\tilde\xi}\, ds \qquad [16]$$

which hold in $[0,T]$ because $u$ is Lipschitz-continuous, $\tilde\xi$ is a minimizing geodesic, and $\tilde\eta(s)$ is conjugate to $\dot{\tilde\xi}(s)$ at $\tilde\xi(s)$ for any $s \in [0,T]$. We know that

$$\frac{d}{ds} u(\tilde\xi(s)) = p\, \dot{\tilde\xi}(s) \quad \text{for a.e. } s \text{ and some } p \in \partial u(\tilde\xi(s))$$

We have that $p \in Z_c(\tilde\xi(s))$, since $u$ is a critical subsolution, and so

$$p\, \dot{\tilde\xi}(s) \le \sigma_c(\tilde\xi(s), \dot{\tilde\xi}(s)) = \tilde\eta(s)\, \dot{\tilde\xi}(s)$$

We see, in the light of [16], that equality must hold in the previous formula, for a.e. $s$. Therefore,

$$\frac{d}{ds} u(\tilde\xi(s)) = \tilde\eta(s)\, \dot{\tilde\xi}(s) \quad \text{for a.e. } s \qquad [17]$$

We derive from the fact that the function $\tilde\eta(\cdot)\, \dot{\tilde\xi}(\cdot)$ is continuous that $u(\tilde\xi(\cdot))$ is actually continuously differentiable in $]0,T[$ and that [17] holds for any $s$. We finally exploit that $u$ is semiconcave in $\mathbb{T}^N \setminus \{y\}$, as pointed out in the previous section, and so $D^+ u(\tilde\xi(s)) = \partial u(\tilde\xi(s))$, for any $s$. If $\psi$ is a $C^1$ supertangent to $u$ at $\tilde\xi(s)$ then

$$D\psi(\tilde\xi(s))\, \dot{\tilde\xi}(s) = \frac{d}{ds} u(\tilde\xi(s))$$

accordingly,

$$p\, \dot{\tilde\xi}(s) = \tilde\eta(s)\, \dot{\tilde\xi}(s) \quad \text{for any } s \text{ and } p \in \partial u(\tilde\xi(s))$$

Since $\partial u(\tilde\xi(s)) \subset Z_c(\tilde\xi(s))$, this implies that $\partial u(\tilde\xi(s)) = \{\tilde\eta(s)\}$. This actually gives the strict differentiability of the function $u$ at $\tilde\xi(s)$, and $Du(\tilde\xi(s)) = \tilde\eta(s)$ for any $s$.

The same argument works, with some adjustment, also when $a = c$ and $\xi \cap \mathcal{E} \ne \emptyset$. If, for instance, $y \notin \mathcal{E}$, $t_0 = \min\{t : \xi(t) \in \mathcal{E}\}$, then by reparametrizing $\xi$ in $[0, t_0[$, as indicated in [14], we get a curve $\tilde\xi$ defined in $[0,+\infty[$ which is a minimizer of the action functional in any compact interval contained in $[0,+\infty[$. Moreover, $u := S_c(y,\cdot)$ is strictly
differentiable in $]0,+\infty[$ and $(\tilde\xi, Du(\tilde\xi))$ is a solution of Hamilton's equations.

We proceed to investigate the properties of the Hamiltonian flow on $\tilde{\mathcal{A}}$. We take a $y_0$ in $\mathcal{A} \setminus \mathcal{E}$, and consider a sequence $\xi_n$ of cycles passing through $y_0$ with $\ell_c(\xi_n) \to 0$, $\ell(\xi_n) \ge 2\delta$, for some positive $\delta$. Such a sequence does exist in view of the characterization of $\mathcal{A}$ through cycles given in the previous section. Moreover, we assume that the $\xi_n$ are parametrized by the natural arc length in $[-T_n, T_n]$, for some $T_n$, and satisfy $\xi_n(0) = y_0$ for any $n$. There is then defined a uniform limit curve $\xi$ in $[-\delta, \delta]$, up to a subsequence, thanks to the Ascoli theorem. The idea is to construct a new sequence of cycles $\eta_n$ by replacing the portion of the $\xi_n$ between $-\delta$ and $\delta$ by $\xi$, and pasting this new piece with the remainder of $\xi_n$ through Euclidean segments at the end points. The $\eta_n$ are still of infinitesimal intrinsic length $\ell_c$, which shows, in particular, that $\xi$ is contained in $\mathcal{A}$. By exploiting that $S_c$ is a length distance, that the $\eta_n$ are cycles, and the formula [7], with $a = c$, we get

$$\ell_c(\eta_n) \ge S_c(\xi(-\delta), \xi(\delta)) + S_c(\xi(\delta), \xi(-\delta)) \ge 0$$

for any $n$, and we at last derive

$$\ell_c(\xi) = S_c(\xi(-\delta), \xi(\delta)) = -S_c(\xi(\delta), \xi(-\delta))$$

Note that the second equality is actually redundant. By reparametrizing $\xi$, as in [14], with $a = c$, in some open interval containing $0$ as interior point and contained in $[-\delta, \delta]$, we get a curve contained in $\mathcal{A} \setminus \mathcal{E}$, denoted by $\zeta$, defined on some open interval $I$ and satisfying

$$A(\zeta|_{[s,t]}) + c\,(t - s) = \ell_c(\zeta|_{[s,t]}) = -S_c(\zeta(t), \zeta(s)) \quad \text{for any } t > s \qquad [18]$$

This, in particular, shows that $\zeta$ is a minimizer of the action functional in any $[s,t] \subset I$. If we denote, as usual, by $\eta$ the curve conjugate to $\dot\zeta$, we have, arguing as above, that $\eta(t)$ is the differential of the function $S_c(\zeta(s), \cdot)$ at $\zeta(t)$, but, since the differentials of all critical subsolutions coincide on $\mathcal{A}$, we finally get that $\eta(t) = G(\zeta(t))$ for every $t \in I$. Therefore, $(\zeta, G(\zeta))$ is a solution of Hamilton's equations in $I$ and is contained in $\tilde{\mathcal{A}}$. The same properties can be extended to the whole of $\mathbb{R}$. Taking into account that if $y \in \mathcal{E}$ then $(y, G(y))$ is a steady state of the Hamiltonian flow, we in the end see that $\tilde{\mathcal{A}}$ is foliated by integral curves of the Hamiltonian flow $(\zeta, G(\zeta))$, with $\zeta$ enjoying the variational property [18]. This is indeed a characterization since if, conversely, a curve satisfies [18] then it must be contained in $\mathcal{A}$.

As an application, we finally show that there cannot be minimal geodesics, for the critical metric $S_c$, joining a point of $\mathcal{A}$, say $y$, to some $x \notin \mathcal{A}$, at least when $\mathcal{E} = \emptyset$. If such a geodesic, say $\xi$, exists, and is defined in $[0,T]$, for some $T > 0$, then $(\xi, Du(\xi))$ is a solution of Hamilton's equations, up to a change of parameter, where $u := S_c(y,\cdot)$, satisfying the initial conditions $\xi(0) = y$, $\eta(0) = \lim_{t \to 0^+} Du(\xi(t))$. The last relation tells us that $\eta(0) \in \partial u(y)$ and, since $u$ is differentiable at $y \in \mathcal{A}$ with $Du(y) = G(y)$, we conclude that $\eta(0) = G(y)$. Therefore, $(\xi, Du(\xi))$ is a part of the integral curve of the Hamiltonian flow starting at $(y, G(y))$ that we know, by the above reasoning, to be contained in $\tilde{\mathcal{A}}$, which is in contradiction with $\xi(T) = x \notin \mathcal{A}$.

See also: Control Problems in Mathematical Physics; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; KAM Theory and Celestial Mechanics; Minimax Principle in the Calculus of Variations; Optimal Transportation; Stability Theory and KAM.

Further Reading
Arnold VI, Kozlov VV, and Neishtadt AI (1988) Mathematical Aspects of Classical and Celestial Mechanics, Encyclopedia of Mathematical Sciences. Dynamical Systems III. New York: Springer.
Contreras G and Iturriaga R (1999) Global Minimizers of Autonomous Lagrangians, 22nd Brazilian Mathematics Colloquium. IMPA: Rio de Janeiro.
Evans LC (2004) A survey of partial differential methods in weak KAM theory. Communications on Pure and Applied Mathematics 57: 445–480.
Evans LC and Gomes D (2001) Effective Hamiltonians and averaging for Hamilton dynamics I. Arch. Rat. Mech. Analysis 157: 1–33.
Evans LC and Gomes D (2002) Effective Hamiltonians and averaging for Hamilton dynamics II. Arch. Rat. Mech. Analysis 161: 271–305.
Fathi A Weak KAM Theorem in Lagrangian Dynamics. Cambridge: Cambridge University Press (to appear).
Fathi A and Siconolfi A (2004) Existence of C1 critical subsolutions of the Hamilton–Jacobi equations. Inventiones Mathematicae 155(2): 363–388.
Fathi A and Siconolfi A (2005) PDE aspects of Aubry–Mather theory for quasiconvex Hamiltonians. Calculus of Variations and Partial Differential Equations 22(1): 185–228.
Forni G and Mather J (1994) In: Graffi S (ed.) Action Minimizing Orbits in Hamiltonian Systems (Transition to Chaos in Classical and Quantum Mechanics), Lecture Notes in Mathematics, 1589. Springer.
Weinan E (1999) Aubry–Mather theory and periodic solutions of the forced Burgers equation. Communications on Pure and Applied Mathematics 52: 811–828.
Hard Hexagon Model see Eight Vertex and Hard Hexagon Models
High Tc Superconductor Theory
S-C Zhang, Stanford University, Stanford, CA, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
The phenomenon of superconductivity is one of the most profound manifestations of quantum mechanics in the macroscopic world. The celebrated Bardeen–Cooper–Schrieffer (BCS) theory (Bardeen et al. 1957) of superconductivity (SC) provides a basic theoretical framework to understand this remarkable phenomenon in terms of the pairing of electrons with opposite spin and momenta to form a collective condensate state. This theory not only quantitatively explains the experimental data of conventional superconductors; the basic concepts developed from it, including the concept of spontaneously broken symmetry, the Nambu–Goldstone modes and the Anderson–Higgs mechanism, also provide the essential building blocks for the unified theory of fundamental forces. The discovery of high-temperature superconductivity (HTSC) in the copper oxide materials poses a profound challenge to theoretically understand the phenomenon of superconductivity in the extreme limit of strong correlations. While the basic idea of electron pairing in the BCS theory carries over to the HTSC, other aspects like the weak coupling mean field approximation and the phonon-mediated pairing mechanism may not apply without modifications. Therefore, the HTSC system provides an exciting opportunity to develop new theoretical frameworks and concepts for strongly correlated electronic systems.

To date, a number of different HTSC materials have been discovered. The most studied ones include the hole-doped $\mathrm{La_{2-x}Sr_xCuO_{4+\delta}}$ (LSCO), $\mathrm{YBa_2Cu_3O_{6+\delta}}$ (YBCO), $\mathrm{Bi_2Sr_2CaCu_2O_{8+\delta}}$ (BSCO), $\mathrm{Tl_2Ba_2CuO_{6+\delta}}$ (TBCO) materials and the electron-doped $\mathrm{Nd_{2-x}Ce_xCuO_4}$ (NCCO) material. All these materials have a two-dimensional (2D) $\mathrm{CuO_2}$ plane, and have an antiferromagnetic (AF) insulating phase at half-filling. The magnetic properties of this insulating phase are well approximated by the antiferromagnetic Heisenberg model with spin $S = 1/2$ and an AF exchange constant $J \sim 100$ meV. The Neel temperature for the 3D AF ordering is approximately given by $T_N \sim 300$–$500$ K. The HTSC material can be doped either by holes or by electrons. In the doping range of $5\% \lesssim x \lesssim 15\%$, there is an SC phase with a dome-like shape in the temperature versus doping plane. The maximal SC transition temperature $T_c$ is of the order of 100 K. The generic phase diagram of HTSC is shown in Figure 1.

Figure 1 Phase diagram of the NCCO and the YBCO superconductors. (Panels: $\mathrm{Nd_{2-x}Ce_xCuO_4}$ and $\mathrm{YBa_2Cu_3O_{6+x}}$; vertical axis: temperature (K); horizontal axis: doping $x$; regions labeled AFM and SC.)

One of the main questions concerning the HTSC phase diagram is the transition region between the AF and the SC phases. Partly because of the complicated material chemistry in this regime, there is no universal agreement among different experiments. Different experiments indicate several different possibilities, including phase separation with an inhomogeneous density distribution, a uniform coexistence phase between AF and SC, and periodically ordered spin and charge distributions in the form of stripes or checkerboards. The phase diagram of the HTSC cuprates also contains a regime with anomalous behaviors conventionally called the pseudogap phase. This region of the phase diagram is indicated by the dashed lines in Figure 1. In conventional superconductors, a pairing gap opens up at $T_c$. In a large class of HTSC cuprates, however, an electronic gap starts to open up at a temperature much higher than $T_c$. Many experiments indicate that the pseudogap "phase" is
not a true thermodynamical phase, but rather the precursor towards a crossover behavior.

The SC phase of the HTSC has a number of striking properties not shared by conventional superconductors. First of all, phase-sensitive experiments indicate that the SC phase for most of the cuprates has d-wave-like pairing symmetry. This is also supported by the photoemission experiments which show the existence of the nodal points in the quasiparticle gap. Neutron scattering experiments find a new type of collective mode, carrying spin 1, lattice momentum close to $(\pi, \pi)$, and a resolution-limited sharp resonance energy around 20–40 meV. Most remarkably, this resonance mode appears only below $T_c$ of the optimally doped cuprates. Another property uniquely different from the conventional superconductors is the vortex state. Most HTSCs are type II superconductors where the magnetic field can penetrate into the SC state in the form of a vortex lattice, where the SC order is destroyed at the center of the vortex core. In conventional superconductors, the vortex core is filled by the normal metallic electrons. However, a number of different experimental probes, including neutron scattering, muon spin resonance ($\mu$SR), and nuclear magnetic resonance (NMR), have shown that the vortex cores in the HTSC cuprates are antiferromagnetic, rather than normal metallic. This phenomenon has been observed in almost all HTSC materials, including LSCO, YBCO, TBCO, and NCCO, making it one of the most universal properties of the HTSC cuprates.

The HTSC materials also have highly unusual transport properties. While conventional metals have a $T^2$ dependence of resistivity, in accordance with the predictions of the Fermi liquid theory, the HTSC materials have a linear $T$ dependence of resistivity near optimal doping. This linear $T$ dependence extends over a wide temperature window, and seems to be universal among most of the cuprates. When the underdoped or sometimes optimally doped SC state is destroyed by applying a high magnetic field, the "normal state" is not a conventional conducting state, but exhibits insulator-like behavior, at least along the c-axis. This phenomenon may be related to the insulating AF vortices mentioned in the previous paragraph.

The discovery of HTSC has greatly stimulated the theoretical understanding of superconductivity in strongly correlated systems. There are a number of promising approaches, partially reviewed in Dagotto (1994), Imada et al. (1998), and Orenstein and Millis (2000), but a universally accepted theory has not yet emerged. This article focuses on a particular theory, which unifies the AF and the SC phases of the HTSC cuprates based on an approximate SO(5)
symmetry (Zhang 1997). The SO(5) theory draws its inspiration from the successful application of symmetry concepts in theoretical physics. All fundamental laws of Nature are statements about symmetry. Conservation of energy, momentum, and charge are direct consequences of global symmetries. The form of fundamental interactions is dictated by local gauge symmetries. Symmetry unifies apparently different physical phenomena into a common framework. For example, electricity and magnetism were discovered independently, and viewed as completely different phenomena before the nineteenth century. Maxwell's theory, and the underlying relativistic symmetry between space and time, unify the electric field $E$ and the magnetic field $B$ into a common electromagnetic field tensor $F_{\mu\nu}$. This unification shows that electricity and magnetism share a common microscopic origin, and can be transformed into each other by going to different inertial frames. As discussed previously, the two robust and universal ordered phases of the HTSC are the AF and the SC phases. The central question of HTSC concerns the transition from one phase to the other as the doping level is varied. The SO(5) theory unifies the 3D AF order parameter $(N_x, N_y, N_z)$ and the 2D SC order parameter $(\mathrm{Re}\,\Delta, \mathrm{Im}\,\Delta)$ into a single, 5D order parameter called "superspin," in a way similar to the unification of electricity and magnetism in Maxwell's theory:

$$F_{\mu\nu} = \begin{pmatrix} 0 & & & \\ E_x & 0 & & \\ E_y & -B_z & 0 & \\ E_z & B_y & -B_x & 0 \end{pmatrix}, \qquad n_a = \begin{pmatrix} \mathrm{Re}\,\Delta \\ N_x \\ N_y \\ N_z \\ \mathrm{Im}\,\Delta \end{pmatrix} \qquad [1]$$
This unification relies on the postulate that a common microscopic interaction is responsible for both AF and SC in the HTSC cuprates and related materials. A well-defined SO(5) transformation rotates one form of the order into another. Within this framework, the mysterious transition from the AF to the SC phase as a function of doping is explained in terms of a rotation in the 5D order parameter space. Symmetry principles are not only fundamental and beautiful, they are also practically useful in extracting information from a strongly interacting system, which can be tested quantitatively. The approximate SO(5) symmetry between the AF and the SC phases has many direct consequences, which can be, and some of which have been, tested both numerically and experimentally. The commonly used microscopic model of the HTSC materials is the repulsive Hubbard model, which describes the electronic degrees of freedom in
High Tc Superconductor Theory
the CuO2 plane. Its low-energy limit, the t J model is defined by H ¼
X
ðcy ðxÞc ðx0 Þ t 0 hx;x i
þJ
X
½2
where the term t describes the hopping of an electron with spin from a site x to its nearest neighbor x0 , with double occupancy removed, and the J terms describe the nearest-neighbor exchange of its spin S. The main merit of these models does not lie in the microscopic accuracy and realism, but rather in the conceptual simplicity. However, despite their simplicity, these models are still very difficult to solve, and their phase diagrams cannot be compared directly with experiments. The idea of the SO(5) theory is to derive an effective quantum Hamiltonian on a coarse-grained lattice, which contains only the superspin degrees of freedom. The resulting SO(5) quantum nonlinear -model is much simpler to solve using the standard field theoretical techniques, and the resulting phase diagram can be compared directly with experiments.
SO(4) Symmetry of the Hubbard Model
Before presenting the full SO(5) theory, let us first discuss a much simpler toy model, namely the negative-$U$ Hubbard model, which has an SC ground state with s-wave pairing. However, it also has a charge-density-wave (CDW) ground state at half-filling. The competition between the CDW and the SC states is similar to the competition between AF and SC states in the HTSC cuprates. In the negative-$U$ Hubbard model, the CDW/SC competition can be accurately described by a hidden symmetry, namely the SO(4) symmetry of the Hubbard model. The Hubbard model is defined by the Hamiltonian
$$H = -t \sum_{\langle x,x'\rangle,\sigma} \left( c_\sigma^\dagger(x)\, c_\sigma(x') + \mathrm{h.c.} \right) + U \sum_x \left( n_\uparrow(x) - \tfrac{1}{2} \right)\left( n_\downarrow(x) - \tfrac{1}{2} \right) - \mu \sum_{x,\sigma} n_\sigma(x) \tag{3}$$
where $c_\sigma(x)$ is the fermion operator and $n_\sigma(x) = c_\sigma^\dagger(x) c_\sigma(x)$ is the electron density operator at site $x$ with spin $\sigma$; $t$, $U$, and $\mu$ are the hopping, interaction, and chemical potential parameters, respectively. The Hubbard model has a pseudospin SU(2) symmetry generated by the operators
$$\eta^- = \sum_x (-1)^x c_\uparrow(x)\, c_\downarrow(x), \qquad \eta^+ = (\eta^-)^\dagger, \qquad \eta^z = \frac{1}{2} \sum_{x,\sigma} \left( n_\sigma(x) - \tfrac{1}{2} \right), \qquad [\eta_\alpha, \eta_\beta] = i \epsilon_{\alpha\beta\gamma}\, \eta_\gamma \tag{4}$$
where $\eta^\pm = \eta_x \pm i\eta_y$ and $\alpha = x, y, z$. The model is defined on any bipartite lattice, and the lattice function $(-1)^x$ takes the value $1$ on the even sublattice and $-1$ on the odd sublattice. These operators commute with the Hubbard Hamiltonian at half-filling, when $\mu = 0$, that is, $[H, \eta_\alpha] = 0$; therefore, they form the symmetry generators of the model (Yang and Zhang 1990). Combined with the standard SU(2) spin rotational symmetry, the Hubbard model enjoys an $\mathrm{SO}(4) = \mathrm{SU}(2) \times \mathrm{SU}(2)/Z_2$ symmetry. This symmetry has important consequences for the phase diagram and the collective modes of the system. In particular, it implies that the SC and CDW orders are degenerate at half-filling. The SC and the CDW order parameters are defined by
$$\Delta^- = \sum_x c_\uparrow(x)\, c_\downarrow(x), \qquad \Delta^+ = (\Delta^-)^\dagger, \qquad \Delta^z = \frac{1}{2} \sum_{x,\sigma} (-1)^x n_\sigma(x), \qquad [\eta_\alpha, \Delta_\beta] = i \epsilon_{\alpha\beta\gamma}\, \Delta_\gamma \tag{5}$$
where $\Delta^\pm = \Delta_x \pm i\Delta_y$. The last equation of [5] shows that the $\eta$ operators perform the rotation between the SC and CDW order parameters. Thus, $\eta_\alpha$ is the pseudospin generator and $\Delta_\alpha$ is the pseudospin order parameter. Just like the total spin and the Néel order parameter in the AF Heisenberg model, they are canonically conjugate variables. Since $[H, \eta_\alpha] = 0$ at $\mu = 0$, this exact pseudospin symmetry implies the degeneracy of SC and CDW orders at half-filling. The phase diagram of the $U < 0$ Hubbard model is identical to the phase diagram of the AF Heisenberg model in a uniform magnetic field. If the AF order parameter originally points along the z-direction, a magnetic field applied along the z-direction causes the AF order parameter to flop into the xy-plane. This transition is called the spin-flop transition, and is depicted in Figures 2a and 2c. The chemical potential $\mu$ in the negative-$U$ Hubbard model plays a role similar to the magnetic field in the AF Heisenberg model. It transforms a CDW state at half-filling into an SC state away from half-filling, as depicted in Figures 2b and 2c.
Figure 2 The spin-flop transition. (a) The spin-flop transition of the AF Heisenberg model. When a uniform magnetic field is applied along the direction of the AF moments, there is no net gain of the Zeeman energy. Therefore, after a critical value of the magnetic field, the AF spin component flops into the xy-plane, while a uniform spin component aligns in the direction of the applied magnetic field. (b) The Mott insulator to superfluid transition of the hardcore boson model or the $U < 0$ Hubbard model. At half-filling, one possible state is the CDW state of ordered boson pairs. Upon doping, the pairs become mobile and form the superfluid state. (c) Both transitions can be described by the spin or the pseudospin flop in the SO(3) nonlinear $\sigma$-model, induced either by the magnetic field $B$ or by the chemical potential $\mu$.
In the low-energy sector, both the AF Heisenberg model in a magnetic field and the negative-$U$ Hubbard model with a chemical potential can be described by the SO(3) nonlinear $\sigma$-model, which is defined by the following Lagrangian density (in imaginary time coordinates) for a unit vector field $n_\alpha$ with $n_\alpha^2 = 1$:
$$\mathcal{L} = \frac{\chi}{2}\, \omega_{\alpha\beta}^2 + \frac{\rho}{2}\, (\partial_i n_\alpha)^2 + V(n), \qquad \omega_{\alpha\beta} = n_\alpha (\partial_t n_\beta - i B_{\beta\gamma} n_\gamma) - (\alpha \leftrightarrow \beta) \tag{6}$$
where the magnetic field, or equivalently the chemical potential, is given by $B_\alpha = \tfrac{1}{2} \epsilon_{\alpha\beta\gamma} B_{\beta\gamma}$; $\chi$ and $\rho$ are the susceptibility and stiffness parameters, and $V(n)$ is the anisotropy potential, which can be taken as $V(n) = -(g/2)\, n_z^2$. Exact SO(3) symmetry is obtained when $g = B_\alpha = 0$. $g > 0$ corresponds to easy-axis anisotropy, while $g < 0$ corresponds to easy-plane anisotropy. In the case of $g > 0$, there is a phase transition as a function of $B_z$ with $B_x = B_y = 0$. To see this, let us expand out the first term in [6]. The time-independent part contributes to an effective potential
$$V_{\mathrm{eff}} = V(n) - \frac{\chi B^2}{2}\, (n_x^2 + n_y^2)$$
from which we see that there is a phase transition at $B_{c1} = \sqrt{g/\chi}$. For $B < B_{c1}$, the system is in the Ising phase, while for $B > B_{c1}$, the system is in the XY phase. Therefore, tuning $B$ for a fixed $g > 0$ leads to the spin-flop transition. In $D = 2$, both the XY and the Ising phase can have a finite-temperature phase transition into the disordered state. However, because of the Mermin–Wagner theorem, a finite-temperature phase transition is forbidden at the point $B = g = 0$, where the system has an enhanced SO(3) symmetry. This SO(3) symmetric point leads to a large regime below the mean field transition temperature where fluctuations dominate. This large fluctuation regime can be identified with the pseudogap behavior. The pseudospin SU(2) symmetry of the negative-$U$ Hubbard model has another important consequence. Away from half-filling, the $\eta$ operators no longer commute with the Hamiltonian, but they are eigenoperators of the Hamiltonian, in the sense that
$$[H, \eta^\pm] = \mp 2\mu\, \eta^\pm \tag{7}$$
This means that the $\eta$ operators create well-defined collective modes with energy $2\mu$. Since they carry charge $\pm 2$, they usually do not couple to any physical probes. However, in an SC state, the SC order parameter mixes the $\eta$ operators with the CDW operator $\Delta_z$, via eqn [5]. From this reasoning, a pseudo-Goldstone mode was predicted to exist in the density response function at wave vector $(\pi, \pi)$ and energy $2\mu$, which appears only below the SC transition temperature $T_c$.
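The commutation relation [7] can be checked numerically on the smallest bipartite lattice. The following is a minimal sketch, assuming a two-site lattice with a single bond and the Hamiltonian [3]; the Jordan–Wigner matrix construction and all function names (`annihilator`, `hubbard`, `eta_residual`) are illustrative choices, not part of the original text.

```python
# Pure-Python check of the pseudospin relation [H, eta^+] = -2 mu eta^+
# on a two-site lattice (an assumption made for illustration).
# Mode index m = 2*x + s, with site x in {0, 1} and spin s = 0 (up), 1 (down).
N_MODES = 4
DIM = 2 ** N_MODES


def zeros():
    return [[0.0] * DIM for _ in range(DIM)]


def identity():
    m = zeros()
    for i in range(DIM):
        m[i][i] = 1.0
    return m


def matmul(a, b):
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def lincomb(*terms):
    """Return the sum of coeff * matrix over (coeff, matrix) pairs."""
    out = zeros()
    for coeff, mat in terms:
        for i in range(DIM):
            for j in range(DIM):
                out[i][j] += coeff * mat[i][j]
    return out


def dagger(a):
    return [list(col) for col in zip(*a)]  # matrices are real: transpose


def annihilator(m):
    """Matrix of c_m in the occupation basis, with Jordan-Wigner signs."""
    op = zeros()
    for state in range(DIM):
        if (state >> m) & 1:
            below = bin(state & ((1 << m) - 1)).count("1")
            op[state ^ (1 << m)][state] = (-1.0) ** below
    return op


c = [annihilator(m) for m in range(N_MODES)]
cd = [dagger(op) for op in c]
n = [matmul(cd[m], c[m]) for m in range(N_MODES)]
ONE = identity()


def hubbard(t, U, mu):
    """Hamiltonian [3] on the two-site lattice."""
    terms = []
    for s in (0, 1):  # hopping on the single bond <0, 1>
        terms.append((-t, matmul(cd[s], c[2 + s])))
        terms.append((-t, matmul(cd[2 + s], c[s])))
    for x in (0, 1):
        up = lincomb((1.0, n[2 * x]), (-0.5, ONE))
        dn = lincomb((1.0, n[2 * x + 1]), (-0.5, ONE))
        terms.append((U, matmul(up, dn)))
    terms.extend((-mu, n[m]) for m in range(N_MODES))
    return lincomb(*terms)


# eta^- = sum_x (-1)^x c_up(x) c_down(x), eta^+ = (eta^-)^dagger
eta_minus = lincomb(*[((-1.0) ** x, matmul(c[2 * x], c[2 * x + 1])) for x in (0, 1)])
eta_plus = dagger(eta_minus)


def eta_residual(t, U, mu):
    """Largest entry of [H, eta^+] + 2 mu eta^+; zero if eqn [7] holds."""
    H = hubbard(t, U, mu)
    res = lincomb((1.0, matmul(H, eta_plus)), (-1.0, matmul(eta_plus, H)),
                  (2.0 * mu, eta_plus))
    return max(abs(v) for row in res for v in row)


print(eta_residual(1.0, -4.0, 0.0))  # half-filling: eta^+ commutes with H
print(eta_residual(1.0, -4.0, 0.3))  # away from half-filling: eigenoperator
```

Both residuals vanish up to rounding, for either sign of $U$, which reflects the fact that the relation is an operator identity on a bipartite lattice rather than a property of a particular ground state.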
Unification of Antiferromagnetism and Superconductivity through the SO(5) Theory
Order Parameters and SO(5) Group Properties
The negative-$U$ Hubbard model and the SO(3) nonlinear $\sigma$-model discussed in the previous section give a nice description of the quantum phase transition from the Mott insulating phase with CDW order to the SC phase. On the other hand, these simple models do not have enough complexity to describe the AF insulator at half-filling and the SC order away from half-filling. Therefore, a natural step is to generalize these models so that the Mott insulating phase with the scalar CDW order parameter is replaced by a Mott insulating phase with the vector AF order parameter. The pseudospin SO(3) symmetry group considered previously arises from the combination of one real scalar component of the CDW order parameter with one complex, or two real, components of the SC order parameter. After replacing the scalar CDW order parameter by the three components of the AF order parameter, and combining them with the two components of the SC order parameter, we are naturally led to consider a five-component order parameter vector, and the SO(5) symmetry group which transforms it. It is simplest to define the concept of the SO(5) symmetry generator and order parameter on two sites with fermion operators $c_\sigma$ and $d_\sigma$, respectively,
where $\sigma = 1, 2$ is the usual spin index. The AF order parameter operator can be naturally defined in terms of the difference between the spins of the $c$ and $d$ fermions as follows:
$$N_\alpha = \tfrac{1}{2} \left( c^\dagger \tau_\alpha c - d^\dagger \tau_\alpha d \right), \qquad n_2 \equiv N_1, \quad n_3 \equiv N_2, \quad n_4 \equiv N_3 \tag{8}$$
where $\tau_\alpha$ are the Pauli matrices. In view of the strong on-site repulsion in the cuprate problem, the SC order parameter should be naturally defined on a bond connecting the $c$ and $d$ fermions, explicitly given by
$$\Delta^\dagger = -\frac{i}{2}\, c^\dagger \tau_y d^\dagger = \tfrac{1}{2} \left( -c_\uparrow^\dagger d_\downarrow^\dagger + c_\downarrow^\dagger d_\uparrow^\dagger \right), \qquad n_1 \equiv \frac{\Delta^\dagger + \Delta}{2}, \quad n_5 \equiv \frac{\Delta^\dagger - \Delta}{2i} \tag{9}$$
We can group these five components together to form a single vector $n_a = (n_1, n_2, n_3, n_4, n_5)$ called the superspin, since it contains both superconducting and antiferromagnetic spin components. The individual components of the superspin are explicitly defined in the last parts of eqns [8] and [9]. The concept of the superspin is only useful if there is a natural symmetry group acting on it. In this case, since the order parameter is 5D, it is natural to consider the most general rotation in the 5D order parameter space spanned by $n_a$. In 3D, three Euler angles specify a general rotation. In higher dimensions, a rotation is specified by selecting a plane and the angle of rotation within this plane. Since there are $n(n-1)/2$ independent planes in $n$ dimensions, the group SO($n$) is generated by $n(n-1)/2$ elements, specified in general by antisymmetric matrices $L_{ab} = -L_{ba}$, with $a, b = 1, \dots, n$. In particular, the SO(5) group has ten generators. The total spin and the total charge operator
$$S_\alpha = \tfrac{1}{2} \left( c^\dagger \tau_\alpha c + d^\dagger \tau_\alpha d \right), \qquad Q = \tfrac{1}{2} \left( c^\dagger c + d^\dagger d - 2 \right) \tag{10}$$
perform the function of rotating the AF and SC order parameters within each subspace. In addition, there are six so-called $\pi$ operators, defined by
$$\pi_\alpha^\dagger = -\tfrac{1}{2}\, c^\dagger \tau_\alpha \tau_y d^\dagger, \qquad \pi_\alpha = (\pi_\alpha^\dagger)^\dagger \tag{11}$$
which perform the rotation from AF to SC and vice versa. These infinitesimal rotations are defined by the commutation relations
$$[\pi_\alpha^\dagger, N_\beta] = i \delta_{\alpha\beta}\, \Delta^\dagger, \qquad [\pi_\alpha^\dagger, \Delta] = i N_\alpha \tag{12}$$
The ten operators, the total spin $S_\alpha$, the total charge $Q$, and the six $\pi$ operators, form the ten generators of the SO(5) group.
The superspin order parameter $n_a$, the associated SO(5) generators $L_{ab}$, and their commutation relations can be expressed compactly and elegantly in terms of the SO(5) spinor and the five Dirac $\Gamma$ matrices. The four-component SO(5) spinor is defined by
$$\Psi_\alpha = \begin{pmatrix} c_\sigma \\ d_\sigma^\dagger \end{pmatrix} \tag{13}$$
Its components satisfy the usual anticommutation relations
$$\{\Psi_\alpha^\dagger, \Psi_\beta\} = \delta_{\alpha\beta}, \qquad \{\Psi_\alpha, \Psi_\beta\} = \{\Psi_\alpha^\dagger, \Psi_\beta^\dagger\} = 0 \tag{14}$$
Using the $\Psi$ spinor and the five Dirac $\Gamma$ matrices, we can express $n_a$ and $L_{ab}$ as
$$n_a = \tfrac{1}{2}\, \Psi_\alpha^\dagger \Gamma^a_{\alpha\beta} \Psi_\beta, \qquad L_{ab} = -\tfrac{1}{2}\, \Psi_\alpha^\dagger \Gamma^{ab}_{\alpha\beta} \Psi_\beta \tag{15}$$
The $L_{ab}$ operators satisfy the commutation relation
$$[L_{ab}, L_{cd}] = -i \left( \delta_{ac} L_{bd} + \delta_{bd} L_{ac} - \delta_{ad} L_{bc} - \delta_{bc} L_{ad} \right) \tag{16}$$
The $n_a$ and the $\Psi_\alpha$ operators form the vector and the spinor representations of the SO(5) group, satisfying the following equations:
$$[L_{ab}, n_c] = -i \left( \delta_{ac} n_b - \delta_{bc} n_a \right) \tag{17}$$
and
$$[L_{ab}, \Psi_\alpha] = -\tfrac{1}{2}\, \Gamma^{ab}_{\alpha\beta} \Psi_\beta \tag{18}$$
If we arrange the ten operators $S_\alpha$, $Q$, and $\pi_\alpha$ into $L_{ab}$'s by the following matrix form:
$$L_{ab} = \begin{pmatrix} 0 & & & & \\ \pi_x^\dagger + \pi_x & 0 & & & \\ \pi_y^\dagger + \pi_y & -S_z & 0 & & \\ \pi_z^\dagger + \pi_z & S_y & -S_x & 0 & \\ Q & \frac{1}{i}(\pi_x^\dagger - \pi_x) & \frac{1}{i}(\pi_y^\dagger - \pi_y) & \frac{1}{i}(\pi_z^\dagger - \pi_z) & 0 \end{pmatrix} \tag{19}$$
and group $n_a$ as in eqns [8] and [9], we see that eqns [16] and [17] compactly reproduce all the commutation relations worked out previously. These equations show that $L_{ab}$ and $n_a$ are the symmetry generators and the order parameter vectors of the SO(5) theory. Having introduced the concept of local symmetry generators and order parameters based on sites in real space, we now proceed to discuss definitions of these operators in momentum space. The AF and dSC order parameters can be naturally expressed in terms of the microscopic fermion operators as
$$N_\alpha = \sum_p c_{p+\Pi}^\dagger \tau_\alpha c_p, \qquad \Delta^\dagger = -\frac{i}{2} \sum_p d(p)\, c_p^\dagger \tau_y c_{-p}^\dagger, \qquad d(p) \equiv \cos p_x - \cos p_y \tag{20}$$
where $\Pi \equiv (\pi, \pi)$ and $d(p)$ is the form factor for d-wave pairing in 2D. They can be combined into the five-component superspin vector $n_a$ by using the
same convention as before. The total spin and total charge operators are defined microscopically as
$$S_\alpha = \sum_p c_p^\dagger \tau_\alpha c_p, \qquad Q = \frac{1}{2} \sum_p \left( c_p^\dagger c_p - 1 \right) \tag{21}$$
and the $\pi$ operators can be defined as
$$\pi_\alpha^\dagger = \sum_p g(p)\, c_{p+\Pi}^\dagger \tau_\alpha \tau_y c_{-p}^\dagger \tag{22}$$
The form factor $g(p)$ needs to be chosen appropriately to satisfy the SO(5) commutation relation [16], and this requirement determines $g(p) = \mathrm{sgn}(d(p))$. The SO(5) symmetry generators perform the most general rotation among the five order parameters. It is easy to see that the quantum numbers of the $\pi$ operators exactly patch up the difference in quantum numbers between the AF and the dSC order parameters, according to Table 1.
Table 1 Quantum numbers of the AF and the dSC order parameters, and of the $\pi$ operator, which rotates the AF and the dSC order parameters into each other
Operator | Charge | Spin | Momentum | Internal angular momentum
$\Delta, \Delta^\dagger$ or $n_1, n_5$ | $\pm 2$ | 0 | 0 | d-wave
$N_\alpha$ or $n_{2,3,4}$ | 0 | 1 | $(\pi, \pi)$ | s-wave
$\pi_\alpha, \pi_\alpha^\dagger$ | $\pm 2$ | 1 | $(\pi, \pi)$ | d-wave
The SO(5) Quantum Nonlinear $\sigma$-Model
In the previous section we presented the concept of the local SO(5) order parameters and symmetry generators. These relationships are purely kinematic, and do not refer to any particular Hamiltonian. One can in fact construct microscopic models with exact SO(5) symmetry out of these operators. A large class of models, however, may not have SO(5) symmetry at the microscopic level, but their long-distance, low-energy properties may be described in terms of an effective SO(5) model. In the previous section, we have seen that many different microscopic models indeed all have the SO(3) nonlinear $\sigma$-model as their universal low-energy description. Similarly, we present the SO(5) quantum nonlinear $\sigma$-model as a general theory of AF and dSC in the HTSC. From eqn [17] and the discussions in the previous subsection, we see that $L_{ab}$ and $n_a$ are conjugate degrees of freedom, very much like $[q, p] = i\hbar$ in quantum mechanics. This suggests that we can construct a Hamiltonian from these conjugate degrees of freedom. The Hamiltonian of the SO(5) quantum nonlinear $\sigma$-model takes the following form:
$$H = \frac{1}{2\chi} \sum_{x,\, a<b} L_{ab}^2(x) + \frac{\rho}{2} \sum_{\langle x,x'\rangle,\, a} n_a(x)\, n_a(x') + \sum_{x,\, a<b} B_{ab}(x)\, L_{ab}(x) + \sum_x V(n(x)) \tag{23}$$
where the $n_a$ vector field is subjected to the constraint
$$n_a^2 = 1 \tag{24}$$
This Hamiltonian is quantized by the canonical commutation relations [16] and [17]. Here, the first term is the kinetic energy of the SO(5) rotors, where $\chi$ has the physical interpretation of the moment of inertia of the SO(5) rotors. The second term describes the coupling of the SO(5) rotors on different sites, through the generalized stiffness $\rho$. The third term introduces the coupling of external fields to the symmetry generators, while $V(n)$ can include anisotropic terms to break the SO(5) symmetry to the SO(3) $\times$ U(1) symmetry. The SO(5) quantum nonlinear $\sigma$-model is a natural combination of the SO(3) nonlinear $\sigma$-model describing the AF Heisenberg model and the quantum XY model describing the SC to insulator transition. If we restrict to the values $a = 2, 3, 4$, then the first two terms describe the symmetric Heisenberg model, the third term describes easy plane or easy axis anisotropy of the Néel vector, while the last term represents the coupling to the uniform external magnetic field. On the other hand, for $a = 1, 5$, the first term describes Coulomb or capacitance energy, the second term is the Josephson coupling energy, while the last term describes coupling to the external chemical potential. The first two terms of the SO(5) model describe the competition between quantum disorder and classical order. In the ordered state, the last two terms describe the competition between the AF and the SC order. Let us first consider the quantum competition. The first term prefers sharp eigenstates of angular momentum. At an isolated site, $C = \sum_{a<b} L_{ab}^2$ is the Casimir operator of the SO(5) group, in the sense that it commutes with all the SO(5) generators. The eigenvalues of this operator can be determined completely from group theory; they are 0, 4, 6, and 10, respectively, for the 1D SO(5) singlet, the 5D SO(5) vector, the 10D antisymmetric tensor, and the 14D symmetric, traceless tensor.
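The algebra [16] and the Casimir eigenvalue quoted for the 5D vector can be checked in the simplest representation. The sketch below assumes the vector-representation matrices $(L_{ab})_{jk} = i(\delta_{aj}\delta_{bk} - \delta_{ak}\delta_{bj})$, a standard choice that is not spelled out in the text, with indices running over 0..4 instead of 1..5; the function names are illustrative.

```python
N = 5  # dimension of the superspin space


def generator(a, b):
    """Vector-representation matrix (L_ab)_{jk} = i(d_aj d_bk - d_ak d_bj)."""
    m = [[0j] * N for _ in range(N)]
    if a != b:
        m[a][b] = 1j
        m[b][a] = -1j
    return m


def matmul(x, y):
    yt = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in yt] for row in x]


def combine(*terms):
    """Return the sum of coeff * matrix over (coeff, matrix) pairs."""
    out = [[0j] * N for _ in range(N)]
    for coeff, mat in terms:
        for j in range(N):
            for k in range(N):
                out[j][k] += coeff * mat[j][k]
    return out


def delta(a, b):
    return 1 if a == b else 0


def algebra_closes():
    """Check the commutation relation [16] for every index combination."""
    idx = range(N)
    for a in idx:
        for b in idx:
            for c in idx:
                for d in idx:
                    lab, lcd = generator(a, b), generator(c, d)
                    lhs = combine((1, matmul(lab, lcd)), (-1, matmul(lcd, lab)))
                    rhs = combine(
                        (-1j * delta(a, c), generator(b, d)),
                        (-1j * delta(b, d), generator(a, c)),
                        (1j * delta(a, d), generator(b, c)),
                        (1j * delta(b, c), generator(a, d)),
                    )
                    if lhs != rhs:
                        return False
    return True


def casimir():
    """Sum of L_ab^2 over the ten independent planes a < b."""
    pairs = [(a, b) for a in range(N) for b in range(a + 1, N)]
    return combine(*[(1, matmul(generator(a, b), generator(a, b))) for a, b in pairs])


print(algebra_closes())
print([casimir()[j][j] for j in range(N)])
```

In this representation the Casimir operator is four times the identity, matching the value 4 quoted for the SO(5) vector; combined with the prefactor $1/2\chi$ of the first term in [23], this gives an energy $2/\chi$ above the singlet.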
Therefore, we see that this term always prefers a quantum-disordered SO(5) singlet ground state, which is a total spin singlet. This ground state is separated from the first excited state, the fivefold SO(5) vector state, by an energy gap of $2/\chi$. This gap will be reduced when the different SO(5) rotors are coupled to each other by the second term. This term represents the effect of stiffness, which prefers a fixed direction of the $n_a$ vector, rather than a fixed angular momentum. This competition is an appropriate generalization of the competition between the number-sharp and phase-sharp states in a superconductor and the competition between the classical Néel state and the bond or plaquette singlet state in the Heisenberg AF. The quantum phase transition occurs near $\chi\rho \simeq 1$. In the classically ordered state, the last two anisotropy terms compete to select a ground state. To simplify the discussion, we can first consider the following simple form of the static anisotropy potential:
$$V(n) = -g \left( n_2^2 + n_3^2 + n_4^2 \right) \tag{25}$$
At the particle-hole symmetric point with vanishing chemical potential $B_{15} = \mu = 0$, the AF ground state is selected by $g > 0$, while the SC ground state is selected by $g < 0$, coupled with the constraint $n_a^2 = 1$. $g = 0$ is the quantum phase transition point separating the two ordered phases. However, it is unlikely that the HTSC cuprates can be close to this quantum phase transition point. In fact, we expect the anisotropy term $g$ to be large and positive, so that the AF phase is strongly favored over the SC phase at half-filling. However, the chemical potential term has the opposite, competing effect favoring SC. To see this, we transform the Hamiltonian into the Lagrangian density (in imaginary time coordinates) in the continuum limit:
$$\mathcal{L} = \frac{\chi}{2}\, \omega_{ab}(x,t)^2 + \frac{\rho}{2}\, \left( \partial_k n_a(x,t) \right)^2 + V(n(x,t)) \tag{26}$$
where
$$\omega_{ab} = n_a (\partial_t n_b - i B_{bc} n_c) - (a \leftrightarrow b) \tag{27}$$
is the angular velocity. We see that the chemical potential enters the Lagrangian as a gauge coupling in the time direction. Expanding the time derivative term, we obtain an effective potential
$$V_{\mathrm{eff}}(n) = V(n) - \frac{(2\mu)^2 \chi}{2} \left( n_1^2 + n_5^2 \right) \tag{28}$$
from which we see that the $V$ term competes with the chemical potential term. For $\mu < \mu_c = \sqrt{g/\chi}$, the AF ground state is selected, while for $\mu > \mu_c$, the SC ground state is realized. At the transition point, even though each term strongly breaks SO(5) symmetry, the combined term gives an effective static potential which is SO(5) symmetric, as we can see from [28].
Even though the static potential is SO(5) symmetric, the full quantum dynamics is not. This can be most easily seen from the time-dependent term in the Lagrangian. When we expand out the square, the term quadratic in $\mu$ enters the effective static potential in eqn [28]. However, there is also a time-dependent term linear in $\mu$. This term breaks the particle-hole symmetry, and it dominates over the second-order time derivative term in the $n_1$ and $n_5$ variables. In the absence of an external magnetic field, only second-order time derivative terms of $n_{2,3,4}$ enter the Lagrangian. Therefore, while the chemical potential term compensates the anisotropy potential in eqn [28] to arrive at an SO(5) symmetric static potential, its time-dependent part breaks the full quantum SO(5) symmetry. This observation leads to the concept of the projected, or static, SO(5) symmetry (Zhang et al. 1999). A model with projected or static SO(5) symmetry is described by a quantum effective Lagrangian of the form
$$\mathcal{L} = \frac{\chi}{2} \sum_{a=2,3,4} (\partial_t n_a)^2 - \chi\mu \left( n_1 \partial_t n_5 - n_5 \partial_t n_1 \right) - V_{\mathrm{eff}}(n) \tag{29}$$
where the static potential $V_{\mathrm{eff}}$ is SO(5) symmetric, but the time-dependent part contains a first-order time derivative term in $n_1$ and $n_5$. The SO(5) quantum nonlinear $\sigma$-model is constructed from two canonically conjugate field operators $L_{ab}$ and $n_a$. In fact, there is a kinematic constraint among these field operators:
$$L_{ab} n_c + L_{bc} n_a + L_{ca} n_b = 0 \tag{30}$$
This identity is valid for any triple $a$, $b$, and $c$, and can be easily proved by expressing $L_{ab} = n_a p_b - n_b p_a$, where $p_a$ is the conjugate momentum of $n_a$. Geometrically, this identity expresses the fact that $L_{ab}$ generates a rotation of the $n_a$ vector. The infinitesimal rotation vector lies on the tangent plane of the four-sphere $S^4$, and is therefore orthogonal to the $n_a$ vector itself.
In a large class of materials, including the high-$T_c$ cuprates, the organic superconductors, and the heavy fermion compounds, the AF and SC phases occur in close proximity to each other. The SO(5) theory is developed based on the assumption that these two phases share a common microscopic origin and should be treated on an equal footing. The SO(5) theory gives a coherent description of the rich global phase diagram of the high-$T_c$ cuprates and its low-energy dynamics through a simple symmetry principle and a unified effective model based on a single quantum Hamiltonian. A number of theoretical predictions, including the intensity dependence of the neutron resonance mode, the AF vortex state, and the mixed phase of AF and SC, have been verified experimentally (Figure 3). The theory also sheds light on the microscopic mechanism of superconductivity and quantitatively correlates the AF exchange energy with the condensation energy of superconductivity. However, the theory is still incomplete in many ways and lacks full quantitative predictive power. While the role of fermions is well understood within the exact SO(5) models, their roles in the effective SO(5) models are still not fully worked out. As a result, the theory has not made many predictions concerning the transport properties of these materials.
Figure 3 The finite-temperature phase diagram of the SO(5) model in the temperature ($T$) versus chemical potential ($\mu$) plane. (a) and (b) are two different representations of the same phase diagram, corresponding to a direct first-order phase transition between AF and SC, as a function of the chemical potential $\mu$ and the doping $\delta$, respectively. (c) corresponds to two second-order phase transitions with a uniform AF/SC mixed phase in between. The AF and the SC transition temperatures $T_N$ and $T_c$ merge into a bicritical point $T_{bc}$ or a tetracritical point $T_{tc}$. Both possibilities are allowed theoretically; it is up to experiments to determine which one is actually realized in the high-$T_c$ cuprates.
See also: Abelian Higgs Vortices; Effective Field Theories; Euclidean Field Theory; Ginzburg–Landau Equation; Hubbard Model; Quantum Phase Transitions; Quantum Spin Systems; Quantum Statistical Mechanics:
Overview; Renormalization: General Theory; Renormalization: Statistical Mechanics and Condensed Matter; Superfluids; Symmetry Classes in Random Matrix Theory; Variational Techniques for Ginzburg–Landau Energies.
Further Reading Bardeen J, Cooper LN, and Schrieffer JR (1957) Physical Review 108(5): 1175. Dagotto E (1994) Reviews of Modern Physics 66(3): 763. Imada M, Fujimori A, and Tokura Y (1998) Reviews of Modern Physics 70(4): 1039. Orenstein J and Millis AJ (2000) Science 288(5465): 468. Yang CN and Zhang SC (1990) Modern Physics Letters B 4(6–7): 759. Zhang SC (1997) Science 275(5303): 1089. Zhang SC, Hu JP, Arrigoni E, Hanke W, and Auerbach A (1999) Physical Review B 60(18): 13070.
Holomorphic Dynamics
M Lyubich, University of Toronto, Toronto, ON, Canada and Stony Brook University, NY, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Subject
Holomorphic dynamics (in a narrow sense) is a theory of iterates of rational endomorphisms $f$ of the Riemann sphere $\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$. The goal is to understand the phase portrait of this dynamical system, that is, the structure of its trajectories, and the dependence of the phase portrait on parameters (coefficients of $f$).
Holomorphic dynamics in a broader sense would include the theory of analytic transformations, local and global, in dimension 1 and higher, as well as the theory of groups and pseudogroups of analytic transformations, which would cover the theory of Kleinian groups and holomorphic foliations. However, we will mostly focus on holomorphic dynamics in the narrow sense.
Brief History
Local dynamical theory of analytic maps was laid down in the late nineteenth and early twentieth century by Königs, Schröder, Böttcher, and Leau. Global theory of iterates of rational maps was founded by Fatou and Julia in comprehensive mémoires of 1918–19. The theory had been developed very little since then until the early 1980s, when it exploded with new methods, ideas, and computer images. Particularly influential were the works of D Sullivan, who introduced ideas of quasiconformal deformations into the field, of A Douady and J Hubbard, who gave a comprehensive combinatorial description of the Mandelbrot set, and of W Thurston, who linked holomorphic dynamics to three-dimensional hyperbolic geometry, bringing to the field ideas of geometrization and rigidity. As a result, profound rigidity conjectures were formulated. Renormalization ideas introduced to the theory later on led to significant progress towards these conjectures (see Universality and Renormalization). Another source of ideas came from ergodic theory and the general theory of dynamical systems, particularly hyperbolic dynamics and thermodynamical formalism. They led to constructions of natural geometric measures on the Julia sets that helped to penetrate into their fractal nature.
General Terminology and Notations
$\mathbb{N} = \{1, 2, \dots\}$ is the set of natural numbers; $\mathbb{Z}_+ = \mathbb{N} \cup \{0\}$; $\mathbb{D}$ is the unit disk; $\mathbb{T} = \partial\mathbb{D}$. A topological disk is a simply connected domain in $\mathbb{C}$. A topological annulus is a doubly connected domain in $\mathbb{C}$ (i.e., a domain homeomorphic to a round annulus). A Cantor set is a totally disconnected compact subset of $\mathbb{R}^n$ without isolated points. Given a map $f : X \to X$, $f^n$ will stand for its $n$-fold iterate. The semigroup of iterates forms a dynamical system with discrete time. An orbit or trajectory of a point $z$ is $\mathrm{orb}_f(z) = (f^n z)_{n=0}^{\infty}$. A subset $Y \subset \hat{\mathbb{C}}$ is called invariant if $f(Y) \subset Y$ and completely invariant if also $f^{-1}(Y) \subset Y$. A point $\alpha \in \hat{\mathbb{C}}$ is called periodic if $f^p \alpha = \alpha$ for some natural $p$. The smallest such $p$ is called the period of $\alpha$. If $p = 1$, then $\alpha$ is called a fixed point. The orbit of a periodic point is also called a cycle. Two maps $f : X \to X$ and $g : Y \to Y$ on topological spaces $X$ and $Y$ are called topologically conjugate if there exists a homeomorphism $h : X \to Y$ such that $h \circ f = g \circ h$. If $h$ has better regularity properties, for example, it is quasiconformal/conformal/affine, then $f$ and $g$ are called quasiconformally/conformally/affinely conjugate. Let $f(z) = P(z)/Q(z)$ be a rational function viewed as a map $\hat{\mathbb{C}} \to \hat{\mathbb{C}}$. Its topological degree $\deg f = \# f^{-1}(z)$, $z \in \hat{\mathbb{C}}$ (where the preimages of $z$ are counted with multiplicity), is equal to the algebraic degree $\max(\deg P, \deg Q)$. The dynamics of $f$ is very simple in degree 1, so in what follows we assume that $\deg f \geq 2$. Let $C_f = \{c : Df(c) = 0\}$ stand for the set of critical points of $f$, and $V_f = f(C_f)$ be the set of critical values. A rational function of degree $d$ has $2d - 2$ critical points counted with multiplicity. Moreover,
$$C_{f^n} = \bigcup_{k=0}^{n-1} f^{-k}(C_f), \qquad V_{f^n} = \bigcup_{k=1}^{n} f^{k}(C_f)$$
The latter formula explains why the behavior of the critical orbits crucially influences the global dynamics of $f$. The set $O_f = \overline{\bigcup_n V_{f^n}}$ is called postcritical.
Basic Dynamical Theory
Local Theory
The local theory describes the dynamics of an analytic map $f : z \mapsto \lambda z + \sum_{n \geq 2} a_n z^n$ near its fixed point 0. The derivative $\lambda = f'(0)$ is called the multiplier of 0. The fixed point is called attracting, repelling, or neutral, depending on whether $|\lambda| < 1$, $|\lambda| > 1$, or $|\lambda| = 1$. It is called superattracting if $\lambda = 0$. When 0 is an attracting (but not superattracting) or repelling fixed point, the map is linearizable, that is, it is conformally conjugate to its linear part $z \mapsto \lambda z$; thus, there is a local conformal solution $\phi$ of the Schröder equation $\phi(fz) = \lambda \phi(z)$. This solution is also called the linearizing coordinate near 0. In the superattracting case, the map is conformally conjugate to the map $z \mapsto z^d$, where $a_d z^d$ is the first nonvanishing term in the local expansion of $f$. Thus, in this case there is a local conformal solution $\phi$ of the Böttcher equation $\phi(fz) = \phi(z)^d$. It is also called the Böttcher coordinate near 0. The situation in the neutral case (when $\lambda = e^{2\pi i \theta}$, $\theta \in \mathbb{R}/\mathbb{Z}$) depends in a delicate way on the arithmetic properties of the rotation number $\theta$. If $\theta = q/p$ is rational, the fixed point 0 is called parabolic. The local dynamics is then described in terms of the Leau–Fatou flower consisting of attracting petals alternating with repelling petals. In each petal, the map is conformally conjugate to the translation $z \mapsto z + 1$. The quotients of the petals by the dynamics are conformally equivalent to the cylinder $\mathbb{C}/\langle z \mapsto z + 1 \rangle$. They are called (attracting/repelling) Écalle–Voronin cylinders. In the irrational case, when $\theta \in \mathbb{R} \setminus \mathbb{Q}$, the map can be either linearizable or not. Accordingly, 0 is called a Siegel or a Cremer fixed point. If the rotation number $\theta$ is Diophantine (i.e., there exist $C > 0$ and $\mu > 2$ such that for all rational numbers $q/p$ we have $|\theta - q/p| \geq C p^{-\mu}$), then 0 is linearizable (Siegel 1942).
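In the attracting case, the linearizing coordinate can be approximated directly from the iterates. The following minimal sketch assumes the model map $f(z) = \lambda z + z^2$ with $\lambda = 0.5$ and uses the classical limit $\phi(z) = \lim_{n\to\infty} f^n(z)/\lambda^n$; the map, the starting point, and the function names are illustrative choices, not taken from the text.

```python
def f(z, lam):
    """Model map z -> lam*z + z**2 with fixed point 0 and multiplier lam."""
    return lam * z + z * z


def linearizing_coordinate(z, lam, n=40):
    """Approximate phi(z) = lim f^n(z) / lam^n for 0 < |lam| < 1.

    The point z is assumed to lie close enough to 0 for the orbit to converge.
    """
    for _ in range(n):
        z = f(z, lam)
    return z / lam ** n


lam = 0.5
z0 = 0.1 + 0.05j
lhs = linearizing_coordinate(f(z0, lam), lam)  # phi(f(z0))
rhs = lam * linearizing_coordinate(z0, lam)    # lam * phi(z0)
print(abs(lhs - rhs))  # residual of the Schroeder equation
```

The residual decays geometrically in `n`, and the normalization $\phi'(0) = 1$ shows up as $\phi(z)/z \to 1$ for small $z$. The same limit does not work in the neutral case, where $|\lambda| = 1$ and linearizability depends on the arithmetic of the rotation number.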
Notice that almost all numbers are Diophantine. A sharper arithmetic condition for linearizability in terms of the continued fraction expansion of $\theta$ was given by Bruno (1965). In the quadratic case, $z \mapsto e^{2\pi i \theta} z + z^2$, this condition was proved to be sharp (Yoccoz 1988).
Fatou and Julia Sets
From now on, $f : \hat{\mathbb{C}} \to \hat{\mathbb{C}}$ is a rational endomorphism of the Riemann sphere. The theory starts with the splitting of the sphere into two subsets, now called the Fatou and Julia sets, based on the notion of a normal family in the sense of Montel. A family $(\phi : S \to \hat{\mathbb{C}})$ of meromorphic functions on some Riemann surface $S$ is called normal if it is precompact in the compact–open topology. The Fatou set $F(f)$ is the maximal open subset of $\hat{\mathbb{C}}$ on which the family of iterates $(f^n)_{n=0}^{\infty}$ is normal. The Julia set $J(f)$ is the complement of the Fatou set. Both sets are completely invariant. The Julia set is always nonempty, and is either nowhere dense or coincides with the whole sphere. The trajectories on the Fatou set are Lyapunov stable (if $z$ is close to $z_0 \in F(f)$, then $\mathrm{orb}_f(z)$ is uniformly close to $\mathrm{orb}_f(z_0)$), while the dynamics on the Julia set is "chaotic." If $f$ is a polynomial, then the Fatou and Julia sets can be defined in a more concrete way as follows. In this case, $\infty$ is a superattracting fixed point for $f$. Let us consider its basin of attraction,
$$D_f(\infty) = \{ z : f^n z \to \infty \text{ as } n \to \infty \}$$
Its complement, $K(f)$, is called the filled Julia set. Then,
$$J(f) = \partial K(f) = \partial D_f(\infty)$$
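For quadratic polynomials, membership in the filled Julia set can be approximated by a finite escape test. This is a minimal sketch, assuming the family $f_c(z) = z^2 + c$; the function name `stays_bounded` and the iteration budget are illustrative. It relies on the elementary estimate that once $|z| > \max(2, |c|)$ the orbit tends to $\infty$, so a "False" answer is rigorous while a "True" answer only means the orbit has not escaped within the budget.

```python
def stays_bounded(z, c, max_iter=500):
    """Finite-iteration test for z in K(f_c), where f_c(z) = z**2 + c.

    If |z| > max(2, |c|), then |z**2 + c| >= |z|**2 - |c| > |z|, and the
    orbit escapes to infinity, so z lies in the basin D_f(infinity).
    """
    radius = max(2.0, abs(c))
    for _ in range(max_iter):
        if abs(z) > radius:
            return False
        z = z * z + c
    return abs(z) <= radius


print(stays_bounded(0, -1))     # critical point of z**2 - 1
print(stays_bounded(0.5j, -2))  # a point off the real axis for z**2 - 2
```

Sampling this test on a grid and marking the points where the answer changes gives a rough picture of $J(f) = \partial K(f)$.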
Periodic Points
Let $\alpha$ be a periodic point of $f$ of period $p$. As a fixed point of $f^p$, it is subject to the local theory. Thus, it (and its cycle) is classified as attracting, repelling, etc., according to the properties of the multiplier $\lambda = (f^p)'(\alpha)$ (which can be calculated in any local chart near $\alpha$). The basin of attraction $D_f(\mathbf{a})$ of an attracting cycle $\mathbf{a} = (f^n \alpha)_{n=0}^{p-1}$ is the set $\{ z : f^n z \to \mathbf{a} \text{ as } n \to \infty \}$. The immediate basin of attraction $D_f^0(\mathbf{a})$ is the union of the components of $D_f(\mathbf{a})$ containing the points of $\mathbf{a}$.
Theorem 1 (Fatou–Julia). The immediate basin of any attracting cycle contains a critical point. (Note that a superattracting cycle actually contains some critical point.)
It follows that a rational function of degree $d$ has at most $2d - 2$ attracting cycles. A polynomial of degree $d$ has at most $d - 1$ attracting cycles in $\mathbb{C}$. Attracting cycles belong to the Fatou set, while repelling cycles lie on the Julia set. Parabolic and Cremer points lie on the Julia set, while Siegel points belong to the Fatou set. The basin of attraction of a parabolic cycle $\mathbf{a}$ is defined as
$$D_f(\mathbf{a}) = \{ z : f^n z \to \mathbf{a} \text{ as } n \to \infty \} \setminus \bigcup_{n=0}^{\infty} f^{-n}(\mathbf{a})$$
It is the union of some components of the Fatou set. The union of the components of $D_f(\mathbf{a})$ containing the petals of the Leau–Fatou flower is called the immediate basin of attraction $D_f^0(\mathbf{a})$ of $\mathbf{a}$. As in the attracting case, the immediate basin $D_f^0(\mathbf{a})$ of a parabolic cycle contains a critical point of $f$. Components of the Fatou set containing Siegel periodic points are called Siegel disks. If $D$ is a Siegel disk of period $p$, then $f^p | D$ is conformally conjugate to the irrational rotation $z \mapsto e^{2\pi i \theta} z$ of the unit disk.
Theorem 2 (Shishikura 1987). A rational function of degree $d$ has at most $2d - 2$ nonrepelling cycles.
The proof of this result uses the methods of quasiconformal surgery.
Examples
For $f : z \mapsto z^d$, $d \geq 2$, the Julia set $J(f)$ is the unit circle. Moreover, $D_f(\infty) = \hat{\mathbb{C}} \setminus \bar{\mathbb{D}}$, while $\mathbb{D}$ is the basin of attraction of the superattracting fixed point 0. For maps $f_\varepsilon : z \mapsto z^2 + \varepsilon$ with sufficiently small $\varepsilon \neq 0$, the Julia set $J(f_\varepsilon)$ is a nowhere-differentiable Jordan curve (see Figure 1). The domain bounded by this curve is the basin of attraction of an attracting fixed point $\alpha_\varepsilon$. The filled Julia set of the map $f : z \mapsto z^2 - 1$, called the basilica, is depicted in black in Figure 2. The interior of the basilica is the basin of the superattracting cycle $\mathbf{a} = (0, -1)$ of period 2. For the map $f : z \mapsto z^2 - 2$, the Julia set is the interval $[-2, 2]$. It is affinely conjugate to the Chebyshev quadratic polynomial $\mathrm{Ch}_2 : z \mapsto 2z^2 - 1$. More generally, for a Chebyshev polynomial $\mathrm{Ch}_d$ of any degree $d$, the Julia set is the interval $[-1, 1]$. (By definition, the Chebyshev polynomial $\mathrm{Ch}_d$ is the solution of the functional equation $\cos dz = \mathrm{Ch}_d(\cos z)$.) For quadratic maps $f_c : z \mapsto z^2 + c$ with $c < -2$, the Julia set is a Cantor set on $\mathbb{R}$. For maps $f_c$ with $c > 1/4$, the Julia set is a Cantor set that does not meet $\mathbb{R}$. For $c \in (-2, 1/4]$, the Julia set contains an invariant interval on $\mathbb{R}$, but is not contained in $\mathbb{R}$. For $f : z \mapsto z^2 + i$, the Julia set is a "dendrite" (see Figure 3). For $c \approx -0.12 + 0.74i$, the map $f : z \mapsto z^2 + c$ has an attracting cycle of period 3. Its Julia set is known as the Douady rabbit (Figure 4).
Figure 1 Nowhere-differentiable Jordan curve.
Figure 2 Basilica.
No Wandering Domains Theorem and Dynamics on the Fatou Set
A component $D$ of the Fatou set is called wandering if $f^n(D) \cap f^m(D) = \emptyset$ for all natural $n > m$.
Figure 4 Douady rabbit.
Theorem 3 (Sullivan 1982). Rational functions do not have wandering domains.

This theorem is analogous to Ahlfors' theorem in the theory of Kleinian groups. Its proof introduced to holomorphic dynamics the methods of quasiconformal deformations, which have become a basic tool of the subject. The “no wandering domains theorem” completed the picture of dynamics on the Fatou set. Namely, for any $z \in F(f)$, one of the following three events happens:
z belongs to the basin of attraction of some attracting cycle;
z belongs to the basin of attraction of some parabolic cycle; and
for some $n$, $f^n z$ belongs to a rotation domain.

Here a rotation domain is either a Siegel disk or a Herman ring, that is, a topological annulus $A$ such that $f^p(A) = A$ for some $p \in \mathbb{N}$ and $f^p|A$ is conformally equivalent to an irrational rotation $z \mapsto e^{2\pi i\theta} z$ of a round annulus $\{z : 1 < |z| < R\}$. Note that Herman rings cannot occur for polynomial maps.

More Properties of the Julia Set
There are two more useful characterizations of the Julia set:
If $z$ is not an attracting periodic point and does not belong to a rotation domain, then the set of accumulation points of the full preimages $f^{-n} z$ is equal to $J(f)$.
The Julia set is the closure of the set of repelling periodic points.
Figure 3 Dendrite.
In the polynomial case, the Julia set $J(f)$ (and the filled Julia set $K(f)$) is connected if and only if the critical points do not escape to $\infty$ (in other words, $C_f \subset K(f)$). In the quadratic case, the Basic Dichotomy holds: the Julia set (and the filled Julia set) is either connected or a Cantor set.
Böttcher Coordinate

Let $f = z^d + a_1 z^{d-1} + \cdots + a_d$ be a monic polynomial of degree $d \ge 2$. Then $\infty$ is a superattracting fixed point, and hence there is a univalent function $B(z) = B_f(z)$ near $\infty$ satisfying the Böttcher equation $B(fz) = B(z)^d$ (the Böttcher coordinate near $\infty$). Moreover, $B(z) \sim z$ as $z \to \infty$ since $f$ is monic. If $J(f)$ is connected, $B(z)$ can be analytically extended to the whole basin of $\infty$, and provides us with the Riemann map $\mathbb{C} \setminus K(f) \to \mathbb{C} \setminus \bar{\mathbb{D}}$. Otherwise, $B(z)$ extends to a conformal map from some invariant domain $\Omega_f$ whose boundary contains a critical point onto $\mathbb{C} \setminus \bar{\mathbb{D}}_R$, where $R = R_f > 1$.

The $B$-preimage of a straight ray $\{re^{2\pi i\theta} : 0 < r < \infty\}$ is called the external ray $R^\theta$ of angle $\theta$. The $B$-preimage of a round circle $\{re^{2\pi i\theta} : 0 \le \theta < 1\}$ is called the equipotential $E^t$ of level $t = \log r$. External rays and equipotentials form two orthogonal $f$-invariant foliations. We let $R^\theta(t) = R^\theta \cap E^t$.
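The equipotential level $t = \log |B(z)|$ can be approximated without computing $B$ itself: iterating the Böttcher equation gives $\log|B(z)| = d^{-n}\log|B(f^n z)|$, and $|B(w)| \sim |w|$ for large $w$. The sketch below applies this to the family $z \mapsto z^d + c$ only. The function name `potential`, the bailout radius, and the iteration cap are illustrative assumptions, and points whose orbit has not escaped within the cap are simply assigned level 0.

```python
import math

def potential(z, c, d=2, max_iter=200, bailout=1e8):
    """Approximate log|B(z)| = lim d**(-n) * log|f^n(z)| for f(z) = z**d + c.

    Returns 0.0 when the orbit stays bounded for max_iter steps, i.e. the
    point is treated as belonging to the filled Julia set K(f).
    """
    z = complex(z)
    for n in range(max_iter):
        if abs(z) > bailout:
            # Far from K(f), |B(z)| is close to |z|, so log|z| / d**n
            # approximates the level of the starting point.
            return math.log(abs(z)) / d ** n
        z = z ** d + c
    return 0.0
```

For $c = 0$ the Böttcher coordinate is $B(z) = z$, so `potential(2, 0)` should agree with $\log 2$. For other parameters, the relation `potential(f(z), c) == d * potential(z, c)` (up to rounding) reflects the $f$-invariance of the foliation by equipotentials.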
Hyperbolic Maps and Fatou’s Conjecture
Hyperbolic maps form an important and best-understood class of rational maps (compare with Hyperbolic Dynamical Systems). A rational map $f$ is called hyperbolic if one of the following equivalent conditions holds:
All critical points of f converge to attracting cycles;
The map is expanding on the Julia set:

$$|Df^n(z)| \ge C\lambda^n, \quad z \in J(f),$$

where $C > 0$, $\lambda > 1$.

For instance, the maps $z \mapsto z^2 + \varepsilon$ for small $\varepsilon$, $z \mapsto z^2 - 1$, and $z \mapsto z^2 + c$ for $c \in \mathbb{R} \setminus [-2, 1/4]$ are hyperbolic. It is easy to see from the first definition that hyperbolicity is a stable property, that is, the set of hyperbolic maps is open in the space $\mathrm{Rat}_d$ of rational maps of degree $d$. One of the central open problems in holomorphic dynamics is to prove that this set is also dense. This problem is known as Fatou's conjecture.
Postcritically Finite Maps and Thurston's Theory

A rational map is called postcritically finite if the orbits of all critical points are finite. In this case, any critical point $c$ is either a superattracting periodic point or a repelling preperiodic point (i.e., $f^n c$ is a repelling periodic point for some $n$). If all critical points of $f$ are preperiodic, then $J(f) = \hat{\mathbb{C}}$.

Important examples of postcritically finite maps with $J(f) = \hat{\mathbb{C}}$ come from the theory of elliptic functions. Namely, let $\mathcal{P} : \mathbb{C}/\Lambda \to \hat{\mathbb{C}}$ be the Weierstrass $\mathcal{P}$-function, where $\Lambda$ is the lattice in $\mathbb{C}$ generated by 1 and $\tau$, $\operatorname{Im} \tau > 0$. It satisfies the functional equation $\mathcal{P}(nz) = f_{\tau,n}(\mathcal{P}(z))$, where $f_{\tau,n}$ is a rational function. These functions, called Lattès examples, possess the desired properties. (For some special lattices, $n$ can be selected complex: the corresponding maps are also called Lattès.)

More generally, one can consider postcritically finite topological branched coverings $f : S^2 \to S^2$. Two such maps, $f$ and $g$, are called Thurston combinatorially equivalent if there exist homeomorphisms $h, h' : (S^2, O_f) \to (S^2, O_g)$ homotopic rel $O_f$ (and hence coinciding on $O_f$) such that $h' \circ f = g \circ h$. A combinatorial class is called realizable if it contains a rational function. Thurston (1982) gave a combinatorial criterion for a combinatorial class to be realizable. If it is realizable, then the realization is unique, except for Lattès examples (Thurston's Rigidity Theorem).

Combinatorial Equivalence

Assume now that $J(f)$ is connected. One says that an external ray $R^\theta$ lands at some point $z \in J(f)$ if $R^\theta(t) \to z$ as $t \to 0$. Any external ray of rational angle $\theta = q/p$ with odd $p$ lands at some repelling or parabolic periodic point of period dividing $p$ (Douady and Hubbard 1982). Conversely, any repelling or parabolic point is a landing point of at least one rational ray as above (Douady 1990s). Let us consider the following equivalence relation on the set of rational numbers with odd denominators: two such numbers $\theta$ and $\theta'$ are equivalent if the corresponding rays $R^\theta$ and $R^{\theta'}$ land at the same point $z \in J(f)$. Two polynomials $f$ and $\tilde f$ with connected Julia sets are called combinatorially equivalent if the corresponding equivalence relations coincide. Notice that topologically equivalent polynomials are combinatorially equivalent.

Parameter Phenomena

Spaces of Rational Functions

Let $\mathrm{Rat}_d$ stand for the space of rational functions of degree $d$. As an open subset of the complex projective space $\mathbb{CP}^{2d+1}$, it is endowed with the natural topology and complex structure.
Structural Stability and Holomorphic Motions
A map $f \in \mathrm{Rat}_d$ is called J-stable if for any map $g \in \mathrm{Rat}_d$ sufficiently close to $f$, the maps $f|J(f)$ and $g|J(g)$ are topologically conjugate, and moreover, the conjugacy $h_g : J(f) \to J(g)$ is close to id. Thus, the Julia set $J(f)$ moves continuously over the set of J-stable maps. The following result proves a weak version of Fatou's conjecture:

Theorem 4 (Lyubich and Mañé–Sad–Sullivan 1983). The set of J-stable maps is open and dense in $\mathrm{Rat}_d$. Moreover, the set of unstable maps is the closure of the set of maps that have a parabolic periodic point.

A map $f \in \mathrm{Rat}_d$ is called structurally stable if for any map $g \in \mathrm{Rat}_d$ sufficiently close to $f$, the maps $f$ and $g$ are topologically conjugate on the whole sphere, and moreover, the conjugacy $h_g : \hat{\mathbb{C}} \to \hat{\mathbb{C}}$ is close to id. The set of structurally stable maps is also open and dense in $\mathrm{Rat}_d$ (Mañé–Sad–Sullivan).

The proofs make use of the theory of holomorphic motions, developed for this purpose but having a much broader range of applications in dynamics and analysis. Let $X$ be a subset of $\hat{\mathbb{C}}$, and let $h_\lambda : X \to \hat{\mathbb{C}}$ be a family of injections depending on a parameter $\lambda \in \Lambda$ in some complex manifold $\Lambda$ with a marked point $\lambda_0$. Assume that $h_{\lambda_0} = \mathrm{id}$ and that the functions $\lambda \mapsto h_\lambda(z)$ are holomorphic in $\lambda$ for any $z \in X$. Such a family of injections is called a holomorphic motion.

A holomorphic motion of any set $X$ over $\Lambda$ extends to a holomorphic motion of the whole sphere $\hat{\mathbb{C}}$ over some smaller manifold $\Lambda' \subset \Lambda$ (Bers–Royden, Sullivan–Thurston 1986). If $h_\lambda$ is a holomorphic motion of an open subset of the sphere, then the maps $h_\lambda$ are quasiconformal (Mañé–Sad–Sullivan). These statements are usually referred to as the $\lambda$-lemma. If $\Lambda = \mathbb{D}$, then the holomorphic motion of a set $X \subset \hat{\mathbb{C}}$ extends to a holomorphic motion of $\hat{\mathbb{C}}$ over the whole disk $\mathbb{D}$ (Slodkowski 1991).
Fundamental Conjectures

The above rigidity and stability results led to the following profound conjectures:

QC Rigidity Conjecture. If two rational maps are topologically conjugate, then they are quasiconformally conjugate.

Let us consider the real projective tangent bundle $PT$ over $\hat{\mathbb{C}}$, with a natural action of the map $f$. A measurable invariant line field on the Julia set is an invariant measurable section $X \to PT$ over an invariant set $X \subset J(f)$ of positive Lebesgue measure. In other words, it is a family of tangent lines $L_z \subset T_z$, $z \in X$, such that $Df(L_z) = L_{fz}$. Note that such a field can exist only if $J(f)$ has positive Lebesgue measure.

No Invariant Line Fields Conjecture. Let us consider two rational maps, $f$ and $\tilde f$, that are not Lattès examples. If they are quasiconformally conjugate and the conjugacy is conformal on the Fatou set, then they are conjugate by a Möbius transformation. Equivalently, if $f$ is not Lattès, then there are no measurable invariant line fields on $J(f)$.

This conjecture would imply Fatou's conjecture.

Mandelbrot Set

Let us consider the quadratic family $f_c : z \mapsto z^2 + c$. (Note that any quadratic polynomial is affinely conjugate to a unique map $f_c$.) The Mandelbrot set classifies parameters $c$ according to the Basic Dichotomy of the subsection “More Properties of the Julia Set”:

$$M = \{c : J(f_c) \text{ is connected}\} = \{c : f_c^n(0) \not\to \infty\}.$$

Note that $\varphi_n(c) \equiv f_c^n(0)$ is a polynomial in $c$ of degree $2^{n-1}$, and these polynomials satisfy the recursive relation $\varphi_{n+1} = \varphi_n^2 + c$. Moreover, $M = \{c : |\varphi_n(c)| \le 2,\ n \in \mathbb{Z}_+\}$, which gives an easy way to make a computer image of $M$ (see Figure 5). A distinguished curve seen in the picture of $M$ is the main cardioid $C = \{c = e^{2\pi i\theta}/2 - e^{4\pi i\theta}/4\}$, $\theta \in \mathbb{R}/\mathbb{Z}$. For such a $c = c(\theta) \in C$, the map $f_c$ has a neutral fixed point $\alpha_c$ with rotation number $\theta$. For $c$ inside the domain $H_0$ bounded by $C$, $f_c$ has an attracting fixed point $\alpha_c$, and the Julia set $J(f_c)$ is a Jordan curve (see Figure 1).
Figure 5 The Mandelbrot set.
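The computer image mentioned in the text rests on the escape criterion: a parameter $c$ lies outside $M$ as soon as some iterate $f_c^n(0)$ has modulus greater than 2. The following minimal sketch implements that test with the recursion "next iterate = square of the current iterate plus $c$", starting from $c$ itself. The names `in_mandelbrot` and `ascii_image`, the iteration cap, and the plotted window are illustrative choices. A finite cap can only certify escape, so points reported as members are "not yet escaped" rather than proven to lie in $M$.

```python
def in_mandelbrot(c, max_iter=500):
    """Return False once some iterate f_c^n(0) leaves the disk |z| <= 2.

    phi runs through f_c(0) = c, f_c^2(0), ... via phi -> phi*phi + c.
    Returning True only means no escape was seen within max_iter steps.
    """
    phi = complex(c)
    for _ in range(max_iter):
        if abs(phi) > 2:
            return False
        phi = phi * phi + c
    return True

def ascii_image(width=60, height=24, max_iter=100):
    """Crude character picture of M on the window [-2.1, 0.7] x [-1.2, 1.2]."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.1 + 2.8 * i / (width - 1)
            row += "#" if in_mandelbrot(complex(x, y), max_iter) else "."
        rows.append(row)
    return "\n".join(rows)
```

Calling `print(ascii_image())` gives a rough picture in which the main cardioid and the period-2 disk to its left are recognizable. Parameters close to the boundary of $M$, such as those just to the right of the cusp $c = 1/4$, need many iterations before escape is detected.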
At the cusp $c = 1/4 = c(0)$ of the main cardioid, the map $f_c$ has a parabolic point with multiplier 1. This point is also called the root of $C$. Other parabolic points $c = c(q/p)$ on $C$ are bifurcation points: if one crosses $C$ transversally at $c$, then the fixed point $\alpha_c$ “gives birth” to an attracting cycle of period $p$. This cycle preserves its “attractiveness” within some component $H_{q/p}$ of int $M$ attached to $C$. On the boundary of $H_{q/p}$, the above attracting cycle becomes neutral, and similar bifurcations happen as one crosses this boundary transversally, etc. In this way we obtain cascades of bifurcations and associated necklaces of components of int $M$. The most famous one is the cascade of doubling bifurcations that occur along the real slice of $M$.

Components of int $M$ that occur in these bifurcation cascades give examples of hyperbolic components of int $M$. More generally, a component $H$ of int $M$ is called hyperbolic of period $p$ if the maps $f_c$, $c \in H$, have an attracting cycle of period $p$. Many other hyperbolic components become visible if one begins to zoom in on the Mandelbrot set. Some of them are satellite, that is, they are born as above by bifurcation from other hyperbolic components. Others are primitive. They can be easily distinguished geometrically: primitive components have a cusp at their root, while satellite components are bounded by smooth curves.

Given a hyperbolic component $H$, let us consider the multiplier $\lambda(c)$ of the corresponding attracting cycle as a function of $c \in H$. The function $\lambda$ univalently maps $H$ onto the unit disk $\mathbb{D}$ (Douady and Hubbard 1982). Thus, there is a single parameter $c_0 \in H$ for which $\lambda(c_0) = 0$, so that $f_{c_0}$ has a superattracting cycle. This parameter is called the center of $H$.

Nonhyperbolic components of int $M$ are called queer. Conjecturally, there are no queer components. This conjecture is equivalent to Fatou's conjecture for the quadratic family.
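Centers of hyperbolic components can be located numerically. At a center of period $p$ the critical point 0 lies on the superattracting cycle, so $f_c^p(0) = 0$, and the center is a root of the polynomial $c \mapsto f_c^p(0)$. The sketch below applies Newton's method to this polynomial. The function names, the fixed number of Newton steps, and the starting guesses are illustrative assumptions. Roots found this way also include centers whose period properly divides $p$ (for example $c = 0$), so the starting guess determines which center is reached.

```python
def phi_and_derivative(c, p):
    """Return (f_c^p(0), d/dc f_c^p(0)) for f_c(z) = z*z + c.

    Uses phi_1 = c and phi_{n+1} = phi_n**2 + c, whose c-derivative
    satisfies phi_{n+1}' = 2 * phi_n * phi_n' + 1.
    """
    phi, dphi = complex(c), 1.0 + 0j
    for _ in range(p - 1):
        # The right-hand side is evaluated with the old phi and dphi.
        phi, dphi = phi * phi + c, 2 * phi * dphi + 1
    return phi, dphi

def find_center(c0, p, steps=50):
    """Newton iteration for a root of c -> f_c^p(0), started at c0."""
    c = complex(c0)
    for _ in range(steps):
        phi, dphi = phi_and_derivative(c, p)
        c -= phi / dphi
    return c
```

Starting near the parameter $c \approx -0.12 + 0.74i$ quoted for the Douady rabbit, `find_center(-0.1 + 0.75j, 3)` converges to the center of the corresponding period-3 component, and `find_center(-0.9, 2)` converges to $c = -1$, the basilica parameter.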
The boundary of $M$ coincides with the set of J-unstable quadratic maps (see the subsection “Structural Stability and Holomorphic Motions”).

Connectivity and Local Connectivity

Theorem 5 (Douady and Hubbard 1982). The Mandelbrot set is connected.

The proof provides an explicit uniformization $R_M : \mathbb{C} \setminus M \to \mathbb{C} \setminus \bar{\mathbb{D}}$. Namely, let $B_c : \Omega_c \to \mathbb{C} \setminus \bar{\mathbb{D}}_{R_c}$, $c \in \mathbb{C} \setminus M$, be the Böttcher coordinate near $\infty$. Then $R_M(c) = B_c(c)$. This remarkable formula explains the phase-parameter similarity between the Mandelbrot set near a parameter $c \in M$ and the corresponding Julia set $J(f_c)$ near the critical value $c$.
The following is the most prominent open problem in holomorphic dynamics:

MLC Conjecture. The Mandelbrot set is locally connected.

If this is the case, then the inverse map $R_M^{-1}$ extends to the unit circle $\mathbb{T}$, and the Mandelbrot set can be represented as the quotient of $\mathbb{T}$ modulo a certain equivalence relation that can be explicitly described. Thus, we would have an explicit topological model for the Mandelbrot set (Douady and Hubbard, Thurston). The MLC conjecture is equivalent to the following conjecture:

Combinatorial Rigidity Conjecture. If two quadratic maps $f_c$ and $f_{c'}$ with all periodic points repelling are combinatorially equivalent, then $c = c'$.

In turn, this conjecture would imply, in the quadratic case, the above fundamental conjectures. For progress towards the MLC conjecture, see Universality in Mathematical Physics.

Parabolic Implosion
Parabolic maps $f_{c_0} : z \mapsto z^2 + c_0$ are unstable in a dramatic way. In particular, the Julia set $J(f_c)$ does not depend continuously on $c$ near $c_0$. Instead, $J(f_c)$ tends to fill in a good part of int $K(f_{c_0})$. This phenomenon, called parabolic implosion, has been explored by Douady, Lavaurs, Shishikura, and many others.
Geometric Aspects

Area

One of the basic problems in holomorphic dynamics is whether a Julia set that does not coincide with $\hat{\mathbb{C}}$ can have positive area. It would give an example of “observable chaos” that occurs on a topologically small set. It is also related to the No Invariant Line Fields Conjecture.

Maps with strong hyperbolic properties have Julia sets of zero area. A rational map $f$ is called Collet–Eckmann if there exist constants $C > 0$ and $\lambda > 1$ such that

$$|Df^n(fc)| \ge C\lambda^n, \quad n \in \mathbb{N},$$

for all critical points $c$. If $f$ is a Collet–Eckmann map with $J(f) \ne \hat{\mathbb{C}}$, then area $J(f) = 0$ (Przytycki and Rohde 1998) (see Universality and Renormalization for more examples).

On the other hand, A Douady has set up a compelling program of constructing a Cremer quadratic polynomial $f : z \mapsto e^{2\pi i\theta} z + z^2$ whose Julia set would have positive area. Buff and Chéritat have recently announced that they have completed the program, thus constructing the first example of a Julia set of positive area. (It makes use of a renormalization theorem for parabolic implosion recently announced by Shishikura.)

In the parameter plane, it would be interesting to know whether the boundary of the Mandelbrot set has zero area.

Hausdorff Dimension
Hausdorff dimension (HD) gives us a further refinement of fractal sets of zero area. Any Julia set has positive HD. If $f$ is a polynomial with connected Julia set, then $\mathrm{HD}(J(f)) > 1$ unless $f$ is affinely conjugate to $z \mapsto z^d$ or a Chebyshev polynomial (Zdunik 1990). If $f$ is a Collet–Eckmann map with $J(f) \ne \hat{\mathbb{C}}$, then $\mathrm{HD}(J(f)) < 2$ (Przytycki–Rohde 1998). On the other hand, in the quadratic case $f_c : z \mapsto z^2 + c$, $\mathrm{HD}(J(f_c)) = 2$ for a generic parameter $c \in \partial M$. The corresponding parameter result is that $\mathrm{HD}(\partial M) = 2$ (Shishikura 1998). It is based on the parabolic implosion phenomenon.

Conformal Measure
Let $\delta \ge 0$. A Borel measure $\mu$ on $\hat{\mathbb{C}}$ is called $\delta$-conformal if

$$\mu(fX) = \int_X |Df|^\delta \, d\mu$$

for any measurable set $X$ such that $f|X$ is injective.

Theorem 6 (Sullivan 1983). Any rational map $f$ has a $\delta$-conformal measure with $\delta \in (0, 2]$ supported on $J(f)$.

This is a dynamical measure that captures well the geometric properties of $J(f)$. For instance, for Collet–Eckmann maps, $\delta = \mathrm{HD}(J(f))$, and $\mu$ is equivalent to the Hausdorff measure on $J(f)$ in dimension $\delta$.

The hyperbolic dimension $\mathrm{HD}_{\mathrm{hyp}}$ of $J(f)$ is the supremum of $\mathrm{HD}(X)$ over all compact invariant hyperbolic subsets $X$ of $J(f)$. Denker and Urbanski (1991) proved that $\mathrm{HD}_{\mathrm{hyp}}(J(f))$ is equal to the smallest exponent $\delta$ of all $\delta$-conformal measures supported on $J(f)$ (see Universality and Renormalization).

Measure of Maximal Entropy

An $f$-invariant measure $\mu$ is called balanced if $\mu(fX) = d\,\mu(X)$ for any measurable set $X$ such that $f|X$ is injective (where $d = \deg f$).

Theorem 7 (Brolin 1965, Lyubich 1982). Any rational map $f$ has a unique balanced measure $\mu$.

Moreover, preimages of any point $z$ except at most two are equidistributed with respect to $\mu$ (meaning that the probability measures $\mu_{n,z}$ that assign equal mass $1/d^n$ to every $f^n$-preimage of $z$ converge weakly to $\mu$ as $n \to \infty$). For polynomials, the balanced measure coincides with the harmonic measure on $J(f)$ (Brolin). (The latter is the charge distribution on the conductor $J(f)$ generated by the unit charge placed at $\infty$.) In general, the balanced measure is the unique measure of maximal entropy of $f$, and moreover, periodic points are equidistributed with respect to $\mu$ (Lyubich).

The measure of maximal entropy is supported on a relatively small measurable set: its HD is strictly less than $\mathrm{HD}(J(f))$, unless $f$ is conformally equivalent to $z \mapsto z^d$, a Chebyshev polynomial, or a Lattès example (Zdunik 1990). In the polynomial case, it is supported on a set of HD at most 1 (Manning 1984). In complex analysis, there has been an extensive study of fractal properties of harmonic measures, providing insights into the balanced measure and the other way around (Carleson, Makarov, Jones, Binder, Smirnov, ...).

See also: Fractal Dimensions in Dynamics; Geometric Analysis and General Relativity; Geometric Flows and the Penrose Inequality; Geometric Phases; Polygonal Billiards; Renormalization: General Theory; Renormalization: Statistical Mechanics and Condensed Matter; Universality and Renormalization.

Further Reading

Space limitations prohibit the inclusion of references to all the results quoted in this article. Where an author is quoted in the text, the reader can find the corresponding reference in one of the books listed in this section and in MathSciNet.

Buff X and Hubbard J, Dynamics in One Complex Variable. Ithaca, NY: Matrix Editions (to appear).
Carleson L and Gamelin TW (1993) Complex Dynamics. Berlin: Springer.
Eremenko A and Lyubich M (1990) The dynamics of analytical transformations. Leningrad Mathematical Journal 1: 563–633.
Milnor J (1999) Dynamics in One Complex Variable: Introductory Lectures. Braunschweig: Vieweg and Sohn.
Pilgrim KM (2003) Combinatorics of Complex Dynamical Systems, Lecture Notes in Mathematics no 1827. Berlin–Heidelberg–New York: Springer-Verlag.
Przytycki F and Urbanski M, Fractals in the Plane – the Ergodic Theory Methods. (www.math.unt.edu/uranski/book1.html) Cambridge: Cambridge University Press (to appear).
Tan Lei (ed.) (2000) The Mandelbrot Set, Theme and Variations, London Math. Soc. Lecture Note Ser., vol. 274. Cambridge: Cambridge University Press.
Holonomic Quantum Fields J Palmer, University of Arizona, Tucson, AZ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The term “holonomic field” was coined by Sato, Miwa, and Jimbo (SMJ) in 1978, and the subject was investigated by them in a series of five long papers and many shorter notes in the period 1978–81 (Sato et al. 1979a, 1979b, 1979c, 1980, Tracy and Widom 1994). The term refers to a special class of two-dimensional interacting quantum field theories whose $n$-point correlations can be expressed in terms of the solution to a holonomic system of differential equations. A holonomic system is an overdetermined system of differential equations with only a finite-dimensional family of solutions. There is a sense in which these interacting systems with infinitely many degrees of freedom have a finite-dimensional substrate (at the level of $n$-point functions for fixed $n$).

After developing their theory, SMJ realized that such quantum fields had made an earlier appearance in work of Thirring and Federbush. The models considered by Thirring and Federbush are self-interacting fermionic systems whose nonlinear classical field equations have solutions that are an explicit nonlinear transformation of solutions to the free field equations. This inspired the idea of trying to study these models by “quantizing” the nonlinear transformation. Expressions were obtained for the correlations and S-matrix, but the connection with deformation theory was not understood until the SMJ work.

In what follows we will sketch the SMJ theory and discuss some of its offshoots. There is one circumstance that it might help the reader to be aware of even though it will be mostly glossed over. Quantum fields in one space and one time dimension have correlations which transform under the symmetries of spacetime with metric signature (1,1). Since the work of Osterwalder and Schrader, it is customary to pass back and forth between this Minkowski regime and the Schwinger functions obtained by analytically continuing the $n$-point functions to pure imaginary values of the time variable, where they possess the rotational symmetries associated with a positive-definite metric. The Ising model, which we take up next, is naturally considered in the Euclidean domain, where the correlations have an interpretation in statistical mechanics as the expected value of a product of random variables. Ultimately, the SMJ deformation analysis is done in the Euclidean domain.

The Two-Dimensional Ising Model

The SMJ theory was inspired by, and provides an attractive setting for, an earlier result of Wu, McCoy, Tracy, and Baruch (WMTB) concerning the spin–spin scaling functions of the two-dimensional Ising model (Wu et al. 1976). Since the Ising model is the example with the most direct significance for physics, we will take some time to explain the WMTB result and to sketch the way in which it fits into the SMJ theory.

The Ising model is a statistical model of magnetism on a lattice that incorporates ferromagnetic interactions of nearest-neighbor spins. In the 1920s, Ising solved the model for the one-dimensional lattice and showed that there was no phase transition in the infinite volume limit. Interest in the two-dimensional model intensified dramatically following Onsager's calculation of the specific heat in the infinite volume limit (see Palmer and Tracy (1981) and references within). His formula for the specific heat was the first instance of a thermodynamic quantity in a nearest-neighbor model which exhibits the sort of discontinuity in temperature dependence expected at a phase transition. For many years, the Ising model served as a testbed for the now accepted notion that the infinite volume limit of Gibbsian statistical mechanics provides a suitable setting for the study of phase transitions.

A configuration for the Ising model on a finite subset, $\Lambda$, of the integer lattice, $\mathbb{Z}^2$, is a map $\sigma : \Lambda \to \{+1, -1\}$, which assigns to each site on the lattice either an up spin ($+1$) or a down spin ($-1$). The energy function of the Ising model, $E_\Lambda(\sigma)$, is defined by

$$E_\Lambda(\sigma) = -J \sum_{\langle i,j \rangle \subset \Lambda} \sigma(i)\sigma(j),$$

where $J > 0$ and the sum is over pairs of nearest-neighbor sites $i, j$ in $\Lambda$ (boundary terms require special consideration). This energy function tends to favor spin configurations, $\sigma$, in which the nearest-neighbor spins are aligned, in the sense that the Boltzmann weight, $e^{-E_\Lambda(\sigma)/kT}$, is larger for such configurations. In the Gibbs ensemble, which is expected to describe systems in equilibrium at temperature $T$, the configuration $\sigma$ occurs with a probability proportional to the Boltzmann weight. The factor $k$ which appears is a conversion factor between temperature and energy called the Boltzmann constant. It is clear from the formula for the Boltzmann weight that small temperatures (near 0) tend to accentuate the difference in
statistical weights assigned to configurations with different energies, and large temperatures tend to wash out this difference. Remarkably, there is a sharp critical temperature $0 < T_c < \infty$ such that for $T < T_c$ the propensity for order built into the energy triumphs in the infinite volume limit $\Lambda \uparrow \mathbb{Z}^2$, and for $T > T_c$ the randomness or disorder associated with high temperatures governs the infinite volume behavior. More specifically, if $T < T_c$ and the infinite volume limit is taken with plus spins assigned to the boundary of $\Lambda$, the system exhibits a residual magnetism (there is a positive expected value, $\langle\sigma\rangle$, for the spin per site). This infinite volume plus state is the quintessential example of symmetry breaking: the spin flip symmetry possessed by the bulk energy is broken below $T_c$ in the thermodynamic limit. For $T > T_c$, the spin per site is 0 no matter what boundary conditions are imposed on the infinite volume limit.

Pure equilibrium states both above and below $T_c$ exhibit clustering in the thermodynamic limit (uniqueness for the ground state in field theory). This is the tendency of spin variables $\sigma(a)$ and $\sigma(b)$ at sites $a, b \in \mathbb{Z}^2$ to become statistically independent as the distance $|a - b|$ tends to $\infty$. In such a pure state the two-point function, which is the expected value of the product of spin variables, $\langle\sigma(a)\sigma(b)\rangle$, will tend to the square $\langle\sigma\rangle^2$ both below ($\langle\sigma\rangle \ne 0$) and above ($\langle\sigma\rangle = 0$) the critical temperature $T_c$ as $|a - b| \to \infty$. To leading order, this clustering takes place at an exponential rate, $e^{-|a-b|/\xi(T)}$, for a function $\xi(T)$ called the correlation length. The correlation length $\xi(T) \to \infty$ as $T \to T_c$.

The scaling limit (from below $T_c$) of the spin–spin correlation is the leading-order correction to the clustering behavior of the correlations when these correlations are examined at the scale of the correlation length. It is the limit

$$\langle\sigma(a)\sigma(b)\rangle = \lim_{T \uparrow T_c} \frac{\langle\sigma(\xi(T)a)\,\sigma(\xi(T)b)\rangle_T}{\langle\sigma\rangle_T^2},$$

where the correlations on the right-hand side are thermodynamic correlations on the lattice at temperature $T$. Since $\langle\sigma\rangle_T$ tends to 0 as $T \to T_c$, the normalization by $\langle\sigma\rangle_T^{-1}$ on the right produces an “infinite wave function renormalization” in the limit. Equivalently, one may think of this continuum limit as being achieved by letting the lattice spacing shrink to 0 as $T$ approaches $T_c$ so that the correlation length stays fixed on the new scale. The scaling limit from above $T_c$ turns out to be different from the scaling limit from below $T_c$, and since $\langle\sigma\rangle_T = 0$ for $T > T_c$, it is defined by a different wave function renormalization. The resulting asymptotics are expected to capture quite a lot about what is interesting in the behavior of the correlations near the phase transition. In the late 1970s, the scaling behavior in this model was also a prototype for the emerging connection between renormalization group ideas in quantum field theory and statistical mechanics.

Wu et al. (1976) showed that the two-point scaling function, $\langle\sigma(0)\sigma(x)\rangle$, is a function of $r = |x|$ and can be written as

$$\langle\sigma(0)\sigma(x)\rangle = \begin{cases} \cosh(\psi/2)\,\exp\left(\dfrac{1}{4}\displaystyle\int_r^\infty s\left[\sinh^2\psi - \left(\dfrac{d\psi}{ds}\right)^2\right] ds\right) & (T \uparrow T_c) \\[2ex] \sinh(\psi/2)\,\exp\left(\dfrac{1}{4}\displaystyle\int_r^\infty s\left[\sinh^2\psi - \left(\dfrac{d\psi}{ds}\right)^2\right] ds\right) & (T \downarrow T_c) \end{cases} \qquad [1]$$

where $\psi = \psi(r)$ satisfies the differential equation

$$\frac{d}{dr}\left(r\frac{d\psi}{dr}\right) = \frac{r}{2}\sinh(2\psi).$$

The substitution $\eta = e^{-\psi}$ transforms this differential equation into a Painlevé equation of the third kind. This was used by McCoy, Tracy, and Wu (see Palmer and Tracy (1981) and references within) to study the short-distance behavior, $r \to 0$, of the scaling functions, behavior which is far from manifest in the infinite series expansions obtained for the scaling functions.
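The change of variables can be checked mechanically. Writing the differential equation as $\psi'' + \psi'/r - \tfrac12\sinh(2\psi) = 0$ and taking the substitution to be $\eta = e^{-\psi}$ (an assumption about the sign convention; $\eta = e^{\psi}$ leads to the same equation), a direct computation gives $\eta'' - (\eta')^2/\eta + \eta'/r - \tfrac14(\eta^3 - \eta^{-1}) = -\eta\,[\psi'' + \psi'/r - \tfrac12\sinh(2\psi)]$, so $\psi$ solves the first equation exactly when $\eta$ solves an equation of Painlevé III type. The sketch below verifies this identity numerically for an arbitrary smooth test function, which is deliberately not a solution of either equation. The test function and all names are illustrative.

```python
import math

# Arbitrary smooth test function with exact derivatives (not a solution).
def psi(r):
    return 0.3 * math.sin(r) + 0.1 * r

def dpsi(r):
    return 0.3 * math.cos(r) + 0.1

def d2psi(r):
    return -0.3 * math.sin(r)

def residual_psi(r):
    """psi'' + psi'/r - (1/2) sinh(2 psi): zero exactly on solutions."""
    return d2psi(r) + dpsi(r) / r - 0.5 * math.sinh(2 * psi(r))

def residual_eta(r):
    """eta'' - eta'^2/eta + eta'/r - (eta^3 - 1/eta)/4 for eta = exp(-psi)."""
    eta = math.exp(-psi(r))
    deta = -dpsi(r) * eta                      # chain rule
    d2eta = (dpsi(r) ** 2 - d2psi(r)) * eta    # chain rule, second derivative
    return d2eta - deta ** 2 / eta + deta / r - 0.25 * (eta ** 3 - 1 / eta)

def identity_defect(r):
    """Should vanish up to rounding: residual_eta = -eta * residual_psi."""
    return residual_eta(r) + math.exp(-psi(r)) * residual_psi(r)
```

Evaluating `identity_defect` at a few positive values of `r` returns numbers at the level of floating-point rounding. Because the identity holds for every smooth function, it holds in particular along solutions, where both residuals vanish together.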
Deformation Theory

Sato, Miwa, and Jimbo showed that there is a class of quantum field theories, including the scaling limits of the Ising model, with the property that the $n$-point correlations are “tau functions” for monodromy-preserving deformations of the Dirac equation in two dimensions. The two-dimensional (Euclidean) Dirac operator is

$$D = \begin{pmatrix} m & -2\partial \\ -2\bar\partial & m \end{pmatrix}$$

with

$$\partial = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right), \qquad \bar\partial = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right)$$

the usual complex derivatives acting on smooth functions on $\mathbb{R}^2$. The monodromy-preserving deformations mentioned above are families of (multivalued) solutions $w(x)$ to

$$Dw = 0 \qquad [2]$$

which are branched at points $a_j \in \mathbb{R}^2$ ($j = 1, 2, \ldots, n$) and change by a factor $e^{2\pi i \ell_j}$ as $x$ makes a small circuit about $a_j$. SMJ (1979b) show that the $L^2(\mathbb{R}^2)$ (square-integrable) solutions $w(x)$ of the Dirac equation with this prescribed branching behavior comprise an $n$-dimensional subspace of $L^2(\mathbb{R}^2)$. The constants $e^{2\pi i \ell_j}$ are called the monodromy of the solutions, and the term “deformation” in the description refers to the fact that the monodromy constants do not change as the branch points $a_j$ are varied. SMJ show that it is possible to choose a basis $w_j(x, a) = w_j(x, a_1, \ldots, a_n)$ ($j = 1, \ldots, n$) so that the vector

$$W(x, a) := [w_1(x, a), w_2(x, a), \ldots, w_n(x, a)]$$

becomes a section for a flat (Dirac compatible) connection in the $(x, a)$ variables. That is,

$$d_{x,a} W = \Omega(x, a)\, W,$$

where $d_{x,a}$ is the exterior derivative in the $x$ and $a$ variables and $\Omega$ is a matrix-valued 1-form that satisfies the zero curvature condition

$$d_{x,a}\Omega = \Omega \wedge \Omega.$$

They also introduced the notion of a tau function, $\tau(a)$, for such deformations. Its logarithmic derivative is $d_a \log\tau(a) = \omega$, where $\omega$ is a 1-form on $\mathbb{R}^2 \setminus \{a_1, a_2, \ldots, a_n\}$ expressed in terms of the matrix elements of $\Omega$. The 1-form $\omega$ introduced by SMJ is shown to be closed when $\Omega$ satisfies the zero curvature condition above. The scaling limit of the Ising model is related to the situation for monodromy multipliers $-1$, and when the scaling limits of correlations are identified as suitable $\tau$-functions in this case, the WMTB result emerges when the nonlinear zero curvature condition is identified with a Painlevé equation.

The connection between the deformation theory and quantum field theory is developed in the computationally intensive paper SMJ (1980). Extensive use is made of local operator product expansions, analytic continuation, and formal series expansions that are infinite-dimensional analogs of Wick-type theorems for finite-dimensional spin representations (developed by SMJ (1978)).
One can get a feeling for the source of the connection by recalling that in one of the “exact solutions” of the two-dimensional Ising model the spin operators, $\sigma(a)$, are identified as elements of an infinite spin representation of the orthogonal group and are characterized by their linear action on fermionic variables (Palmer and Tracy 1981). In the physics literature, the $\sigma(a)$ are referred to as Bogoliubov transformations. In the scaling limit, the associated representation space is the home of a free Fermi field $\psi(x)$, an operator-valued solution to the Dirac equation. Of course, $\psi(x)$ has components $\psi_j(x)$, but for simplicity we will suppress such details in the mostly schematic discussion that follows. For coincident second coordinates $x_2 = a_2$, the Fermi field $\psi(x)$ and $\sigma(a)$ satisfy the commutation relation

$$\sigma(a)\,\psi(x) = \operatorname{sgn}(x_1 - a_1)\,\psi(x)\,\sigma(a) \qquad [3]$$
which is a surviving remnant of the linear action of $\sigma(a)$ on lattice fermions. In the transfer matrix formalism, which is natural for statistical mechanics, translation in the “space” variable $x_1$ is unitary, but translation in the “time” variable, $x_2$, is governed by the transfer matrix, the generator of a contractive semigroup. Because of this, the quantities that are well behaved in this formalism are “time-ordered vacuum expectations”; these involve only “positive” powers of the transfer matrix. Let $T$ denote the “time”-ordering operator; a sequence of operators depending on coordinates in $\mathbb{R}^2$ is reordered following $T$ so that the second coordinates appear in increasing order from left to right. Sign changes are incorporated whenever it is necessary to exchange Fermi-type operators like $\psi(x)$ and $\psi(y)$ to put them in the correct order. In the Euclidean setting (pure imaginary time) it is well known that

$$G(x, y) = \langle T\,\psi(x)\psi(y)\rangle$$

is a Green function for the Dirac operator $D$ (the distribution kernel for $D^{-1}$). This observation and [3] suggest that the hybrid vacuum expectation

$$G(x, y; a) = \frac{\langle T\,\psi(x)\psi(y)\sigma(a_1)\cdots\sigma(a_n)\rangle}{\langle T\,\sigma(a_1)\cdots\sigma(a_n)\rangle}$$
should be the Green function for a Dirac operator with a domain containing "functions" branched at the points $a_j$ having "monodromy" $-1$ there. It is possible to recast the SMJ analysis so that a Dirac operator, $D(a)$, on a suitable vector bundle with base $\mathbb{R}^2 \setminus \{a_1, \ldots, a_n\}$ becomes the central player (see Palmer et al. (1994) and references therein). The data for the vector bundle includes the factors $e^{2\pi i \ell_j}$ incorporated in transition functions for the bundle. The $\tau$-function becomes an infinite determinant (or Pfaffian in the Ising case)
\[ \tau(a) = \det D(a) \tag*{[4]} \]
in the Segal–Wilson sense (see Palmer et al. (1994) and references therein). The Green function $G(x, y; a)$ has a finite-rank derivative,
\[ d_a G(x, y; a) = \sum_j \left[ r_j(x, a)\, s_j(y, a)\, da_j + u_j(x, a)\, v_j(y, a)\, d\bar{a}_j \right] \tag*{[5]} \]
which is the key result in this version of the SMJ analysis (this observation appears in SMJ (1980) but does not have a central role there). The "wave functions" $r$, $s$, $u$, and $v$ are closely related to the $L^2$ wave functions $w_j$ described above. Equation [5] is both the source of the deformation equations for $r$, $s$, $u$, and $v$, which arise from $d_a^2 G = 0$ coupled with the rotational and translational symmetries of the Green function, and also of the expression for $d_a \log \tau(a)$ in terms of data associated with the deformation theory. A "transfer matrix" calculation of the determinant allows one to make the connection with the scaling limits of lattice fields including the Ising model (see Palmer et al. (1994) and references therein). The short-distance behavior of the two-point function for the Ising model scaling functions has been rigorously calculated by Tracy and later by Tracy and Widom (see Harnad and Its (2002) and references therein). A less detailed analysis of the short-distance behavior of the $n$-point functions that uses the deformation analysis of the correlations in a crucial way can be found in Palmer (2000).
The Riemann–Hilbert Problem
In SMJ (1979b), a "massless" version of holonomic fields is developed. This concerns monodromy-preserving deformations of the Cauchy–Riemann operator $\bar\partial$. The techniques used to study this lead back to the Riemann–Hilbert problem: the problem of determining a linear differential equation in the complex plane with rational coefficients and prescribed monodromy at the poles of the coefficients. More specifically, suppose one is given $n$ distinct points $\{a_1, \ldots, a_n\}$ in $\mathbb{P}^1$, the Riemann sphere, and a base point $a_0$ distinct from the $a_j$, $j \neq 0$. Let $\gamma_j$ denote a simple closed curve based at $a_0$ which winds counterclockwise once around $a_j$ but has winding number 0 for the other points $a_k$, $k \neq j$. Choose $n$ invertible $p \times p$ matrices $M_j$ which satisfy the single condition $M_1 M_2 \cdots M_n = I$. Then, the homotopy classes of the curves $\gamma_j$ are the generators for the fundamental group of the punctured sphere $\mathbb{P}^1 \setminus \{a_1, \ldots, a_n\}$ with base point $a_0$, and the map which sends $\gamma_j \to M_j$ determines a representation of the fundamental group. One version of the Riemann–Hilbert problem is to find $p \times p$ complex matrices $A_j$ for $j = 1, \ldots, n$ so that the linear differential equation
\[ \frac{dY}{dz} = \sum_j \frac{A_j}{z - a_j}\, Y \tag*{[6]} \]
has monodromy representation given by $\gamma_j \to M_j$. This means that the fundamental solution $Y(z)$ defined in a neighborhood of $z = a_0$ and normalized so that $Y(a_0) = I$ (the identity) will become the fundamental solution $Y(z) M_j^{-1}$ after analytic continuation around the curve $\gamma_j$. This form of the problem does not always have a solution but when it does, it is interesting to consider deformations $a \to A_j(a)$ that preserve the monodromy multipliers $M_j$. Such monodromy-preserving deformations were first considered by Schlesinger in 1912 (see Palmer and Tracy (1981) and references therein) and he discovered that the coefficients $A_j$ must satisfy nonlinear differential equations that, for $a_0 = \infty$, can be written as
\[ d_a A_j = \sum_{k \neq j} \frac{[A_k, A_j]}{a_k - a_j}\, d(a_k - a_j) \]
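As a sanity check on equation [6] and the monodromy convention above, the scalar case $p = 1$ can be worked out by hand. This is only an illustrative special case, not part of the SMJ analysis; the branch of the power function is fixed near $a_0$, which is taken to be a finite point here.

```latex
% Scalar case p = 1: the A_j are complex numbers and everything commutes.
\[
  \frac{dY}{dz} = \sum_{j=1}^{n} \frac{A_j}{z - a_j}\, Y
  \quad\Longrightarrow\quad
  Y(z) = \prod_{j=1}^{n} \left( \frac{z - a_j}{a_0 - a_j} \right)^{A_j},
  \qquad Y(a_0) = 1 .
\]
% Continuation once counterclockwise around a_j adds 2\pi i to \log(z - a_j):
\[
  Y \;\longmapsto\; Y\, e^{2\pi i A_j} = Y\, M_j^{-1},
  \qquad\text{so}\qquad M_j = e^{-2\pi i A_j},
\]
\[
  M_1 M_2 \cdots M_n = 1
  \iff \sum_{j=1}^{n} A_j \in \mathbb{Z} .
\]
% All commutators vanish, so the deformation equations reduce to d_a A_j = 0:
% the A_j are constant, and the M_j, which depend only on the A_j, are preserved.
```

In this commutative case the deformation is trivial; the nonlinearity of the Schlesinger equations comes entirely from the commutators $[A_k, A_j]$.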
SMJ introduced $\tau$-functions associated with these deformations and they gave these $\tau$-functions a quantum field theory interpretation as $n$-point functions. Eventually this theory was extended to include the Birkhoff generalization of the Riemann–Hilbert problem, a generalization which incorporates the additional information needed to fix local holomorphic equivalence at higher-order poles (formal asymptotics and Stokes' multipliers) (Jimbo and Miwa 1981, Sato et al. 1978). Roughly speaking, the problem is to reconstruct a global connection with specified singularities from its local holomorphic equivalence data and its global monodromy representation. Thinking of the differential equation [6] as a holomorphic connection proved very helpful in a geometric reworking of the SMJ analysis given by Malgrange (1983a, 1983b), who showed that the zeros of the $\tau$-function occur at points where a suitably defined Riemann–Hilbert problem fails to have a solution (see also Palmer (1999) and references within). The mathematical significance of massless holonomic quantum fields as (quantized) singular elements of a gauge group is apparent from the SMJ work and later work of Miwa, but the possibility of interesting physics in these models does not seem to have been much investigated at this time. These quantum fields are also conformal fields; however, a comprehensive integration into the highly developed formalism of conformal field theory on compact Riemann surfaces has not currently been developed (an analog of [5] should survive on compact Riemann surfaces but the deformation analysis of the correlations is likely limited to symmetric spaces).
Further Developments
This work on massless holonomic fields and the connection with the Riemann–Hilbert problem is doubtless the aspect of holonomic fields with the most "spin-offs" in the mathematics and physics literature. These include an analysis of the delta-function gas done by Jimbo, Miwa, Mori, and Sato in 1981; random matrix models, first looked at by Jimbo, Miwa, Mori, and Sato and later systematically investigated by Tracy and Widom (1994); the deformations of line bundles on Riemann surfaces that led to KdV in the work of Segal and Wilson (1985), which emerged from work of Sato, Miwa, Jimbo and collaborators; the analysis of Painlevé equations, starting with work of McCoy, Tracy and Wu (see Palmer and Tracy (1981) and references within) and more systematically developed by Its and Novokshenov (1986); and the revival of interest in monodromy-preserving deformations (Harnad and Its 2002). Holonomic fields are related to free fields in a well-understood way and it is natural to study them in situations where free fields make sense. In particular, they are an interesting testbed for the nonperturbative investigation of the influence of geometry (or curvature) on quantum fields. In Palmer et al. (1994), the deformation analysis of $\tau$-functions for holonomic fields is carried out for the Poincaré disk. The two-point functions are shown to be expressible in terms of solutions to the family of Painlevé VI equations. A quantum field theory interpretation of these $\tau$-functions is given by Doyon and there are natural analogs of the scaling limit of the Ising model on the Poincaré disk as well. The role of "spacetime" symmetries in the deformation theory suggests that such analysis will be limited to symmetric spaces. In addition to the plane and the Poincaré disk, the cylinder, the sphere, and the torus round out the possibilities in two dimensions.
Lisovyy has recently worked out the analysis for the cylinder, which is important for the study of thermodynamic correlations. It should be possible to recast the analysis of the continuum Ising model on the torus (Zuber and Itzykson 1977) in deformation theoretic terms. It does not appear that the holonomic fields associated with the Dirac operator for the constant curvature metric on the 2-sphere have been studied yet.
See also: Deformation Theory; Integrable Systems: Overview; Isomonodromic Deformations; Riemann–Hilbert Problem; Two-Dimensional Ising Model.
Further Reading
Harnad J and Its A (2002) CRM Workshop: Isomonodromic Deformations and Applications in Physics, vol. 31. American Mathematical Society.
Its AR and Novokshenov VYu (1986) The Isomonodromic Deformation Method in the Theory of Painlevé Equations, Lecture Notes in Mathematics, vol. 1191. Berlin: Springer.
Jimbo M and Miwa T (1981) Monodromy preserving deformations of linear ordinary differential equations with rational coefficients II. Physica D 2: 407–448.
Jimbo M, Miwa T, and Ueno K (1981) Monodromy preserving deformations of linear ordinary differential equations with rational coefficients I. Physica D 2: 306–352.
Malgrange B (1983a) Sur les déformations isomonodromiques I, Singularités régulières, Progress in Mathematics vol. 37, pp. 401–426. Boston: Birkhäuser.
Malgrange B (1983b) Sur les déformations isomonodromiques II, Singularités irrégulières, Progress in Mathematics vol. 37, pp. 427–438. Boston: Birkhäuser.
Palmer J (1999) Zeros of the Jimbo, Miwa, Ueno tau function. Journal of Mathematical Physics 40: 6638–6681.
Palmer J (2000) Short distance asymptotics of Ising correlations. Journal of Mathematical Physics 7: 1–50.
Palmer J, Beatty M, and Tracy C (1994) Tau functions for the Dirac operator on the Poincaré disk. Communications in Mathematical Physics 165: 97–173.
Palmer J and Tracy CA (1981) Two dimensional Ising correlations: Convergence of the scaling limit. Advances in Applied Mathematics 2: 329–388.
Sato M, Miwa T, and Jimbo M (1978) Holonomic quantum fields I. Publications of the Research Institute for Mathematical Sciences, Kyoto University 14: 223–267.
Sato M, Miwa T, and Jimbo M (1979a) Holonomic quantum fields II. Publications of the Research Institute for Mathematical Sciences, Kyoto University 15: 201–278.
Sato M, Miwa T, and Jimbo M (1979b) Holonomic quantum fields III. Publications of the Research Institute for Mathematical Sciences, Kyoto University 15: 577–629.
Sato M, Miwa T, and Jimbo M (1979c) Holonomic quantum fields IV. Publications of the Research Institute for Mathematical Sciences, Kyoto University 15: 871–972.
Sato M, Miwa T, and Jimbo M (1980) Holonomic quantum fields V. Publications of the Research Institute for Mathematical Sciences, Kyoto University 16: 531–584.
Tracy CA and Widom H (1994) Fredholm determinants, differential equations and matrix models. Communications in Mathematical Physics 163: 33–72.
Wu TT, McCoy B, Tracy CA, and Barouch E (1976) Spin–spin correlation functions for the two dimensional Ising model: exact theory in the scaling region. Physical Review B 13: 316–374.
Wilson G and Segal G (1985) Loop groups and equations of KdV type. Publications Mathématiques de l'IHÉS 61: 5–65.
Zuber JB and Itzykson C (1977) Quantum field theory and the two dimensional Ising model. Physical Review D 15: 2875–2884.
Homeomorphisms and Diffeomorphisms of the Circle
A Zumpano and A Sarmiento, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
© 2006 Elsevier Ltd. All rights reserved.
Introduction
In this article we consider the following question: which homeomorphisms of the circle transport one given class of continuous functions into another? The allowed classes of functions are Banach spaces contained in $C(\mathbb{T})$, the space of continuous functions on the unit circle $\mathbb{T}$, and will be defined by the properties of the Fourier series of the functions. Next, we will develop the Poincaré–Denjoy theory, which describes some basic geometric properties of diffeomorphisms of the circle such as the existence and properties of the rotation number, the classification of possible orbits of diffeomorphisms, and the Denjoy counterexample. A homeomorphism of the circle is regarded here as a change of variables for periodic functions. So, it will be our major concern to describe the changes of variables that do not affect "too much" the behavior of the Fourier series of the functions in the given class. We say that a function $h : \mathbb{R} \to \mathbb{R}$ is a homeomorphism of the circle $\mathbb{T} = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$ if $h$ itself is a homeomorphism such that $h(t + 2\pi) = h(t) \pm 2\pi$ for all $t \in \mathbb{R}$. It is clear that such a function $h$ induces a unique homeomorphism $\tilde{h} : \mathbb{T} \to \mathbb{T}$ that makes the following diagram commutative:
\[ \begin{array}{ccc} \mathbb{T} & \stackrel{\tilde{h}}{\longrightarrow} & \mathbb{T} \\ \uparrow e^{it} & & \uparrow e^{it} \\ \mathbb{R} & \stackrel{h}{\longrightarrow} & \mathbb{R} \end{array} \qquad \text{i.e.,} \quad \tilde{h}(e^{it}) = e^{ih(t)} \]
In the same way, we identify functions $\tilde{\varphi} : \mathbb{T} \to \mathbb{C}$ with $2\pi$-periodic functions $\varphi : \mathbb{R} \to \mathbb{C}$. Let $U(\mathbb{T})$ be the space of all continuous functions on $\mathbb{T}$ that have uniformly convergent Fourier series, and let $A(\mathbb{T})$ be the space of all continuous functions on $\mathbb{T}$ with absolutely convergent Fourier series. In 1953, Beurling and Helson proved an important result about the homeomorphisms that preserve the space $A(\mathbb{T})$: they are rotations and symmetries, that is, if $f \circ h \in A(\mathbb{T})$ for all $f \in A(\mathbb{T})$, then the homeomorphism $h$ must have the form $h(t) = t + \alpha$ or $h(t) = -t + \alpha$. It is quite obvious that rotations and symmetries preserve $A(\mathbb{T})$, since the Fourier coefficients of $f \circ h$ and $f$ have the same moduli (up to the reindexing $n \to -n$ for symmetries), but to prove the converse is very hard.
So, homeomorphisms that preserve $A(\mathbb{T})$ form a very restricted class.
A wider class is obtained when we transport $A(\mathbb{T})$ into $U(\mathbb{T})$, that is, $f \circ h \in U(\mathbb{T})$ for all $f \in A(\mathbb{T})$. The major object of this article is to study such changes of variables. We say that a homeomorphism of the circle $h$ is of finite type if there is an integer $\nu$, satisfying $3 \le \nu < \infty$, such that $h$ is of class $C^\nu$ and
\[ |h''(t)| + |h^{(3)}(t)| + \cdots + |h^{(\nu)}(t)| \neq 0 \quad \text{for all } t \in \mathbb{R} \]
In the realm of Fourier analysis, the most important and general result about homeomorphisms of the circle is due to R Kaufman, who showed in 1974 that a finite-type homeomorphism $h$ transports $A(\mathbb{T})$ into $U(\mathbb{T})$. We shall analyze this seminal result in detail.
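To make the finite-type condition concrete, here is a minimal numerical sketch for one hypothetical example, $h(t) = t + \varepsilon \sin t$ with $0 < \varepsilon < 1$ (this map is an illustration chosen here, not one discussed in the article). Its second and third derivatives are $-\varepsilon \sin t$ and $-\varepsilon \cos t$, which never vanish simultaneously, so the condition should hold with the integer in the definition equal to 3.

```python
import math

def h(t, eps):
    """Hypothetical example map h(t) = t + eps*sin(t); increasing when |eps| < 1."""
    return t + eps * math.sin(t)

def h_derivatives(t, eps):
    """Second and third derivatives of h(t) = t + eps*sin(t)."""
    return -eps * math.sin(t), -eps * math.cos(t)

def finite_type_margin(eps, samples=10_000):
    """Smallest sampled value of |h''(t)| + |h'''(t)| over one period [-pi, pi)."""
    worst = float("inf")
    for i in range(samples):
        t = -math.pi + 2 * math.pi * i / samples
        d2, d3 = h_derivatives(t, eps)
        worst = min(worst, abs(d2) + abs(d3))
    return worst

eps = 0.5
margin = finite_type_margin(eps)
# Since |sin t| + |cos t| >= 1, the margin cannot drop below eps.
print("min of |h''| + |h'''| on the grid:", margin)
# The defining relation h(t + 2*pi) = h(t) + 2*pi holds for this example.
print("h(1 + 2*pi) - h(1) - 2*pi =", h(1 + 2 * math.pi, eps) - h(1, eps) - 2 * math.pi)
```

A grid check of this kind is only evidence, not a proof; here the inequality $|\sin t| + |\cos t| \ge 1$ settles the matter exactly.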
Homeomorphism of the Circle of Finite Type
In this section we prove the theorem of R Kaufman mentioned before, which means that it is sufficient for a homeomorphism of the circle $h$ to have a certain amount of curvature in order to transport $A(\mathbb{T})$ into $U(\mathbb{T})$. We present a simple proof of this fact, based on a result due to Stein and Wainger. If $f : \mathbb{T} \to \mathbb{C}$ is a continuous function and if
\[ \hat{f}_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-int}\, dt, \quad n \in \mathbb{Z} \]
denote the Fourier coefficients of $f$, then $f \in A(\mathbb{T})$ if and only if
\[ \sum_{n \in \mathbb{Z}} |\hat{f}_n| = \lim_{N \to \infty} \sum_{n=-N}^{N} |\hat{f}_n| < \infty \]
Of course, $A(\mathbb{T})$ is a Banach space with the norm
\[ \|f\|_{A(\mathbb{T})} = \sum_{n \in \mathbb{Z}} |\hat{f}_n| \]
The space $U(\mathbb{T})$ is defined as the space of all continuous functions $f : \mathbb{T} \to \mathbb{C}$ such that
\[ \sum_{n=-N}^{N} \hat{f}_n e^{int} \to f(t) \quad \text{when } N \to \infty, \text{ for all } t \in [-\pi, \pi] \]
uniformly on $\mathbb{T}$, that is, $U(\mathbb{T})$ is the space of continuous functions from $\mathbb{T}$ to $\mathbb{C}$ that are the uniform limit of their Fourier partial sums
\[ S_N(f; t) = \sum_{n=-N}^{N} \hat{f}_n e^{int} \]
Hence, under the natural norm given by
\[ \|f\|_{U(\mathbb{T})} = \sup\{ |S_N(f; t)| : N \in \mathbb{N} = \{0, 1, \ldots\} \text{ and } t \in [-\pi, \pi] \} \]
the space $U(\mathbb{T})$ is a Banach space. We shall prove:

Theorem 1 (Kaufman 1974). Let $h$ be a homeomorphism of the circle of class $C^\nu$, with $\nu \ge 3$. Suppose that
\[ |h''(t)| + |h^{(3)}(t)| + \cdots + |h^{(\nu)}(t)| \neq 0 \quad \text{for all } t \in \mathbb{R} \]
Then, $h$ transports $A(\mathbb{T})$ into $U(\mathbb{T})$, that is, $f \circ h \in U(\mathbb{T})$ whenever $f \in A(\mathbb{T})$.

It follows from the theorem that an analytic homeomorphism of the circle transports $A(\mathbb{T})$ into $U(\mathbb{T})$. To see this, suppose that $h$ is not of finite type. Then, for each $n \ge 3$, there exists $t_n \in [-\pi, \pi]$ such that $h^{(j)}(t_n) = 0$ for all $j \in \{2, \ldots, n\}$. Since $\{t_n\}$ has a convergent subsequence, there exists $t \in [-\pi, \pi]$ such that $h^{(j)}(t) = 0$ for all $j \ge 2$. Since $h$ is analytic, this implies that $h''$ vanishes identically and, therefore, $h(t) = \pm t + \alpha$. Since we know that this kind of homeomorphism preserves $A(\mathbb{T})$, we are done. One can ask why we demand $\nu \ge 3$. The answer is easy. Since $h(t + 2\pi) = h(t) \pm 2\pi$ for all $t \in \mathbb{R}$, it follows that $h'(t + 2\pi) = h'(t)$ for all $t \in \mathbb{R}$, that is, $h'$ is a periodic function of period $2\pi$. So, there will always exist a point $t \in (-\pi, \pi]$ such that $h''(t) = 0$. We can also infer from the theorem that a $C^\infty$ homeomorphism of the circle that has no flat point, that is, no point $t$ such that $h^{(j)}(t) = 0$ for all $j \ge 2$, transports $A(\mathbb{T})$ into $U(\mathbb{T})$. This is obvious, because the negation of being of finite type implies the existence of a flat point. It is not true, however, that every $C^\infty$ homeomorphism of the circle transports $A(\mathbb{T})$ into $U(\mathbb{T})$. The proof of the theorem is based on the two lemmas that follow. The first lemma was obtained by Stein and Wainger, who proved it in a more general setting in 1965, although that proof was only published five years later. The second lemma was proved by R Kaufman in 1974.

Lemma 2 (Stein and Wainger 1970). Let $p(t)$ be a real polynomial of degree $d$. Then
\[ \left| \int_{-r}^{r} e^{ip(t)}\, \frac{dt}{t} \right| \le 6(2^{d+1}) - 2d - 10 = 2d + 2 \sum_{k=0}^{d} [3(2^k) - 2] \]
for all $r > 0$.

Lemma 3 (Kaufman 1974). Let $f$ be a real function of class $C^k$ on the interval $[-r, r]$, with $k \ge 2$. Suppose that $1 \le |f^{(k)}(t)| \le b$ for all $t \in [-r, r]$. Then
\[ \left| \int_{-r}^{r} e^{if(t)}\, \frac{dt}{t} \right| \le C(k, b) \]
where $C(k, b)$ is a constant that depends only on $k$ and $b$.

We shall see that Lemma 3 can be proved from Lemma 2 in a quite simple way. The proof given by R Kaufman for Lemma 3 does not make use of Lemma 2 at all. Also, it is not difficult to see that Lemma 2 follows from Lemma 3, if we consider $d \ge 2$. So, they are indeed equivalent results. Before getting into the proof of these two lemmas, let us state a result which is the primary tool in dealing with oscillatory integrals such as those in the lemmas.

Lemma 4 (Van der Corput lemma). Let $f$ be real valued and smooth in $[a, b]$, with $0 < a < b$. Suppose that $|f^{(k)}(t)| \ge \lambda > 0$ for all $t \in [a, b]$. Then
\[ \left| \int_{a}^{b} e^{if(t)}\, \frac{dt}{t} \right| \le \frac{[3(2^k) - 2]\, \lambda^{-1/k}}{a} \]
holds if (i) $k \ge 2$, or (ii) $k = 1$ and $f'(t)$ is monotonic.

Now, let us prove the two lemmas and Theorem 1.

Proof of Lemma 2. The proof is by induction on the degree of the polynomial. Suppose that $p(t)$ is a polynomial of degree 0, that is, $p(t)$ is a constant function. In this case the result is trivial, since the integral is equal to zero. By induction, assume that the statement is true for polynomials of degree less than or equal to $d$. Let
\[ p(t) = a_{d+1} t^{d+1} + a_d t^d + \cdots + a_1 t + a_0, \quad a_{d+1} \neq 0 \]
Make the change of variables $t = |a_{d+1}|^{-1/(d+1)} s$. Then we have
\[ \int_{-r}^{r} e^{ip(t)}\, \frac{dt}{t} = \int_{-\rho}^{\rho} e^{i(q(t) \pm t^{d+1})}\, \frac{dt}{t} \]
where $\rho = |a_{d+1}|^{1/(d+1)} r$ and $q(t)$ is a polynomial of degree at most equal to $d$. Suppose $\rho > 1$. Then
\[ \left| \int_{-\rho}^{\rho} e^{i(q(t) \pm t^{d+1})}\, \frac{dt}{t} \right| \le \left| \int_{1}^{\rho} e^{i(q(-t) \pm (-t)^{d+1})}\, \frac{dt}{t} \right| + \left| \int_{1}^{\rho} e^{i(q(t) \pm t^{d+1})}\, \frac{dt}{t} \right| + \left| \int_{-1}^{1} e^{i(q(t) \pm t^{d+1})}\, \frac{dt}{t} \right| = \mathrm{I} + \mathrm{II} + \mathrm{III} \]
By the Van der Corput lemma, $\mathrm{I} \le [3(2^{d+1}) - 2]$ and $\mathrm{II} \le [3(2^{d+1}) - 2]$, so $\mathrm{I} + \mathrm{II} \le 6(2^{d+1}) - 4$. Now
\[ \mathrm{III} \le \left| \int_{-1}^{1} \left[ e^{i(q(t) \pm t^{d+1})} - e^{iq(t)} \right] \frac{dt}{t} \right| + \left| \int_{-1}^{1} e^{iq(t)}\, \frac{dt}{t} \right| \le \int_{-1}^{1} |t|^d\, dt + 6(2^{d+1}) - 2d - 10 \le 2 + 6(2^{d+1}) - 2d - 10 \]
since the degree of $q(t)$ is at most equal to $d$. So
\[ \mathrm{I} + \mathrm{II} + \mathrm{III} \le 6(2^{d+1}) - 4 + 2 + 6(2^{d+1}) - 2d - 10 = 6(2^{d+2}) - 2(d + 1) - 10 \]
On the other hand, if $\rho \le 1$, then
\[ \left| \int_{-\rho}^{\rho} e^{i(q(t) \pm t^{d+1})}\, \frac{dt}{t} \right| \le \left| \int_{-\rho}^{\rho} \left[ e^{i(q(t) \pm t^{d+1})} - e^{iq(t)} \right] \frac{dt}{t} \right| + \left| \int_{-\rho}^{\rho} e^{iq(t)}\, \frac{dt}{t} \right| \le 2 + 6(2^{d+1}) - 2d - 10 \]
\[ \le 2 + 6(2^{d+1}) - 2d - 10 + 6(2^{d+1}) - 4 = 6(2^{d+2}) - 2(d + 1) - 10 \]
and the proof is completed. □

Proof of Lemma 3. Assume first that $r > 1$. Then
\[ \left| \int_{-r}^{r} e^{if(t)}\, \frac{dt}{t} \right| \le \left| \int_{-r}^{-1} e^{if(t)}\, \frac{dt}{t} \right| + \left| \int_{1}^{r} e^{if(t)}\, \frac{dt}{t} \right| + \left| \int_{-1}^{1} e^{if(t)}\, \frac{dt}{t} \right| = \mathrm{I} + \mathrm{II} + \mathrm{III} \]
Since $|f^{(k)}(t)| \ge 1$ and $k \ge 2$, then by the Van der Corput lemma, $\mathrm{I} \le [3(2^k) - 2]$ and $\mathrm{II} \le [3(2^k) - 2]$. (Note that we have to assume $k \ge 2$ in order to apply the Van der Corput lemma, since we know nothing about the monotonicity of $f'(t)$.) To evaluate III, we proceed as follows:
\[ f(t) = f(0) + f'(0)\,t + \cdots + \frac{f^{(k-1)}(0)}{(k-1)!}\, t^{k-1} + \frac{f^{(k)}(\theta_t t)}{k!}\, t^k = p(t) + f^{(k)}(\theta_t t)\, \frac{t^k}{k!} \]
where the number $\theta_t$ depends on $t$ and $0 < \theta_t < 1$. So
\[ \left| \int_{-1}^{1} e^{if(t)}\, \frac{dt}{t} \right| \le \left| \int_{-1}^{1} \left[ e^{i\left(p(t) + f^{(k)}(\theta_t t)\frac{t^k}{k!}\right)} - e^{ip(t)} \right] \frac{dt}{t} \right| + \left| \int_{-1}^{1} e^{ip(t)}\, \frac{dt}{t} \right| \le \frac{2b}{k!} + 6(2^k) - 2(k - 1) - 10 \]
by Lemma 2, since $p(t)$ is a polynomial of degree at most equal to $k - 1$. On the other hand, if $r \le 1$, it also follows from Lemma 2 that
\[ \left| \int_{-r}^{r} e^{if(t)}\, \frac{dt}{t} \right| \le \left| \int_{-r}^{r} \left[ e^{i\left(p(t) + f^{(k)}(\theta_t t)\frac{t^k}{k!}\right)} - e^{ip(t)} \right] \frac{dt}{t} \right| + \left| \int_{-r}^{r} e^{ip(t)}\, \frac{dt}{t} \right| \le \frac{2b}{k!} + 6(2^k) - 2(k - 1) - 10 \]
Hence
\[ \left| \int_{-r}^{r} e^{if(t)}\, \frac{dt}{t} \right| \le C(k, b) \]
and we are done. □
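The bookkeeping of constants in the proof of Lemma 2 is easy to get wrong, so a short integer check may help. The sketch below assumes the bound in Lemma 2 reads $6(2^{d+1}) - 2d - 10$, that its second form is $2d + 2\sum_{k=0}^{d}[3(2^k) - 2]$, and that the Van der Corput constant is $3(2^k) - 2$; the function names are illustrative only.

```python
def stein_wainger_bound(d):
    """Assumed closed form of the Lemma 2 bound for a polynomial of degree d."""
    return 6 * 2 ** (d + 1) - 2 * d - 10

def bound_as_sum(d):
    """Assumed second form of the same bound: 2d + 2 * sum_{k=0}^{d} (3*2^k - 2)."""
    return 2 * d + 2 * sum(3 * 2 ** k - 2 for k in range(d + 1))

def van_der_corput_constant(k):
    """Constant 3*2^k - 2 used for the pieces I and II of the integral."""
    return 3 * 2 ** k - 2

def induction_step(d):
    """I + II + III for degree d+1: two Van der Corput terms, the integral
    of |t|^d over [-1, 1] bounded by 2, and the degree-d bound."""
    return 2 * van_der_corput_constant(d + 1) + 2 + stein_wainger_bound(d)

# Collect (closed form, sum form, induction step) for a range of degrees.
checks = [
    (stein_wainger_bound(d), bound_as_sum(d), induction_step(d))
    for d in range(20)
]
for d, (closed, summed, step) in enumerate(checks):
    print(d, closed, summed, step, stein_wainger_bound(d + 1))
```

In each printed row the second and third columns agree, and the fourth column (the induction step) agrees with the fifth (the closed form at degree $d + 1$), which is the arithmetic the induction relies on.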
Proof of Theorem 1. Let $h$ be a homeomorphism of the circle satisfying the hypotheses of the theorem. We claim: there exists $\delta > 0$ such that, for all $x \in [-\pi, \pi]$, there is $k$ depending on $x$, with $2 \le k \le \nu$, such that $|h^{(k)}(t + x)| \ge \delta$ for all $t$ satisfying $|t| \le \delta$. The proof of the claim is simple: suppose that there is no such $\delta$. Then, for each $n \in \mathbb{N}$ and each $k$ with $2 \le k \le \nu$, there exist $x_n \in [-\pi, \pi]$ and $t_n^k$ such that $|t_n^k| \le 1/n$ and $|h^{(k)}(t_n^k + x_n)| < 1/n$. Taking a subsequence if necessary, we have $x_n \to x \in [-\pi, \pi]$. Also, $t_n^k \to 0$ when $n \to \infty$ for all such $k$. So, $h^{(k)}(t_n^k + x_n) \to h^{(k)}(x)$ when $n \to \infty$. Since $|h^{(k)}(t_n^k + x_n)| < 1/n$, we conclude that $h^{(k)}(x) = 0$ for all $k$ with $2 \le k \le \nu$, thus reaching a contradiction.

Now, let $f \in A(\mathbb{T})$. So, $\sum_{-\infty}^{\infty} |\hat{f}_n| < \infty$, thus implying that
\[ f(t) = \sum_{n=-\infty}^{\infty} \hat{f}_n e^{int} = \lim_{N \to \infty} \sum_{n=-N}^{N} \hat{f}_n e^{int} \]
Hence
\[ f(h(t)) = \sum_{n=-\infty}^{\infty} \hat{f}_n e^{inh(t)} = \lim_{N \to \infty} \sum_{n=-N}^{N} \hat{f}_n e^{inh(t)} \]
Put $g_N(t) = \sum_{n=-N}^{N} \hat{f}_n e^{inh(t)}$. Since $g_N$ is smooth, we have $g_N \in U(\mathbb{T})$ for all $N \in \mathbb{N}$. If $g(t)$ stands for $f(h(t))$, then $g_N \to g$ uniformly, since $f \in A(\mathbb{T})$. Thus, it suffices to prove that $g \in U(\mathbb{T})$. This happens if and only if $S_m(g, x) = \sum_{k=-m}^{m} \hat{g}_k e^{ikx}$ converges uniformly to $g$ as $m \to \infty$, that is, given $\varepsilon > 0$, there exists $m_0 \in \mathbb{N}$ such that $|S_m(g, x) - g(x)| < \varepsilon$ for all $m > m_0$ and $x \in [-\pi, \pi]$. We have
\[ |S_m(g, x) - g(x)| \le |g_N(x) - g(x)| + |S_m(g_N, x) - g_N(x)| + |S_m(g_N, x) - S_m(g, x)| \]
for all $m, N \in \mathbb{N}$. Since $g_N \to g$ uniformly and $g_N \in U(\mathbb{T})$, the last inequality shows that we need to demonstrate that, for each $\varepsilon > 0$, there exists $N_0 \in \mathbb{N}$ such that
\[ |S_m(g_N, x) - S_m(g, x)| < \varepsilon \quad \forall N > N_0,\ x \in [-\pi, \pi] \text{ and } m \in \mathbb{N} \]
thus proving that $S_m(g_N, x) \to S_m(g, x)$ uniformly in $x$ and $m$ when $N \to \infty$. But, if $K > N \in \mathbb{N}$, we have
\[ |S_m(g_N, x) - S_m(g_K, x)| = \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} (g_N(t + x) - g_K(t + x))\, D_m(t)\, dt \right| = \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} \Big( \sum_{N < |n| \le K} \hat{f}_n e^{inh(t+x)} \Big) D_m(t)\, dt \right| \]
\[ \le \frac{1}{2\pi} \sum_{N < |n| \le K} |\hat{f}_n| \left| \int_{-\pi}^{\pi} e^{inh(t+x)} D_m(t)\, dt \right| \]
where
\[ D_m(t) = \sum_{k=-m}^{m} e^{ikt} = \frac{\sin(m + (1/2))t}{\sin(t/2)} \]
is the Dirichlet kernel. Hence, we are done if we show that
\[ \left| \int_{-\pi}^{\pi} e^{inh(t+x)} D_m(t)\, dt \right| \le C \tag*{[1]} \]
where $C$ is a constant that does not depend on $m$, $n$, and $x$. To prove that the oscillatory integral above is bounded, we make use of Lemma 3. We have that
\[ D_m(t) = \frac{2 \sin(mt)}{t} + O(1) \]
on any compact subset of $(-2\pi, 2\pi)$, that is,
\[ \left| \frac{\sin(m + (1/2))t}{\sin(t/2)} - \frac{2 \sin(mt)}{t} \right| \le \left| \frac{t \cos(t/2) - 2 \sin(t/2)}{t \sin(t/2)} \right| + 1 \le C \]
where the constant $C$ does not depend on $m$, on any compact subset of $(-2\pi, 2\pi)$. In order to prove [1], consider $x \in [-\pi, \pi]$. We have already proved that there exists $k$ (depending on $x$), with $2 \le k \le \nu$, such that $|h^{(k)}(t + x)| \ge \delta > 0$ for all $t$ such that $|t| \le \delta$. Therefore,
\[ \left| \int_{-\pi}^{\pi} e^{inh(t+x)}\, \frac{\sin(mt)}{t}\, dt \right| \le \left| \int_{-\delta}^{\delta} e^{inh(t+x)}\, \frac{\sin(mt)}{t}\, dt \right| + 2 \log\frac{\pi}{\delta} \]
We can assume that $n$ is a positive integer: if $n$ is negative, we take the complex conjugate; and if $n = 0$, the integral is trivially bounded, as we see by integration by parts or by the Van der Corput lemma. (Indeed, we do not need to worry about $n = 0$, since it is necessary to bound the integral only for large $n$.) So, assuming that $n$ is a positive integer, we change variables: define $t = rs$, where $r = n^{-1/k} \delta^{-1/k}$. Since $\sin(mt) = (e^{imt} - e^{-imt})/(2i)$, we have
\[ \left| \int_{-\delta}^{\delta} e^{inh(t+x)}\, \frac{\sin(mt)}{t}\, dt \right| \le \left| \int_{-\delta}^{\delta} e^{i[nh(t+x)+mt]}\, \frac{dt}{t} \right| + \left| \int_{-\delta}^{\delta} e^{i[nh(t+x)-mt]}\, \frac{dt}{t} \right| = \left| \int_{-\delta/r}^{\delta/r} e^{i[nh(rs+x)+mrs]}\, \frac{ds}{s} \right| + \left| \int_{-\delta/r}^{\delta/r} e^{i[nh(rs+x)-mrs]}\, \frac{ds}{s} \right| \]
Put $\varphi(t) = nh(rt + x) + mrt$ and $\psi(t) = nh(rt + x) - mrt$. We have $\varphi^{(k)}(t) = n r^k h^{(k)}(rt + x)$ and $\psi^{(k)}(t) = n r^k h^{(k)}(rt + x)$. But, since $n r^k = 1/\delta$, we conclude that
\[ |\varphi^{(k)}(t)| = |\psi^{(k)}(t)| = \frac{1}{\delta}\, |h^{(k)}(rt + x)| \ge 1, \quad \forall t \in \left[ -\frac{\delta}{r}, \frac{\delta}{r} \right] \]
Also,
\[ |\varphi^{(k)}(t)| = |\psi^{(k)}(t)| \le b_k = \frac{1}{\delta} \max\{ |h^{(k)}(s)| : -2\pi \le s \le 2\pi \} \]
for all $t \in [-\delta/r, \delta/r]$. Therefore, by Lemma 3, we get
\[ \left| \int_{-\delta/r}^{\delta/r} e^{i\varphi(t)}\, \frac{dt}{t} \right| \le C(k, b_k) \le \max\{ C(j, b_j) : 2 \le j \le \nu \} \]
and
\[ \left| \int_{-\delta/r}^{\delta/r} e^{i\psi(t)}\, \frac{dt}{t} \right| \le C(k, b_k) \le \max\{ C(j, b_j) : 2 \le j \le \nu \} \]
This concludes the proof. □

Diffeomorphisms of the Circle
In this section we study the circle diffeomorphisms. This theory goes back to Poincaré (1885), who studied circle diffeomorphisms to decide when differential equations on the torus have periodic orbits of a specified type. For this he introduced the rotation number as an important dynamical invariant, which later turned out to be very fruitful in the theory of dynamical systems, and proved that a diffeomorphism with an irrational rotation number is combinatorially equivalent to a rotation with the same rotation number. Denjoy (1932) constructed examples of diffeomorphisms of class $C^1$ with irrational rotation number having wandering intervals, in opposition to early ideas of Poincaré. It was necessary to assume that a diffeomorphism without periodic points is smoother, in fact $C^2$, to prove that it is topologically conjugate to the rotation.

The Poincaré Rotation Number
Let $\tilde{h} : \mathbb{T} \to \mathbb{T}$ be an orientation-preserving homeomorphism. Given such a map, there is a (nonunique) map $h : \mathbb{R} \to \mathbb{R}$, which is called a lift of $\tilde{h}$, such that $\tilde{h} \circ p = p \circ h$, where $p : \mathbb{R} \to \mathbb{T}$ is the covering map $p(t) = e^{2\pi i t}$. A lift, $h$, of $\tilde{h}$ satisfies:
1. $h$ is monotonically increasing, that is, $h(t_1) < h(t_2)$ if $t_1 < t_2$.
2. $h(t + 1) = h(t) + 1$ for all $t \in \mathbb{R}$, so $(h - \mathrm{id})$ has period 1.
3. If $h_1$, $h_2$ are two lifts of $\tilde{h}$, then there is an integer $k$ such that $h_2(t) = h_1(t) + k$ for all $t \in \mathbb{R}$.
These conditions immediately yield the following: the transformation $h^k := h \circ \cdots \circ h$ is monotonically increasing and $h^k(t + r) = h^k(t) + r$, $t \in \mathbb{R}$, $k \in \mathbb{N}$, $r \in \mathbb{Z}$. The rotation number gives an asymptotic indication (i.e., in the limit) of the average amount of rotation of a point along an orbit. We start by defining, for a lift $h$ of $\tilde{h}$, the number
\[ \rho_0(h, t) = \lim_{k \to \infty} \frac{h^k(t) - t}{k} \]
This limit exists and does not depend on the choice of the point $t \in \mathbb{R}$; so, we denote it by $\rho_0(h)$. If $h_1$, $h_2$ are two lifts of $\tilde{h}$, then $\rho_0(h_1, t) - \rho_0(h_2, t)$ is an integer, so
\[ \rho(\tilde{h}) := \rho_0(h, t) \bmod 1 \]
is well defined. The number $\rho(\tilde{h}) \in [0, 1)$ is called the rotation number of $\tilde{h}$, and depends continuously on $\tilde{h}$. For a detailed proof, see Katok and Hasselblatt (1995) or Robinson (1999).

Theorem 5. The rotation number $\rho(\tilde{h})$ is rational if and only if $\tilde{h}$ has a periodic point, that is, there exist $z_0 \in S^1$ and $k \in \mathbb{N}$ such that $\tilde{h}^k(z_0) = z_0$.

Proof. Take a lift $h$ of $\tilde{h}$ such that $h(0) \in [0, 1)$. Suppose that $\rho(\tilde{h}) = q/m$ and that $\tilde{h}$ has no periodic point. Then $\tilde{h}^m$ has no fixed point, so $h^m(t) - t \in \mathbb{R} \setminus \mathbb{Z}$ for all $t \in \mathbb{R}$, since $h^m(t) - t \in \mathbb{Z}$ implies that $p(t)$ is a fixed point of $\tilde{h}^m$. In particular, $h^m(t) - t \neq q$ for all $t \in \mathbb{R}$. Since $h^m - \mathrm{id}$ is continuous and periodic, either $h^m(t) - t < q$ for all $t$ or $h^m(t) - t > q$ for all $t$; we treat the first case, the second being symmetric. Then there exists a real number $a > 0$ such that $h^m(t) - t < q - a$ for all $t \in \mathbb{R}$. Then
\[ h^{km}(t) - h^{(k-1)m}(t) = h^m[h^{(k-1)m}(t)] - [h^{(k-1)m}(t)] < q - a, \quad \forall k \in \mathbb{N} \]
and therefore
\[ h^{km}(t) - t = \{ h^m[h^{(k-1)m}(t)] - [h^{(k-1)m}(t)] \} + \{ h^m[h^{(k-2)m}(t)] - [h^{(k-2)m}(t)] \} + \cdots + \{ h^m(t) - t \} < k(q - a) \]
So
\[ \frac{q}{m} = \rho(\tilde{h}) = \lim_{k \to \infty} \frac{h^{mk}(t) - t}{mk} \le \frac{1}{m} \lim_{k \to \infty} \frac{k(q - a)}{k} = \frac{q - a}{m} \]
which is a contradiction, proving the claim by contraposition.
To see the converse, assume that there exists a periodic point, that is, there are $t_0 \in \mathbb{R}$ and $m, q \in \mathbb{Z}$ such that $h^m(t_0) = t_0 + q$. Then
\[ h^{km}(t_0) = t_0 + kq \ \Rightarrow\ \rho(\tilde{h}) = \lim_{k \to \infty} \frac{h^{mk}(t_0) - t_0}{mk} = \frac{q}{m} \]
□

Corollary 6. A homeomorphism $\tilde{h} : \mathbb{T} \to \mathbb{T}$ does not have periodic points if and only if the rotation number $\rho(\tilde{h})$ is irrational.

Let $R_\alpha$ be defined on $\mathbb{T}$ by $R_\alpha(e^{2\pi i t}) = e^{2\pi i (t + \alpha)}$. This map is called a rigid rotation of angle $\alpha$ and it is easy to see that $h_\alpha(t) = t + \alpha$ is a lift of $R_\alpha$ and that $\rho(R_\alpha) = \rho_0(h_\alpha) \bmod 1 = \alpha \bmod 1$. In this example we can see the connection between the rationality of the rotation number and the existence of a periodic orbit. Assume $\alpha = m/q$ is rational. Then $h_\alpha^q(t) = t + q\alpha = t + m$. Therefore, every point is periodic with period $q$. Now, assume that $\alpha$ is irrational. Since $h_\alpha^n(t) = t + n\alpha$ for all $n$, $R_\alpha$ has no periodic points. In this case, one can show that every point in $\mathbb{T}$ has a dense orbit. Now, again let $\tilde{h} : \mathbb{T} \to \mathbb{T}$ be any orientation-preserving homeomorphism.

Lemma 7. If the rotation number of $\tilde{h}$ is rational, then all periodic orbits have the same period.

Proof. If $\rho(\tilde{h}) = m/q$ with $m, q \in \mathbb{Z}$ relatively prime, then we need to show that for any periodic point $z_0 = p(t)$ (where $p(t) = e^{2\pi i t}$ is the covering space projection of $\mathbb{T}$) there is a lift $h$ of $\tilde{h}$ such that $h(0) \in [0, 1)$ for which $h^q(t) = t + m$. If $z_0$ is a periodic point, then $h^r(t) = t + s$ for some $r, s \in \mathbb{Z}$ and
\[ \frac{m}{q} = \rho(\tilde{h}) = \lim_{n \to \infty} \frac{h^{rn}(t) - t}{nr} = \lim_{n \to \infty} \frac{ns}{nr} = \frac{s}{r} \]
so that $s = km$ and $r = kq$. Then, by monotonicity of $h$, we have that $h^q(t) = t + m$ as claimed. □

The Poincaré Denjoy Theory
A homeomorphism of the circle with rational rotation number has all its orbits asymptotic to periodic ones and this, together with Theorem 5, yields a complete classification of the possible asymptotic behavior when the rotation number is rational. This motivates the study of the asymptotic behavior of orbits of homeomorphisms with irrational rotation number. The $\omega$-limit set of a point $z_0 \in \mathbb{T}$ with respect to $\tilde{h}$ is the set $\omega(z_0) = \{ z \in \mathbb{T} : \tilde{h}^{n_k}(z_0) \to z \text{ as } n_k \to \infty, \text{ for some sequence } \{n_k\}_{k=1}^{\infty} \}$. The $\alpha$-limit set $\alpha(z_0)$ of an arbitrary point $z_0 \in \mathbb{T}$ is defined similarly (with $n_k \to -\infty$ instead of $n_k \to +\infty$).
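The rotation number of a lift, defined as the limit of $(h^k(t) - t)/k$, can be estimated by truncating the limit at a finite $k$. The following is a minimal sketch under stated assumptions: the perturbed lift $t \mapsto t + \Omega + \frac{\varepsilon}{2\pi}\sin(2\pi t)$ is a hypothetical test family chosen here (it is increasing and commutes with $t \mapsto t + 1$ when $|\varepsilon| < 1$), and the truncation error is only of order $1/k$.

```python
import math

def iterate_lift(h, t, k):
    """Apply the lift h to the point t exactly k times."""
    for _ in range(k):
        t = h(t)
    return t

def rotation_number(h, t=0.0, k=10_000):
    """Finite-k estimate of lim (h^k(t) - t) / k for a lift h."""
    return (iterate_lift(h, t, k) - t) / k

def rigid_rotation_lift(alpha):
    """Lift t -> t + alpha of the rigid rotation of angle alpha."""
    return lambda t: t + alpha

def perturbed_lift(omega, eps):
    """Hypothetical test lift; a valid lift of a circle homeomorphism if |eps| < 1."""
    return lambda t: t + omega + eps / (2 * math.pi) * math.sin(2 * math.pi * t)

# Rigid rotation: every orbit advances by alpha per step.
rho_rigid = rotation_number(rigid_rotation_lift(0.3))
# With omega = 0 the points t = 0 and t = 1/2 are fixed, so the rotation number is 0.
rho_fixed = rotation_number(perturbed_lift(0.0, 0.5), t=0.2)
print(rho_rigid, rho_fixed)
```

The second case illustrates Theorem 5 in its simplest form: a fixed point forces the orbit of the lift to stay bounded, so the averaged displacement tends to zero.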
Any orbit of a rotation $R_\alpha$ with irrational $\alpha$ is dense in $\mathbb{T}$, that is, $\omega(z_0) = \alpha(z_0) = \mathbb{T}$ for all $z_0 \in \mathbb{T}$.

Theorem 8 (Poincaré 1885). Let $\tilde{h} : \mathbb{T} \to \mathbb{T}$ be an orientation-preserving homeomorphism with irrational rotation number. Then the $\omega$-limit set $\omega(x)$ is independent of the point $x$ and is either $\mathbb{T}$ or perfect and nowhere dense.

The preceding theorem says that maps with irrational rotation number have either all orbits dense or all orbits asymptotic to a Cantor set. We say that two maps $f, g : \mathbb{T} \to \mathbb{T}$ are topologically conjugate if there exists a homeomorphism $h : \mathbb{T} \to \mathbb{T}$ such that $h \circ f = g \circ h$. This implies that $h \circ f^n = g^n \circ h$ for every integer $n$. Hence, the conjugacy $h$ maps orbits of $f$ into orbits of $g$. If a monotone map $l : \mathbb{T} \to \mathbb{T}$ satisfies $l \circ f = g \circ l$ but is not necessarily a homeomorphism, we only have that the inverse image of each point is either a point or a closed interval. We say that $l$ is a semiconjugacy between $f$ and $g$; in this case $l$ maps orbits or packs of orbits of $f$ into orbits of $g$.

Theorem 9 (Denjoy 1932). Let $\tilde{f} : \mathbb{T} \to \mathbb{T}$ be an orientation-preserving diffeomorphism of class $C^2$, with irrational rotation number ($\rho(\tilde{f}) = \alpha$). Then $\tilde{f}$ is topologically conjugate to the rigid rotation $R_\alpha$.

Note that in spite of the hypothesis of $\tilde{f}$ being $C^2$, we obtain only a continuous conjugacy. It took almost 50 years until Michael Herman (1979) was able to solve the more difficult problem of obtaining a smooth conjugacy for rotation numbers satisfying extra arithmetic conditions. If $\tilde{f}$ is a circle homeomorphism which does not have periodic points, then there exists a semiconjugacy $h$ between $\tilde{f}$ and a rotation $R_\alpha$. If $h$ is not a conjugacy, then there exists a point $x$ of the circle whose inverse image by $h$ is an interval $J$. Since $h \circ \tilde{f} = R_\alpha \circ h$, we have that $h(\tilde{f}^n(J)) = R_\alpha^n(x)$. It follows that the intervals of the family $\{ J, \tilde{f}(J), \tilde{f}^2(J), \ldots \}$ are pairwise disjoint, and the $\omega$-limit set of $J$ does not reduce to a periodic orbit. We say that $J$ is a wandering interval of the map $\tilde{f}$.
By Theorem 9, $C^2$-differentiability implies that $\tilde{f}$ does not have a wandering interval. For details of the proof of Theorem 9, see Melo and Strien (1993).

The Denjoy Example
Denjoy also proved the following result, which shows that the hypothesis of class $C^2$ is essential.

Theorem 10 (Denjoy 1932). For any irrational number $\alpha \in [0, 1)$, there exists a $C^1$ circle diffeomorphism $f$ which has a wandering interval, and rotation number equal to $\alpha$.
Homeomorphisms and Diffeomorphisms of the Circle
Proof The construction of a diffeomorphism with a wandering interval will be done in the following manner. Given an irrational rotation $R_\alpha(e^{2\pi i t}) = e^{2\pi i (t+\alpha)}$, cut the circle $\mathbb{T}$ at all the points of an orbit $\{z_n = R_\alpha^n(e^{2\pi i t_0});\ n \in \mathbb{Z}\}$ of $R_\alpha$. In each cut insert a segment $J_n$ of length $\ell_n$, where $\sum_{n=-\infty}^{\infty} \ell_n = 1$. We obtain in this manner a new circle longer than the first. The open intervals correspond to the gaps of the Cantor set. To construct $f$ formally, let $\ell_n$, $n \in \mathbb{Z}$, be a sequence of positive real numbers satisfying

(i) $\lim_{n \to \pm\infty} \ell_{n+1}/\ell_n = 1$
(ii) $\sum_{n=-\infty}^{\infty} \ell_n = 1$
(iii) $\ell_n > \ell_{n+1}$ for $n \ge 0$
(iv) $\ell_n < \ell_{n+1}$ for $n < 0$, and
(v) $3\ell_{n+1} - \ell_n > 0$ for $n \ge 0$

For example,
$$\ell_n = T\,(|n| + 2)^{-1}(|n| + 3)^{-1}$$
where
$$T^{-1} = \sum_{n=-\infty}^{\infty} (|n| + 2)^{-1}(|n| + 3)^{-1}$$

Let $J_n$ be a closed interval of length $\ell_n$. We place these intervals on the circle in the same order as the order of the orbit $R_\alpha^n(0)$. So to place an interval $J_n$, consider the sum of the lengths of the intervals $J_i$ where $R_\alpha^i(0)$ is between $R_\alpha^n(0)$ and $0$. This determines the placement of $J_n$. The next step is to define $f$ on the union of the $J_n$. It is necessary and sufficient that $f'(t) = 1$ at the endpoints in order for the map to have a continuous derivative when it is extended to the closure. Assume $J_n = [a_n, b_n]$, so $\ell_n = b_n - a_n$. The integral
$$\int_{a_n}^{b_n} (b_n - t)(t - a_n)\,dt = \frac{\ell_n^3}{6}$$
so
$$\frac{6(\ell_{n+1} - \ell_n)}{\ell_n^3} \int_{a_n}^{b_n} (b_n - t)(t - a_n)\,dt = \ell_{n+1} - \ell_n$$
Therefore, if we define $f$ for $x \in J_n$ by
$$f(x) = a_{n+1} + \int_{a_n}^{x} \left[ 1 + \frac{6(\ell_{n+1} - \ell_n)}{\ell_n^3} (b_n - t)(t - a_n) \right] dt$$
then $f(b_n) = a_{n+1} + \ell_n + \ell_{n+1} - \ell_n = b_{n+1}$. Also, $f$ is differentiable on $J_n$ with
$$f'(x) = 1 + \frac{6(\ell_{n+1} - \ell_n)}{\ell_n^3} (b_n - x)(x - a_n)$$
Thus, $f'(a_n) = 1 = f'(b_n)$. Notice that for $n < 0$, $\ell_{n+1} - \ell_n > 0$, so that
$$1 \le f'(x) \le 1 + \frac{6(\ell_{n+1} - \ell_n)}{\ell_n^3} \left( \frac{\ell_n}{2} \right)^2 = \frac{3\ell_{n+1} - \ell_n}{2\ell_n}$$
and $(3\ell_{n+1} - \ell_n)/(2\ell_n)$ goes to 1 as $n \to -\infty$. Similarly, for $n \ge 0$ and $x \in J_n$,
$$1 \ge f'(x) \ge \frac{3\ell_{n+1} - \ell_n}{2\ell_n} > 0$$
so $f'(x)$ goes to 1 as $n \to +\infty$ uniformly for $x \in J_n$. From these facts, it follows that $f$ is uniformly $C^1$ on the union of the interiors of the $J_n$ and has a $C^1$ extension to all of $\mathbb{T}$. Let $\Lambda = \mathbb{T} \setminus \bigcup_{n \in \mathbb{Z}} \mathrm{int}(J_n)$. This is a Cantor set. The orbit of a point $x \in \Lambda$ is dense in $\Lambda$ since it is like the orbit of $0$ for $R_\alpha$. Thus, $\omega(x) = \Lambda$. If $x \in \mathrm{int}(J_n)$, then there is a smaller interval $I$ whose closure is contained in $\mathrm{int}(J_n)$. Since the interval $J_n$ never returns to $J_n$ but wanders among the other $J_k$, $J_n$ is a wandering interval. $\square$
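The example sequence of lengths in the proof can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original proof; the function names `ell`, `partial_sum`, and `derivative_extreme` are illustrative choices. It uses the fact that both halves of the sum defining $T^{-1}$ telescope, and it evaluates the extreme value $(3\ell_{n+1} - \ell_n)/(2\ell_n)$ that $f'$ takes at the midpoint of $J_n$.

```python
from fractions import Fraction

# Normalising constant: 1/T = sum over all integers n of 1/((|n|+2)(|n|+3)).
# Both halves telescope: n >= 0 contributes 1/2 and n <= -1 contributes 1/3.
T = 1 / (Fraction(1, 2) + Fraction(1, 3))  # equals 6/5


def ell(n: int) -> Fraction:
    """Length of the inserted interval J_n for the example sequence."""
    m = abs(n)
    return T / ((m + 2) * (m + 3))


def partial_sum(N: int) -> Fraction:
    """Sum of ell(n) over -N <= n <= N; it tends to 1 as N grows."""
    return sum((ell(n) for n in range(-N, N + 1)), Fraction(0))


def derivative_extreme(n: int) -> Fraction:
    """Value of f' at the midpoint of J_n: (3*ell(n+1) - ell(n)) / (2*ell(n))."""
    return (3 * ell(n + 1) - ell(n)) / (2 * ell(n))


if __name__ == "__main__":
    # Ratios ell(n+1)/ell(n) and midpoint derivatives both approach 1 as |n| grows.
    for n in (-1000, -10, -1, 0, 10, 1000):
        print(n, float(ell(n + 1) / ell(n)), float(derivative_extreme(n)))
    print("partial sum up to |n| = 1000:", float(partial_sum(1000)))
```

Since the midpoint value is the only interior extremum of $f'$ on $J_n$, its convergence to 1 in both directions is what gives the uniform convergence of $f'$ used in the proof.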
Further Results
In this section we shall state some additional results about homeomorphisms of the circle in the area of Fourier analysis. The first result is a theorem of Pál (1914) and Bohr (1935): let $f : \mathbb{T} \to \mathbb{R}$ be a real continuous function; then, there exists a homeomorphism of the circle $h$ such that $f \circ h \in U(\mathbb{T})$. The best proof of this theorem is due to Salem (1945). In 1978, Kahane and Katznelson showed that the result is still valid for $f : \mathbb{T} \to \mathbb{C}$ continuous. A similar question was posed by Lusin: given a continuous function $f : \mathbb{T} \to \mathbb{R}$, is there a homeomorphism of the circle $h$ such that $f \circ h \in A(\mathbb{T})$? The problem remained open until 1981, when Olevskii, Kahane, and Katznelson answered the question negatively: there exists a real (or complex) continuous function $f$ on the circle such that, for every homeomorphism of the circle $h$, $f \circ h \notin A(\mathbb{T})$. It was proved by the author that there are $C^\infty$ homeomorphisms of the circle, not necessarily of finite type, that transport $A(\mathbb{T})$ into $U(\mathbb{T})$. It is a very technical work, published in 1998, and it gives a necessary and sufficient condition for a homeomorphism of the circle with a flat point to transport $A(\mathbb{T})$ into $U(\mathbb{T})$. Finally, the Denjoy theorem (Theorem 9) is rather close to being optimal. The example constructed here can be improved by obtaining a circle diffeomorphism whose first derivative has Hölder exponent arbitrarily close to 1 (see Katok and Hasselblatt (1995)). Recent
work has dealt with the existence of a differentiable conjugacy between a diffeomorphism $f$ with irrational rotation number $\alpha$ and $R_\alpha$. Arnold, Moser, and Herman have obtained results (see Melo and Strien (1993) for a discussion of these results and references).
Acknowledgments
The author was supported in part by CNPq-Brazil and was partially supported by FAPESP Grant # TEMÁTICO 03/03107-9.

See also: Chaos and Attractors; Ergodic Theory; Generic Properties of Dynamical Systems; Random Dynamical Systems; Wavelets: Mathematical Theory.
Further Reading
Denjoy A (1932) Sur les courbes définies par les équations différentielles à la surface du tore. Journal de Mathématiques Pures et Appliquées 11(9): 333–375.
Herman M (1979) Sur la conjugaison différentiable des difféomorphismes du cercle à des rotations. Publ. Math. IHES 49: 5–233.
Kahane JP (1970) Séries de Fourier Absolument Convergentes, p. 84. Berlin: Springer.
Kahane JP (1983) Quatre Leçons sur les Homéomorphismes du Cercle et les Séries de Fourier. In: Topics in Modern Harmonic Analysis, Proceedings of a Seminar Held in Torino and Milano, May–June 1982, vol. II, pp. 955–990. Roma.
Katok A and Hasselblatt B (1995) Introduction to the Modern Theory of Dynamical Systems. Cambridge: Cambridge University Press.
Katznelson Y (1976) An Introduction to Harmonic Analysis, 2nd edn, p. 217. New York: Dover.
Melo W and Strien S (1993) One-Dimensional Dynamics, ch. 1. Berlin: Springer.
Poincaré H (1881) Mémoire sur les courbes définies par les équations différentielles I. Journal de Mathématiques Pures et Appliquées 3(7): 375–422.
Poincaré H (1882) Mémoire sur les courbes définies par les équations différentielles II. Journal de Math. Pures et Appl. 8: 251–286.
Poincaré H (1885) Mémoire sur les courbes définies par les équations différentielles III. Journal de Math. Pures et Appl. 4(series 1): 167–244.
Poincaré H (1886) Mémoire sur les courbes définies par les équations différentielles IV. Journal de Math. Pures et Appl. 2: 151–217.
Robinson C (1962) Dynamical Systems: Stability, Symbolic Dynamics, and Chaos. London: CRC Press.
Robinson C (1999) Dynamical Systems: Stability, Symbolic Dynamics, and Chaos. London: CRC Press.
Rudin W (1962) Fourier Analysis on Groups, ch. 4. New York: Wiley.
Stein EM (1986) In: Stein EM (ed.) Oscillatory Integral and Fourier Analysis, Beijing Lectures in Harmonic Analysis, pp. 307–355. Princeton: Princeton University Press.
Stein EM and Wainger S (1970) The estimation of an integral arising in multiplier transformation. Studia Math. 35: 101–104.
Zumpano A (1998) Infinite type homeomorphisms of the circle and convergence of Fourier series. Transactions of the American Mathematical Society 350(10): 4023–4040.
Homoclinic Phenomena
S E Newhouse, Michigan State University, E. Lansing, MI, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Homoclinic orbits (or motions) were first defined by Poincaré in his treatise on the ‘‘restricted three-body problem’’ (Poincaré 1987). Further advances were made by Birkhoff (1960) in the 1930s, and by Smale in the 1960s. Since that time, they have been studied by many people and have been shown to be intimately related to our understanding of nonlinear dynamical systems. There are many systems which possess homoclinic orbits. In one striking example (as discussed in the book of Moser (1973)), they can be used to account for the unbounded oscillatory motion discovered by Sitnikov in the three-body problem. They also commonly occur in two-dimensional mappings derived from periodically forced oscillations (e.g., see the book by Guckenheimer and Holmes (1983)).
Roughly speaking, a homoclinic orbit is an orbit of a mapping or differential equation which is both forward and backward asymptotic to a periodic orbit which satisfies a certain nondegeneracy condition called ‘‘hyperbolicity.’’ On its own, such an orbit is only of mild interest. However, these orbits induce quite interesting structures among nearby orbits, and this latter fact is responsible for the main importance of homoclinic orbits. In addition, when homoclinic orbits are created in a parametrized system, many interesting and unexpected phenomena arise. In this article, we first describe the history and basic properties of homoclinic orbits. Next, we consider some simple polynomial diffeomorphisms of the plane (the so-called Hénon family) which exhibit homoclinic orbits. Subsequently, we discuss a general theorem due to Katok which gives sufficient conditions for the existence of such orbits. Finally, we briefly consider issues related to homoclinic bifurcations and some of their consequences.
Homoclinic Orbits in Diffeomorphisms
Consider a discrete dynamical system given by a $C^r$ diffeomorphism $f : M \to M$ where $M$ is a $C^\infty$ manifold and $r$ is a positive integer. That is, $f$ is bijective and both $f$ and $f^{-1}$ are $r$-times continuously differentiable. Given a point $x \in M$, set $x_0 = x$. For non-negative integers $n$ we inductively define $x_{n+1} = f(x_n)$ and $x_{-n-1} = f^{-1}(x_{-n})$. We also write $f^n(x) = x_n$ for $n$ in the set $\mathbb{Z}$ of all integers. The ‘‘orbit’’ of $x$ is the set $O(x) = \{f^n(x) : n \in \mathbb{Z}\}$. A ‘‘periodic point’’ $p$ of $f$ is a point such that there is a positive integer $N > 0$ such that $f^N(p) = p$. The least such number $\tau(p)$ is called the ‘‘period’’ of $p$. If $\tau(p) = 1$, we call $p$ a ‘‘fixed point.’’ The periodic point $p$ with period $\tau$ is called ‘‘hyperbolic’’ if all eigenvalues of the derivative $Df^\tau(p)$ at $p$ have absolute value different from 1. For convenience, we refer to the eigenvalues of $Df^\tau(p)$ as eigenvalues associated to $p$. If $p$ is a hyperbolic periodic point all of whose associated eigenvalues have norm less than one, we call $p$ a ‘‘sink’’ or ‘‘attracting periodic point.’’ The opposite case in which all associated eigenvalues have norm larger than one is called a ‘‘source.’’ A hyperbolic periodic point $p$ which is neither a source nor a sink is called a ‘‘saddle’’ or ‘‘hyperbolic saddle.’’ Given a saddle $p$ of period $\tau$, we consider the set $W^s(p) = W^s(p, f)$ of points $y \in M$ which are forward asymptotic to $p$ under the iterates $f^{n\tau}$. That is, the points $y \in M$ such that $f^{n\tau}(y) \to p$ as $n \to \infty$. This is called the ‘‘stable set’’ of $p$. Similarly, we consider the ‘‘unstable set’’ of $p$ which we may define as $W^u(p) = W^u(p, f) = W^s(p, f^{-1})$. The stable manifold theorem guarantees that $W^s(p)$ and $W^u(p)$ are injectively immersed submanifolds of $M$ whose dimensions add up to $\dim M$. In these cases, they are called the stable and unstable manifolds of $p$, respectively. A point $q \in W^s(p) \cap W^u(p) \setminus \{p\}$ is called a ‘‘homoclinic point’’ of $p$ (or of the pair $(f, p)$).
If the submanifolds $W^s(p)$ and $W^u(p)$ meet transversely at $q$, then $q$ is called a ‘‘transverse homoclinic point.’’ Otherwise, $q$ is called a ‘‘homoclinic tangency.’’ In the special case when $M$ is a two-dimensional manifold, the stable and unstable manifolds of a saddle periodic point $p$ are injectively immersed curves in $M$. A transverse homoclinic point $q$ of $p$ is a point of intersection off $p$ where the curves are not tangent to each other. This is depicted in Figure 1 for the case of a saddle fixed point for the map $H(x, y) = (7 - x^2 - y, x)$, a member of the so-called Hénon family, which we will discuss later. The figure was made using the numerical package ‘‘Dynamics’’ which comes with the book by Nusse and Yorke (1998).
Figure 1 Stable and unstable manifolds $W^s(p)$ and $W^u(p)$ in the map $H(x, y) = (7 - x^2 - y, x)$ for the fixed point $p \approx (-3.83, -3.83)$, with a transverse homoclinic point $q$.
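The fixed point quoted in the caption can be recovered by hand: a fixed point of $H$ has $y = x$ and $x = 7 - x^2 - x$, so $x^2 + 2x - 7 = 0$ and $x = -1 \pm 2\sqrt{2}$. The sketch below, whose function names are illustrative and not taken from the source, checks this and confirms that the point is a saddle by computing the eigenvalues of the derivative $DH = \begin{pmatrix} -2x & -1 \\ 1 & 0 \end{pmatrix}$.

```python
import math


def H(x: float, y: float) -> tuple[float, float]:
    """The map H(x, y) = (7 - x^2 - y, x) from the text."""
    return 7.0 - x * x - y, x


def fixed_points() -> list[float]:
    """x-coordinates of the fixed points; they lie on y = x and solve x^2 + 2x - 7 = 0."""
    root = math.sqrt(8.0)
    return [-1.0 - root, -1.0 + root]


def eigenvalues(x: float) -> tuple[float, float]:
    """Eigenvalues of DH at (x, y), i.e. the roots of t^2 + 2*x*t + 1 = 0."""
    disc = x * x - 1.0
    if disc < 0:
        raise ValueError("complex eigenvalues: the point is not a saddle")
    s = math.sqrt(disc)
    return -x - s, -x + s


if __name__ == "__main__":
    px = fixed_points()[0]
    print("fixed point:", (px, px), "image:", H(px, px))
    print("eigenvalues:", eigenvalues(px))
```

Because $\det DH = 1$, the two eigenvalues are reciprocal, so one lies inside and one outside the unit circle whenever they are real and distinct.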
One easily sees that every point in the orbit of a transverse homoclinic point $q$ of a hyperbolic saddle fixed point $p$ is again a transverse homoclinic point of $p$. Also, the curves $W^u(p)$ and $W^s(p)$ are invariant; that is, $f(W^u(p)) = W^u(p)$ and $f(W^s(p)) = W^s(p)$. This implies that the curves $W^u(p)$ and $W^s(p)$ extend, wind around, and accumulate on each other forming a complicated web. Upon seeing this complicated structure in the restricted three-body problem, Poincaré very poetically wrote (p. 389, Poincaré 1987)

Que l’on cherche à se représenter la figure formée par ces deux courbes et leurs intersections en nombre infini dont chacune correspond à une solution doublement asymptotique, ces intersections forment une sorte de treillis, de tissu, de réseau à mailles infiniment serrées; chacune des deux courbes ne doit jamais se recouper elle-même, mais elle doit se replier sur elle-même d’une manière très complexe pour venir recouper une infinité de fois toutes les mailles du réseau. On sera frappé de la complexité de cette figure, que je ne cherche même pas à tracer. Rien n’est plus propre à nous donner une idée de la complication du problème des trois corps et en général de tous les problèmes de Dynamique où il n’y a pas d’intégrale uniforme . . .
The next major advance concerning homoclinic orbits was made by Birkhoff (1960), who proved that in every neighborhood of a transverse homoclinic point of a surface diffeomorphism, one can find infinitely many distinct periodic points. Birkhoff also presented a symbolic description of the nearby orbits and noticed the analogy with Hadamard’s description of geodesics on a surface. Birkhoff’s analysis was generalized by Smale to arbitrary dimension, and, in addition, Smale gave a simpler analysis of the associated nearby orbits in terms of compact zero-dimensional
symbolic spaces which we now call ‘‘shift spaces’’ or ‘‘topological Markov chains.’’ Once one knows that a diffeomorphism f has a transverse homoclinic point for a saddle periodic point p, it is interesting to consider the closure of the orbits of all such homoclinic points. This turns out to be a closed invariant set containing a dense orbit and a countable dense set of periodic saddle points (Newhouse 1980). It is usually called a ‘‘homoclinic closure’’ or h-closure. These sets form the basis of chaotic or irregular motions in nonlinear systems.
The Smale Horseshoe Map and Associated Symbolic System
To understand the geometric picture discovered by Smale, it is best to start with a concrete example of a diffeomorphism of the plane known as the ‘‘Smale horseshoe diffeomorphism.’’ Given any homeomorphism $f : X \to X$ on a space $X$ and a subset $U \subset X$, let us define $I(f, U)$ to be the set of points $x \in X$ such that $f^n(x) \in U$ for every integer $n$. Thus, we have
$$I(f, U) = \bigcap_{n \in \mathbb{Z}} f^n(U)$$
We call $I(f, U)$ the invariant set of $f$ in $U$, or, alternatively, the invariant set of the pair $(f, U)$. We now construct a special diffeomorphism $f$ of the Euclidean plane to itself in which $U = Q$ is the unit square and for which $I(f, U)$ has a very interesting structure. It is this map which is usually known as the Smale horseshoe map. Let $Q = [0, 1] \times [0, 1]$ be the unit square in the plane $\mathbb{R}^2$. Let $0 < \alpha < 1/2$, and consider a diffeomorphism $f : \mathbb{R}^2 \to \mathbb{R}^2$ which is a composition of two diffeomorphisms $f = T_2 \circ T_1$ as follows. The map $T_1(x, y) = (\alpha^{-1} x, \alpha y)$ contracts vertically, expands horizontally, and maps $Q$ to the thin rectangle $Q_1 = \{(x, y) : 0 \le x \le \alpha^{-1},\ 0 \le y \le \alpha\}$ which is short and wide. The map $T_2$ bends the right side of $Q_1$ up and around so that $T_2(Q_1) = f(Q)$ has the shape of a ‘‘horseshoe’’ or ‘‘rotated arch.’’ We arrange for $T_2$ to take the lower-right corner of $Q_1$ up to the upper-left corner of $Q$ in such a way that $f(Q)$ meets $Q$ in two full-width subrectangles which we call $R_1$ and $R_2$. This can be done in such a way that the preimages $R_1^{-1} = T_1^{-1}(R_1)$ and $R_2^{-1} = T_1^{-1}(T_2^{-1}(R_2))$ are both full-height subrectangles of $Q$, and the restricted maps $f_1 \stackrel{\mathrm{def}}{=} f \mid R_1^{-1}$ and $f_2 \stackrel{\mathrm{def}}{=} f \mid R_2^{-1}$ are both affine. Thus, we arrange that $f_1$ is simply the restriction of $T_1$ to $R_1^{-1}$, and the map $f_2$ can be expressed in formulas as $f_2(x, y) = (-\alpha^{-1} x + \alpha^{-1},\ -\alpha y + 1)$. This construction implies that $f$ will have the origin $p = (0, 0)$ as a hyperbolic fixed point. We label the upper-left corner $(0, 1)$ of $Q$ with the letter $q$. It follows that the bottom and left edges of $Q$ will be in the unstable and stable manifolds of $p$, respectively, and we have indicated this in Figure 2 with small arrows.

Figure 2 The horseshoe map.

The above construction gives us a diffeomorphism $f$ of the plane $\mathbb{R}^2$ such that $Q_1^+ \stackrel{\mathrm{def}}{=} f(Q) \cap Q = R_1 \cup R_2$ is the union of two full-width subrectangles of $Q$. We wish to describe $I(f, Q)$. We begin with the sets $Q^+ = \bigcap_{n \ge 0} f^n(Q)$ and $Q^- = \bigcap_{n \le 0} f^n(Q)$. Thus, $Q^+$ is simply the set of points in $Q$ whose backward orbits stay in $Q$, and $Q^-$ is the set of points whose forward orbits stay in $Q$. For $i = 1, 2$, each rectangle $R_i$ is mapped to a thin horseshoe in $f(Q)$ which meets $Q$ in two full-width subrectangles. Combining these for $i = 1, 2$ gives four full-width rectangles as shaded in Figure 3. Thus, $Q \cap f(Q) \cap f^2(Q)$ consists of these four subrectangles. Figure 3 shows the sets $f^{-2}(Q)$, $f^2(Q)$ as well as the shaded rectangles we just mentioned. Continuing in this way, one sees that, for each $n > 0$, the set $Q_n^+ = Q \cap f(Q) \cap \ldots \cap f^n(Q)$ consists of $2^n$ full-width subrectangles of $Q$, each with height
$\alpha^n$. It follows that $Q^+ = \bigcap_{n \ge 0} f^n(Q)$ is an interval times a Cantor set. Analogously, $Q^-$ is a Cantor set times an interval, and the set $I(f, Q)$ is a Cantor set in the plane.

Figure 3 The sets $f^{-2}(Q)$ and $f^2(Q)$ for the horseshoe map $f$.

Let us recall the definition of a Cantor set $C$ in a metric space $X$. We first define a Cantor space $C$ to be a compact, perfect, totally disconnected metric space. That is, $C$ is a compact metric space whose connected components are points, such that every point $x$ in $C$ is a limit point of $C \setminus \{x\}$. A Cantor set $C$ in a metric space $X$ is a subset which is a Cantor space in the induced subspace (relative) topology. The dynamics of $f$ on the invariant set $I(f, Q)$ can be conveniently described as follows. Let $\Sigma_2 = \{1, 2\}^{\mathbb{Z}}$ be the set of doubly infinite sequences of 1’s and 2’s. Writing elements $a \in \Sigma_2$ as $a = (a_i) = (a_i)_{i \in \mathbb{Z}}$, we define a metric $\rho$ on $\Sigma_2$ by
$$\rho(a, b) = \sum_{i \in \mathbb{Z}} \frac{1}{2^{|i|}} |a_i - b_i|$$
The pair $(\Sigma_2, \rho)$, then, is a Cantor space. The ‘‘left-shift automorphism’’ on $\Sigma_2$ is the map $\sigma : \Sigma_2 \to \Sigma_2$ defined by $\sigma(a)_i = a_{i+1}$ for each $i \in \mathbb{Z}$. This is a homeomorphism from $\Sigma_2$ to itself. It has a dense orbit and a dense set of periodic points. For a point $x \in I(f, Q)$, define an element $\phi(x) = a = (a_i) \in \Sigma_2$ by $a_i = j$ if and only if $f^i(x) \in R_j$. It turns out that the map $\phi : I(f, Q) \to \Sigma_2$ is a homeomorphism such that $\sigma \phi = \phi f$. In general, given two discrete dynamical systems $f : X \to X$ and $g : Y \to Y$, a homeomorphism $h : X \to Y$ such that $gh = hf$ is called a topological conjugacy from the pair $(f, X)$ to the pair $(g, Y)$. When such a conjugacy exists, the two systems have virtually the same dynamical properties. In the present case, one sees that the dynamics of $f$ on $I(f, Q)$ is completely described by that of $\sigma$ on $\Sigma_2$. It turns out that the Smale horseshoe map contains essentially all of the geometry necessary to describe the orbit structures near homoclinic orbits. To begin to see this, recall that the left and bottom boundaries of $Q$ were in the stable and unstable manifolds of $p$. Extending these curves as in Figure 4, one sees that the three corners of $Q$ different from $p$ are, in fact, all transverse homoclinic points of $p$. It was a great discovery of Smale that, in the case of a general transverse homoclinic point, one sees the above geometric structure after taking some power $f^N$ of the diffeomorphism $f$. Thus, we have

Theorem 1 (Smale). Let $f : M \to M$ be a $C^1$ diffeomorphism of a manifold $M$ with a hyperbolic periodic point $p$ and a transverse homoclinic point $q$ of the pair $(f, p)$. Then, one can find a positive
integer $N$ and a compact neighborhood $U$ of the points $p$ and $q$ such that the pair $(f^N, I(f^N, U))$ is topologically conjugate to the full 2-shift $(\sigma, \Sigma_2)$.

Figure 4 Stable and unstable manifolds in the horseshoe map.

In modern language, we can assert that more is true. Let $\Lambda(f) = \bigcup_{0 \le j}$ [...] $a_0$, then the set $B_{a,b}$ of bounded orbits of $H_{a,b}$ is a compact zero-dimensional set and the pair $(H_{a,b}, B_{a,b})$ is topologically conjugate to $(\sigma, \Sigma_2)$. In addition, it can be shown that the invariant set $B_{a,b}$ is a single hyperbolic h-closure. Analogous results are true for the complex Hénon family and proofs were originally given in the thesis of Ralph Oberste-Vorth (unpublished) under the supervision of John Hubbard at Cornell University. More recent proofs are in Newhouse (2004) and Hruska (2004). Many interesting results have been obtained for the complex Hénon map by Bedford and Smillie and by Sibony and Fornaess (see the references in Hruska (2004)).
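Returning to the horseshoe construction, the two affine branches and the coding by symbols in $\{1, 2\}$ are easy to experiment with. The following sketch assumes the illustrative value $\alpha = 1/3$ (any $0 < \alpha < 1/2$ would do) and only implements $f$ on the two full-height strips where it is affine; the function names are hypothetical. It computes forward itineraries $a_i = j$ when $f^i(x) \in R_j$, where $R_1$ is the bottom strip and $R_2$ the top strip of height $\alpha$.

```python
from fractions import Fraction as F

ALPHA = F(1, 3)  # illustrative choice; the text only requires 0 < alpha < 1/2


def f(pt):
    """Horseshoe map restricted to the two strips of Q on which it is affine."""
    x, y = pt
    if 0 <= x <= ALPHA:          # left strip: f1(x, y) = (x / alpha, alpha * y)
        return (x / ALPHA, ALPHA * y)
    if 1 - ALPHA <= x <= 1:      # right strip: f2(x, y) = ((1 - x) / alpha, 1 - alpha * y)
        return ((1 - x) / ALPHA, 1 - ALPHA * y)
    raise ValueError("the image of this point leaves Q")


def symbol(pt):
    """Return j when the point lies in the horizontal strip R_j."""
    _, y = pt
    if 0 <= y <= ALPHA:
        return 1
    if 1 - ALPHA <= y <= 1:
        return 2
    raise ValueError("point is not in R_1 or R_2")


def itinerary(pt, n):
    """First n symbols a_0, ..., a_{n-1} of the forward itinerary of pt."""
    out = []
    for _ in range(n):
        out.append(symbol(pt))
        pt = f(pt)
    return out


if __name__ == "__main__":
    print(f((F(1), F(0))))                       # lower-right corner of Q
    print(itinerary((F(3, 10), F(9, 10)), 6))    # a point of period 2
```

The point $(3/10, 9/10)$ is fixed by $f_2 \circ f_1$, so its itinerary is periodic of period 2, matching a period-2 sequence of the shift $\sigma$.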
Homoclinic Points in Systems with Positive Topological Entropy There is an invariant of topological conjugacy which is known as the topological entropy. In a certain sense, this gives a quantitative measurement of the amount of complicated or chaotic motion in the system.
Let $f : X \to X$ be a continuous self-map of the compact metric space $(X, d)$. For a positive integer $n > 0$, we define an $n$-orbit to be a finite sequence $O(x, n) = \{x, f(x), \ldots, f^{n-1}(x)\}$. Given a positive real number $\epsilon > 0$, we say that two $n$-orbits $O(x, n)$ and $O(y, n)$ are ‘‘$\epsilon$-distinguishable’’ if there is a $0 \le j < n$ such that $d(f^j x, f^j y) > \epsilon$. Another way to look at this is the following. Define the so-called $d_n$-metric on $X$ by setting $d_n(x, y) = \max_{0 \le j < n} d(f^j x, f^j y)$. Then $O(x, n)$ and $O(y, n)$ are $\epsilon$-distinguishable exactly when $d_n(x, y) > \epsilon$. It follows from compactness of $X$ and the uniform continuity of each of the maps $f^j$, $0 \le j < n$, that the maximal number $r(n, \epsilon, f)$ of pairwise $\epsilon$-distinguishable $n$-orbits is finite for each given $\epsilon > 0$ and each positive integer $n$. We define the number
$$h(f) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{1}{n} \log r(n, \epsilon, f)$$
This means that, for some sequence of integers $n_1 < n_2 < \ldots$, the map $f$ has roughly $e^{n_i h(f)}$ $\epsilon$-distinguishable $n_i$-orbits for $i$ large and $\epsilon$ small. The number $h(f)$ is called the topological entropy of the map $f$. It may be infinite for homeomorphisms, but it is always finite for smooth maps on finite-dimensional manifolds. The number $h(f)$ has many nice properties. For instance, $h(f^N) = N h(f)$ for every positive integer $N$, and, if $f$ is a homeomorphism, then $h(f^{-1}) = h(f)$. Further, if $f$ and $g$ are topologically conjugate, then $h(f) = h(g)$. The so-called ‘‘variational principle for topological entropy’’ asserts that $h(f)$ is the supremum of the measure-theoretic entropies of the invariant probability measures for $f$. Our interest in this invariant here is the following theorem of Katok.

Theorem 2 (Katok). Let $f$ be a $C^2$ diffeomorphism of a compact two-dimensional manifold $M$ to itself with positive topological entropy. Then, $f$ has transverse homoclinic points.

In fact, Katok extended this theorem (see the supplement in Hasselblatt and Katok (1995)) to show that, if $h(f) > 0$ and $\epsilon > 0$, then there is a compact zero-dimensional hyperbolic basic set $\Lambda$ for $f$ such that $h(f, \Lambda) > h(f) - \epsilon$.
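The counting definition of $h(f)$ can be tried out on a simple example that is not discussed in the text: the doubling map $E(x) = 2x \bmod 1$ on the circle, whose topological entropy is known to be $\log 2$. The sketch below is only an illustration under that assumption. It works on the dyadic grid $k/2^K$, where $E$ acts exactly on integers, and greedily collects points that are pairwise $\epsilon$-distinguishable in the $d_n$-metric with $\epsilon = 1/4$. A greedy selection gives a lower bound on $r(n, \epsilon, E)$ in general.

```python
import math


def separated_count(n: int, K: int = 12, eps_inv: int = 4) -> int:
    """Greedy count of pairwise eps-distinguishable n-orbits of the doubling map.

    Points are k / 2**K for 0 <= k < 2**K, and eps = 1 / eps_inv.
    """
    M = 1 << K

    def orbit(k):
        # n-orbit of k/M under x -> 2x mod 1, in integer units of 1/M
        return [(k << j) % M for j in range(n)]

    def circle_dist(a, b):
        d = (a - b) % M
        return min(d, M - d)

    chosen = []
    for k in range(M):
        ok = orbit(k)
        # keep k only if d_n(k, c) > eps for every previously chosen point c
        if all(
            max(circle_dist(u, v) for u, v in zip(ok, oc)) * eps_inv > M
            for oc in reversed(chosen)
        ):
            chosen.append(ok)
    return len(chosen)


if __name__ == "__main__":
    for n in range(1, 6):
        r = separated_count(n)
        print(n, r, math.log(r) / n)   # growth rate slowly approaches log 2
    print("log 2 =", math.log(2))
```

The printed ratios $\frac{1}{n}\log r$ decrease toward $\log 2$ only slowly, which reflects the fact that the definition takes a limit in $n$ before the limit in $\epsilon$.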
By this extension of Theorem 2, one can find nice invariant topologically transitive sets for $f$ (i.e., sets with dense orbits) on which the topological entropies of the restrictions of $f$ are arbitrarily close to that of $f$. This theorem has the interesting consequence that the map $f \mapsto h(f)$ is lower-semicontinuous on the space of $C^2$ diffeomorphisms of a surface. It was proved in Newhouse (1989) (and, independently, by Yomdin (1987)) that the map $f \mapsto h(f)$ is upper-semicontinuous on the space of $C^\infty$ diffeomorphisms of any compact manifold. Combining these results gives the theorem that the map $f \mapsto h(f)$ is continuous on the space of $C^\infty$ diffeomorphisms on a compact surface, and that positivity of $h(f)$ implies the existence of transverse homoclinic points. It is also worth noting that, for any continuous self-map $f : M \to M$ on a compact manifold $M$, one has the inequality $h(f) \ge \log |\lambda|$ where $\lambda$ is the eigenvalue of largest norm of the induced map $f_\star$ on the first real homology group (Manning 1975). Putting this together with Theorem 2 gives the fact that there are whole homotopy classes of diffeomorphisms on surfaces all of whose elements have transverse homoclinic points. For instance, consider a $2 \times 2$ matrix
$$L = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
with integer entries, determinant 1, and eigenvalues $\lambda_1, \lambda_2$ with $0 < |\lambda_1| < 1 < |\lambda_2|$. Let $\tilde L : T^2 \to T^2$ be the induced diffeomorphism on the two-dimensional torus $T^2$. This is an example of what is called an ‘‘Anosov’’ diffeomorphism. In this case the number $\lambda$ above is simply $\lambda_2$, and this holds for any diffeomorphism $f$ of $T^2$ which can be continuously deformed into $\tilde L$. Hence, any such $f$ must have transverse homoclinic points.
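As a small numerical illustration of the homology bound, the sketch below computes $\log |\lambda_2|$ for an integer matrix of determinant 1. The function name is hypothetical, and the sample matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ is a standard choice that does not appear in the text. For determinant 1 the eigenvalues are the roots of $t^2 - (a + d)t + 1 = 0$, so they are real with $0 < |\lambda_1| < 1 < |\lambda_2|$ exactly when $|a + d| > 2$.

```python
import math


def homology_entropy_bound(a: int, b: int, c: int, d: int) -> float:
    """log|lambda_2| for the integer matrix [[a, b], [c, d]] with determinant 1.

    By the inequality quoted from Manning (1975), this is a lower bound for
    h(f) whenever f is a torus diffeomorphism that can be deformed into the
    map induced by the matrix.
    """
    if a * d - b * c != 1:
        raise ValueError("the matrix must have determinant 1")
    tr = a + d
    if abs(tr) <= 2:
        raise ValueError("eigenvalues lie on the unit circle; not hyperbolic")
    lam2 = (abs(tr) + math.sqrt(tr * tr - 4)) / 2.0  # modulus of the larger eigenvalue
    return math.log(lam2)


if __name__ == "__main__":
    print(homology_entropy_bound(2, 1, 1, 1))
```

Every diffeomorphism in the homotopy class of the induced torus map therefore has entropy at least this value, which is positive, so Theorem 2 applies to all of them.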
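Homoclinic Tangencies
Let $\{f_\lambda,\ \lambda \in [0, 1]\}$ be a parametrized family of $C^r$ diffeomorphisms of the plane with $\lambda$ an external parameter. It frequently occurs that there is a hyperbolic saddle fixed point $p_\lambda$ for each parameter $\lambda$, moving continuously with $\lambda$, such that, at some value $\lambda_0$, a homoclinic tangency is created at a point $q_0$. This means that there are an $\epsilon > 0$, a small neighborhood $U$ of $q_0$, and curves $\gamma^u_\lambda \subset W^u(p_\lambda) \cap U$, $\gamma^s_\lambda \subset W^s(p_\lambda) \cap U$ such that $\gamma^s_\lambda \cap \gamma^u_\lambda = \emptyset$ for $\lambda_0 - \epsilon < \lambda < \lambda_0$, $\gamma^s_{\lambda_0} \cap \gamma^u_{\lambda_0} = \{q_0\}$, and $\gamma^s_\lambda \cap \gamma^u_\lambda$ consists of two distinct points for $\lambda_0 < \lambda < \lambda_0 + \epsilon$. In most cases, the tangency of $\gamma^u_{\lambda_0}$ and $\gamma^s_{\lambda_0}$ at $q_0$ will be of the second order, and we will assume that occurs here. The geometry is as in Figure 7.

Figure 7 Creation of a homoclinic tangency. The three panels show the curves $\gamma^u$ and $\gamma^s$ for $\lambda < \lambda_0$, $\lambda = \lambda_0$, and $\lambda > \lambda_0$.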
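The three regimes can be reproduced in a simple local model. The sketch below is an assumption made purely for illustration and is not the general situation described in the text: in suitable local coordinates near $q_0$ it takes $\gamma^s_\lambda$ to be the segment $y = 0$ and $\gamma^u_\lambda$ to be the parabola $y = x^2 - (\lambda - \lambda_0)$, which has a second-order tangency with $\gamma^s$ at $\lambda = \lambda_0$.

```python
import math


def intersections(lam: float, lam0: float) -> list[float]:
    """x-coordinates where the model curves y = 0 and y = x^2 - (lam - lam0) meet.

    This quadratic local model is an illustrative assumption; it has no
    intersection before the tangency, one at it, and two after it.
    """
    mu = lam - lam0
    if mu < 0:
        return []
    if mu == 0:
        return [0.0]
    r = math.sqrt(mu)
    return [-r, r]


if __name__ == "__main__":
    for lam in (0.4, 0.5, 0.75):
        print(lam, intersections(lam, lam0=0.5))
```

In this model the two intersection points for $\lambda > \lambda_0$ are transverse and separate at a rate proportional to $\sqrt{\lambda - \lambda_0}$.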
The creation of homoclinic tangencies is part of the general subject of ‘‘homoclinic bifurcations.’’ A recent survey of this subject is in the book by Bonatti et al. (2005). Typical results are the following. If $p = p_{\lambda_0}$ is a saddle fixed point whose derivative is area-decreasing (i.e., $|\det(Df(p))| < 1$), then there are infinitely many parameters $\lambda$ near $\lambda_0$ for which each transverse homoclinic point of $p_\lambda$ is a limit of periodic sinks (asymptotically stable periodic orbits) (Newhouse 1979, Robinson 1983). In addition, so-called strange attractors and SRB measures appear (Mora and Viana 1993). Finally, we mention that recently it has been shown that, generically in the $C^r$ topology for $r \ge 2$, homoclinic closures associated to a homoclinic tangency (in dimension 2) have maximal Hausdorff dimension (Theorem 1.6 in Downarowicz and Newhouse (2005)).

See also: Chaos and Attractors; Fractal Dimensions in Dynamics; Generic Properties of Dynamical Systems; Hyperbolic Dynamical Systems; Lyapunov Exponents and Strange Attractors; Saddle Point Problems; Singularity and Bifurcation Theory; Solitons and Other Extended Field Configurations.
Further Reading
Birkhoff GD (1960) Nouvelles recherches sur les systèmes dynamiques. In: George David Birkhoff, Collected Mathematical Papers, Vol. II, pp. 620–631. New York: American Mathematical Society.
Bonatti C, Diaz L, and Viana M (2005) Dynamics Beyond Uniform Hyperbolicity, Encyclopedia of Mathematical Sciences, Subseries: Mathematical Physics III, vol. 102. Berlin–Heidelberg–New York: Springer.
Devaney R and Nitecki Z (1979) Shift automorphism in the Hénon mapping. Communications in Mathematical Physics 67: 137–148.
Downarowicz T and Newhouse S (2005) Symbolic extensions in smooth dynamical systems. Inventiones Mathematicae 160(3): 453–499.
Friedland S and Milnor J (1989) Dynamical properties of plane polynomial automorphisms. Ergodic Theory and Dynamical Systems 9: 67–99.
Guckenheimer J and Holmes P (1983) Nonlinear Oscillations, Dynamical Systems, and Bifurcation of Vector Fields, Applied Mathematical Sciences, vol. 42. New York: Springer.
Hasselblatt B and Katok A (1995) Introduction to the Modern Theory of Dynamical Systems, Encyclopedia of Mathematics and Its Applications, vol. 54. Cambridge: Cambridge University Press.
Hruska SL (2004) A numerical method for proving hyperbolicity in complex Hénon mappings (http://arxiv.org).
Manning A (1975) Topological entropy and the first homology group. In: Manning A (ed.) Dynamical Systems – Warwick 1974, Lecture Notes in Math., vol. 468, pp. 567–573. New York: Springer.
Moser J (1973) Stable and Random Motions in Dynamical Systems, Annals of Mathematical Studies, Number 77. Princeton: Princeton University Press.
Mora L and Viana M (1993) Abundance of strange attractors. Acta Mathematica 171: 1–71.
Newhouse S (1979) The abundance of wild hyperbolic sets and non-smooth stable sets for diffeomorphisms. Publications Mathématiques de l’I.H.E.S. 50: 101–151.
Newhouse S (1980) Lectures on dynamical systems. In: Coates J and Helgason S (eds.) Dynamical Systems, CIME Lectures, Bressanone, Italy, June 1978, Progress in Mathematics, vol. 8, pp. 1–114. Cambridge, MA: Birkhäuser.
Newhouse S (1989) Continuity properties of entropy. Annals of Mathematics 129: 215–235.
Newhouse S (2004) Cone-fields, domination, and hyperbolicity. In: Brin M, Hasselblatt B, and Pesin Y (eds.) Modern Dynamical Systems and Applications, pp. 419–432. Cambridge: Cambridge University Press.
Nusse HE and Yorke JA (1998) Dynamics: Numerical Explorations. Applied Mathematical Sciences, vol. 101. New York: Springer.
Poincaré H (1899) Les Méthodes Nouvelles de la Mécanique Céleste, Tome 3. Gauthier-Villars, Original publication (New Printing: Librairie Scientifique et Technique Albert Blanchard 9, Rue Medecin 75006, Paris, 1987).
Robinson C (1983) Homoclinic bifurcation to infinitely many sinks. Communications in Mathematical Physics 90: 433–459.
Robinson C (1999) Dynamical Systems, Stability, Symbolic Dynamics, and Chaos, Studies in Advanced Mathematics, 2nd edn. New York: CRC Press.
Yomdin Y (1987) Volume growth and entropy. Israel Journal of Mathematics 57: 285–300.
Hopf Algebra Structure of Renormalizable Quantum Field Theory
D Kreimer, IHES, Bures-sur-Yvette, France
© 2006 Elsevier Ltd. All rights reserved.
Overview Renormalization theory is a venerable subject put to daily use in many branches of physics. Here, we focus on its applications in quantum field theory, where a standard perturbative approach is provided through an expansion in Feynman diagrams. Whilst
the combinatorics of the Bogoliubov recursion, solved by suitable forest formulas, has been known for a long time, the subject regained interest on the conceptual side with the discovery of an underlying Hopf algebra structure behind these recursions. Perturbative expansions in quantum field theory are organized in terms of one-particle irreducible (1PI) Feynman graphs. The goal is to calculate the corresponding 1PI Green functions order by order in the coupling constants of the theory, by applying Feynman rules to these 1PI graphs of a
renormalizable theory under consideration. This allows one to disentangle the problem into an algebraic part and an analytic part. For the algebraic part, one studies Feynman graphs as combinatorial objects which lead to the Lie and Hopf algebras discussed below. Feynman rules then assign analytic expressions to these graphs, with the analytic structure of finite renormalized quantum field theory largely dictated by the underlying algebra. The objects of interest in quantum field theory are the 1PI Green functions. They are parametrized by the quantum numbers (masses, momenta, spin, and such) of the particles participating in the scattering process under consideration. We call a set of such quantum numbers an external leg structure $r$. For example, the three terms in the Lagrangian of massless quantum electrodynamics correspond to
$$r \in \{\text{fermion propagator},\ \text{fermion–photon vertex},\ \text{photon propagator}\} \qquad [1]$$
Note that the Lagrangian $L$ of massless quantum electrodynamics is obtained accordingly as
$$L = \hat\phi(\text{fermion propagator})^{-1} + \hat\phi(\text{fermion–photon vertex}) + \hat\phi(\text{photon propagator})^{-1} = \bar\psi\, \gamma \cdot \partial\, \psi + \bar\psi\, \gamma \cdot A\, \psi + \tfrac{1}{4} F^2 \qquad [2]$$
where $\hat\phi$ are coordinate space Feynman rules. The renormalized 1PI Green function in momentum space, $G_R^r(\{g\}; \{p\}, \{m\}; \mu)$, is obtained as the image under renormalized Feynman rules $\phi_R$ applied to a series of graphs:
$$\Gamma^r = 1 + \sum_{k=1}^{\infty} g^k c_k^r = 1 + \sum_{\mathrm{res}(\Gamma) = r} g^{|\Gamma|} \frac{\Gamma}{\mathrm{Sym}(\Gamma)} \qquad [3]$$
Here $r$ is a given external leg structure, while $c_k^r$ is the finite sum of 1PI graphs having $k$ loops,
$$c_k^r = \sum_{\substack{\mathrm{res}(\Gamma) = r \\ |\Gamma| = k}} \frac{\Gamma}{\mathrm{Sym}(\Gamma)} \qquad [4]$$
and $0 < g < 1$ is a coupling constant. The generalization to the case of several couplings $\{g\}$ and masses $\{m\}$ is straightforward. In the above, the sum is over all 1PI graphs with the same given external leg structure. We have denoted the map which assigns $r$ to a given graph a residue, for example,
$$\mathrm{res}(\,[\text{graph}]\,) = [\text{graph}] \qquad [5]$$
The unrenormalized but regularized Feynman rules $\phi$ assign to a graph a function
$$\phi(\Gamma)(\{g\}; \{p\}, \{m\}; \mu, z) = \int \prod_{v \in \Gamma^{[0]}} \delta^{(4)}\Big( \sum_{f\ \mathrm{incident}\ v} k_f \Big) \prod_{e \in \Gamma^{[1]}_{\mathrm{int}}} \mathrm{Prop}(k_e)\, \frac{d^4 k_e}{4\pi^2} \qquad [6]$$
and formally the unrenormalized Green function
$$G_u^r(\{g\}; \{p\}, \{m\}; \mu, z) = \phi(\Gamma^r)(\{g\}; \{p\}, \{m\}; \mu, z) \qquad [7]$$
which is a function of a suitably chosen regulator $z$. Note that in [6] the four-dimensional Dirac $\delta$-distribution guarantees momentum conservation at each vertex and restricts the number of four-dimensional integrations to the number of independent cycles in the graph. It is assumed that the reader is familiar with the readily established fact that these integrals suffer from UV singularities, which render the integration over the momenta in internal cycles ill-defined. We also remind the reader that the problem persists in coordinate space, where one confronts the continuation of products of distributions to regions of coinciding support. We restrict ourselves here to a discussion of the situation in momentum space and refer the reader to the literature for the situation in coordinate space. Ignoring problems of convergence in the sum over all graphs, the problem of renormalization is to make sense of these functions term by term: we have to determine invertible series $Z^r(\{g\}, z)$ in the couplings $g$ such that the modified Lagrangian
$$\tilde L = \sum_r Z^r(\{g\}, z)\, \hat\phi(r) \qquad [8]$$
produces a perturbation series in graphs that allows for the removal of the regulator $z$. This amounts to a transition from unrenormalized to renormalized Feynman rules, $\phi \to \phi_R$. Let us first describe how this transition is achieved using the Lie and Hopf algebra structure of the perturbative expansion, which is described in detail below:
1. Decide on the free fields and local interactions of the theory, appropriately specifying quantum numbers (spin, mass, flavor, color, and such) of fields, restricting interactions so as to obtain a renormalizable theory.
2. Consider the set of all 1PI graphs with edges corresponding to free-field propagators. Define vertices for local interactions. This allows one to construct a pre-Lie algebra of graph insertions.
3. Antisymmetrize this pre-Lie product to get a Lie algebra L of graph insertions and define the Hopf algebra H which is dual to the enveloping algebra U(L) of this Lie algebra.
4. Realize that the coproduct and antipode of this Hopf algebra give rise to the forest formula, which generates local counter-terms upon introducing a Rota–Baxter map, a renormalization scheme in physicists' parlance.
5. Use the Hochschild cohomology of this Hopf algebra to show that one can absorb singularities in local counter-terms.
6. Determine the corepresentations of this Hopf algebra to identify the sub-Hopf algebras corresponding to time-ordered products in physical fields. This is most easily achieved by rewriting the Dyson–Schwinger equations using Hochschild 1-cocycles.

The last point exhibits close connections, in particular, between the structure of gauge theories and the corepresentation theory of their perturbative Hopf algebras, which we discuss below in brief.

This program can be carried out in coordinate space as well as in momentum space renormalization. It has given a firm mathematical background to the process of renormalization, justifying the practice of quantum field theory. The notion of locality has achieved a precise formulation in terms of the Hochschild cohomology of the perturbation expansion. In momentum space, this approach emphasizes the connections to number theory, which emerge when one investigates the role of the Hopf algebra primitives, which in turn furnish the Hochschild 1-cocycles underlying locality. The next sections describe the above setup in some detail.
Lie and Hopf Algebras of Graphs

All algebras are supposed to be over some field $K$ of characteristic zero, associative and unital, and similarly for coalgebras. The unit (and, by abuse of notation, also the unit map) will be denoted by $\mathbb{I}$, the counit map by $\epsilon$. All algebra homomorphisms are supposed to be unital. A bialgebra $(A = \bigoplus_{i=0}^{\infty} A_i, m, \mathbb{I}, \Delta, \epsilon)$ is called graded connected if $A_i A_j \subset A_{i+j}$ and $\Delta(A_i) \subset \bigoplus_{j+k=i} A_j \otimes A_k$, and if $\Delta(\mathbb{I}) = \mathbb{I}\otimes\mathbb{I}$ and $A_0 = K\mathbb{I}$, $\epsilon(\mathbb{I}) = 1 \in K$ and $\epsilon = 0$ on $\bigoplus_{i=1}^{\infty} A_i$. We call $\ker\epsilon$ the augmentation ideal of $A$ and denote by $P$ the projection $A \to \ker\epsilon$ onto the augmentation ideal, $P = \mathrm{id} - \mathbb{I}\epsilon$. Furthermore, we use Sweedler's notation, $\Delta(h) = \sum h' \otimes h''$, for the coproduct. We define

$$\mathrm{Aug}^{(k)} = \big(\underbrace{P\otimes\cdots\otimes P}_{k\ \mathrm{times}}\big)\,\Delta^{k-1}, \qquad A \to \{\ker\epsilon\}^{\otimes k} \qquad [9]$$

as a map into the $k$-fold tensor product of the augmentation ideal. We let $A^{(k)} = \ker \mathrm{Aug}^{(k+1)}/\ker \mathrm{Aug}^{(k)}$, $\forall k \ge 1$. All bialgebras considered here are bigraded in the sense that

$$A = \bigoplus_{i=0}^{\infty} A_i = \bigoplus_{k=0}^{\infty} A^{(k)} \qquad [10]$$

where $A_k \subset \bigoplus_{j=1}^{k} A^{(j)}$ for all $k \ge 1$, and $A_0 \simeq A^{(0)} \simeq K$. The first construction we have to study is the pre-Lie algebra structure of 1PI graphs.

The Pre-Lie Structure
For each Feynman graph we have vertices as well as internal and external edges. External edges are edges that have an open end not connected to a vertex. They indicate the particles participating in the scattering amplitude under consideration, and each such edge carries the quantum numbers of the corresponding free field. The internal edges and vertices form a graph in their own right. For an internal edge, both ends of the edge are connected to a vertex.

We consider 1PI Feynman graphs. A graph $\Gamma$ is 1PI if and only if all graphs obtained by removal of any one of its internal edges are still connected. Such 1PI graphs are naturally graded by their number of independent loops, the rank of their first homology group $H_{[1]}(\Gamma, \mathbb{Z})$. We write $|\Gamma|$ for this degree of a graph $\Gamma$. Note that $|\mathrm{res}(\Gamma)| = 0$, where we let $\mathrm{res}(\Gamma)$ be the graph obtained when all edges in $\Gamma^{[1]}_{\mathrm{int}}$ shrink to a point, as before. Note that the graph obtained in this manner consists of a single vertex, to which the edges $\Gamma^{[1]}_{\mathrm{ext}}$ are attached. For a 1PI graph $\Gamma$, $\Gamma^{[0]}$ denotes its set of vertices and $\Gamma^{[1]} = \Gamma^{[1]}_{\mathrm{int}} \cup \Gamma^{[1]}_{\mathrm{ext}}$ its set of internal and external edges. In addition, let $\omega_r$ be the number of spacetime derivatives appearing in the corresponding monomial in the Lagrangian.

Having specified free quantum fields and local interaction terms between them, one immediately obtains the set of 1PI graphs. One can then consider, for a given external leg structure $r$, the set of graphs with that external leg structure. For a renormalizable theory, we can define a superficial degree of divergence,

$$\omega = \sum_{r\in\Gamma^{[1]}_{\mathrm{int}}\cup\Gamma^{[0]}} \omega_r - 4\,|H_{[1]}(\Gamma,\mathbb{Z})| \qquad [11]$$

for each such external leg structure: $\omega(\Gamma) = \omega(\Gamma')$ if $\mathrm{res}(\Gamma) = \mathrm{res}(\Gamma')$; all graphs with the same external leg structure have the same superficial degree of divergence, and only for a finite number of distinct external leg structures $r$ will this degree indeed signify a divergence. This leaves a finite number of external leg structures to be considered, to which we restrict ourselves from now on. Our first observation is that there is a natural pre-Lie algebra structure on 1PI graphs.
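The 1PI condition, the loop number and the superficial degree of divergence [11] can be made concrete with a small sketch. The following Python code is only an illustration under stated assumptions: graphs are given as lists of internal edges between labeled vertices, there is a single edge type with weight 2 (two derivatives in a scalar kinetic term) and a single vertex type with weight 0 (as for a quartic scalar interaction), and all function names are hypothetical.

```python
from collections import defaultdict

def is_connected(vertices, edges):
    """Depth-first search on an undirected multigraph given as (u, v) pairs."""
    vertices = set(vertices)
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency[node])
    return seen == vertices

def loop_number(vertices, internal_edges):
    """Rank of the first homology group of a connected graph: E - V + 1."""
    return len(internal_edges) - len(set(vertices)) + 1

def is_1pi(vertices, internal_edges):
    """Connected, and still connected after removing any single internal edge."""
    if not is_connected(vertices, internal_edges):
        return False
    return all(
        is_connected(vertices, internal_edges[:i] + internal_edges[i + 1:])
        for i in range(len(internal_edges))
    )

def superficial_degree(vertices, internal_edges, omega_edge=2, omega_vertex=0):
    """Equation [11] for one edge type and one vertex type (assumed weights)."""
    weights = omega_edge * len(internal_edges) + omega_vertex * len(set(vertices))
    return weights - 4 * loop_number(vertices, internal_edges)

# Internal parts of three graphs in a quartic scalar theory (assumed example).
one_loop_vertex = ([0, 1], [(0, 1), (0, 1)])                  # 4 external legs
two_loop_vertex = ([0, 1, 2], [(0, 1), (0, 1), (1, 2), (1, 2)])  # 4 external legs
two_loop_self_energy = ([0, 1], [(0, 1), (0, 1), (0, 1)])     # 2 external legs
tree = ([0, 1], [(0, 1)])                                     # not 1PI
```

With these weights the two vertex graphs, which share the same external leg structure, receive the same value of the degree, while the self-energy graph receives a different one, in line with the statement that the degree depends only on res of the graph.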
To exhibit the pre-Lie structure on 1PI graphs, we define a bilinear operation

$$\Gamma_1 * \Gamma_2 = \sum_{\Gamma} n(\Gamma_1,\Gamma_2;\Gamma)\,\Gamma \qquad [12]$$

where the sum is over all 1PI graphs $\Gamma$. Here, $n(\Gamma_1,\Gamma_2;\Gamma)$ is a section coefficient which counts the number of ways in which a subgraph $\Gamma_2$ can be reduced to a point in $\Gamma$ such that $\Gamma_1$ is obtained. The above sum is evidently finite as long as $\Gamma_1$ and $\Gamma_2$ are finite graphs, and the graphs which contribute necessarily fulfill $|\Gamma| = |\Gamma_1| + |\Gamma_2|$ and $\mathrm{res}(\Gamma) = \mathrm{res}(\Gamma_1)$. One then has the following theorem.

Theorem 1 The operation $*$ is pre-Lie:

$$[\Gamma_1 * \Gamma_2] * \Gamma_3 - \Gamma_1 * [\Gamma_2 * \Gamma_3] = [\Gamma_1 * \Gamma_3] * \Gamma_2 - \Gamma_1 * [\Gamma_3 * \Gamma_2] \qquad [13]$$

which is evident when one rewrites the $*$-product in suitable gluing operations.

To understand this theorem, note that the equation claims that the lack of associativity in the bilinear operation $*$ is invariant under permutation of the elements indexed 2, 3. This suffices to show that the antisymmetrization of this map fulfills a Jacobi identity. Hence, we get a Lie algebra L by antisymmetrizing this operation:

$$[\Gamma_1,\Gamma_2] = \Gamma_1 * \Gamma_2 - \Gamma_2 * \Gamma_1 \qquad [14]$$

This Lie algebra is graded and of finite dimension in each degree. Let us look at a couple of examples for pre-Lie products:

[Equations 15–20: worked examples of pre-Lie products of specific Feynman graphs; the graph diagrams were lost in extraction.]

Together with L one is led to consider the dual of its universal enveloping algebra U(L) using the theorem of Milnor and Moore. For this we use the above grading by the loop number. This universal enveloping algebra U(L) is built from the tensor algebra

$$T = \bigoplus_k T^k, \qquad T^k = \underbrace{L\otimes\cdots\otimes L}_{k\ \mathrm{times}} \qquad [21]$$

by dividing out the ideal generated by the relations

$$a\otimes b - b\otimes a = [a,b] \in L \qquad [22]$$

Note that in U(L) we have a natural concatenation product $m_*$. Furthermore, U(L) carries a natural Hopf algebra structure with this product. For that, the Lie algebra L furnishes the primitive elements:

$$\Delta_*(a) = a\otimes 1 + 1\otimes a, \qquad \forall a\in L \qquad [23]$$

It is, by construction, a connected finitely graded Hopf algebra which is cocommutative but not commutative. We can then consider its graded dual, which will be a Hopf algebra $H(m, \mathbb{I}, \Delta, \epsilon)$ that is commutative but not cocommutative. One finds it upon using a Kronecker pairing

$$\langle Z_\Gamma, \delta_{\Gamma'}\rangle = \begin{cases} 1, & \Gamma = \Gamma' \\ 0, & \text{else} \end{cases} \qquad [24]$$

The space of primitives of U(L) is in one-to-one correspondence with the set Indec(H) of indecomposables of H, which is the linear span of its generators. One finds the following theorem.

Theorem 2

$$\langle Z_{[\Gamma_1,\Gamma_2]}, \delta_\Gamma\rangle = \langle Z_{\Gamma_1}\otimes Z_{\Gamma_2} - Z_{\Gamma_2}\otimes Z_{\Gamma_1}, \Delta(\delta_\Gamma)\rangle \qquad [25]$$

For example, one finds

[Equation 26: the pairing of Theorem 2 evaluated on specific Feynman graphs; the graph diagrams were lost in extraction.]
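The step from the pre-Lie identity [13] to the Jacobi identity for the bracket [14] can be checked mechanically on a toy model. The sketch below does not use graph insertions; it assumes a much simpler pre-Lie algebra, polynomials in one variable with the product a * b = a'(x) b(x), chosen only because its associator is symmetric in the last two arguments, exactly as in [13]. Polynomials are stored as coefficient tuples, and all names are hypothetical.

```python
from itertools import product

def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return tuple(p)

def add(p, q, sign=1):
    """Return p + sign * q."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + sign * b for a, b in zip(p, q))

def mul(p, q):
    if not p or not q:
        return ()
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def deriv(p):
    return trim(k * c for k, c in enumerate(p) if k > 0)

def star(a, b):
    """Toy pre-Lie product a * b = a'(x) b(x) (an assumption of this sketch)."""
    return mul(deriv(a), b)

def associator(a, b, c):
    """(a * b) * c - a * (b * c), the left-hand side of [13]."""
    return add(star(star(a, b), c), star(a, star(b, c)), sign=-1)

def bracket(a, b):
    """Antisymmetrized product, as in [14]."""
    return add(star(a, b), star(b, a), sign=-1)

def jacobi(a, b, c):
    total = ()
    for term in (bracket(bracket(a, b), c),
                 bracket(bracket(b, c), a),
                 bracket(bracket(c, a), b)):
        total = add(total, term)
    return total

samples = [(1, 2), (0, 0, 3), (1, 0, -1, 4), (5,), (0, 1)]
```

For x^2, x, x the associator is 2x^2, so the product is not associative, yet for every triple drawn from `samples` the associator is symmetric in its last two arguments and the Jacobi sum vanishes.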
H is a graded commutative Hopf algebra which suffices to describe renormalization theory, as we see in the next section. We have formulated it for the superficially divergent 1PI graphs of the theory, with the understanding that the residues of these graphs are in one-to-one correspondence with the terms in the Lagrangian of a given theory. Often, several terms in a Lagrangian correspond to graphs with the same number and type of external legs, but correspond to different form-factor projections of the graph. In such cases, the above approach can be easily adapted by considering suitably colored or labeled graphs. A similar remark applies if one desires to incorporate renormalization of superficially convergent Green functions, which requires nothing more than the consideration of an easily obtained semidirect product of the Lie algebra of superficially divergent graphs with the abelian Lie algebra of superficially convergent graphs.

The Principle of Multiplicative Subtraction
The above algebra structures are available once one has decided on the set of 1PI graphs of interest. We now use them toward the renormalization of any such chosen local quantum field theory. From the above, 1PI graphs $\Gamma$ provide the linear generators $\delta_\Gamma$ of the Hopf algebra $H = \bigoplus_{i=0}^{\infty} H_i$, where $H_{\mathrm{lin}} = \mathrm{span}(\delta_\Gamma)$, and their disjoint union provides the commutative product.

Now let $\Gamma$ be a 1PI graph. We find the Hopf algebra H as described above to have a coproduct $\Delta: H \to H\otimes H$ explicitly given as

$$\Delta(\Gamma) = \Gamma\otimes 1 + 1\otimes\Gamma + \sum_{\gamma\subset\Gamma} \gamma\otimes\Gamma/\gamma \qquad [27]$$

where the sum is over all unions $\gamma$ of 1PI superficially divergent proper subgraphs, and we extend this definition to products of graphs so that we get a bialgebra.

While the Lie bracket inserted graphs into each other, the coproduct disentangles them. It is this latter operation which is needed in renormalization theory: we have to render each subgraph finite before we can construct a local counter-term. That is precisely what the Hopf algebra structure maps do.

Having a coproduct, two further structure maps of H are immediate: the counit and the antipode. The counit $\epsilon$ vanishes on any nontrivial Hopf algebra element, $\epsilon(1) = 1$, $\epsilon(X) = 0$. The antipode is

$$S(\Gamma) = -\Gamma - \sum_{\gamma\subset\Gamma} S(\gamma)\,\Gamma/\gamma \qquad [28]$$
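We can work out a few coproducts and antipodes as follows:

[Equations 29–34: $\mathrm{Aug}^{(2)}$ evaluated on specific Feynman graphs; the graph diagrams were lost in extraction.]

We give just one example for an antipode:

[Equation 35: the antipode S evaluated on a specific Feynman graph; the graph diagrams were lost in extraction.]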
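Note that for each term in the sum $\tilde\Delta(\Gamma) = \sum_i \Gamma'_{(i)}\otimes\Gamma''_{(i)}$, we have unique gluing data $G_i$ such that

$$\Gamma = \Gamma''_{(i)} \star_{G_i} \Gamma'_{(i)}, \qquad \forall i \qquad [36]$$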
These gluing data describe the necessary bijections to glue the components $\Gamma'_{(i)}$ back into $\Gamma''_{(i)}$ so as to obtain $\Gamma$: using them, we can reassemble the whole from its parts. Each possible gluing can be interpreted as a composition in the insertion operad of Feynman graphs.

We have by now obtained a Hopf algebra generated by combinatorial elements, 1PI Feynman graphs. Its existence is automatic from the above choices of interactions and free fields. What remains to be done is a structural analysis of these algebras for the renormalizable theories we are confronted with in four spacetime dimensions.

The assertion underlying perturbation theory is that meaningful approximations to physical observable quantities can be found by evaluating these graphs using Feynman rules. First, as disjoint scattering processes give rise to independent amplitudes, one is led to the study of characters of the Hopf algebra, maps $\phi: H \to V$ such that $\phi\circ m = m_V(\phi\otimes\phi)$. Such maps assign to any element in the Hopf algebra an element in a suitable target space V. The study of tree-level amplitudes in lowest-order perturbation theory justifies assigning to each edge a propagator and to each elementary scattering process a vertex, which define the Feynman rules $\phi(\mathrm{res}(\Gamma))$ and the underlying Lagrangian, on the level of residues of these very graphs. Graphs are constructed from edges and vertices which are provided precisely by the residues of those divergent graphs; hence one is led to assign to each Feynman graph an evaluation in terms of an integral over the continuous quantum numbers assigned to edges or vertices, which leads to the familiar integrals over momenta in closed loops mentioned before.

Then, with the Feynman rules providing a canonical character $\phi$, we will have to make one further choice: a renormalization scheme. The need for such a choice is no surprise: after all, we are eliminating short-distance singularities in the graphs, which renders their remaining finite part ambiguous, albeit in a most interesting manner. Hence, we choose a map $R: V \to V$, from which we obviously demand that it does not modify the UV-singular structure, and furthermore that it obeys

$$R(xy) + R(x)R(y) = R(R(x)y) + R(xR(y)) \qquad [37]$$
which guarantees the multiplicativity of renormalization and is at the heart of the Birkhoff decomposition, which emerges below: it tells us that elements in V split into two parallel subalgebras given by the image and kernel of R. Algebras for which such a map exists are known as Rota–Baxter algebras. The role Rota–Baxter algebras play for associative algebras is similar to the role Yang–Baxter algebras play for Lie algebras. The structure of these algebras allows one to connect renormalization theory to integrable systems. In addition, most of the results obtained initially for a specific renormalization scheme, such as minimal subtraction, can also be obtained, in general, upon a structural analysis of the corresponding Rota–Baxter algebras.

To see how all the above comes together in renormalization theory, we define a further character $S_R^\phi$ that deforms $\phi\circ S$ slightly and delivers the counter-term for $\Gamma$ in the renormalization scheme R:

$$S_R^\phi(\Gamma) = -R\,m_V(S_R^\phi\otimes\phi\circ P)\Delta(\Gamma) = -R[\phi(\Gamma)] - R\Big[\sum_{\gamma\subset\Gamma} S_R^\phi(\gamma)\,\phi(\Gamma/\gamma)\Big] \qquad [38]$$

which should be compared with the undeformed

$$\phi\circ S(\Gamma) = -m_V(\phi\circ S\otimes\phi\circ P)\Delta(\Gamma) = -\phi(\Gamma) - \sum_{\gamma\subset\Gamma} \phi\circ S(\gamma)\,\phi(\Gamma/\gamma) \qquad [39]$$

The fact that R is a Rota–Baxter map ensures that $S_R^\phi$ is an element of the character group G of the Hopf algebra, $S_R^\phi \in \mathrm{Spec}(G)$. Note that we have now determined the modified Lagrangian:

$$Z^r = S_R^\phi(\Gamma^r) \qquad [40]$$

The classical results of renormalization theory follow immediately using this group structure: we obtain the renormalization of $\Gamma$ by the application of a renormalized character

$$S_R^\phi\star\phi(\Gamma) = m_V(S_R^\phi\otimes\phi)\Delta(\Gamma) \qquad [41]$$

and Bogoliubov's $\bar R$ operation as

$$\bar R(\Gamma) = m_V(S_R^\phi\otimes\phi)(\mathrm{id}\otimes P)\Delta(\Gamma) = \phi(\Gamma) + \sum_{\gamma\subset\Gamma} S_R^\phi(\gamma)\,\phi(\Gamma/\gamma) \qquad [42]$$

so that

$$S_R^\phi\star\phi(\Gamma) = \bar R(\Gamma) + S_R^\phi(\Gamma) \qquad [43]$$

Here, $S_R^\phi\star\phi$ is an element in the group of characters of the Hopf algebra, with the group law given by the convolution

$$\phi_1\star\phi_2 = m_V\circ(\phi_1\otimes\phi_2)\circ\Delta \qquad [44]$$

so that the coproduct, counit, and coinverse (the antipode) give the product, unit, and inverse of this group, as befits a Hopf algebra. This Lie group has the previous Lie algebra L of graph insertions as its Lie algebra: L exponentiates to G.

What we have achieved above is a local renormalization of quantum field theory. Let $M_r$ be a monomial in the Lagrangian L of degree $\omega_r$:

$$M_r = D_r\{\phi\} \qquad [45]$$

Then one can prove, using the Hochschild cohomology of H:

Theorem 3 (Locality)

$$Z^r D_r\{\phi\} = D_r Z^r\{\phi\} \qquad [46]$$

that is, renormalization commutes with infinitesimal spacetime variations of the fields.

We can now work out the renormalization of a Feynman graph $\Gamma$. The diagrams of the original example were lost in extraction; below, $\gamma$ and $\Gamma/\gamma$ stand in for the subgraph and cograph diagrams:

$$\Delta(\Gamma) = \Gamma\otimes\mathbb{I} + \mathbb{I}\otimes\Gamma + 2\,\gamma\otimes\Gamma/\gamma \qquad [47]$$

$$\bar R(\Gamma) = \phi(\Gamma) + 2\,S_R^\phi(\gamma)\,\phi(\Gamma/\gamma) \qquad [48]$$

$$\phantom{\bar R(\Gamma)} = \phi(\Gamma) - 2\,R[\phi(\gamma)]\,\phi(\Gamma/\gamma) \qquad [49]$$

$$S_R^\phi(\Gamma) = -R\big[\bar R(\Gamma)\big] \qquad [50]$$

$$S_R^\phi\star\phi(\Gamma) = [\mathrm{id} - R]\big(\bar R(\Gamma)\big) \qquad [51]$$
The formulas [47]–[51] are given in their recursive form. Zimmermann’s original forest formula solving this recursion is obtained when we trace our considerations back to the fact that the coproduct can be written in nonrecursive form as a sum over forests, and similarly for the antipode.
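The Rota–Baxter relation [37] can be tested directly for minimal subtraction. The sketch below assumes that V is the algebra of Laurent polynomials in a regulator, stored as dictionaries mapping powers to exact rational coefficients, and that R projects onto the pole part; the second map, `R_bad`, is a hypothetical projection introduced only to show that [37] is a genuine constraint. None of these names come from the article.

```python
from fractions import Fraction as F

def clean(x):
    return {k: v for k, v in x.items() if v != 0}

def add(x, y):
    out = dict(x)
    for k, v in y.items():
        out[k] = out.get(k, 0) + v
    return clean(out)

def mul(x, y):
    out = {}
    for i, a in x.items():
        for j, b in y.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return clean(out)

def R_ms(x):
    """Minimal subtraction: keep only strictly negative powers of the regulator."""
    return {k: v for k, v in x.items() if k < 0}

def R_bad(x):
    """Hypothetical projection onto powers <= 1; its image is not a subalgebra."""
    return {k: v for k, v in x.items() if k <= 1}

def rota_baxter_holds(R, x, y):
    """Check R(xy) + R(x)R(y) == R(R(x)y) + R(xR(y)), equation [37]."""
    lhs = add(R(mul(x, y)), mul(R(x), R(y)))
    rhs = add(R(mul(R(x), y)), R(mul(x, R(y))))
    return lhs == rhs

x = {-2: F(1), -1: F(3), 0: F(-1), 2: F(5)}
y = {-1: F(2), 0: F(1, 3), 1: F(7)}
eps = {1: F(1)}
```

Here `rota_baxter_holds(R_ms, x, y)` succeeds because the pole part and its complement are both closed under multiplication, whereas `rota_baxter_holds(R_bad, eps, eps)` fails.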
Diffeomorphisms of Physical Parameters

In the above, we have effectively obtained a Birkhoff decomposition of the Feynman rules $\phi \in \mathrm{Spec}(G)$ into two characters, $\phi_+^R = S_R^\phi\star\phi \in \mathrm{Spec}(G)$ and $\phi_-^R = S_R^\phi \in \mathrm{Spec}(G)$, for any Rota–Baxter map R. Thanks to Atkinson's theorem, this is possible for any renormalization scheme R. For the minimal subtraction scheme, it amounts to the decomposition of the Laurent series $\phi(\Gamma)(\varepsilon)$, which has poles of finite order in the regulator $\varepsilon$, into a part holomorphic at the origin and a part holomorphic at complex infinity. This has a particularly nice geometric interpretation upon considering the Birkhoff decomposition of a loop around the origin, providing the clutching data for the two half-spheres defined by that very loop.

Whilst in this manner a satisfying understanding of perturbative renormalization is obtained, the character group G remains rather poorly understood. On the other hand, renormalization can be captured by the study of diffeomorphisms of physical parameters as, by definition, the range of allowed modification in renormalization theory is determined by the variation of the coefficients of the underlying Lagrangian monomials $\hat\phi(r)$:

$$L = \sum_r Z^r\,\hat\phi(r) \qquad [52]$$

Thus, one desires to obtain the whole Birkhoff decomposition at the level of diffeomorphisms of the coupling constants. The crucial step toward that goal is to realize the role of a standard quantum field-theoretic formula of the form

$$g_{\mathrm{new}} = g_{\mathrm{old}}\,Z^g \qquad [53]$$

where

$$Z^g = \frac{Z^v}{\prod_{e\in\mathrm{res}(v)^{[1]}_{\mathrm{ext}}}\sqrt{Z^e}} \qquad [54]$$
for some vertex v, which obtains the new coupling in terms of a diffeomorphism of the old. This formula provides, indeed, a Hopf algebra homomorphism from the Hopf algebra of diffeomorphisms to the Hopf algebra of Feynman graphs, regarding Zg (a series over counter-terms for all 1PI graphs with the external leg structure corresponding to the coupling g), in two different ways: it is, at the same time, a formal diffeomorphism in the coupling constant gold and a formal series in Feynman graphs. As a consequence, there are two competing coproducts acting on Zg . That both give the same result defines the required homomorphism,
which transposes to a homomorphism from the largely unknown group of characters of H to the one-dimensional diffeomorphisms of this coupling. In summary, one finds that a couple of basic facts enable one to make a transition from the abstract group of characters of a Hopf algebra of Feynman graphs (which, incidentally, equals the Lie group assigned to the Lie algebra with universal enveloping algebra the dual of this Hopf algebra) to the rather concrete group of diffeomorphisms of physical observables. These steps are given as follows:
1. Recognize that Z factors are given as counterterms over a formal series of graphs starting with 1, graded by powers of the coupling, hence invertible.
2. Recognize the series $Z^g$ as a formal diffeomorphism, with Hopf algebra coefficients.
3. Establish that the two competing Hopf algebra structures of diffeomorphisms and graphs are consistent in the sense of a Hopf algebra homomorphism.
4. Show that this homomorphism transposes to a Lie algebra and hence Lie group homomorphism.

The effective coupling $g_{\mathrm{eff}}(\varepsilon)$ now allows for a Birkhoff decomposition in the space of formal diffeomorphisms.

Theorem 4 Let the unrenormalized effective coupling constant $g_{\mathrm{eff}}(\varepsilon)$, viewed as a formal power series in $g$, be considered as a loop of formal diffeomorphisms, and let $g_{\mathrm{eff}}(\varepsilon) = (g_{\mathrm{eff}-})^{-1}(\varepsilon)\,g_{\mathrm{eff}+}(\varepsilon)$ be its Birkhoff decomposition in the group of formal diffeomorphisms. Then the loop $g_{\mathrm{eff}-}(\varepsilon)$ is the bare coupling constant and $g_{\mathrm{eff}+}(0)$ is the renormalized effective coupling.

The above results hold as they stand for any massless theory which provides a single coupling constant. If there are multiple interaction terms in the Lagrangian, one finds similar results relating the group of characters of the corresponding Hopf algebra to the group of formal diffeomorphisms in the multidimensional space of coupling constants.
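The recursive Birkhoff decomposition of a character into a counter-term part and a renormalized part can be sketched on a toy Hopf algebra. The code below assumes the commutative Hopf algebra generated by ladders t_1, t_2, ..., with coproduct Δ(t_n) = Σ_{k=0..n} t_k ⊗ t_{n-k} and t_0 the unit, takes Laurent polynomials in the regulator as target algebra and minimal subtraction as Rota–Baxter map; the values assigned to the character are arbitrary illustrative numbers, not Feynman integrals, and all names are hypothetical.

```python
from fractions import Fraction as F

def clean(x):
    return {k: v for k, v in x.items() if v != 0}

def add(x, y):
    out = dict(x)
    for k, v in y.items():
        out[k] = out.get(k, 0) + v
    return clean(out)

def mul(x, y):
    out = {}
    for i, a in x.items():
        for j, b in y.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return clean(out)

def scale(x, c):
    return clean({k: c * v for k, v in x.items()})

def pole_part(x):
    """Minimal subtraction R: strictly negative powers of the regulator."""
    return {k: v for k, v in x.items() if k < 0}

ONE = {0: F(1)}

def birkhoff_on_ladders(phi, n_max):
    """Return (phi_minus, phi_plus) on t_0..t_n_max via the Bogoliubov recursion."""
    minus, plus = {0: ONE}, {0: ONE}
    for n in range(1, n_max + 1):
        bar = dict(phi[n])                       # Bogoliubov's R-bar operation
        for k in range(1, n):
            bar = add(bar, mul(minus[k], phi[n - k]))
        minus[n] = scale(pole_part(bar), -1)     # counter-term, -R[R-bar]
        plus[n] = add(bar, minus[n])             # renormalized value, (id - R)[R-bar]
    return minus, plus

def convolve(minus, phi, n):
    """(phi_minus * phi)(t_n) computed from the ladder coproduct."""
    total = {}
    for k in range(n + 1):
        total = add(total, mul(minus[k], phi[n - k]))
    return total

# Arbitrary illustrative character values (an assumption of this sketch).
phi = {
    0: ONE,
    1: {-1: F(1), 0: F(1), 1: F(1)},
    2: {-2: F(1, 2), -1: F(1), 0: F(3)},
    3: {-3: F(1, 6), -1: F(2), 1: F(1)},
}
phi_minus, phi_plus = birkhoff_on_ladders(phi, 3)
```

On the generators, `phi_minus` contains only poles, `phi_plus` contains none, and the convolution of `phi_minus` with `phi` reproduces `phi_plus`, which is the content of the decomposition in this simplified setting.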
The Role of Hochschild Cohomology

The Hochschild cohomology of the combinatorial Hopf algebras which we discuss here plays three major roles in quantum field theory:

1. it allows one to prove locality from the accompanying filtration by the augmentation degree coming from the kernels $\ker\mathrm{Aug}^{(k)}$;
2. it allows one to write the quantum equations of motion in terms of the Hopf algebra primitives, elements in $H_{\mathrm{lin}} \cap \{\ker\mathrm{Aug}^{(2)}/\ker\mathrm{Aug}^{(1)}\}$; and
3. it identifies the relevant sub-Hopf algebras formed by time-ordered products.

Before we discuss these properties, let us first introduce the relevant Hochschild cohomology.
Hochschild Cohomology of Bialgebras

Let $(A, m, \mathbb{I}, \Delta, \epsilon)$ be a bialgebra, as before. We regard linear maps $L: A \to A^{\otimes n}$ as n-cochains and define a coboundary map $b$, $b^2 = 0$, by

$$bL := (\mathrm{id}\otimes L)\circ\Delta + \sum_{i=1}^{n} (-1)^i\,\Delta_i\circ L + (-1)^{n+1} L\otimes\mathbb{I} \qquad [55]$$

where $\Delta_i$ denotes the coproduct applied to the ith factor in $A^{\otimes n}$, which defines the Hochschild cohomology of A. For the case n = 1, for $L: A \to A$, [55] reduces to

$$bL = (\mathrm{id}\otimes L)\circ\Delta - \Delta\circ L + L\otimes\mathbb{I} \qquad [56]$$

The category of objects (A, C), which consists of a commutative bialgebra A and a Hochschild 1-cocycle C on A, has an initial object $(H_{rt}, B_+)$, where $H_{rt}$ is the Hopf algebra of (nonplanar) rooted trees, and the closed but nonexact 1-cocycle $B_+$ grafts a product of rooted trees together at a new root, as described below. The higher (n > 1) Hochschild cohomology of $H_{rt}$ vanishes, but in what follows, the closedness of $B_+$ will turn out to be crucial.

The Hopf Algebra of Rooted Trees

A rooted tree is a simply connected contractible compact graph with a distinguished vertex, the root. A forest is a disjoint union of rooted trees. Isomorphisms of rooted trees or forests are isomorphisms of graphs preserving the distinguished vertex/vertices. Let $t$ be a rooted tree with root $o$. The choice of $o$ determines an orientation of the edges of $t$, away from the root, say. Forests are graded by the number of vertices they contain. Let $H_{rt}$ be the free commutative algebra generated by rooted trees. The commutative product in $H_{rt}$ corresponds to the disjoint union of trees, such that monomials in $H_{rt}$ are scalar multiples of forests. We demand that the linear operator $B_+$ on $H_{rt}$, defined by

$$B_+(\mathbb{I}) = \bullet \qquad [57]$$

$$B_+(t_1 \cdots t_n) = \text{the tree obtained by joining the roots of } t_1, \ldots, t_n \text{ to a new common root} \qquad [58]$$

is a Hochschild 1-cocycle, which makes $H_{rt}$ a Hopf algebra. The resulting coproduct can be described as follows:

$$\Delta(t) = \mathbb{I}\otimes t + t\otimes\mathbb{I} + \sum_{\mathrm{adm}\ c} P_c(t)\otimes R_c(t) \qquad [59]$$

where the sum goes over all admissible cuts of the tree $t$. Such a cut of $t$ is a nonempty set of edges of $t$ that are to be removed. The forest which is disconnected from the root upon removal of those edges is denoted by $P_c(t)$ and the part which remains connected to the root is denoted by $R_c(t)$. A cut $c(t)$ is admissible if, for each vertex $l$ of $t$, it contains at most one edge on the path from $l$ to the root.

This Hopf algebra of nonplanar rooted trees is the universal object after which all such commutative Hopf algebras H providing pairs (H, B), for B a Hochschild 1-cocycle, are formed.

Theorem 5 The pair $(H_{rt}, B_+)$, unique up to isomorphism, is universal among all such pairs. In other words, for any pair (H, B) where H is a commutative Hopf algebra and B a closed nonexact 1-cocycle, there exists a unique Hopf algebra morphism $\rho: H_{rt} \to H$ such that $B\circ\rho = \rho\circ B_+$.

This theorem suggests that we investigate the Hochschild cohomology of the Hopf algebras of 1PI Feynman graphs. It clarifies the structure of 1PI Green functions.

The Roles of Hochschild Cohomology
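The statement that the cocycle property of the grafting operator fixes the coproduct, and that the result agrees with the admissible-cut formula [59], can be checked on small trees. The following sketch assumes a particular encoding that is not taken from the article: a rooted tree is the sorted tuple of its child subtrees, so the single vertex is `()`, and a forest is a sorted tuple of trees, with the empty forest standing for the unit. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

UNIT = ()   # the empty forest, standing for the unit of the algebra
DOT = ()    # the single-vertex tree (a tree with no children)

def forest(*trees):
    return tuple(sorted(trees))

def B_plus(f):
    """Graft the trees of forest f onto a new common root; returns a tree."""
    return tuple(sorted(f))

def coproduct_forest(f):
    """The coproduct is an algebra map: multiply the coproducts of the trees."""
    result = Counter({(UNIT, UNIT): 1})
    for t in f:
        step = Counter()
        for ((l1, r1), c1), ((l2, r2), c2) in product(result.items(),
                                                      coproduct_tree(t).items()):
            step[(forest(*l1, *l2), forest(*r1, *r2))] += c1 * c2
        result = step
    return result

def coproduct_tree(t):
    """Cocycle recursion: D(B+(f)) = B+(f) (x) 1 + (id (x) B+) D(f), f = children of t."""
    result = Counter({((t,), UNIT): 1})
    for (left, right), c in coproduct_forest(t).items():
        result[(left, (B_plus(right),))] += c
    return result

def admissible_cuts(t):
    """All pairs (pruned forest, trunk tree), including the empty cut."""
    options_per_child = []
    for child in t:
        options = [((child,), None)]          # cut the edge just above this child
        options += admissible_cuts(child)     # or keep it and cut further down
        options_per_child.append(options)
    cuts = []
    for choice in product(*options_per_child):
        pruned = forest(*(tree for p, _ in choice for tree in p))
        trunk = B_plus(tuple(r for _, r in choice if r is not None))
        cuts.append((pruned, trunk))
    return cuts

def coproduct_by_cuts(t):
    """Equation [59]; the empty cut supplies the term 1 (x) t."""
    result = Counter({((t,), UNIT): 1})
    for pruned, trunk in admissible_cuts(t):
        result[(pruned, (trunk,))] += 1
    return result

ladder2 = B_plus((DOT,))
cherry = B_plus((DOT, DOT))
mixed = B_plus(forest(ladder2, DOT))
```

For the tree with two leaves attached to the root (`cherry`), both routes give a coefficient 2 for the term in which one leaf is pruned, matching the two admissible single-edge cuts.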
The Hochschild cohomology of the Hopf algebras of 1PI graphs sheds light on the structure of 1PI Green functions in at least four different ways:

1. it gives a coherent proof of locality of counterterms: the very fact that

$$[Z^r, D_r] = 0 \qquad [60]$$

means that the coefficients in the Lagrangian remain independent of momenta, and hence the Lagrangian remains a polynomial expression in fields and their derivatives;
2. the quantum equations of motion take a very succinct form, identifying the Dyson kernels with the primitives of the Hopf algebra;
3. sub-Hopf algebras emerge from the study of the Hochschild cohomology, which connects the representation theory of these Hopf algebras to the structure of theories with internal symmetries; and
4. these Hopf algebras are intimately connected to the structure of transcendental functions, such as the generalized polylogarithms, which play a prominent role these days, ranging from applied particle physics to recent developments in mathematics.

To determine the Hochschild 1-cocycles of some Feynman graph Hopf algebra H, one first determines the primitive graphs $\gamma$ of the Hopf algebra, which, by definition, fulfill the condition

$$\Delta(\gamma) = \gamma\otimes\mathbb{I} + \mathbb{I}\otimes\gamma \qquad [61]$$

Using the pre-Lie product above, one then determines the maps

$$B_+^\gamma: H \to H_{\mathrm{lin}} \qquad [62]$$
such that

$$\Delta\circ B_+^\gamma(h) = B_+^\gamma(h)\otimes\mathbb{I} + (\mathrm{id}\otimes B_+^\gamma)\Delta(h) \qquad [63]$$

where $B_+^\gamma(h) = \sum_\Gamma n(\gamma, h, \Gamma)\,\Gamma$. The coefficients $n(\gamma, h, \Gamma)$ are closely related to the section coefficients noted earlier. Using the definition of the Bogoliubov map $\bar R$, this immediately shows that

$$S_R^\phi(B_+^\gamma(h)) = \int D_\gamma \star_{G_i} \phi_R(h) \qquad [64]$$

which proves locality of counter-terms upon recognizing that $B_+^\gamma$ increases the augmentation degree. Here, the insertion of the functions for the subgraph is achieved using the relevant gluing data of [36].

To recover the quantum equations of motion from the Hochschild cohomology, one proves that

$$\Gamma^r = 1 + \sum_{\gamma} \frac{g^{|\gamma|}}{\mathrm{Sym}(\gamma)}\,B_+^\gamma(X_\gamma) \qquad [65]$$

where

$$X_\gamma = \prod_{v\in\gamma^{[0]}} \Gamma^v \prod_{e\in\gamma^{[1]}_{\mathrm{int}}} \frac{1}{\Gamma^e} \qquad [66]$$

has the required solution. Upon application of the Feynman rules, the maps $B_+^\gamma$ turn into the integral kernels of the usual Dyson–Schwinger equations. This allows for new nonperturbative approaches, which are a current theme of investigation.

Finally, we note that the 1-cocycles introduced above allow one to determine sub-Hopf algebras of the form

$$\Delta(c_n^r) = \sum P(\{c_j\})\otimes c_j \qquad [67]$$

[In equation 67 the summation index and the placement of the superscripts s and r on P and c_j were lost in extraction.]

where the $c_j$ are defined in eqn [3]. These algebras no longer require the consideration of single Feynman graphs, but allow one to establish renormalization directly for the sum of all graphs at a given loop order. Hence, they establish a Hopf algebra structure on time-ordered products in momentum space. For theories with internal symmetries, one expects and indeed finds that the existence of these subalgebras establishes relations between graphs that are the same as the Slavnov–Taylor identities between the couplings in the Lagrangian.

Outlook

Thanks to the Hopf and Lie algebra structures described above, quantum field theory has started to reveal its internal mathematical structure in recent years, which connects it to a motivic theory and arithmetic geometry. Conceptually, quantum field theory has been the most sophisticated means by which a physicist can describe the character of the physical law. We have slowly begun to understand that, in its short-distance singularities, it encapsulates concepts of matching beauty. We can indeed expect local point-particle quantum field theory to remain a major topic of mathematical physics investigations in the foreseeable future.

See also: Bicrossproduct Hopf Algebras and Noncommutative Spacetime; Exact Renormalization Group; Hopf Algebras and q-Deformation Quantum Groups; Number Theory in Physics; Operads; Perturbation Theory and Its Techniques; Renormalization: General Theory; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory.

Further Reading
Bloch S, Esnault H, and Kreimer D (2005) On Motives associated to graph polynomials.
Connes A and Kreimer D (1998) Hopf algebras, renormalization and noncommutative geometry. Communications in Mathematical Physics 199: 203 (arXiv:hep-th/9808042).
Connes A and Kreimer D (2001) Renormalization in quantum field theory and the Riemann–Hilbert problem. II: The beta-function, diffeomorphisms and the renormalization group. Communications in Mathematical Physics 216: 215 (arXiv:hep-th/0003188).
Connes A and Marcolli M (2004) From physics to number theory via noncommutative geometry. II: Renormalization, the Riemann–Hilbert correspondence, and motivic Galois theory, arXiv:hep-th/0411114.
Kreimer D (2003) New mathematical structures in renormalizable quantum field theories. Annals of Physics 303: 179 (arXiv:hep-th/0211136).
Kreimer D (2004) The residues of quantum field theory: numbers we should know, arXiv:hep-th/0404090.
Kreimer D (2005) Anatomy of a gauge theory.
Hopf Algebras and q-Deformation Quantum Groups
S Majid, Queen Mary, University of London, London, UK
© 2006 Elsevier Ltd. All rights reserved.
Introduction

Quantum groups are a remarkable generalization of conventional groups using an algebraic language by now quite well known to mathematical physicists. This language is first and foremost the concept of a “Hopf algebra.” In fact, the axioms of a Hopf algebra are so attractive from a mathematical point of view that they were proposed in the 1940s, long before the advent of truly representative examples, which did not come until the 1980s (from mathematical physics). Until then, they were used mainly by mathematicians as a way of redoing group theory and Lie algebra theory in a more uniform way. It is remarkable that at least three points of view lead to the same axioms of a Hopf algebra:

1. Generalized symmetry. A generalization of a usual group algebra or enveloping algebra of a Lie algebra that can nevertheless act on other algebraic objects. The structure that controls this is the “coproduct” $\Delta: H \to H\otimes H$, while the group or Lie structure is encoded in the algebra H, which is typically not changed up to isomorphism. $\Delta$ allows H to act on tensor products, and this is needed to define what it means, for example, for a product $A\otimes A \to A$ of an algebra to be an intertwiner. The usual flip map between two representations $V\otimes W \to W\otimes V$ is typically no longer an intertwiner; instead, that role is played by an R-matrix solving the Yang–Baxter equations (YBE).
2. Noncommutative geometry. A generalization of the coordinate algebra of functions on a conventional group to allow noncommutative or “quantum” coordinate algebras. Here the group structure is encoded in a coproduct $\Delta: H \to H\otimes H$ in a way which would, in the case of functions on a group, be defined by the group product. It is typically not changed, the change being in the algebra.
3. Duality. An object that admits observer–observed duality or Fourier transform. Such a duality is known for abelian groups, lost for nonabelian groups, but re-emerges for Hopf algebras. If there is to be an algebra with product $H\otimes H \to H$, then there should also be a “coproduct” $\Delta: H \to H\otimes H$ to maintain the duality symmetry. Then a suitable dual space $H^*$ is also a Hopf algebra, with the roles of product and coproduct interchanged.

In line with these main ideas are three known classes of true quantum groups, and these remain the main types of example at the time of writing: the q-deformed enveloping algebras $U_q(\mathfrak{g})$ of Drinfeld and Jimbo, their duals as quantizations of the Drinfeld–Sklyanin Poisson bracket on a simple Lie group (both of these arising from quantum inverse scattering but also, in the case of $\mathbb{C}_q[SU_2]$, from $C^*$-algebras), and the bicrossproduct quantum groups based on Lie group factorizations (arising from ideas for Planck-scale physics and quantum gravity). The latter are self-dual and hence are both generalized symmetries and noncommutative or quantum geometries at the same time.

The impact of such quantum groups has been very far reaching from a mathematician's point of view, spanning revolutions in the theory of knot and 3-manifold invariants, Poisson geometry, and new directions in noncommutative geometry, to name some. In physics they are, at the time of writing, beginning seriously to be applied in a variety of contexts beyond the original ones, such as in bookkeeping overlapping divergences in general quantum field theories, quantum computing, and construction of anyons. This article will mention some of these, but just as groups have many different roles in physics, one can expect that quantum groups and variants of them can and will have diverse roles as well. What follows is a short overview.
Hopf Algebras and First Examples The general theory works over any field k but (to be concrete) we write our examples over C; one can also have examples over, say, the field Z2 of two elements. A Hopf algebra then is: 1. An algebra H with unit which is also a ‘‘coalgebra’’ with counit, that is, there are maps : H ! H H, : H ! k obeying: ð idÞ ¼ ðid Þ ð idÞ ¼ ðid Þ ¼ id 2. , should be algebra homomorphisms. 3. There should be a map S : H ! H called the antipode or ‘‘linearized inverse’’ obeying ðid SÞ ¼ ðS idÞ ¼ 1
If the third axiom is not obeyed one has a ‘‘quantum semigroup’’ or ‘‘bialgebra.’’ Note also that $S$ looks nothing like a usual inverse, and it is not one, yet it plays the same role. For example, we can define conjugation or the ‘‘adjoint action’’ of any Hopf algebra on itself by
\[ \mathrm{Ad}_a(b) = \sum a_{(1)}\, b\, S a_{(2)}, \qquad \Delta a = \sum a_{(1)} \otimes a_{(2)} \]
where we use the ‘‘Sweedler notation’’ for $\Delta a$, a sum of unspecified pieces in $H \otimes H$. Moreover, if it exists, then $S$ is unique and (it can be shown) $S(ab) = (Sb)(Sa)$ for all $a, b \in H$, just like an inverse.
The self-duality of these axioms is evident from the first one: a coalgebra is just an algebra with its product map $H \otimes H \to H$, unit element (viewed as a map $k \to H$ sending 1 to 1) and the associativity and unity axioms all written backwards. Meanwhile, the middle axiom means in explicit terms $\Delta(ab) = (\Delta a)(\Delta b)$, $\epsilon(ab) = \epsilon(a)\epsilon(b)$ for all $a, b \in H$ and $\Delta(1) = 1 \otimes 1$, $\epsilon(1) = 1$. This may not look self-dual but it is equivalent to saying that the product and unit are coalgebra homomorphisms. Indeed, if one takes the trouble to write out all the axioms as commutative diagrams, the set of axioms is invariant under arrow reversal. Such arrow reversal can also be concretely implemented, for example, by taking adjoints. Thus, the coproduct dualizes to a map $(H \otimes H)^* \to H^*$ and since $H^* \otimes H^* \subseteq (H \otimes H)^*$ we have a product on the dual $H^*$. If the dual space is defined correctly, one also has a coproduct by dualizing the product, etc. One says that two Hopf algebras $H$, $H'$ are ‘‘in duality’’ if their maps are adjoint to each other in such a way.
The role of quantum groups as generalized symmetries is typified by the following examples. Let $G$ be a group; then its group algebra $\mathbb{C}G$, defined as a vector space (written here over $\mathbb{C}$) with basis identified with $G$ and product given by the group product extended linearly, is a Hopf algebra with
\[ \Delta g = g \otimes g, \qquad \epsilon g = 1, \qquad Sg = g^{-1}, \qquad \forall g \in G \]
Likewise, if $\mathfrak{g}$ is a Lie algebra, then its universal enveloping algebra $U(\mathfrak{g})$ generated by $\mathfrak{g}$ is a Hopf algebra with
\[ \Delta\xi = \xi \otimes 1 + 1 \otimes \xi, \qquad \epsilon\xi = 0, \qquad S\xi = -\xi, \qquad \forall \xi \in \mathfrak{g} \]
The two examples are related: if one informally allows exponentials, then $g = e^{\xi}$ has coproduct
\[ \Delta e^{\xi} = e^{\Delta\xi} = e^{\xi \otimes 1 + 1 \otimes \xi} = e^{\xi} \otimes e^{\xi} \]
using axiom 2 and that $\xi \otimes 1$, $1 \otimes \xi$ commute in the tensor product algebra. The coproduct structures are therefore implicit already in Lie theory and group theory. As for any
Hopf algebra, $\Delta$ specifies how the algebra $H$ acts in a tensor product of two representations. For groups the tensor product is diagonal ($g$ acts on each copy), for Lie algebras it is additive (e.g., the addition of angular momenta). In general, the action of $a \in H$ is defined as the action of $\Delta a$ on the tensor product. This has far-reaching consequences. For example, for the product $A \otimes A \to A$ of an algebra to be covariant means that $H$ acting before and after the product map gives the same answer, and similarly for the unit map where $k$ has the trivial representation afforded by $\epsilon$, that is,
\[ h \triangleright (ab) = \sum (h_{(1)} \triangleright a)(h_{(2)} \triangleright b), \qquad h \triangleright 1 = \epsilon(h) 1 \]
for all $a, b \in A$ and $h \in H$. What that means in the case of a group is therefore $g \triangleright (ab) = (g \triangleright a)(g \triangleright b)$, or $G$ acts by automorphisms. What it means for a Lie algebra is $\xi \triangleright (ab) = (\xi \triangleright a)b + a(\xi \triangleright b)$, that is, $\mathfrak{g}$ acts by derivations. This is how Hopf algebra theory unifies group theory and Lie algebra theory and potentially takes us beyond.
In another, dual, point of view, if $G$ is a group defined by polynomial equations in $\mathbb{C}^n$, then Hilbert’s ‘‘nullstellensatz’’ in algebraic geometry says that it corresponds algebraically to a commutative nilpotent-free algebra with $n$ generators, called its ‘‘coordinate algebra’’ $H = \mathbb{C}[G]$. The group product then corresponds to making $\mathbb{C}[G]$ into a Hopf algebra. If one replaces $\mathbb{C}$ by any field, one has an algebraic group over the field. For example, the group $SL_2(\mathbb{C}) \subset \mathbb{C}^4$ has coordinate algebra generated by four functions $a, b, c, d$, where $a$ at a matrix $g \in SL_2(\mathbb{C})$ has value $g_{11}$, the 1,1 entry of the matrix, similarly $b(g) = g_{12}$, etc. Then $\mathbb{C}[SL_2]$ is the commutative algebra generated by $a, b, c, d$ with the relation $ad - bc = 1$. A little thought about matrix multiplication should convince the reader that
\[ \Delta \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \otimes \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]
where we have written the operation on each generator as an array and where matrix multiplication is understood (so $\Delta a = a \otimes a + b \otimes c$, etc.). The counit and antipode are
\[ \epsilon \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad S \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \]
One could also let $G$ be a finite group, in which case the algebra $\mathbb{C}(G)$ of (say complex-valued) functions on it is more obviously a Hopf algebra with
\[ (\Delta a)(g, h) = a(gh), \qquad \epsilon(a) = a(1), \qquad (Sa)(g) = a(g^{-1}) \]
for any function $a \in \mathbb{C}(G)$. Here we identify $\mathbb{C}(G) \otimes \mathbb{C}(G) = \mathbb{C}(G \times G)$, or functions in two variables on the group. These examples are dually paired with $U(\mathfrak{g})$ in the Lie case and $\mathbb{C}G$ in the finite case, respectively.
In such a coordinate algebra point of view, usual constructions in group theory appear expressed backwards with arrows reversed. So an action of the group appears for such a Hopf algebra $H$ as a ‘‘coaction’’ $\Delta_R : V \to V \otimes H$ (here a right coaction; one can similarly have $\Delta_L$, a left coaction). It obeys
\[ (\Delta_R \otimes \mathrm{id})\Delta_R = (\mathrm{id} \otimes \Delta)\Delta_R, \qquad (\mathrm{id} \otimes \epsilon)\Delta_R = \mathrm{id} \]
which are the axioms of an algebra acting, written backwards for the coalgebra of $H$ ‘‘coacting.’’ An example is the right action of a group on itself, which in the coordinate ring point of view is $\Delta_R = \Delta$, that is, the coproduct viewed as a right coaction. It is the algebra of $H$ that determines the tensor product of two coactions, so, for example, $A$ is a coaction algebra in this sense if $\Delta_R : A \to A \otimes H$ is a coaction and an algebra homomorphism.
Similarly, in this coordinate point of view, an integral on the group means a map $\int : H \to k$, and right invariance translates into invariance under the right coaction, or
\[ \Big(\int \otimes\, \mathrm{id}\Big)\Delta = 1 \int \]
There is a theorem that such an integration, if it exists, is unique up to scale. In the finite-dimensional case it always exists, for any field $k$. At least in this case, let $\exp = \sum_i e_i \otimes f^i$ for a basis $\{e_i\}$ of $H$ and $\{f^i\}$ a dual basis. Then an application of the integral is Fourier transform $H \to H^*$ defined by
\[ \mathcal{F}(a) = \sum_i \Big(\int e_i a\Big) f^i \]
with properties that one would expect of Fourier transform. The inverse is given similarly the other way up to a normalization factor and using the antipode of H. This is one among the many results from the abstract theory of Hopf algebras, see Sweedler (1969) and Larson and Radford (1988) among others. A given Hopf algebra H does not know which point of view one is taking on it; the axioms of a Hopf algebra include and unify both enveloping and coordinate algebras. So an immediate consequence is that constructions which are usual in one point of view give new constructions when the wrong point of view is taken (put another way, the self-duality of the axioms means that any general theorem has a second theorem for free, given, if we keep the interpretation of H fixed, by reversing all arrows in
the original theorem and its proof). Even the elementary examples above are quite interesting for physics if taken ‘‘upside down’’ in this way. For example, if $G$ is nonabelian, then $\mathbb{C}G$ is noncommutative, so it cannot be functions on any actual group. But it is a Hopf algebra, so one could think of it as being like $\mathbb{C}(\hat{G})$, where $\hat{G}$ is not a group but a quantum group defined as $\mathbb{C}(\hat{G}) = \mathbb{C}G$. The latter is a well-defined Hopf algebra viewed the wrong way. So this is an application of noncommutative geometry to allow nonabelian Fourier transform $\mathcal{F} : \mathbb{C}(G) \to \mathbb{C}G$.
Similarly, $U(\mathfrak{g})$ is noncommutative but one could view it upside down as a quantization of $\mathbb{C}[\mathfrak{g}^*] = S(\mathfrak{g})$ (the symmetric algebra on $\mathfrak{g}$). To do this let us scale the generators of $\mathfrak{g}$ so that the relations on $U(\mathfrak{g})$ have the form $\xi\eta - \eta\xi = \lambda[\xi, \eta]$, where $\lambda$ is a deformation parameter. Then the Poisson bracket that this algebra quantizes (deforms) is the Kirillov–Kostant one on $\mathfrak{g}^*$, where $\{\xi, \eta\} = [\xi, \eta]$. Here $\xi$, $\eta$ on the left-hand side are regarded as functions on $\mathfrak{g}^*$, while on the right-hand side we take their Lie bracket and then regard the result as a function on $\mathfrak{g}^*$. Examples which have been used successfully in physics include:
\[ [t, x_i] = i\lambda x_i \qquad \text{(bicrossproduct model } \mathbb{R}^{1,3}_\lambda) \]
\[ [x_i, x_j] = i 2\lambda\, \epsilon_{ijk} x_k \qquad \text{(spin space model } \mathbb{R}^3_\lambda) \]
(summation understood over $k$). In both cases, we may develop geometry on these algebras using quantum group methods as if they were coordinates on a usual space (see Bicrossproduct Hopf Algebras and Noncommutative Spacetime). They are versions of $\mathbb{R}^n$ because the coproduct, which expresses the addition law on the noncommutative space, is the additive one according to the above. In the second case, setting the Casimir to the value for a spin $j$ is the quadratic relation of a ‘‘fuzzy sphere.’’ As algebras, the latter are just the algebras of $(2j+1) \times (2j+1)$ matrices.
Going the other way, we can take a classical coordinate ring $\mathbb{C}[G]$ and regard it upside down as some kind of group or enveloping algebra but with a nonsymmetric $\Delta$. In the finite group case, an action of $\mathbb{C}(G)$ just means a $G$-grading. Here if an element $v$ of a vector space has $G$-valued degree $|v|$ then $a \triangleright v = a(|v|)v$ is the action of $a \in \mathbb{C}(G)$. Alternatively, this is the same thing as a right coaction of $\mathbb{C}G$, $\Delta_R v = v \otimes |v|$. Thus, the notions of group representation and group grading are also unified. This is familiar in physics for abelian groups (a $U(1)$ action is the same thing as a $\mathbb{Z}$-grading) but works fine using Hopf algebra methods for nonabelian groups and beyond.
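The fuzzy sphere statement can be checked numerically. The following is a minimal sketch in plain Python, assuming the identification $x_i = 2\lambda J_i$ with $J_i$ the standard spin-$j$ angular momentum matrices; this identification, the symbol `lam` for the deformation parameter, and all function names are illustrative assumptions rather than anything fixed by the text. It tests the spin space relation $[x_1, x_2] = 2i\lambda x_3$ and that the Casimir $\sum_i x_i^2$ is $4\lambda^2 j(j+1)$ times the identity on $(2j+1) \times (2j+1)$ matrices.

```python
import math

def spin_matrices(j2):
    """Return (Jx, Jy, Jz) for spin j = j2/2 as nested lists of complex numbers."""
    n = j2 + 1
    j = j2 / 2
    m = [j - k for k in range(n)]            # weights j, j-1, ..., -j
    Jp = [[0j] * n for _ in range(n)]        # raising operator J+
    for k in range(1, n):
        # <m+1| J+ |m> = sqrt(j(j+1) - m(m+1))
        Jp[k - 1][k] = complex(math.sqrt(j * (j + 1) - m[k] * (m[k] + 1)))
    Jm = [[Jp[c][r].conjugate() for c in range(n)] for r in range(n)]
    Jx = [[(Jp[r][c] + Jm[r][c]) / 2 for c in range(n)] for r in range(n)]
    Jy = [[(Jp[r][c] - Jm[r][c]) / 2j for c in range(n)] for r in range(n)]
    Jz = [[complex(m[r]) if r == c else 0j for c in range(n)] for r in range(n)]
    return Jx, Jy, Jz

def mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def fuzzy_sphere_check(j2, lam):
    """Return (commutator relation holds, Casimir is a multiple of identity)."""
    n = j2 + 1
    # Assumed identification x_i = 2*lam*J_i (hypothetical normalisation).
    x = [[[2 * lam * v for v in row] for row in J] for J in spin_matrices(j2)]
    x1x2, x2x1 = mul(x[0], x[1]), mul(x[1], x[0])
    comm = [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(x1x2, x2x1)]
    rhs = [[2j * lam * v for v in row] for row in x[2]]
    squares = [mul(xi, xi) for xi in x]
    casimir = [[sum(s[r][c] for s in squares) for c in range(n)] for r in range(n)]
    radius2 = 4 * lam ** 2 * (j2 / 2) * (j2 / 2 + 1)
    ident = [[radius2 if r == c else 0 for c in range(n)] for r in range(n)]
    return close(comm, rhs), close(casimir, ident)
```

Calling `fuzzy_sphere_check(4, 0.3)`, for instance, tests the spin-2 case on $5 \times 5$ matrices.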
Returning to axioms, if one wants to speak of real forms and unitary representations, this corresponds, for Hopf algebras, to H a -algebra over C with ð Þ ¼ ð Þ;
S ¼ S1
where (throughout this article) $\tau$ denotes transposition of tensor factors. This requires in particular that $S$ is invertible (which is not assumed for a general Hopf algebra, though it does hold in the finite-dimensional case and in all examples of interest). Thus, $\mathbb{C}[SU_2]$ denotes the above $\mathbb{C}[SL_2]$ with a certain $*$-structure whereby the matrix of generators is unitary.
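As a worked illustration of the antipode axiom, here is a short check on the generators of the commutative coordinate algebra of $SL_2$, using only the matrix coproduct, the counit, the antipode $Sa = d$, $Sb = -b$, $Sc = -c$, $Sd = a$, and the relation $ad - bc = 1$. The layout is a sketch, not part of the original text.

```latex
% Antipode axiom  \cdot(\mathrm{id}\otimes S)\Delta = 1\epsilon  on generators,
% using  \Delta a = a\otimes a + b\otimes c,  \Delta b = a\otimes b + b\otimes d,  etc.
\begin{aligned}
\cdot(\mathrm{id}\otimes S)\Delta a &= a\,S(a) + b\,S(c) = ad - bc = 1 = \epsilon(a),\\
\cdot(\mathrm{id}\otimes S)\Delta b &= a\,S(b) + b\,S(d) = -ab + ba = 0 = \epsilon(b),\\
\cdot(\mathrm{id}\otimes S)\Delta c &= c\,S(a) + d\,S(c) = cd - dc = 0 = \epsilon(c),\\
\cdot(\mathrm{id}\otimes S)\Delta d &= c\,S(b) + d\,S(d) = -cb + da = 1 = \epsilon(d).
\end{aligned}
```

In matrix language this is just the statement that the adjugate matrix inverts a matrix of determinant 1; commutativity of the generators is used in the second and third lines.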
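q-Deformation Enveloping Algebras
For a genuinely representative example of a Hopf algebra, consider $U_q(sl_2)$, defined with noncommutative generators and relations, coproduct, etc.,
\[ q^{h/2} x_\pm q^{-h/2} = q^{\pm 1} x_\pm, \qquad [x_+, x_-] = \frac{q^{h} - q^{-h}}{q - q^{-1}} \]
\[ \Delta x_\pm = x_\pm \otimes q^{h/2} + q^{-h/2} \otimes x_\pm, \qquad \Delta q^{h/2} = q^{h/2} \otimes q^{h/2} \]
\[ \epsilon x_\pm = 0, \qquad \epsilon q^{h/2} = 1, \qquad S x_\pm = -q^{\pm 1} x_\pm, \qquad S q^{h/2} = q^{-h/2} \]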
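That the coproduct is an algebra homomorphism can be tested numerically in the spin-1/2 representation. The sketch below, in plain Python, assumes $h = \mathrm{diag}(1, -1)$ with $x_+$, $x_-$ the elementary raising and lowering matrices, and the coproduct $\Delta x_\pm = x_\pm \otimes q^{h/2} + q^{-h/2} \otimes x_\pm$; the value of `q` and all function names are illustrative choices. It checks the defining relations first on the $2 \times 2$ matrices and then on their images in the tensor product of two spin-1/2 representations.

```python
def mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def kron(A, B):
    """Kronecker (tensor) product of two matrices."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def lin(A, B, s=1.0):
    """Entrywise A + s*B."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def obeys_relations(q, K, Kinv, xp, xm):
    """Check K x± K^-1 = q^{±1} x± and [x+, x-] = (K^2 - K^-2)/(q - 1/q),
    where K stands for q^{h/2}."""
    conj = lambda x: mul(K, mul(x, Kinv))
    scaled = lambda x, s: [[s * v for v in row] for row in x]
    comm = lin(mul(xp, xm), mul(xm, xp), -1.0)
    rhs = scaled(lin(mul(K, K), mul(Kinv, Kinv), -1.0), 1.0 / (q - 1.0 / q))
    return (close(conj(xp), scaled(xp, q))
            and close(conj(xm), scaled(xm, 1.0 / q))
            and close(comm, rhs))

q = 1.7                                        # any real q with q != 1
K = [[q ** 0.5, 0.0], [0.0, q ** -0.5]]        # q^{h/2} in spin 1/2
Kinv = [[q ** -0.5, 0.0], [0.0, q ** 0.5]]     # q^{-h/2}
xp = [[0.0, 1.0], [0.0, 0.0]]                  # x_+
xm = [[0.0, 0.0], [1.0, 0.0]]                  # x_-

def coproduct(x):
    """Delta x = x (x) q^{h/2} + q^{-h/2} (x) x acting on the tensor product."""
    return lin(kron(x, K), kron(Kinv, x))

single_ok = obeys_relations(q, K, Kinv, xp, xm)
tensor_ok = obeys_relations(q, kron(K, K), kron(Kinv, Kinv),
                            coproduct(xp), coproduct(xm))
```

Replacing `coproduct` by the undeformed rule $x \otimes 1 + 1 \otimes x$ makes the commutator check fail for $q \neq 1$, which is one way to see that the deformed coproduct is forced by the deformed relations.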
The actual generators here are $x_\pm$, $q^{\pm h/2}$ but the notation is intended to be suggestive: if $h$ existed and we took the limit $q \to 1$, we would have the usual enveloping algebra of the Lie algebra $sl_2$. The quantum group $U_q(su_2)$ is the same with the $*$-structure $h^* = h$, $x_\pm^* = x_\mp$ when $q$ is real (there are other possibilities).
Two words of warning here. Although some authors write $q = e^{\hbar/2}$, the parameter $q$ here has little to do with quantization. In fact, the cases of direct relevance to physics are $q = e^{2\pi i/(2+k)}$, where $k$ is the level of the Wess–Zumino–Witten (WZW) model in which this quantum group appears as a generalized symmetry. This quantum group also (first) appeared in the theory of exactly solvable lattice models, namely the Ising model with an applied external magnetic field: $q \neq 1$ is a measure of the resulting nonhomogeneity of the model. Its origins go further back to the algebraic Bethe ansatz and the emergence of the Yang–Baxter equation (YBE) in such models (Baxter 1982). The general $U_q(\mathfrak{g})$ emerged from this context in Drinfeld (1987) and Jimbo (1985) and the same remark applies (see Affine Quantum Groups; Yang–Baxter Equations).
The second warning is that at least informally (if one works with $h$ and allows formal power series
etc.), the algebra here is isomorphic to usual $U(sl_2)$, that is, it looks deformed but the true deformation is not here but in the coproduct, which enters into the tensor product of representations. The latter are labeled as usual because the algebra is not really changed; for example, the unitary ones of $U_q(su_2)$ are labeled by spin. The spin-1/2 one even looks the same, with $x_\pm$, $h$ represented by the standard Pauli matrices. Tensor products of representations start to look different but their multiplicities are the same as classically, and if $V$, $W$ are representations then $V \otimes W \cong W \otimes V$. Because the coproduct above is not symmetric in its two factors, this isomorphism $\Psi_{V,W} = \tau \circ R_{V,W}$ has $R_{V,W}$ nontrivial. From the formulas given, the reader can compute that
\[ R_{1/2,1/2} = q^{-1/2} \begin{pmatrix} q & 0 & 0 & 0 \\ 0 & 1 & q - q^{-1} & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & q \end{pmatrix} \]
in a tensor product basis. For this particular quantum group, and others like it, one finds that these ‘‘R-matrices’’ obey the braid relations as a version of the YBE. As a result, they can and do lead to knot invariants; the one above leads to the Jones knot invariant as a polynomial in $q$. Briefly, one represents the knot on a plane, assigns $R$ or $R^{-1}$ to each braid crossing and takes a suitable trace (see The Jones Polynomial).
Since such features hold in any representation, these matrices are in fact representations of an invertible element $\mathcal{R} \in H \otimes H$, provided one allows $h$ as a generator and formal power series:
\[ \mathcal{R} = q^{(h \otimes h)/2}\, e_{q^{-2}}^{(q - q^{-1})\, e \otimes f}; \qquad e = x_+ q^{h/2}, \quad f = q^{-h/2} x_- \]
where
\[ e_q(x) = \sum_{m=0}^{\infty} \frac{x^m}{[m]_q!}, \qquad [m]_q = \frac{1 - q^m}{1 - q} \]
are the q-exponential and q-integer, respectively. Their proper explanation is in the section ‘‘Braided groups and quantum planes.’’ This $\mathcal{R}$ is called the ‘‘universal R-matrix’’ or quasitriangular structure and obeys
\[ \tau \circ \Delta = \mathcal{R}(\Delta\ )\mathcal{R}^{-1}, \qquad (\Delta \otimes \mathrm{id})\mathcal{R} = \mathcal{R}_{13}\mathcal{R}_{23}, \qquad (\mathrm{id} \otimes \Delta)\mathcal{R} = \mathcal{R}_{13}\mathcal{R}_{12} \]
and from the axioms of a Hopf algebra, one may deduce that the YBE
\[ \mathcal{R}_{12}\mathcal{R}_{13}\mathcal{R}_{23} = \mathcal{R}_{23}\mathcal{R}_{13}\mathcal{R}_{12} \]
holds in the algebra. This induces the YBE for matrices $R_{V,W}$ in the representation $V \otimes W$. Such a
Hopf algebra is called ‘‘quasitriangular’’ and its representations form a braided category (see Braided and Modular Tensor Categories). Even if $\mathcal{R}$ for a quasitriangular Hopf algebra is defined by a power series, the $R_{V,W}$ in finite-dimensional representations are typically actual matrices.
Of considerable interest is the special case when $q$ is a primitive $n$th root of unity. In this case the quasitriangular Hopf algebra $u_q(sl_2)$ has the above generators but the additional relations
\[ e^n = f^n = 0, \qquad g^n = 1, \qquad g = q^h \]
which render the algebra generated by $e, f, g$ as $n^3$-dimensional. The algebra no longer has a matrix block decomposition (is not semisimple) and not all representations descend to it. For example, if $n$ is odd, then only representations of dimension $\le n$ descend. Other than this, one has many of the features of a classical enveloping algebra, now for this finite-dimensional object. There is evidence that such objects over $\mathbb{C}$ are intimately related to classical Lie algebras but over a finite field.
Finally, there is a similar theory of $U_q(\mathfrak{g})$ for all Lie algebras determined by symmetrizable Cartan matrices $\{a_{ij}\}$, including affine ones. Here $i, j \in I$, an indexing set, and $a_{ij} = 2\, i \cdot j / i \cdot i \in \{0, -1, -2, \ldots\}$ for $i \neq j$, where $\cdot$ is a symmetric bilinear form on the root lattice $\mathbb{Z}[I]$ generated by $I$, with $i \cdot i$ a positive even integer. To be precise, one should also fix a ‘‘root datum’’ in the form of an inclusion $\mathbb{Z}[I] \subseteq X$ of the root lattice into a choice of character lattice $X$ and an inclusion $\mathbb{Z}[I] \subseteq Y$ of the coroot lattice (also labeled by $I$) into the cocharacter lattice $Y$ (the dual of $X$). Here the evaluation pairing is required to restrict to $\langle i, j \rangle = a_{ij}$ for $i, j \in I$, with the first argument $i$ viewed in the cocharacter lattice $Y$. We let $q_i = q^{i \cdot i/2}$ and require $q_i^2 \neq 1$ for all $i$ (or one may consider $q$ as an indeterminate). We have generators $e_i$, $f_i$ for $i \in I$ and invertible $g_a$ for $a$ each generator of $Y$, and the relations
\[ g_a e_i = q^{\langle a, i \rangle} e_i g_a, \qquad f_i g_a = q^{\langle a, i \rangle} g_a f_i \]
\[ [e_i, f_j] = \delta_{ij}\, \frac{g_i^{\,i \cdot i/2} - g_i^{-i \cdot i/2}}{q_i - q_i^{-1}} \]
\[ \sum_{r=0}^{1 - a_{ij}} (-1)^r \binom{1 - a_{ij}}{r}_{q_i} (e_i)^r\, e_j\, (e_i)^{1 - a_{ij} - r} = 0 \]
for all $i \neq j$, and an identical set for the $\{f_i\}$. The coalgebra and antipode are
\[ \Delta e_i = e_i \otimes g_i^{\,i \cdot i/2} + 1 \otimes e_i, \qquad \Delta f_i = f_i \otimes 1 + g_i^{-i \cdot i/2} \otimes f_i, \qquad \Delta g_a = g_a \otimes g_a \]
\[ \epsilon(g_a) = 1, \qquad \epsilon(e_i) = \epsilon(f_i) = 0 \]
\[ S g_a = g_a^{-1}, \qquad S e_i = -e_i\, g_i^{-i \cdot i/2}, \qquad S f_i = -g_i^{\,i \cdot i/2} f_i \]
The q-Serre relations are those above involving the q-binomial coefficients, defined now using the symmetric q-integers $(m)_q = (q^m - q^{-m})/(q - q^{-1})$. They have their true explanation as
\[ \mathrm{Ad}_{(e_i)^{1 - a_{ij}}}(e_j) = 0 \]
where Ad is a braided group adjoint action in the sense of the section ‘‘Braided groups and quantum planes.’’ Notice that while the root generators are modeled on the Lie algebra, the Cartan generators are modeled on the torus of an algebraic group, which contains global information. Thus, the more precise form of $U_q(sl_2)$ is the $e, f, g$ form with the generator $g = q^h$ as above, with $\mathbb{Z}[I] \subsetneq X$ and $\mathbb{Z}[I] = Y$. Meanwhile $U_q(psl_2)$ has the square root of this as generator (what we called $q^{h/2}$ before) with $\mathbb{Z}[I] = X$ and $\mathbb{Z}[I] \subsetneq Y$, where the strict inclusion is of index 2 in the lattice $\mathbb{Z}$. Note that, in the complex case, $SL_2$ has compact real form $SU_2$ while its quotient, $PSL_2$, has compact real form $SO_3$, so these are distinguished at the Hopf algebra level. In general, the root datum has an associated reductive algebraic group which is simply connected when $Y = \mathbb{Z}[I]$ and generated by its adjoint representation when $X = \mathbb{Z}[I]$. The complexified character lattice is a sublattice of the more familiar Lie algebra weight lattice and labels representations that extend to the (algebraic) group. Langlands duality interchanges the roles of $X$, $Y$. These subtleties are lost when we work over formal power series with $q = e^{\hbar/2}$ and Lie-algebra-like Cartan generators.
These objects are mathematically so interesting that some authors define ‘‘quantum groups’’ as nothing more than this particular extension of the theory of Lie algebras, Cartan matrices and root systems. Among the deepest theorems is the existence of the Lusztig–Kashiwara canonical basis, which is obtained from $q = 0$ but valid also at $q = 1$ (i.e., for classical enveloping algebras) and which has the remarkable property of inducing bases coherently across highest-weight representations. From a physicist’s point of view, however, there are many other Hopf algebras rather more closely connected with actual quantization. Most often, the terms quantum group and Hopf algebra are used interchangeably.
There is similarly a reduced version $u_q(\mathfrak{g})$. The simplest of all possible cases, even simpler than $u_q(sl_2)$, is for what one could call $u_q(1)$, with a single generator $g$ and
\[ g^n = 1, \qquad \Delta g = g \otimes g, \qquad \epsilon g = 1, \qquad Sg = g^{-1}, \qquad \mathcal{R}_q = \frac{1}{n} \sum_{a,b=0}^{n-1} q^{-ab}\, g^a \otimes g^b \]
where $q$ is a primitive $n$th root of unity. The Hopf algebra is the same as the group algebra $\mathbb{C}\mathbb{Z}_n = \mathbb{C}(\hat{\mathbb{Z}}_n)$ but the $\mathcal{R}$ is nontrivial. A representation means a $\hat{\mathbb{Z}}_n$-graded space, that is, graded into degrees $0, 1, \ldots, n-1$. The braiding matrices have the diagonal form $R_{V_a, W_b} = q^{ab}$ on components of degree $a$, $b$, respectively. The braided category generated in this case is the one where anyons live. From this point of view, $u_q(\mathfrak{g})$ generate the category where nonabelian anyons live. Here $\mathcal{R}_{q^2}$ (in place of $q^{(h \otimes h)/2}$) along with an additional $e_{q^{-2}}$ factor as above gives the quasitriangular structure of $u_q(sl_2)$. The physical model here is the rational conformal field theory mentioned above, with these anyons as particular bound states. There is a proposal to use them in the construction of quantum computers.
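The diagonal form of the anyonic braiding can be recovered by a direct finite sum. The sketch below, in plain Python, assumes that $g$ acts on a component of degree $d$ by multiplication by $q^d$, that $q = e^{2\pi i/n}$, and that the universal R-matrix of $u_q(1)$ is $\frac{1}{n}\sum_{s,t} q^{-st} g^s \otimes g^t$; the function name and loop variables are illustrative. Under these assumptions the double sum collapses to $q^{ab}$ on degrees $a$, $b$.

```python
import cmath

def anyon_braiding(n):
    """Evaluate (1/n) * sum_{s,t} q^{-s t} g^s (x) g^t on graded components.

    Assumes g acts on degree d as q^d, with q = exp(2*pi*i/n).
    Returns q and an n x n table whose (a, b) entry is the eigenvalue
    on the component of bidegree (a, b).
    """
    q = cmath.exp(2j * cmath.pi / n)
    table = [[0j] * n for _ in range(n)]
    for a in range(n):              # degree of the first tensor factor
        for b in range(n):          # degree of the second tensor factor
            total = 0j
            for s in range(n):
                for t in range(n):
                    # g^s (x) g^t acts as q^(s*a + t*b) on bidegree (a, b)
                    total += q ** (-s * t) * q ** (s * a + t * b)
            table[a][b] = total / n
    return q, table
```

The collapse is the usual orthogonality of roots of unity: summing over $t$ forces $s = b$, leaving the single term $q^{ab}$.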
q-Deformation Coordinate Algebras
From the coordinate algebra point of view, the corresponding deformation to the one in the last section is the Hopf algebra $\mathbb{C}_q[SL_2]$ with noncommuting generators and relations
\[ ba = qab, \qquad ca = qac, \qquad db = qbd, \qquad dc = qcd \]
\[ bc = cb, \qquad da - ad = (q - q^{-1})bc, \qquad ad - q^{-1}bc = 1 \]
The coalgebra has the same matrix form on the generators as for $\mathbb{C}[SL_2]$, and the antipode and $*$-structure (for $\mathbb{C}_q[SU_2]$) are
\[ \begin{pmatrix} a^* & c^* \\ b^* & d^* \end{pmatrix} = S \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} d & -qb \\ -q^{-1}c & a \end{pmatrix} \]
Its duality pairing with $U_q(sl_2)$ is afforded by the $2 \times 2$ Pauli-matrix representation of the latter. The $\mathbb{C}_q[SU_2]$ Hopf $*$-algebra may be completed to a $C^*$-algebra. One similarly has $\mathbb{C}_q[G]$ for all semisimple Lie groups $G$ and their various real forms.
From an axiomatic point of view, such quantum groups are ‘‘coquasitriangular’’ in the sense that there is a map $\mathcal{R} : H \otimes H \to k$ such that
\[ \sum \mathcal{R}(a_{(1)} \otimes b_{(1)})\, a_{(2)} b_{(2)} = \sum b_{(1)} a_{(1)}\, \mathcal{R}(a_{(2)} \otimes b_{(2)}) \]
for all $a, b \in H$ and
\[ \mathcal{R}(ab \otimes c) = \sum \mathcal{R}(a \otimes c_{(1)})\, \mathcal{R}(b \otimes c_{(2)}), \qquad \mathcal{R}(a \otimes bc) = \sum \mathcal{R}(a_{(1)} \otimes c)\, \mathcal{R}(a_{(2)} \otimes b) \]
for all $a, b, c \in H$. We also require that $\mathcal{R}$ is invertible in a certain sense. These are just the arrow reversal of the axioms of a quasitriangular
structure. In general, for the deformation of a linear algebraic group we will have some $n^2$ generators $t^i{}_j$, now taken to be noncommutative, and with a matrix form of coalgebra
\[ \Delta t^i{}_j = t^i{}_k \otimes t^k{}_j, \qquad \epsilon t^i{}_j = \delta^i{}_j \]
For the compact real form we will have $S t^i{}_j = t^j{}_i{}^*$. Moreover, from the first of the above axioms we will have among the relations
\[ R^i{}_a{}^k{}_b\, t^a{}_j t^b{}_l = t^k{}_b t^i{}_a\, R^a{}_j{}^b{}_l \]
where $R^i{}_j{}^k{}_l = \mathcal{R}(t^i{}_j \otimes t^k{}_l)$ is a matrix $R \in M_n \otimes M_n$ obeying the YBE. If we take only these quadratic relations, we have the ‘‘Faddeev–Reshetikhin–Takhtajan (FRT) bialgebra’’ $A(R)$, and it can be shown (see Majid 1995) that $R$ extends to a coquasitriangular structure $\mathcal{R}$ on it. However, in our case we also have
\[ R^{-1}{}^i{}_j{}^k{}_l = \mathcal{R}(S t^i{}_j \otimes t^k{}_l), \qquad \tilde{R}^i{}_j{}^k{}_l = \mathcal{R}(t^i{}_j \otimes S t^k{}_l) \]
where $\tilde{R} = ((R^{t_2})^{-1})^{t_2}$ ($t_2$ being transposition in the second factor of $M_n$) is called the ‘‘second inverse’’ of $R$. With these additional matrices, one may define a q-determinant and antipode relations as well (Majid 1995). One may also generate a rigid braided monoidal category and reconstruct a Hopf algebra $A(R)$ from it. In this way, the R-matrix plays a role similar to that of the structure constants of a Lie algebra and can in principle define the quantum group coordinate algebra. Such R-matrices have been classified in low dimension and include multiparameter and other deformations of classical group coordinate algebras as well as other nonstandard quantum groups.
In the $\mathbb{C}_q[G]$ examples it is not the coalgebra which is essentially deformed but the algebra. We already see this above on the generators, but the coproduct of a product of generators may look different. Nonetheless, one can identify the vector space that the products generate with that of $\mathbb{C}[G]$ and, at least informally with respect to a deformation parameter, express the product as a power series in the undeformed product (a $\star$-product deformation). For generic values, one still has a Peter–Weyl decomposition $\mathbb{C}_q[G] = \bigoplus_V V \otimes V^*$, where the sum is over irreducible corepresentations, which can be identified with the classical representations of the algebraic group. One can make the same decomposition for $\mathbb{C}[G]$ and identify the matrix blocks $V \otimes V^*$ in order to find this $\star$-product. Also, since this is a flat deformation, it follows that the commutator at lowest order defines a Poisson bracket on $G$, given by
\[ \{t^i{}_j, t^k{}_l\} = t^i{}_a t^k{}_b\, r^a{}_j{}^b{}_l - r^i{}_a{}^k{}_b\, t^a{}_j t^b{}_l \]
and this Poisson bracket is compatible with the group product $G \times G \to G$ as a Poisson map (because the Hopf algebra coproduct was an algebra map). Here $r$ is the first-order part in the expansion of the R-matrix. A Lie group equipped with a Poisson bracket compatible in this way is called a ‘‘Poisson Lie group.’’ On general functions its Poisson bivector is generated by the first-order part $r \in \mathfrak{g} \otimes \mathfrak{g}$ in the expansion of $\mathcal{R}$ in the q-deformed enveloping algebra. In place of the YBE obeyed by $\mathcal{R}$, we have the ‘‘classical Yang–Baxter equations (CYBE),’’
\[ [r_{12}, r_{23}] + [r_{12}, r_{13}] + [r_{13}, r_{23}] = 0 \]
In this way, one may characterize an ‘‘infinitesimal version’’ of $U_q(\mathfrak{g})$ as $(\mathfrak{g}, r, \delta)$, where $\delta : \mathfrak{g} \to \mathfrak{g} \otimes \mathfrak{g}$ is the leading part of $\Delta - \tau \circ \Delta$ and makes the triple into a quasitriangular ‘‘Lie bialgebra’’ (see Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups).
[Figure 1: Role of Hopf algebras along the self-dual axis. Labels: group duals, quantum theory, abelian groups, Hopf algebras, monoidal categories, Riemannian geometry, nonabelian groups.]
Finally, returning to our example, when $q$ is an $n$th root of unity, one has the q-Frobenius Hopf algebra homomorphism
\[ \mathbb{C}[SL_2] \hookrightarrow \mathbb{C}_q[SL_2], \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \begin{pmatrix} a^n & b^n \\ c^n & d^n \end{pmatrix} \]
that is, a classical copy sitting inside the quantum group. Quotienting by this means adding the relations
\[ a^n = d^n = 1, \qquad b^n = c^n = 0 \]
which gives the finite-dimensional reduced quantum group $\mathbb{C}^{\mathrm{red}}_q[SL_2]$. Similarly for other $\mathbb{C}_q[G]$. These reduced quantum groups provide finite noncommutative geometries having the geometric flavor of the classical geometry but where geometry and physics (such as electromagnetic gauge theory modes) are fully computable.
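The FRT construction in this section needs a matrix $R \in M_n \otimes M_n$ obeying the YBE. As a minimal sketch in plain Python, the following checks $R_{12}R_{13}R_{23} = R_{23}R_{13}R_{12}$ numerically for the $4 \times 4$ spin-1/2 R-matrix of $U_q(sl_2)$, taken here as $q^{-1/2}$ times the matrix with diagonal $(q, 1, 1, q)$ and a single off-diagonal entry $q - q^{-1}$. The embedding of $R_{13}$ by conjugating $R_{12}$ with a swap of the last two tensor factors, and all function names, are illustrative choices.

```python
def mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def kron(A, B):
    """Kronecker (tensor) product of two matrices."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))

def r_matrix(q):
    """Spin-1/2 R-matrix in the tensor product basis (assumed normalisation q^{-1/2})."""
    s = q ** -0.5
    lam = q - 1.0 / q
    return [[s * q, 0.0, 0.0, 0.0],
            [0.0, s, s * lam, 0.0],
            [0.0, 0.0, s, 0.0],
            [0.0, 0.0, 0.0, s * q]]

I2 = [[1.0, 0.0], [0.0, 1.0]]
SWAP = [[1.0, 0.0, 0.0, 0.0],      # transposition of two tensor factors
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]

def ybe_holds(R):
    """Check R12 R13 R23 = R23 R13 R12 on a triple tensor product (8 x 8 matrices)."""
    R12 = kron(R, I2)
    R23 = kron(I2, R)
    P23 = kron(I2, SWAP)
    R13 = mul(P23, mul(R12, P23))   # move the second leg of R12 to the third factor
    lhs = mul(R12, mul(R13, R23))
    rhs = mul(R23, mul(R13, R12))
    return close(lhs, rhs)
```

A call such as `ybe_holds(r_matrix(1.7))` tests one generic real value of $q$; the overall scalar $q^{-1/2}$ plays no role in the YBE since both sides are cubic in $R$.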
Self-Dual Quantum Groups
The arrow-reversibility of the axioms of a quantum group makes it possible to search for self-dual quantum groups or for quantum groups which, if not self-dual, have a self-dual form. This leads to the bicrossproduct quantum groups coming from models of quantum gravity (Majid 1988) (see Bicrossproduct Hopf Algebras and Noncommutative Spacetime).
The context here is that of Figure 1, which shows how Hopf algebras relate to other objects and to duality in a representation-theoretic sense. Along the central axis, we have put self-dual categories or, in physical terms, categories admitting Fourier transform. This is clear for abelian groups, where the dual $\hat{G}$ of an abelian group $G$ is also an abelian group. Below the axis, we have nonabelian groups, which we view as toy models of geometries with curvature. Every compact Lie group, for example, has an associated Killing metric. Above the axis, a nonabelian group dual $\hat{G}$ means to construct unitary representations etc., which we view as toy models of quantum theory. We have seen that Hopf algebras are another self-dual category and provide a framework in which both groups and group duals can be unified (see the section ‘‘Hopf algebras and first examples’’). Thus, $G$ can be viewed as a coordinate Hopf algebra $\mathbb{C}(G)$ or $\mathbb{C}[G]$ in the finite or Lie cases, and $\hat{G}$ as the dual Hopf algebra $\mathbb{C}G$ or $U(\mathfrak{g})$ as a definition of the coordinate algebra ‘‘$\mathbb{C}(\hat{G})$.’’ Note that $\hat{G}$ is not merely the set of representations, as these alone are not enough to reconstruct the group (e.g., both $S^1$ and $SO_3$ have the same set). We see that Hopf algebras are a microcosm for the unification of quantum theory and gravity. Hopf algebra duality interchanges the role of position and momentum on the one hand and of quantum and gravitational effects on the other. A self-dual Hopf algebra has both aspects unified and interchanged by the self-duality.
One can also ask what the next most general self-dual category of objects is in which to look for more general unifications. One answer here is the category whose objects are themselves categories $\mathcal{C}$ equipped with a tensor product (a ‘‘monoidal category’’) and a monoidal functor to a fixed monoidal category $\mathcal{V}$. Motivated by the above, a theorem from the 1980s is that for any such $\mathcal{C}$ there is a dual $\mathcal{C}^\circ$ of ‘‘representations in $\mathcal{V}$’’ (Majid 1991a). The dotted arrows in Figure 1 indicate that this may be a setting for more ambitious models than those achieved by Hopf algebras alone. In fact, the $\mathcal{C}^\circ$ construction was one of the ingredients going into the invention of 2-categories a few years later.
See also several articles on TQFT (such as Topological Quantum Field Theory: Overview; Axiomatic Approach to Topological Quantum Field Theory; Duality in Topological Quantum Field Theory).
The simplest self-dual quantum group is C[x] as the Hopf algebra of polynomial functions on a line with additive coproduct. This is dually paired with itself in the form of the enveloping algebra U(gl1 ) = C[p] with pairing hpm ; xn i ¼ ðiÞn m;n n! and similarly for higher-dimensional flat space. In the case of C[x], a basis is xn and from the above the dual basis is (ip)n =n!. Hence the canonical element is exp = eixp so that Hopf algebra Fourier transform on a suitable completion of these algebras reduces to usual Fourier transform. A more nontrivial example (Majid 1988) is given by the ‘‘Planck-scale Hopf algebra’’ C[x] C [p] which has algebra and coalgebra ½p; x ¼ i hð1 e x Þ;
x ¼ x 1 þ 1 x
p ¼ p e x þ 1 p; Sx ¼ x;
x ¼ p ¼ 0 Sp ¼ pe x
The actual generator here should be e x rather than x for an algebraic treatment (otherwise one should allow power series or use C -algebras). The dually paired Hopf algebra has the same form C[p] C[x], with new parameters h0 = 1= h and 0 = h and quantum group Fourier transform connects the two. More details and the general construction of Hopf algebras C[M] U(g) with dual U(m) C[G] are in the article on ‘‘bicrossproduct’’ Hopf algebras (see Bicrossproduct Hopf Algebras and Noncommutative Spacetime). These quantize particles in M moving under momentum Lie group G with Lie algebra g and vice versa. The states of one (in a C -algebra context) lie in the algebra of observables of the other (‘‘observable–state duality’’). The data required are a matched pair of actions of (G, M) on each other. Such equations correspond locally to a factorization of a larger group G ffl M but typically have singularities and other features in keeping with a toy model of Einstein’s equations. There are, by the time of writing, many applications of bicrossproducts beyond the original one, 3 including a Poincare´ quantum group for the R1, mentioned in the section ‘‘Hopf algebras and first examples,’’ with links to Planck-scale physics. There is also a bicrossproduct quantum group C[G ] U(g) canonically associated to any simple Lie algebra g and related to T-duality. The classical data here are Lie bialgebras and solutions of the CYBE as in the section ‘‘q-Deformation coordinate algebras,’’ however there is no known relation with the q-deformation Hopf algebras themselves. Finite group bicrossproducts are also interesting and examples (but not with both actions nontrivial) were already in the works of GI Kac in the 1960s.
These constructions also work when the groups above are themselves Hopf algebras. For example, any finite-dimensional Hopf algebra $H$ has a ‘‘quantum double’’ $D(H) = H \bowtie H^{*\mathrm{op}}$, where the double cross product $\bowtie$ is by mutual coadjoint actions. The cross-relations between the two sub-Hopf algebras are
\[ \sum \langle h_{(1)}, a_{(1)} \rangle\, h_{(2)} a_{(2)} = \sum a_{(1)} h_{(1)}\, \langle h_{(2)}, a_{(2)} \rangle \]
for $h \in H$ and $a \in H^*$. The construction is due to Drinfeld (1987) while the $\bowtie$ form is due to the author. Moreover, $D(H)$ is quasitriangular with $\mathcal{R} = \exp$, the canonical element used in the Fourier transform on $H$. Its representations consist of vector spaces where $H$ acts and at the same time $H^{*\mathrm{op}}$ acts or (which makes sense when $H$ is infinite dimensional) where $H$ coacts, in a compatible way. Such objects are called ‘‘crossed modules’’ because when $H = \mathbb{C}G$, one has exactly a linearization of the crossed $G$-sets of J. H. C. Whitehead. They are a special case of the $\mathcal{C}^\circ$ construction mentioned above.
Finally, one can also view the q-deformed linear spaces on which quantum groups such as $U_q(\mathfrak{g})$ act as self-dual Hopf algebras under an additive coproduct. However, this needs to be as braided groups or Hopf algebras with braid statistics; see the next section. The simplest example here is the ‘‘braided line’’ $B = \mathbb{C}[x]$, developed not as above but as a self-dual Hopf algebra with q-statistics. Its ‘‘bosonization’’ gives a self-dual Hopf algebra $U_q(b_+) \subset U_q(sl_2)$, and similarly for other $U_q(b_+) \subset U_q(\mathfrak{g})$. Perhaps more surprisingly, the quantum groups $U_q(\mathfrak{g})$ and $\mathbb{C}_q[G]$ also both have canonical braided group versions (a process called ‘‘transmutation’’) and as such they too are isomorphic. This isomorphism extends the linear isomorphism $\mathfrak{g}^* \to \mathfrak{g}$ afforded by the Killing form of any semisimple Lie algebra. In physical terms, what this means is that there is in q-deformed geometry just one self-dual object $B_q(G)$ with two different scaling limits
\[ U(\mathfrak{g}) \leftarrow B_q(G) \rightarrow \mathbb{C}[G] \]
as $q \to 1$, and the structure of which underlies the deeper structure of $U_q(\mathfrak{g})$ and $\mathbb{C}_q[G]$ as well.
Braided Groups and Quantum Planes
A super quantum group or super-Hopf algebra is not a quantum group or Hopf algebra, since the key homomorphism property of $\Delta : H \to H \otimes H$ is modified: one must use in the target $H \otimes H$ the $\mathbb{Z}_2$-graded or super tensor product of super algebras. Here,
\[ (a \otimes b)(c \otimes d) = (-1)^{|b||c|}\, ac \otimes bd \]
for elements of degree $|b|$, $|c|$. Super quantum groups $U_q(gl_{m|n})$ etc. have been constructed and have an analogous theory to the bosonic versions above. Super spaces in physics are associated to differential forms, and in the same way a bicovariant exterior algebra on a quantum group $H$ is generally a super quantum group. Here the exterior algebra is generated by 1-forms and the coproduct on 1-forms is
\[ \Delta = \Delta_L + \Delta_R \]
Here $\Delta_L$, $\Delta_R$ are the coactions of $H$ on 1-forms induced by the left and right coaction of $H$ on itself.
For a true understanding of quantum groups one must, however, go beyond such objects to ‘‘braided groups’’ or Hopf algebras with braid statistics (see Majid 1995). This theory was introduced by the author in the early 1990s as a more systematic method for q-deformation of structures in physics based on q-group covariance. We have seen that a quasitriangular quantum group, or any Hopf algebra through its double, generates a braided category with the flip map replaced by a braiding $\Psi_{V,W}$ between any two representations. That something is covariant under the quantum group means by definition that it lives in the braided category. Working with such ‘‘braided algebras’’ is similar to working with superalgebras except that one should use $\Psi$ in place of the graded transposition in any algebraic construction. In particular, two braided algebras have a natural ‘‘braided tensor product’’ also in the category. In concrete terms,
\[ (a \otimes b)(c \otimes d) = a \Psi(b \otimes c) d \]
Then a Hopf algebra in the braided category, or braided group, is $B$, an algebra in the category along with a coalgebra and antipode, where $\Delta : B \to B \underline{\otimes} B$ is an algebra homomorphism to the braided tensor product (see Braided and Modular Tensor Categories).
Next, we have mentioned in the section ‘‘q-Deformation enveloping algebras’’ that q-algebras generate topological invariants, but we now turn this on its head and use braid diagrams to do q-algebra.
We write all operations as flowing down the page: any transposition in the algebraic construction is expressed as a braid crossing representing $\Psi$, its inverse as the reversed braid crossing, and any other operation as a node. Thus, a product is denoted by a node where two wires merge into one, and a coproduct by a node where one wire splits into two. Algebraic information "flows" along these "wires" much as information flows along the wiring in a computer, except that under- and over-crossings represent distinct nontrivial operators. (In fact, one may formulate topological quantum computers exactly in this way.) In this notation, tensor
products are denoted by juxtaposition and the trivial object in the category is omitted. In particular, one has the axioms and all general theorems of Hopf algebras at this diagrammatic level. For example, the adjoint action of any braided group $B$ on itself is given by a diagram (see Majid 1995):

[Diagram: the braided adjoint action $\mathrm{Ad} : B \otimes B \to B$, drawn with nodes for the coproduct $\Delta$ and the antipode $S$.]
In any concrete example, such diagrams turn into R-matrix formulas, with the braid crossing given by $R$ as explained in the section "q-Deformation enveloping algebras."

A basic example of a braided group is the braided q-plane $\mathbb{C}^2_q$ with generators $x, y$ and relations $yx = qxy$. Its coproduct is the additive one $\Delta x = x \otimes 1 + 1 \otimes x$ (and similarly for $y$), reflecting addition in the plane, but this is extended to products as a braided group with braiding $\Psi$ given by $q^{1/2} R_{1/2,1/2}$ in terms of the R-matrix in the section "q-Deformation enveloping algebras." The extra factor here means that $\mathbb{C}^2_q$ lives in the braided category of representations of $U_q(gl_2) = \widetilde{U_q(sl_2)}$ (i.e., with an additional central $U_q(1)$ generator to provide the $q^{1/2}$). More precisely, the category is that of corepresentations of $\mathbb{C}_q[GL_2] = \widetilde{\mathbb{C}_q[SL_2]}$. The coaction in this case is
$$\Delta_R\,(x \ \ y) = (x \ \ y) \otimes \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
where the additional central generator is encoded in the q-determinant (which is no longer set equal to 1). Notice that $q^{1/2} R_{1/2,1/2}$ has eigenvalues $q$, $-q^{-1}$ (one says that it is q-Hecke). Another braided group, associated now to the second eigenvalue, is $\mathbb{C}^{0|2}_q$ with generators $\theta, \vartheta$ and relations $\vartheta\theta = -q^{-1}\theta\vartheta$, $\theta^2 = \vartheta^2 = 0$. It is the quadratic algebra dual of $\mathbb{C}^2_q$ (Manin 1988). One has natural braided linear spaces for the whole family $\mathbb{C}_q[G]$, on which the latter coact after central extension.

The general construction is as follows. If $V$ is an object in a braided category (e.g., the fundamental representation of a quantum group), let $T(V)$ be the tensor algebra generated by a basis $\{e_i\}$ of $V$ with no relations and the additive braided coproduct as above. Assume that $V$ has a dual $V^*$ in the category, and similarly form $T(V^*)$ with dual basis generators $\{f^i\}$. These two braided groups will be dually paired by extending the evaluation map to products, which takes the form of "braided integers" (see Majid 1995)
$$\langle f^{i_m} \cdots f^{i_1},\ e_{j_1} \cdots e_{j_n} \rangle = \delta_{n,m}\,\big([n,\Psi]!\big)^{i_1 \cdots i_n}_{j_1 \cdots j_n}$$
$$[n,\Psi] = \mathrm{id} + \Psi_{12} + \Psi_{12}\Psi_{23} + \cdots + \Psi_{12} \cdots \Psi_{n-1,n}$$
We now quotient by the kernels of this pairing to obtain $B(V)$, $B(V^*)$ as two nondegenerately paired braided groups. This quotient generates all the relations, which are very often but not necessarily quadratic (in practice, one typically imposes only the quadratic relations to have braided groups with a possibly degenerate pairing). The construction is due to the author. Moreover, we can define partial derivatives on these braided groups by
$$\Delta a = 1 \otimes a + e_i \otimes \partial^i a + \cdots$$
for any $a$ in the algebra, that is, as an infinitesimal generator of translations under the braided group law; similarly exp, indefinite and Gaussian integration, Fourier transform, etc. The simplest example here is $B = \mathbb{C}[x]$ viewed not as a usual Hopf algebra but as a braided group in the category of $\mathbb{Z}$-graded spaces with $\Psi(x \otimes x) = q\,x \otimes x$. In this example the braided addition law on $\mathbb{C}[x]$ is
$$\Delta x^n = \sum_{m=0}^{n} \begin{bmatrix} n \\ m \end{bmatrix}_q x^m \otimes x^{n-m}$$
with q-binomial coefficients defined by $[m]_q$, and the partial derivative defined by it is the Jackson (1908) q-derivative
$$\partial f(x) = \frac{f(x) - f(qx)}{x(1-q)}$$
while $\Delta e_q(x) = e_q(x) \otimes e_q(x)$ if we allow power series. Such objects occur in the theory of q-special functions (see q-Special Functions).

Among deeper theorems (see Majid 1995, 2002), there is a triangular decomposition
$$U_q(g) = U_q(n_-) \,{>\!\!\!\triangleleft\!\cdot}\, T \,{\cdot\!\triangleright\!\!\!<}\, U_q(n_+)$$
where $U_q(n_+)$ is a braided group and $U_q(n_-)$ is dually paired to its opposite. $T$ denotes the torus generators $\{g_a\}$ in the section "q-Deformation enveloping algebras." More generally, if $g_0 \subset g$ is a principal embedding of Lie algebras (given by an inclusion of Dynkin diagrams), then $U_q(g) = B \,{>\!\!\!\triangleleft\!\cdot}\, \widetilde{U_q(g_0)} \,{\cdot\!\triangleright\!\!\!<}\, B^{*\mathrm{op}}$ for some additive braided group of additional root generators and its dual. The general construction $B \,{>\!\!\!\triangleleft\!\cdot}\, H \,{\cdot\!\triangleright\!\!\!<}\, B^{*\mathrm{op}}$ here is "double bosonization," which associates to dual braided groups $B$, $B^*$ in the category of representations of some quasitriangular Hopf algebra $H$ a new quasitriangular Hopf algebra. The simplest example $B = \mathbb{C}[x]$ lives in the category of representations of $T = U_q(1)$ in an algebraic form. The dual is another braided line $\mathbb{C}[p]$, and $\mathbb{C}[p] \,{>\!\!\!\triangleleft\!\cdot}\, U_q(1) \,{\cdot\!\triangleright\!\!\!<}\, \mathbb{C}[x]$ is a version of $U_q(sl_2)$. In this way, the braided line $\mathbb{C}[x]$ is at the root of all q-deformation quantum groups.

An earlier theorem is that for any braided group $B$ covariant under a (co)quasitriangular $H$, we have its "bosonization" $B \,{>\!\!\!\triangleleft\!\cdot}\, H$. There is a similar "biproduct" if $B$ lives in the category of crossed modules for any Hopf algebra $H$. These have been extensively applied in physics, notably in the construction of inhomogeneous quantum groups. Similar to $\mathbb{C}^2_q$ (but as a $*$-algebra), there is a natural self-dual q-Minkowski space $B = \mathbb{R}^{1,3}_q$ which is covariant under $\widetilde{U_q(so_{1,3})}$, and its bosonization is the q-Poincaré plus dilations group $\mathbb{R}^{1,3}_q \,{>\!\!\!\triangleleft\!\cdot}\, \widetilde{U_q(so_{1,3})}$. It is not possible to avoid the dilation here. The double-bosonization extends this to the q-conformal group $U_q(so_{2,4})$. The braided adjoint action becomes the action of conformal translations on $\mathbb{R}^{1,3}_q$. The construction of q-propagators and q-deformed physics on such q-Minkowski space was achieved in the mid 1990s as one of the main successes of the theory of braided groups. This $\mathbb{R}^{1,3}_q$ can be given also as a matrix of generators, relations, $*$-structure, and a second braided coproduct:
$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}, \qquad \beta\alpha = q^2 \alpha\beta, \quad \gamma\alpha = q^{-2} \alpha\gamma, \quad \delta\alpha = \alpha\delta$$
$$\beta\gamma = \gamma\beta + (1 - q^{-2})\,\alpha(\delta - \alpha)$$
$$\delta\beta = \beta\delta + (1 - q^{-2})\,\alpha\beta, \qquad \gamma\delta = \delta\gamma + (1 - q^{-2})\,\gamma\alpha$$
$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}^{*} = \begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix}, \qquad \Delta \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \otimes \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}, \qquad \epsilon \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
This is in addition to the additive coproduct above. It corresponds to the point of view of Minkowski space as Hermitian $2 \times 2$ matrices. Note that $\Delta$ is not a $*$-algebra map in the usual sense and indeed Hermitian matrices are not a group under multiplication, but this does form a natural braided bialgebra. If we quotient by the braided determinant relation $\alpha\delta - q^2\gamma\beta = 1$, we have the unit hyperboloid in $\mathbb{R}^{1,3}_q$, which turns out to be the braided group $B_q[SU_2]$ mentioned at the end of the previous section (as obtained canonically from $\mathbb{C}_q[SU_2]$). We now have a braided antipode
$$S \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \begin{pmatrix} q^2\delta + (1 - q^2)\alpha & -q^2\beta \\ -q^2\gamma & \alpha \end{pmatrix}$$
This was the first nontrivial example of a braided group (Majid 1991b) and we see that it has two $q \to 1$ limits:
$$U(su_2) \leftarrow B_q[SU_2] \rightarrow \mathbb{C}[\text{Hyperboloid} \subset \mathbb{R}^{1,3}]$$
Because most constructions in physics can be uniformly deformed by such methods (including the totally q-antisymmetric tensor), one finds that $q$ provides a new regulator in which infinities in quantum field theory can in principle be encoded as poles at $q = 1$. That transmutation from the
quantum group to its braided version unifies unitary nonabelian symmetries with pseudo-Riemannian geometry is another deeper aspect of relevance to physics. In addition, q-constructions have their original role in quantum integrable systems, at q a root of unity and for infinite-dimensional (affine) Lie algebra deformations.
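As a small numerical illustration of the braided line $\mathbb{C}[x]$ discussed earlier in this section, the following Python sketch checks the braided addition law and the Jackson q-derivative for a sample value of $q$. It assumes the usual q-integers $[m]_q = (1 - q^m)/(1 - q)$ (the text only names $[m]_q$), and it models $X = x \otimes 1$, $Y = 1 \otimes x$ by the rule $YX = qXY$ that follows from $\Psi(x \otimes x) = q\,x \otimes x$. All function names are illustrative.

```python
from fractions import Fraction


def q_int(m, q):
    """q-integer [m]_q = (1 - q^m) / (1 - q) (assumed definition)."""
    return (1 - q**m) / (1 - q)


def q_factorial(m, q):
    result = Fraction(1)
    for k in range(1, m + 1):
        result *= q_int(k, q)
    return result


def q_binomial(n, m, q):
    return q_factorial(n, q) / (q_factorial(m, q) * q_factorial(n - m, q))


def braided_power(n, q):
    """Coefficients c[m] of X^m Y^(n-m) in (X + Y)^n when Y X = q X Y."""
    coeffs = {0: Fraction(1)}  # key: power of X; (X + Y)^0 = 1
    for degree in range(n):
        new = {}
        for m, c in coeffs.items():
            y_power = degree - m
            # X^m Y^j times X: moving X to the left of Y^j costs q^j
            new[m + 1] = new.get(m + 1, 0) + c * q**y_power
            # X^m Y^j times Y is already in normal order
            new[m] = new.get(m, 0) + c
        coeffs = new
    return coeffs


def jackson_derivative(poly, q):
    """Apply f -> (f(x) - f(qx)) / (x (1 - q)) to coefficients poly[k] of x^k."""
    return [c * (1 - q**k) / (1 - q) for k, c in enumerate(poly)][1:]


if __name__ == "__main__":
    q = Fraction(1, 2)
    n = 5
    coeffs = braided_power(n, q)
    print(all(coeffs[m] == q_binomial(n, m, q) for m in range(n + 1)))
    print(jackson_derivative([0, 0, 0, 1], q) == [0, 0, q_int(3, q)])
```

Exact rational arithmetic is used so that the comparison with the q-binomial coefficients is an identity check rather than a floating-point approximation.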
Quasi-Hopf Algebras

Although the braided category of representations of a quantum group has a trivial "associator" $\Phi_{V,W,Z} : (V \otimes W) \otimes Z \to V \otimes (W \otimes Z)$ between any three objects, a general braided category and the diagrammatic methods of "braided algebra" in the last section do not require this (one simply translates diagrams into algebra by inserting $\Phi$ as needed). A more general object that generates such categories as its representations is a "quasi-Hopf algebra." This is a generalization of Hopf algebras in which the coproduct $\Delta : H \to H \otimes H$ is not necessarily coassociative. Instead,
$$(\mathrm{id} \otimes \Delta)\Delta = \phi\,\big((\Delta \otimes \mathrm{id})\Delta\big)\,\phi^{-1}, \qquad (\mathrm{id} \otimes \epsilon \otimes \mathrm{id})\phi = 1$$
$$\phi_{234}\,(\mathrm{id} \otimes \Delta \otimes \mathrm{id})(\phi)\,\phi_{123} = (\mathrm{id}^{\otimes 2} \otimes \Delta)(\phi)\,(\Delta \otimes \mathrm{id}^{\otimes 2})(\phi)$$
for some invertible element $\phi \in H \otimes H \otimes H$. The numbers denote the position in the tensor product and one says that $\phi$ is a 3-cocycle. The axioms for the antipode and quasitriangular structure $R$ are also modified. The tensor product of representations is given as usual by $\Delta$, and the braiding and associator by the actions of $R$ and $\phi$.

This notion, due to Drinfeld (1990), arises when one wishes to write down the quantum groups $U_q(g)$ more explicitly as built on the algebras $U(g)$ (recall that they are isomorphic over formal power series). Thus, for each semisimple $g$ there is a natural (quasitriangular) quasi-Hopf algebra $(U(g), \phi, R)$ where $U(g)$ has the usual Hopf algebra structure, $R$ is an exponential of the split Casimir (or inverse Killing form) in $g \otimes g$, and $\phi$ is constructed as a solution of the Knizhnik–Zamolodchikov equations coming out of conformal field theory. This is not $U_q(g)$ but it has an equivalent braided category of representations. Thus, there is an element $F \in U(g)^{\otimes 2}$ (extended over formal power series) such that
$$\Delta_F = F(\Delta\ )F^{-1}, \qquad R_F = F_{21} R F^{-1}$$
$$\phi_F = F_{23}\,\big((\mathrm{id} \otimes \Delta)F\big)\,\phi\,\big((\Delta \otimes \mathrm{id})F^{-1}\big)\,F_{12}^{-1} = 1$$
recovers Uq (g) as a quasitriangular Hopf algebra built directly on the algebra U(g). The conjugation
operations here (and a similar process regarding the antipode) are a "Drinfeld twist" of a quasi-Hopf algebra, and such twisting by any invertible $F$ such that $(\epsilon \otimes \mathrm{id})F = (\mathrm{id} \otimes \epsilon)F = 1$ (a cochain) does not change the representation category up to equivalence. In the present case, the twist transforms $\phi$ into $\phi_F = 1$, that is, into an ordinary Hopf algebra isomorphic over formal power series to $U_q(g)$.

Note that in rational conformal field theory the tensor product of representations appears as a finite-dimensional commutative associative algebra (the Verlinde algebra) with integer structure constants $N_{ij}{}^{k}$ (this comes from the operator-product expansion of primary fields in the theory). This is because one has more precisely a truncated representation category corresponding to $q$ a root of unity, and because we are identifying equivalent representations (so the $N_{ij}{}^{k}$ are the multiplicities in the decomposition of a tensor product of two representations). However, if one wants to know the tensor product decomposition more fully, not just its isomorphism class, this is given in a choice of bases by recoupling matrices. Computation in terms of these shows that the actual tensor product is neither commutative nor associative, but of the form above, at least in the case of the WZW model.

Hopf algebra theory typically extends to the quasi-Hopf case. For example, given a quasi-Hopf algebra $H$ there is a quantum double $D(H)$, at least in the finite-dimensional case, due to the author. An example is to take $H = \mathbb{C}(G)$ and $\phi$ a 3-cocycle on $G$ in the usual sense
$$\phi(y,z,w)\,\phi(x,yz,w)\,\phi(x,y,z) = \phi(x,y,zw)\,\phi(xy,z,w)$$
on elements of $G$, and $\phi(x,1,y) = 1$. Then $(\mathbb{C}(G), \phi)$ can be viewed as a quasi-Hopf algebra. Its double $D^\phi(G)$ is generated by $\mathbb{C}(G)$ as a sub-quasi-Hopf algebra and by elements of $G$ with
$$x \cdot y = yx \sum_s \theta(y,x)(s)\,\delta_s, \qquad \delta_s\,x = x\,\delta_{x s x^{-1}}$$
$$\Delta x = \sum_s \sum_{ab=s} \frac{\phi(x, x^{-1}ax, x^{-1}bx)\,\phi(a,b,x)}{\phi(a, x, x^{-1}bx)}\; x\delta_a \otimes x\delta_b$$
in terms of a basis $\{\delta_s\}$ of $\mathbb{C}(G)$, the product of $G$ on the right, and
$$\theta(x,y)(s) = \frac{\phi(x,y,y^{-1}x^{-1}sxy)\,\phi(s,x,y)}{\phi(x, x^{-1}sx, y)}$$
a 2-cocycle on $G$ with values in $\mathbb{C}(G)$ (the algebra is a cocycle semidirect product). There is a quasitriangular structure $R = \sum_x \delta_x \otimes x$. This quasi-Hopf algebra first appeared in discrete topological quantum field theory related to orbifolds in the work of Dijkgraaf, Pasquier, and Roche. There are further generalizations in the same spirit which are linked to conformal field theories of more general type; for example, weak (quasi-)Hopf algebras in which $\Delta 1 \neq 1 \otimes 1$ but is a projector. These have been related to quantum groupoids.

Finally, we mention some applications of twisting outside of the original context. First of all, we are not limited to starting with $U(g)$: starting with any Hopf algebra or quasi-Hopf algebra $H$ we can similarly twist it to another one $H_F$ with the same algebra as $H$ and $\Delta_F$, $R_F$, $\phi_F$ given by conjugation as above. The representation category remains unchanged up to equivalence, so in some sense the twisted object is equivalent. Moreover, if we start with a Hopf algebra $H$ and ask $F$ to be a 2-cocycle in the sense
$$F_{12}\,(\Delta \otimes \mathrm{id})(F)\,(\mathrm{id} \otimes \Delta)(F^{-1})\,F_{23}^{-1} = 1$$
then $H_F$ will remain a Hopf algebra. It has conjugated antipode (see Majid 1995)
$$S_F(a) = U(Sa)U^{-1}, \qquad U = \cdot\,(\mathrm{id} \otimes S)(F)$$
Many Hopf algebras are twists of more standard ones; for example, the multiparameter quantum groups tend to be twists of the standard $U_q(g)$. Likewise, "triangular" Hopf algebras (where $R_{21}R = 1$) tend to be twists of classical group or enveloping algebras.

A second application of twists is an approach to quantization. Although it can be applied to $H$ itself, this is more interesting if we think of $H$ as a background quantum group and ask to quantize objects covariant under $H$. For the sake of discussion, we start with $H$ an ordinary Hopf algebra. We twist this to $H_F$ and denote by $\mathcal{T}$ the equivalence functor from representations of $H$ to representations of $H_F$. This functor acts as the identity on all objects and all morphisms, but comes with nontrivial isomorphisms $c_{V,W} : \mathcal{T}(V) \otimes \mathcal{T}(W) \to \mathcal{T}(V \otimes W)$ for any two objects, compatible with bracketing (see Majid 1995). Given any algebraic construction covariant under $H$, we simply apply the functor $\mathcal{T}$ to all aspects of the construction and obtain an equivalent $H_F$-covariant construction. As an example, if $A$ is an $H$-covariant algebra, then applying $\mathcal{T}$ to its product we have $\mathcal{T}(\cdot) : \mathcal{T}(A \otimes A) \to \mathcal{T}(A)$. Using $c_{A,A}$ we obtain a map $\bullet : \mathcal{T}(A) \otimes \mathcal{T}(A) \to \mathcal{T}(A)$,
$$a \bullet b = \cdot\,\big(F^{-1} \triangleright (a \otimes b)\big)$$
in terms of the product in $A$. Thus, we have a new algebra $A_F$ built on the same vector space as $A$ but with a modified product. This is called a "covariant twist" of an algebra and should not be confused with the Drinfeld twist above. It is due to the author in the early 1990s. If $F$ is a 2-cocycle, then $A_F$ remains associative. The transmutation construction mentioned in the section "Self-dual quantum groups" or the passage from $\mathbb{R}^4_q$ to $\mathbb{R}^{1,3}_q$ are examples in quantum group theory. Other examples include the standard Moyal product on $\mathbb{R}^n$, also called noncommutative spacetime $[x_\mu, x_\nu] = i\theta_{\mu\nu}$ by string theorists (see Bicrossproduct Hopf Algebras and Noncommutative Spacetime).

If we do not demand that $F$ is a cocycle, then the algebra $A_F$ is still associative but in the target category, which means
$$(a \bullet b) \bullet c = \bullet\,(\mathrm{id} \otimes \bullet)\,\Phi_{A,A,A}\big((a \otimes b) \otimes c\big)$$
Such objects are called "quasialgebras." It may still be that $\Phi_{A,A,A}$ happens to be trivial ($\phi_F$ happens to act trivially) so that $A_F$ remains associative. This turns out frequently to be the case, and many quantizations in physics, including $\mathbb{C}_q[G]$ but not limited to q-examples, can be obtained in this way. It means that although they are associative there is a hidden nonassociativity which can surface in other constructions involving $\Phi$.

The physical application here is with $H = U(g)$ a classical enveloping algebra, $A$ functions on a classical manifold on which $g$ acts, and a cochain $F$. In general the resulting quasialgebra will not be associative but rather a quantization of a "quasi-Poisson manifold" obeying
$$\{a, \{b, c\}\} + \text{cyclic} = 2\,\tilde{n}(a \otimes b \otimes c)$$
Here $\tilde{n}$ is the trivector field for the action of the lowest-order part of $\phi_F$, and the (quasi-)Poisson bivector is the leading-order part of $F_{21}F^{-1}$. As mentioned, there are many cases where $\tilde{n}$ (and the action of the rest of $\phi_F$) happens to be trivial.

Finally, let us give a discrete example using such quantum group methods. We consider $H = \mathbb{C}(G)$ and $F \in \mathbb{C}(G \times G)$ a cochain. Twisting by this gives $H_F = (\mathbb{C}(G), \phi_F)$, a quasi-Hopf algebra where
$$\phi_F(x,y,z) = \frac{F(y,z)\,F(x,yz)}{F(xy,z)\,F(x,y)}$$
We take $A = \mathbb{C}G$, the group algebra. The action of $\mathbb{C}(G)$ on it is the diagonal one. The modified algebra $A_F$ therefore has product $x \bullet y = F^{-1}(x,y)\,xy$ in terms of the product in $G$, and will be a quasialgebra if $F$ is not a cocycle. For example, let
$G = (\mathbb{Z}_2)^3$, which we write additively (so elements are 3-vectors with values in $\mathbb{Z}_2$), and take
$$F(x,y) = (-1)^{\sum_{i \le j} x_i y_j + y_1 x_2 x_3 + x_1 y_2 x_3 + x_1 x_2 y_3}$$
[...] 0), is the most significant, which is the natural approximation of a potential near its minimum, when nondegenerate.
Are the trajectories bounded? Are there periodic trajectories? Is one trajectory dense in its energy surface? Is the energy surface compact?
The solution of these questions can be very difficult. Let us just mention the trivial fact that, if $p^{-1}(\lambda)$ is compact for some $\lambda$, then, by the conservation of energy law
$$p(x(t), \xi(t)) = p(y, \eta) \qquad [2]$$
the whole trajectory starting from a point $(y, \eta)$ remains in the bounded set $p^{-1}(p(y,\eta))$ in $\mathbb{R}^{2n}$. This is in particular the case for the harmonic oscillator.

Quantum Mechanics
The quantum theory was born dynamics-wise around 1920. It is structurally related to classical mechanics in a way that we shall describe very briefly. In quantum mechanics, our basic object will be a (possibly unbounded) self-adjoint operator defined on a dense subspace of a Hilbert space $\mathcal{H}$. In order to simplify the presentation, we shall always take $\mathcal{H} = L^2(\mathbb{R}^n)$. This operator can be associated with $p$ by using the techniques of quantization. We choose here to present a procedure, called the Weyl quantization procedure (which was already known in 1928), which, under suitable assumptions on $p$ and its derivatives, is defined for $u \in \mathcal{S}(\mathbb{R}^n)$ by
$$p^w(x, hD_x; h)\,u(x) = (2\pi h)^{-n} \iint \exp\left(\frac{i}{h}(x - y)\cdot\xi\right) p\left(\frac{x+y}{2}, \xi; h\right) u(y)\,dy\,d\xi \qquad [3]$$
The operator $p^w(x, hD_x; h)$ is called an h-pseudodifferential operator of Weyl symbol $p$. One can also write $\mathrm{Op}^w_h(p)$ in order to emphasize that it is the operator associated to $p$ by the Weyl quantization. Here $h$ is a parameter which plays the role of the Planck constant. Of course, one has to give a sense to these integrals, and this is the object of the theory of oscillatory integrals. If $p = 1$, we observe that, by Plancherel's formula,
$$u(x) = (2\pi h)^{-n} \iint \exp\left(\frac{i}{h}(x - y)\cdot\xi\right) u(y)\,dy\,d\xi \qquad [4]$$
702 h-Pseudodifferential Operators and Applications
the associated operator is nothing but the identity operator. A way to rewrite any h-differential operator $\sum_{|\alpha| \le m} a_\alpha(x)(hD_x)^\alpha$ as an h-pseudodifferential operator is to apply it on both sides to [4]. In particular, we observe that if the symbol is $p(x,\xi) = \xi^2 + V(x)$, then the operator associated with $p$ by the h-Weyl quantization is the Schrödinger operator $-h^2\Delta + V$. Other interesting examples appear naturally in solid state physics. Let us, for example, mention the Harper operator $H$ (or almost-Mathieu; see Helffer and Sjöstrand (1989) and references therein), whose symbol is the map $(x,\xi) \mapsto \cos\xi + \cos x$, and which can also be defined, for $u \in L^2(\mathbb{R})$, by
$$(Hu)(x) = \tfrac{1}{2}\big(u(x+h) + u(x-h)\big) + \cos x\; u(x)$$
We shall later recall how to relate the properties of $p$ and those of the associated operator. More precisely, we shall describe under which conditions on $p$ the operator $p^w(x, hD_x; h)$ is semibounded, symmetric, essentially self-adjoint, compact, with compact resolvent, trace class, Hilbert–Schmidt (see Robert (1987) for an extensive presentation).

But before looking at a more general situation, let us consider the case of the Schrödinger operator $S_h = -h^2\Delta + V(x)$. If $V$ is (say, continuous) bounded from below, $S_h$, which is a priori defined on $\mathcal{S}(\mathbb{R}^n)$ as a differential operator, admits a unique self-adjoint extension on $L^2(\mathbb{R}^n)$. We are first interested in the nature of the spectrum. If $V(x) \to +\infty$ as $|x| \to \infty$, one can show that $S_h$, more precisely its self-adjoint realization, has compact resolvent and its spectrum consists of a sequence of eigenvalues tending to $+\infty$. We are next interested in the asymptotic behavior of these eigenvalues.

In the case of the harmonic oscillator, corresponding to the potential
$$V(x) = \sum_{j=1}^{n} \mu_j x_j^2 \qquad (\text{with } \mu_j > 0)$$
the criterion of compact resolvent is satisfied and the spectrum is described as the set of
$$\lambda_\alpha(h) = \sum_{j=1}^{n} \sqrt{\mu_j}\,(2\alpha_j + 1)\,h$$
for $\alpha \in \mathbb{N}^n$. In this case we also have a complete description of the normalized associated eigenfunctions, which are constructed recursively starting from the first eigenfunction, corresponding to $\lambda_0(h) = \sum_j \sqrt{\mu_j}\,h$:
$$\phi_0(x; h) = \left(\prod_{j=1}^{n} \left(\frac{\sqrt{\mu_j}}{\pi h}\right)^{1/4}\right) \exp\left(-\frac{1}{2h} \sum_j \sqrt{\mu_j}\,x_j^2\right) \qquad [5]$$
The eigenfunction $\phi_0$ is strictly positive and decays exponentially. Moreover (and here we enter the semiclassical world), the local decay in a fixed closed set avoiding $\{0\}$ (which is measured by its $L^2$-norm) is exponentially small as $h \to 0$. In particular, this says that the eigenfunction lives asymptotically in the set $\{V(x) \le \lambda(h)\}$. This last set can also be understood as the projection by the map $(x,\xi) \mapsto x$ of the energy surface which is classically attached to the eigenvalue $\lambda(h)$, that is, $\{(x,\xi) \in \mathbb{R}^{2n} \mid p(x,\xi) = \lambda(h)\}$. This is a typical semiclassical statement, which will be true in full generality.

From Quantum Mechanics to Classical Mechanics: Semiclassical Mechanics

Before describing the mathematical tools involved in the exploration of the correspondence principle, let us describe a few results which are typical in the semiclassical context. They concern Weyl's asymptotics and the localization of the eigenfunctions.

Weyl's asymptotics. We start with the case of the Schrödinger operator $S_h$, but we emphasize that the h-pseudodifferential techniques are not limited to this situation. We assume that $V$ is a $C^\infty$-function on $\mathbb{R}^n$ which is semibounded and satisfies
$$\inf V < \liminf_{|x| \to \infty} V(x)$$
The Weyl theorem (which is a basic theorem in spectral theory) implies that the essential spectrum is contained in
$$\Big[\liminf_{|x| \to \infty} V(x),\ +\infty\Big)$$
It is also clear that the spectrum is contained in $[\inf V, +\infty)$. In the interval
$$I = \Big[\inf V,\ \liminf_{|x| \to \infty} V(x)\Big)$$
the spectrum is discrete, that is, it has only isolated eigenvalues with finite multiplicity. For any $E$ in $I$, it is consequently interesting to look at the counting function $N_h(E)$ of the eigenvalues contained in $[\inf V, E]$,
$$N_h(E) = \#\{\lambda_j(h);\ \lambda_j(h) \le E\} \qquad [6]$$
The main semiclassical result is then Theorem 1
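With the previous assumptions, we have:
$$\lim_{h \to 0} h^n N_h(E) = (2\pi)^{-n}\,\omega_n \int_{V(x) \le E} (E - V(x))^{n/2}\,dx$$
where $\omega_n$ denotes the volume of the unit ball in $\mathbb{R}^n$.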
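A minimal numerical sketch of Theorem 1, using the harmonic oscillator $V(x) = \sum_j \mu_j x_j^2$, for which the eigenvalues $\lambda_\alpha(h) = \sum_j \sqrt{\mu_j}(2\alpha_j + 1)h$ are explicit and every $E > 0$ lies in the discrete spectrum. The sketch counts the eigenvalues below $E$ and compares the count with the phase-space volume of $\{\xi^2 + V(x) \le E\}$ divided by $(2\pi h)^n$. The function names and the sample values of $\mu$, $E$ and $h$ are illustrative choices, not part of the original text.

```python
import math
from itertools import product


def count_eigenvalues(mu, h, E):
    """N_h(E) for V(x) = sum_j mu_j x_j^2, from the explicit eigenvalues."""
    roots = [math.sqrt(m) for m in mu]
    # alpha_j <= E / (2 sqrt(mu_j) h) is necessary, since every term is positive
    ranges = [range(int(E / (2 * r * h)) + 1) for r in roots]
    return sum(
        1
        for alpha in product(*ranges)
        if sum(r * (2 * a + 1) * h for r, a in zip(roots, alpha)) <= E
    )


def weyl_term(mu, h, E):
    """(2 pi h)^(-n) times the volume of {xi^2 + V(x) <= E} in R^(2n)."""
    n = len(mu)
    # the region is an ellipsoid: a 2n-ball of radius sqrt(E), rescaled in x
    volume = math.pi**n * E**n / (
        math.factorial(n) * math.prod(math.sqrt(m) for m in mu)
    )
    return volume / (2 * math.pi * h) ** n


if __name__ == "__main__":
    mu, E = (1.0, 4.0), 1.0
    for h in (0.1, 0.03, 0.01):
        n_h = count_eigenvalues(mu, h, E)
        w_h = weyl_term(mu, h, E)
        print(h, n_h, w_h, n_h / w_h)
```

The ratio in the last column should approach 1 as $h$ decreases, which is the content of the theorem in this special case.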
The main term in the expansion of $N_h(E)$, which will be denoted by
$$W_h(E) := (2\pi h)^{-n}\,\omega_n \int_{V(x) \le E} (E - V(x))^{n/2}\,dx$$
(with $\omega_n$ the volume of the unit ball in $\mathbb{R}^n$), is called the Weyl term. It has an analog for the analysis of the counting function for Laplacians on compact manifolds (see Quantum Ergodicity and Mixing of Eigenfunctions and references therein), but let us emphasize that here $E$ is fixed and that one looks at the asymptotics as $h \to 0$. In the other case, $h$ is fixed and one looks at the asymptotics as $E \to +\infty$ (note that on a compact manifold and for the Laplacian, the formula $N_h(E) = N_1(E/h^2)$ permits switching between these cases). Although this formula is rather old (first as a folk theorem), many efforts have been made by mathematicians to analyze the remainder $N_h(E) - W_h(E)$ (see Robert (1987), Ivrii (1998) and references therein), whose behavior is again related to classical analysis. When $E$ is not a critical value of $V$, $h^{n-1}(N_h(E) - W_h(E))$ can be shown to be bounded, and it is even $o(1)$ if the measure of the periodic points for the flow is 0 (see Ivrii (1998)). Beyond the analysis of the counting function, one is also interested (e.g., in questions concerning the ground-state energy of an atom with a large number of particles, $N$, satisfying the Pauli exclusion principle (see Stability of Matter)) in other quantities like the Riesz means, which are defined, for a given $s \ge 0$, by
$$N_h^s(E) = \sum_j (E - \lambda_j)_+^s$$
The case $s = 0$ corresponds to the counting function. It is then natural to ask for the asymptotic behavior as $h \to 0$ of these functions. We have, for example, the following result (Helffer–Robert, Ivrii–Sigal, and Ivrii; see Robert (1987) and Ivrii (1998)), which is written here in a more Hamiltonian version, when $E$ is not a critical value of $V$:
$$N_h^s(E) = (2\pi h)^{-n} \left( \int_{p_E(x,\xi) \le 0} \big(-p_E(x,\xi)\big)^s\,dx\,d\xi + O\big(h^{\inf(1+s,\,2)}\big) \right)$$
with $p_E(x,\xi) = \xi^2 + V(x) - E$.

Uncertainty principle and Weyl term. The Weyl term can be heuristically understood in the following way. According to the uncertainty principle, a "quantum" particle should occupy at least a volume
with the measure of order $h^n$ in the phase space (the measure is $dx\,d\xi$, proportional to $(\sum_{j=1}^{n} d\xi_j \wedge dx_j)^n$). This guess is a consequence of the inequality
$$\frac{h}{2}\,\|u\|^2 \le \left( \int_{\mathbb{R}} (x - x_0)^2 |u|^2\,dx \right)^{1/2} \left( \int_{\mathbb{R}} \left| \left( \frac{h}{i}\frac{d}{dx} - \xi_0 \right) u \right|^2 dx \right)^{1/2}, \qquad \forall u \in \mathcal{S}(\mathbb{R})$$
expressing the noncommutation of the operators $((h/i)\,d/dx - \xi_0)$ and (multiplication by) $(x - x_0)$. When $\|u\| = 1$ and $x_0$ (mean position) and $\xi_0$ (mean momentum) are defined by $x_0 := \int_{\mathbb{R}} x|u|^2\,dx$ and $\xi_0 := (h/i)\int_{\mathbb{R}} u'(x)\,\bar{u}(x)\,dx$, this inequality expresses the impossibility for a quantum particle to have a simultaneous small localization in position and momentum. Consequently, the maximal number of "quantum" particles which can live in the region $\{p_E(x,\xi) \le 0\}$ is approximately (up to some universal multiplicative constant) the volume of this region divided by $(2\pi h)^n$.

Lieb–Thirring inequalities and Scott's conjecture. In the case of regular potentials, we have seen that the quantity $h^n N_h^s(E)$ is asymptotically equal, as $h \to 0$, to $L^{\mathrm{cl}}_{s,n} \int_{V(x) \le E} (E - V(x))^{s + n/2}\,dx$. For other questions occurring in atomic physics (see Stability of Matter), one is more interested in the existence of universal constants $M_{s,n}$ such that
$$h^n N_h^s(E) \le M_{s,n} \int_{V(x) \le E} (E - V(x))^{s + n/2}\,dx$$
for any $V$ and any $h$. The best $M_{s,n}$ (which exists if $s + n/2 > 0$) is denoted by $L_{s,n}$ (for $s = 0$, this is called the Cwikel–Lieb–Rozenblum inequality). The semiclassical result gives the inequality $L_{s,n} \ge L^{\mathrm{cl}}_{s,n}$. A still open question is the so-called Lieb–Thirring conjecture: do we have $L_{1,3} = L^{\mathrm{cl}}_{1,3}$? This is related to the question of the stability of matter (see Stability of Matter). The last results in this direction have been obtained quite recently by A Laptev and T Weidl, who show, for example, the equality for $s \ge 3/2$. The control, when $s = 1$, of a second term (for more singular potentials) for $N_h^s(E)$ was the object of the Scott conjecture, which was solved recently in many important cases by Hughes, Siedentop–Weikard, Ivrii–Sigal, and Fefferman–Seco (see Ivrii (1998), Stability of Matter, and references therein).

Localization of the eigenfunctions. The localization property was already observed in the specific case of the harmonic oscillator. But this was a consequence
of an explicit description of the eigenfunctions. It is quite important to have a good description of the decay of the eigenfunctions (as $h \to 0$) outside the classically permitted region without having to know an explicit formula. Various approaches can be used. The first one fits very well in the case of the Schrödinger operator (more generally, h-pseudodifferential operators with symbols admitting holomorphic extensions in the $\xi$ variable) and gives exponential decay. This is based on the so-called Agmon estimates (developed in the semiclassical context by Helffer–Sjöstrand and Simon). We shall not say more about this approach, which is the starting point of the analysis of tunneling (see Helffer (1988), Dimassi and Sjöstrand (1999), and Martinez (2002)). The second one is an elementary application of the h-pseudodifferential formalism which will be described later and leads, for example, to the following statement. Let $E$ be in $I$ and let $(\lambda(h_j), \phi^{(h_j)}(x))$ be a sequence of spectral pairs in $I \times L^2(\mathbb{R}^n)$, where $h_j \to 0$ as $j \to +\infty$, $\lambda(h_j) \to E$, and $x \mapsto \phi^{(h_j)}(x)$ is an $L^2$-normalized eigenfunction associated with $\lambda(h_j)$. Let $\Omega$ be a relatively compact set in $\mathbb{R}^n$ such that
$$V^{-1}\big((-\infty, E]\big) \cap \overline{\Omega} = \emptyset$$
Then there exists, for every integer $N$, a constant $C_{N,\Omega}$ such that
$$\|\phi^{(h_j)}\|_{L^2(\Omega)} \le C_{N,\Omega}\,h_j^N$$
A third one uses the notion of frequency set and will be discussed later (see also the book of Martinez (2002) for what can be done with the Fourier–Bros–Iagolnitzer transform as developed by J Sjöstrand).
Brief Introduction to the h-Pseudodifferential Calculus

For fixed $h$, the pseudodifferential calculus has a long history, starting in its modern form in the 1960s. A rather complete version of the calculus is presented in Hörmander (1984). We will emphasize here the semiclassical aspect of the calculus, that is, the dependence of the calculus on the parameter $h > 0$.

h-Pseudodifferential Calculus

Basic calculus: the class $S^0$. We shall mainly discuss the simplest one, called the $S^0$ calculus. Let us first say that the $S^0$ calculus is sufficient once we have suitably (micro)-localized the problem (e.g., by the functional calculus). Note that it is also sufficient for the local analysis of many problems occurring on compact manifolds.
This class of symbols $p$ is simply defined by the conditions
$$|\partial_x^\alpha \partial_\xi^\beta p(x,\xi)| \le C_{\alpha,\beta} \qquad [7]$$
for all $(\alpha, \beta) \in \mathbb{N}^n \times \mathbb{N}^n$. The symbols can possibly be h-dependent. With this symbol, one can associate an h-pseudodifferential operator by [3]. This operator is a continuous operator on $\mathcal{S}(\mathbb{R}^n)$ but can also be defined by duality on $\mathcal{S}'(\mathbb{R}^n)$.

The first basic analytical result is the Calderón–Vaillancourt theorem (see Hörmander (1984)) establishing the $L^2$-continuity. We also mention that if $p$ is in $L^2(\mathbb{R}^{2n})$, the associated operator is Hilbert–Schmidt. One can also give conditions on $p$ implying the trace-class property (replace the uniform control in [7] by a control in $L^1$).

The second important property is the existence of a calculus. If $a$ is in $S^0$ and $b$ is in $S^0$, then the composition $a^w(x, hD_x) \circ b^w(x, hD_x)$ of the two operators is a pseudodifferential operator associated with an h-dependent symbol $c$ in $S^0$:
$$a^w(x, hD_x) \circ b^w(x, hD_x) = c^w(x, hD_x; h)$$
We see here that we immediately meet symbols admitting expansions in powers of $h$, which we shall call regular symbols, in the sense that they admit expansions of the type
$$a(x,\xi;h) \sim \sum_j a_j(x,\xi)\,h^j, \qquad b(x,\xi;h) \sim \sum_j b_j(x,\xi)\,h^j$$
In this case the Weyl symbol $c$ of the composition has a similar expansion:
$$c(x,\xi;h) \sim \exp\left(\frac{ih}{2}\big(D_\xi D_y - D_x D_\eta\big)\right) \big(a(x,\xi;h)\,b(y,\eta;h)\big)\Big|_{x=y,\ \xi=\eta}$$
The symbol $a_0$ is called the principal symbol. At the level of principal symbols, the rule is simply that the principal symbol of $a^w \circ b^w$ is the product of the principal symbols of $a^w$ and $b^w$: $c_0 = a_0 b_0$. Another important property is the following correspondence between the commutator of two operators and Poisson brackets. The principal symbol of the commutator $(1/h)(a^w \circ b^w - b^w \circ a^w)$ is $(1/i)\{a_0, b_0\}$, where $\{f, g\}$ is the Poisson bracket of $f$ and $g$:
$$\{f, g\}(x,\xi) = H_f\,g = \sum_j \big(\partial_{\xi_j} f\,\partial_{x_j} g - \partial_{x_j} f\,\partial_{\xi_j} g\big)$$
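As a quick sanity check of the commutator rule, one can take the simplest nontrivial pair of symbols, $a(x,\xi) = \xi$ and $b(x,\xi) = x$ in dimension one (a choice made here only for illustration), whose Weyl quantizations are $hD_x = (h/i)\,d/dx$ and multiplication by $x$:

```latex
\begin{align*}
(a^w \circ b^w - b^w \circ a^w)\,u
  &= \frac{h}{i}\frac{d}{dx}\bigl(x\,u\bigr) - x\,\frac{h}{i}\frac{du}{dx}
   = \frac{h}{i}\,u, \\
\frac{1}{h}\,(a^w \circ b^w - b^w \circ a^w) &= \frac{1}{i}\,\mathrm{Id}, \\
\{a, b\}(x,\xi) &= \partial_\xi \xi\,\partial_x x - \partial_x \xi\,\partial_\xi x = 1 .
\end{align*}
```

So the symbol of $(1/h)$ times the commutator is $1/i = (1/i)\{a, b\}$, in agreement with the rule; in this example there are no higher-order corrections in $h$.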
About global classes The class $S^0$ is far from sufficient for analyzing the global spectral problem, and we refer the reader to Hörmander (1984) or Robert (1987) for an extensive presentation of the theory and for the discussion of other quantizations. Our initial operators (think of the harmonic oscillator) do not belong to these classes of pseudodifferential operators. We are consequently obliged to construct more general classes including these examples in order to realize this localization. Once such a class is introduced, one of the main points to consider is the existence of a quasi-inverse (or parametrix) for a suitably defined elliptic operator of positive order. Following Beals–Fefferman (see also the most general Hörmander calculus in Hörmander (1984) and references therein), we introduce a scale function $(x,\xi) \mapsto m(x,\xi;h)$ (possibly $h$-dependent; typically, a power of $h$ times an $h$-independent function $m_0(x,\xi)$) and weight functions $\varphi$ and $\Phi$ such that $\varphi\,\Phi \ge 1$. All these functions are strictly positive and should satisfy additional conditions on their variation and growth. The class of symbols $S_{\rm reg}(m,\varphi,\Phi)$ is defined by
$$|D_x^\alpha D_\xi^\beta\, p(x,\xi;h)| \le C_{\alpha,\beta}\; m(x,\xi;h)\; \varphi(x,\xi)^{-|\alpha|}\; \Phi(x,\xi)^{-|\beta|}$$
These apparently complicated estimates actually permit the control of the variation of the symbol in reference balls defined by
$$\varphi^{-2}(x_0,\xi_0)\,|x-x_0|^2 + \Phi^{-2}(x_0,\xi_0)\,|\xi-\xi_0|^2 \le c$$
Elliptic theory As noted above, the main point is to have a large class of invertible operators, such that the inverses are also in the class. This is what we call an elliptic theory and the typical statement is:
Theorem 2 Let $P$ be an $h$-pseudodifferential operator associated with a symbol $p$ in $S_{\rm reg}(m,\varphi,\Phi)$. We assume that it is elliptic in the sense that $1/p$ belongs to $S_{\rm reg}(1/m,\varphi,\Phi)$. Then there exists an $h$-pseudodifferential operator $Q$ with symbol in $S_{\rm reg}(1/m,\varphi,\Phi)$, such that
$$QP = I + R, \qquad PQ = I + S$$
The remainders $R$ and $S$ are pseudodifferential operators with symbols in
$$\bigcap_N h^N\, S\big((\varphi\Phi)^{-N},\varphi,\Phi\big)$$
These remainders are called "regularizing." Note that this notion depends strongly on the choice of the class of pseudodifferential operators! When $\varphi = \Phi = 1$, we are just inverting modulo a remainder whose norm in $\mathcal{L}(L^2)$ is $O(h^\infty)$ (or simply $O(h)$ at the first step). With other weights like $\varphi = \Phi = \sqrt{1+|x|^2+|\xi|^2}$, we invert $P$ modulo a remainder, which has, in addition, a distribution kernel in the Schwartz space $\mathcal{S}(\mathbb{R}^n\times\mathbb{R}^n)$. The invertibility modulo a compact operator (which implies the Fredholm property) is a consequence of the assumption
$$\lim_{|x|+|\xi|\to+\infty} \varphi(x,\xi)\,\Phi(x,\xi) = +\infty$$
The proof is rather easy, once the formalism of composition and the notion of principal symbol have been understood. One can indeed start from the operator $Q_0$ of symbol $1/p$ and observe that $Q_0 P = I + R_1$ holds, with $R_1$ in ${\rm Op}^w\big(S(h/(\varphi\Phi),\varphi,\Phi)\big)$. The operator $(I+R_1)^{-1}Q_0 \sim \big(\sum_{j\ge 0}(-1)^j R_1^j\big)\,Q_0$ gives essentially the solution.
Essential Self-Adjointness and Semiboundedness
We now sketch two applications of this calculus in spectral theory. We shall usually consider in our applications an $h$-pseudodifferential operator $P$, whose Weyl symbol $p$ is regular, that is, admitting an asymptotic expansion:
$$p(x,\xi;h) \sim \sum_{j\ge 0} h^j\, p_j(x,\xi)$$ (H0)
(We refer to Robert (1987), Hörmander (1984), and Dimassi and Sjöstrand (1999) for a more precise formulation.) Moreover, we assume that
$$(x,\xi) \mapsto p(x,\xi;h) \in \mathbb{R}$$ (H1)
This implies, as can be immediately seen from [3], that $p^w$ is symmetric (= formally self-adjoint):
$$\langle p^w u, v\rangle_{L^2} = \langle u, p^w v\rangle_{L^2}, \qquad \forall u, v \in \mathcal{S}(\mathbb{R}^n)$$
The third assumption is that the principal symbol is bounded from below (and there is no restriction to assume that it is positive)
$$p_0(x,\xi) \ge 0$$ (H2)
This assumption implies that the operator itself is bounded from below. This result belongs to the family of the so-called Gårding inequalities. More precisely, the assumption (there are other quantizations, e.g., the anti-Wick quantization, for which this result becomes trivial, the difference between the two quantizations being $O(h)$) will basically give, if $m \le 1$, the existence of a constant $C$ such that, for any $u \in \mathcal{S}(\mathbb{R}^n)$,
$$\langle Pu, u\rangle_{L^2} \ge -C\,h\,\|u\|^2_{L^2}$$
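Everything is proved if $m(x,\xi) = (p_0+1)$ is a scale function, if the $p_j$ and their derivatives are controlled by $(p_0+1)$:
$$|\partial_x^\alpha \partial_\xi^\beta\, p_j(x,\xi)| \le C_{\alpha,\beta,j}\,(p_0+1)\,\varphi(x,\xi)^{-|\alpha|}\,\Phi(x,\xi)^{-|\beta|}$$ (H3)
for all $(\alpha,\beta) \in \mathbb{N}^n\times\mathbb{N}^n$, and if there is a suitable control of the family ($N \in \mathbb{N}$) of symbols
$$\frac{p - \sum_{j=0}^{N} h^j\, p_j}{(p_0+1)\,h^{N+1}}$$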
Under these assumptions, the main result is that $P$ is, for $h$ small enough, essentially self-adjoint. This means that the operator which was initially defined on $\mathcal{S}(\mathbb{R}^n)$ by the pseudodifferential operator with symbol $p$ admits a unique self-adjoint extension.
The Functional Calculus
It is well known by the spectral theorem for a self-adjoint operator $P$ that a functional calculus exists for Borel functions. What is important here is to find a class of functions (actually essentially $C_0^\infty$) such that $f(P)$ is a pseudodifferential operator in the same class as $P$ with simple rules of computation for the principal symbol. We start from the general formula (see Dimassi and Sjöstrand (1999))
$$f(P) = -\pi^{-1}\lim_{\varepsilon\to 0^+}\iint_{|{\rm Im}\,z|\ge\varepsilon} \frac{\partial\tilde f}{\partial\bar z}(x,y)\,(z-P)^{-1}\,dx\,dy$$
which is true for any self-adjoint operator and any $f$ in $C_0^\infty(\mathbb{R})$. Here the function $(x,y)\mapsto\tilde f(x,y)$ (note that $z = x+iy$) is a compactly supported, almost analytic extension of $f$ to $\mathbb{C}$. This means that $\tilde f = f$ on $\mathbb{R}$ and that for any $N\in\mathbb{N}$ there exists a constant $C_N$ such that
$$\left|\frac{\partial\tilde f}{\partial\bar z}(z)\right| \le C_N\,|{\rm Im}\,z|^N$$
The main result due to Helffer–Robert (see also Dimassi and Sjöstrand (1999) and references therein) is that, for $P$ an $h$-regular pseudodifferential operator satisfying (H0)–(H3) and $f$ in $C_0^\infty(\mathbb{R})$, the operator $f(P)$ is an $h$-pseudodifferential operator, whose Weyl symbol $p_f(x,\xi;h)$ admits a formal expansion in powers of $h$:
$$p_f(x,\xi;h) \sim \sum_{j\ge 0} h^j\, p_{f,j}(x,\xi)$$
with
$$p_{f,0} = f(p_0), \qquad p_{f,1} = p_1\, f'(p_0), \qquad p_{f,j} = \sum_{k=1}^{2j-1} (-1)^k (k!)^{-1}\, d_{j,k}\, f^{(k)}(p_0), \quad \forall j \ge 2$$
where the $d_{j,k}$ are universal polynomial functions of the symbols $\partial_x^\alpha\partial_\xi^\beta p_\ell$, with $|\alpha|+|\beta|+\ell \le j$. The main point in the proof is that we can construct, for ${\rm Im}\,z \ne 0$, a parametrix (= approximate inverse) for $(P - z)$ with a nice control as ${\rm Im}\,z \to 0$. The constants controlling the estimates on the symbols explode as ${\rm Im}\,z \to 0$, but the choice of the almost analytic extension of $f$ absorbs any negative power of $|{\rm Im}\,z|$. As a consequence, we get that if, for some interval $I$ and some $\varepsilon_0 > 0$,
$$p_0^{-1}(I + [-\varepsilon_0, \varepsilon_0]) \text{ is compact}$$ (H4)
then the spectrum is, for $h$ small enough, discrete in $I$. In particular, we get that, if $p_0(x,\xi) \to +\infty$ as $|x|+|\xi| \to +\infty$, then the spectrum of $P_h$ is discrete ($P_h$ has compact resolvent). Under the assumption (H4), we get more precisely the following theorem.
Theorem 3 Let $P$ be an $h$-regular pseudodifferential operator satisfying (H0)–(H4), with $I = [E_1, E_2]$; then, for any $g$ in $C_0^\infty([E_1, E_2])$, we have the following expansion in powers of $h$:
$$\mathrm{tr}\,[g(P(h))] \sim h^{-n} \sum_{j\ge 0} h^j\, T_j(g), \qquad \text{as } h \to 0$$
where $g \mapsto T_j(g)$ are distributions in $\mathcal{D}'(]E_1, E_2[)$. In particular, we have
$$T_0(g) = (2\pi)^{-n} \iint g(p_0(x,\xi))\,dx\,d\xi, \qquad T_1(g) = (2\pi)^{-n} \iint g'(p_0(x,\xi))\,p_1(x,\xi)\,dx\,d\xi$$
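The leading term of this expansion can be checked by hand on the one-dimensional harmonic oscillator $\frac{1}{2}(-h^2 d^2/dx^2 + x^2)$, whose eigenvalues $(j + 1/2)h$ are quoted later in this article and whose Weyl symbol is $p_0 = \frac{1}{2}(\xi^2 + x^2)$ with $p_1 = 0$. The sketch below is an illustration under stated assumptions, not part of the original text: the bump function `g`, its support $[0.5, 2.5]$, and all function names are hypothetical choices. In polar coordinates $dx\,d\xi = 2\pi\,dE$, so $T_0(g) = \int g(E)\,dE$, and the code compares $h\,\mathrm{tr}\,g(P(h))$ with that integral.

```python
import math


def g(E, lo=0.5, hi=2.5):
    """Smooth test function with compact support in [lo, hi] (illustrative choice)."""
    if E <= lo or E >= hi:
        return 0.0
    s = (E - lo) / (hi - lo)          # rescale the support to (0, 1)
    return math.exp(-1.0 / (s * (1.0 - s)))


def trace_g(h, hi=2.5):
    """tr g(P(h)) for the 1D harmonic oscillator: sum of g over the levels (j + 1/2) h."""
    n_levels = int(hi / h) + 1        # levels above hi contribute nothing
    return sum(g((j + 0.5) * h) for j in range(n_levels))


def leading_coefficient(samples=200_000, lo=0.5, hi=2.5):
    """T_0(g) = (2 pi)^(-1) * double integral of g(p_0) dx dxi = integral of g(E) dE,
    approximated with a fine midpoint rule."""
    step = (hi - lo) / samples
    return step * sum(g(lo + (k + 0.5) * step) for k in range(samples))


T0 = leading_coefficient()
for h in (0.1, 0.05, 0.01):
    # h * tr g(P(h)) should approach T_0(g) as h -> 0 (here n = 1).
    print(h, h * trace_g(h), T0)
```

Because $p_1 = 0$ in this example, the term $T_1(g)$ vanishes, and the agreement improves rapidly as $h$ decreases.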
This theorem is just obtained by integration of the preceding one, because in these cases the trace of a trace-class pseudodifferential operator ${\rm Op}^w(a)$ is given by the integral of the symbol $a$ over $\mathbb{R}^{2n} = \mathbb{R}^n_x \times \mathbb{R}^n_\xi$. According to [3], the distribution kernel is given by the oscillatory integral:
$$K(x,y;h) = (2\pi h)^{-n} \int_{\mathbb{R}^n} \exp\Big(\frac{i}{h}(x-y)\cdot\xi\Big)\, a\Big(\frac{x+y}{2},\xi;h\Big)\,d\xi$$ [8]
and the trace of ${\rm Op}^w(a)$ is the integral over $\mathbb{R}^n$ of the restriction to the diagonal of the distribution kernel:
$$K(x,x) = (2\pi h)^{-n} \int_{\mathbb{R}^n} a(x,\xi;h)\,d\xi$$
Of course, one could think of using the theorem with $g$, the characteristic function of an interval, in order to get, for example, the behavior of the counting function attached to this interval. This is of course not directly possible and this will be obtained only through Tauberian theorems (Hörmander (1968), (1984), Ivrii (1998)) and at the price of additional errors. Let us, however, remark that, if the function $g$ is not regular, then the length of the expansion depends on the regularity of $g$. So it will not be surprising that, by looking at the Riesz means, we shall get a better expansion when $s$ is large. Anyway, one basic interest of functional calculus is to permit a localization in the energy of the operator. For a general $h$-pseudodifferential operator, it could be difficult to approximate an operator like $\exp(-itP/h)$ by suitable Fourier integral operators, but approximating $\exp(-itP/h)f(P)$ for suitable compactly supported $f$ could be easier. Another interest is that for suitable $f$ (possibly $h$-dependent) the operator $f(P)$ could have better properties than the initial operator. This idea will, for example, be applied for the theorem concerning clustering. It appears, in particular, very powerful in dimension 1, where we can in some interval of energy find a function $t \mapsto f(t;h)$ admitting an expansion in powers of $h$ such that $f(P;h)$ has the spectrum of the harmonic oscillator. This is a way to get the Bohr–Sommerfeld conditions (see Helffer–Robert (1987), together with Maslov (1972), Leray (1981), or the thesis of A Voros in 1977), which read:
$$f(\lambda_n(h);h) \sim (2n+1)\,h \qquad \text{modulo } O(h^\infty)$$
h-Fourier Integral Operators and Evolution Operators
Classical Mechanics
Let us come back to the Hamilton equations [1]. The local existence of solutions is well known. If, in addition, we assume (H4), the energy conservation law implies global existence for these solutions, if the initial data $(y,\eta)$ belong to $p^{-1}(I)$. We recall that $(y,\eta) \mapsto \phi_t(y,\eta) = (x(t,y,\eta), \xi(t,y,\eta))$ defines for any $t$ a canonical transformation, that is, a diffeomorphism respecting the symplectic 2-form:
$$\omega = \sum_j d\xi_j \wedge dx_j$$
We shall denote by $\Lambda_t$ the graph of $\phi_t$, which is a Lagrangian submanifold (which means that at each point $m$ of the manifold the restriction of the symplectic two-form to $T_m\Lambda_t$ is 0) for the 2-form on $\mathbb{R}^{2n}\times\mathbb{R}^{2n}$: $\sum_j d\eta_j\wedge dy_j - \sum_j d\xi_j\wedge dx_j$. When the projection $(y,\eta,x,\xi)\mapsto(\eta,x)$ gives a local system of coordinates for $\Lambda_t$ (and this will always be the case for $(\eta,x)$ in a compact set and $t$ small enough), one easily finds, using the Lagrangian character of $\Lambda_t$, a function $(\eta,x)\mapsto S_t(x,\eta)$ such that
$$\Lambda_t = \{(y,\eta,x,\xi) \mid y = \partial_\eta S_t,\ \xi = \partial_x S_t\}$$
This function is only defined modulo an arbitrary function of $t$. In order to get a more natural choice, we consider the Lagrangian submanifold in $\mathbb{R}^{2n}\times\mathbb{R}^{2n}\times\mathbb{R}^2$ defined as
$$\Lambda = \{(y,\eta,x,\xi,t,\tau) \mid (x,\xi) = \phi_t(y,\eta),\ \tau = -p_0(x,\xi)\}$$ [9]
The parametrization of $\Lambda$, by its projection $(y,\eta,x,\xi,t,\tau)\mapsto(\eta,x,t)$, will now give a natural function $(\eta,x,t)\mapsto S(t,x,\eta) = S_t(x,\eta)$ describing $\Lambda$ by
$$\Lambda = \{(y,\eta,x,\xi,t,\tau) \mid \xi = \partial_x S,\ y = \partial_\eta S,\ \tau = \partial_t S\}$$ [10]
We observe that we can choose
$$S(0,x,\eta) = x\cdot\eta$$ [11]
and that $S$ is automatically a solution of the Hamilton–Jacobi equation
$$(\partial_t S)(t,x,\eta) + p_0(x,\partial_x S(t,x,\eta)) = 0$$ [12]
also called the eiconal equation. We also observe the following property (by comparison of [9] and [10]):
$$\phi_t(\partial_\eta S(t,x,\eta),\eta) = (x,\partial_x S(t,x,\eta))$$
We have actually an explicit expression of $S(t,x,\eta)$ in terms of the inverse $y(t,x,\eta)$ of the map $y\mapsto x(t,y,\eta)$:
$$S(t,x,\eta) = y(t,x,\eta)\cdot\eta + \int_0^t \Big[\sum_i \xi_i(s,y,\eta)\,(\partial_{\xi_i}p)(x(s,y,\eta),\xi(s,y,\eta)) - p(y,\eta)\Big]\,ds\ \Big|_{y=y(t,x,\eta)}$$
For the harmonic oscillator, easy computations give
$$p(x,\xi) = \tfrac{1}{2}(\xi^2 + x^2), \qquad H_p = (\xi, -x)$$
$$\phi_t(y,\eta) = (y\cos t + \eta\sin t,\ -y\sin t + \eta\cos t)$$
and
$$S(t,x,\eta) = -\tfrac{1}{2}(x^2+\eta^2)\tan t + \frac{x\,\eta}{\cos t}$$
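The harmonic-oscillator formulas can be verified numerically. The sketch below assumes the signs $S(t,x,\eta) = -\frac{1}{2}(x^2+\eta^2)\tan t + x\eta/\cos t$ and $\phi_t(y,\eta) = (y\cos t + \eta\sin t,\ -y\sin t + \eta\cos t)$, which are the ones compatible with the Hamilton–Jacobi equation [12] and the initial condition [11]; the sample point and helper names are hypothetical. It checks [11], [12], and the relation $\phi_t(\partial_\eta S, \eta) = (x, \partial_x S)$ with central finite differences.

```python
import math


def flow(t, y, eta):
    """Hamiltonian flow of p = (xi^2 + x^2) / 2 starting from (y, eta)."""
    return (y * math.cos(t) + eta * math.sin(t),
            -y * math.sin(t) + eta * math.cos(t))


def S(t, x, eta):
    """Generating function of the flow, valid for |t| < pi / 2."""
    return -0.5 * (x * x + eta * eta) * math.tan(t) + x * eta / math.cos(t)


def p(x, xi):
    """Symbol of the harmonic oscillator."""
    return 0.5 * (xi * xi + x * x)


def partial(f, args, index, eps=1e-6):
    """Central finite difference of f with respect to argument number `index`."""
    up = list(args)
    down = list(args)
    up[index] += eps
    down[index] -= eps
    return (f(*up) - f(*down)) / (2 * eps)


# Arbitrary sample point (hypothetical values), with t small enough that cos t != 0.
t, x, eta = 0.4, 0.7, -1.1

dS_dt = partial(S, (t, x, eta), 0)
dS_dx = partial(S, (t, x, eta), 1)
dS_deta = partial(S, (t, x, eta), 2)

hj_residual = dS_dt + p(x, dS_dx)         # Hamilton-Jacobi equation [12]
initial_gap = S(0.0, x, eta) - x * eta    # initial condition [11]
x_back, xi_back = flow(t, dS_deta, eta)   # should reproduce (x, d_x S)
```

All three quantities `hj_residual`, `initial_gap`, and the differences `x_back - x`, `xi_back - dS_dx` vanish up to finite-difference error.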
Fourier Integral Operators
We have already given in [8] the distribution kernel of an $h$-pseudodifferential operator. It appears useful to generalize this point of view by considering more generally objects defined similarly as
$$K(x,y;h) = (2\pi h)^{-r} \int_{\mathbb{R}^N} \exp\Big(\frac{i}{h}\,\psi(x,y,\theta)\Big)\, a(x,y,\theta;h)\,d\theta$$
There are a lot of examples entering in this framework. The representation of the metaplectic group in $L^2(\mathbb{R}^n)$ appears to be in this class, with the specificity that the phase is quadratic (Guillemin and Sternberg (1977)). A quite elementary case corresponds to the case when $N = 0$ and $\psi(x,y) = x\cdot y$. No $\theta$ variable is present and so no integration appears with respect to $\theta$. When $a = 1$, this defines essentially the Fourier transform. Under suitable conditions on $\psi$ and $a$, one can show that the associated operators are continuous on $\mathcal{S}(\mathbb{R}^n)$ (this is, of course, the case for the Fourier transform). This was done by Asada and Fujiwara, who transposed the theory developed by Hörmander (1971) to this context, and we should also mention the older (but more formal) work by Maslov (1972) (see also Leray (1981)). We actually do not need it in the semiclassical context because the case when the amplitude has compact support is sufficient. The basic object is first to look, thinking of the stationary-phase theorem, which gives the main contribution as $h \to 0$ in this "formal integral" (see Stationary Phase Approximation), at the critical set $C_\psi$:
$$C_\psi = \{(x,y,\theta) \in \mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^N \mid (\partial_\theta\psi)(x,y,\theta) = 0\}$$
In the case of a pseudodifferential operator, we find that it is included in $\{x = y\}$. Then we associate the canonical object, which is a Lagrangian submanifold called $\Lambda_\psi$ and defined as
$$\Lambda_\psi = \{(x,\xi,y,\eta) \mid \exists\,\theta \text{ s.t. } \xi = \nabla_x\psi(x,y,\theta),\ \eta = -\nabla_y\psi(x,y,\theta),\ \nabla_\theta\psi(x,y,\theta) = 0\}$$
The assumptions on $\psi$ (which are omitted here) are given in order to get that $\Lambda_\psi$ is a regular manifold at least on the support of $a$. The associated operators are called Fourier integral operators (FIOs). L. Hörmander (1971, 1984) has developed a general and more intrinsic machinery but with a homogeneity condition on the phase which is irrelevant in the semiclassical context. This theory permits also the reduction to normal forms for Hamiltonians in continuation of what can be done in classical mechanics.
Quantum Evolution
We just sketch how one approximates the operator $\exp(-itP/h)$ by an FIO. The formal construction is probably rather old (Maslov 1972, Fedoryuk and Maslov 1981) but the rigorous approach with estimates of the remainders was first considered by J Chazarain with rather strong assumptions. It was later realized that we need only a local approximation of this operator and everything becomes easier. The first approach followed by Helffer–Robert (see Robert (1987)) is to localize in energy, within the functional calculus associated to the operator $P$. If $I$ is an interval and $\chi$ has compact support in $I$, it appears to be easier to approximate $\exp(-itP/h)\chi(P)$ when $P$ satisfies (H4) in a neighborhood of $I$. We do not need any more assumptions at $\infty$ and the composition by $\chi(P)$ localizes the construction. Although this construction is simple because we remain within a functional calculus which involves only functions of $P$, it is not always sufficient to localize in energy. We have then to localize through more general $h$-pseudodifferential operators and consider $\exp(-itP/h)\,a^w(x,hD_x)$, where $a$ is a symbol with compact support. We shall quickly develop the first approach. The result is that one can approximate $U_\chi(t) := \chi(P)\exp(-itP/h)$ by a Fourier integral operator of the form
$$K_\chi(t,x,y;h) = (2\pi h)^{-n} \int^{\rm osc} \exp\Big(\frac{i}{h}\big(S(t,x,\eta) - y\cdot\eta\big)\Big)\, d_\chi(t,x,\eta;h)\,d\eta$$
with $d_\chi \sim \sum_j d_{\chi,j}\,h^j$, in order to have
$$\Big\|\chi(P)\exp\Big(-\frac{itP}{h}\Big) - K_\chi(t)\Big\|_{\mathcal{L}(L^2)} = O(h^\infty)$$
Writing that $U_\chi(t)$ is a solution of $(hD_t + P)U_\chi = 0$, $U_\chi(0) = \chi(P)$, and expanding in powers of $h$, one gets a sequence of equations permitting one to determine the symbols recursively. The first one was analyzed in [12] and reads, in the case when $P = -h^2\Delta + V$:
$$(\partial_t S)(t,x,\eta) + |\nabla_x S(t,x,\eta)|^2 + V(x) = 0$$
with the initial condition $S(0,x,\eta) = x\cdot\eta$. This has been solved for $t$ small enough. The other equations are called transport equations. The first one is, for $a(t,x,\eta) = d_{\chi,0}(t,x,\eta)$,
$$\partial_t a + (\partial_\xi p_0)(x,\partial_x S(t,x,\eta))\cdot\partial_x a + c\,a = f$$
with initial condition $a(0,x,\eta) = \chi(p_0(x,\eta))$. This type of equation is easily solved by integration along the integral curves of the vector field $\partial_t + (\partial_\xi p_0)(x,\partial_x S(t,x,\eta))\cdot\partial_x$.
Applications
The Frequency Set
One has already met the question of localization of the eigenfunctions. It appears important to give this localization, not only in position (in a domain of $\mathbb{R}^n$) but directly in the phase space. This can be described by the notion of frequency set attached to a bounded family $u_h$ of functions in $L^2(\mathbb{R}^n)$ (or more generally of distributions in $\mathcal{S}'(\mathbb{R}^n)$). Here $h$ belongs to an interval $(0, h_0]$ or more generally to a subset of $\mathbb{R}^+$ having 0 as accumulation point.
Definition 4 We shall say that $(x_0,\xi_0) \in \mathbb{R}^n\times\mathbb{R}^n$ does not belong to the frequency set of the family $u_h$ and write $(x_0,\xi_0) \notin FS(u_h)$, if there exists a compactly supported function $\chi$ equal to 1 in a neighborhood of $x_0$ and a neighborhood $V$ of $\xi_0$ in which the $h$-Fourier transform of $\chi u_h$ satisfies, as $h \to 0$,
$$\int \exp\Big(-\frac{i\,x\cdot\xi}{h}\Big)\,\chi(x)\,u_h(x)\,dx = O(h^\infty) \quad \text{in } V$$
For example, the frequency set $FS(u_h)$ of $u_h(x) = a(x)\exp(i\psi(x)/h)$ with $a$ compactly supported is contained in $\{(x,\xi) \mid x \in {\rm supp}\,a,\ \xi = \nabla_x\psi(x)\}$, and the frequency set of the coherent state,
$$x \mapsto c_{y,\eta,h}(x) = h^{-n/4}\exp\Big(\frac{i\,(x-y)\cdot\eta}{h}\Big)\exp\Big(-\frac{(x-y)^2}{h}\Big)$$
is reduced to a point $(y,\eta)$. In this semiclassical context, this notion seems to have been introduced by Guillemin and Sternberg (1977) and is further discussed in the book of Robert (1987) (see references therein). This is the semiclassical analog of the well-known notion of wave front set of a distribution introduced by Hörmander (1984) in the $C^\infty$-category for describing the singularities of a distribution, but note that a major difference is that the frequency set is attached to a family. If $P$ is an $h$-pseudodifferential operator with symbol in $S^0$, it is possible, as a consequence of the elliptic theory, to prove that $FS(Pu^{(h)}) \subset FS(u^{(h)})$. For an FIO $F$ attached to a canonical relation $\kappa$, we get similarly $FS(Fu^{(h)}) \subset \kappa(FS(u^{(h)}))$. We also get a microlocal version of the localization result for the eigenfunctions mentioned in the first section (using again the parametrix construction).
Theorem 5 Let $E$ be in $I$ and let $(\lambda(h_j), u^{(h_j)}(x))$ be a sequence in $I\times L^2(\mathbb{R}^n)$, where $\lambda(h_j) \to E$ and $h_j \to 0$ as $j \to \infty$, and $x \mapsto u^{(h_j)}(x)$ is an eigenfunction associated with $\lambda(h_j)$ with norm 1. Then
$$FS(u^{(h_j)}) \subset p_0^{-1}(E)$$
Moreover, the frequency set of the family $u^{(h_j)}$ is invariant under the Hamiltonian flow $\phi_t$.
The last statement in the above theorem is the analog of the theorem on the propagation of singularities for the solution of a partial differential equation (PDE) (see Hörmander (1984)) and is a consequence of the Egorov theorem, which will be presented in the next subsection. Another remarkable property is that (see, e.g., the report on the lecture of T Paul in Rauch and Simon (1997)), say, in the case of dimension 1, when $P$ is a harmonic oscillator, $\exp(-itP/h)\,c_{y,\eta,h}$ is a coherent state attached to $\phi_t(y,\eta)$.
Egorov's Theorem
Egorov's theorem plays a central role in the classical theory of PDE by permitting one to reduce the study of general differential operators to the study of simpler model operators, the simplest one being $\partial/\partial x_n$ (see Hörmander (1984)). We use it here in a simple form, given in the semiclassical context by Robert (1987), and which will play an important role in the study of ergodic situations (see Quantum Ergodicity and Mixing of Eigenfunctions, and references therein). The theorem is the following:
Theorem 6 Let $P$ satisfy assumptions (H0)–(H3). For all $a$ in $S^0$ with compact support and all $t \in \mathbb{R}$, we have
$$\Big\|\exp\Big(i\frac{t}{h}P\Big)\,a^w(x,hD_x)\,\exp\Big(-i\frac{t}{h}P\Big) - a_t^w(x,hD_x)\Big\|_{\mathcal{L}(L^2)} = O(h)$$
where $a_t(x,\xi) = a(\phi_t(x,\xi))$ and $\phi_t$ is the flow of $H_{p_0}$, where $p_0$ is the principal symbol of $P$.
The proof is based on the study of the operator $\exp(i(t/h)P)\,a^w(x,hD_x)\,\exp(-i(t/h)P)$, which appears as the composition of three FIOs. But the Lagrangian manifold associated with this composition is the graph of the identity, and this is consequently a pseudodifferential operator whose "principal" symbol can be computed modulo $O(h)$ as $a(\phi_t(x,\xi))$. As an immediate consequence, $FS(\exp(-itP/h)u_h) = \phi_t(FS(u_h))$.
The Poisson Relation
We start from the harmonic oscillator
$$H(h) = \frac{1}{2}\Big(-h^2\frac{d^2}{dx^2} + x^2\Big)$$
Its spectrum is given by $(n + 1/2)h$ ($n \ge 0$). Its symbol is $a_0(x,\xi) = (1/2)(\xi^2 + x^2)$ and the corresponding flow, for any strictly positive level $E$, is periodic with primitive period $2\pi$. The quantity we are interested in is
$$S_h(t) := \sum_{j\in\mathbb{N}} \chi\big((j+\tfrac{1}{2})h\big)\,\exp\big(-it(j+\tfrac{1}{2})\big)$$
Using the classical Poisson relation,
$$\sum_{k\in\mathbb{Z}} \hat f(k)\exp(ikx) = (2\pi)\sum_{k\in\mathbb{Z}} f(x + 2\pi k)$$
one shows rather easily that the frequency set of $S_h$ is
$$FS(S_h) = \{(2\pi k,\tau) \mid -\tau > 0,\ -\tau \in {\rm supp}\,\chi,\ k \in \mathbb{Z}\} \cup (\mathbb{R}\times\{0\})$$
This admits the following generalization, initiated in this context by Chazarain.
Theorem 7 Let $P$ satisfy (H0)–(H4). Let $\chi$ be a function with compact support in $I$ and let $t \mapsto f(t;h)$ be the family of distributions defined by
$$f(t;h) = \mathrm{tr}\Big(\exp\Big(-\frac{itP}{h}\Big)\chi(P)\Big)$$
Then $FS(f)$ is contained in
$$\{(t,\tau) \mid -\tau \in {\rm supp}(\chi) \text{ and } \exists\,(x,\xi) \text{ s.t. } p_0(x,\xi) = -\tau,\ \phi_t(x,\xi) = (x,\xi)\}$$
According to the definition, we have to study
$$\int \exp\Big(-\frac{it\tau}{h}\Big)\,\rho(t)\,f(t;h)\,dt$$
for a compactly supported cutoff $\rho$. This takes the form
$$\int c(t,x,\eta)\,\exp\Big(\frac{i}{h}\big(-t\tau + S(t,x,\eta) - x\cdot\eta\big)\Big)\,dt\,dx\,d\eta$$
and can be analyzed by a nonstationary-phase theorem, in order to determine for which value of $\tau$ the quantity is $O(h^\infty)$.
Gutzwiller's Formula
The Gutzwiller formula was established formally by Gutzwiller (1971). It then appears in the context of high-energy spectral asymptotics in contributions of Colin de Verdière, Chazarain, and Duistermaat and Guillemin (see Duistermaat and Guillemin (1975), Hörmander (1984), Guillemin and Sternberg (1977); see also Semi-Classical Spectra and Closed Orbits and Quantum Ergodicity and Mixing of Eigenfunctions). In the semiclassical context, the simplest statement (cf. Chazarain, Helffer–Robert, Guillemin–Uribe, Meinrenken, Paul–Uribe, Dozias, Combescure–Ralston–Robert; see Robert (1987), Rauch and Simon (1997), Dimassi and Sjöstrand (1999), and the recent article by Combescure et al. (1999) for techniques involving coherent states) can be presented in the following way. For a noncritical $E$, we introduce the energy surface $W_E = \{w \in T^*\mathbb{R}^n \mid p_0(w) = E\}$. Let $P(h)$ be an $h$-pseudodifferential operator satisfying (H0)–(H4), with $I = \{E\}$. We also assume that
(Cl) The restriction of the flow $\phi_t^{p_0}$ to $W_E$ is clean.
(A flow $\phi_t$, associated with a $C^\infty$ vector field $X$ on a manifold $W$, is called clean if the two following properties are satisfied: the set $\Gamma = \{(t,w) \in \mathbb{R}\times W \mid \phi_t(w) = w\}$ is a submanifold of $\mathbb{R}\times W$; at each point $m = (t,w)$ of $\Gamma$, the tangent space to $\Gamma$ is given by $T_m\Gamma = \{(\tau,v) \in \mathbb{R}\times T_wW \mid \tau X(w) + (D\phi_t)(w)\,v = v\}$.)
Then there exists a sequence of distributions $\gamma_k \in \mathcal{D}'(\mathbb{R})$, such that, for all $\rho \in \mathcal{S}(\mathbb{R})$ with compactly supported Fourier transform, we have the asymptotic expansion in powers of $h$:
$$\sum_{\lambda_j(h)\in[E-\varepsilon_0/2,\ E+\varepsilon_0/2]} \rho\big(h^{-1}(\lambda_j(h) - E)\big) \sim \sum_{j=0}^{\infty} \gamma_j(\hat\rho)\,h^{-n+1+j}$$ [13]
Moreover, the supports of the distributions are contained in the set of the periods of the periodic trajectories of the flow contained in WE . Actually, the proof gives more information on the structure of the different distributions. Let us just write the formula for 0 : ! Z d n=2 dxd 0 0 ¼ ðÞ d pðx;Þ ¼E
where $\delta_0$ is the Dirac measure at 0.
Clustering of Eigenvalues
We shall mention one typical result due to Chazarain–Helffer–Robert in this context, but inspired by previous results obtained for the Laplacian on compact manifolds (see Semi-Classical Spectra and Closed Orbits, Quantum Ergodicity and Mixing of Eigenfunctions and references therein, including Chazarain, Duistermaat–Guillemin, and Colin de Verdière).
Clustering means that the spectrum is concentrated around a specific sequence tending to $\infty$. This was observed in the case of the Laplacian on the sphere $S^{n-1}$ by explicit computations. Here we assume that, with $I = [E_1, E_2]$, the conditions (H0)–(H4) are satisfied and that
(H5) $[E_1, E_2]$ does not meet the set of critical values of $p_0$.
(H6) $\forall E \in [E_1 - \varepsilon, E_2 + \varepsilon]$, $W_E$ is connected.
(H7) $\forall E \in [E_1 - \varepsilon, E_2 + \varepsilon]$, the Hamiltonian flow associated with $p_0$ is periodic, with period $T(E) > 0$, on $W_E$ (with $T(E)$ bounded).
(H8) $\forall E \in [E_1 - \varepsilon, E_2 + \varepsilon]$, the subprincipal symbol $p_1$ vanishes on $W_E$.
Then, under these conditions, one first observes that for a suitable $C^\infty$-function $f$ defined in a neighborhood of $[E_1, E_2]$, the period of the Hamiltonian flow associated with $f(p_0)$ can be chosen as constant and equal to $2\pi$. Extending the function $f$ suitably, one can then state the following result of Chazarain–Helffer–Robert:
Theorem 8 There exist $h_0$ and $C$ such that, for $0 < h \le h_0$,
$$\sigma\big(f(P(h))\big) \cap [E_1, E_2] \subset \bigcup_{k\in\mathbb{Z}} I_k(h)$$
where S h þ kh Ch2 ; Ik ðhÞ ¼ 2 4 S h þ kh þ Ch2 2 4 Z S¼ dx 2E
for some (hence for any) periodic trajectory $\gamma$ of period $2\pi$, and $\alpha$ is the Maslov index of this trajectory. Moreover, one can compute the multiplicity in each of the intervals $I_k$. The property remains true (e.g., Dozias proved this (see Rauch and Simon (1997))) in the case when the assumption is made only for one energy $E$, but in intervals $[E - ah, E + ah]$, where $a$ can be large but $h$ is small enough.
Remark 1 These results appear first in the context of high energy for Laplacians on compact manifolds. After illuminating contributions by physicists like Balian–Bloch, the main ideas (see the presentation in Semi-Classical Spectra and Closed Orbits) appear in the works of Colin de Verdière, Chazarain, Duistermaat–Guillemin (1975), and Weinstein (see also Hörmander (1984) and Quantum Ergodicity and Mixing of Eigenfunctions). The proof given in the semiclassical context is actually more general (it contains the case of the Laplacian on a Riemannian manifold) and shows that the results are true for general Hamiltonians.
Remark 2 (the case of dimension 1). In this particular case, the flow is periodic and the above theorem gives the localization of the spectrum predicted by the Bohr–Sommerfeld relations, and the computation of the multiplicity gives $n_k(h) = 1$ for $h$ small enough. This point of view was developed by Helffer–Robert (1987) (see Semi-Classical Spectra and Closed Orbits). Similar properties have been extended to the case of integrable systems by Colin de Verdière in the high-energy context and in the semiclassical context by Charbonnel and Ivrii (see Ivrii (1998), Dimassi and Sjöstrand (1999), and references therein).
Remark 3 Another interesting application of semiclassical analysis concerns the Schnirelman theorem treating the case when the flow is ergodic. We refer the reader to Quantum Ergodicity and Mixing of Eigenfunctions for references and to Helffer–Martinez–Robert (see Rauch and Simon (1997) for references) for the specific statement for general Hamiltonians in semiclassical analysis.
Conclusions and Suggestions for Further Reading
In this brief survey we have tried to present some of the foundational techniques appearing in "mathematical" semiclassical analysis. Of course, this is very limited, and semiclassical methods go far beyond the verification of the correspondence principle. One can refer to semiclassical analysis for many other problems where the same analysis (with a small parameter $h$) is relevant but where $h$ is no longer the Planck constant. This could be a flux (Harper's equation) or the inverse of a flux, the inverse of a mass (Born–Oppenheimer approximation), of an energy, or of a number of particles. We have not developed this point of view here. The books given in the bibliography will allow the reader to discover other fields. The books by Robert (1987), Helffer (1988) and Dimassi and Sjöstrand (1999) present the basic statements of the theory. The book by Martinez (2002) is more "microlocal" in spirit. The lectures published in Rauch and Simon (1997) give a rather good idea of the state of the art in the middle of the 1990s, and we also refer the reader to other articles in this encyclopedia for the presentation of resonances (see Resonances), spectral problems connected with ergodicity (see Quantum Ergodicity and Mixing of Eigenfunctions), Kolmogorov–Arnol'd–Moser theory (see Normal Forms and Semi-Classical Approximation), and trace formulas (see Semi-Classical Spectra and Closed Orbits). The book by Ivrii (1998) gives the most sophisticated theorems on the counting functions (including boundaries, singularities, . . .) but is written only for specialists.
See also: Normal Forms and Semiclassical Approximation; Quantum Ergodicity and Mixing of Eigenfunctions; Resonances; Schrödinger Operators; Semiclassical Spectra and Closed Orbits; Stability of Matter; Stationary Phase Approximation.
Further Reading
Combescure M, Ralston J, and Robert D (1999) A proof of the Gutzwiller semi-classical trace formula using coherent states decomposition. Communications in Mathematical Physics 202(2): 463–480.
Dimassi M and Sjöstrand J (1999) Spectral Asymptotics in the Semi-Classical Limit, London Mathematical Society Lecture Notes Series, vol. 268. Cambridge: Cambridge University Press.
Duistermaat JJ (1973) Fourier Integral Operators. Courant Institute of Mathematical Sciences. New York: New York University.
Duistermaat JJ and Guillemin VW (1975) The spectrum of positive elliptic operators and periodic bicharacteristics. Inventiones Mathematicae 29: 39–79.
Fedoryuk MV and Maslov VP (1981) Semi-Classical Approximation in Quantum Mechanics. Dordrecht: Reidel.
Guillemin V and Sternberg S (1977) Geometric Asymptotics, American Mathematical Surveys, No. 14. Providence, RI: American Mathematical Society.
Gutzwiller M (1971) Periodic orbits and classical quantization conditions. Journal of Mathematical Physics 12: 343–358.
Helffer B (1988) Introduction to the Semi-Classical Analysis for the Schrödinger Operator and Applications, Springer Lecture Notes in Mathematics, No. 1336. Berlin: Springer.
Helffer B and Sjöstrand J (1984) Multiple wells in the semiclassical limit I. Communications in Partial Differential Equations 9(4): 337–408.
Helffer B and Sjöstrand J (1989) Analyse semi-classique pour l'équation de Harper III. Mémoire de la SMF, No. 39. Supplément du Bulletin de la SMF, Tome 117, Fasc. 4.
Hörmander L (1968) The spectral function of an elliptic operator. Acta Mathematica 121: 193–218.
Hörmander L (1971) Fourier integral operators I. Acta Mathematica 127: 79–183.
Hörmander L (1979) The Weyl calculus of pseudodifferential operators. Communications on Pure and Applied Mathematics 32: 359–443.
Hörmander L (1984) The Analysis of Linear Partial Differential Operators, Grundlehren der Mathematischen Wissenschaften. Berlin: Springer.
Ivrii V (1998) Microlocal Analysis and Precise Spectral Asymptotics, Springer Monographs in Mathematics. Berlin: Springer.
Leray J (1981) Lagrangian Analysis and Quantum Mechanics. A Mathematical Structure Related to Asymptotic Expansions and the Maslov Index. Cambridge: MIT Press. (English translation by Carolyn Schroeder.)
Martinez A (2002) An Introduction to Semi-Classical and Microlocal Analysis. Universitext. New York: Springer-Verlag.
Maslov VP (1972) Théorie des perturbations et méthodes asymptotiques. Paris: Dunod Gauthier-Villars.
Rauch J and Simon B (eds.) (1997) Quasiclassical Methods, The IMA Volumes in Mathematics and its Applications, vol. 95. New York: Springer-Verlag.
Robert D (1987) Autour de l'approximation semi-classique, Progress in Mathematics, No. 68. Boston, MA: Birkhäuser.
Hubbard Model
H Tasaki, Gakushuin University, Tokyo, Japan
© 2006 Elsevier Ltd. All rights reserved.
Definitions The Hubbard model is a standard theoretical model for strongly interacting electrons in a solid. It is a minimum model which takes into account both quantum many-body effects and strong nonlinear interaction between electrons. Here we review rigorous results on the Hubbard model, placing main emphasis on magnetic properties of the ground states. Let the lattice be a finite set whose elements x, y, . . . 2 are called sites. Physically speaking, each site corresponds to an atomic site in a crystal. The Hubbard model is based on the simplest tightbinding description of electrons (Figure 1), where a single state is associated with each site. For each x 2 and 2 {", #}, we define the creation and the annihilation operators cyx, and cx, , respectively, for an electron at site x with
spin . (Ay is the adjoint or the Hermitian conjugate of A.) These operators satisfy the canonical anticommutation relations n o cyx; ; cy; ¼ x;y ; n o ½1 cyx; ; cyy; ¼ fcx; ; cy; g ¼ 0 for any x, y 2 and , = ", # , where {A, B} = AB þ BA. The number operator is defined by nx; ¼ cyx; cx;
½2
which has eigenvalues 0 and 1. The Hilbert space of the model is constructed as follows. Let vac be a normalized vector state which satisfies cx, vac = 0 for any x 2 and = ", #. Physically, vac corresponds to a state where there are no electrons in the system. For arbitrary subsets " , # , we define ! ! Y y Y y cx;" cx;# vac ½3 " ;# ¼ x2"
x2#
Hubbard Model
(a)
(b)
(c)
(d) Figure 1 A highly schematic figure which explains the philosophy of tight-binding description. (a) A single atom which has multiple electrons in different orbits. (b) When atoms come together to form a solid, electrons in the black orbits become itinerant, while those in the light gray orbits are still localized at the original atomic sites. Electrons in the gray orbits are mostly localized around the atomic sites, but tunnel to nearby gray orbits with nonnegligible probabilities. (c) We only consider electrons in the gray orbits, which are expected to play essential roles in determining various aspects of low-energy physics of the system. (d) If the gray orbit is nondegenerate, we get a lattice model in which electrons live on lattice sites and hop from one site to another. In a simplified treatment of a metal, the black and the gray orbits correspond to the 4s and the 3d bands, respectively.
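in which sites in $\Lambda_\uparrow$ are occupied by up-spin electrons and sites in $\Lambda_\downarrow$ by down-spin electrons.

We fix the electron number $N_e$, which is an integer satisfying $0 < N_e \le 2|\Lambda|$. (We denote by $|S|$ the number of elements in a set $S$.) The Hilbert space for the system with $N_e$ electrons is spanned by the basis states [3] with all subsets $\Lambda_\uparrow$ and $\Lambda_\downarrow$ such that $|\Lambda_\uparrow| + |\Lambda_\downarrow| = N_e$.

We define total spin operators $\hat{S}_{\mathrm{tot}} = (\hat{S}^{(1)}_{\mathrm{tot}}, \hat{S}^{(2)}_{\mathrm{tot}}, \hat{S}^{(3)}_{\mathrm{tot}})$ by

\[ \hat{S}^{(\alpha)}_{\mathrm{tot}} = \frac{1}{2} \sum_{x \in \Lambda} \sum_{\sigma,\tau = \uparrow,\downarrow} c^\dagger_{x,\sigma}\,(p^{(\alpha)})_{\sigma,\tau}\,c_{x,\tau} \qquad [4] \]

for $\alpha = 1, 2,$ and $3$. Here $p^{(\alpha)}$ are the Pauli matrices

\[ p^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad p^{(2)} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad p^{(3)} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad [5] \]

The operators $\hat{S}_{\mathrm{tot}}$ are the generators of global SU(2) rotations of the spin space. As usual, we denote the eigenvalue of $(\hat{S}_{\mathrm{tot}})^2$ as $S_{\mathrm{tot}}(S_{\mathrm{tot}} + 1)$. The maximum possible value of $S_{\mathrm{tot}}$ is $S_{\max} = N_e/2$ when $N_e \le |\Lambda|$, and $S_{\max} = |\Lambda| - (N_e/2)$ when $N_e \ge |\Lambda|$.

The most general Hamiltonian of the Hubbard model is

\[ H = \sum_{x,y \in \Lambda} \sum_{\sigma = \uparrow,\downarrow} t_{x,y}\, c^\dagger_{x,\sigma} c_{y,\sigma} + \sum_{x \in \Lambda} U_x\, n_{x,\uparrow} n_{x,\downarrow} \qquad [6] \]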
Here the first term describes quantum-mechanical motion of electrons which hop around the lattice according to the amplitude $t_{x,y} = t_{y,x} \in \mathbb{R}$. Usually, $t_{x,y}$ is nonnegligible only when the two sites $x$ and $y$ are close to each other. The second term represents nonlinear interaction between electrons. There is an increase in energy by $U_x \in \mathbb{R}$ when the site $x$ is occupied by both an up-spin electron and a down-spin electron. We usually set $U_x > 0$ to mimic (screened) Coulomb interaction between electrons.

The Hamiltonian $H$ commutes with the total spin operator $\hat{S}^{(\alpha)}_{\mathrm{tot}}$ for $\alpha = 1, 2,$ and $3$. One can thus investigate simultaneous eigenstates of $(\hat{S}_{\mathrm{tot}})^2$ and $H$. For $S_{\mathrm{tot}}$ in the allowed range, we denote by $E_{\min}(S_{\mathrm{tot}})$ the lowest possible energy among the states $\Phi$ which satisfy $(\hat{S}_{\mathrm{tot}})^2 \Phi = S_{\mathrm{tot}}(S_{\mathrm{tot}} + 1)\Phi$.
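The definitions [1] to [6] can be checked numerically on a very small lattice. The sketch below is not part of the original article: it represents the operators $c_{x,\sigma}$ as matrices through the Jordan-Wigner construction (an implementation choice made here), builds $H$ and the total spin operators, and tests the relations [1] together with the commutation of $H$ with the total spin. The function names, the mode ordering, and any parameter values passed in are illustrative assumptions.

```python
import numpy as np
from functools import reduce

def annihilators(n_modes):
    """Matrices for n_modes fermionic annihilation operators (Jordan-Wigner)."""
    I2, Z = np.eye(2), np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])   # maps "occupied" to "empty"
    return [reduce(np.kron, [Z] * j + [a] + [I2] * (n_modes - j - 1))
            for j in range(n_modes)]

def hubbard_operators(t, U):
    """Return (c, H, S) for a symmetric hopping matrix t and on-site energies U.

    c[2*x + s] annihilates an electron at site x with spin s (0 = up, 1 = down);
    H is the Hamiltonian [6]; S lists the three total spin operators [4].
    """
    t, U = np.asarray(t, dtype=float), np.asarray(U, dtype=float)
    n = len(U)
    c = annihilators(2 * n)
    H = sum(t[x, y] * c[2 * x + s].T @ c[2 * y + s]
            for x in range(n) for y in range(n) for s in (0, 1))
    H = H + sum(U[x] * c[2 * x].T @ c[2 * x] @ c[2 * x + 1].T @ c[2 * x + 1]
                for x in range(n))
    pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
             np.array([[0, -1j], [1j, 0]], dtype=complex),
             np.array([[1, 0], [0, -1]], dtype=complex)]
    S = [0.5 * sum(p[s, r] * c[2 * x + s].T @ c[2 * x + r]
                   for x in range(n) for s in (0, 1) for r in (0, 1))
         for p in pauli]
    return c, H, S

def check_relations(t, U):
    """True if [1] holds and H commutes with every component of the total spin."""
    c, H, S = hubbard_operators(t, U)
    dim = H.shape[0]
    car = all(
        np.allclose(c[i].T @ c[j] + c[j] @ c[i].T, np.eye(dim) * (i == j))
        and np.allclose(c[i] @ c[j] + c[j] @ c[i], 0.0)
        for i in range(len(c)) for j in range(len(c)))
    commutes = all(np.allclose(H @ Sa - Sa @ H, 0.0) for Sa in S)
    return car and commutes
```

For a two-site lattice, `check_relations([[0.3, -1.0], [-1.0, -0.2]], [4.0, 1.5])` exercises a nonuniform on-site energy and interaction. The matrices grow as $4^{|\Lambda|}$, so this direct construction is only practical for a handful of sites.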
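Wave–Particle Dualism in the Hubbard Model

It is illuminating to examine the eigenstates of the Hamiltonian [6] for the following two special cases. First suppose that one has $U_x = 0$ for all $x \in \Lambda$, that is, the model has no interactions. For $i = 1, 2, \ldots, |\Lambda|$, let $\varphi^{(i)} = (\varphi^{(i)}_x)_{x \in \Lambda} \in \mathbb{C}^{|\Lambda|}$ be the single-electron eigenstate, which is the solution of the Schrödinger equation

\[ \sum_{y \in \Lambda} t_{x,y}\, \varphi^{(i)}_y = \varepsilon_i\, \varphi^{(i)}_x \quad \text{for any } x \in \Lambda \qquad [7] \]

We order the energy eigenvalues as $\varepsilon_i \le \varepsilon_{i+1}$. By defining the corresponding creation operator by $a^\dagger_{i,\sigma} = \sum_{x \in \Lambda} \varphi^{(i)}_x c^\dagger_{x,\sigma}$, we see that, for any subsets $I_\uparrow, I_\downarrow \subset \{1, 2, \ldots, |\Lambda|\}$ such that $|I_\uparrow| + |I_\downarrow| = N_e$, the state

\[ \Phi_{I_\uparrow,I_\downarrow} = \Bigl(\prod_{i \in I_\uparrow} a^\dagger_{i,\uparrow}\Bigr)\Bigl(\prod_{i \in I_\downarrow} a^\dagger_{i,\downarrow}\Bigr)\Phi_{\mathrm{vac}} \qquad [8] \]

is an eigenstate of $H$ (with $U_x = 0$) with the eigenvalue $E = \sum_{i \in I_\uparrow} \varepsilon_i + \sum_{i \in I_\downarrow} \varepsilon_i$. The ground states are obtained by choosing $I_\uparrow, I_\downarrow$ which minimize $E$. In particular, when $N_e$ is even and the single-electron eigenenergies $\varepsilon_i$ are nondegenerate, the ground state is unique and written as

\[ \Phi_{\mathrm{GS}} = \Bigl(\prod_{i=1}^{N_e/2} a^\dagger_{i,\uparrow}\, a^\dagger_{i,\downarrow}\Bigr)\Phi_{\mathrm{vac}} \qquad [9] \]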
The fact that this ground state has the minimum possible spin $S_{\mathrm{tot}} = 0$ is known as Pauli paramagnetism. We have seen that the Hamiltonian $H$ with $U_x = 0$ can be diagonalized by using single-electron eigenstates $\varphi^{(i)}$. When $(t_{x,y})$ has a translation invariance, each $\varphi^{(i)}$ behaves as a "wave." We can say that the noninteracting models can be understood in terms of the wave picture of electrons.

Next suppose that $t_{x,y} = 0$ for all $x, y \in \Lambda$, that is, the electrons do not hop. Then the Hamiltonian [6] is readily diagonalized in terms of the basis states [3], where the corresponding eigenvalue is simply $E = \sum_{x \in \Lambda_\uparrow \cap \Lambda_\downarrow} U_x$. In this case, the model is best understood in the particle picture of electrons.

We thus see that the wave–particle dualism manifests itself in the Hubbard model in an essential manner. When both the first and the second terms in the Hamiltonian [6] are present, there takes place a "competition" between wave-like nature and particle-like nature of electrons. The competition generates rich nontrivial phenomena including antiferromagnetism, ferromagnetism, metal–insulator transition, and (probably) superconductivity. To investigate these phenomena is a major motivation in the study of the Hubbard model.
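The two solvable limits described above reduce to very small computations. The following sketch, with hypothetical function names and 0-based site labels chosen here, evaluates the ground-state energy of the noninteracting model by filling the lowest single-electron levels of [7], and the energy of a basis state [3] in the limit without hopping.

```python
import numpy as np

def free_ground_energy(t, n_electrons):
    """U = 0: fill the lowest single-electron levels of [7], two electrons per level."""
    eps = np.linalg.eigvalsh(np.asarray(t, dtype=float))
    levels = np.sort(np.concatenate([eps, eps]))   # each level once per spin
    return float(levels[:n_electrons].sum())

def atomic_energy(U, up_sites, down_sites):
    """t = 0: energy of the basis state [3], the sum of U_x over doubly occupied sites."""
    return float(sum(U[x] for x in set(up_sites) & set(down_sites)))
```

For a three-site open chain with nearest-neighbor hopping $-1$, the single-electron levels are $-\sqrt{2}, 0, \sqrt{2}$, so two electrons give `free_ground_energy(...) = -2*sqrt(2)`. In the opposite limit only the overlap of the two occupied sets matters, for example `atomic_energy([4.0, 2.0, 1.0], {0, 1}, {1, 2})` picks up only the on-site energy of the doubly occupied site 1.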
One-Dimensional Model

The Hubbard model defined on a simple one-dimensional lattice is easier to study. But it does not exhibit truly nontrivial behavior, as the following classical theorem of Lieb and Mattis suggests.

Theorem 1 Consider a Hubbard model on a one-dimensional lattice $\Lambda = \{1, 2, \ldots, N\}$ with open boundary conditions. We assume that $t_{x,y} \ne 0$ if $|x - y| = 1$, and $t_{x,y} = 0$ if $|x - y| > 1$. $t_{x,x} \in \mathbb{R}$ and $U_x \in \mathbb{R}$ are arbitrary. Then one has $E_{\min}(S_{\mathrm{tot}}) < E_{\min}(S_{\mathrm{tot}} + 1)$ for any $S_{\mathrm{tot}} = 0, 1, \ldots, S_{\max} - 1$ (or $S_{\mathrm{tot}} = 1/2, 3/2, \ldots, S_{\max} - 1$).

As a consequence, one finds that the ground states always have $S_{\mathrm{tot}} = 0$ (or $S_{\mathrm{tot}} = 1/2$) as in the noninteracting models.

The translation invariant model with $t_{x,y} = t$ if $|x - y| = 1$, $t_{x,y} = 0$ if $|x - y| \ne 1$, and $U_x = U$ can be solved by using the Bethe ansatz, as was first shown by Lieb and Wu. It was found that the model is insulating for all $U > 0$, and there is no metal–insulator transition. (A metal–insulator transition is expected to take place in higher dimensions.) Earlier works on the Bethe ansatz were based on the assumption that the Bethe ansatz equation gives the true ground states. Recently, the existence and the uniqueness of the Bethe ansatz solution for the ground state of a finite system was proved by Goldbaum.
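Theorem 1 can be illustrated by exact diagonalization of a short open chain. The sketch below is an illustration under stated assumptions, not the method of the original proofs: operators are built with the Jordan-Wigner construction, $N_e$ is taken even, and the code computes the lowest energy in each sector with $\hat{S}^{(3)}_{\mathrm{tot}} = M$. That sector minimum equals the smallest $E_{\min}(S_{\mathrm{tot}})$ with $S_{\mathrm{tot}} \ge M$, so the ordering of Theorem 1 shows up as sector minima that strictly increase with $M$. All names are hypothetical.

```python
import numpy as np
from functools import reduce

def annihilators(n_modes):
    """Matrices for n_modes fermionic annihilation operators (Jordan-Wigner)."""
    I2, Z = np.eye(2), np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])
    return [reduce(np.kron, [Z] * j + [a] + [I2] * (n_modes - j - 1))
            for j in range(n_modes)]

def open_chain(hops, onsite):
    """Hopping matrix of an open chain: hops[x] links sites x and x+1 (0-based)."""
    t = np.diag(np.asarray(onsite, dtype=float))
    for x, h in enumerate(hops):
        t[x, x + 1] = t[x + 1, x] = h
    return t

def sector_minima(t, U, n_electrons):
    """Lowest energy of [6] with N_up - N_down = 2M, for M = 0, 1, ... (even N_e)."""
    t, U = np.asarray(t, dtype=float), np.asarray(U, dtype=float)
    n = len(U)
    c = annihilators(2 * n)                  # mode 2*x + s, s = 0 (up), 1 (down)
    num = [ci.T @ ci for ci in c]
    H = sum(t[x, y] * c[2 * x + s].T @ c[2 * y + s]
            for x in range(n) for y in range(n) for s in (0, 1))
    H = H + sum(U[x] * num[2 * x] @ num[2 * x + 1] for x in range(n))
    n_up = np.diag(sum(num[0::2]))           # number operators are diagonal here
    n_dn = np.diag(sum(num[1::2]))
    minima = []
    for M in range(n_electrons // 2 + 1):
        up, dn = n_electrons // 2 + M, n_electrons // 2 - M
        if up > n:
            break
        idx = np.where((n_up == up) & (n_dn == dn))[0]
        minima.append(float(np.linalg.eigvalsh(H[np.ix_(idx, idx)])[0]))
    return minima
```

A call such as `sector_minima(open_chain([-1.0, 0.7, -1.3], [0.3, -0.2, 0.5, 0.1]), [2.0, -1.0, 4.0, 0.5], 4)` uses deliberately nonuniform hopping, on-site energies, and interactions of both signs, in line with the hypotheses of the theorem.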
Half-Filled Systems

The system in which the electron number $N_e$ is identical to the number of sites $|\Lambda|$ is said to be half-filled. Many (but not all) physical systems can be modeled as a half-filled Hubbard model. Based on a heuristic perturbation theory, low-energy properties of half-filled models with large $U$ are expected to be similar to those of Heisenberg antiferromagnetic spin systems. There is no electrical conduction, and the spin degrees of freedom may show antiferromagnetic long-range order in the ground states. This expectation is partly justified by the following theorem due to Lieb.

A Hubbard model is said to be bipartite if the lattice can be decomposed into a disjoint union of two sublattices as $\Lambda = A \cup B$ (with $A \cap B = \emptyset$), and it holds that $t_{x,y} = 0$ whenever $x, y \in A$ or $x, y \in B$. In other words, only hopping between different sublattices is allowed.

Theorem 2 Consider a bipartite Hubbard model. We assume $|\Lambda|$ is even, and the whole $\Lambda$ is connected through nonvanishing $t_{x,y}$. We also assume $U_x = U > 0$ for any $x \in \Lambda$. Then the ground states of the model are nondegenerate apart from the trivial spin degeneracy, and have total spin $S_{\mathrm{tot}} = \bigl||A| - |B|\bigr|/2$. It also holds that $E_{\min}(S_{\mathrm{tot}}) < E_{\min}(S_{\mathrm{tot}} + 1)$ for any $S_{\mathrm{tot}} \ge \bigl||A| - |B|\bigr|/2$.

The theorem implies that, as far as the total spin is concerned, the half-filled Hubbard model behaves exactly as the Heisenberg antiferromagnet. But the existence of antiferromagnetic ordering has not been proved in any version of the Hubbard model.

To see another implication of Theorem 2, take the so-called CuO lattice in Figure 2. Here the A and B sublattices consist of black and white sites, respectively. One has $|A| = |\Lambda|/3$ and $|B| = 2|\Lambda|/3$. Then the theorem implies that the ground state of the corresponding Hubbard model has total spin $S_{\mathrm{tot}} = \bigl||A| - |B|\bigr|/2 = |\Lambda|/6$. Since the total spin magnetic moment of the system is proportional to the number of sites $|\Lambda|$, we conclude that the model exhibits ferrimagnetism, a weaker version of ferromagnetism.

Figure 2 An example (the so-called CuO lattice) of a bipartite lattice in which the numbers of sites in two sublattices are different. Lieb's theorem implies that the half-filled Hubbard model defined on this lattice exhibits ferrimagnetism.

Another interesting result for the half-filled models is the following uniform density theorem by Lieb, Loss, and McCann.

Theorem 3 Consider a bipartite Hubbard model. $t_{x,y} \in \mathbb{R}$, $U_x \in \mathbb{R}$ are arbitrary. Suppose that the ground states are $n$-fold degenerate, and let $\Phi^{(i)}_{\mathrm{GS}}$ ($i = 1, \ldots, n$) be mutually orthogonal normalized ground states. Define the correlation function by

\[ \rho(x, y) = \frac{1}{n} \sum_{i=1}^{n} \Bigl\langle \Phi^{(i)}_{\mathrm{GS}}, \bigl(c^\dagger_{x,\uparrow} c_{y,\uparrow} + c^\dagger_{x,\downarrow} c_{y,\downarrow}\bigr) \Phi^{(i)}_{\mathrm{GS}} \Bigr\rangle \]

($\langle \cdot, \cdot \rangle$ is the inner product.) Then for any $x, y \in A$ or $x, y \in B$, one has $\rho(x, y) = \delta_{x,y}$.

It is interesting that the density $\rho(x, x)$ in the ground state is always unity though the hopping matrix and interactions can be highly nonuniform.
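The spin value predicted by Theorem 2 can be checked on the smallest bipartite lattice with unequal sublattices and an even number of sites, a "star" with one central site ($|A| = 1$) joined to three outer sites ($|B| = 3$), for which the theorem gives $S_{\mathrm{tot}} = 1$. The sketch below is illustrative only: the star lattice, the Jordan-Wigner matrices, and all names are choices made here. It compares the lowest energies in the sectors $\hat{S}^{(3)}_{\mathrm{tot}} = 0, 1, 2$ at half filling. A ground state with $S_{\mathrm{tot}} = 1$ appears in the sectors 0 and 1 with the same energy and is absent from sector 2.

```python
import numpy as np
from functools import reduce

def annihilators(n_modes):
    """Matrices for n_modes fermionic annihilation operators (Jordan-Wigner)."""
    I2, Z = np.eye(2), np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])
    return [reduce(np.kron, [Z] * j + [a] + [I2] * (n_modes - j - 1))
            for j in range(n_modes)]

def star_hopping(n_outer, amplitude):
    """Bipartite hopping matrix: site 0 (sublattice A) linked to sites 1..n_outer (B)."""
    t = np.zeros((n_outer + 1, n_outer + 1))
    t[0, 1:] = t[1:, 0] = amplitude
    return t

def lieb_spin(size_a, size_b):
    """Total spin predicted by Theorem 2 at half filling."""
    return abs(size_a - size_b) / 2

def half_filled_sector_minima(t, U):
    """Lowest energy of [6] at N_e = |Lambda| (even) in the sectors M = 0, 1, ..., N_e/2."""
    t = np.asarray(t, dtype=float)
    n = t.shape[0]
    c = annihilators(2 * n)                  # mode 2*x + s, s = 0 (up), 1 (down)
    num = [ci.T @ ci for ci in c]
    H = sum(t[x, y] * c[2 * x + s].T @ c[2 * y + s]
            for x in range(n) for y in range(n) for s in (0, 1))
    H = H + U * sum(num[2 * x] @ num[2 * x + 1] for x in range(n))
    n_up, n_dn = np.diag(sum(num[0::2])), np.diag(sum(num[1::2]))
    minima = []
    for M in range(n // 2 + 1):
        idx = np.where((n_up == n // 2 + M) & (n_dn == n // 2 - M))[0]
        minima.append(float(np.linalg.eigvalsh(H[np.ix_(idx, idx)])[0]))
    return minima
```

With `half_filled_sector_minima(star_hopping(3, -1.0), 4.0)` the first two entries should coincide and lie below the third, matching `lieb_spin(1, 3)`. The same ratio appears in the CuO lattice, where $|B| = 2|A|$ gives $S_{\mathrm{tot}} = |\Lambda|/6$.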
Ferromagnetism

Ferromagnetism is an interesting phenomenon in which the majority of the spins in the system align in the same direction. One of the original motivations to study the Hubbard model was to understand the origin of ferromagnetism in an idealized situation. Let us recall that neither the hopping term nor the interaction term in the Hamiltonian [6] favors ferromagnetism (or any other magnetic order). One must deal with the interplay between the two terms to have ferromagnetism.

Here we review three rigorous examples of saturated ferromagnetism in the Hubbard model. Saturated ferromagnetism is the strongest form of ferromagnetism where the ground state has $S_{\mathrm{tot}} = S_{\max}$. The first example is due to Nagaoka and Thouless.

Theorem 4 Take an arbitrary finite lattice $\Lambda$, and let $N_e = |\Lambda| - 1$. Assume that $t_{x,y} \ge 0$ for any $x \ne y$, and let $U_x \to \infty$ for all $x \in \Lambda$. (Taking the limit $U_x \to \infty$ is equivalent to inhibiting $x$ from being occupied by two electrons.) Then among the ground states of the model, there exist states with total spin $S_{\mathrm{tot}} = S_{\max}$ ($= N_e/2$). If the system further satisfies the connectivity condition (see below), then the ground states have $S_{\mathrm{tot}} = S_{\max}$ ($= N_e/2$) and are nondegenerate apart from the trivial spin degeneracy.

The connectivity condition is a simple condition which holds in most of the lattices in two or higher dimensions, including the square lattice, the triangular lattice, or the cubic lattice. To be precise, the condition requires that "by starting from any electron configuration on the lattice and by moving around the hole along nonvanishing $t_{x,y}$, one can get any other electron configuration."

The requirements that $U_x \to \infty$ and $N_e = |\Lambda| - 1$ are indeed rather pathological. We still do not know if the ferromagnetism extends to more realistic situations. Heuristic studies indicate that the issue is highly delicate.

A completely different class of rigorous examples of ferromagnetism was found by Mielke. Take, for example, the kagomé lattice of Figure 3, and define a Hubbard model by setting $t_{x,y} = t < 0$ when $x$ and $y$ are neighboring, $t_{x,y} = 0$ otherwise, and $U_x = U \ge 0$ for any $x \in \Lambda$. Then the corresponding single-electron Schrödinger equation [7] has a peculiar feature that its ground states are $\{(|\Lambda|/3) + 1\}$-fold degenerate. This huge degeneracy corresponds to the fact that the lowest-energy band of the model is completely dispersionless (or flat).

Figure 3 The Hubbard model on the kagomé lattice is a typical example which exhibits flat-band ferromagnetism.

Theorem 5 Consider the Hubbard model on the kagomé lattice with $N_e = (|\Lambda|/3) + 1$. For any $U > 0$, the ground states have $S_{\mathrm{tot}} = S_{\max}$ ($= N_e/2$) and are nondegenerate apart from the trivial spin degeneracy.

There are similar examples in higher dimensions. Ferromagnetism observed in these models is called flat-band ferromagnetism.

The above examples of ferromagnetism have either singular interaction ($U_x \to \infty$) or singular dispersion relation (highly degenerate single-electron ground states). Tasaki found a class of Hubbard models which are free from such singularities, and exhibit ferromagnetism. For simplicity, we concentrate on the simplest model in one dimension. There are similar examples in higher dimensions.

Take the one-dimensional lattice $\Lambda = \{1, 2, \ldots, N\}$ with $N$ sites (where $N$ is an even integer), and impose a periodic boundary condition by identifying the site $N + 1$ with the site 1. The hopping matrix is defined by setting $t_{x,x+1} = t_{x+1,x} = t'$ for any $x \in \Lambda$, $t_{x,x+2} = t_{x+2,x} = t$ for even $x$, $t_{x,x+2} = t_{x+2,x} = -s$ for odd $x$, and $t_{x,y} = 0$ otherwise. Here $t > 0$ and $s > 0$ are independent parameters, but the parameter $t'$ is determined as $t' = \sqrt{2}\,(t + s)$. As can be seen from Figure 4, electrons are allowed to hop to next-nearest neighbors. Thus, Theorem 1 does not apply. The single-electron ground states are not degenerate unless $s = 0$. We set $U_x = U > 0$ for any $x \in \Lambda$, and fix the electron number as $N_e = N/2$. In terms of filling factor, this corresponds to the quarter filling.

Figure 4 An example of nonsingular Hubbard model which exhibits saturated ferromagnetism.

Theorem 6 Suppose that the two dimensionless parameters $t/s$ and $U/s$ are sufficiently large. Then the ground states have $S_{\mathrm{tot}} = S_{\max}$ ($= N/4$) and are nondegenerate apart from the trivial spin degeneracy.

The theorem is valid, for example, when $t/s \ge 4.5$ if $U/s = 50$, and $t/s \ge 2.6$ if $U/s = 100$. It is crucial that the statement of the theorem is valid only when the interaction $U$ is sufficiently large. In the same model, it is also proved that low-lying excitation above the ground state has a normal dispersion relation of a spin-wave excitation.

We would like to point out that one can learn more details about the Hubbard model and further rigorous results from the review articles (Lieb 1995, Tasaki 1998a, Tasaki 1998b). One can also find references for most of the results discussed here in these review articles, especially in Lieb (1995). As for the latest results which are not included in the above reviews, see recent publications, for example, Lieb and Wu (2003), Tasaki (2003), and Goldbaum (2005), and references therein.
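The statement that the single-electron ground states of the one-dimensional model of Figure 4 are degenerate only for $s = 0$ can be examined directly from the hopping matrix. The sketch below builds that matrix as defined in the text (sites $1, \ldots, N$, periodic) and compares its spectrum with the two bands $-2s\cos k - 2(t+s)$ and $2t\cos k + 2(t+s)$, $k = 2\pi m/(N/2)$. These band formulas are not given in the article; they come from a short Bloch calculation done for this example and should be read as an assumption that the code itself tests. Function names are hypothetical.

```python
import numpy as np

def tasaki_hopping(N, t, s):
    """Hopping matrix of the periodic chain of Figure 4 (sites 1..N, N even, N >= 6)."""
    assert N % 2 == 0 and N >= 6
    t_prime = np.sqrt(2.0) * (t + s)
    T = np.zeros((N, N))
    for x in range(1, N + 1):
        i, j1, j2 = x - 1, x % N, (x + 1) % N    # 0-based indices of x, x+1, x+2
        T[i, j1] = T[j1, i] = t_prime            # nearest neighbors
        T[i, j2] = T[j2, i] = t if x % 2 == 0 else -s   # next-nearest neighbors
    return T

def tasaki_bands(N, t, s):
    """Lower and upper single-electron bands (derived here, not quoted from the text)."""
    k = 2.0 * np.pi * np.arange(N // 2) / (N // 2)
    lower = -2.0 * s * np.cos(k) - 2.0 * (t + s)
    upper = 2.0 * t * np.cos(k) + 2.0 * (t + s)
    return lower, upper
```

For $s = 0$ the lower band collapses to the single value $-2t$, a flat band of the kind met in Mielke's examples, while any $s > 0$ gives it a dispersion with a unique minimum at $k = 0$.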
Further Reading

Goldbaum PS (2005) Existence of solutions to the Bethe Ansatz Equations for the 1D Hubbard model: finite lattice and thermodynamic limit. Communications in Mathematical Physics 258: 317–337 (cond-mat/0403736).
Lieb EH (1995) The Hubbard model – some rigorous results and open problems. In: Iagolnitzer D (ed.) Proceedings of the XIth International Congress of Mathematical Physics, Paris, 1994, pp. 392–412. International Press (cond-mat/9311033).
Lieb EH and Wu FY (2003) The one-dimensional Hubbard model: a reminiscence. Physica A 321: 1–27 (cond-mat/0207529).
Tasaki H (1998a) The Hubbard model – an introduction and selected rigorous results. Journal of Physics: Condensed Matter 10: 4353–4378 (cond-mat/9512169).
Tasaki H (1998b) From Nagaoka's ferromagnetism to flat-band ferromagnetism and beyond – an introduction to ferromagnetism in the Hubbard model. Prog. Theor. Phys. 99: 489–548 (cond-mat/9712219).
Tasaki H (2003) Ferromagnetism in the Hubbard model: a constructive approach. Communications in Mathematical Physics 242: 445–472 (cond-mat/0301071).
Hydrodynamic Equations see Interacting Particle Systems and Hydrodynamic Equations
Hyperbolic Billiards

Maciej P Wojtkowski, University of Arizona, Tucson, AZ, USA and Institute of Mathematics PAN, Warsaw, Poland
© 2006 Published by Elsevier Ltd.
Introduction

Billiards are a class of dynamical systems with an appealingly simple description. A point particle moves with constant velocity in a box of arbitrary dimension (the billiard table) and reflects elastically from the boundary (the component of velocity perpendicular to the boundary is reversed and the parallel component is preserved). Mathematically, it is a class of Hamiltonian systems with collisions defined by symplectic maps on the boundary of the phase space. The billiard dynamics defines a one-parameter group of maps $\Phi^t$ of the phase space which preserve the Lebesgue measure, and are in general only measurable due to discontinuities. The boundaries of the box are made up of pieces, concave, convex, and flat. Discontinuities occur at the orbits tangent to concave pieces of the boundary of the box. The orbits hitting two adjacent pieces ("corners") cannot be naturally continued, which is another source of discontinuities. These singularities are not too severe, so that the flow has well-defined Lyapunov exponents and Pesin structural theory is applicable (Katok and Strelcyn 1986).

A billiard system is called hyperbolic if it has nonzero Lyapunov exponents on a subset of positive Lebesgue measure, and completely hyperbolic if all of its Lyapunov exponents are nonzero almost everywhere, except for one zero exponent in the direction of the flow.

Billiards in smooth strictly convex domains have no singularities, but no such examples are known to be hyperbolic. In general, billiards exhibit mixed behavior just like other Hamiltonian systems; there are invariant tori intertwined with a "chaotic sea." In hyperbolic billiards, stable behavior is excluded by the choice of the pieces in the boundary of the box, arbitrary concave pieces and special convex ones, and their particular placement. Thus, hyperbolicity is achieved by design, as in optical instruments. It was established by Turaev and Rom-Kedar (1998) that complete hyperbolicity may be lost under generic singular perturbation of the billiard system to a smooth Hamiltonian system.

Hyperbolicity is the universal mechanism for random behavior in deterministic dynamical systems. Under suitable additional assumptions, it leads to ergodicity, mixing, K-property, Bernoulli property, decay of correlations, central-limit theorem, and other stochastic properties. Hyperbolic billiards provide a natural class of examples for which these properties were studied. In this article we restrict ourselves to hyperbolicity itself.

The most prominent example of a hyperbolic billiard is the gas of hard spheres. This way of looking at the system was developed in the groundbreaking papers of Sinai (see Chernov and Sinai (1987) for an exhaustive list of references). The collection of papers (Szász 2000) contains more up-to-date information. Another source on hyperbolic billiards is the book by Chernov and Markarian (2005). The books by Kozlov and Treschev (1990), and by Tabachnikov (1995) provide broad surveys of billiards from different perspectives.
Jacobi Fields and Monotonicity

The key to understanding hyperbolicity in billiards lies in two essentially equivalent descriptions of infinitesimal families of trajectories. The basic notion is that of a Jacobi field along a billiard trajectory. Let $\gamma(t, u)$ be a family of billiard trajectories, where $t$ is time and $u$ is a parameter, $|u| < \epsilon$. A Jacobi field along $\gamma(t, 0)$ is defined by $J(t) = \partial\gamma/\partial u\,|_{u=0}$. Jacobi fields form a finite-dimensional vector space which can be identified with the tangent space to the phase space at points along the trajectory. They contain the same information as the derivatives of the billiard flow $D\Phi^t$. In particular, the Lyapunov exponents are the exponential rates of growth of Jacobi fields. Jacobi fields split naturally into components parallel and perpendicular to the trajectory, each of them a Jacobi field in its own right. The parallel Jacobi field carries the zero Lyapunov exponent. In the rest of this article we discuss only the perpendicular Jacobi fields.

Between collisions the Jacobi fields satisfy the differential equation $J'' = 0$, hence $J(t) = J(0) + tJ'(0)$. At a collision a Jacobi field undergoes a change by the map

\[ J(t_c^+) = R\,J(t_c^-), \qquad J'(t_c^+) = R\,J'(t_c^-) + P^* K P\,J(t_c^+) \qquad [1] \]

where $J(t_c^-)$ and $J(t_c^+)$ are Jacobi fields immediately before and after collision, $K$ is the shape operator of the piece of the boundary ($K = \nabla n$, where $n$ is the inside unit normal to the boundary), and $P$ is the projection along the velocity vector from the hyperplane perpendicular to the orbit to the hyperplane tangent to the boundary. Finally, $R$ is the orthogonal reflection in the hyperplane tangent to the boundary.

Perpendicular Jacobi fields at a point of a trajectory can be identified with a subspace of the tangent space to the phase space, the subspace perpendicular to the phase trajectory. To measure the growth/decay of Jacobi fields, we introduce a quadratic form on the tangent spaces, or equivalently on Jacobi fields, $Q(J, J') = \langle J, J' \rangle$. Evaluation of $Q$ on a Jacobi field is a function of time $Q(t)$. Between collisions we have $Q(t_2) \ge Q(t_1)$ for $t_2 \ge t_1$ (monotonicity). By [1] the monotonicity at the collisions, that is, $Q(t_c^+) \ge Q(t_c^-)$, is equivalent to the positive semidefiniteness of the shape operator, $K \ge 0$; it holds for concave pieces of the boundary. If $K > 0$ at a point of collision with the boundary, then for $(J, J') \ne (0, 0)$, we have $Q(t_2) > Q(t_1)$ (strict monotonicity), assuming that the collision occurred between time $t_1$ and $t_2$. In billiards with concave pieces of the boundary, where $K \ge 0$, $K \ne 0$, strict monotonicity may still occur after sufficiently many reflections (eventual strict monotonicity, or ESM). Such billiards are called semidispersing, and the gas of hard spheres is an example.
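The two monotonicity statements follow from the evolution rules in a few lines. The worked steps below are added here for the reader's convenience and use only $J'' = 0$ between collisions, the collision map [1], and the orthogonality of $R$; the superscripts $\pm$ abbreviate evaluation at $t_c^\pm$.

```latex
% Free flight: J(t) = J(0) + t J'(0) and J'(t) = J'(0), so
\[
  Q(t) = \langle J(0) + t\,J'(0),\, J'(0) \rangle
       = Q(0) + t\,\lVert J'(0) \rVert^{2} \;\ge\; Q(0)
  \qquad (t \ge 0).
\]
% Collision: insert [1] and use that R is orthogonal,
% \langle R J^-, R J'^- \rangle = \langle J^-, J'^- \rangle.
\[
  Q(t_c^{+}) = \langle J^{+},\, R\,J'^{-} + P^{*} K P\, J^{+} \rangle
             = Q(t_c^{-}) + \langle K\,P J^{+},\, P J^{+} \rangle .
\]
```

The last term is nonnegative for every Jacobi field precisely when $K \ge 0$, which is the equivalence stated above.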
The role of monotonicity is revealed in the following:

Theorem 1 (Wojtkowski 1991). If a system is eventually strictly monotone (ESM), except on a set of orbits of zero measure, then it is completely hyperbolic.

The theorem applies to billiard systems. It can be generalized and applied to other systems, not even Hamiltonian (see Wojtkowski (2001) for precise formulations, references and the history of this idea).

The difficulty in applying the above theorem to the gas of hard spheres lies in the gap between monotonicity and strict monotonicity. There are many orbits on which strict monotonicity is never attained (parabolic orbits). Establishing that the family of parabolic orbits has measure zero (or better yet codimension 2) is a formidable task. It was brought to conclusion in the work of Simányi (2002).
Wave Fronts and Monotonicity

There is a geometric formulation of monotonicity (which historically preceded the one given above). Let us consider a local wave front, that is, a local hypersurface $W(0)$ perpendicular to a trajectory $\gamma(t)$ at $t = 0$. Let us consider further all billiard trajectories perpendicular to $W(0)$. The points on these trajectories at time $t$ form a local hypersurface $W(t)$ perpendicular again to the trajectory (warning: at exceptional moments of time the wave front $W(t)$ may be singular). Infinitesimally, wave fronts are described by the shape operator $U = \nabla n$, where $n$ is the unit normal field. $U$ is a symmetric operator on the hyperplane tangent to the wave front (and perpendicular to the trajectory $\gamma(t)$). The evolution of infinitesimal wave fronts is described by the formulas

\[ U(t) = \bigl(tI + U(0)^{-1}\bigr)^{-1} \quad \text{without collisions}, \qquad U(t_c^+) = R\,U(t_c^-)\,R + P^* K P \quad \text{at a collision} \qquad [2] \]

It follows that between collisions a wave front that is initially convex (i.e., diverging, or $U > 0$) will stay convex. Moreover, any wave front after a sufficiently long run without collisions will become convex (after which the normal curvatures of the wave front will be decaying). The second part of [2] shows that after a reflection in a strictly concave boundary a convex wave front becomes strictly convex (and its normal curvatures increase).

These properties are equivalent to (strict) monotonicity as formulated above. Indeed, in the language of Jacobi fields an infinitesimal wave front represents a linear subspace in the space of perpendicular Jacobi fields, that is, the tangent space. (Furthermore, it is a Lagrangian subspace with respect to the standard symplectic form.) We can follow individual Jacobi fields or whole subspaces of them. This explains the parallel between [1] and [2]. The form $Q$ allows the introduction of positive and negative Jacobi fields and positive and negative Lagrangian subspaces. An infinitesimal convex wave front represents a positive Lagrangian subspace. Monotonicity is equivalent to the property that a positive Lagrangian subspace stays positive under the dynamics. (It may appear that there is a loss of information in formulas [2] compared to [1], but actually they are equivalent due to the symplectic nature of the dynamics (Wojtkowski 2001).)
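The free-flight part of [2] is easy to experiment with numerically. The sketch below is an illustration with hypothetical names, restricted to invertible $U(0)$ and to times at which $tI + U(0)^{-1}$ is invertible (the exceptional singular moments mentioned above are not handled). It evaluates $U(t)$ and tests whether the resulting front is convex.

```python
import numpy as np

def free_flight(U0, t):
    """Shape operator after time t without collisions: (t I + U0^{-1})^{-1}."""
    U0 = np.asarray(U0, dtype=float)
    return np.linalg.inv(t * np.eye(U0.shape[0]) + np.linalg.inv(U0))

def is_convex(U):
    """True if the symmetric operator U is positive definite (a diverging front)."""
    return bool(np.all(np.linalg.eigvalsh(np.asarray(U, dtype=float)) > 0.0))
```

For a convex front with principal curvatures 2 and 0.5, `free_flight(np.diag([2.0, 0.5]), 1.0)` gives curvatures 2/3 and 1/3, still positive but smaller. A focusing front with curvatures $-1$ and $-0.25$ passes through its focal points at $t = 1$ and $t = 4$ and is convex afterwards, for instance at $t = 5$.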
Design of Hyperbolic Billiards

In view of [2] it seems that a convex piece in the boundary ($K < 0$) excludes monotonicity. There are two ways around this obstacle. First, we could change the quadratic form $Q$ at the convex boundary. Second, we can treat convex pieces as "black boxes" and look only at incoming and outgoing trajectories. Although the second strategy seems more restrictive, all the examples constructed to date fit the black box scenario, and we will present it in more detail.

To understand this approach, let us consider a billiard table with flat pieces of the boundary and exactly one convex piece. A trajectory in such a billiard experiences visits to the convex piece separated by arbitrarily long sequences of reflections in flat pieces, which do not affect the geometry of a wave front at all. Hence, whatever the geometry of a wave front emerging from the curved piece, it will become convex and very flat by the time it comes back to the curved piece of the boundary again. Hence, it follows, at least heuristically, that we must study the complete passage through the convex piece of the boundary, regarding its effect on convex, and especially flat, wave fronts. An important difference between convex and concave pieces is that a trajectory usually has several consecutive reflections in the same convex piece; moreover, the number of such reflections is unbounded.

A finite billiard trajectory is called "complete" if it contains reflections in one and the same piece of the boundary, and it is preceded and followed by reflections in other pieces.

Definition A complete trajectory is (strictly) z-monotone if for every nonzero Jacobi field the value of the form $Q$ does not decrease (respectively, increases) between the point at the distance $z$ before the first reflection and the point at the distance $z$ after the last reflection. A complete trajectory is parabolic if there is a nonzero Jacobi field $J$ such that $J'$ vanishes before the first and after the last reflection.

In the language of wave fronts, a complete trajectory is z-monotone if every diverging wave front at a distance at least $z$ from the first reflection becomes diverging after the last reflection at the distance $z$, or earlier.

It turns out that the only obstruction to monotonicity of complete trajectories is parabolicity. More precisely, if a complete trajectory is not parabolic then it is z-monotone for some $z > 0$.

It follows from Theorem 1 that we get a completely hyperbolic billiard if we put together curved pieces with no complete parabolic trajectories and some flat pieces, in such a way that for every two consecutive complete trajectories, being $z_1$- and $z_2$-monotone, respectively, the distance from the last reflection in the first trajectory to the first reflection in the second one is bigger than $z_1 + z_2$. Indeed, we can put together the midpoints of trajectories leaving one curved piece and hitting another one into the Poincaré section of the billiard flow and we obtain immediately ESM for the return map.

We can formulate somewhat informally two principles for the design of hyperbolic billiards.

1. No parabolic trajectories. Convex pieces of the boundary cannot have complete parabolic trajectories.
2. Separation. There must be enough separation (in space or in time through reflections in flat pieces) between strictly z-monotone trajectories according to the values of $z$.

All of the examples of hyperbolic billiards constructed up to now are designed according to these principles.
Hyperbolic Billiards in Dimension 2

Checking the absence of parabolic trajectories is nontrivial due to the unbounded number of reflections in complete trajectories close to tangency. So far it has been accomplished only in integrable or near-integrable examples, with the exception of the convex scattering pieces described in the following.

Billiards in dimension 2 are understood best. First of all, there is yet another way of describing infinitesimal families of nearby trajectories. Every infinitesimal family of rays in the plane has a point of focusing (in linear approximation), possibly at infinity. This point of focusing contains the same information as the curvature of a wave front (it is the center of curvature, rather than curvature itself) and it has the advantage that it does not change between collisions. The focusing points before and after a reflection are related by the familiar mirror equation of geometric optics:

\[ \frac{1}{f_0} + \frac{1}{f_1} = \frac{2}{d} \]

where $f_0, f_1$ are the signed distances of the points of focusing to the reflection point, $d = r\cos\theta$, $r$ being the radius of curvature of the boundary piece ($r > 0$ for a strictly convex piece), and $\theta$ the angle of incidence. The mirror equation is just the two-dimensional version of [2].

It is instructive to consider an arc of a circle. A billiard in a disk is integrable due to its rotational symmetry. Let $J$ be a Jacobi field obtained by rotation of a trajectory. This family of trajectories ("the rotational family") is focused exactly in the middle between two consecutive reflections (that is where $J$ vanishes). It follows further from the mirror equation that a parallel family of orbits is focused at a distance $d/2$ after the reflection, and any family focusing somewhere between the parallel family and the rotational family will focus at a distance somewhere between $d/2$ and $d$, not only after the first reflection, but also after an arbitrarily long sequence of reflections. Hence, any complete trajectory in an arc of a circle is z-monotone, where $2z$ is the length of a single segment of the trajectory, and strictly $z'$-monotone for any $z' > z$. Two arcs of a circle separated by parallel segments form the stadium of Bunimovich (1979).

Lazutkin (1973) showed that billiards in smooth strictly convex domains are near integrable near the boundary. Donnay (1991) applied Lazutkin's coordinates to establish that for an arbitrary strictly convex arc the situation near the boundary is similar to that in a circle, that is, complete trajectories near tangency are z-monotone, where $z$ is of the order of the length of a single segment. In particular, no near tangent complete trajectory can be parabolic. Hence, this crucial calculation shows that if a strictly convex arc has no parabolic trajectories then any sufficiently small perturbation also has no parabolic trajectories. It follows further that any sufficiently small piece of a given strictly convex arc has no parabolic trajectories.

It turns out that in dimension 2, complete parabolic trajectories are also z-monotone for some $z > 0$ (but clearly not strictly monotone) (Wojtkowski 2005). However, they are still an obstacle to complete hyperbolicity because in general nearby complete trajectories are z-monotone without a bound for the values of $z$, so that no separation of convex pieces is sufficient.

Integrability of the elliptic billiard allows one to establish strict monotonicity of trajectories in the semi-ellipse with endpoints on the longer axis (Wojtkowski 1986). Donnay (1991) showed that the semi-ellipse with endpoints on the shorter axis also has no parabolic trajectories provided that the eccentricity is less than $\sqrt{2}/2$. As the eccentricity goes to $\sqrt{2}/2$ the separation required to produce a hyperbolic billiard goes to infinity. Markarian et al. (1996) obtained explicitly the separation of the elliptic pieces needed for hyperbolicity, when the eccentricity is smaller than $\sqrt{2 - \sqrt{2}}/2$.

It follows from the mirror equation that a trajectory with one reflection in a convex piece is always strictly z-monotone for $z > d$. Hence, if for any two consecutive reflections in convex pieces with respective values of $d$ equal to $d_1$ and $d_2$, the distance between reflections exceeds $d_1 + d_2$, then the billiard is completely hyperbolic. For one convex piece this condition, called convex scattering, turns out to be equivalent to $d^2 r/ds^2 < 0$, where $s$ is the arc length (Wojtkowski 1986). This leads to examples of hyperbolic billiards with one convex piece of the boundary, like the domain bounded by the cardioid. Also, any complete trajectory in a convex scattering piece is strictly z-monotone for $z$ bigger than the maximum of the values of $d$ for the first and the last segment of the trajectory. This makes it easy to find the explicit separation of convex scattering pieces guaranteeing hyperbolicity.
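The focusing statements for an arc of a circle can be reproduced by iterating the mirror equation. The sketch below rests on a sign convention that is an assumption made here, since the text only says that the distances are signed: $f_0$ is counted positive when the incoming family focuses before reaching the boundary, $f_1$ is counted positive when the reflected family focuses after leaving it, and the mirror equation is used in the form $1/f_0 + 1/f_1 = 2/d$. In a circle every segment has length $2d$, so a family focusing at distance $f_1$ after one reflection focuses at distance $2d - f_1$ before the next. Function names are hypothetical.

```python
import math

def focus_after(f0, d):
    """Focusing distance after a reflection, from 1/f0 + 1/f1 = 2/d.

    f0 = math.inf describes a parallel incoming family.
    """
    inverse = 2.0 / d - 1.0 / f0
    return math.inf if inverse == 0.0 else 1.0 / inverse

def circle_focusing_sequence(f0, d, n_reflections):
    """Focusing distances after successive reflections in an arc of a circle."""
    distances = []
    for _ in range(n_reflections):
        f1 = focus_after(f0, d)
        distances.append(f1)
        f0 = 2.0 * d - f1        # segment length in a circle is 2 d
    return distances
```

With this convention a parallel family gives `focus_after(math.inf, d) == d / 2`, the rotational family is the fixed point `focus_after(d, d) == d`, and a family started in between, for example `circle_focusing_sequence(3.0, 1.0, 20)`, keeps focusing at distances between $d/2$ and $d$, as described in the text.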
Hyperbolic Billiards in Higher Dimensions
In higher dimensions, only two constructions of hyperbolic billiards with convex pieces in the boundary are known. The first construction, by Bunimovich (1988), involves a piece of a sphere whose angular size, as seen from the center, does not exceed $\pi/2$ (Wojtkowski 1990, 2005, Bunimovich and Rehacek 1998). The second construction, by Papenbrock (2000), uses two cylinders at 90° with respect to each other to destroy integrability (Wojtkowski 2005). In both cases, the successful treatment is based on integrability of the billiard systems bounded by a sphere or a cylinder. In both of these constructions, trajectories need to be cut into strictly monotone pieces of unbounded lengths. In the case of spherical caps, complete trajectories are z-monotone with unbounded value of z and the geometry of the billiard table is used to separate them in time by sufficiently many reflections in flat pieces of the boundary (Wojtkowski 2005). In the case of cylinders, trajectories are cut by consecutive returns to a Poincaré section in the middle of the billiard table.
Soft Billiards
The same ideas of monotonicity and strict monotonicity are applicable to soft billiards, where specular reflections are replaced by scatterers in which the point particle is subjected to the action of a spherically symmetric potential. As in ordinary billiards, we compare the wave fronts along trajectories before entering and after leaving scatterers. Again, in the absence of parabolic trajectories sufficient separation of the scatterers produces a completely hyperbolic system. The conditions on the potential that guarantee the absence of parabolic trajectories were obtained by Donnay and Liverani (1991) in the two-dimensional case and by Bálint and Tóth (2006) in higher dimensions. The complete integrability of the motion of a point particle in a spherically symmetric potential is crucial in the derivation of these conditions (Wojtkowski 2005).
See also: Billiards in Bounded Convex Domains; Ergodic Theory; Hamiltonian Systems: Stability and Instability Theory; Hyperbolic Dynamical Systems; Polygonal Billiards; Random Matrix Theory in Physics.
Further Reading
Bálint P and Tóth IP (2006) Hyperbolicity in multi-dimensional Hamiltonian systems with applications to soft billiards. Nonlinearity (to appear).
Bunimovich LA (1979) On the ergodic properties of nowhere dispersing billiards. Communications in Mathematical Physics 65: 295–312.
Bunimovich LA (1988) Many-dimensional nowhere dispersing billiards with chaotic behavior. Physica D 33: 58–64.
Bunimovich LA and Rehacek J (1998) How high-dimensional stadia look like. Communications in Mathematical Physics 197: 277–301.
Chernov NI and Sinai YaG (1987) Ergodic properties of some systems of 2-dimensional disks and 3-dimensional spheres. Russian Mathematical Surveys 42: 181–207.
Chernov NI and Markarian R (2005) Billiards. Providence, RI: American Mathematical Society.
Donnay V (1991) Using integrability to produce chaos: billiards with positive entropy. Communications in Mathematical Physics 141: 225–257.
Donnay V and Liverani C (1991) Potentials on the two-torus for which the Hamiltonian flow is ergodic. Communications in Mathematical Physics 135: 267–302.
Katok A and Strelcyn JM (1986) (with the collaboration of F. Ledrappier and F. Przytycki) Invariant Manifolds, Entropy and Billiards; Smooth Maps with Singularities, Lecture Notes in Mathematics, 1222. Springer.
Kozlov VV and Treschev DV (1990) Billiards. A Genetic Introduction to the Dynamics of Systems with Impacts. Providence, RI: American Mathematical Society.
Lazutkin VF (1973) On the existence of caustics for the billiard ball problem in a convex domain. Mathematics of the USSR-Izvestiya 7: 185–215.
Markarian R, Oliffson Kamphorst S, and Pinto de Carvalho S (1996) Chaotic properties of the elliptical stadium. Communications in Mathematical Physics 174: 661–679.
Papenbrock T (2000) Numerical study of a three dimensional generalized stadium billiard. Physical Review E 61: 4626–4628.
Simányi N (2002) The complete hyperbolicity of cylindric billiards. Ergodic Theory and Dynamical Systems 22: 281–302.
Szász D (ed.) (2000) Hard Ball Systems and the Lorentz Gas. Encyclopaedia of Mathematical Sciences, 101. Berlin: Springer.
Tabachnikov S (1995) Billiards. Soc. Math. Paris, France.
Turaev D and Rom-Kedar V (1998) Elliptic islands appearing in near-ergodic flows. Nonlinearity 11: 575–600.
Wojtkowski MP (1986) Principles for the design of billiards with nonvanishing Lyapunov exponents. Communications in Mathematical Physics 105: 391–414.
Wojtkowski MP (1990) Linearly stable orbits in 3-dimensional billiards. Communications in Mathematical Physics 129: 319–327.
Wojtkowski MP (1991) Systems of classical interacting particles with nonvanishing Lyapunov exponents. In: Arnol’d L, Crauel H, and Eckmann J-P (eds.) Lyapunov Exponents, Proceedings, Oberwolfach 1990, Lecture Notes in Mathematics, 1486, pp. 243–262.
Wojtkowski MP (2001) Monotonicity, J-algebra of Potapov and Lyapunov exponents. In: Smooth Ergodic Theory and Its Applications, pp. 499–521. Proc. Symp. Pure Math. American Mathematical Society.
Wojtkowski MP (2005) Design of hyperbolic billiards. Preprint.
Hyperbolic Dynamical Systems
B Hasselblatt, Tufts University, Medford, MA, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Division of Smooth Dynamical Systems
Linear maps can be elliptic (complex diagonalizable with all eigenvalues on the unit circle), parabolic (all eigenvalues on the unit circle but some Jordan blocks of size at least 2), or hyperbolic (no eigenvalues on the unit circle), and for differentiable dynamical systems, that is, smooth maps or flows, one can roughly make an analogous subdivision (see Hasselblatt and Katok 2002, p. 100f). The linear maps not covered by these alternatives are those with some eigenvalues on the unit circle and others off it; the corresponding class of “partially hyperbolic” dynamical systems is usually considered in the context of hyperbolic dynamical systems with a view to studying phenomena wherein the hyperbolic behavior dominates. Thus, elliptic dynamical systems are more or less similar to isometries, with orbit separation constant or at most oscillatory but without persistent growth. KAM theory deals with elliptic systems, establishing that much of the ellipticity in an integrable Hamiltonian system persists under perturbation. Parabolic systems may have polynomial orbit separation produced by a local “shear” phenomenon; billiards in polygonal domains are an example of this. Hyperbolic dynamical systems are characterized by exponential divergence of orbits. They are of interest because of the complexity of their orbit structure with respect to both topological and statistical behavior. Specifically, the stretching (corresponding to eigenvalues outside the unit circle in the case of linear maps) combined with the folding necessitated by compactness of the phase space produces not only highly sensitive dependence of orbit asymptotics on initial conditions, but also a close intertwining of different behaviors. On the one hand, there is a dense set of periodic points; on the other hand, there is an abundance of dense orbits. While there are only finitely many periodic points of a given period, their number grows exponentially as a function of the period. The entropy of these systems is positive, which indicates that the overall complexity of the orbit structure grows exponentially as a function of the length of time for which it is being tracked. In effect, the behavior of orbits is so intricate as to be quasirandom, which makes it natural to use statistical methods to describe these systems.
History of Hyperbolic Dynamical Systems
One strand of the history of hyperbolic dynamical systems leads back to the question of the stability of the solar system and to Poincaré, in whose prize memoir on the three-body problem the possibility of “homoclinic tangles” first presented itself. For Poincaré, this was important because the resulting complexity demonstrates that this system is not integrable. We describe below how hyperbolic dynamics arises in this situation (see Figure 3).
Another strand emerged about a decade later with Hadamard’s study of geodesic flows (free particle motion) on negatively curved surfaces. Hadamard noted that these exhibit the kind of sensitive dependence on initial conditions as well as the pseudorandom behavior that are central features of hyperbolic dynamics. This subject was developed much further after the advent of ergodic theory, with the Boltzmann ergodic hypothesis as an important motivation: work by numerous mathematicians, principally Hedlund and Hopf, showed that free particle motion on a negatively curved surface provides examples of ergodic mechanical systems. More than two decades later, in the 1960s, Anosov and Sinai overcame a fundamental technical hurdle and established that this is indeed the case in arbitrary dimension. This was done in the more general context of a class of dynamical systems known now as Anosov systems, which were axiomatically defined and systematically studied for the first time during this period of research in Moscow. A greater class of dynamical systems exhibiting chaotic behavior was introduced by Smale in his seminal 1967 paper under the name of Axiom-A systems. This class includes the hyperbolic dynamics arising from homoclinic tangles, see Figure 3 (see Homoclinic Phenomena). Smale’s motivation was his program of classifying dynamical systems under topological conjugacy, and the consequent search for structurally stable systems. Today, Axiom-A (and Anosov) systems are valued as idealized models of chaos: while the conditions defining Axiom A are too stringent to include many real-life examples, it is recognized that they have features shared in various forms by most chaotic systems. Here, we concentrate on the discrete-time context to keep notations lighter.
Partial hyperbolicity was introduced in the 1970s, and work on it has shown that a limited amount of hyperbolicity in a dynamical system can produce much of the global complexity (such as ergodicity or the presence of dense orbits) exhibited by hyperbolic systems, and can do so in a robust way. Here one imposes uniform conditions, but expansion and contraction are not assumed to occur in all directions. Stable ergodicity has been an important subject of research in the last decade. Nonuniform hyperbolicity weakens hyperbolicity by allowing the contraction and expansion rates to be nonuniform. This was motivated by examples of systems with hyperbolicity where expansion or contraction can be arbitrarily weak or absent in places, such as the Hénon attractor, and by situations where hyperbolicity coexists with singularities, such as for (semi)dispersing billiards (see Hyperbolic Billiards).
With respect to both uniformly and nonuniformly hyperbolic systems, dimension theory has been a subject of much interest (computations and estimates of the fractal dimension of attractors and hyperbolic sets, which is deeply connected to dynamical properties of the system). A different weakening of hyperbolicity, the presence of a dominated splitting, has been of interest from the viewpoint of stability and classification of diffeomorphisms. The study of hyperbolic dynamics has always had interactions with other sciences and other areas of mathematics. In the natural and social sciences, this is the study of chaotic motions of just about any kind. Examples of applications in related areas of mathematics are geometric rigidity (an interaction with differential geometry) and rigidity of group actions.
Uniformly Hyperbolic Dynamical Systems
Definitions
Let f be a smooth invertible map. A compact invariant set of f is said to be “hyperbolic” if at every point in this set, the tangent space splits into a direct sum of two subspaces $E^u$ and $E^s$ with the property that these subspaces are invariant under the differential df, that is, $df(x)E^u(x) = E^u(f(x))$, $df(x)E^s(x) = E^s(f(x))$, and that df expands vectors in $E^u$ and contracts vectors in $E^s$, that is, there are constants $0 < \lambda < 1 < \mu$, $c > 0$ such that if $v \in E^s(x)$ for some x, then $\|df^n v\| \le c\lambda^n \|v\|$ for n = 1, 2, ..., and if $v \in E^u(x)$ for some x, then $\|df^n v\| \ge c\mu^n \|v\|$ for n = 1, 2, .... If $E^u = \{0\}$ in the definition above, then the invariant set is made up of attracting fixed points or periodic orbits. Similarly, if $E^s = \{0\}$, then the orbits are repelling. If neither subspace is trivial, then the behavior is locally “saddle-like,” that is to say, relative to the orbit of a point x, most nearby orbits diverge exponentially fast in both forward and backward time. This is why hyperbolicity is a mathematical notion of chaos. An Anosov diffeomorphism is a smooth invertible map of a compact manifold with the property that the entire space is a hyperbolic set. Axiom A, which is a larger class, focuses on the part of the system that is not transient. More precisely, a point x in the phase space is said to be “nonwandering” if every neighborhood U of x contains an orbit that returns to U. A map is said to satisfy Axiom A if its nonwandering set is hyperbolic and contains a dense set of periodic points.
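A concrete instance of these definitions is the map of the 2-torus induced by the integer matrix with rows (2, 1) and (1, 1), often called Arnold's cat map. It is not discussed in the text above and is used here only as an assumed illustration. The sketch below computes the two eigenvalues, confirms that neither lies on the unit circle, and checks the expansion and contraction estimates along the eigendirections, which serve as $E^u$ and $E^s$ at every point (with c = 1 in the Euclidean norm). It also counts fixed points of the n-th iterate using the standard formula |det(A^n - I)| for hyperbolic toral automorphisms, which illustrates the exponential growth of periodic points. All function and variable names are illustrative.

```python
import math

# Integer matrix inducing the cat map on the 2-torus (det = 1).
A = ((2, 1), (1, 1))

def apply(M, v):
    """Multiply the 2x2 matrix M by the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def matmul(M, N):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def norm(v):
    return math.hypot(v[0], v[1])

# Eigenvalues from trace and determinant: t^2 - trace*t + det = 0.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(trace * trace - 4 * det)
mu = (trace + disc) / 2    # expanding eigenvalue, mu > 1
lam = (trace - disc) / 2   # contracting eigenvalue, 0 < lam < 1

# Eigenvectors: the first row of (A - t I) v = 0 gives v = (1, t - 2).
e_u = (1.0, mu - 2.0)      # spans E^u
e_s = (1.0, lam - 2.0)     # spans E^s

def iterate(v, n):
    """Apply the differential n times to the tangent vector v."""
    for _ in range(n):
        v = apply(A, v)
    return v

n = 10
growth = norm(iterate(e_u, n)) / norm(e_u)   # should equal mu**n
decay = norm(iterate(e_s, n)) / norm(e_s)    # should equal lam**n

def fixed_points_of_iterate(k):
    """Number of fixed points of the k-th iterate on the torus,
    computed as |det(A^k - I)| with exact integer arithmetic."""
    P = ((1, 0), (0, 1))
    for _ in range(k):
        P = matmul(P, A)
    return abs((P[0][0] - 1) * (P[1][1] - 1) - P[0][1] * P[1][0])

print(mu, lam, growth, decay)
print([fixed_points_of_iterate(k) for k in range(1, 6)])
```

Since the differential is the same matrix at every point, the splitting is constant and the whole torus is a hyperbolic set, so this map is an Anosov diffeomorphism in the sense defined above.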
Definitions in the continuous-time case are analogous: f above is replaced by the time-t-maps of the flow, and the tangent spaces now decompose into $E^u \oplus E^0 \oplus E^s$, where $E^0$, which is one dimensional, represents the direction of the flow lines. A geometric way of detecting (indeed, defining) hyperbolicity is via the cone criterion: at every point there is a cone that is mapped by the differential into the interior of the corresponding cone at the image point, and a “complementary” cone family behaves similarly for the inverse. Many continuous structures associated with a hyperbolic dynamical system are, in fact, Hölder continuous. (For a function g on a metric space this is defined as the existence of $C, \alpha > 0$ such that $d(g(x), g(y)) \le C\,d(x, y)^{\alpha}$ whenever x, y are sufficiently close to each other.) In the present article, almost every assertion of continuity could be replaced by one of Hölder continuity. This notion is natural in this context because $x_n \to y$ exponentially fast implies that $g(x_n) \to g(y)$ exponentially fast if g is Hölder continuous.
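The cone criterion described above can be tested directly in a simple linear example. The sketch below again assumes the cat map matrix with rows (2, 1) and (1, 1), which is not taken from the text. It uses the first and third quadrants as the cone around the expanding direction and the second and fourth quadrants as the complementary cone. These cone choices and all names are illustrative assumptions; the check runs over a small integer grid of tangent vectors, so it is a spot check rather than a proof.

```python
# Differential of the cat map and its inverse (integer entries, det = 1).
A = ((2, 1), (1, 1))
A_INV = ((1, -1), (-1, 2))

def apply(M, v):
    """Multiply the 2x2 matrix M by the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def sign_product(v):
    """Product of the coordinates: > 0 strictly inside quadrants I/III,
    < 0 strictly inside quadrants II/IV, 0 on the coordinate axes."""
    return v[0] * v[1]

# Nonzero integer vectors in a small box.
grid = [(x, y) for x in range(-5, 6) for y in range(-5, 6) if (x, y) != (0, 0)]

# Closed cone around the expanding direction: sign_product >= 0.
# The differential must map it into its interior: sign_product > 0.
unstable_ok = all(
    sign_product(apply(A, v)) > 0
    for v in grid if sign_product(v) >= 0
)

# Complementary closed cone: sign_product <= 0.
# The inverse differential must map it into its interior: sign_product < 0.
stable_ok = all(
    sign_product(apply(A_INV, v)) < 0
    for v in grid if sign_product(v) <= 0
)

print(unstable_ok, stable_ok)
```

For a nonlinear map the same test would have to be applied to the differential at each point, with cone fields that may vary from point to point.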
Structure and Properties
Stable and Unstable Manifolds, Local Product Structure
Anosov and Axiom-A systems are defined by the behavior of the differential. Corresponding to the linear structures left invariant by df are nonlinear structures, namely “stable manifolds” tangent to $E^s$ and “unstable manifolds” tangent to $E^u$. Thus, associated with an Anosov map are two families of invariant manifolds, each one of which fills up the entire phase space; they are sometimes called the stable and unstable “foliations.” The leaves of these foliations are transverse at each point, that is, they intersect at positive angles, forming a kind of (topological) coordinate system. The map f expands distances along the leaves of one of these foliations and contracts distances along the leaves of the other. For Axiom-A systems, one has a similar local product structure or “coordinate system” at each point in the nonwandering set, but the picture is local, and there are gaps: the stable and unstable leaves do not necessarily fill out open sets in the phase space. There is much interest in determining the fractal dimension (box-counting or Hausdorff, say) of hyperbolic sets. So far the best dimension estimates have been made for stable slices, that is, for the intersection of a stable leaf with the hyperbolic set, and for unstable slices. Because the local coordinate systems describing the local product structure are only known to be continuous, it is not known in general whether the sum of these stable and unstable dimensions gives the dimension of the hyperbolic set (it is not even known whether all stable slices have the same fractal dimension). The problem is that an $\alpha$-Hölder-continuous map can change dimensions by a factor of $\alpha$ or $1/\alpha$. But there is evidence to suggest that something like this “dimension product structure” may often be true; this has been established for a class of solenoids.
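The way a Hölder map can distort dimension follows from a standard covering estimate. The following is a sketch for Hausdorff dimension, assuming that g is Hölder continuous with exponent $\alpha$ on a set A and, for the lower bound, that its inverse is Hölder with the same exponent. The symbols A, $U_i$, $\delta_i$ and t are introduced here for illustration only.

```latex
% Assumption: d(g(x), g(y)) \le C\, d(x,y)^{\alpha} for nearby x, y in A.
% A cover of A by sets U_i of diameter \delta_i yields a cover of g(A)
% by the sets g(U_i), each of diameter at most C \delta_i^{\alpha}. Hence
\sum_i \bigl(\operatorname{diam} g(U_i)\bigr)^{t/\alpha}
  \;\le\; C^{t/\alpha} \sum_i \delta_i^{\,t}
\quad\Longrightarrow\quad
\dim_H g(A) \;\le\; \frac{1}{\alpha}\,\dim_H A .
% If g^{-1} is also \alpha-H\"older, applying the same bound to g^{-1} gives
\alpha\,\dim_H A \;\le\; \dim_H g(A) \;\le\; \frac{1}{\alpha}\,\dim_H A .
```

With $\alpha < 1$ these bounds leave room for the dimension to change, which is why continuity or Hölder continuity of the local product coordinates alone does not settle the question.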
Transitivity and Spectral Decomposition
In addition to these local structures, Axiom-A systems have a global structure theorem known as “spectral decomposition.” It says that the nonwandering set of every Axiom-A map can be written as $X_1 \cup \cdots \cup X_r$, where the $X_i$ are disjoint closed invariant sets on which f is topologically transitive, that is, has a dense orbit. The $X_i$ are called “basic sets.” Each $X_i$ can be decomposed further into a finite union $\bigcup_j X_{i,j}$, where each $X_{i,j}$ is invariant and topologically mixing under some iterate of f. (Topological transitivity and mixing are irreducibility conditions; transitivity means that there is no proper open invariant subset, and topological mixing says that given two open sets, from some time onward the images of one will always intersect the other.) This decomposition is reminiscent of the corresponding result for finite-state Markov chains.
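The Markov chain analogy can be made concrete with a small transition graph. In the sketch below, a 0/1 matrix records which transitions between states are allowed. Classes of mutually reachable states that contain a cycle play the role of the basic sets $X_i$, and the period p of a class corresponds to the number of pieces $X_{i,j}$, each of which is mixing under the p-th iterate. The example matrix, the function names, and the use of Warshall's closure and a breadth-first search are illustrative assumptions, not taken from the article.

```python
from collections import deque
from math import gcd

def communicating_classes(adj):
    """Classes of mutually reachable states that contain at least one cycle."""
    n = len(adj)
    reach = [[i == j or bool(adj[i][j]) for j in range(n)] for i in range(n)]
    for k in range(n):                      # Warshall's transitive closure
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    classes, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        cls = [j for j in range(n) if reach[i][j] and reach[j][i]]
        seen.update(cls)
        # Skip isolated transient states (a single state without a self-loop).
        if any(adj[u][v] for u in cls for v in cls):
            classes.append(cls)
    return classes

def period(adj, cls):
    """gcd of the lengths of all cycles inside the class cls."""
    root = cls[0]
    dist = {root: 0}
    queue = deque([root])
    while queue:                            # breadth-first distances from root
        u = queue.popleft()
        for v in cls:
            if adj[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    g = 0
    for u in cls:
        for v in cls:
            if adj[u][v]:
                g = gcd(g, dist[u] + 1 - dist[v])
    return g

# States 0 and 1 alternate (period 2); states 2 and 3 form an aperiodic class;
# the edge 1 -> 2 is a one-way passage between the two classes.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

decomposition = [(cls, period(adj, cls)) for cls in communicating_classes(adj)]
print(decomposition)
```

In this example the first class splits into two pieces that are exchanged by one step and are each returned to after two steps, while the second class does not split further.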
Stability
One of the reasons why hyperbolic sets are important is their “robustness”: they cannot be perturbed away. More precisely, let f be a map with a hyperbolic set $\Lambda$ which is locally maximal, that is, it is the largest invariant set in some neighborhood U. Then for every map g that is $C^1$-near f, the largest invariant set $\Lambda'$ of g in U is again hyperbolic; moreover, f restricted to $\Lambda$ is “topologically conjugate” to g restricted to $\Lambda'$. This is mathematical shorthand for saying that not only are the two sets $\Lambda$ and $\Lambda'$ topologically indistinguishable, but the orbit structure of f on $\Lambda$ is indistinguishable from that of g on $\Lambda'$. The phenomenon above brings us to the idea of “structural stability.” A map f is said to be structurally stable if every map g that is $C^1$-near f is topologically conjugate to f (on the entire phase space). It turns out that a map is structurally stable if and only if it satisfies Axiom A and an additional condition called strong transversality.
Chains and Shadowing
We discuss next the idea of pseudo-orbits versus real orbits. Letting $d(\cdot,\cdot)$ be the metric, a sequence of points $x_0, x_1, x_2, \ldots$ in the phase space is called an “$\varepsilon$-pseudo-orbit” or a “chain” of f if $d(f(x_i), x_{i+1}) < \varepsilon$ for every i. Computer-generated orbits, for example, are pseudo-orbits due to round-off errors. A fact of consequence to people performing numerical experiments is that in hyperbolic systems, small errors at each step get magnified exponentially fast. For example, if the expansion rate is 3 or more, then an $\varepsilon$-error made at one step is at least tripled at each subsequent step, that is, after only $O(|\log \varepsilon|)$ iterates, the error is O(1), and the pseudo-orbit bears no relation to the real one. There is, however, a theorem that says that every pseudo-orbit is “shadowed” by a real one. More precisely, given a hyperbolic set, there is a constant C such that if $x_0, x_1, x_2, \ldots$ is an $\varepsilon$-pseudo-orbit, then there is a phase point z such that $d(x_i, f^i(z)) < C\varepsilon$ for all i. Thus, paradoxical as it may first seem, this result asserts that on hyperbolic sets, each pseudo-orbit approximates a real orbit, even though it may deviate considerably from the one with the same initial condition. The shadowing orbit corresponding to a bi-infinite pseudo-orbit is, in fact, unique. From this, one deduces easily the following Closing Lemma: For any hyperbolic set, there is a constant C such that the following holds: Every finite orbit segment $x, f(x), \ldots, f^{n-1}(x)$ that nearly closes up, that is, $d(x, f^{n-1}(x)) < \varepsilon$ for some small $\varepsilon$, lies within